DM8148 PCIe RootComplex Gen2 : hangs at Linux boot

Other Parts Discussed in Thread: TMDXEVMPCI

Hi everyone!

I have a problem with a TMS320DM8148-based board: the kernel sometimes hangs right after PCI-Express link training in Gen2 mode. I found some patches that make it work in Gen1 mode, but we really need the link to run at Gen2.

The architecture is as follows:

  • The main processor is a TI8148. It boots from a micro-SD card, runs a customized U-Boot, and then starts the Linux kernel.
  • A Xilinx FPGA is on the same board.
  • The two chips communicate over PCI-Express Gen2. The TI8148 is the RC and the FPGA is the EP. The DM8148 loads the firmware into the FPGA, so it has to be powered up before the FPGA is.

We're using the PCIe hard IP on the Xilinx FPGA. We don't have a PCIe analyser. As the FPGA and the ARM are on the same board, I cannot test them independently.

On most machines, the initialisation process runs perfectly and PCIe link training succeeds.
On some machines, link training sometimes fails. When it succeeds, PCIe communication works fine. When it fails, we receive a SIGBUS, which stops kernel initialization.
The problem occurs right after writing LTSSM_EN_VAL into the PCIe CMD_STATUS register. It looks as if the DM8148's PCIe registers are no longer clocked.

Reading the DEBUG0 register to see the LTSSM state reveals that:

  • In normal operation, the state is 0x11 (L0) at this point.
  • In abnormal operation, the state is 0x0D (RECOVERY_CLOCK) at this point, then it goes to 0x0E (RECOVERY_SPEED), and then I lose access to the DEBUG0 register. Is there any document that shows the exact conditions that make the PCIe_SS enter this state?
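
For reference, the decoding used for the values above can be sketched as a small host-side helper. It assumes that the LTSSM state occupies the low five bits of DEBUG0 (an assumption to check against the TRM), and it only names the three codes reported in this post. The function names are hypothetical.

```python
# Assumption: the LTSSM state is held in DEBUG0[4:0].
LTSSM_MASK = 0x1F

# Only the three codes reported in this post are named; any other code is
# printed as unknown rather than guessed.
LTSSM_NAMES = {
    0x0D: "RECOVERY_CLOCK",
    0x0E: "RECOVERY_SPEED",
    0x11: "L0",
}


def ltssm_name(debug0):
    """Return a readable name for the LTSSM state held in a raw DEBUG0 value."""
    code = debug0 & LTSSM_MASK
    return LTSSM_NAMES.get(code, "UNKNOWN(0x%02X)" % code)


def link_is_up(debug0):
    """True when the LTSSM field reads L0 (0x11), i.e. the link is trained."""
    return (debug0 & LTSSM_MASK) == 0x11
```

For example, `ltssm_name(0x0E)` returns `"RECOVERY_SPEED"`, and any code not listed comes back as `UNKNOWN(0x..)` so that it can be looked up by hand.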

If I use hook_fault_code to catch the SIGBUS, kernel init is able to finish (without detecting the FPGA on PCIe), which means that the other modules on the DM8148 continue to work properly (specifically the ARM and UART modules).
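
The observed sequence (0x0D, then 0x0E, then loss of register access) could be recorded with a bounded polling loop that stops on L0, on a timeout, or on a faulting read. The following is only a sketch of that logic, not the driver's code: `read_debug0` is a hypothetical callable supplied by the caller, it is assumed to raise `OSError` when the access faults, and the LTSSM field is assumed to be DEBUG0[4:0].

```python
import time

LTSSM_MASK = 0x1F  # assumption: LTSSM state is in DEBUG0[4:0]
LTSSM_L0 = 0x11    # from this post: L0 reads back as 0x11


def trace_ltssm(read_debug0, timeout_s=0.1, poll_s=0.001):
    """Poll DEBUG0 and return (outcome, states).

    read_debug0 is a hypothetical callable returning the raw DEBUG0 value;
    it is assumed to raise OSError if the register access faults.
    outcome is "up", "timeout" or "fault". states holds the LTSSM codes
    in the order they were observed, with consecutive repeats collapsed.
    """
    states = []
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            code = read_debug0() & LTSSM_MASK
        except OSError:
            # Register no longer reachable: report what was seen so far.
            return "fault", states
        if not states or states[-1] != code:
            states.append(code)
        if code == LTSSM_L0:
            return "up", states
        if time.monotonic() >= deadline:
            return "timeout", states
        time.sleep(poll_s)
```

On a failing board, a trace of this kind would be expected to end in `"fault"` with the last states seen before access was lost, which makes it easy to compare good and bad machines.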

I tried to update the PCIe parts of the kernel. Cherry-picking the following four commits makes Linux boot, but restricts the link speed to Gen1:

  • http://arago-project.org/git/projects/?p=linux-omap3.git;a=commitdiff_plain;h=6369801405c5b10cf2d0837ad89b4a826e11615d
  • http://arago-project.org/git/projects/?p=linux-omap3.git;a=commitdiff_plain;h=0eda6e06528556a826466c20be574f3d1f36a948
  • http://arago-project.org/git/projects/?p=linux-omap3.git;a=commitdiff_plain;h=3e1bd8effac5332322e1dbe98e2c7535f20c0416
  • http://arago-project.org/git/projects/?p=linux-omap3.git;a=commitdiff_plain;h=ee1a09b4ec7d8e821816000262f905f159085388

If I stop after the first three commits, the problem still exists and the kernel does not boot.
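
To confirm which speed a given kernel build actually negotiates, the Link Status register of the PCI Express capability can be decoded. The sketch below uses the standard field layout (current link speed in bits 3:0, negotiated link width in bits 9:4); how the raw value is obtained on the board is left out, and the function name is hypothetical.

```python
# Current Link Speed encodings from the PCI Express Link Status register.
SPEED_NAMES = {
    1: "Gen1 (2.5 GT/s)",
    2: "Gen2 (5.0 GT/s)",
}


def decode_link_status(lnksta):
    """Return (speed_name, lane_count) for a raw 16-bit Link Status value."""
    speed_code = lnksta & 0xF     # bits 3:0, current link speed
    width = (lnksta >> 4) & 0x3F  # bits 9:4, negotiated link width
    name = SPEED_NAMES.get(speed_code, "unknown speed code %d" % speed_code)
    return name, width
```

With the four commits applied, a value such as `0x0011` would decode to Gen1 on one lane, whereas a link trained at the wanted speed would report speed code 2.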

Looking through the following posts makes me think that the DM8148 is able to run in Gen2 mode, so I don't understand why some machines are unable to boot. The most probable cause I can see is jitter on the 100 MHz clock, but it has been verified to be good.

  • http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/107676.aspx -- April 2011 -- Realistic data throughput of C667x PCIe Gen2 x2 -- Not really related to our problem, it just describes Gen2 x2 capabilities on another chip.
  • http://e2e.ti.com/support/embedded/linux/f/354/t/118939.aspx -- June 2011 -- ti8168 evm pcie ethernet -- The solution "works", in that the kernel always boots with the provided patch. However, this solution consists of staying in Gen1, while we need to be in Gen2.
  • http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/194161.aspx -- June 2011 -- PCIe enumeration fails/PC isn't booting with EVM assembled with adaptor-card in PC -- Looks like the same problem, but no solution is given.
  • http://e2e.ti.com/support/embedded/linux/f/354/t/203814.aspx -- July 2012 -- PCIe DM8148 custom board problem -- Looks like the problem is equivalent to ours, but the discussion just points to some documentation.
  • http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/218446.aspx -- October 2012 -- PCIe link up c6678 -- We're not stuck in the same PCIe state.
  • http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/221855.aspx -- October 2012 -- Problems about host enumerating C6678 DSPs -- PCIe endpoint
  • http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/t/224128.aspx -- November 2012 -- About Gen2 operation with x2 link on PCIe of DM8168 -- Signals that should not be used in gen2 mode. As we're using only Arago cherry-picked commits, we assume these conditions are respected. Also says that PCIe Gen2 should work.
  • http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/233620.aspx -- December 2012 -- Failed to enumerate C6670EVM by using TMDXEVMPCI -- ???
  • http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/235639.aspx -- December 2012 -- PCIe link up problems in FPGA connecting to DSP -- Card-specific solution, quoting some clock problems, which have been verified to be good in our systems.
  • http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/244411.aspx -- February 2013 -- Trouble getting TMDXEVMPCI to connect EVMs through PCIe -- Another EVM-specific problem

The most disturbing point in this bug is that I lose access to the basic PCIe subsystem registers, as if they were no longer clocked. However, no clock-management action is performed between the last successful access and the failure. I don't see the relation between being unable to handle the EP because of a possible hardware error and losing register access.

Another problem comes from the PCIE_RC_UserGuide: we are not able to bring up the EP before the RC, because the DM8148 loads the FPGA firmware. This does not respect the "Reset/Power on sequence" described in the Troubleshooting section of:
http://processors.wiki.ti.com/index.php/TI81XX_PCI_Express_Root_Complex_Driver_User_Guide#Troubleshooting
However, the Technical Reference Manual (spruz8a), paragraph 2.7.13.1 "PCIe Reset Isolation", says that the reset is performed by PCIE_SS. As far as I understand, this means that resetting the DM8148's RC must be initiated by my FPGA EP. Is there another way to reset the PCIe subsystem?

I let Linux perform every action on PCIe and removed everything from U-Boot, assuming U-Boot's PCIe code is only used to boot over PCIe (as an endpoint). Can you confirm that every required PCI configuration access is performed by the current kernel driver?

To summarize, here are the three main questions about my problem:

  • Is there any workaround that makes PCIe work properly in Gen2 mode when the DM8148 has to be powered on before its EP?
  • How is it possible to lose access to the PCIe CMD_STATUS register by initiating link training?
  • Where can I find the conditions that lead to the PCIe error states?

Sincerely,