TMS320DM8168 Link Training failure and exception with IDT switch during boot

We are investigating an issue where the TMS320DM8168 fails to complete link training with an IDT PCIe switch (89PES16T4AG2) during the boot process. When we do not obtain a working link, we see an exception occur in the kernel while it is attempting to read the debug register at offset 0x728; this is with a debug kernel running that prints out successive reads of the DEBUG0 register (see the attached log:

40000: Success: POST started.
DOMEMTEST
No POSTSIM
No POSTSIM


U-Boot 2010.06-dirty (Jan 20 2016 - 12:14:28)

DRAM:  2 GiB
MMC:   OMAP SD/MMC: 0
Using default environment

Hit any key to stop autoboot:  0
reading u-boot.bin

483632 bytes read
## Starting application at 0x80800000 ...


U-Boot 2010.06-dirty (Jan 20 2016 - 12:09:28)

TI8168-GP rev 2.1

HDVICP clk     : 531MHz
L3 Fast clk    : 493MHz
HDVPSS clk     : 246MHz
Ducati M3 clk  : 246MHz
DSP clk        : 813MHz
ARM clk        : 987MHz
DDR clk        : 796MHz

------------ PLL Settings --------------
MAIN_N        : 64, MAIN_P: 1, OSC_FREQ: 27, FAPLL_K: 8

MAIN_INTFREQ1 : 0x8, MAIN_FRACFREQ1: 0x800000, MAIN_MDIV1: 0x2
MAIN_INTFREQ2 : 0xE, MAIN_FRACFREQ2: 0x0, MAIN_MDIV2: 0x1
MAIN_INTFREQ3 : 0x8, MAIN_FRACFREQ3: 0xAAAAB0, MAIN_MDIV3: 0x3
MAIN_INTFREQ4 : 0x9, MAIN_FRACFREQ4: 0x55554F, MAIN_MDIV4: 0x3
MAIN_INTFREQ5 : 0x9, MAIN_FRACFREQ5: 0x374BC6, MAIN_MDIV5: 0xC

MAIN_MDIV6    : 0x48
MAIN_MDIV7    : 0x4


--------- DDR PLL ----------
DDR_N                  : 0x3B
DDR_P                  : 0x1
DDR_MDIV1              : 0x2
DDR_INTFREQ2           : 0x8
DDDDR_FRACFREQ2R_N     : 0xD99999
DDR_MDIV2              : 0x1E
DDR_INTFREQ3           : 0x8
DDR_FRACFREQ3          : 0x0
DDR_MDIV3              : 0x4
DDR_INTFREQ4           : 0xE
DDR_FRACFREQ4          : 0x0
DDR_MDIV4              : 0x4
DDR_INTFREQ5           : 0xE
DDR_FRACFREQ5          : 0x0
DDR_MDIV5              : 0x4

----------EMIF Timings (identical for 0 & 1)-------
EMIF_TIM1   : 0x1779C9FF
EMIF_TIM2   : 0x50D77FEB
EMIF_TIM3   : 0x00BF8CFF
EMIF_SDREF  : 0x10001841
EMIF_SDCFG  : 0x62A339B2
EMIF_PHYCFG : 0x00000110

----------SW LEVEL Info (EMIF 0) -------
RD_DQS_GATE_BYTE_LANE0: 0x00000136
RD_DQS_GATE_BYTE_LANE1: 0x00000131
RD_DQS_GATE_BYTE_LANE2: 0x00000160
RD_DQS_GATE_BYTE_LANE3: 0x00000155

WR_DQS_RATIO_BYTE_LANE0: 0x0000009F
WR_DQS_RATIO_BYTE_LANE1: 0x0000009D
WR_DQS_RATIO_BYTE_LANE2: 0x000000A8
WR_DQS_RATIO_BYTE_LANE3: 0x000000B0

RD_DQS_RATIO_BYTE_LANE0: 0x00000039
RD_DQS_RATIO_BYTE_LANE1: 0x0000003D
RD_DQS_RATIO_BYTE_LANE2: 0x0000003A
RD_DQS_RATIO_BYTE_LANE3: 0x0000003A

WR_DATA_RATIO_BYTE_LANE0: 0x000000DF
WR_DATA_RATIO_BYTE_LANE1: 0x000000DD
WR_DATA_RATIO_BYTE_LANE2: 0x000000E8
WR_DATA_RATIO_BYTE_LANE3: 0x000000F0

----------SW LEVEL Info (EMIF 1) -------
RD_DQS_GATE_BYTE_LANE0: 0x0000012D
RD_DQS_GATE_BYTE_LANE1: 0x00000123
RD_DQS_GATE_BYTE_LANE2: 0x00000159
RD_DQS_GATE_BYTE_LANE3: 0x00000151

WR_DQS_RATIO_BYTE_LANE0: 0x00000099
WR_DQS_RATIO_BYTE_LANE1: 0x00000095
WR_DQS_RATIO_BYTE_LANE2: 0x000000A3
WR_DQS_RATIO_BYTE_LANE3: 0x000000A2

RD_DQS_RATIO_BYTE_LANE0: 0x0000003A
RD_DQS_RATIO_BYTE_LANE1: 0x0000003C
RD_DQS_RATIO_BYTE_LANE2: 0x00000039
RD_DQS_RATIO_BYTE_LANE3: 0x00000035

WR_DATA_RATIO_BYTE_LANE0: 0x000000D9
WR_DATA_RATIO_BYTE_LANE1: 0x000000D5
WR_DATA_RATIO_BYTE_LANE2: 0x000000E3
WR_DATA_RATIO_BYTE_LANE3: 0x000000E2

I2C:   ready
DRAM:  2 GiB
MMC:   OMAP SD/MMC: 0
Net:   Detected MACID:0:e0:db:45:0:66
Ethernet PHY: BCM54214E @ 1 600d84ae
DaVinci EMAC
Interface:  MMC
  Device 0: Vendor: Man 744a45 Snr 0000075c Rev: 1.0 Prod: SD
            Type: Removable Hard Disk
            Capacity: 7728.0 MB = 7.5 GB (15826944 x 512)
Partition 9: Filesystem: FAT32 "CCC_data  "
reading .softupdate/softupdate.dat
Hit any key to stop autoboot:  0
reading uImage

2805356 bytes read
## Booting kernel from Legacy Image at 81000000 ...
   Image Name:   Linux-2.6.37+
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    2805292 Bytes = 2.7 MiB
   Load Address: 80008000
   Entry Point:  80008000
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
<5>Linux version 2.6.37+  (gcc version 4.4.3 (GCC) ) #13 PREEMPT Thu Feb 11 08:55:41 CST 2016
<5>Kernel: : UID-log-analysis: UID-37486534: tag=linux-boot
CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7), cr=10c53c7f
CPU: VIPT nonaliasing data cache, VIPT aliasing instruction cache
Machine: ti8168evm
<6>vram size = 33554432 at 0x0
<6>reserved size = 33554432 at 0x0
<6>FB: Reserving 33554432 bytes SDRAM for VRAM
Memory policy: ECC disabled, Data cache writeback
<6>OMAP chip is TI8168 2.1
<7>On node 0 totalpages: 184320
<7>free_area_init_node: node 0, pgdat 805619ec, node_mem_map 80611000
<7>  Normal zone: 2440 pages used for memmap
<7>  Normal zone: 0 pages reserved
<7>  Normal zone: 76408 pages, LIFO batch:15
<7>  HighMem zone: 888 pages used for memmap
<7>  HighMem zone: 104584 pages, LIFO batch:31
<7>pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
<7>pcpu-alloc: [0] 0
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 180992
<5>Kernel command line: console=ttyO2,115200 vram=32M mem=112M@0x80000000 mem=640M@0xC0000000 vmalloc=700M eth=00:E0:DB:45:00:66 root=/dev/mmcblk0p5 rootwait init=/init AAA_BBB=ARES AAA_boardrev=20 bootmode= androidboot.console=ttyO2 androidboot.hardware=ARES
<6>PID hash table entries: 2048 (order: 1, 8192 bytes)
<6>Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
<6>Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
<6>Memory: 112MB 608MB = 720MB total
<5>Memory: 724596k/724596k available, 45452k reserved, 421888K highmem
<5>Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    DMA     : 0xffc00000 - 0xffe00000   (   2 MB)
    vmalloc : 0xcc800000 - 0xf8000000   ( 696 MB)
    lowmem  : 0x80000000 - 0xcc400000   (1220 MB)
    pkmap   : 0x7fe00000 - 0x80000000   (   2 MB)
    modules : 0x7f000000 - 0x7fe00000   (  14 MB)
      .init : 0x80008000 - 0x80076000   ( 440 kB)
      .text : 0x80076000 - 0x8051e000   (4768 kB)
      .data : 0x8051e000 - 0x80563180   ( 277 kB)
<6>SLUB: Genslabs=11, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
<6>NR_IRQS:407
<6>IRQ: Found an INTC at 0xfa200000 (revision 5.0) with 128 interrupts
<6>Total of 128 interrupts on 1 active controller
<6>GPMC revision 6.0
<4>Trying to install interrupt handler for IRQ400
<4>Trying to install interrupt handler for IRQ401
<4>Trying to install interrupt handler for IRQ402
<4>Trying to install interrupt handler for IRQ403
<4>Trying to install interrupt handler for IRQ404
<4>Trying to install interrupt handler for IRQ405
<4>Trying to install interrupt handler for IRQ406
<3>Trying to install type control for IRQ407
<3>Trying to set irq flags for IRQ407
<6>OMAP clockevent source: GPTIMER1 at 27000000 Hz
Console: colour dummy device 80x30
<6>Calibrating delay loop... <c>980.99 BogoMIPS (lpj=490496)
<6>pid_max: default: 32768 minimum: 301
<6>Security Framework initialized
Mount-cache hash table entries: 512
<6>CPU: Testing write buffer coherency: ok
<6>devtmpfs: initialized
<4>omap_voltage_early_init: voltage driver support not added
<6>regulator: core version 0.5
<6>regulator: dummy:
<6>NET: Registered protocol family 16
<3>omap_voltage_domain_lookup: Voltage driver init not yet happened.Faulting!
<4>omap_voltage_add_dev: VDD specified does not exist!
<6>OMAP GPIO hardware version 0.1
<6>OMAP GPIO hardware version 0.1
<6>omap_mux_init: Add partition: #1: core, flags: 0
<3>_omap_mux_get_by_name: Could not find signal i2c2_scl.i2c2_scl
<3>_omap_mux_get_by_name: Could not find signal i2c2_sda.i2c2_sda
<6>Found Ares board.
<6>registered ti816x_gpio_vr device
<6>registered ti816x_sr device
<6>registered ti81xx_vpss device
<6>ti816x_hdmi_init(): Mars board with PG2.1 detected
<6>mcb_clk_sel_pins(): Reparent pin_mux_out_ck clk to mcb_fsx_ck PINCTRL149 (pin AM34)
<6>registered ti81xx on-chip HDMI device
<6>registered ti81xx_fb device
<6>ti81xx_pcie: Invoking PCI BIOS...
<6>ti81xx_pcie: Setting up Host Controller...
<6>ti81xx_pcie: Register base mapped @0xcc820000
Initiate link training
dbg 1 00004a02
dbg 2 00000002
dbg 3 00000602
dbg 4 0091650e
dbg 5 007dae0e
dbg 6 00916b0e
dbg 7 0020300e
dbg 8 00cb8d0e
dbg 9 00352d0e
dbg 10 00ed750e
dbg 11 002b200e
dbg 12 00c0a50e
dbg 13 00f6420e
dbg 14 00089e0e
dbg 15 0046b50e
dbg 16 00fd060e
dbg 17 0070650e
dbg 18 00c1860e
dbg 19 000b8d0e
dbg 20 0000040d
dbg 21 00004a0d
dbg 22 00004a0d
dbg 23 00004a0d
dbg 24 00004a0d
dbg 25 0c00000d
dbg 26 0c00060d
dbg 27 0c00000d
dbg 28 0c004a0d
<1>Unhandled fault: external abort on non-linefetch (0x1008) at 0xcc821728
<0>Internal error: : 1008 [#1] PREEMPT
<0>last sysfs file:
<0>PCI_CLKSTCTRL 00000102 PCI_CLKCTRL 00000002
<d>Modules linked in:
CPU: 0    Not tainted  (2.6.37+ #13)
PC is at ti81xx_pcie_setup+0x288/0x53c
LR is at schedule_timeout+0x1b8/0x1e4
pc : [<80098ed0>]    lr : [<803ebc58>]    psr: 60000013
sp : cbc2be18  ip : cbc2bdb8  fp : cbc2be44
r10: 80536824  r9 : 8058d028  r8 : cbc68840
r7 : 80563778  r6 : 0000000d  r5 : 0000001d  r4 : cbc68880
r3 : cc821700  r2 : 00000001  r1 : 0000001d  r0 : 804acfbd
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387f  Table: 80004019  DAC: 00000017

PC: 0x80098e50:
8e50  e582100c e5951018 e3510000 0a00000d e2830d41 e2831c17 e590c03c e3ccce3f
8e70  e38cc010 e580c03c e592000c e3c00cff e3800c01 e582000c e5912010 e3c2283f
8e90  e3822801 e5812010 e5932004 e3a05000 e59f02a4 e3822001 e5832004 eb0d4833
8eb0  e59f7240 e3a00001 e0855000 eb007cb5 e5973000 e1a01005 e59f0280 e2833c17
8ed0  e5936028 e1a02006 e206601f eb0d4827 e3560011 059f0268 0a000002 e3550065
8ef0  1affffef e59f025c eb0d4820 e59f31f4 e300e604 e3a0c000 e1a0000c e1a0500c
8f10  e5933000 e2832a01 e2831c02 e1c2e0ba e593e004 e38ee020 e583e004 e593e004
8f30  e582c010 e582c014 e28cc003 e5932004 e3c22020 e5832004 e5932004 e8944004

LR: 0x803ebbd8:
bbd8  e51b3030 e2033001 e1863003 e50b3030 ea000000 e1a06007 e0844005 e50b4034
bbf8  e5963008 e0633004 e3530000 aa000002 e51b3030 e3130001 05864008 e1a00006
bc18  e24b103c ebf32e84 e51b3020 e121f003 e3a00001 eb000ef8 e1a0200d e3c23d7f
bc38  e3c3303f e5933000 e3130002 0a000000 ebfffe83 ebfffd1e e24b003c ebf3308b
bc58  e59f301c e5933000 e0634004 e1c40fc4 e24bd01c e89da8f0 804b177f 80537548
bc78  800b8218 80537550 00200200 e1a0c00d e92dd800 e24cb004 e1a0200d e3c23d7f
bc98  e3a02002 e3c3303f e593300c e5832000 ebffff7c e89da800 e1a0c00d e92dd800
bcb8  e24cb004 e1a0200d e3c23d7f e3a02082 e3c3303f e593300c e5832000 ebffff71

SP: 0xcbc2bd98:
bd98  80586840 cbc68840 8058d028 fffb6cb3 cbc2bdf4 cbc2bdb8 ffffffff cbc2be04
bdb8  0000000d 80563778 cbc2be44 cbc2bdd0 803ed72c 800762a4 804acfbd 0000001d
bdd8  00000001 cc821700 cbc68880 0000001d 0000000d 80563778 cbc68840 8058d028
bdf8  80536824 cbc2be44 cbc2bdb8 cbc2be18 803ebc58 80098ed0 60000013 ffffffff
be18  00000000 805350cc 805238e4 805350cc cbc68840 00000000 00000000 80536808
be38  cbc2be7c cbc2be48 8000c4a0 80098c54 cbc2be6c cbc2be58 803eafa8 00000000
be58  805238e4 80535094 80535094 00000000 00000000 00000000 cbc2be94 cbc2be80
be78  80098b28 8000c414 805238b0 805238e4 cbc2bea4 cbc2be98 80239128 80098ac8

IP: 0xcbc2bd38:
bd38  00000000 cc821728 803ef8d0 803eb46c 00000000 803ef8d0 cbc2bd74 cbc2bd60
bd58  803ef8d0 80083a60 cbc28000 cbc2a000 cbc2bdb4 cbc2bd78 803eb46c 803ef820
bd78  cbc2bdf4 803ebc50 cbc28164 cbc28160 00000000 fffb6cb3 fffb6cb1 80586840
bd98  80586840 cbc68840 8058d028 fffb6cb3 cbc2bdf4 cbc2bdb8 ffffffff cbc2be04
bdb8  0000000d 80563778 cbc2be44 cbc2bdd0 803ed72c 800762a4 804acfbd 0000001d
bdd8  00000001 cc821700 cbc68880 0000001d 0000000d 80563778 cbc68840 8058d028
bdf8  80536824 cbc2be44 cbc2bdb8 cbc2be18 803ebc58 80098ed0 60000013 ffffffff
be18  00000000 805350cc 805238e4 805350cc cbc68840 00000000 00000000 80536808

FP: 0xcbc2bdc4:
bdc4  cbc2bdd0 803ed72c 800762a4 804acfbd 0000001d 00000001 cc821700 cbc68880
bde4  0000001d 0000000d 80563778 cbc68840 8058d028 80536824 cbc2be44 cbc2bdb8
be04  cbc2be18 803ebc58 80098ed0 60000013 ffffffff 00000000 805350cc 805238e4
be24  805350cc cbc68840 00000000 00000000 80536808 cbc2be7c cbc2be48 8000c4a0
be44  80098c54 cbc2be6c cbc2be58 803eafa8 00000000 805238e4 80535094 80535094
be64  00000000 00000000 00000000 cbc2be94 cbc2be80 80098b28 8000c414 805238b0
be84  805238e4 cbc2bea4 cbc2be98 80239128 80098ac8 cbc2bec4 cbc2bea8 8023809c
bea4  80239114 805238b0 805238e4 80535094 00000000 cbc2bee4 cbc2bec8 802381c0

R0: 0x804acf3d:
cf3c  6d657220 66207061 656c6961 3c000a64 69743e36 78783138 6963705f 52203a65
cf5c  73696765 20726574 65736162 70616d20 20646570 25783040 0a783830 3e333c00
cf7c  31386974 705f7878 3a656963 69614620 2064656c 67206f74 50207465 53454943
cf9c  6c632053 0a6b636f 696e4900 74616974 696c2065 74206b6e 6e696172 0a676e69
cfbc  67626400 20642520 78383025 694c000a 74206b6e 6e696172 20676e69 706d6f63
cfdc  6574656c 694c000a 74206b6e 6e696172 20676e69 6f636e69 656c706d 000a6574
cffc  743e343c 78313869 63705f78 203a6569 78544e49 73696420 656c6261 69732064
d01c  2065636e 6c206f6e 63616765 52492079 3c000a51 69743e34 78783138 6963705f
d03c  52203a65 72747365 69746369 4d20676e 63204953 746e756f 206f7420 2078616d

R3: 0xcc821680:
1680 <1>Unhandled fault: external abort on non-linefetch (0x1008) at 0xcc821680
<0>Internal error: : 1008 [#2] PREEMPT
<0>last sysfs file:
<0>PCI_CLKSTCTRL 00000102 PCI_CLKCTRL 00000002
<d>Modules linked in:
CPU: 0    Not tainted  (2.6.37+ #13)
PC is at __copy_from_user+0xac/0x39c
LR is at show_data.clone.0+0x9c/0x148
pc : [<801e27f0>]    lr : [<800828b8>]    psr: 00000193
sp : cbc2bba4  ip : 0000001c  fp : cbc2bc04
r10: 00000008  r9 : 00000000  r8 : 00000000
r7 : 00000000  r6 : cc821680  r5 : 00000000  r4 : cbc2a000
r3 : 00000000  r2 : ffffffe4  r1 : cc821680  r0 : cbc2bbd4
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387f  Table: 80004019  DAC: 00000017

PC: 0x801e2770:
2770  ba000013 f5d1f000 e2522060 f5d1f01c ba000002 f5d1f03c f5d1f05c f5d1f07c
2790  e4b13004 e4b14004 e4b15004 e4b16004 e4b17004 e4b18004 e4b1c004 e4b1e004
27b0  e2522020 e8a051f8 aafffff3 e3720060 aafffff2 e212c01c e26cc020 108ff00c
27d0  ea000011 e320f000 e4b13004 e4b14004 e4b15004 e4b16004 e4b17004 e4b18004
27f0  e4b1e004 e08ff00c e320f000 e320f000 e4803004 e4804004 e4805004 e4806004
2810  e4807004 e4808004 e480e004 e8bd01e0 e1b02f82 14f13001 24f14001 24f1c001
2830  14c03001 24c04001 24c0c001 e28dd008 e8bd8011 e26cc004 e35c0002 c4f13001
2850  a4f14001 e4f1e001 c4c03001 a4c04001 e052200c e4c0e001 baffffec e211c003

LR: 0x80082838:
2838  e1520003 8a000040 e206a003 e1a02006 e59f0100 e28aac01 eb0da1ca e28aa01f
2858  e1a0300d e3a05000 e3c34d7f e3c66003 e3c4403f e1a0a2ca e1a09005 e59f00d8
2878  e6ff1076 eb0da1bf e3a07000 e5943018 e5948008 e3833003 e5849008 e5843018
2898  ee033f10 f57ff06f e3a00001 eb0db413 e3a02004 e24b0030 e0861007 eb057fa2
28b8  e50b0038 e3a00001 eb0db3d3 e5943000 e3130002 0a000000 eb0da361 e5943018
28d8  e3580000 13a02001 03a02003 e3c33003 e5848008 e1823003 e5843018 ee033f10
28f8  f57ff06f e51b3038 e59f0050 e3530000 0a000002 e59f0048 eb0da19a ea000001
2918  e51b1030 eb0da197 e2877004 e3570020 1affffd5 e2855001 e59f0028 eb0da191

SP: 0xcbc2bb24:
bb24  00000034 20000193 00000000 cbc2bb50 804a38c9 ffffffff cbc2bb8c cc821680
bb44  00000000 cbc2bc04 cbc2bb58 803ed72c 800762a4 cbc2bbd4 cc821680 ffffffe4
bb64  00000000 cbc2a000 00000000 cc821680 00000000 00000000 00000000 00000008
bb84  cbc2bc04 0000001c cbc2bba4 800828b8 801e27f0 00000193 ffffffff 00000001
bba4  00000000 cc821680 00000000 00000000 cbc2bbd4 00000004 00000000 cbc2a000
bbc4  800828b8 804e0dba 00000000 cbc2bbf4 2078616d 803eafa8 cbc2bdd0 cbc2a000
bbe4  00000000 00000000 00000076 cbc2bc08 804a399b cbc2bc9c cbc2bc08 80082c00
bc04  80082828 00000017 804e0dba 804a2b0f ffffffff cbc2bc7c 0000006e 805384fc

FP: 0xcbc2bb84:
bb84  cbc2bc04 0000001c cbc2bba4 800828b8 801e27f0 00000193 ffffffff 00000001
bba4  00000000 cc821680 00000000 00000000 cbc2bbd4 00000004 00000000 cbc2a000
bbc4  800828b8 804e0dba 00000000 cbc2bbf4 2078616d 803eafa8 cbc2bdd0 cbc2a000
bbe4  00000000 00000000 00000076 cbc2bc08 804a399b cbc2bc9c cbc2bc08 80082c00
bc04  80082828 00000017 804e0dba 804a2b0f ffffffff cbc2bc7c 0000006e 805384fc
bc24  0000005a 00000043 00000076 61542020 3a656c62 30303820 31303430 44202039
bc44  203a4341 30303030 37313030 cbc2bc00 cbc2bc70 800d9ec4 803eaf90 804b6bed
bc64  80588f14 cbc2bc70 00000000 803efb3c cbc2bdd0 cbc28000 80563344 cbc2a000

R0: 0xcbc2bb54:
bb54  800762a4 cbc2bbd4 cc821680 ffffffe4 00000000 cbc2a000 00000000 cc821680
bb74  00000000 00000000 00000000 00000008 cbc2bc04 0000001c cbc2bba4 800828b8
bb94  801e27f0 00000193 ffffffff 00000001 00000000 cc821680 00000000 00000000
bbb4  cbc2bbd4 00000004 00000000 cbc2a000 800828b8 804e0dba 00000000 cbc2bbf4
bbd4  2078616d 803eafa8 cbc2bdd0 cbc2a000 00000000 00000000 00000076 cbc2bc08
bbf4  804a399b cbc2bc9c cbc2bc08 80082c00 80082828 00000017 804e0dba 804a2b0f
bc14  ffffffff cbc2bc7c 0000006e 805384fc 0000005a 00000043 00000076 61542020
bc34  3a656c62 30303820 31303430 44202039 203a4341 30303030 37313030 cbc2bc00

R1: 0xcc821600:
1600 <1>Unhandled fault: external abort on non-linefetch (0x1008) at 0xcc821600
<0>Internal error: : 1008 [#3] PREEMPT
<0>last sysfs file:
<0>PCI_CLKSTCTRL 00000102 PCI_CLKCTRL 00000002
<d>Modules linked in:
CPU: 0    Not tainted  (2.6.37+ #13)
PC is at __copy_from_user+0xac/0x39c
LR is at show_data.clone.0+0x9c/0x148
pc : [<801e27f0>]    lr : [<800828b8>]    psr: 00000193
sp : cbc2b92c  ip : 0000001c  fp : cbc2b98c
r10: 00000008  r9 : 00000000  r8 : 00000000
r7 : 00000000  r6 : cc821600  r5 : 00000000  r4 : cbc2a000
r3 : 00000000  r2 : ffffffe4  r1 : cc821600  r0 : cbc2b95c
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387f  Table: 80004019  DAC: 00000017

PC: 0x801e2770:
2770  ba000013 f5d1f000 e2522060 f5d1f01c ba000002 f5d1f03c f5d1f05c f5d1f07c
2790  e4b13004 e4b14004 e4b15004 e4b16004 e4b17004 e4b18004 e4b1c004 e4b1e004
27b0  e2522020 e8a051f8 aafffff3 e3720060 aafffff2 e212c01c e26cc020 108ff00c
27d0  ea000011 e320f000 e4b13004 e4b14004 e4b15004 e4b16004 e4b17004 e4b18004
27f0  e4b1e004 e08ff00c e320f000 e320f000 e4803004 e4804004 e4805004 e4806004
2810  e4807004 e4808004 e480e004 e8bd01e0 e1b02f82 14f13001 24f14001 24f1c001
2830  14c03001 24c04001 24c0c001 e28dd008 e8bd8011 e26cc004 e35c0002 c4f13001
2850  a4f14001 e4f1e001 c4c03001 a4c04001 e052200c e4c0e001 baffffec e211c003

LR: 0x80082838:
2838  e1520003 8a000040 e206a003 e1a02006 e59f0100 e28aac01 eb0da1ca e28aa01f
2858  e1a0300d e3a05000 e3c34d7f e3c66003 e3c4403f e1a0a2ca e1a09005 e59f00d8
2878  e6ff1076 eb0da1bf e3a07000 e5943018 e5948008 e3833003 e5849008 e5843018
2898  ee033f10 f57ff06f e3a00001 eb0db413 e3a02004 e24b0030 e0861007 eb057fa2
28b8  e50b0038 e3a00001 eb0db3d3 e5943000 e3130002 0a000000 eb0da361 e5943018
28d8  e3580000 13a02001 03a02003 e3c33003 e5848008 e1823003 e5843018 ee033f10
28f8  f57ff06f e51b3038 e59f0050 e3530000 0a000002 e59f0048 eb0da19a ea000001
2918  e51b1030 eb0da197 e2877004 e3570020 1affffd5 e2855001 e59f0028 eb0da191

SP: 0xcbc2b8ac:
b8ac  00000034 20000193 00000000 cbc2b8d8 804a38c9 ffffffff cbc2b914 cc821600
b8cc  00000000 cbc2b98c cbc2b8e0 803ed72c 800762a4 cbc2b95c cc821600 ffffffe4
b8ec  00000000 cbc2a000 00000000 cc821600 00000000 00000000 00000000 00000008
b90c  cbc2b98c 0000001c cbc2b92c 800828b8 801e27f0 00000193 ffffffff 00000001
b92c  00000000 cc821600 00000000 00000000 cbc2b95c 00000004 00000000 cbc2a000
b94c  800828b8 804e0dba 00000000 cbc2b97c cbc2bc00 803eafa8 cbc2bb58 cbc2a000
b96c  00000000 00000000 00000076 cbc2b990 804a399b cbc2ba24 cbc2b990 80082be0
b98c  80082828 00000017 804e0dba 804a2b0f ffffffff cbc2ba04 0000006e 805384fc

FP: 0xcbc2b90c:
b90c  cbc2b98c 0000001c cbc2b92c 800828b8 801e27f0 00000193 ffffffff 00000001
b92c  00000000 cc821600 00000000 00000000 cbc2b95c 00000004 00000000 cbc2a000
b94c  800828b8 804e0dba 00000000 cbc2b97c cbc2bc00 803eafa8 cbc2bb58 cbc2a000
b96c  00000000 00000000 00000076 cbc2b990 804a399b cbc2ba24 cbc2b990 80082be0
b98c  80082828 00000017 804e0dba 804a2b0f ffffffff cbc2ba04 0000006e 805384fc
b9ac  0000007a 00000063 00000076 61542020 3a656c62 30303820 31303430 44202039
b9cc  203a4341 30303030 37313030 cbc2ba00 cbc2b9f8 800d9ec4 803eaf90 804b6bed
b9ec  80588f14 cbc2b9f8 00000000 803efb3c cbc2bb58 cbc28000 80563344 cbc2a000

R0: 0xcbc2b8dc:
b8dc  800762a4 cbc2b95c cc821600 ffffffe4 00000000 cbc2a000 00000000 cc821600
b8fc  00000000 00000000 00000000 00000008 cbc2b98c 0000001c cbc2b92c 800828b8
b91c  801e27f0 00000193 ffffffff 00000001 00000000 cc821600 00000000 00000000
b93c  cbc2b95c 00000004 00000000 cbc2a000 800828b8 804e0dba 00000000 cbc2b97c
b95c  cbc2bc00 803eafa8 cbc2bb58 cbc2a000 00000000 00000000 00000076 cbc2b990
b97c  804a399b cbc2ba24 cbc2b990 80082be0 80082828 00000017 804e0dba 804a2b0f
b99c  ffffffff cbc2ba04 0000006e 805384fc 0000007a 00000063 00000076 61542020
b9bc  3a656c62 30303820 31303430 44202039 203a4341 30303030 37313030 cbc2ba00

R1: 0xcc821580:
1580 <1>Unhandled fault: external abort on non-linefetch (0x1008) at 0xcc821580
<0>Internal error: : 1008 [#4] PREEMPT
<0>last sysfs file:
<0>PCI_CLKSTCTRL 00000102 PCI_CLKCTRL 00000002

). With the normal kernel we see a similar exception at offset 0x004, which I think is the CMD_STATUS register. The mapping of the register base in both cases is 0xCC820000. I have tried to use the emulator to examine the DM8168 after the fault, but all I see is 0xBAD0BAD0 when I try to access registers outside of the ARM core.
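
The relationship between the faulting address in the log and the mapped register base can be checked with a little arithmetic. This is a minimal sketch: the two addresses are taken from the log above, while the 0x1000 split between register blocks is an assumption that the thread does not state.

```python
# Offset arithmetic for the faulting address reported in the oops.
REG_BASE = 0xCC820000    # "Register base mapped @0xcc820000" in the boot log
FAULT_ADDR = 0xCC821728  # address on the first "Unhandled fault" line

# Assumption (not stated in the thread): the block holding the debug
# registers starts 0x1000 above the mapped base.
ASSUMED_BLOCK_OFFSET = 0x1000

offset_from_base = FAULT_ADDR - REG_BASE
offset_in_block = offset_from_base - ASSUMED_BLOCK_OFFSET

print(f"offset from mapped base: {offset_from_base:#x}")
print(f"offset inside assumed block: {offset_in_block:#x}")
```

Under that assumption the faulting access lands on offset 0x728, which matches the debug register offset quoted above.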

 

We are running the link as x2 at Gen2 speed.

From the DEBUG0 register trace, we see the LTSSM go from state 0x02 to state 0x0e and then to state 0x0d, which seems to indicate that we went into the Recovery state during training.

Initiate link training
dbg 1 00004a02
dbg 2 00000002
dbg 3 00000602
dbg 4 0091650e
dbg 5 007dae0e
dbg 6 00916b0e
dbg 7 0020300e
dbg 8 00cb8d0e
dbg 9 00352d0e
dbg 10 00ed750e
dbg 11 002b200e
dbg 12 00c0a50e
dbg 13 00f6420e
dbg 14 00089e0e
dbg 15 0046b50e
dbg 16 00fd060e
dbg 17 0070650e
dbg 18 00c1860e
dbg 19 000b8d0e
dbg 20 0000040d
dbg 21 00004a0d
dbg 22 00004a0d
dbg 23 00004a0d
dbg 24 00004a0d
dbg 25 0c00000d
dbg 26 0c00060d
dbg 27 0c00000d
dbg 28 0c004a0d

Since we are only polling DEBUG0, it is likely that the debug kernel missed some of the state transitions.
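
The state sequence can be pulled out of the trace mechanically. This is a minimal sketch, assuming the LTSSM state occupies the low five bits of DEBUG0 and that 0x11 is the L0 (link up) encoding; both values are assumptions here and should be checked against the device documentation. The helper name is hypothetical.

```python
# DEBUG0 values copied from the "dbg N <value>" lines of the debug-kernel log.
DEBUG0_READS = [
    0x00004A02, 0x00000002, 0x00000602, 0x0091650E, 0x007DAE0E, 0x00916B0E,
    0x0020300E, 0x00CB8D0E, 0x00352D0E, 0x00ED750E, 0x002B200E, 0x00C0A50E,
    0x00F6420E, 0x00089E0E, 0x0046B50E, 0x00FD060E, 0x0070650E, 0x00C1860E,
    0x000B8D0E, 0x0000040D, 0x00004A0D, 0x00004A0D, 0x00004A0D, 0x00004A0D,
    0x0C00000D, 0x0C00060D, 0x0C00000D, 0x0C004A0D,
]

LTSSM_MASK = 0x1F  # assumption: LTSSM state is DEBUG0 bits 4:0
LTSSM_L0 = 0x11    # assumption: encoding of the L0 state


def state_runs(reads):
    """Collapse consecutive reads with the same LTSSM state into (state, count)."""
    runs = []
    for value in reads:
        state = value & LTSSM_MASK
        if runs and runs[-1][0] == state:
            runs[-1] = (state, runs[-1][1] + 1)
        else:
            runs.append((state, 1))
    return runs


for state, count in state_runs(DEBUG0_READS):
    print(f"state 0x{state:02x}: {count} consecutive reads")

reached_l0 = any((v & LTSSM_MASK) == LTSSM_L0 for v in DEBUG0_READS)
print("reached L0:", reached_l0)
```

Because the register is only sampled, each run is a lower bound on what happened: states that were entered and left between two reads do not show up.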

 

This particular problem shows up randomly at nominal conditions (voltage and temperature) on about 50% of the boards that we are testing. The test is a continuous reboot after the kernel has loaded. We have examined power, clocks, resets, the registers in the IDT switch, etc., but so far have not been able to determine the root cause. I have not seen anything in the IDT switch that leads me to believe that the switch is malfunctioning, but I will keep an open mind. The traces are all routed as differential pairs with AC coupling caps (0.1 uF) on the transmitter side.

Looking at the PCIe link between the DM8168 and the switch with a high-speed scope after the fault shows the upstream side of the link active, with what may be a K28.5 symbol, and the downstream side from the Netra quiescent.

Everything I have seen so far seems to indicate that the PCIe module in the DM8168 has stopped. I believe the exception is a bus fault that occurs when we try to access the registers while the downstream link is dead. We modified a kernel to output the state of PCI_CLKSTCTRL and PCI_CLKCTRL at the exception, and they look okay:

<1>Unhandled fault: external abort on non-linefetch (0x1008) at 0xcc821728
<0>Internal error: : 1008 [#1] PREEMPT
<0>last sysfs file:
<0>PCI_CLKSTCTRL 00000102 PCI_CLKCTRL 00000002

Do you think there may be something external to the DM8168 that can cause the PCIe module to fail in this manner?  Any ideas on what I should look for next?

  • Hi Ken,

    Can you reproduce this issue on the DM816x TI EVM? Are you using EZSDK 5.05.02.00?

    Are you trying PCIe boot? Do you use DM816x PCIe in EP or RC mode?

    Do you have 2GB DDR3 on your custom board?

    Check whether you are hitting any of the PCIe advisories in the DM816x silicon errata: 2.1.42, 2.1.44 and 2.1.66.

    Regards,
    Pavel
  • Also, the two e2e threads below look to be related to your issue:
    e2e.ti.com/.../240588
    e2e.ti.com/.../328491

    Regards,
    Pavel
  • Pavel,

    We have not used the EVM for years, as we already have several products in production using the DM8168. This product has two DM8168s on two different boards, both with PCIe. The DM8168 is the RC and we are booting from an SD card. Main store is 2GB of DDR3 memory. I checked with our software team and we are using the latest drop of the kernel. The DM8168 is a speed grade 4 device.

    We've checked the eye openings on the x2 link when operating in Gen 2 in both upstream and downstream directions and do not see any issue.  The upstream  PCIe link is all on the same board routed next to a reference layer as we have done in the past on several of our DM8168 designs.

    On at least two of the failing boards, we noted yesterday while probing the switch that the links do not always train to Gen 2 when the exception does not occur.  We are working on an experiment to force the link to GEN 1 to see if the exception will still occur.

    We will review the errata and the other posts along with any errata from the switch manufacturer. 
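
One way to record which speed a link ended up at on each reboot is to decode the PCI Express Link Status register of the port. This is a minimal sketch, assuming the 16-bit register value has already been read; how it is read on this platform is not shown. The helper names are hypothetical and the sample values are made up for illustration.

```python
from typing import Tuple

# Current Link Speed encodings relevant to a Gen1/Gen2 link.
SPEED_NAMES = {1: "2.5 GT/s (Gen1)", 2: "5.0 GT/s (Gen2)"}


def decode_link_status(lnksta: int) -> Tuple[int, int]:
    """Return (current link speed code, negotiated link width)."""
    speed = lnksta & 0xF          # Current Link Speed, bits 3:0
    width = (lnksta >> 4) & 0x3F  # Negotiated Link Width, bits 9:4
    return speed, width


def describe(lnksta: int) -> str:
    speed, width = decode_link_status(lnksta)
    return f"{SPEED_NAMES.get(speed, 'unknown speed')} x{width}"


# Illustrative register values only, not readings from the failing boards.
print(describe(0x0022))
print(describe(0x0021))
```

Logging this once per boot would show how often a given board falls back to Gen1 when the exception does not occur.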

  • We noticed that boards that tend to fail also tend to train to Gen1 speed instead of Gen2 (when not failing), and if we force the Netra to Gen1 speed the issue is resolved.

    Attached LeCroy PCIe analyzer traces:

      1. Gen1 training (when the Netra is forced to Gen1 speed)

      2. Gen1_2_1 training (when training ends up at Gen1 speed after first training to Gen2 speed)

      3. Gen2 training (when training ends successfully at Gen2 speed)

      4. Kernel panic (when training fails and the DSP kernel panics and never recovers)

    Ares_gen1_2_1_train_p.__probe.zip
    Ares_gen1_train_p.zip
    Ares_gen2_train_p3.__probe.zip
    Ares_kernel_panic3_p.__probe.zip
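
The thread does not say how the Netra was forced to Gen1. One standard mechanism is the Target Link Speed field (bits 3:0) of the PCI Express Link Control 2 register. The sketch below only shows the bit manipulation on a register value; it assumes the caller takes care of the actual register access and of retraining the link afterwards. The function name is hypothetical.

```python
TARGET_LINK_SPEED_MASK = 0x000F  # Link Control 2, bits 3:0


def with_target_link_speed(lnkctl2: int, gen: int) -> int:
    """Return a Link Control 2 value with Target Link Speed set to `gen`.

    gen=1 selects 2.5 GT/s and gen=2 selects 5.0 GT/s; all other bits of
    the 16-bit register value are preserved.
    """
    if gen not in (1, 2):
        raise ValueError("only Gen1 and Gen2 are modelled in this sketch")
    return (lnkctl2 & 0xFFFF & ~TARGET_LINK_SPEED_MASK) | gen


# Illustrative values only: a register currently targeting Gen2.
print(hex(with_target_link_speed(0x0002, 1)))
print(hex(with_target_link_speed(0x0042, 1)))
```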

  • Ken,

    Since some of your custom boards work fine while others fail with the same software, I think this indicates a hardware malfunction on the failing boards.

    I recommend performing full DDR3 testing to verify that the DDR3 memory is working correctly. You can use the U-Boot mtest, the CCStudio-based DDR test, and the DDR3 memory stress test; see the e2e thread below for details:
    e2e.ti.com/.../1046142

    Please perform full DDR3 test on the failing boards.

    Also, for Hardware Design Check, see:

    processors.wiki.ti.com/.../Hardware_Design_Checklist

    DM816x datasheet, section 8.14 Peripheral Component Interconnect Express (PCIe) - Please refer to the routing, design and layout specifications



    For the PCIe reference clock (100MHz, serdes_clkp/n), you can check:

    DM816x datasheet, section 7.3.2 SERDES_CLKN and SERDES_CLKP Input Clock

    processors.wiki.ti.com/.../DM816x_C6A816x_AM389x_PCIe_Clocking_Schemes

    Regards,
    Pavel

  • Pavel,

    DDR3 passes our memory testing and also LeCroy DDR3 compliance analysis. We have not tested DDR3 on the failing boards, but we can do that. We are focusing on PCIe signal integrity at this point. The routing of the differential traces looks good. We are looking at the eye with a good board and a bad board to see if there is any evidence of eye closure.

    Ken
  • PCIe Eye.pdf

    Here is the comparison of the PCIe link between a good and a bad board.

    There is no evidence of a HW issue on the board that experiences the problem.

    The following are the eye diagrams comparing both links, along with an error report indicating that, although the eyes look the same, a substantial number of errors are reported from the downstream (Netra) side on the bad board.

    I guess we can move the Netra to the other board to prove that the issue follows the device regardless of the board it is assembled on.

    Koby

  • From IDT:

    Koby,

    There is something wrong with the root complex in the Ares_gen1_train_p.pex trace. It looks like the root complex brings its link down after sending nine TS2 in Rcvry.Rcvr.Cfg (Gen2). This causes the switch port to bring its link down and downgrade to Gen1. According to the PCI Express Specification, the root complex should stay in Rcvry.Rcvr.Cfg until it receives 8 consecutive TS2 Ordered Sets. It should only bring the link down and go to Detect after the 48 ms timeout.

    Please refer to the attachment for details.

    In the Ares_kernel_panic3_p.pex trace, it shows the root complex shuts its link down and the switch port goes into Compliance. There is no history on why the root complex shuts its link down.

    Regards,
    Bryan

    Ares.pdf
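
The rule Bryan describes can be written down as a small decision function. This is a deliberately simplified sketch of that one rule, using only the two numbers quoted in his message (8 consecutive TS2 Ordered Sets, 48 ms timeout); the real Recovery.RcvrCfg state has further exit conditions that are not modelled here, and all names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    STAY_IN_RCVR_CFG = "stay in Recovery.RcvrCfg"
    LEAVE_RCVR_CFG = "leave Recovery.RcvrCfg and continue training"
    GO_TO_DETECT = "bring the link down and go to Detect"


TS2_REQUIRED = 8   # consecutive TS2 Ordered Sets, as quoted in the message
TIMEOUT_MS = 48.0  # timeout, as quoted in the message


def rcvr_cfg_action(consecutive_ts2_received: int, elapsed_ms: float) -> Action:
    """Simplified model: what a port should do while in Recovery.RcvrCfg."""
    if consecutive_ts2_received >= TS2_REQUIRED:
        return Action.LEAVE_RCVR_CFG
    if elapsed_ms >= TIMEOUT_MS:
        return Action.GO_TO_DETECT
    return Action.STAY_IN_RCVR_CFG


# Illustrative inputs only; the trace timing is not given in the thread.
print(rcvr_cfg_action(3, 1.0).value)
print(rcvr_cfg_action(8, 1.0).value)
print(rcvr_cfg_action(0, 48.0).value)
```

In this model, a port that drops its link before either condition is met is doing something the rule does not allow, which is the behavior the IDT analysis attributes to the root complex.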