DS125BR820: AER fail issue

Part Number: DS125BR820

Hello

The customer tested 300 sets. In the first test run, 60 sets showed the AER issue below. In the second test run, 29 sets showed it.

The AER log is below; it was captured with a PCIe command under Linux. The customer has already measured the TX eye diagram and run a loopback to the CPU RX, which also shows good eye margin in the CPU RX tool. The signal quality looks good.

What is the AER error below? Is it related to link training? Is there a way to resolve it by tuning the EQ/VOD of the re-driver? Thank you.

  

BR

Patrick

  • Hi Patrick,

    I found some information on Advanced Error Reporting that shows the same error codes. See the codes and explanation below.

    Capabilities: [100 v1] Advanced Error Reporting
            UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UEMsk:  DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq+ ACSViol-
            UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
            CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+

    Since the error was a correctable error ("CorrErr"), the interesting part of the AER output is the correctable error status ("CESta"). None of the bits are set except the non-fatal error bit ("NonFatalErr+"). As the name suggests, it is an error, but not a fatal one, and it was correctable. The correctable error mask ("CEMsk") shows that the device vendor elected to mask that error ("NonFatalErr+"), so they did not consider it something that should be trickled up the PCIe device chain and handled. The PCI-SIG defines correctable non-fatal errors as "Advisory" errors and notes that they should be treated as an indication of a software problem, not of an issue with the integrity or functionality of the PCIe bus.
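
    The reading of the dump described above can be sketched in a few lines of Python. This is only an illustration, assuming the dump is plain text in the "Flag+" / "Flag-" layout quoted earlier; the helper names parse_aer_flags and summarize_correctable are hypothetical and not part of any TI or Linux tool.

```python
import re

# Register lines copied from the AER capability dump quoted in this reply.
AER_DUMP = """\
UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk:  DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq+ ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
"""


def parse_aer_flags(text):
    """Hypothetical helper: map each register name to {flag: is_set}."""
    registers = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s+(.*)", line)
        if not match:
            continue
        name, body = match.groups()
        flags = {}
        for token in body.split():
            # A trailing '+' means the bit is set, a trailing '-' means clear.
            if token[-1] in "+-":
                flags[token[:-1]] = token.endswith("+")
        if flags:
            registers[name] = flags
    return registers


def summarize_correctable(registers):
    """Hypothetical helper: list set CESta bits and those not masked in CEMsk."""
    status = registers.get("CESta", {})
    mask = registers.get("CEMsk", {})
    set_bits = [flag for flag, is_set in status.items() if is_set]
    unmasked = [flag for flag in set_bits if not mask.get(flag, False)]
    return set_bits, unmasked


regs = parse_aer_flags(AER_DUMP)
set_bits, unmasked = summarize_correctable(regs)
print("Correctable status bits set:", set_bits)
print("Set and not masked:", unmasked)
```

    For the dump above, the only correctable status bit that is set is NonFatalErr, and it is masked, so the second list comes out empty. The same check could be run on the dumps from the failing sets to see whether any other status bit (for example RxErr or BadTLP) is ever set.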

    Regards,

    Lee