
XIO2001: About the Causes of Non-Fatal Errors

Part Number: XIO2001

Hi Team,

I have a question about XIO2001.
When I intentionally generated an Unsupported Request (UR), several bits changed in the Device Status register. See Table 1 for the Device Status values.

See Table 2 for the settings of "8.4.62 TL Control Diagnostic Register 0" and "8.5.5 Uncorrectable Error Severity Register", which relate to error handling.

The PCIe specification states the following:
"If the severity of the UR/CA error is non-fatal, the Completer must handle this case as an Advisory Non-Fatal Error.
A Completer with AER signals the non-fatal error (if enabled) by sending an ERR_COR Message. A Completer without AER sends no Error Message for this case."
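To make the quoted rule concrete, here is a minimal sketch of how we read it as a decision function. The function name `advisory_completer_message` and its parameters are hypothetical; it models only the non-fatal (Advisory Non-Fatal) case and ignores masking and the other conditions in the specification.

```python
from typing import Optional


def advisory_completer_message(has_aer: bool, cor_reporting_enabled: bool) -> Optional[str]:
    """Error Message a Completer sends after completing a request with UR/CA
    status when the UR/CA severity is non-fatal (Advisory Non-Fatal case).

    Simplified reading of the quoted text only: masking and other
    conditions in the specification are not modeled.
    """
    if not has_aer:
        # A Completer without AER sends no Error Message for this case.
        return None
    # A Completer with AER signals the non-fatal error with ERR_COR, if enabled.
    return "ERR_COR" if cor_reporting_enabled else None


if __name__ == "__main__":
    for aer in (True, False):
        for enabled in (True, False):
            print(aer, enabled, advisory_completer_message(aer, enabled))
```

Under this reading, ERR_NONFATAL is never a possible result on the Advisory Non-Fatal path, which is the basis of the question.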

According to the specification, only a correctable error should be detected in this case, yet a non-fatal error is also detected. Could you please tell me why?

<Platform>
CPU:CORE i3-2310E (Sandy Bridge)
PCH:HM65 (Cougar Point)

Best Regards,

Wakumura

  • Hi Wakumura-san,

    I am not sure I understand your question completely. By default, the XIO2001 supports Advanced Error Reporting, and its Unsupported Request error severity is non-fatal, so Advisory Non-Fatal Error reporting is supported. According to the status bits you've indicated, the XIO2001 has correctly reported a Non-fatal error and an Unsupported Request.

    Please let me know if I've misunderstood. Please also clarify which PCI Express specification you are referencing.

    Best,
    David

  • Hi David-san,

    Thank you for your reply.
    I apologize for the lack of explanation.

    The referenced standard is "PCI Express® Base Specification Revision 6.0".

    Section 6.2.3.2.4, "Advisory Non-Fatal Error Cases", of the specification states the following: "Advisory Non-Fatal Error" cases are predominantly determined by the role of the detecting agent (Requester, Completer, or Receiver) and the specific error. In such cases, an agent with AER signals the non-fatal error (if enabled) by sending an ERR_COR Message as an advisory to software, instead of sending ERR_NONFATAL."

    From the specification, we understand that for an Advisory Non-Fatal Error, ERR_COR is sent and ERR_NONFATAL is not sent.
    Why is ERR_NONFATAL being sent?

    Best Regards,

    Wakumura

  • Hi Wakumura-san,

    Thank you for the additional clarification. Could you please provide the register readbacks from sections 8.5.3, 8.5.4, 8.5.6, and 8.5.7 from the data sheet?

    If possible, a dump of the entire PCI Express register space would be preferred.

    Best,
    David

  • Hi David-san,

    Thank you for your specific instructions.
    Attached are the results of the register readbacks.

    We are checking whether we can share a dump of the entire PCI Express register space. We will contact you as soon as we can confirm.
    In the meantime, please review the register readbacks we sent this time.

    Best Regards,

    Wakumura

  • Hi Wakumura-san,

    Thank you for providing the requested register readbacks.

    When reviewing the specifications for PCI Express to PCI/PCI-X Bridges (Revision 1.0), I noted the following in Section 10.1.1.2 (Optional Support):

    Bridges optionally check for ECRC errors while receiving a TLP from PCI Express with ECRC protection. The ability of a bridge to detect ECRC errors is reported via the Advanced Error Reporting capability.

    When a bridge checks and detects an ECRC error, it must:

    • Drop the transaction (i.e., not forward it to the conventional PCI/PCI-X interface). Since no positive determination can be made of whether the transaction is a request (posted or non-posted) or completion, the bridge must discard the packet.
    • Set the ECRC Error Status bit in the Uncorrectable Error Status register of the Advanced Error Reporting capability, log the header, update the First Error Pointer field in the Advanced Error Capabilities and Control register, and generate an error message on PCI Express per the Advanced Error Reporting rules (specified in Chapter 7 of PCI Express Base 1.0a). The default error severity for ECRC errors is ERR_NONFATAL. Follow the PCI Express Base 1.0a rules for setting the Non-Fatal Error Detected or Fatal Error Detected bit in the Device Status register.
    • Set the Detected Parity Error bit in the bridge’s Status register.
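    The three required actions can be sketched as a small register model. This is illustrative only: the `BridgeErrorRegs` class and `handle_ecrc_error` function are hypothetical, header logging is reduced to storing the header bytes, and masking and reporting-enable bits are not modeled. The bit positions used (ECRC Error Status at bit 19 of the Uncorrectable Error Status register, Detected Parity Error at bit 15 of the PCI Status register) are the standard ones.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ECRC_ERROR_STATUS_BIT = 19      # Uncorrectable Error Status register
DETECTED_PARITY_ERROR_BIT = 15  # PCI Status register


@dataclass
class BridgeErrorRegs:
    """Hypothetical model of the few bridge registers touched by the ECRC rule."""
    uncorrectable_status: int = 0
    first_error_pointer: Optional[int] = None
    header_log: bytes = b""
    status: int = 0
    messages: List[str] = field(default_factory=list)


def handle_ecrc_error(regs: BridgeErrorRegs, tlp_header: bytes,
                      severity_fatal: bool = False) -> bool:
    """Apply the required ECRC-error actions; return True if the TLP is forwarded."""
    # Set ECRC Error Status in the Uncorrectable Error Status register.
    regs.uncorrectable_status |= 1 << ECRC_ERROR_STATUS_BIT
    # Log the header and point the First Error Pointer at this error's bit,
    # assuming no earlier uncorrectable error is already recorded.
    if regs.first_error_pointer is None:
        regs.header_log = tlp_header
        regs.first_error_pointer = ECRC_ERROR_STATUS_BIT
    # Record the message AER would generate (default severity is non-fatal);
    # masking and reporting-enable bits are not modeled here.
    regs.messages.append("ERR_FATAL" if severity_fatal else "ERR_NONFATAL")
    # Set Detected Parity Error in the bridge's Status register.
    regs.status |= 1 << DETECTED_PARITY_ERROR_BIT
    # The packet is discarded, never forwarded to the PCI/PCI-X interface.
    return False
```

    Note that this path is specific to ECRC errors (bit 19); it says nothing about how a UR (bit 20) is signaled.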

    Based on this specification's description, I believe the bridge is correctly signaling a non-fatal error.

    Best,
    David

  • Hi David-san,

    Thank you for your reply.

    Table 3 shows the Uncorrectable Error Status Register contents:
    Bit 19 ECRC_ERROR = 0, Bit 20 UR_ERROR = 1.

    You mentioned that an ECRC error causes a Non-Fatal Error, but ECRC_ERROR is not set in the Uncorrectable Error Status Register.
    Our question is why we get ERR_NONFATAL when UR_ERROR occurs.

    Question 1)
    According to the data sheet, the XIO2001 is compliant with PCI Express® Base Specification Revision 2.0.
    Please refer to PCI Express® Base Specification Revision 2.0, p. 353, Figure 6-2: Flowchart Showing Sequence of Device Error Signaling and Logging Operations.

    UR is an uncorrectable error (see Table 6-4).
    Since UR is an Advisory Non-Fatal Error, it does not appear to take the branch that sends ERR_NONFATAL.

    Question 2)
    Based on the dump data we are sending, what is the cause of ERR_NONFATAL?

    Best Regards,

    Wakumura

  • Hi Wakumura-san,

    It appears that the unsupported request sent is not deemed as Advisory Non-Fatal. Could you please provide additional clarification on how the unsupported request is generated? Is this from the PCIe bus or PCI bus?

    Additionally, in the PCI configuration space log that you've provided, the unsupported request is not being sent. Could you please provide a dump from the occurrence when the unsupported request is sent?

    Best,
    David

  • Hi David-san,

    Thank you for your reply.

    > Could you please provide additional clarification on how the unsupported request is generated?
    > Is this from the PCIe bus or PCI bus?
    We accessed the PCI configuration space of a PCIe device that does not exist.
    At that time, the following bits of the XIO2001 Device Status Register (7Ah) changed from 0 to 1:
    Bit 3 (Unsupported Request Detected): "0→1"
    Bit 1 (Non-Fatal Error Detected): "0→1"
    Bit 0 (Correctable Error Detected): "0→1"
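    The transitions above can be checked mechanically by comparing Device Status readings taken before and after the access. A minimal sketch, assuming the standard PCIe bit layout of the Device Status register; the function name `bits_set_0_to_1` and the example values 0x0000 and 0x000B are hypothetical (the actual values are in Table 1, which is not reproduced here).

```python
from typing import List

# PCIe Device Status register error bits (7Ah on the XIO2001)
DEVICE_STATUS_BITS = {
    0: "Correctable Error Detected",
    1: "Non-Fatal Error Detected",
    2: "Fatal Error Detected",
    3: "Unsupported Request Detected",
}


def bits_set_0_to_1(before: int, after: int) -> List[str]:
    """Names of the Device Status error bits that changed from 0 to 1."""
    rose = ~before & after & 0xFFFF
    return [name for bit, name in sorted(DEVICE_STATUS_BITS.items()) if rose & (1 << bit)]


if __name__ == "__main__":
    # Hypothetical readings matching the three transitions described above.
    print(bits_set_0_to_1(0x0000, 0x000B))
```

    Because these status bits are write-1-to-clear, writing 1s to them before the test access gives a clean baseline for the "before" reading.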

    > It appears that the unsupported request sent is not deemed as Advisory Non-Fatal.
    Which register indicates that?
    Bit 20 (Unsupported Request Error Severity) of the Uncorrectable Error Severity Register (10Ch) is "0: Error condition is signaled using ERR_NONFATAL".
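    The UR status and severity bits quoted in this thread (Uncorrectable Error Status bit 20 = 1 from Table 3, Uncorrectable Error Severity bit 20 = 0 at 10Ch) can be combined as below. This is a sketch with a hypothetical function name; it reflects only the severity setting and deliberately does not apply the Advisory Non-Fatal downgrade or any masking/enable bits, which is exactly where the two readings in this thread differ.

```python
from typing import Optional

UR_BIT = 20  # Unsupported Request bit, same position in the AER status and severity registers


def ur_severity_message(uncorrectable_status: int, uncorrectable_severity: int) -> Optional[str]:
    """Message class a logged UR is assigned by the severity register.

    Returns None if no UR is logged. Only the severity setting is
    considered; the Advisory Non-Fatal case is not applied here.
    """
    if not uncorrectable_status & (1 << UR_BIT):
        return None
    return "ERR_FATAL" if uncorrectable_severity & (1 << UR_BIT) else "ERR_NONFATAL"
```

    With the values reported here, the severity register classifies the UR as ERR_NONFATAL; whether a Completer then downgrades it to an advisory ERR_COR is a separate step.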

    > Additionally, in the PCI configuration space log that you've provided, the unsupported request is not being sent.
    Which register indicates that?
    I determined that the XIO2001 detected a UR because Bit 3 (Unsupported Request Detected) of the Device Status Register (7Ah) is 1.
    I would like to know why, when the XIO2001 detects a UR, it also detects a Non-Fatal Error.

    Best Regards,

    Wakumura

  • Hi Wakumura-san,

    URs are categorized as Uncorrectable Non-Fatal errors by the PCIe specification (Table 6-4). I believe this is why both of these bits are set.

    Best,
    David

  • Hi David-san,

    Question 1)

    Regarding Figure 6-2: Flowchart Showing Sequence of Device Error Signaling and Logging Operations:
    Given the register readback results, is ERR_NONFATAL sent along the path indicated by the red arrow in Figure 6-2?

    Question 2)

    PCI Express Base Specification Revision 2.0, Section 6.2.3.2.4.1, "Completer Sending a Completion with UR/CA Status", states: "If the severity of the UR/CA error is non-fatal, the Completer must handle this case as an Advisory Non-Fatal Error. A Completer with AER signals the non-fatal error (if enabled) by sending an ERR_COR Message."
    If the UR is handled as an Advisory Non-Fatal Error, wouldn't it follow the path indicated by the green arrow in Figure 6-2?

    Best Regards,

    Wakumura

  • Hi Wakumura-san,

    1. I agree that, if all conditions along the red arrow you've shown are met, ERR_NONFATAL would be sent.
    2. If the UR were handled as Advisory Non-Fatal, yes, it should follow the path labeled with the green arrow in your diagram.

    I believe that, in this case, the bridge correctly detects the Advisory Non-Fatal Error by setting status bit 13 (Advisory Non-Fatal Error Status) to 1 in PCI configuration register 110h (as seen in the PCI configuration space dump).

    Also, because the UR is detected, the Unsupported Request Detected bit (bit 3) in PCI configuration space register 7Ah is set to 1. Because bit 20 of PCI configuration space register 10Ch is 0, Unsupported Requests are configured to be signaled as ERR_NONFATAL.

    Finally, in PCI configuration register 78h (Device Control Register), Correctable Error Reporting Enable (bit 0) is set to 0, so correctable error reporting is not enabled. Therefore, the bridge would not send an ERR_COR message in this configuration, but it correctly detects the correctable error in bit 0 of PCI configuration register 7Ah (this bit is set to 1).


    Best,
    David

  • Hi David-san,

    Does the XIO2001 specification mean that ERR_NONFATAL is also sent even when the UR is handled as Advisory Non-Fatal?

    If so, does this violate Figure 6-2 and Section 6.2.3.2.4.1 (Completer Sending a Completion with UR/CA Status) of the PCI Express Base Specification Revision 2.0?

    Best Regards,

    Wakumura

  • Hi Wakumura-san,

    No, I do not believe this is the case. This device is listed on the PCI-SIG Integrators List as compliant with PCIe Base Specification Revision 2.0.

    As I noted above, by default the bridge is configured not to send ERR_NONFATAL or ERR_COR to the Root Complex, as shown in data sheet Table 8-36. The flowchart states that if CERE (Correctable Error Reporting Enable) is not set, ERR_COR is not sent upstream. The same logic applies to ERR_FATAL and ERR_NONFATAL, with the additional condition of the SERR_EN bit.
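    The gating described above can be sketched as follows, assuming the standard bit positions of the Device Control register (78h on the XIO2001) and the PCI Command register (04h). This is a simplified model with a hypothetical function name: it ignores the AER mask registers and the Unsupported Request Reporting Enable bit, and it only decides whether a message leaves the device. The Device Status "detected" bits are set regardless of these enables.

```python
# Device Control register (78h on the XIO2001): reporting-enable bits
CERE = 1 << 0   # Correctable Error Reporting Enable
NFERE = 1 << 1  # Non-Fatal Error Reporting Enable
FERE = 1 << 2   # Fatal Error Reporting Enable
# PCI Command register (offset 04h)
SERR_EN = 1 << 8  # SERR# Enable


def message_sent_upstream(message: str, device_control: int, command: int) -> bool:
    """Simplified gating of error Messages; ignores masks and UR Reporting Enable."""
    if message == "ERR_COR":
        return bool(device_control & CERE)
    if message == "ERR_NONFATAL":
        return bool(device_control & NFERE or command & SERR_EN)
    if message == "ERR_FATAL":
        return bool(device_control & FERE or command & SERR_EN)
    raise ValueError(f"unknown error message: {message}")
```

    With Device Control bit 0 = 0 as in the dump, `message_sent_upstream("ERR_COR", 0, 0)` is False, which matches the point that no ERR_COR leaves the bridge even though the correctable error is detected.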

    Best
    David

  • Hi David-san,

    Sorry, we already understand the conditions under which ERR_COR, ERR_FATAL, and ERR_NONFATAL are sent upstream.

    The XIO2001 supports Advanced Error Reporting, and its Unsupported Request error severity is non-fatal, so we understand that a UR becomes an Advisory Non-Fatal Error.
    Advisory Non-Fatal Errors are treated as correctable errors, so I expect Bit 3 (Unsupported Request Detected) and Bit 0 (Correctable Error Detected) of the XIO2001 Device Status Register (7Ah) to change from 0 to 1.
    However, I don't think Bit 1 (Non-Fatal Error Detected) of the Device Status Register (7Ah) should change from 0 to 1.

    Consider Figure 6-2 (Flowchart Showing Sequence of Device Error Signaling and Logging Operations) in PCIe Base Specification Revision 2.0.
    When the condition "Advisory Non-Fatal Error? (Section 6.2.3.2.4)" branches to "Yes", the flow proceeds to "Set Correctable Error Detected bit in Device Status reg".
    It does not proceed to "Set Fatal/Non-Fatal Error Detected bit in Device Status reg".
    Therefore, Bit 1 (Non-Fatal Error Detected) of the XIO2001 Device Status Register (7Ah) should not be able to change from 0 to 1.
    Is my understanding wrong?
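    The two Figure 6-2 branches described above can be written out to compare with what was observed. This is a simplified sketch of the reading given in this post, not the full flowchart: masks and reporting enables are omitted, and the function name `expected_device_status_bits` is hypothetical. As stated elsewhere in this thread, the Unsupported Request Detected bit is assumed to be set for any UR.

```python
from typing import Set


def expected_device_status_bits(is_ur: bool, advisory: bool,
                                severity_fatal: bool = False) -> Set[str]:
    """Device Status bits set for an uncorrectable error, following the
    Figure 6-2 branches as described above (masks and enables omitted)."""
    bits: Set[str] = set()
    if is_ur:
        bits.add("Unsupported Request Detected")
    if advisory:
        # "Advisory Non-Fatal Error?" -> Yes
        bits.add("Correctable Error Detected")
    else:
        # "Advisory Non-Fatal Error?" -> No
        bits.add("Fatal Error Detected" if severity_fatal else "Non-Fatal Error Detected")
    return bits


if __name__ == "__main__":
    observed = {"Unsupported Request Detected", "Non-Fatal Error Detected",
                "Correctable Error Detected"}
    for advisory in (True, False):
        expected = expected_device_status_bits(is_ur=True, advisory=advisory)
        print(advisory, sorted(expected), expected == observed)
```

    Under this reading, the observed combination of all three bits matches neither branch on its own, which is the discrepancy being asked about.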

    That Non-Fatal Errors are not detected in this case (an Advisory Non-Fatal Error sends an ERR_COR message instead of ERR_NONFATAL) is also stated in places other than Figure 6-2.

    <From PCIe Base Specification Revision 2.0>

     ・Section 6.2.3.2.4, "Advisory Non-Fatal Error Cases"
      States: "an agent with AER signals the non-fatal error (if enabled) by sending an ERR_COR Message as an advisory to software, instead of sending ERR_NONFATAL."

     ・Section 6.2.3.2.4.1, "Completer Sending a Completion with UR/CA Status"
      States: "If the severity of the UR/CA error is non-fatal, the Completer must handle this case as an Advisory Non-Fatal Error. A Completer with AER signals the non-fatal error (if enabled) by sending an ERR_COR Message."

     ・Table 6-4, Detecting Agent Action for UR
      States: "Send ERR_NONFATAL to Root Complex or ERR_COR for the Advisory Non-Fatal Error case described in Section 6.2.3.2.4.1."

    Best Regards,

    Wakumura

  • Hi Wakumura-san,

    I understand your view that an Advisory Non-Fatal Error should not set the Non-Fatal Error Detected status bit. Do you believe that this single Unsupported Request transaction alone sets the Non-Fatal Error Detected bit to 1?

    Best,
    David

  • Hi David-san,

    The steps we used to make the XIO2001 detect a UR are as follows.

     1) In my system, XIO2001 is B:7/D:0/F:0.
     2) I intentionally accessed a non-existent device (B:7/D:0/F:3).
     3) At that time, the following bits of the XIO2001 Device Status Register (7Ah) changed from 0 to 1:
    Bit 3 (Unsupported Request Detected): "0→1"
    Bit 1 (Non-Fatal Error Detected): "0→1"
    Bit 0 (Correctable Error Detected): "0→1"
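    For reference, the bus/device/function numbers in steps 1 and 2 map to configuration addresses as below. The thread does not say which access mechanism was used, so both the legacy CF8h/CFCh mechanism and ECAM (memory-mapped) offsets are shown; the helper names are hypothetical. Since 78h/7Ah share one dword, Device Status is the upper 16 bits of a dword read at 78h.

```python
def cf8_config_address(bus: int, device: int, function: int, offset: int) -> int:
    """CONFIG_ADDRESS value for the legacy CF8h/CFCh mechanism (dword-aligned)."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 7
            and 0 <= offset <= 0xFF):
        raise ValueError("bus/device/function/offset out of range")
    return (1 << 31) | (bus << 16) | (device << 11) | (function << 8) | (offset & 0xFC)


def ecam_offset(bus: int, device: int, function: int, offset: int) -> int:
    """Byte offset from the ECAM base; also reaches extended space such as 10Ch/110h."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 7
            and 0 <= offset <= 0xFFF):
        raise ValueError("bus/device/function/offset out of range")
    return (bus << 20) | (device << 15) | (function << 12) | offset


def device_status_from_dword(dword_at_78h: int) -> int:
    """Device Status (7Ah) is the upper half of the dword at 78h; Device Control is the lower half."""
    return (dword_at_78h >> 16) & 0xFFFF


if __name__ == "__main__":
    print(hex(cf8_config_address(7, 0, 0, 0x7A)))  # XIO2001 Device Status dword
    print(hex(cf8_config_address(7, 0, 3, 0x00)))  # non-existent function in step 2
    print(hex(ecam_offset(7, 0, 0, 0x10C)))        # Uncorrectable Error Severity
```

    The AER registers at 10Ch and 110h lie above 0xFF, so they are only reachable through ECAM, not through the CF8h mechanism.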

    Please answer the questions below.

    Q1: Is it possible that both correctable and non-fatal errors are detected during UR detection, depending on register settings?
    Q2: Is there a potential bug in the XIO2001 that causes both a Correctable Error and a Non-Fatal Error to be detected when it detects a UR?
    Q3: Are there any possible reasons why both correctable and non-fatal errors are detected during UR detection?

    Best Regards,

    Wakumura

  • Hi Wakumura-san,

    Please allow me to review your questions and provide a response within the next 24 hours.

    Best,
    David

  • Hi Wakumura-san,

    1. I believe this may be possible, depending on the configuration of the bridge.
    2. It is difficult to say whether this is a bug.
    3. The PCI Express to PCI/PCI-X Bridge Specification Rev. 1.0 states that Unsupported Request error reporting is to be handled according to the PCI Express Base Specification Rev. 1.0a. This may be the point of discrepancy.
      • In the PCI Express Base Specification Rev. 1.0a, the sequence of error reporting is specified by Figure 6-2.
      • Based on that figure, the specification directs the device to "Set Fatal/Non-Fatal Error Detected bit in Device Status Register".

    I believe this may be why this bit is being set.

    Best,
    David

  • Hi David-san,

    I understand that this behavior may be possible, depending on the configuration of the bridge.

    Thank you for your response.

    Best Regards,

    Wakumura

  • Hi David-san,

    Let me confirm two points.

    ・Have you been able to reproduce this phenomenon?

    ・Do you plan to include it in the next errata sheet?

    Best Regards,

    Wakumura

  • Hi Wakumura-san,

    We have not seen this issue before, and I have not been able to reproduce it. I will need to continue reviewing it before considering it for the errata.

    Best,
    David