RTOS/AM5728: EtherCAT AL Event Request register issue

Other Parts Discussed in Thread: AM5728

Hi

We are also seeing a problem with register 0x220 (AL Event Request) on the AM5728.

It is stuck at the value 0x50, which means the "SM Change" event is set.

This produces a constant false indication of an "SM Change" event.

Is anyone aware of this problem?

Best regards

Rasty
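
To make the 0x50 reading easier to interpret, here is a minimal C sketch that decodes the low byte of the AL Event Request register. The bit names are an assumption taken from the commonly published ESC register description, not from this thread, and `al_event_describe` is a hypothetical helper, so check both against the register map of your firmware release. Under that assumption 0x50 has two bits set: bit 4 (SM change) and bit 6 (process data watchdog).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Low-byte bits of AL Event Request (register 0x220).
 * ASSUMPTION: names follow the common ESC register description. */
typedef struct {
    uint32_t    mask;
    const char *name;
} al_event_bit_t;

static const al_event_bit_t k_al_event_bits[] = {
    { 1u << 0, "AL control" },
    { 1u << 1, "DC latch" },
    { 1u << 2, "DC SYNC0" },
    { 1u << 3, "DC SYNC1" },
    { 1u << 4, "SM change" },
    { 1u << 5, "EEPROM emulation" },
    { 1u << 6, "WD process data" },
};

/* Hypothetical helper: writes a comma-separated list of the known bits that
 * are set in `value` into `buf` and returns how many names were written. */
static int al_event_describe(uint32_t value, char *buf, size_t buflen)
{
    size_t used = 0;
    int count = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof k_al_event_bits / sizeof k_al_event_bits[0]; i++) {
        if ((value & k_al_event_bits[i].mask) == 0)
            continue;
        int n = snprintf(buf + used, buflen - used, "%s%s",
                         count ? ", " : "", k_al_event_bits[i].name);
        if (n < 0 || (size_t)n >= buflen - used)
            break; /* buffer too small: stop appending */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Calling `al_event_describe(0x50, buf, sizeof buf)` with a 64-byte buffer reports two events, which may help when comparing the value read in PreOP against what the master expects.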

  • The RTOS team have been notified. They will respond here.
  • Hi Rasty, could you please send us more information? Which EtherCAT slave version (and SDK) are you using? How is your setup configured? How do we trigger the issue? Do you have Wireshark logs from when this happens that you can share with us?

    thank you,
    Paula
  • Hi Paula,
    We followed the processors.wiki.ti.com/.../PRU_ICSS_EtherCAT guide for the "full slave" demo, up to "Processor SDK 4.2 Migration Guide".


    You can easily reproduce it with TwinCAT.
    Just put the device in PreOp and read register 0x220. It should be zero.

    If you are not able to see it, we can provide access to our equipment.

    Best regards
    Rasty
  • Hi Rasty, when switching the AM572x EtherCAT slave from OP to PreOP in TwinCAT, I see that register 0x220 shows the value 0x50. I will consult our developers on what the expected result is.

    thank you,
    Paula

  • Hi Paula,

    Thank you for checking.

    I attach a Wireshark capture with a lot of malformed and missing frames.

    Please pay attention to frame 104134. There is no reply to this frame at all.

    Please discuss this with the developers. What could be the reason for that?

    Thanks

    Rasty

    currupted-ethercat-traffic-frame-104134-no-reply-from-sitara.zip

  • Hi Rasty, I see that frame 104133 shows as "Malformed Packet" in Wireshark. I believe you have a probe between your master (TwinCAT?) and the AM572x IDK board; is this assumption correct? Could you also share how you are configuring your slave and master, so we can try to reproduce it on our side?

    Mainly: which master are you using (I guess TwinCAT), how is your network connected (slave bus configuration), which cycle time are you using, are you testing DC, what are the PDI/PDO sizes (if anything changed there), etc.

    thank you,
    Paula
  • Hi Paula,
    We use our own master, based on EtherLab, which is open source. We use it extensively with many other devices, including all types of Beckhoff ESC, netX, Infineon and FPGA implementations. We have not seen missing and corrupted frames at such a scale before.
    We use a Beckhoff protocol analyzer connected between master and slave.

    I believe that in order to reproduce it you would need to replicate a similar traffic and command pattern.
    We use DC and PDI/PDO, but the problem happens even before DC and process data are involved.
    We use a cycle time of 4 ms, but we want to reach 0.25 ms in the future.

    You should try to "replay" our records and see if you get lost or corrupted frames.

    Best regards
    Rasty
  • Hi Rasty, in the future please open one thread per issue, so we can track them more easily. About the second issue (malformed packets): we will analyze the logs and come back to you. One question: I see malformed packets, then some normal traffic, and then malformed packets again. So the system recovers and fails again later? Do you have any suspicion about what could trigger it?

    Thank you,

    Paula

  • Hi Paula,

    A question of logistics: do you have an issue tracker that assigns an ID to every issue? If an issue is confirmed and a patch is released, I'd like to get the patch together with the list of issues that it addresses. Is that possible?

    Back to your question.

    The majority of the traffic consists of frames with multiple commands, sometimes 4 requests per frame, such as read/write or read/multiple write.

    I guess that the PRU has a performance issue and cannot process so many requests in a single frame.

    I do not see any other explanation for why the same pattern works in the majority of cases and fails only from time to time.

    Our master complains that requests are not fulfilled, which is annoying but recoverable.

    What is not recoverable is a lost write to register 0x120 (AL Control); in that case the slave stack just times out.

    So the whole picture is pretty confusing: on every run we get a different failure pattern.

    Bottom line: the expectation is nearly zero corrupted or lost frames, maybe 1 or 2 per month under heavy load, plus electrical noise from the application.

    Best regards

    Rasty 

  • Hi Rasty, we have a list of known issues here:
    processors.wiki.ti.com/.../PRU_ICSS_EtherCAT_Release_Notes

    Thanks for clarifying. Let us discuss possible additional tests internally and come back to you.

    Paula
  • Hi Paula,
    I do not see anything there related to AM572x that sounds like an explanation.
    I'd like to get recommendations from the developers on how to deal with this problem.
    Please do not forget the original report about the behavior of 0x220.
    Best regards
    Rasty
  • Hi Rasty, I haven't forgotten about the first issue reported (reg 0x220), but that is the reason it is preferable to keep issues in separate threads =).

    About the "Malformed Packet" issue: there is actually one known issue which could be related to this.

    Could you try the suggested workaround and let us know if it works?

    thank you,

    Paula

  • Hi Paula,

    Would you please tell me which frame contains commands that match the problem described in the errata and the suggested workaround?

    What am I missing?

    Thanks

    Rasty

  • Rasty, after checking the FMMU configuration in Wireshark, we observed that it is a non-overlapping configuration. This could be the issue. Contiguous logical addresses for the inputs from all slaves and for the outputs from all slaves, without overlapping the input and output of each slave, are not supported.

    Two possible options (workarounds):
    1) Use overlapped access from the master side, which is also more optimal in terms of datagram utilization.
    2) Increase the TX start delay.

    C:\TI\PRU-ICSS-EtherCAT_Slave_01.00.05.00\protocols\ethercat_slave\include\tiescbsp.h
    #define ENABLE_MULTIPLE_SM_ACCESS_IN_SINGLE_DATAGRAM 1

    #if ENABLE_MULTIPLE_SM_ACCESS_IN_SINGLE_DATAGRAM
    #define TIESC_PORT0_TX_DELAY 0x98
    #else
    #define TIESC_PORT0_TX_DELAY 0x48
    #endif
    #define TIESC_PORT1_TX_DELAY TIESC_PORT0_TX_DELAY

    For your information, this is related to the PINDSW-141 known issue (from the release notes).

    About Issue 1 (wrong value of register 0x220 when switching from OP to PreOP): we are working to confirm whether it is a bug or not. We will keep you posted.

    thank you,
    Paula

  • Hi Paula,

    The suggested workaround "ENABLE_MULTIPLE_SM_ACCESS_IN_SINGLE_DATAGRAM 1" solves the problem of missing and corrupted frames such as:

    EtherCAT datagram(s): 3 Cmds, 'ARMW': len 8, 'FPRD': len 8, 'LRW': len 22:
    EtherCAT datagram: Cmd: 'ARMW' (13), Len: 8, Adp 0x0, Ado 0x910, Cnt 0
    EtherCAT datagram: Cmd: 'FPRD' (4), Len: 8, Adp 0x1, Ado 0x990, Cnt 0
    EtherCAT datagram: Cmd: 'LRW' (12), Len: 22, Addr 0x0, Cnt 0

    Thank you very much!!!!

    Please explain the side effects of setting ENABLE_MULTIPLE_SM_ACCESS_IN_SINGLE_DATAGRAM to "1".

    We will have 100 bytes of process data in each direction, 200 bytes in total. Is that a problem?

    Best regards

    Rasty
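
For readers who want to find such multi-command frames in their own captures, the following is a rough C sketch that walks the datagrams inside one EtherCAT frame payload. It assumes the standard datagram layout (10-byte header, data, 2-byte working counter, "more" flag in bit 15 of the length word) and that `payload` starts directly after the 2-byte EtherCAT frame header. The `demo_` names are hypothetical and do not come from the TI stack or from EtherLab.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DGRAM_HDR_LEN 10u /* cmd, idx, adp, ado, len/flags, irq */
#define DEMO_DGRAM_WKC_LEN 2u  /* working counter after the data */

typedef struct {
    uint8_t  cmd; /* e.g. 4 = FPRD, 12 = LRW, 13 = ARMW */
    uint8_t  idx;
    uint16_t adp; /* for logical commands adp/ado form one 32-bit address */
    uint16_t ado;
    uint16_t len; /* data length, 11 bits */
    uint16_t wkc;
} demo_datagram_t;

/* Read a little-endian 16-bit value. */
static uint16_t demo_rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns the number of datagrams found, or -1 if the payload is truncated
 * or holds more datagrams than `out` has room for. */
static int demo_walk_datagrams(const uint8_t *payload, size_t n,
                               demo_datagram_t *out, size_t max_out)
{
    size_t off = 0;
    size_t count = 0;
    int more = 1;

    assert(payload != NULL && out != NULL);
    while (more) {
        if (count == max_out || n - off < DEMO_DGRAM_HDR_LEN)
            return -1;
        uint16_t len_flags = demo_rd16le(payload + off + 6);
        uint16_t len = len_flags & 0x07FFu;
        more = (len_flags & 0x8000u) != 0; /* another datagram follows */
        if (n - off < (size_t)DEMO_DGRAM_HDR_LEN + len + DEMO_DGRAM_WKC_LEN)
            return -1;
        out[count].cmd = payload[off];
        out[count].idx = payload[off + 1];
        out[count].adp = demo_rd16le(payload + off + 2);
        out[count].ado = demo_rd16le(payload + off + 4);
        out[count].len = len;
        out[count].wkc = demo_rd16le(payload + off + DEMO_DGRAM_HDR_LEN + len);
        count++;
        off += (size_t)DEMO_DGRAM_HDR_LEN + len + DEMO_DGRAM_WKC_LEN;
    }
    return (int)count;
}
```

Fed the payload of the frame quoted above, this should report three datagrams (ARMW, FPRD, LRW), which is the pattern that was failing before the workaround.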

  • Hi Rasty, thanks for confirming. I will come back to you with more information about workaround 2 (increasing the TX start delay).

    However, could you try workaround 1 (overlapping access mode)? This is the preferable solution: it is a workaround in the master configuration instead of in the slaves, and it is more optimal in terms of datagram utilization.

    thank you,
    Paula
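
As an illustration of why overlapped access uses the datagram better, here is a small C sketch that compares the logical address layout for a single slave in both modes. It is only a model: the choice to place outputs first at logical address 0 and the `demo_` names are assumptions, and a real master may lay out its domain differently.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t out_addr; /* logical start address of the output data */
    uint32_t in_addr;  /* logical start address of the input data */
    uint32_t lrw_len;  /* data length one LRW datagram must carry */
} demo_layout_t;

/* ASSUMPTION: outputs are mapped first, starting at logical address 0. */
static demo_layout_t demo_layout(uint32_t out_bytes, uint32_t in_bytes,
                                 int overlapped)
{
    demo_layout_t l;

    assert(out_bytes <= UINT32_MAX - in_bytes); /* guard against overflow */
    l.out_addr = 0;
    if (overlapped) {
        /* Inputs reuse the output range: the slave reads its outputs and
         * writes its inputs into the same bytes of the datagram. */
        l.in_addr = 0;
        l.lrw_len = out_bytes > in_bytes ? out_bytes : in_bytes;
    } else {
        /* Inputs follow the outputs, so the datagram carries both. */
        l.in_addr = out_bytes;
        l.lrw_len = out_bytes + in_bytes;
    }
    return l;
}
```

With the 100 bytes per direction mentioned earlier in the thread, the non-overlapping layout needs a 200-byte LRW while the overlapped layout needs 100 bytes.
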
  • Hi Paula,
    I feel that this workaround fixed my problem by coincidence. Corrupted frames appear long before the FMMUs are configured!
    As long as it works I'm fine, but I need to know the limitations.

    I have a new problem; I will open a separate thread for it.

    Thanks
    Rasty
  • Hi Rasty, could you please clarify which workaround you feel fixed it by coincidence, 1 or 2? I guess you are referring to 1 (overlapping access mode)... Anyhow, I want to confirm =)

    Thanks
    Paula
  • "ENABLE_MULTIPLE_SM_ACCESS_IN_SINGLE_DATAGRAM 1" solved the corrupted and lost frames.
  • Hi Rasty, thanks for confirming. We have more details on this known issue in our PRU-ICSS EtherCAT 1.0.5 errata.

    In summary, this issue happens due to the significant overhead, in terms of PRU cycles, that is needed for switching from one FMMU/SM to the second FMMU/SM.
    This overhead does not apply in the overlapped use case, as both the FMMU/read SM and FMMU/write SM information are loaded while the datagram's header is being processed. This applies in the case of multiple slaves. We are confirming whether it also applies in the case of a single slave.

    If you use workaround 2 (increasing the TX start delay), the limitation would be an increase in process path latency (from 360 ns to 760 ns). This may impact the cycle time as the number of slaves in the network increases. Also, with TX_START_DELAY set to 760 ns (0x98), a minimum IPG of 850 ns must be maintained, so you would need to be careful in the case of a large network of slaves.

    thank you,

    Paula
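
The latency figures above can be turned into a quick back-of-the-envelope check. The sketch below assumes the TX delay value counts in 5 ns steps, which is only inferred from the two data points in this thread (0x48 giving 360 ns and 0x98 giving 760 ns), and it assumes the extra latency adds up once per slave that uses the workaround. The `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: 5 ns per count, inferred from 0x48 -> 360 ns, 0x98 -> 760 ns. */
#define DEMO_TX_DELAY_NS_PER_COUNT 5u

#define DEMO_TX_DELAY_DEFAULT    0x48u /* value without the workaround */
#define DEMO_TX_DELAY_WORKAROUND 0x98u /* value with the workaround */

/* Convert a TX delay setting to nanoseconds under the 5 ns assumption. */
static uint32_t demo_tx_delay_ns(uint32_t reg_value)
{
    return reg_value * DEMO_TX_DELAY_NS_PER_COUNT;
}

/* Extra path latency for a chain of `slaves` devices that all use the
 * workaround, assuming the increase is paid once per slave. */
static uint32_t demo_added_latency_ns(uint32_t slaves)
{
    uint32_t delta = demo_tx_delay_ns(DEMO_TX_DELAY_WORKAROUND)
                   - demo_tx_delay_ns(DEMO_TX_DELAY_DEFAULT);

    assert(slaves <= UINT32_MAX / delta); /* guard against overflow */
    return slaves * delta;
}
```

Under these assumptions a chain of 10 such slaves adds about 4 µs per frame pass, which is small against a 4 ms cycle but becomes more noticeable against a 0.25 ms target. If the delay also applies on the return path, the figure would be correspondingly larger.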

  • Hi Rasty, let me give you a quick update.

    Issue 1: EtherCAT slave wrong register 0x220 value. We were able to root-cause it, and a fix is planned to be integrated in our next release, PRU-ICSS EtherCAT slave 1.0.6.

    Issue 2: Malformed packets:

    • Workaround 1, input and output overlaid on the same logical address range: we confirmed that this workaround also applies in the case of 1 slave. One branch of the IgH master already has this workaround implemented (ecrt_slave_config_overlapping_pdos()). For additional details please refer to this E2E thread.
    • Workaround 2, increasing the TX start delay: you already confirmed it worked for you. Thanks for your feedback! A list of possible side effects and things to be careful about was added to this thread (above).

    Thank you,

    Paula

  • Hi Paula
    Thank you very much!
  • PRU firmware 1.0.6 solves this problem.
    Thanks