IND-COMMS-SDK: EtherCAT cyclic input data triple-buffering appears to be broken

Dominic Rath

Part Number: IND-COMMS-SDK

Dear TI team,

we've encountered a rather serious issue with the way the AM64x EtherCAT sub-device handles triple-buffering for input (sub-device -> main-device) data.

We first encountered the issue in a custom application, but have been able to recreate the issue with the "Beckhoff" sub-device example just as well.

- Industrial communications SDK 09.02.00.08
- ethercat_slave_beckhoff_ssc_demo on an AM64x EVM (Rev. C, SR2.0 hardware)
- 10 ms EtherCAT main-device cycle
- EtherCAT sub-device configured in free-running mode via 0x1c32.1/0x1c33.1

When configured for free-running mode, PDO_InputMapping is called from MainLoop and, in my tests, updated the inputs every ~10-16 us.

PDO_InputMapping calls HW_EscWriteIsr to write the cyclic inputs to ESC memory. HW_EscWriteIsr in turn calls bsp_get_process_data_address to get the actual offset of the "current" triple-buffer.

I would have expected the updates to alternate between 2 (or 3?) of the three buffers, but my logging (very low-overhead logging to a ring buffer in DDR memory) shows ~600 consecutive writes to one address (0x140e in this case), then ~600 writes to the next address (0x1400), then ~600 writes to the next (0x1407), then ~600 writes to 0x140e again, and so on. The time between buffer switches is almost exactly 10 ms, i.e. every time an EtherCAT frame reads out the SyncManager, the PDI gets a different buffer the next time it calls bsp_get_process_data_address.
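To make the expectation concrete, here is a minimal model of how I would expect a 3-buffer mailbox to behave. It is only a sketch under my own assumptions, not the PRU firmware or the ESC logic: the type tb_model_t and the functions tb_init, tb_write and tb_read are hypothetical names, and the indices 0..2 merely stand in for the three physical buffers.

```c
#include <stdint.h>

/* Hypothetical model of a 3-buffer mailbox (not the actual firmware logic). */
typedef struct {
    uint8_t  write_buf;   /* buffer the PDI (producer) fills next             */
    uint8_t  latest_buf;  /* most recently completed buffer                   */
    uint8_t  read_buf;    /* buffer the EtherCAT frame (consumer) reads from  */
    uint8_t  fresh;       /* 1 if latest_buf holds data the reader hasn't seen */
    uint32_t data[3];     /* one payload word per buffer                      */
} tb_model_t;

static void tb_init(tb_model_t *tb)
{
    tb->write_buf = 0;
    tb->latest_buf = 1;
    tb->read_buf = 2;
    tb->fresh = 0;
    tb->data[0] = tb->data[1] = tb->data[2] = 0;
}

/* Producer: fill the write buffer, then publish it by swapping it with the
 * "latest" buffer. Returns the index of the buffer that was written. */
static uint8_t tb_write(tb_model_t *tb, uint32_t value)
{
    uint8_t written = tb->write_buf;

    tb->data[written] = value;
    tb->write_buf = tb->latest_buf;
    tb->latest_buf = written;
    tb->fresh = 1;
    return written;
}

/* Consumer: take over the latest buffer if it holds new data, then read. */
static uint32_t tb_read(tb_model_t *tb)
{
    if (tb->fresh) {
        uint8_t old = tb->read_buf;

        tb->read_buf = tb->latest_buf;
        tb->latest_buf = old;
        tb->fresh = 0;
    }
    return tb->data[tb->read_buf];
}
```

In this model, back-to-back writes with no read in between alternate between two buffers, and a read always returns the value of the most recent completed write. That is the behavior I expected to see in the ActualAddr column of the log.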

Attached is a CSV file with our logging. The sub-device example is only modified to include our logging facility and log output in HW_EscWriteIsr:

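void HW_EscWriteIsr(uint8_t *pData, uint16_t Address, uint16_t Len)
{
    int16_t sm_index;
    /* Resolve the SyncManager address to the buffer currently assigned to the PDI. */
    uint16_t ActualAddr = bsp_get_process_data_address(pruIcss1Handle, Address, Len,
                          &sm_index);

    /* Added for this test: log the requested address, length and resolved buffer address. */
    LOGGERBUF_LOG3(main, 0x1, "HW_EscWriteIsr", Address, Len, ActualAddr);

    /* ... remainder of HW_EscWriteIsr unchanged ... */
}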

In the log, "ts" and "tsdiff" are measured via the R5F cycle counter (800 MHz). The arguments from the logging call above ("Address", "Len" and "ActualAddr") are printed in hex and decimal. You can see that from the start PDO_InputMapping keeps writing to the buffer at 0x140e until it switches to 0x1400 ~600 updates later. The duration of those ~600 updates is again ~10 ms.
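For anyone who wants to check the CSV themselves, a helper along these lines could condense the ActualAddr column into runs of identical addresses. This is a sketch only; addr_run_t and summarize_addr_runs are hypothetical names and not part of our logging facility or the SDK.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t addr;   /* buffer address shared by all writes in this run */
    size_t   count;  /* number of consecutive writes to that address    */
} addr_run_t;

/* Collapse a sequence of logged ActualAddr values into runs of identical
 * addresses. Stores at most max_runs entries in runs[] and returns the
 * number of entries stored. */
static size_t summarize_addr_runs(const uint16_t *addrs, size_t n,
                                  addr_run_t *runs, size_t max_runs)
{
    size_t n_runs = 0;

    for (size_t i = 0; i < n; i++) {
        if (n_runs > 0 && runs[n_runs - 1].addr == addrs[i]) {
            runs[n_runs - 1].count++;      /* same buffer as the previous write */
        } else {
            if (n_runs == max_runs) {
                break;                     /* output array is full */
            }
            runs[n_runs].addr = addrs[i];  /* buffer changed: start a new run */
            runs[n_runs].count = 1;
            n_runs++;
        }
    }
    return n_runs;
}
```

With a correctly working triple-buffer I would expect every run to have a count of 1. In our log the runs are ~600 entries long.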

This triple-buffering behavior is problematic because the sub-device can't deliver the most recent data to the network if it is continuously updating the same buffer. That is also how we noticed the issue in the first place:

- Free-running sub-device updating the cyclic input data at ~13x the EtherCAT cycle rate
- Each update stored an incrementing counter in the input data
- Every EtherCAT cycle would normally show the counter incrementing by 13 or 14 (the cycles are asynchronous), but every once in a while (e.g. ~80 out of 3000000) the counter would NOT increment between two EtherCAT cycles, i.e. we received the same cyclic input data twice.
- That means we didn't get "slightly" older input data (where the counter would be +11 or +12 instead of +13/+14), but input data that was a whole EtherCAT cycle old (13 or 14 cyclic input data updates before).
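The check on the receiving side boils down to something like the following sketch. The function name count_stale_cycles and the 32-bit counter width are assumptions for illustration; our actual test code is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Count EtherCAT cycles in which the received counter did not advance at all
 * compared to the previous cycle. Unsigned subtraction keeps the delta
 * correct across a counter wrap-around. */
static size_t count_stale_cycles(const uint32_t *counters, size_t n)
{
    size_t stale = 0;

    for (size_t i = 1; i < n; i++) {
        uint32_t delta = counters[i] - counters[i - 1];

        if (delta == 0u) {
            stale++;   /* same input data received twice in a row */
        }
    }
    return stale;
}
```

A delta of 0 is the suspicious case: with ~13 input updates per EtherCAT cycle, even slightly outdated data should still show a clearly non-zero delta.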

I diffed the EtherCAT sub device firmware between 09.02.00.08 and 09.02.00.15 and these are identical, so I don't expect any fix for this issue in the .15 ind. comms SDK.

We don't know if this problem was also present in older versions; we only noticed it just now.

During "normal" SM-synchronous operation you wouldn't directly notice this issue, because in that case there would be one frame for each input data update.

Best Regards,

Dominic

ethercat_slave_sm3_address_beginning.zip

over 1 year ago

0 PratheeshGangadhar over 1 year ago

TI__Mastermind 49191 points

Dominic Rath said:
10 ms EtherCAT main-device cycle

Hi Dominic

Do you see similar behavior at lower cycle times like 50 or 100 us?

Regards

Pratheesh

0 Dominic Rath over 1 year ago in reply to PratheeshGangadhar

Mastermind 7560 points

Hello Pratheesh,

not sure what you're asking. I guess you're talking about a lower input data update rate, e.g. updating once per 50us or 100us instead of once per ~10-16us, correct?

We had a scenario like that in the customer application when it was operating DC synchronous:

Both the main-device (network) cycle and the sub-device input cycle (not exactly Sync0, but a device-local clock based on DC) were running at 1 ms. "Normally" you'd get one input update per network cycle, but due to main-device jitter there were situations where we had two input update cycles and no network cycle in between. In that case we saw the same triple-buffer address twice. That is basically a symptom of the same error, but in that case you rarely see it, because you mostly have one input update per network cycle.
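To illustrate the jitter case, the following sketch counts how often two consecutive input updates occur with no frame in between, given two sorted lists of timestamps in microseconds. The function name count_back_to_back_updates and the timestamp representation are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Both arrays must be sorted in ascending order (timestamps in us).
 * Returns how many input updates follow the previous input update without
 * any frame arriving in the interval (previous update, current update]. */
static size_t count_back_to_back_updates(const uint32_t *update_us, size_t n_updates,
                                         const uint32_t *frame_us, size_t n_frames)
{
    size_t hits = 0;
    size_t f = 0;

    for (size_t i = 1; i < n_updates; i++) {
        /* Skip frames that arrived at or before the previous update. */
        while (f < n_frames && frame_us[f] <= update_us[i - 1]) {
            f++;
        }
        /* No frame left, or the next frame arrives only after this update. */
        if (f == n_frames || frame_us[f] > update_us[i]) {
            hits++;
        }
    }
    return hits;
}
```

For example, with input updates at 0, 1000, 2000 and 3000 us and frames at 500, 1500 and 3150 us (the third frame being 650 us late), the update at 3000 us follows the one at 2000 us with no frame in between, so the function reports one such event.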

If you're talking about much faster network cycles, e.g. 50/100us instead of 10ms: We tested with 1ms network cycle, too. I'll see how much faster this setup can go, but this is a more generic main-device setup that isn't focused on very fast cycle times.

One more thing I'd like to test is how "recent" the returned data is, i.e. if I start putting a counter in the cyclic data, what will the first counter value seen on the network be?

Regards,

Dominic

0 Dominic Rath over 1 year ago in reply to Dominic Rath

Mastermind 7560 points

Small update:

I've added a delay to APPL_InputMapping so that input data updates happen at 150us (instead of ~10-15us) or more, with the same result.
I've captured the EtherCAT frames, and the first data returned in SAFEOP (frame #91) from the sub-device is all zeroes, even though I'm now writing a counter, starting with 0x1.
The second cyclic data returned (frame #94) includes the count written in APPL_InputMapping immediately before it switches from buffer 0x140e to buffer 0x1400.

I've attached a pcap trace and my own logging (arg4 is my counter, arg5 is a fixed value of 0x55aa).

Since PDO_InputMapping is called first in StartInputHandler, I believe the first frame should already carry valid data, not zeroes.

Regards,

Dominic

ethercat_slave_counter.zip

0 Dominic Rath over 1 year ago in reply to Dominic Rath

Mastermind 7560 points

Another update:

We also believe that the cyclic data returned is thus always one cycle behind. In the pcap trace you can see that the slave is requested to go to SAFEOP in frame #83. ~7.6ms later (timestamps are not very accurate) the first cyclic frame is seen in frame #91. If we look at the timestamps from the logging, it takes ~7.2ms (5.6 million cycles) from the first PDO_InputMapping (presumably called from StartInputHandler due to the switch to SAFEOP) to the first switch of the triple-buffer (0x140e -> 0x1400). Buffer 0x140e was last written with counter 0x28, which can only be seen in frame #94, another 4ms (more likely 5ms, my cycle time) later.
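For reference, the conversion from logged cycle counts to time is a plain division by the counter frequency. A minimal sketch, assuming the 800 MHz R5F cycle counter mentioned earlier; the names R5F_CYCLES_PER_US and cycles_to_us are hypothetical and not taken from our logging code:

```c
#include <stdint.h>

/* 800 MHz cycle counter means 800 cycles per microsecond. */
#define R5F_CYCLES_PER_US 800ull

/* Convert a cycle-counter difference to whole microseconds (rounds down). */
static uint64_t cycles_to_us(uint64_t cycles)
{
    return cycles / R5F_CYCLES_PER_US;
}
```

With this, 5.6 million cycles come out at 7000 us, which is in line with the approximate figure above, and one 10 ms network cycle corresponds to 8 million cycles.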

Can someone confirm our understanding that this is NOT the intended behavior of the triple-buffering? We assume that the PDI should be alternately writing to one of two (or three?) buffers, so that a frame on the network might always pick-up the latest data.
Is this a known issue?
Is this a regression, or was this always broken?

Best Regards,

Dominic

0 Dominic Rath over 1 year ago in reply to Dominic Rath

Mastermind 7560 points

One more update:

I'm seeing the same issue with 09.02.00.15.

Regards,

Dominic

0 Dominic Rath over 1 year ago in reply to Dominic Rath

Mastermind 7560 points

... and the same behavior with 08.06.00.45. I don't intend to check any earlier versions.

0 Aaron Thomas over 1 year ago in reply to Dominic Rath

TI__Genius 10725 points

Hi Dominic,

Please expect a reply by EOD tomorrow. Meanwhile, I'll go through the thread and pcap file provided.

Regards,
Aaron

0 Jonas Ens over 1 year ago in reply to Aaron Thomas

Prodigy 140 points

Hello,

I would just like to add briefly that I have also noticed a similar behavior in the following setup:

Running ethercat slave simple demo of industrial communications sdk 09.01.00.03 on AM64x
- I have modified the stack to send received process data back to the master
- running in SyncManager2 synchronous mode (but also in Distributed Clocks mode with certain settings)
PLC running Codesys as master
- increments process data by 1 on each EtherCAT cycle
- determines the difference between sent and received process data
Tested with EtherCAT cycle time of 100µs and 1000µs

In most EtherCAT cycles the determined difference is 1, just as I would have expected. But in quite a few cycles the difference is bigger than 1. From this I conclude that the process data in the slave is not always up to date.

Regards,
Jonas

0 PratheeshGangadhar over 1 year ago in reply to Dominic Rath

TI__Mastermind 49191 points

Dominic Rath said:
Can someone confirm our understanding that this is NOT the intended behavior of the triple-buffering? We assume that the PDI should be alternately writing to one of two (or three?) buffers, so that a frame on the network might always pick-up the latest data.

Is this a known issue?

Is this a regression, or was this always broken?

This was always the case and, IIRC, matches ET1100 behavior for TxPDO, as the MainDevice picks up the data on the next Sync0 in DC mode.

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/784264/rtos-processor-sdk-am335x-ethercat-communication-latency---loopback-test/2903185#2903185

This all assumes the slave is running in DC mode. The readings we get for a 1 ms cycle time are:

1) PLC to SDevice-> ~0.3 cycle (Rx PDO)

2) SDevice to PLC -> ~1.7 cycles (Tx PDO)

0 Dominic Rath over 1 year ago in reply to PratheeshGangadhar

Mastermind 7560 points

Hello Pratheesh,

I'm not sure how your reply relates to our problem.

About the problem mentioned by Jonas (not related to our project) and my concerns about data always being "too old": We'll verify that on the customer application. From the tests I ran I believe the data would have to be "one cycle behind", but that's not the main issue.

The main issue is that the triple-buffering can't possibly work correctly if we're writing to the same buffer address repeatedly.

I believe I've verified beyond doubt that the TI beckhoff slave example is continuously writing to the same buffer address, and I believe we can agree that this behavior is wrong?

Aaron Thomas: Have you been able to verify our findings?

Regards,

Dominic

0 Aaron Thomas over 1 year ago in reply to Dominic Rath

TI__Genius 10725 points

Hi Dominic,

Dominic Rath said:
Have you been able to verify our findings?

Not yet. I will check on this and get back to you by the end of next week.

Regards,
Aaron

0 Dominic Rath over 1 year ago in reply to Aaron Thomas

Mastermind 7560 points

Hello Aaron,
Hello Pratheesh,

I ran another test in SM synchronous mode. In that case there appears to be NO issue. I'm looping back an output byte to an input byte. In the network trace I can see that the input goes high one frame after the output went high.

I also ran a couple of tests in DC mode. Normally that is fine, but when the frames on the network jitter too much I ran into similar issues.

In one test case I artificially delayed the main device send frame by 150us (at 1ms cycle time). I saw the same buffer address twice in that case in HW_EscWriteIsr. I don't think that should legitimately happen.

In another test case I artificially delayed the main device send frame by 1200us (at 1ms cycle time). In that case I saw the same buffer address twice, too, but I also didn't get the latest data from APPL_InputMapping when the delayed frame arrived at the sub device:

- previous cycles up to frame #20094 are just fine
  - sub device is configured as input synchronous, so Sync0 triggers before the frame (I see Sync0 ISR ~110us before PDI ISR)
  - at frame #20094 the counter 0x266c is received from the sub device
- frame #20095 is delayed by ~1.2ms and immediately followed by #20096
  - Sync0 triggered twice, at idx 48928 and at 48933 with no frame in between
    - at idx 48932, HW_EscWriteIsr writes counter 0x266d to the triple buffer at 0x1400
    - at idx 48936 (999us later), HW_EscWriteIsr writes counter 0x266e to the triple buffer at 0x1407
  - The PDI ISRs for the delayed frames can be seen at idx 48936 and idx 48939, ~311us AFTER the HW_EscWriteIsr
- when frames #20097 and #20098 arrive at the main device, they both see the counter 0x266d that was written ~1.3ms earlier
- when frame #20100 is received at the main device, it carries the counter 0x266f
- when frame #20102 is received at the main device, it carries the counter 0x2670

I don't think it is legitimate for the sub-device to return stale data in frames #20097 and #20098, when HW_EscWriteIsr was called with newer data ~300us before the frame.

This is of course artificially provoked, but a main device could jitter or a frame could get lost. That could cause a DC synchronous sub device to write an input buffer twice, and I'd expect the LATEST data to be returned. This doesn't seem to be the case here.

I think that the issue in DC mode and the free-running issue I initially reported are related. It seems there is a problem when an input buffer gets written twice with no frame in between, which is the default behavior in free-running mode, but could of course happen in DC mode, too.

Let me know if you have questions regarding the test setup and the traces.

Best Regards,

Dominic

ethercat_slave_loopback_dc_input_1200_09020015.zip

0 PratheeshGangadhar over 1 year ago in reply to Dominic Rath

TI__Mastermind 49191 points

Hi Dominic

Thanks for the detailed explanation of scenario. We had a look at the firmware and driver interface and found a potential issue.

In HW_EscWriteIsr (ind-comms-sdk/source/industrial_comms/ethercat_slave/beckhoff_stack/stack_hal/tieschw.c at main, TexasInstruments/ind-comms-sdk on github.com), can you try commenting out bsp_process_data_access_complete (ind-comms-sdk/source/industrial_comms/ethercat_slave/icss_fwhal/tiescbsp.c at main, same repository), which assigns LOCK_PD_BUF_HOST_ACCESS_FINISH to lock_state? It appears that this state is blocking the buffer pointer exchange in firmware. Since bsp_process_data_access_complete is shared by both the Write and Read APIs, for testing please copy the updated version directly into HW_EscWriteIsr. Let me know whether this helps.

Regards

Pratheesh

0 Dominic Rath over 1 year ago in reply to PratheeshGangadhar

Mastermind 7560 points

Hello Pratheesh,

thanks for looking into this. I'll give this a try, but I'm not really convinced yet. How is the sub-device firmware supposed to detect that the PDI updated the SM if not by bsp_process_data_access_complete?

A "normal" ESC would detect a write to the last byte of the SM, but that is obviously out of the question for the PRU implementation.

I'm also not completely sure what you mean by this:

PratheeshGangadhar said:
Since bsp_process_data_access_complete shared by both Write and Read APIs for testing - please copy updated version directly to HW_EscWriteIsr.

I guess you just want me to comment out the call to bsp_process_data_access_complete, like this:

HW_EscWriteIsr is only called by PDO_InputMapping, and bsp_process_data_access_complete doesn't do anything except write LOCK_PD_BUF_HOST_ACCESS_FINISH (+ some sanity checking).
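To spell out my concern, here is a deliberately simplified model of a completion-flag handshake. It is purely hypothetical and not the actual driver or PRU firmware code: only the names lock_state and LOCK_PD_BUF_HOST_ACCESS_FINISH come from this thread, while sm_model_t, the MODEL_* values, model_host_write and model_firmware_poll are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical state values; the real encoding is not known to me. */
enum {
    MODEL_LOCK_IDLE = 0,
    MODEL_LOCK_HOST_ACCESS_FINISH = 1  /* stands in for LOCK_PD_BUF_HOST_ACCESS_FINISH */
};

typedef struct {
    volatile uint8_t lock_state;  /* flag shared between host and firmware      */
    uint32_t host_buf;            /* data the host (PDI) wrote last              */
    uint32_t wire_buf;            /* data an EtherCAT frame would read           */
} sm_model_t;

/* Host side: write the input data and optionally signal completion. */
static void model_host_write(sm_model_t *sm, uint32_t value, int signal_complete)
{
    sm->host_buf = value;
    if (signal_complete) {
        sm->lock_state = MODEL_LOCK_HOST_ACCESS_FINISH;
    }
}

/* Firmware side: publish the host data only if completion was signalled.
 * Returns 1 if an exchange took place, 0 otherwise. */
static int model_firmware_poll(sm_model_t *sm)
{
    if (sm->lock_state != MODEL_LOCK_HOST_ACCESS_FINISH) {
        return 0;
    }
    sm->wire_buf = sm->host_buf;
    sm->lock_state = MODEL_LOCK_IDLE;
    return 1;
}
```

If the firmware works anything like this model, a host that never signals completion would never get its data published to the network. That is why I'm skeptical about simply dropping the call.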

I'll test this later when I'm at the office.

Regards,

Dominic

0 Dominic Rath over 1 year ago in reply to Dominic Rath

Mastermind 7560 points

Hello Pratheesh,

with the bsp_process_data_access_complete commented out in HW_EscWriteIsr I'm not getting any inputs back from the sub-device. I can see that the PDI is constantly writing to the buffer at 0x140e in a DC Sync setup, but apparently that buffer is never returned on the network.

I'm not sure if I misunderstood what you wanted me to change. It would be great if you could get back to me if that was the intended change.

Regards,

Dominic

0 PratheeshGangadhar over 1 year ago in reply to Dominic Rath

TI__Mastermind 49191 points

Hi Dominic

Okay. This might require a FW change in this case. I will get back to you on this soon.

Regards

Pratheesh

0 Armin Mueller over 1 year ago in reply to PratheeshGangadhar

Prodigy 181 points

Hello Pratheesh,

any news or confirmations on that topic?

It looks to us as if the etherCAT freerun mode can't be used with the existing FW.

Is that correct?

Armin

0 Aaron Thomas over 1 year ago in reply to Armin Mueller

TI__Genius 10725 points

Hi Armin,

We are investigating an issue seen with the process data address during the startup of cyclic data in FreeRun mode (using TwinCAT). We will keep you posted.

Regards,
Aaron

0 Aaron Thomas over 1 year ago in reply to Aaron Thomas

TI__Genius 10725 points

Dominic, Armin,

Closing this thread based on the current status.

Regards,
Aaron

0 Armin Mueller over 1 year ago in reply to Aaron Thomas

Prodigy 181 points

Hello Aaron and Pratheesh,

thank you very much for the fixed etherCAT FW.

Freerun mode now seems to work perfectly.

Could you please add a note about the official availability of the fixed EtherCAT FW?

Armin

0 Jonas Ens over 1 year ago in reply to Armin Mueller

Prodigy 140 points

Hello Aaron,

Jonas Ens said:
In most EtherCAT cycles the determined difference is 1, just as I would have expected. But in quite a few cycles the difference is bigger than 1. From this I conclude that the process data in the slave is not always up to date.

Regarding my test described in the post above: Is only Freerun mode affected by this issue, or SM2 synchronous mode as well?

Regards,
Jonas

+1 Aaron Thomas over 1 year ago in reply to Armin Mueller

TI__Genius 10725 points

To give an overview, the issue was due to a mismatch in the swapping-related code within the firmware. This will be fixed in the next Industrial Communications SDK release (v10.0).

Regards,
Aaron

0 Aaron Thomas over 1 year ago in reply to Jonas Ens

TI__Genius 10725 points

Hi Jonas,

It looks like the fix is applicable to the issue you are facing, too. I recommend creating a new E2E thread for your query.

Regards,
Aaron

0 Jonas Ens over 1 year ago in reply to Aaron Thomas

Prodigy 140 points

Hi Aaron,

Aaron Thomas said:
Recommend to create a new E2E thread for your query.

I have opened a new thread:

IND-COMMS-SDK: EtherCAT cyclic process data buffer inconsistent

Regards,
Jonas