MCU-PLUS-SDK-AM243X: possible packet loss in IPC?

Part Number: MCU-PLUS-SDK-AM243X

Hello,

I am currently investigating a problem we are facing with the IPC implementation (RPMessage). We use two cores of the MCU0 cluster, with a buffer size of 512 bytes and 12 buffers per VRING. I cannot tell yet whether it is really related to the TI IPC driver, but I may need some help to understand it better. We use SDK version 9. I have also applied the fix supplied here: e2e.ti.com/.../4938712
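
For context, the relevant part of our configuration boils down to the following (the define names are made up for this sketch; only the values match our real setup):

/* Illustrative excerpt of our VRING configuration. The define names
 * are invented for this sketch; only the values (512-byte buffers,
 * 12 buffers per VRING) match our actual configuration. */
#define IPC_RPMSG_VRING_NUM_BUF   (12u)   /* 12 instead of a power of two */
#define IPC_RPMSG_VRING_BUF_SIZE  (512u)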

Edit: jump to the end to see why we had 12 buffers and how this turned out to be the likely cause.

We ran a performance test for the IPC. The test sends packets between the cores; on the receiving core each packet is only stored into a buffer and the callback returns immediately. This ensures there is no IPC call inside an IPC callback, which could potentially deadlock.
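
Roughly, the receive path looks like this (a simplified sketch: onRpmsgRx, gRxQueue and PKT_SIZE are our own names, and the exact SDK callback signature differs; the point is only that the callback copies the data and returns):

#include <string.h>
#include "FreeRTOS.h"
#include "queue.h"

#define PKT_SIZE 512u

static QueueHandle_t gRxQueue; /* created at init, e.g. xQueueCreate(16, PKT_SIZE) */

/* Receive callback: only copy the payload into a FreeRTOS queue and
 * return. No RPMessage call is made from inside the callback, so a
 * send can never deadlock against a receive. */
static void onRpmsgRx(const void *data, uint16_t dataLen)
{
    uint8_t pkt[PKT_SIZE] = { 0u };
    BaseType_t wake = pdFALSE;

    memcpy(pkt, data, (dataLen < PKT_SIZE) ? dataLen : PKT_SIZE);
    /* FromISR variant, assuming the callback runs in interrupt context */
    xQueueSendFromISR(gRxQueue, pkt, &wake);
    portYIELD_FROM_ISR(wake);
}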

The cores run FreeRTOS tasks with different priorities which send those packets. There are 5-6 such tasks; each sleeps for 1-3 ms and then sends a packet. For the problematic situation in question I added a counter on each side: one core sends a counter value that increases with each call, and the receiving core keeps its own counter that it increments with each received packet. Both counters must match.
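
The counter check itself is essentially this (a sketch; ipcSendPacket is a hypothetical wrapper around the SDK send call, and the counter increment is assumed to be serialized across the sender tasks):

#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"

extern void ipcSendPacket(const void *data, uint16_t len); /* hypothetical wrapper around the SDK send call */

/* One of the sender tasks: sleep 1-3 ms, then send the next counter
 * value (increment assumed serialized across the sender tasks). */
static void senderTask(void *arg)
{
    static uint32_t txCounter = 0u;

    (void)arg;
    for (;;) {
        uint32_t value = txCounter++;
        ipcSendPacket(&value, sizeof(value));
        vTaskDelay(pdMS_TO_TICKS(2u)); /* 1-3 ms in the real test */
    }
}

/* Receiving side: every value must match the local counter. In the
 * failing runs this assert fires with *value == rxCounter + 4. */
static void onCounterPacket(const uint32_t *value)
{
    static uint32_t rxCounter = 0u;

    configASSERT(*value == rxCounter);
    rxCounter++;
}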

Instead, after roughly 2:40 to 3:10 min it always hits an assert, because the received counter value is always (!) exactly 4 higher than the counter on the receiving core. The counter value at which it fails is not reproducible, however: it varies, but always lies somewhere in the range of 60000 to 80000. This makes the whole thing a bit more puzzling.
The time to failure shrinks when I reduce the sleep time of the tasks to the minimum of 1 ms; it then occurs after about 1:10 min, but the counter range stays the same.

So to me it looks as if 4 packets are lost and never handled by the receiving core. I first suspected locked interrupts, but we eliminated all sources of interrupt locking, so the cores should not be blocking anything. Besides, if that were the problem, it should have shown up much earlier.

Is there a recommended way to debug such a situation? By the time I receive the wrong value it is already too late for a breakpoint, and the IPC send always returns without any error. Logging is enabled, as are the asserts, and there has been no warning from the IPC driver so far.

Edit:

We had implemented the IPC setup manually, based on the C files that SysConfig and its scripts generate. That made it possible to select 12 buffers instead of a power of two. I changed it back to 4 buffers and the problem no longer occurs, at least so far; I am running a long-term test now. I guess the IPC implementation can only handle 2^n buffers?
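
A plausible explanation, consistent with our numbers (the following is an illustrative sketch, not the actual TI driver code): virtio-style rings typically use a free-running 16-bit index and derive the slot as index % numBufs. With a power-of-two buffer count, the 16-bit wrap at 65536 is invisible because 65536 % 2^n == 0. With 12 buffers, 65536 % 12 == 4, so the slot sequence jumps by 4 entries exactly when the index wraps, which would fit both the failure window (counter values around 60000-80000, i.e. near 65536) and the offset of exactly 4:

#include <stdint.h>
#include <stdio.h>

#define NUM_BUFS 12u /* the non-power-of-two count from the failing config */

/* Slot selection as a virtio-style ring does it: the running index is
 * a uint16_t that silently wraps at 65536. */
static uint32_t slot_of(uint32_t msgNum)
{
    uint16_t idx = (uint16_t)msgNum; /* free-running 16-bit index */
    return idx % NUM_BUFS;
}

int main(void)
{
    /* Around the 16-bit wrap the mapping jumps back: message 65536
     * lands in slot 0 again, which still belongs to in-flight message
     * 65532, so 4 messages get clobbered (65536 % 12 == 4). With a
     * power-of-two count such as 16, the sequence continues seamlessly. */
    for (uint32_t m = 65530u; m <= 65540u; m++) {
        printf("msg %u -> slot %u\n", (unsigned int)m, (unsigned int)slot_of(m));
    }
    return 0;
}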

Best regards

Felix