MCU-PLUS-SDK-AM243X: possible packet loss in IPC?

Part Number: MCU-PLUS-SDK-AM243X

Hello,

I am currently investigating a problem we are facing with the IPC implementation (RPMessage). We use two cores of the MCU0 cluster, with a buffer size of 512 bytes and 12 buffers per VRING. I cannot tell yet whether it is really related to the TI IPC driver, but I may need some help to understand it better. We use SDK version 9. I have also applied the fix supplied here: e2e.ti.com/.../4938712
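
For context, the relevant part of our configuration boils down to the following (the define names are made up for this sketch; only the values match our real setup):

/* Illustrative excerpt of our VRING configuration. The define names
 * are invented for this sketch; only the values (512-byte buffers,
 * 12 buffers per VRING) match our actual configuration. */
#define IPC_RPMSG_VRING_NUM_BUF   (12u)   /* 12 instead of a power of two */
#define IPC_RPMSG_VRING_BUF_SIZE  (512u)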

Edit: jump to the end to see why we had 12 buffers and how this turned out to be the likely cause.

We ran a performance test for the IPC. The test sends packets between the cores; on the receiving core each packet is only stored into a buffer and the callback returns immediately. This ensures there is no IPC call inside an IPC callback, which could potentially deadlock.
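
Roughly, the receive path looks like this (a simplified sketch: onRpmsgRx, gRxQueue and PKT_SIZE are our own names, and the exact SDK callback signature differs; the point is only that the callback copies the data and returns):

#include <string.h>
#include "FreeRTOS.h"
#include "queue.h"

#define PKT_SIZE 512u

static QueueHandle_t gRxQueue; /* created at init, e.g. xQueueCreate(16, PKT_SIZE) */

/* Receive callback: only copy the payload into a FreeRTOS queue and
 * return. No RPMessage call is made from inside the callback, so a
 * send can never deadlock against a receive. */
static void onRpmsgRx(const void *data, uint16_t dataLen)
{
    uint8_t pkt[PKT_SIZE] = { 0u };
    BaseType_t wake = pdFALSE;

    memcpy(pkt, data, (dataLen < PKT_SIZE) ? dataLen : PKT_SIZE);
    /* FromISR variant, assuming the callback runs in interrupt context */
    xQueueSendFromISR(gRxQueue, pkt, &wake);
    portYIELD_FROM_ISR(wake);
}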

The cores run FreeRTOS tasks with different priorities which send those packets. There are 5-6 such tasks; each sleeps for 1-3 ms and then sends a packet. For the problematic situation in question I added a counter on each side: one core sends a counter value that increases with each call, and the receiving core keeps its own counter that it increments with each received packet. Both counters must match.
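
The counter check itself is essentially this (a sketch; ipcSendPacket is a hypothetical wrapper around the SDK send call, and the counter increment is assumed to be serialized across the sender tasks):

#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"

extern void ipcSendPacket(const void *data, uint16_t len); /* hypothetical wrapper around the SDK send call */

/* One of the sender tasks: sleep 1-3 ms, then send the next counter
 * value (increment assumed serialized across the sender tasks). */
static void senderTask(void *arg)
{
    static uint32_t txCounter = 0u;

    (void)arg;
    for (;;) {
        uint32_t value = txCounter++;
        ipcSendPacket(&value, sizeof(value));
        vTaskDelay(pdMS_TO_TICKS(2u)); /* 1-3 ms in the real test */
    }
}

/* Receiving side: every value must match the local counter. In the
 * failing runs this assert fires with *value == rxCounter + 4. */
static void onCounterPacket(const uint32_t *value)
{
    static uint32_t rxCounter = 0u;

    configASSERT(*value == rxCounter);
    rxCounter++;
}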

Instead, after roughly 2:40 to 3:10 min it always hits an assert, because the received counter value is always (!) exactly 4 higher than the counter on the receiving core. The counter value at which it fails is not reproducible, however: it varies, but always lies somewhere in the range of 60000 to 80000. This makes the whole thing a bit more puzzling.
The time to failure shrinks when I reduce the sleep time of the tasks to the minimum of 1 ms; it then occurs after about 1:10 min, but the counter range stays the same.

So to me it looks as if 4 packets are lost and never handled by the receiving core. I first suspected locked interrupts, but we eliminated all sources of interrupt locking, so the cores should not be blocking anything. Besides, if that were the problem, it should have shown up much earlier.

Is there a recommended way to debug such a situation? By the time I receive the wrong value it is already too late for a breakpoint, and the IPC send always returns without any error. Logging is enabled, as are the asserts, and there has been no warning from the IPC driver so far.

Edit:

We had implemented the IPC setup manually, based on the C files that SysConfig and its scripts generate. That made it possible to select 12 buffers instead of a power of two. I changed it back to 4 buffers and the problem no longer occurs, at least so far; I am running a long-term test now. I guess the IPC implementation can only handle 2^n buffers?
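
A plausible explanation, consistent with our numbers (the following is an illustrative sketch, not the actual TI driver code): virtio-style rings typically use a free-running 16-bit index and derive the slot as index % numBufs. With a power-of-two buffer count, the 16-bit wrap at 65536 is invisible because 65536 % 2^n == 0. With 12 buffers, 65536 % 12 == 4, so the slot sequence jumps by 4 entries exactly when the index wraps, which would fit both the failure window (counter values around 60000-80000, i.e. near 65536) and the offset of exactly 4:

#include <stdint.h>
#include <stdio.h>

#define NUM_BUFS 12u /* the non-power-of-two count from the failing config */

/* Slot selection as a virtio-style ring does it: the running index is
 * a uint16_t that silently wraps at 65536. */
static uint32_t slot_of(uint32_t msgNum)
{
    uint16_t idx = (uint16_t)msgNum; /* free-running 16-bit index */
    return idx % NUM_BUFS;
}

int main(void)
{
    /* Around the 16-bit wrap the mapping jumps back: message 65536
     * lands in slot 0 again, which still belongs to in-flight message
     * 65532, so 4 messages get clobbered (65536 % 12 == 4). With a
     * power-of-two count such as 16, the sequence continues seamlessly. */
    for (uint32_t m = 65530u; m <= 65540u; m++) {
        printf("msg %u -> slot %u\n", (unsigned int)m, (unsigned int)slot_of(m));
    }
    return 0;
}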

Best regards

Felix