
PRU-ICSS-INDUSTRIAL-SW: AM4378 - PRP Red Rx issue

Part Number: PRU-ICSS-INDUSTRIAL-SW

Dear support team,

We use the PRU-ICSS-HSR-PRP-DAN_01.00.02.00 software package on our custom board. We found that with a PRP network setup we get very high CPU load from time to time, and the issue is more likely to occur when both ports are in use. We spent some time debugging with the Hardware Trace Analyzer (PC Trace) and found that the program mostly loops in ICSS_EMacOsRxTaskFnc (ICSS_EmacPollPkt, ICSS_EmacRxPktInfo2, RedRxPktGet, ...). This made us suspicious, because the network traffic was small, nothing special was being sent to the device, and compared to the previous Rx activity there was no flood of packets (traced with Wireshark), yet the CPU was fully occupied with Rx.
We investigated further and found that in RedRxPktGet the program sometimes reads packets with a length of 2 B, and qDesc->wr_ptr sometimes holds a value outside the queue region (qDesc->wr_ptr == 4). We believe this causes continuous reading: queue_rd_ptr tries to catch up with queue_wr_ptr, which lies outside the queue buffer region. After some time queue_wr_ptr is corrected back into the queue pointer region and queue_rd_ptr is equalized with queue_wr_ptr. This situation produces a lot of wrong and duplicated packets, packet drops, retransmissions and, most annoying, 100 % CPU load, which affects our time-critical application.
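To make the condition we believe is violated explicit, this is roughly the bounds check we have in mind as a temporary mitigation (our own sketch only: the structure below just mirrors the two fields we reference, it is not the real TI queue definition, and prp_ptr_in_region() is a hypothetical helper of ours):

    #include <stdint.h>

    /* Minimal mirror of the two queue fields we reference; the real ICSS EMAC
     * queue structure in the LLD has more members and may be laid out differently. */
    typedef struct
    {
        uint16_t buffer_desc_offset;   /* first valid buffer descriptor offset */
        uint16_t queue_size;           /* offset just past the last descriptor */
    } prp_rx_queue_t;

    /* Returns 1 when a rd/wr pointer read from shared memory lies inside the
     * descriptor region, 0 otherwise (e.g. the wr_ptr == 4 case we observe).
     * Skipping the poll cycle in the 0 case would at least stop rd_ptr from
     * chasing an invalid wr_ptr at 100 % CPU. */
    static int prp_ptr_in_region(uint16_t ptr, const prp_rx_queue_t *q)
    {
        return (ptr >= q->buffer_desc_offset) && (ptr < q->queue_size);
    }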

Another thing we found is that at the maximum frame size (1514 B), once the PRP trailer is appended the frame grows to 1520 B, and such frames are dropped by NIMUReceivePacket.
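For reference, the arithmetic as we understand it (the 6-byte figure is the PRP Redundancy Control Trailer size from IEC 62439-3; the macro names below are ours, not from the SDK):

    #define MAX_FRAME_NO_TRAILER   1514u   /* Ethernet header + 1500 B payload          */
    #define PRP_RCT_SIZE              6u   /* SeqNr(2) + LanId/LSDUsize(2) + suffix(2)  */
    #define MAX_PRP_FRAME   (MAX_FRAME_NO_TRAILER + PRP_RCT_SIZE)   /* = 1520 B         */

So it looks like the receive path would need to accept up to MAX_PRP_FRAME bytes before the trailer is stripped.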


We know that version 1.00.02 is no longer supported, so we started testing PRU-ICSS-HSR-PRP-DAN_01.00.05.01 on the IDK-AM437x development kit, using the original package without our application code, to check whether the same problem exists in the newer version (and whether porting to it would make any difference). For debugging purposes we added the NDK echo server. Version 1.00.05.01 is somewhat more stable, but we observe the same problem with wrong pointers and packet lengths.
For testing we added the following lines to the code:

    rd_packet_length = (0x1ffc0000 & rd_buf_desc) >> 18;
    size = (rd_packet_length >> 2);

    if((rd_packet_length & 0x00000003) != 0)
    {
        size = size + 1;
    }
    /* our tracing structure */
    uint32_t structIndex = 0;
    structIndex = red_debug_ind_incr();
    red_debug[structIndex].queueNumber = queueNumber;
    red_debug[structIndex].queue_rd_ptr = queue_rd_ptr;
    red_debug[structIndex].queue_wr_ptr = queue_wr_ptr;
    red_debug[structIndex].rd_packet_length = rd_packet_length;


    /*Compute number of buffer desc required & update rd_ptr in queue */
    update_rd_ptr = ((rd_packet_length >> 5) * 4) + queue_rd_ptr;

    if((rd_packet_length & 0x0000001f) !=
            0)  /* checks multiple of 32 else need to increment by 4 */
    {
        update_rd_ptr += 4;
    }

    /*Check for wrap around */
    if(update_rd_ptr >= rxQueue->queue_size)
    {
        update_rd_ptr = update_rd_ptr - (rxQueue->queue_size -
                                         rxQueue->buffer_desc_offset);
    }
    red_debug[structIndex].queue_update_rd_ptr = update_rd_ptr;
    if (rd_packet_length < 60) {
        __asm__("BKPT");
    }
    if(queue_wr_ptr < rxQueue->buffer_desc_offset){
        __asm__("BKPT");
    }
    if(queue_rd_ptr < rxQueue->buffer_desc_offset){
        __asm__("BKPT");
    }

After some time the asm breakpoint is hit; here are screenshots that show what happens with wr_ptr and len:

This is the flow of pointers in the current and past RedRxPktGet calls (note that in 288 queue_exit_wp_ptr is not yet updated and holds an invalid number; this value is stored when the function checks whether rxArg->more = 1 needs to be set):

After repeating this test and checking the recorded pointers, we found that the problem is more likely to occur when the "wr" and "rd" pointers are equal. We also observed that when the wr and rd pointers are equal, both pointers are sometimes adjusted to a lower queue position by the PRU, as if it were trying to prevent a buffer wrap. Is that correct?
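To catch that case we are adding a trace along these lines (our own sketch, reusing the breakpoint idea from the snippet above; trace_ptr_jump() and the prev_* statics are ours, not from the SDK):

    #include <stdint.h>

    static uint16_t prev_rd_ptr = 0u;
    static uint16_t prev_wr_ptr = 0u;

    /* Flags the case where the pointers were equal on the previous call and
     * were then both moved to a lower queue position by the PRU. */
    void trace_ptr_jump(uint16_t queue_rd_ptr, uint16_t queue_wr_ptr)
    {
        if ((prev_rd_ptr == prev_wr_ptr) &&
            (queue_rd_ptr < prev_rd_ptr) &&
            (queue_wr_ptr < prev_wr_ptr))
        {
            __asm__("BKPT");   /* stop here and inspect the queue */
        }
        prev_rd_ptr = queue_rd_ptr;
        prev_wr_ptr = queue_wr_ptr;
    }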

The issue is observed more often under heavier traffic.
Have you observed this kind of behavior?

We also suspect that the echoed packets (TX) have broken trailers after the RX issue occurs (we are not 100 % sure, but we think we once saw some "echo" payload placed in the trailer of a PRP packet).
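To verify this we plan to check the captured frames roughly as follows (written from the IEC 62439-3 RCT layout as we understand it, not taken from the TI firmware; the function is our own):

    #include <stdint.h>
    #include <stddef.h>

    /* Checks whether the last 6 bytes of a captured frame still look like a
     * PRP Redundancy Control Trailer: SeqNr(2), LanId/LSDUsize(2), suffix(2). */
    int prp_trailer_looks_valid(const uint8_t *frame, size_t len)
    {
        if (len < 6u)
        {
            return 0;
        }
        const uint8_t *rct = frame + len - 6u;
        uint16_t suffix    = (uint16_t)(((uint16_t)rct[4] << 8) | rct[5]);           /* expect 0x88FB */
        uint16_t lsdu_size = (uint16_t)((((uint16_t)rct[2] & 0x0Fu) << 8) | rct[3]);

        /* A corrupted tail should show up as a wrong suffix or an LSDU size
         * that does not match the captured frame. */
        return (suffix == 0x88FBu) && (lsdu_size != 0u);
    }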


We also found this thread: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1017202/amic110-ethernet-ip-trouble/3765093?tisearch=e2e-sitesearch&keymatch=wr_ptr#3765093, which could potentially be similar to what we observe. Can you please explain what the conclusion of that thread was?

Do you have any ideas? Can you please comment on our findings?

Best regards, Mare