Semaphore overflow on NDK with SYS/BIOS 6.34.2.18 (and higher)

Hi,

We have developed an NDK driver for AM335x over ICSS and CPSW. Recently, during our stress test (with IXIA), we found a semaphore count overflow issue similar to SDOCM00099292. The issue is NOT observed with fping (up to 3 simultaneous instances); it appears only in IXIA testing.

This is the error message:

“ti.sysbios.knl.Semaphore: line 314: assertion failure: A_overflow: Count has exceeded 65535 and rolled over. xdc.runtime.Error.raise: terminating execution”

Here are the details of our setup:

  • CCS Version - 5.4, 5.3 and 5.2 (issue appears only on 5.4/5.3 and depends on the SYS/BIOS version)
  • NDK Version - Tested on 2.22.3.20 and 2.22.2.16 (issue persists on both)
  • SYS/BIOS - 6.33.4.39, 6.34.2.18, 6.35.1.29 (issue does NOT occur on 6.33.4.39)

To reproduce the issue using IXIA, send 1 million broadcast packets to the DUT at line rate.

Thanks,
Vinesh 

  • Hi Vinesh,

    There are some other Semaphores within the NDK that are counting. When fixing SDOCM00099292, I changed the Semaphore that was causing the issue from counting to binary. The other Semaphores were reviewed and believed to correctly need to be counting. But perhaps one of those is causing the issue.

    Can you see which one is causing the issue in ROV?

    Vinesh Balan wrote:
    "SYS/BIOS - 6.33.4.39, 6.34.2.18, 6.35.1.29 (issue does NOT come on 6.33.4.39)"

    Interesting that you don't see this problem on BIOS 6.33.4.39 ...?

    Steve

  • In reply to Steven Connell:

    Hi Steve,

    Yes, it is interesting. We did all our development on 6.33.4.39 and it never came up. This issue popped up when our team started updating to CCSv5.4.

    In ROV, how do you identify the semaphore that causes the issue? Do these semaphores have specific IDs? Also, I believe the last time we checked, we could not find any Semaphore count reaching 65k in ROV after the Assert happened.

    Thanks,
    Vinesh 

  • In reply to Vinesh Balan:

    Unfortunately, matching the Semaphores in ROV to those in the code requires a bit of work. Here are a couple of ways I can think of to do this:

    1. Put print statements into the NDK at each Semaphore create call to print out the handles. Then you can see if the handle of the overflowing Semaphore (from the error message) matches one of the handles from the prints. This would require you to rebuild the NDK.

    2. Put a breakpoint at SemCreate and SemCreateBinary in the app. When the breakpoint hits, step through the code to get the handle value returned from the call to Semaphore_create(). You can then continue stepping until you return to the place where the semaphore was created, to see which semaphore it corresponds to.

    I think this is important to do because at this point we are assuming that the overflowing Semaphore is one of the NDK ones, but we should confirm that.

    Steve

  • In reply to Steven Connell:

    Steve,

    Either way, the NDK needs to be rebuilt (to enable debug mode in the second case). I can give you more information sometime next week.

    Steven Connell wrote:
    "Then you can see if the handle of the overflowing Semaphore (from the error message) matches one of the handles from the prints."

    The error message doesn't give any handle information. We'll probably have to put a breakpoint in SYS/BIOS where the Assert is triggered and trace back from there.

    Initially we thought the issue was caused by the driver, but then it showed up in both versions (ICSS and CPSW). Although the drivers are structurally similar, their implementations are different. That's when I came across SDOCM00099292, and the issue seemed very similar. I will let you know the results once I have them.

    Vinesh

  • In reply to Vinesh Balan:

    Ok, sounds good Vinesh.

    Also, for option 2, you can do it without rebuilding by stepping through the assembly code.

    Steve

  • In reply to Steven Connell:

    Steve,

    We debugged the issue today, and concluded that the issue is NOT caused by an NDK semaphore. The issue was triggered by two different semaphores in the driver/application.

    Thanks for the support, Steve.

    Vinesh