Loop buffer exception and Missed stall exception on C674x

Other Parts Discussed in Thread: SYSBIOS, OMAP-L138

Hi,

During an endurance test, we encountered the following hardware exception on a C674x. The address 0xc1ef22f8 is the very last instruction in an SPLOOP. We used version 7.3.8 of the code generation tools (CGT). I couldn't reproduce the crash in our unit test environment.

Exception at 0xc1ef22f8

EFR=0x2 NRP=0xc1ef22f8

Internal exception: IERR=0x180

Loop buffer exception

Missed stall exception

ti.sysbios.family.c64p.Exception: line 248: E_exceptionMin: pc = 0xc1ef22f8, sp = 0xc39164a0.
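
For readers decoding a similar log by hand, here is a minimal sketch of how IERR=0x180 maps to the two messages above. It assumes the IERR bit layout as I recall it from the C674x CPU and Instruction Set reference (bit 7 = loop buffer exception, bit 8 = missed stall exception); the table and the `decode_ierr` helper are illustrative names of my own, not part of SYS/BIOS, so please check the bit positions against the CPU guide before relying on them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: bit positions recalled from the C674x CPU guide; verify them. */
typedef struct {
    uint32_t    mask;
    const char *name;
} ierr_flag;

static const ierr_flag IERR_FLAGS[] = {
    { 1u << 0, "Instruction fetch exception" },
    { 1u << 1, "Fetch packet exception" },
    { 1u << 2, "Execute packet exception" },
    { 1u << 3, "Opcode exception" },
    { 1u << 4, "Resource conflict exception" },
    { 1u << 5, "Resource access exception" },
    { 1u << 6, "Privilege exception" },
    { 1u << 7, "Loop buffer exception" },
    { 1u << 8, "Missed stall exception" },
};

/* Hypothetical helper: writes the names of all set IERR bits into `out`,
 * separated by ", ", and returns how many names were written.
 * Stops early if the next name does not fit completely in the buffer. */
int decode_ierr(uint32_t ierr, char *out, size_t out_size)
{
    int    count = 0;
    size_t used  = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof IERR_FLAGS / sizeof IERR_FLAGS[0]; i++) {
        if (ierr & IERR_FLAGS[i].mask) {
            int n = snprintf(out + used, out_size - used, "%s%s",
                             count ? ", " : "", IERR_FLAGS[i].name);
            if (n < 0 || (size_t)n >= out_size - used) {
                break; /* output would be truncated */
            }
            used += (size_t)n;
            count++;
        }
    }
    return count;
}
```

Calling `decode_ierr(0x180, buf, sizeof buf)` with a 128-byte buffer reports two flags, which is consistent with both exceptions being raised together in the log rather than a resource conflict (bit 4 under the same assumed layout).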


I then noticed that 7.3.9 contains a fix for a similar problem, as shown below.

------------------------------------------------------------------------------
FIXED  SDSCM00042974
------------------------------------------------------------------------------

Summary            : Resource conflict between instruction in SPLOOP and
                     instruction in its epilog causing hardware exception

Fixed in           : 7.3.9
Severity           : S2 - Major
Affected Component : Code Generator

Description:
This bug only happens at the epilog area of a SPLOOP. It could cause
resource conflict which leads to a hardware exception. There is no
obvious link between the user's C source code to this bug so it is hard
to avoid this problem at the source code level.

The fix of this bug could cause performance degradation. Usually this
happens for an early exit SPLOOP where the loop's trip count is less
than the number of iterations running in parallel. The performance could be
severely worse if this SPLOOP is the inner loop and the outer loop
executes many times. If this happens, there is no work around to fix
this problem.

This fix could also cause code size increase. But it is minimal.
Usually it is one NOP per SPLOOP.

Could the problem SDSCM00042974 cause the missed stall exception? Is there a chance that 7.3.9 solves the problem we are seeing?

Best regards,

-YU

  • Read literally, SDSCM00042974 applies only to "resource conflict" exceptions.  I'm sorry, I don't know enough about what "missed stall" means to determine whether SDSCM00042974 is applicable.  If you're using 7.3.8, you should upgrade to 7.3.9; it contains only bug fixes.

  • The missed stall in question is discussed in the section titled "Restrictions on Stall Detection Within SPLOOP Operation" in the C674x CPU book.  Based on that, I don't think the missed stall discussed here is the same as the resource conflict addressed in SDSCM00042974.

    I suspect this is a new and different problem.  With that in mind, we would appreciate a test case which allows us to generate the problem loop.  Please see the last part of the forum guidelines for the details.

    Thanks and regards,

    -George

  • Thanks a lot for your responses.

    I don't know the root cause of SDSCM00042974, but I guessed it might come from bad scheduling while the SPLOOP buffer is draining, and that the symptom is sometimes the resource conflict and sometimes the missed stall. As the symptom cannot be reproduced in a simple unit test, I suspect a specific interrupt is needed to trigger it.

    As I wrote in the first post, I still can't reproduce this hardware exception reliably, so I wanted to hear from someone with knowledge of SDSCM00042974. Of course, if I find a test case that reproduces this problem, I will post it.

    Best regards,

    -Yuichi

  • I hadn't been able to reproduce this problem, but I noticed that when I connect to PRU_0 from CCS5.2, the crash (Loop Buffer Exception & Missed Stall Exception) is easily reproduced. Without connecting to PRU_0, the unit runs fine.

    Is it possible that connecting to PRU_0 causes this kind of crash?

    Best regards,

    -Yuichi

  • This sounds like a HW issue.  We compiler folks can't help you with that.  Please tell me exactly which device you are using.  Based on that, I'll move this thread to another forum.

    Thanks and regards,

    -George

  • Thanks a lot for your help.

    We are using OMAP-L138.

    The DSP core crashes when PRU_0 is connected from CCS5.2 via Blackhawkv2-USB.

    The place where the unit crashes is random, but "Loop buffer exception" and "Missed stall exception" are always raised. I checked the contents of program memory around the place where it crashed, and it didn't seem to be altered. I didn't see any differences in the L1P and L2 caches either.

    Thanks and regards,

    -Yuichi

  • A few more questions regarding this issue.

    When returning from an interrupt, is the code in the SPLOOP buffer restored from program memory or from the stack? Can "Loop buffer exception" and "Missed stall exception" be raised by a corrupted stack?

    Regards,

    -Yuichi