Hi,
During an endurance test, we encountered the following hardware exception on a C674x. The instruction at address 0xc1ef22f8 was the very last one in an SPLOOP. We used CGTools 7.3.8. I couldn't reproduce the crash in our unit test environment.
Exception at 0xc1ef22f8
EFR=0x2 NRP=0xc1ef22f8
Internal exception: IERR=0x180
Loop buffer exception
Missed stall exception
ti.sysbios.family.c64p.Exception: line 248: E_exceptionMin: pc = 0xc1ef22f8, sp = 0xc39164a0.
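For readers who only have the raw register values, the IERR value in the log above can be decoded by hand. The following is a minimal sketch in host-side C, not part of SYS/BIOS. The helper name `decode_ierr` is hypothetical, and the bit assignments (bit 4 = resource conflict, bit 7 = loop buffer, bit 8 = missed stall, and so on) are assumed from the IERR description in the C674x CPU reference guide, so please check them against your copy. They do agree with the two messages SYS/BIOS printed for IERR=0x180.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* IERR bit meanings, index == bit number.
 * Assumption: taken from the C64x+/C674x CPU reference guide; verify locally. */
static const char *const ierr_names[] = {
    "Instruction fetch exception",  /* bit 0, IFX */
    "Fetch packet exception",       /* bit 1, FPX */
    "Execute packet exception",     /* bit 2, EPX */
    "Opcode exception",             /* bit 3, OPX */
    "Resource conflict exception",  /* bit 4, RCX */
    "Resource access exception",    /* bit 5, RAX */
    "Privilege exception",          /* bit 6, PRX */
    "Loop buffer exception",        /* bit 7, LBX */
    "Missed stall exception",       /* bit 8, MSX */
};

#define IERR_NUM_BITS (sizeof ierr_names / sizeof ierr_names[0])

/* Hypothetical helper: writes the names of all set IERR bits into buf,
 * separated by "; ", and returns how many names were written in full.
 * Stops early if buf is too small for the next name. */
size_t decode_ierr(uint32_t ierr, char *buf, size_t len)
{
    size_t count = 0;
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t bit = 0; bit < IERR_NUM_BITS; ++bit) {
        if ((ierr >> bit) & 1u) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             count ? "; " : "", ierr_names[bit]);
            if (n < 0 || (size_t)n >= len - used) {
                break;  /* output was truncated; do not count this name */
            }
            used += (size_t)n;
            ++count;
        }
    }
    return count;
}
```

Calling `decode_ierr(0x180u, buf, sizeof buf)` with a 128-byte buffer sets bits 7 and 8, which correspond to the loop buffer and missed stall lines in the log. A resource conflict, the symptom named in SDSCM00042974, would instead show up as bit 4 (IERR=0x10) under the same assumed layout.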
Then I noticed that 7.3.9 includes a fix for a similar problem, as shown below.
------------------------------------------------------------------------------
FIXED SDSCM00042974
------------------------------------------------------------------------------
Summary : Resource conflict between instruction in SPLOOP and
instruction in its epilog causing hardware exception
Fixed in : 7.3.9
Severity : S2 - Major
Affected Component : Code Generator
Description:
This bug only happens at the epilog area of a SPLOOP. It could cause
resource conflict which leads to a hardware exception. There is no
obvious link between the user's C source code to this bug so it is hard
to avoid this problem at the source code level.
The fix of this bug could cause performance degradation. Usually this
happens for an early exit SPLOOP where the loop's trip count is less
than the number of iterations running in parallel. The performance could be
severely worse if this SPLOOP is the inner loop and the outer loop
executes many times. If this happens, there is no work around to fix
this problem.
This fix could also cause code size increase. But it is minimal.
Usually it is one NOP per SPLOOP.
Could the problem SDSCM00042974 cause the missed stall exception? Is there a chance that 7.3.9 solves the problem we are seeing?
Best regards,
-YU
Read literally, SDSCM00042974 applies only to "resource conflict" exceptions. I'm sorry, I don't know enough about what "missed stall" means to determine whether SDSCM00042974 is applicable. If you're using 7.3.8, you should upgrade to 7.3.9; it contains only bug fixes.
The missed stall in question is discussed in the section titled Restrictions on Stall Detection Within SPLOOP Operation in the C674x CPU book. Based on that, I don't think the missed stall discussed here is the same as the resource conflict addressed in SDSCM00042974.
I suspect this is a new and different problem. With that in mind, we would appreciate a test case which allows us to generate the problem loop. Please see the last part of the forum guidelines for the details.
Thanks and regards,
-George
Thanks a lot for your responses.
I don't know the root cause of SDSCM00042974, but my guess was that it comes from bad scheduling while the SPLOOP buffer is draining, and that the symptom is sometimes a resource conflict and sometimes a missed stall. Since the symptom cannot be reproduced in a simple unit test, I suspect that some specific interrupt is needed to trigger it.
As I wrote in the first post, I still can't reproduce this hardware exception reliably, so I wanted to hear from someone with knowledge of SDSCM00042974. Of course, if I find a test case that reproduces the problem, I will post it.
Best regards,
-Yuichi
I still haven't been able to reproduce this problem in a unit test, but I noticed that when I connect PRU_0 from CCS5.2, the crash (Loop Buffer Exception & Missed Stall Exception) is easily reproduced. Without connecting PRU_0, the unit runs fine.
Is it possible that connecting PRU_0 causes this kind of crash?
Best regards,
-Yuichi
This sounds like a HW issue. We compiler folks can't help you with that. Please tell me exactly which device you are using. Based on that, I'll move this thread to another forum.
Thanks and regards,
-George
Thanks a lot for your help.
We are using OMAP-L138.
The DSP core crashes when PRU_0 is connected from CCS5.2 via Blackhawkv2-USB.
The location where the unit crashes is random, but "Loop buffer exception" and "Missed stall exception" are always raised. I checked the contents of program memory around the crash location, and it did not seem to have been altered. I didn't see any differences in the L1P or L2 cache either.
Thanks and regards,
-Yuichi
A few more questions regarding this issue.
When returning from an interrupt, is the code in the SPLOOP buffer restored from program memory or from the stack? Can "Loop buffer exception" and "Missed stall exception" be raised by a corrupted stack?
Regards,
-Yuichi