Other Parts Discussed in Thread: TMS320F28335, CONTROLSUITE
I promise, this one is weird. Maybe some linker guru will understand what is happening.
It started all of a sudden, between two code versions with very minor differences, on a project that has been going well for about 6 months now. Unfortunately, I don't have the "before" and "after" source code, only much older versions, and we can't rely much on those to track down the fatal difference...
Device: TMS320F28335
JTAG emulator: xds100v2
CCS v5.3 (v5.4 behaves the same)
Compiler v6.1.0 (v6.1.3 behaves the same)
I have what looks like a RAM corruption issue in the switch table.
Believe it or not, adding or removing NOP instructions somewhere near the top of main() "cures" the problem. When the problem comes back later, after I write or remove some more code elsewhere, I add or remove NOPs and "cure" it again.
Now, here is the debugging sequence that reveals the problem:
I click on "Reset CPU".
I check the content of RAM at 0xc250. According to the map file, it should be a switch table, and it definitely looks like one.
I set a write watchpoint on 0xc250 and a HW breakpoint at the top of main(), then press "Resume".
CPU stops at the breakpoint.
The Memory Browser shows 0xC250 in red because it has been modified since the previous halt. It is the one and only address in that area that has been modified; everything else is black.
First observation: something between reset and main() destroyed the first pointer of the switch table
Second observation: the watchpoint did not work
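Checking a wider range for other modified words does not have to rely on the red highlighting alone. A minimal sketch, assuming the same RAM range has been saved at both halts (reset and the main() breakpoint) and loaded into two arrays of 16-bit words; the function name diff_dumps() is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Compare two dumps of the same RAM range (16-bit words, as on the C28x)
   taken at two different halts. Prints every word that changed and
   returns how many changed. base_addr is the address of word 0. */
static size_t diff_dumps(const uint16_t *before, const uint16_t *after,
                         size_t nwords, uint32_t base_addr)
{
    size_t changed = 0;

    assert(before != NULL && after != NULL);
    for (size_t i = 0; i < nwords; i++) {
        if (before[i] != after[i]) {
            printf("0x%05lX: 0x%04X -> 0x%04X\n",
                   (unsigned long)(base_addr + i),
                   (unsigned)before[i], (unsigned)after[i]);
            changed++;
        }
    }
    return changed;
}
```

A count of exactly one over the whole range would confirm, independently of the debugger display, that only the first word of the table is hit.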
Restoring the correct pointer through the Memory Browser lets the code run properly afterward: no crash whatsoever, and nothing writes to that address again until the next reset.
So this is not a stack or buffer overflow, at least not in my own code...
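Why restoring a single word is enough becomes clearer with a model of what a compiler typically does with a dense switch: it emits a table of code addresses indexed by the case value, so overwriting the first entry diverts only case 0. The sketch below is a host-side analogy using C function pointers (all handler names are hypothetical), not the code the C28x compiler actually emits:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(void);

/* Hypothetical stand-ins for the code under each case label. */
static int case0(void) { return 100; }
static int case1(void) { return 101; }
static int case2(void) { return 102; }

/* Hypothetical stray target: wherever a corrupted entry happens to point. */
static int stray(void) { return -1; }

/* Model of the switch table: entry k holds the address of the code for
   "case k". In the real build, this is what sits in the .switch section. */
static handler_fn switch_table[] = { case0, case1, case2 };

#define NCASES (sizeof switch_table / sizeof switch_table[0])

/* Model of a dense switch: range-check the value, then jump through
   the table entry it selects. */
static int dispatch(size_t k)
{
    assert(k < NCASES);
    return switch_table[k]();
}
```

Overwriting switch_table[0] with stray makes dispatch(0) land in the wrong place while cases 1 and 2 still behave, and writing the saved entry back restores case 0, much like putting the right pointer back at 0xC250.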
Next, I turned to the map file. Below are 3 snippets.
This one crashes (no NOPs):
0000c247 _atof
0000c24b _SetDBGIER
0000c24e _InitEQep
0000c24f _InitEQepGpio
0000c250 ___etext__
0000c250 _switch_runstart
0000c250 etext
0000d000 _scibDataReady
This one crashes too (1 NOP):
0000c249 _atof
0000c24d _SetDBGIER
0000c250 _InitEQep
0000c251 _InitEQepGpio
0000c252 ___etext__
0000c252 _switch_runstart
0000c252 etext
This one works (2 NOPs):
0000c246 _DSP28x_usDelay
0000c24a _atof
0000c24e _SetDBGIER
0000c251 _InitEQep
0000c252 _InitEQepGpio
0000c253 ___etext__
0000c253 etext
0000c254 _switch_runstart
0000d000 _scibDataReady
Since both crashing binaries had etext at the same address as _switch_runstart, while the working binary had a different address for each symbol, I thought I had found the problem.
To prove it, I relocated the .switch section far away (RAML6). Now there is certainly no symbol allocated on top of the switch table. But the problem is the same!!! The first pointer (case 0) is corrupted again!!! This time it contains a different value, so it no longer jumps into the stack; it jumps somewhere else that is valid, but since the context is totally wrong, it soon crashes.
I tried adding NOPs again with the switch table located in L6, but it did not help.
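The etext / _switch_runstart comparison made by eye on the map files can be scripted so each new build is checked the same way. A minimal sketch, assuming the map lines have already been read into an array of "address symbol" strings like the ones quoted above; the helpers map_lookup() and switch_gap() are hypothetical. It reports _switch_runstart minus etext, which is 0 for both crashing snippets and 1 for the working one:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find a symbol in map lines of the form "<hex address> <symbol>".
   Returns 1 and stores the address in *addr if found, 0 otherwise. */
static int map_lookup(const char *lines[], size_t n,
                      const char *symbol, unsigned long *addr)
{
    assert(lines != NULL && symbol != NULL && addr != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned long a;
        char name[64];
        /* %63s stops at whitespace, so name holds only the symbol */
        if (sscanf(lines[i], "%lx %63s", &a, name) == 2 &&
            strcmp(name, symbol) == 0) {
            *addr = a;
            return 1;
        }
    }
    return 0;
}

/* If both etext and _switch_runstart are present, store
   (_switch_runstart - etext) in *gap and return 1; otherwise return 0.
   A gap of 0 means the switch table starts right where .text ends. */
static int switch_gap(const char *lines[], size_t n, long *gap)
{
    unsigned long etext, sw;

    if (!map_lookup(lines, n, "etext", &etext) ||
        !map_lookup(lines, n, "_switch_runstart", &sw))
        return 0;
    *gap = (long)sw - (long)etext;
    return 1;
}
```

This only flags the layout pattern seen in these three builds; it does not by itself show that the layout is the cause.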
I am getting out of breath now.
Does anybody have any idea what is going on here???