Hi,
So that I can ask Segger (or IAR) the right question, I would appreciate your best guess at what kind of operation could put the CPU into a state where the ECC is corrupted. I am fairly sure this is somehow caused by our development environment. I keep repeating the same "compile, download & debug" pattern, and usually everything works, but very rarely this prefetch issue raises its head again, even though the flash has already been filled once.
Question 1: If the debugger for some reason occasionally performed a whole-chip erase, would that corrupt the ECC (does an erase also clear the ECC area)? If yes, could such an unexpected whole-chip erase be a possible cause of this behavior?
I am using the IAR IDE and Segger's J-Link, and I have run into the speculative fetching problem a few times (3 or 4 to be exact, which is not much compared to how often I have downloaded code) after filling the whole memory as instructed here. My colleague has also hit it once, so this should not be a user-specific issue or a problem with one particular device/debugger chain:
https://e2e.ti.com/support/microcontrollers/hercules/f/312/t/588269
After that one full-memory fill, I have always restored the image to its "normal" size, so during normal debugging the whole memory is not programmed again and again. I also use Segger's compare setting "use fastest method", so only the required sections should be flashed, and a typical programming cycle takes a few seconds. Since this has happened only a few times and I download code a lot, the probability of this prefetch error is most likely below one per mille. But when it hits, it takes a while to work out the root cause, which is why it is rather annoying and I would like to solve it.
FUNC_ERR_ADD has been the same every time (0x135a98) as in the original problem, which occurred before the CPU flash had ever been filled. Our code is still only about 0x20000 bytes long, so normal programming should not touch that problematic area. If I understood correctly, the ECC bits for these two widely separated memory locations are not adjacent either, so a "minor mistake" in ECC writing should not corrupt that value... I used 0xFF as the fill value when I did the full-chip flashing, in case that matters. The problem disappears immediately when you erase the chip, enable binary fill and use IAR's flash loaders to flash the whole chip; after that you can go back to "optimized" downloading.
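To sanity-check the "not adjacent" reasoning, here is a minimal sketch that computes which ECC byte protects a given flash address. It assumes a device where each 64-bit data word has 8 ECC bits, mapped linearly from a base address; the base 0xF0400000 and the helper names `ecc_address` / `same_ecc_byte` are assumptions for illustration, so check the memory map in your device's Technical Reference Manual.

```c
#include <stdint.h>

/* ASSUMPTION: one ECC byte per 64-bit (8-byte) flash data word, mapped
 * linearly starting at this base address. Check the memory map in your
 * device's Technical Reference Manual before relying on this value. */
#define ECC_BASE_ASSUMED 0xF0400000u

/* Hypothetical helper: address of the ECC byte that protects the 64-bit
 * word containing flash_addr (main flash assumed to start at address 0). */
static uint32_t ecc_address(uint32_t flash_addr)
{
    return ECC_BASE_ASSUMED + (flash_addr >> 3);
}

/* Hypothetical helper: non-zero if two flash addresses share the same ECC
 * byte, i.e. they lie in the same 64-bit data word. */
static int same_ecc_byte(uint32_t a, uint32_t b)
{
    return ecc_address(a) == ecc_address(b);
}
```

Under this assumed mapping, FUNC_ERR_ADD 0x135a98 is covered by the ECC byte at 0xF0426B53, while the end of a 0x20000-byte image maps to 0xF0404000, 0x22B53 bytes further down the ECC area. So a programming step confined to the image should not write the ECC of the failing word.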
Related to this:
- Question 2: In production you need to program the whole flash (since it is the first use of the CPU). Does that mean one has to generate a filled image for programming, or configure the programming tool to fill up to the end of the flash?
- Question 3: Firmware update (I haven't looked at that side yet; I know there is a library available that handles programming, including ECC): if the code does not fill the last erased sector, do you need to program 0xFF (or some other data) up to the end of that sector manually, or does the library do it? Another option would be to use a filled binary for firmware updates as well...
- Question 4: If there is still a hidden ECC error somewhere in the unused flash area, will it always be triggered "immediately", or could it take weeks or months (or never) to show up? Mine has never appeared in the startup routines before main(); the error always seems to be triggered around the time the OS starts and the tasks begin to run. That is still a good phase, because you notice quite quickly that nERROR is low and immediately know there is a problem.
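Questions 2 and 3 both come down to padding: either fill the image up to the end of flash, or fill only up to the end of the last erased sector. A minimal sketch of both calculations follows. The `flash_sector_t` layout and both function names are hypothetical, and the sector values in any table you pass in must come from your device documentation. This only prepares data; whether valid ECC gets programmed for the filled bytes is still up to the programming tool or flash library that writes them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sector descriptor: take the real start addresses and sizes
 * from your device datasheet or flash API documentation. */
typedef struct {
    uint32_t start; /* first address of the sector */
    uint32_t size;  /* sector size in bytes */
} flash_sector_t;

/* Question 2: copy the application image into a buffer the size of the whole
 * flash and fill everything after it with fill_byte (0xFF in this thread).
 * Returns 0 on success, -1 if the image does not fit.
 * This only builds the data; whether correct ECC gets programmed for the
 * filled region is up to the tool or flash API that writes the buffer. */
static int make_filled_image(const uint8_t *image, size_t image_len,
                             uint8_t *out, size_t flash_size,
                             uint8_t fill_byte)
{
    if (image_len > flash_size)
        return -1;
    if (image_len > 0)
        memcpy(out, image, image_len);
    memset(out + image_len, fill_byte, flash_size - image_len);
    return 0;
}

/* Question 3: number of fill bytes needed after image_end (exclusive end
 * address of the programmed data) to reach the end of the sector that holds
 * the last programmed byte. Returns 0 if nothing was programmed and
 * UINT32_MAX if the last byte is not inside any listed sector. */
static uint32_t pad_to_sector_end(const flash_sector_t *sectors, size_t count,
                                  uint32_t image_end)
{
    if (image_end == 0u)
        return 0u;
    const uint32_t last = image_end - 1u; /* last programmed byte */
    for (size_t i = 0; i < count; i++) {
        const uint32_t s = sectors[i].start;
        const uint32_t e = s + sectors[i].size; /* exclusive sector end */
        if (last >= s && last < e)
            return e - image_end;
    }
    return UINT32_MAX;
}
```

For a firmware update, the value returned by `pad_to_sector_end` is how many fill bytes would still have to be programmed to cover the whole erased sector. Shipping a filled binary instead makes that value zero by construction, at the cost of a larger update image.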