RM48L952: Migration from RM44 causes occasional ESM group 3 channel 7 errors - speculative fetch???

Part Number: RM48L952
Other Parts Discussed in Thread: HALCOGEN

Hello,

I have code running "perfectly" (OS, DMA, some peripherals) on an RM44 CPU. After some migration work the code runs on the RM48, but when the code is stopped via the debugger, ESM Group 3 channel 7 errors are active.

The symptoms are weird: if you step through the code the issue may not appear. Also, if I for example disable one task (suspend it immediately) the issue does not appear, but if I put that same task into a delay loop (only an OS delay) the issue shows up again. Every time, I managed to do the whole SafeTI-based CPU init and start the OS before this error appeared.

Also, with the same SW the FUNC_ERR_ADD register always shows the same value, but if I change the code a bit the address may change slightly (by a couple of bytes). FUNC_ERR_ADD == 0x135a98. My code is only 0x20000 long, and since the "same code" (not quite the same anymore, since it needed some HalCoGen changes) works on the RM44, I strongly doubt that there are any errors in the application. All HalCoGen files have also been replaced by the RM48 variants (this was crazy work too, since HalCoGen doesn't offer a migration option for changing the CPU: first set HCLK the same, then diff the .dil files, manually move stuff from one file to the other, and hope for the best :)).

I also enabled an MPU region from 0x130000 to 0x140000 with "no code exec, no privileged access, no user access" and it does not trigger. I tested the MPU by setting it to my application area from 0x10000 to 0x20000, and a prefetch abort is generated.

After this I started to wonder whether the DMA could still somehow go crazy and do something funky, but as I understand it the DMA operates non-privileged, so I assumed that the MPU would catch it. Will it actually catch it? And why couldn't the MPU find the reason for the error, given that FUNC_ERR_ADD clearly shows that the problem is in the MPU-monitored area...


After days of frustration, reading e2e and remembering old posts I read months ago, there is that "speculative fetch". Could it cause such behavior? Once I fully programmed the whole ROM the error appears to be gone for good (with the IAR IDE I used the "linker->checksum->fill from 0 to 0x2FFFFF" option, and flashing took literally ages).

Questions:

- Why didn't my RM44 suffer from this speculative fetching (I never programmed the whole flash)?
- Why did stepping the code seem to have such a great impact on whether the error appeared or not?
- Do you really need to manually program the whole flash ONCE (is once enough? It takes too long to flash it every time while debugging)

- Why didn't the MPU catch it?
- In production: do you need to provide a binary with the whole flash filled? (And basically field updates could be done with a normal binary, in case programming time matters...)

And finally
- Based on the symptoms, was the root cause speculative fetching, and is there any way to verify that? I do not like issues which "magically disappear"; these typically tend to come back sooner or later unless you really know why they disappeared.

  • Hi Jarkko,

    The recommendation to fill the holes with their respective ECC values is to avoid uncorrectable ECC errors due to speculative fetches. You can use the fill command in the project cmd file to fill all the unused flash with 0xFFFFFFFF. Having those 0xFFFFFFFF values in the .out file will cause the ECC to be generated by CCS.

    The FUNC_ERR_ADD register has 2 portions: [2:0] is B_OFF, and [31:3] is UNC_ERR_ADD. If the register value is 0x135a98, the UNC_ERR_ADD should be 0x26B53 (0x135a98 >> 3). This address is not within your MPU range (0x130000~0x140000).

    Regards,
    QJ
  • UNC_ERR_ADD description in TRM:
    "This register captures the full 32 bit incoming address when there is a bus parity error. It only captures address of 22:3 for multiple bit ECC errors."
    - So UNC_ERR_ADD does NOT drop the lowest 3 bits by shifting; it captures bits 22:3 of the address in place, so the value cannot be shifted by 3 as you did.

    B_OFF description in TRM:
    "the address captured is aligned to a 64-bit boundary with address bits[2:0] equal to 0"

    Since I had the value 0x135a98 in the whole register, I cannot interpret the TRM text in any other way than that the error was in the memory range 0x135a98 - 0x135a9f (0x...9f with the lowest three bits set to 0 is 0x...98).


    There were also plenty of other questions which I asked, and I want/need answers to those. Also, why did we never have this kind of issue on the RM44? I also ran the RM48 OS demo SW on the RM48 without problems (and it didn't flash the whole memory either).
  • This can be closed; I'll make a new thread relating to this since I have one specific request.

    The reason the MPU didn't react was that this address was not actually used (it is a speculative fetch, not an actual access to that address). And based on the same source, the value of UNC_ERR_ADD shall not be shifted: in case of an error the hardware just masks out the 3 lowest bits of the erroneous address when putting the value into the register.
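
For anyone landing here later, the register interpretation discussed above can be sketched in C. This is only a minimal sketch, assuming the reading of the TRM text quoted in this thread (bits [2:0] are B_OFF and read as 0, the remaining bits hold the 64-bit aligned error address in place). The helper names and the constants for the MPU region and the application image end are hypothetical and simply mirror the values mentioned in the posts; they are not HALCoGen or SafeTI APIs.

```c
#include <stdint.h>

/* Values quoted in the thread (assumptions for this sketch). */
#define FUNC_ERR_ADD_VALUE UINT32_C(0x00135A98) /* register value read in the debugger */
#define MPU_REGION_START   UINT32_C(0x00130000) /* start of the MPU test region */
#define MPU_REGION_END     UINT32_C(0x00140000) /* end of the MPU test region, exclusive */
#define APP_IMAGE_END      UINT32_C(0x00020000) /* application is about 0x20000 long, exclusive */

/* Hypothetical helper: first byte of the 64-bit block that contained the error.
 * The low three bits (B_OFF) are masked off; the value is NOT shifted. */
static uint32_t err_block_start(uint32_t func_err_add)
{
    return func_err_add & ~UINT32_C(0x7);
}

/* Hypothetical helper: last byte of that same 64-bit block. */
static uint32_t err_block_end(uint32_t func_err_add)
{
    return err_block_start(func_err_add) + UINT32_C(7);
}

/* Returns nonzero when addr lies in the half-open range [start, end). */
static int in_range(uint32_t addr, uint32_t start, uint32_t end)
{
    return (addr >= start) && (addr < end);
}
```

With the value from this thread, `err_block_start(FUNC_ERR_ADD_VALUE)` gives 0x135a98 and `err_block_end(FUNC_ERR_ADD_VALUE)` gives 0x135a9f. That block lies inside the 0x130000 to 0x140000 MPU test region and outside the programmed application image, which is consistent with a fetch from unprogrammed flash rather than a real access by the application.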