ESM Group 3 error

Alan Thieman

Other Parts Discussed in Thread: TMS570LS3137, HALCOGEN

I have an application that uses the TMS570LS3137 processor. The HET 1 module is used to produce several PWMs. The HET2 module also produces a PWM at a different frequency. The ADCs are triggered by the RTI 0 interrupt. I recently changed the main timing loop interrupt. It used to be triggered by RTI 1. Now the HET module triggers the main CPU timing loop through the interrupt of the HET "CNT" instruction; I am no longer using RTI 1.

This all seems to work, but now I am getting ESM group3 errors during power-up.

Can you suggest what I should look for to fix this error?

over 12 years ago

0 KGreb over 12 years ago

TI__Mastermind 23000 points

Hello Alan,

Can you please tell us which ESM channel in group 3 is triggering? There are only four errors mapped to group 3 on the TMS570LS3137 so there are not many possible causes.

Thanks and Regards,

Karl

0 Alan Thieman over 12 years ago in reply to KGreb

Prodigy 175 points

Karl, it looks like Channel 7 (esmREG Stat3 is 0x00000080)
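For readers following along, the step from the status word to a channel number can be sketched in C. This is only an illustration that runs on a host PC: the function names are hypothetical, and it assumes that bit n of the status word corresponds to channel n, which is consistent with 0x00000080 being read as channel 7 here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the flag for channel `ch` (0..31) is set in an ESM status word.
 * Assumption: bit n of the word corresponds to channel n. */
static bool esm_channel_is_set(uint32_t status, int ch)
{
    assert(ch >= 0 && ch < 32);
    return (status & (UINT32_C(1) << ch)) != 0;
}

/* Lowest flagged channel in the status word, or -1 if no flag is set. */
static int esm_lowest_channel(uint32_t status)
{
    for (int ch = 0; ch < 32; ++ch) {
        if (esm_channel_is_set(status, ch)) {
            return ch;
        }
    }
    return -1;
}
```

Passing the value read from the status register (0x00000080) to `esm_lowest_channel` yields 7.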

0 KGreb over 12 years ago in reply to Alan Thieman

TI__Mastermind 23000 points

Hi Alan,

Channel 7 of group 3 indicates an error detected by the internal diagnostics in the flash wrapper (FMC). This error signal is the logical "OR" of several error indications inside the FMC. We need to peel the onion one layer deeper and check the FMC to understand which fault was detected.

In SPNU499A (TRM for LS31x), Table 5-7 notes the possible causes of an ESM group 3/channel 7 error. If the error is repeatable, can you please check the FMC FEDACSTATUS register (0xFFF8 701C) to see what error status bits are set?

Thanks and Regards,
Karl

0 Alan Thieman over 12 years ago in reply to KGreb

Prodigy 175 points

Karl, the value of the FMC FEDACSTATUS register (0xFFF8701C) is 0x00000102. The FUNC_ERR_ADD register (0xFFF87020) contains 0x000D0888.

0 KGreb over 12 years ago in reply to Alan Thieman

TI__Mastermind 23000 points

Hi Alan,

Sounds like there was an uncorrectable error, either a double-bit ECC fault or an address fault, at address 0xD0888. As a next step, I would recommend checking your software to determine whether the changes you made were linked into this region. It is possible that a fault in updating the code caused this error, or it could be a hardware fault.

If you erase/reprogram the flash, do you still see the error?

If you load other code over the same memory region, do you see the error?

If you have another unit, it would be interesting to see if the second unit fails with the same program image.

Regards,

Karl

0 Charles Johnston over 12 years ago in reply to KGreb

Expert 2690 points

Hi Karl,

I'm helping Alan investigate this problem.

On my setup (HDK Rev E, silicon Rev B), I can consistently repeat the error with the FUNC_ERR_ADD = 0xD18A0. This is close to the address Alan got (HDK Rev D, silicon Rev A), but both addresses are outside the range of our programmed code. (0x20 - 0x1675C).
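The comparison against the programmed range can be written down as a small host-side check. This is a minimal sketch with a hypothetical helper name; the range limits and fault addresses are the ones quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` lies inside the inclusive range [first, last]. */
static bool addr_in_range(uint32_t first, uint32_t last, uint32_t addr)
{
    assert(first <= last);
    return addr >= first && addr <= last;
}
```

With the programmed range 0x20 to 0x1675C, both `addr_in_range(0x20, 0x1675C, 0xD18A0)` and `addr_in_range(0x20, 0x1675C, 0xD0888)` return false, which is the observation that the faulting addresses fall outside the loaded code.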

It persists even if all flash is erased.

If I load a different application on the HDK, the problem goes away. If I then debug with the offending code, the problem does not appear. It only shows up when the HDK has that code loaded and power is applied. If the offending code is loaded via the debugger and the errors are manually cleared, it can be stopped and loaded several times with no errors. When power is removed and reapplied, the error reappears.

It seems like the processor running by itself at full speed will trigger the error.

I'll keep investigating on this side.

Thanks, Charlie Johnston

0 Charles Johnston over 12 years ago in reply to Charles Johnston

Expert 2690 points

Hi Karl,

Here's another clue.

We had Flash ECC Check and ESRAM ECC Check both disabled. When I enabled both, the uncorrectable error went away, but it is replaced by a correctable error. The address is 0xD1968 which is near the uncorrectable error we had before, but also outside the used flash area.

Charlie Johnston

0 Charles Johnston over 12 years ago in reply to Charles Johnston

Expert 2690 points

Hi Karl,

Another clue.

I enabled Memory Parity Self Check Enable for CAN1, 2, and 3. The uncorrectable error went away. Could this be a power up timing issue?

Thanks, Charlie Johnston

0 Sunil Oak over 12 years ago in reply to Charles Johnston

TI__Mastermind 49120 points

Hello,

Do you program code over flash that has been erased fully, or do you only erase flash sectors that are used by your code? One more thing you can try is to fill unused flash memory with a dummy value that will generate an "undefined instruction" exception. This way you can also trap the CPU if it executes code from a location it is not supposed to.

For example, the opcode 0xEEFF3300 will evaluate to a coprocessor # 3 (not implemented on Hercules MCUs) instruction and will result in an undefined instruction exception for both ARM and Thumb instruction sets.
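As a rough way to check the coprocessor number in that fill value, the fields can be extracted on a host PC. This sketch looks only at the ARM (A32) encoding, where bits [27:24] equal to 0b1110 select the coprocessor data-processing and register-transfer space and bits [11:8] hold the coprocessor number. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARM (A32) encoding: bits [27:24] == 0b1110 select the coprocessor
 * data-processing / register-transfer space (CDP, MCR, MRC). */
static bool is_coproc_op(uint32_t insn)
{
    return ((insn >> 24) & 0xFu) == 0xEu;
}

/* Coprocessor number, taken from bits [11:8] of such an instruction. */
static unsigned coproc_number(uint32_t insn)
{
    assert(is_coproc_op(insn));
    return (insn >> 8) & 0xFu;
}
```

For 0xEEFF3300, `is_coproc_op` is true and `coproc_number` returns 3.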

Regards,

Sunil

0 KGreb over 12 years ago in reply to Charles Johnston

TI__Mastermind 23000 points

Hi Charlie,

At this point, it sounds most likely to me that there is an issue with the software rather than the hardware. I can think of two things that could be worth digging into further:

1. In the linker's map file (assuming you are using TI CGT), are there any symbols or code sections which are linked to the addresses which are causing the faults?

2. Is it possible to set a hardware breakpoint/watchpoint on the offending address? Taking a look at the link register and last stack entries upon the bad code access might give us a few more clues.

Regards,
Karl

0 Roger Périat over 12 years ago in reply to KGreb

Expert 2040 points

I don't think that you have the problem I had, but just in case...

Did you change anything about the CPU modes when making your modifications? I once switched my software to use "system mode" (before that I used only the supervisor, FIQ and IRQ modes).

Then, after power-up, I always had an ESM error (not the same one as yours). The reason was an uninitialized LR in system mode. Read here: http://e2e.ti.com/support/microcontrollers/hercules/f/312/t/251942.aspx

Roger

0 Charles Johnston over 12 years ago in reply to Roger Périat

Expert 2690 points

Hello all,

Thanks for the quick replies.

Sunil - All of flash is erased before loading with no effect. I'll try your suggestion to trap the error.

Karl - There are no symbols or code near the indicated addresses. I haven't had any luck yet trying to set a breakpoint/watchpoint with CCSv5.2.

Roger - Good idea, but no effect. It looks like the fix for your problem was one of the last changes in HALCoGen 3.05 which was released on 3/15.

To be continued.

Thanks, Charlie

0 Charles Johnston over 12 years ago in reply to Charles Johnston

Expert 2690 points

Hi Karl,

Using the HDK and CCSv5.2, I was not able to get any of the following breakpoints to work:

- The reported address 0xD1BA8 with a range mask of 0x7 (If I read the TRM correctly, only bits 22:3 are captured in the error address register)

- Any access to 0xfff87020 which is where the FlashWrapper_FUncErrAddr is written

- Execution of custom_dabort in sys_selftest.c, which should be called when an uncorrectable ECC error occurs on ATCM.

It seems like the error is bypassing the normal traps, or the XDS100v2 is not a powerful enough debugger for what I'm trying to catch.
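For reference, the address window covered by a reported address and a range mask can be computed as below. This is a host-side sketch with hypothetical helper names; it assumes the mask is of the form 2^n - 1 (such as 0x7), so that it clears the low-order bits that the error address register does not capture.

```c
#include <assert.h>
#include <stdint.h>

/* First address of the window: the reported address with the masked
 * low-order bits cleared. The mask must be of the form 2^n - 1. */
static uint32_t watch_first(uint32_t reported, uint32_t range_mask)
{
    assert((range_mask & (range_mask + 1u)) == 0u);
    return reported & ~range_mask;
}

/* Last address of the window: the masked low-order bits all set. */
static uint32_t watch_last(uint32_t reported, uint32_t range_mask)
{
    return watch_first(reported, range_mask) | range_mask;
}
```

For 0xD1BA8 with a mask of 0x7, the window runs from 0xD1BA8 to 0xD1BAF.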

Thanks, Charlie

0 Sunil Oak over 12 years ago in reply to Charles Johnston

TI__Mastermind 49120 points

Hi Charlie,

Please refer to this earlier post on this subject: http://e2e.ti.com/support/microcontrollers/hercules/f/312/p/228184/807467.aspx

Essentially, the CPU tries to make speculative fetches from the available flash and RAM regions in order to minimize the impact of higher-latency memory accesses. In your case, if this access happens to fetch from a location that does not have the correct corresponding ECC programmed, this would generate an ECC error. There would be no abort generated if the CPU does not actually execute this instruction that was fetched speculatively (matches your observation).

To summarize, you do need to fill any unused flash memory and then program the corresponding ECC codes as well.
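The fill step itself can be pictured with a small host-side sketch that pads the unused tail of an image buffer with a trap word. The function name and the buffer layout are assumptions made for illustration. The sketch does not compute ECC; the matching ECC values still have to be generated and programmed by the build or flash tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write `fill` to every word from index `used_words` to the end of the
 * image. Returns the number of words written. ECC is not handled here. */
static size_t fill_unused(uint32_t *image, size_t total_words,
                          size_t used_words, uint32_t fill)
{
    assert(image != NULL && used_words <= total_words);
    for (size_t i = used_words; i < total_words; ++i) {
        image[i] = fill;
    }
    return total_words - used_words;
}
```

Calling `fill_unused(image, total, used, 0xEEFF3300u)` leaves the first `used` words untouched and sets the rest to the undefined-instruction pattern.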

Please let us know if you have any issues even after filling unused flash with an undefined instruction trap.

Regards, Sunil

0 Charles Johnston over 12 years ago in reply to Sunil Oak

Expert 2690 points

Hi Sunil,

That was it. The SW change we made just happened to cause a prefetch from an unused and uninitialized flash location. I filled flash by editing sys_link.cmd:

FLASH0 (RX) : origin=0x00000020 length=0x0017FFE0 fill=0xEEFF3300

FLASH1 (RX) : origin=0x00180000 length=0x00180000 fill=0xEEFF3300

The problem now is that a debug load takes 10 minutes instead of 15 seconds. We will be disabling ECC for SW development unless you have a better solution.

Thanks for your time.

Charlie Johnston

0 Charles Johnston over 11 years ago in reply to Charles Johnston

Expert 2690 points

Hi Sunil,

We now have ARM Compiler 5.1.2 which allows us to fill the ECC area without explicitly filling the unused flash. This will require us to use a fill of 0xFFFFFFFF (erased) rather than the preferred 0xEEFF3300 (undefined instruction).

Do you know where I can find the instruction that has the encoding 0xFFFFFFFF to see what kind of abort this would cause if executed?

Thanks, Charlie Johnston

0 Sunil Oak over 11 years ago in reply to Charles Johnston

TI__Mastermind 49120 points

Hi Charlie,

0xFFFFFFFF is an undefined instruction in both the ARM as well as Thumb2 instruction sets for the Cortex R4F processor. This was not the case for the ARM7TDMI processor that we have on our older generation of MCUs. That was the source of the recommended fill value.

Regards, Sunil

0 Charles Johnston over 11 years ago in reply to Sunil Oak

Expert 2690 points

Perfect!

Thanks, Sunil.

By the way, I don't have a green button I can select for the correct answer. Is that a function of my browser or is it a feature you have enabled?

Thanks, Charlie

0 Sunil Oak over 11 years ago in reply to Charles Johnston

TI__Mastermind 49120 points

Don't worry about it. I think that is because you are not the initiator of this thread.

I will close this thread and mark my post as a verified answer.

Regards, Sunil
