RM46L852: ESM Group1 channel 31 and ESM Group2 Channel 4 after a power cycle

daniweb

Part Number: RM46L852
Other Parts Discussed in Thread: DP83640, UNIFLASH, HALCOGEN

Hi,

I have an issue with one specific hardware generation. This is the sequence:
1) I load the software with the debugger (IAR I-jet) and execution very quickly reaches this point:
     SYSCALL_PUSH:
cpu_lock_r12_size:
         0x10: 0x6885         DC16      26757                   ; '…h'
and on the next step:
Abort_Handler:
FIQ_Handler:
IRQ_Handler:
Prefetch_Handler:
SWI_Handler... +2 symbols not displayed:
      0x1a22c: 0xeafffffe     B         Abort_Handler

2) As long as I reset via the debugger (software reset and also hardware reset over JTAG), it behaves the same way.
3) Now I remove and reapply power, and these errors are reported: ESM Group1 channel 31 and ESM Group2 channel 4.
For this CPU that means CCM-R4 self-test failed, and FMC uncorrectable address parity error on accesses to main flash.

4) Now I load software that clears those errors and run it once.
5) Now I reload the software that does not clear them, and I'm back at point 1).
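
For reference, the two reported flags can be checked on a raw status word with a small helper. This is a minimal host-side sketch, assuming each ESM status register is a 32-bit word with one bit per channel and that a flag is cleared by writing 1 to its bit; the function names are hypothetical and are not part of HALCoGen or any TI library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when bit `channel` (0..31) is set in a 32-bit ESM status word. */
static bool esm_channel_flagged(uint32_t status, unsigned channel)
{
    assert(channel < 32u);
    return ((status >> channel) & 1u) != 0u;
}

/* Value to write back to clear one flag (assumes write-1-to-clear). */
static uint32_t esm_clear_mask(unsigned channel)
{
    assert(channel < 32u);
    return UINT32_C(1) << channel;
}

/* Number of channels flagged in one status word. */
static unsigned esm_count_flagged(uint32_t status)
{
    unsigned count = 0u;
    for (unsigned ch = 0u; ch < 32u; ++ch) {
        if (esm_channel_flagged(status, ch)) {
            ++count;
        }
    }
    return count;
}
```

On the target, the status words would be read from the Group1 and Group2 status registers; here they are plain arguments so the logic can be checked on a PC.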

I see the same effect on all identical boards of generation 2, but it does not occur on the first generation of our boards.
The differences between generation 1 and generation 2 are:
A) different power supply
B) added JTAG chain with DP83640 PHY

Does anyone have an idea of the cause?

over 6 years ago

0 Chuck Davenport over 6 years ago

TI__Guru 59540 points

Hello daniweb,

Have you validated that the JTAG chaining is programming the devices correctly? i.e., that there isn't some corrupted code in them? Also, I notice from the snippet of the error message that it seems to have some code at address 0x10. This is usually reserved for the VIM interrupt vector table (0x00-0x20) but this isn't the case for your application? Are you able to set a break point at _cinit or whatever the equivalent is for IAR and then step through to see if the CCMR4 test is actually getting executed? It seems really early in your code for this to be happening since usually there is, at least, the STC execution before CCM-R4 self test if I recall correctly.
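
As a quick aid for interpreting the address in the debugger snippet, the standard ARM exception vector layout (eight 4-byte slots starting at 0x00) can be written as a lookup. This is a minimal sketch, assuming the vectors sit at the low address range; `arm_vector_name` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Name of the ARM exception vector at `address`, or a null pointer when
 * the address is not one of the eight word-aligned slots 0x00..0x1C. */
static const char *arm_vector_name(uint32_t address)
{
    static const char *const names[8] = {
        "Reset",                 /* 0x00 */
        "Undefined Instruction", /* 0x04 */
        "Supervisor Call",       /* 0x08 */
        "Prefetch Abort",        /* 0x0C */
        "Data Abort",            /* 0x10 */
        "Reserved",              /* 0x14 */
        "IRQ",                   /* 0x18 */
        "FIQ",                   /* 0x1C */
    };

    if (address >= 0x20u || (address & 3u) != 0u) {
        return 0;
    }
    const uint32_t index = address / 4u;
    assert(index < 8u);
    return names[index];
}
```

With this layout, address 0x10 is the Data Abort slot, so a normal image would be expected to hold a branch or load instruction there rather than data.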

0 daniweb over 6 years ago in reply to Chuck Davenport

Prodigy 70 points

Hi Chuck,

First of all, thanks for taking the time to help me.

I'm in debug mode and I selected verify when downloading; the verification does not complain, so I assume that it is programmed correctly.
But a colleague downloaded it using UniFlash with the XDS100v2 adapter, and with that one the read-back does not always give the same result, exactly at address 0x10!
We are using a third-party OS and they do this:
VIM->FIRQPR0 = 0; /* all interrupts go to IRQ */
Therefore I think that this is intended.

Sorry, but how can I see whether the CCM-R4 test is executed?

Thanks

0 Chuck Davenport over 6 years ago in reply to daniweb

TI__Guru 59540 points

Hello Daniweb,

Sorry for the delay in response. The issue might have to do with a synchronization step that usually needs to occur during startup. Please review the startup sequence application note (www.ti.com/.../spna106d.pdf) to see some of the critical things that need to happen at boot time including this synch step for the lockstep CPUs before CCMR4 is enabled.

Additionally, the use of IRQ or FIQ is irrelevant to the placement of the interrupt vector table. This ties also to the NMI and CPU exceptions and is part of the architecture used and programmers model. You could easily use the Halcogen tool to generate your boot code and see what is done before program execution (even with an RTOS).

0 daniweb over 6 years ago in reply to Chuck Davenport

Prodigy 70 points

Hi Chuck,

In the meantime we found an issue with the JTAG connection; fixing it solved the random read-back.
The other failure remained.
We exchanged the CPU on 3 boards, and two of them are now working.
As you suggested (I do not know why I did not think of it myself), I generated a project with HALCoGen, and with it I end up with ESMSR3->ESF3 == 1, i.e. Group3 channel 3, which represents RAM even bank (B0TCM) - ECC uncorrectable error.

Now my big question is: can I conclude that it was simply a production (or CPU delivery) problem, or would that be too easy a conclusion?

Thanks
Daniele

0 Chuck Davenport over 6 years ago in reply to daniweb

TI__Guru 59540 points

Hello Daniele,

You can check the fault status register in the CPU to determine the address at which the error occurred. Remembering that ECC is computed over a 64-bit word, you can then write to and read back the same address location (after clearing the ESM Group3 channel 3 error condition). If the error occurs again, it is most likely a permanent failure. The cause of the failure is more difficult to assess, however. It usually requires specialized tests in our labs to locate the physical fault by means of failure analysis (FA). Another option is to run PBIST on the SRAM to see whether it passes or fails. Again, if it fails in addition to the ESM error previously noted, it could indicate a permanent/hard failure. It could also be related to an ECC memory issue, so be sure to include the ECC bank when running any testing.
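
The write/readback check described above could look roughly like the following. This is a minimal sketch, not TI code: the function names are hypothetical, the caller is assumed to pass the 8-byte-aligned address derived from the fault address, and the word's previous content is overwritten (it is deliberately not read first, since reading a location with a bad ECC value could raise the error again).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a faulting byte address down to the start of its 64-bit ECC word. */
static uintptr_t ecc_word_base(uintptr_t fault_address)
{
    return fault_address & ~(uintptr_t)7u;
}

/* Write several patterns to one 64-bit word and read each one back.
 * Returns 0 when all patterns match, otherwise the 1-based index of the
 * first failing pattern. The word is left holding the last pattern written. */
static int readback_test_word64(volatile uint64_t *word)
{
    static const uint64_t patterns[] = {
        UINT64_C(0x0000000000000000),
        UINT64_C(0xFFFFFFFFFFFFFFFF),
        UINT64_C(0xAAAAAAAAAAAAAAAA),
        UINT64_C(0x5555555555555555),
    };

    assert(word != NULL);
    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; ++i) {
        *word = patterns[i];          /* full 64-bit write refreshes the ECC */
        if (*word != patterns[i]) {   /* read back and compare */
            return (int)i + 1;
        }
    }
    return 0;
}
```

On the target, the ESM flag would be cleared before the test and checked again afterwards; a flag that reappears on a freshly written word points toward a hard fault.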

If you identify it as a hard failure, you could reach out to a local FAE, TSR or distributor from which you got the devices and arrange for a discussion with our quality department for discussion on how to handle the return/issue.

0 daniweb over 6 years ago in reply to Chuck Davenport

Prodigy 70 points

Hi Chuck,

I now use the initialization code project generated by HALCoGen, with "Enable CPU self-test" included.
It never exits this while loop:
/* Wait for CCM self-test to complete */
/*SAFETYMCUSW 28 D MR:NA <APPROVED> "Hardware status bit read check" */
while ((CCMSR & 0x100U) != 0x100U)
{
}/* Wait */

Thanks
Daniele

0 Chuck Davenport over 6 years ago in reply to daniweb

TI__Guru 59540 points

Hello Daniele,

This is an indication that self-test is ongoing or that the CCMR4 has stopped working. Note the following section from the TRM.

Based on the description of what has happened, it is difficult to tell if this is the cause of the issue or not. Certainly you could try to run in standalone mode and use a GPIO toggle as a marker to check if you get past this point outside of debug and, if so, it could be the root cause.
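
One way to combine the GPIO marker idea with the wait loop quoted earlier is to bound the polling so a hang becomes visible. This is a minimal sketch under stated assumptions: the status register is passed in as a pointer, `wait_flag_bounded`, `marker_fn`, and `host_marker` are hypothetical names, and `host_marker` only counts calls on a PC where real firmware would toggle a pin. The 0x100 flag value is taken from the loop in the post above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CCMSR_STC 0x100U /* self-test-complete flag, as tested in the quoted loop */

typedef void (*marker_fn)(void);

/* Host stand-in for a GPIO toggle: counts how often the marker fired. */
static unsigned marker_hits;
static void host_marker(void)
{
    ++marker_hits;
}

/* Poll `*status_reg` until all bits in `flag` are set or `max_polls`
 * reads have been made. Calls `on_done` (if not null) once on success.
 * Returns true on success and false on timeout. */
static bool wait_flag_bounded(const volatile uint32_t *status_reg,
                              uint32_t flag,
                              uint32_t max_polls,
                              marker_fn on_done)
{
    assert(status_reg != 0);
    for (uint32_t i = 0u; i < max_polls; ++i) {
        if ((*status_reg & flag) == flag) {
            if (on_done != 0) {
                on_done();
            }
            return true;
        }
    }
    return false;
}
```

If the marker never fires in standalone operation, the code did not get past the self-test wait; a timeout return gives the firmware a place to signal that case on a second pin instead of spinning forever.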
