RM57 CCM Question

Dmitri Zakharevski59

In lockstep mode the outputs of both ARM cores are compared. My question: if CPU1 issues an erroneous command to a peripheral such as GIO, one that CPU2 does not agree with and that therefore causes a compare error, will the command reach the peripheral with the ESM error following 2 cycles later, or will the ESM error happen before the output is seen?

Thanks!

over 9 years ago

0 Charles Tsai over 9 years ago

TI__Guru**** 184346 points

Hello Dmitri,
CPU1 is the master CPU, while CPU2 is merely a checker CPU. If CPU1 issues an erroneous command, e.g. to the GIO, then this command will eventually reach the GIO. However, the error will be detected by the CCM when it compares the outputs of the two CPUs. As a result of the miscompare, an error signal is raised to the ESM, which leads to an NMI. So both things will happen: the CCM detects the error, and the erroneous command reaches the GIO.

0 Dmitri Zakharevski59 over 9 years ago in reply to Charles Tsai

Intellectual 905 points

Thanks for the reply Charles, but could you clarify please?

Is the ESM error raised first, so that the system has a chance to stop the erroneous output, or second?

In the technical reference it states "While in lockstep mode, the checker CPU's output signals to the system are clamped to inactive safe values."

If the outputs reach the peripheral before the system has a chance to react / prevent it, it has huge implications for safety critical applications.

0 Charles Tsai over 9 years ago in reply to Dmitri Zakharevski59

TI__Guru**** 184346 points

Hi Dmitri,

Let's suppose you want to read from the location at offset 0x0 in the GIO module, but CPU1 puts out an offset address of 0x4 instead. Within 2 CPU cycles the CCM will detect the error when it compares the address outputs of CPU1 and CPU2. However, the transaction has already started, meaning the address (here the erroneous 0x4) and all the corresponding control signals have been driven out to the peripheral. It takes about 20 CPU clocks to read from the address, in this case a wrong address. Even though the ESM should already have received the error and raised an NMI to the CPU, the protocol requires that the data transfer be completed. The NMI cannot kill an ongoing transaction in the middle. A CCM error is an ESM group 2 error, which is also signaled to the outside world via the nERROR pin. Depending on your application's safety case, the external system can choose to put the MCU in a safe state (e.g. reset the device first, then perform an LBIST of the CPU, and decide based on the LBIST result whether to remain in the safe reset state).

0 Dmitri Zakharevski59 over 9 years ago in reply to Charles Tsai

Intellectual 905 points

Thanks Charles,

If I understand correctly, the RM57 cannot be used to guarantee safe processor operation because of the following scenario:

GPIO A has safety implications and should only be used in a certain system state.
CPU1 incorrectly issues a write to GPIO A, while CPU2 correctly issues a write to GPIO B.
The CCM issues an error and asserts the nERROR pin.
The external system sees GPIO A asserted, followed by the nERROR pin asserted.
GPIO A being asserted causes the system to fail before nERROR can be detected to "safe" the system.

Is my understanding correct, or will nERROR be asserted before GPIO A is asserted so that the system is "safe"?

Could outputs be tied to nERROR system outputs so that if it is asserted all outputs are ignored?

0 Christian Herget over 9 years ago in reply to Dmitri Zakharevski59

TI__Expert 6985 points

Hi Dmitri,

The lock-step works in a way where the outputs of the second CPU are solely used to check and compare with the outputs of the first CPU.
This means that if the first CPU writes a wrong value, or writes to a wrong address, this write will go through the bus matrix and on to the peripheral. However, the lockstep comparison (CCM) will most likely detect a mismatch between the two CPUs and signal it to the outside world via the nERROR pin. In general nobody can guarantee safe operation, but the likelihood that random faults will be detected is high on Hercules devices. There is a chance that the potentially erroneous write to the GIO will arrive before the nERROR pin signals an error. How to handle this situation depends on the system. Please keep in mind that we are talking about nanoseconds for the fault detection (less than a µs), but there might be a short glitch on your output. Again, it is up to you to decide whether your system can tolerate such a glitch or how to handle it, for example by filtering it.
It should also be said that the whole architecture of the Hercules devices aims for detecting faults and not necessarily trying to prevent them (fault avoidance).

Best Regards,
Christian

0 Dmitri Zakharevski59 over 9 years ago in reply to Christian Herget

Intellectual 905 points

Thanks Christian,

You mentioned ns timing, are there published worst case numbers available?

0 Anthony F. Seely over 9 years ago in reply to Dmitri Zakharevski59

TI__Guru 68830 points

Don't know if we have any published timings on this.
But I think we can put a little more hand waving behind this.

The write needs to propagate through the device interconnect to the IO.
The interconnect runs at a reduced clock rate compared to the CPU.
So this should amount to a delay of between 10 and 20 CPU clock cycles.
Experiments on the bench could be made to refine this number.

Now, from the same point (at the CPU boundary) there is a 2-cycle delay between the CPUs for the core compare. There are probably a few additional cycles for this to propagate through the ESM to the ERROR pin. With some work a precise answer could be had.

But the point is that the ERROR path is likely to be faster. And if it's not faster, it's not going to be very many CPU cycles behind the write. So at 330 MHz, even if you are pessimistic and allow 10 CPU cycles for the ESM path to lag the CPU path, that's on the order of a 30 ns window in which your GIO is active but ERROR isn't.

That's a swag. Again, these delays could probably be measured with the right experiments. I'm not aware that they are published in any datasheet or TRM, though.
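
The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch: the cycle counts are the rough guesses from this post, not published figures, and the names `cycles_to_ns` and `exposure_window_ns` are made up for illustration.

```python
CPU_HZ = 330e6  # CPU clock used in the estimate above

# Rough guesses from the post, NOT datasheet or TRM values.
PESSIMISTIC_LAG_CYCLES = 10  # assumed worst-case lag of the ERROR path behind the write


def cycles_to_ns(cycles: float, cpu_hz: float = CPU_HZ) -> float:
    """Convert a number of CPU clock cycles to nanoseconds."""
    return cycles / cpu_hz * 1e9


def exposure_window_ns(lag_cycles: float, cpu_hz: float = CPU_HZ) -> float:
    """Time during which the GIO could be active while ERROR is not yet asserted.

    lag_cycles is how many CPU cycles the ERROR path trails the write path.
    A zero or negative lag means ERROR arrives first, so the window is zero.
    """
    return cycles_to_ns(max(lag_cycles, 0), cpu_hz)


if __name__ == "__main__":
    window = exposure_window_ns(PESSIMISTIC_LAG_CYCLES)
    print(f"Pessimistic exposure window: {window:.1f} ns")
```

With the pessimistic 10-cycle lag this gives roughly 30 ns, matching the figure quoted above. Bench measurements would be needed to replace the guessed cycle count with a real one.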

-Anthony

0 Christian Herget over 9 years ago in reply to Anthony F. Seely

TI__Expert 6985 points

Anthony, Dmitri,

I looked into the safety manual and saw that we list the Error Reporting Time for the CCM as <1 µs, but I think this refers to the user-defined ISR and not the nERROR pin. The nERROR pin will signal the fault much faster. However, there are situations where you want to filter out short pulses, so please let me explain this a bit more.

Most, if not all, of the self tests of the implemented diagnostics will trigger the nERROR pin. For example, if you perform a self test of the CCM, the whole MCU reacts to the injected fault as if it were a real one: the CCM signals an error to the ESM, and the ESM sends an FIQ to the CPU and triggers the nERROR pin. For that reason the TPS65381 companion chip can filter out what I call short pulses on the nERROR pin, which means that it won't react to those, whereas it would reset the MCU on longer pulses. This is described in SLVSBC4F, Figure 5-11 on page 64.

So please keep in mind that there might be situations where the nERROR pin is intentionally triggered during runtime. What I am trying to say is that a simple AND of nERROR with, for example, the GIO signal you mentioned might not be adequate, depending on your system. If you plan to perform self tests not only during the startup phase but also during runtime, you should consider implementing some sort of filter to mask these short, intended pulses on the nERROR pin.
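
As a rough illustration of such a filter, here is a small behavioral model in Python. It is a sketch under stated assumptions, not how the TPS65381 or any TI part implements it: nERROR is assumed to be active low and sampled at a fixed rate, and `NErrorFilter`, `min_low_samples`, and `gated_output` are hypothetical names with a hypothetical threshold.

```python
from dataclasses import dataclass


@dataclass
class NErrorFilter:
    """Ignore short low pulses on nERROR; latch a fault on long ones.

    Assumption: nERROR is active low and sampled at a fixed interval.
    min_low_samples is a hypothetical threshold chosen by the system designer.
    """
    min_low_samples: int
    low_count: int = 0
    fault_latched: bool = False

    def update(self, nerror_level: bool) -> bool:
        """Feed one sample (True = high/inactive, False = low/active).

        Returns True once a fault has been latched; the latch is never cleared here.
        """
        if nerror_level:
            self.low_count = 0  # pulse ended before reaching the threshold
        else:
            self.low_count += 1
            if self.low_count >= self.min_low_samples:
                self.fault_latched = True
        return self.fault_latched


def gated_output(gio_level: bool, filt: NErrorFilter) -> bool:
    """Pass the GIO level through only while no fault is latched."""
    return gio_level and not filt.fault_latched


if __name__ == "__main__":
    filt = NErrorFilter(min_low_samples=3)
    for sample in [True, False, False, True]:      # short self-test pulse
        filt.update(sample)
    print("after short pulse:", gated_output(True, filt))
    for sample in [False, False, False]:           # long pulse, treated as a real fault
        filt.update(sample)
    print("after long pulse:", gated_output(True, filt))
```

Note the trade-off: the longer the filter window, the longer a real fault takes to block the output, so the GIO stays unprotected for the filter time on top of the detection delay discussed earlier in this thread.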

Best Regards,
Christian

0 Dmitri Zakharevski59 over 9 years ago in reply to Christian Herget

Intellectual 905 points

Thank you Christian, Anthony.

This answers my question.
