TMS570LC4357: Maximum asynchronous exception delay

Etienne Alepins

Part Number: TMS570LC4357

Hi,

With TMS570LC, we have been experimenting with the delay of various kinds of exceptions. The delay is defined as the distance between the instruction that caused the exception and the entry to the exception vector (Reset, Undefined, Data Abort, Prefetch Abort, IRQ or FIQ). As expected, synchronous exceptions have a delay of zero. Most asynchronous exceptions (FIQ, async aborts, etc.) have a very small delay of 1 instruction. However, there are 2 types of exceptions where we observed huge delays:

1) FPU interrupts (IRQ)

2) Async data abort caused by a write to Flash (which is obviously not a good thing to do)

A delay of more than 100 nop instructions was observed! This is a real issue because it means that if a task generates such an exception, the OS has time to complete the task, return to Privileged state and execute some OS internal code before the exception is taken. Our OS cannot handle exceptions raised within itself without killing the whole system, so a single task is able to bring the whole OS down. We have not yet observed this effect in our OS, though.

I am writing this post to ask for the maximum number of clock cycles of delay that each of these 2 exceptions can have, or for a mitigation that forces the processor to take the exception. We are running the core at 300 MHz (GCLK) and the system at 150 MHz (HCLK). If I am not mistaken, the VIM is on HCLK and hence also running at 150 MHz.

Thanks,

Etienne

over 4 years ago

0 Sunil Oak over 4 years ago

TI__Mastermind 49120 points

Hi Etienne,

IRQ timing is highly dependent on several system-level implementation choices. Cortex R4/R5 processors do not support nested interrupts by default, which means that any IRQ that comes in while the processor is within any exception (IRQ or higher priority) has to wait until the processor becomes available again to service the new IRQ. This is also why, in applications using an RTOS, most interrupt service routines are very short: they mainly just set a flag and return. The actual servicing of the interrupt then occurs in one of the tasks assigned for this specific purpose. This keeps the CPU available for servicing new interrupt requests.

If you measure the time for the CPU to address a new IRQ with no current exception being serviced, the typical IRQ latency that I have measured in the past is around 47 CPU clock cycles.

I would have to look into the case of an asynchronous data abort and get back to you.

Regards, Sunil

0 Etienne Alepins over 4 years ago in reply to Sunil Oak

Intellectual 775 points

Hi,

My question is not related to ISR servicing time. It is a HW-only question that does not depend on our SW.

I am rather looking for the worst case interval between an instruction that generates an exception and the point where the exception is effectively signaled to the SW by the HW. For example, in my sample code, I have an assembly instruction performing a floating-point division by zero (e.g. vdiv instruction). After that instruction, in the same assembly file, I've inserted hundreds of NOP instructions. In the exception handler, I look at LR-4 to see where the exception occurred: it points to 20 instructions after the vdiv... That's a lot! And I don't know if it could be even more in other conditions (we have caches enabled).

This measurement is in number of instructions, but I agree with you that I would prefer an answer in number of CPU clock cycles.
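The LR-based measurement described above can be sketched in C as follows. This is only an illustration, not the code used in the experiment: the helper name `exception_delay_instrs` and its parameters are hypothetical, it assumes ARM (A32) state with 4-byte instructions, and the LR offset of 8 for a data abort is an assumption that should be checked against the ARM architecture manual.

```c
#include <stdint.h>

/* Width of one instruction in ARM (A32) state. Thumb code would need
 * different handling; that case is not covered here. */
#define ARM_INSTR_BYTES 4u

/*
 * Hypothetical helper: number of instructions between the instruction that
 * triggered the exception and the point where the exception was taken.
 *
 * lr           - link register value captured at exception entry
 * lr_offset    - bytes to subtract from lr to get the interrupted address
 *                (4 for IRQ/FIQ as in the post; assumed 8 for a data abort)
 * trigger_addr - address of the triggering instruction (e.g. the vdiv)
 */
static uint32_t exception_delay_instrs(uint32_t lr, uint32_t lr_offset,
                                       uint32_t trigger_addr)
{
    uint32_t interrupted_addr = lr - lr_offset;
    return (interrupted_addr - trigger_addr) / ARM_INSTR_BYTES;
}
```

For example, with the vdiv at a made-up address 0x1000 and LR-4 pointing 80 bytes past it, the helper returns 20, matching the figure quoted above.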

Note: the most delayed asynchronous data abort we have seen comes from trying to write to a Flash address while not being in any erase/program algorithm. Don't forget to allow writes to Flash in the MPU configuration to obtain that result (admittedly, that's not a very natural setting).

Étienne

0 Sunil Oak over 4 years ago in reply to Etienne Alepins

TI__Mastermind 49120 points

Hi Etienne,

I used the PMU to measure around 118 CPU clock cycles between writing to the flash and the data abort handler being taken, which is close to what you observed as well.
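To relate these cycle counts to wall-clock time at the clock rates mentioned in the first post (300 MHz GCLK, 150 MHz HCLK), a small conversion sketch may help. The function names are made up for illustration, and the 2:1 GCLK/HCLK ratio is simply taken from the numbers quoted in this thread.

```c
#include <stdint.h>

/* Clock rates quoted in the thread. */
#define GCLK_HZ 300000000ull  /* CPU clock */
#define HCLK_HZ 150000000ull  /* system clock */

/* Convert a cycle count at a given clock rate to picoseconds (rounded down). */
static uint64_t cycles_to_ps(uint64_t cycles, uint64_t clock_hz)
{
    return (cycles * 1000000000000ull) / clock_hz;
}

/* Express a CPU (GCLK) cycle count in whole HCLK cycles (rounded up). */
static uint64_t gclk_to_hclk_cycles(uint64_t gclk_cycles)
{
    uint64_t ratio = GCLK_HZ / HCLK_HZ;   /* 2 for 300 MHz / 150 MHz */
    return (gclk_cycles + ratio - 1) / ratio;
}
```

With these figures, 118 CPU cycles at 300 MHz come to roughly 393 ns (59 HCLK cycles), and 47 CPU cycles to roughly 157 ns.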

In terms of the IRQ timing, I had previously measured 47 CPU cycles between an interrupt request being generated and the entry to the interrupt service routine. The point is that this is certainly not the worst case possible, as this time can be affected by the current state of the CPU when the IRQ happens. Note that this measurement was done with the hardware vectored mode of the IRQ response enabled within the CPU. There are other ways to handle an IRQ as well:

i) Jump to the IRQ vector address 0x18 and then jump to the address of the highest-priority pending interrupt request

ii) Jump into an IRQ service routine which is then responsible for calling the correct handler based on the number of IRQ requests pending

Both these methods of servicing IRQ would be slower than using the hardware vectored mode.

Regards,

Sunil

0 Etienne Alepins over 4 years ago in reply to Sunil Oak

Intellectual 775 points

Hi,

Thanks for the measurements. We are also using hardware vectored IRQ mode. But we also have FIQ exceptions. In fact, I need the worst case time for:

- FPU interrupt - IRQ (HW vector mode)

- FPU interrupt - FIQ

- Asynchronous data abort due to write on Flash

You can always assume that the exception is unmasked when generating the error (i.e. CPSR[A,I,F]=0). Also note that we've observed quite some difference in the delay between the various FPU exceptions (inexact, overflow, underflow, invalid, divide-by-zero, input denormal), the worst being overflow and divide-by-zero.
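The "unmasked" precondition above can be expressed as a check on a CPSR value. The sketch below is host-runnable C that takes the CPSR as a plain integer; on the target, the value would have to be read with an `MRS` instruction, which is not shown here. The function name `cpsr_all_unmasked` is hypothetical; the bit positions are the standard A, I and F mask bits of the CPSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR exception mask bits. A set bit masks the exception. */
#define CPSR_A_BIT (1u << 8)  /* asynchronous abort mask */
#define CPSR_I_BIT (1u << 7)  /* IRQ mask */
#define CPSR_F_BIT (1u << 6)  /* FIQ mask */

/* True when asynchronous aborts, IRQ and FIQ are all unmasked,
 * i.e. CPSR[A,I,F] = 0. */
static bool cpsr_all_unmasked(uint32_t cpsr)
{
    return (cpsr & (CPSR_A_BIT | CPSR_I_BIT | CPSR_F_BIT)) == 0u;
}
```

For instance, a CPSR of 0x1F (System mode, no mask bits set) passes the check, while 0x1D3 (Supervisor mode with A, I and F set) does not.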

I would definitely need a worst case delay based on the TMS570LC design. Could you contact the design team to find that out? At the same time, understanding why these 2 kinds of interrupt have such a long delay would be nice (it seems like a bad design...). All other asynchronous IRQ/FIQ/Abort exceptions have a delay of 0 to 2 instructions!

Thanks.

0 Sunil Oak over 4 years ago in reply to Etienne Alepins

TI__Mastermind 49120 points

Hi Etienne,

A write access to the flash memory (or normally to the RAM on L2) takes ~40 CPU cycles on TMS570LC. This does not include the cycles spent in buffering this write (no error) before it even appears at the flash module boundary (where the error response is generated). Given that, it is certainly imaginable that the CPU takes >100 cycles to respond to this asynchronous write error.

As for the cases of an FPU exception (IRQ) or any other IRQ in general, ARM documentation does not include the number of cycles taken for these exceptions to be addressed. Please note that the compiler also generates instructions for the context switch which also consume several cycles before the first instruction in the service routine is executed. At this point the best information I can provide is what I can measure on silicon with the concerned exceptions unmasked and unblocked.

Regards,

Sunil

0 Etienne Alepins over 4 years ago in reply to Sunil Oak

Intellectual 775 points

Hi,

Do you think there are configurations that can affect the delay of the async abort to Flash or FPU interrupt? I mean, if I measure that delay, can I be confident that the number obtained will not vary?

Also, to understand why we are obtaining these measurements for IRQ/FIQ: how can we explain that in other IRQ/FIQ cases, the delay is really small? For example, when reading a RAM location with an erroneous ECC code, we've measured a delay of 0 cycles between the load instruction and the FIQ ESM 2.3 interrupt. Isn't the FIQ ESM 2.3 interrupt taking a Core->ESM->VIM path similar in length to the Core->VIM path of the floating-point interrupt? In both cases, the VIM relies on signals produced by the ARM core (i.e. the error detection is done in the core).

Note that when I measure the delay for the IRQ/FIQ, I put a breakpoint at the exception vector (at 0x1C for FIQ and in the VIM handler for IRQ), so instructions generated by the compiler for context switch do not influence my measurements. I always work at object code level for this kind of measurement, not source code.

Regards,

Étienne

0 Sunil Oak over 4 years ago in reply to Etienne Alepins

TI__Mastermind 49120 points

Hi Etienne,

The CPU response to asynchronous aborts is not enabled by default, and must be enabled during CPU setup by clearing the "A" bit of the CPSR. This A-bit also gets set on any CPU reset condition, or any other CPU abort condition, thereby disabling the CPU's abort response to errors.

In case of an erroneous ECC code on RAM access, the error is detected by the CPU in parallel to the processing of the data (in case of a data fetch). This ECC error is detected before the "memory write" stage of the pipeline, thereby preventing the faulty data from being processed. The CPU then responds with a data abort and also signals this on its "Event Bus". This is why you observe "zero" delay between a read of a location with an ECC error and the CPU's response.

The CPU's event bus signal for RAM ECC error is routed to the ESM, which captures this signal on the next rising edge of VCLK and this generates an interrupt request to the VIM.

Do you measure the delay between the erroneous read and the entry to the FIQ service routine, or between the erroneous read and entry to the data abort handler?

Regards, Sunil

0 Etienne Alepins over 4 years ago in reply to Sunil Oak

Intellectual 775 points

Hi,

RAM ECC errors detected upon read do not generate aborts. They generate an FIQ ESM 2.3.

I always measure the delay between the instruction (e.g. erroneous read) and the entry to the exception vector (e.g. abort handler, FIQ handler, Undefined handler, etc.). Not to the service routine (which is usually C code).

Regards,

Étienne

0 Sunil Oak over 4 years ago in reply to Etienne Alepins

TI__Mastermind 49120 points

Hi Etienne,

You are correct about no aborts on ECC errors on RAM accesses on TMS570LC. This is because the RAM is connected to the CPU's AXI port and not as a tightly-coupled memory. The CPU still signals this on the event bus, which is then connected to ESM 2.3, which in turn generates the FIQ.

Regards,

Sunil
