EK-TM4C1294XL: Unusually Large Interrupt Latency

Stefan Manoharan

Part Number: EK-TM4C1294XL

Hi Everyone,

I've been trying to receive SPI signals from my ADS1274EVM to the TM4C1294XL. To do this, I have configured the Falling Edge of the /DRDY signal from the ADC as the interrupt (at GPIO port F) for the microcontroller. While doing this, I found that the latency is unusually large, i.e. 6-7us, whereas it is supposed to be in the neighborhood of 1us (12 Cycles of a 120MHz system clock). I also set the interrupt priority to the highest, although it shouldn't make a difference as the GPIOF Interrupt is the only ISR I have defined. My code is given below, and I'd appreciate if anyone has any suggestions. Some code snippets and outputs from my Logic Analyzer are given below.

Setting Up GPIOF for Interrupts:

// Pin PF0 setup for /DRDY (active-low data-ready)
    SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOF);        // Enable port F
    GPIOPinTypeGPIOInput(GPIO_PORTF_BASE, GPIO_PIN_0);  // Init PF0 as input

    // Interrupt setup
    GPIOIntDisable(GPIO_PORTF_BASE, GPIO_PIN_0);                // Disable interrupt for PF0 (in case it was enabled)
    GPIOIntTypeSet(GPIO_PORTF_BASE, GPIO_PIN_0, GPIO_FALLING_EDGE);                          // Configure PF0 for falling edge trigger
    GPIOIntClear(GPIO_PORTF_BASE, GPIO_PIN_0);                  // Clear pending interrupts for PF0
    GPIOIntRegister(GPIO_PORTF_BASE, GPIOPortFIntHandler);      // Register ISR for port F
    IntPrioritySet(INT_GPIOF, 0);                                // Set Interrupt Priority Level
    GPIOIntEnable(GPIO_PORTF_BASE, GPIO_PIN_0);                 // Enable interrupt for PF0

My Interrupt Service Routine for GPIOF

//Interrupt Service Routine (ISR) or Interrupt Handler
void GPIOPortFIntHandler(void)
{

        //Clear the GPIO interrupt
        GPIOIntClear(GPIO_PORTF_BASE, GPIO_INT_PIN_0);


        //Fill the buffer elements one by one; seven 16-bit reads collect 112 bits
        SSIDataPut(SSI1_BASE, DummyData);
        SSIDataGet(SSI1_BASE,&pui32DataRxBuffer[0]);             //Get first 16 bits [111:96]
        SSIDataPut(SSI1_BASE, DummyData);
        SSIDataGet(SSI1_BASE,&pui32DataRxBuffer[1]);             //Get second  16 bits [95:80]

        //Add Block Below for 2 Channels
        SSIDataPut(SSI1_BASE, DummyData);
        SSIDataGet(SSI1_BASE,&pui32DataRxBuffer[2]);             //Get third 16 bits [79:64]
        SSIDataPut(SSI1_BASE, DummyData);
        SSIDataGet(SSI1_BASE,&pui32DataRxBuffer[3]);             //Get fourth  16 bits [63:48]

        //Add Block Below for 3 Channels
        SSIDataPut(SSI1_BASE, DummyData);
        SSIDataGet(SSI1_BASE,&pui32DataRxBuffer[4]);             //Get fifth 16 bits [47:32]

        //Add Block Below for 4 Channels
        SSIDataPut(SSI1_BASE, DummyData);
        SSIDataGet(SSI1_BASE,&pui32DataRxBuffer[5]);             //Get sixth 16 bits [31:16]
        SSIDataPut(SSI1_BASE, DummyData);
        SSIDataGet(SSI1_BASE,&pui32DataRxBuffer[6]);             //Get seventh 16 bits [15: 0]

        //Consolidate and concatenate the elements to get Channel 1 and 2
        pui16DataRx0[counter] = ((uint16_t)(pui32DataRxBuffer[0]) << 1) | ((uint16_t)(pui32DataRxBuffer[1]) >> 15);

        //pui16DataRx1[counter] = ((uint16_t)(pui32DataRxBuffer[1]) << 9) | ((uint16_t)(pui32DataRxBuffer[2]) >>  7);
        //pui16DataRx2[counter] = ((uint16_t)(pui32DataRxBuffer[3]) << 1) | ((uint16_t)(pui32DataRxBuffer[4]) >> 15);
        //pui16DataRx3[counter] = ((uint16_t)(pui32DataRxBuffer[4]) << 9) | ((uint16_t)(pui32DataRxBuffer[5]) >>  7);

        //Counter Update
        if(counter < INPUT_VECTOR_SIZE - 1) {counter++;}        //Increment Counter as long as it's below 4999
        else{counter = 0;}                                      //Reset Counter to Zero if it is 4999
        while(SSIBusy(SSI1_BASE)){}

}

Logic Analyzer Output

over 6 years ago

0 cb1_mobile over 6 years ago

Guru 117850 points

May I note the following?

You interrupt upon the falling edge of D_RDY - yet if you switched to rising edge - your interrupt would occur sooner! (by D_RDY's pulse width) Goal is response - not true measure of latency!
You "register" the interrupt - rather than placing it w/in the MCU's "Vector Table" - which I believe - achieves faster response
You impose the time for the interrupt to "clear" - PLUS the time for function, "SSIDataPut()" to execute - and class "those two delays" as interrupt latency! Would not a GPIO toggle - as the very first instruction w/in said handler - provide a "truer picture" of interrupt latency?
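A minimal sketch of the marker-pin measurement suggested above, assuming a spare output pin (PN1 here, a hypothetical choice) that has already been configured as a GPIO output. The stand-in section exists only so the sketch compiles off-target; on the board it would be replaced by the TivaWare driverlib headers, and the base-address values shown are placeholders, not real addresses.

```c
#include <stdint.h>

/* ---- Host-side stand-ins (assumption: on the target these come from
 * driverlib/gpio.h and inc/hw_memmap.h; values are placeholders) ---- */
#define GPIO_PORTF_BASE 0xF0u
#define GPIO_PORTN_BASE 0xE0u
#define GPIO_PIN_0      0x01u
#define GPIO_PIN_1      0x02u

static uint8_t  g_markerLog[4];   /* marker pin values, in write order */
static uint32_t g_markerWrites;   /* number of marker writes recorded  */
static uint32_t g_clearedFlags;   /* flags passed to GPIOIntClear      */

static void GPIOPinWrite(uint32_t port, uint8_t pins, uint8_t val)
{
    if (port == GPIO_PORTN_BASE && (pins & GPIO_PIN_1) && g_markerWrites < 4u) {
        g_markerLog[g_markerWrites++] = (uint8_t)(val & GPIO_PIN_1);
    }
}

static void GPIOIntClear(uint32_t port, uint32_t flags)
{
    if (port == GPIO_PORTF_BASE) {
        g_clearedFlags |= flags;
    }
}
/* ---- End of stand-ins ---- */

/* ISR: raise the marker before doing anything else, so the gap between the
 * /DRDY falling edge and the marker rising edge on the logic analyzer is the
 * entry latency plus one pin write, without the clear or the SSI calls. */
void GPIOPortFIntHandler(void)
{
    GPIOPinWrite(GPIO_PORTN_BASE, GPIO_PIN_1, GPIO_PIN_1);  /* marker high */

    GPIOIntClear(GPIO_PORTF_BASE, GPIO_PIN_0);
    /* ... the SSI transfer would go here ... */

    GPIOPinWrite(GPIO_PORTN_BASE, GPIO_PIN_1, 0);           /* marker low */
}
```

The width of the marker pulse then shows how long the handler body itself takes, separately from the time needed to enter it.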

When running above certain speeds - does not that MCU suffer expanded number of "wait-states" and/or other "crippling effects?"

Beyond the above - I would switch (ideally) to a different port - and determine if there may be a "latency penalty" extracted by your chosen port - and pin.

I agree w/your calculations and believe that you can improve results...

0 Bruno Saraiva over 6 years ago

Guru 13040 points

cb1's answer is excellent!

A few more pointers:

- Be careful with clearing one specific interrupt, such as GPIOIntClear(GPIO_PORTF_BASE, GPIO_INT_PIN_0); it is wiser to read the masked interrupt status and then clear every interrupt it reports. If your code grows and adds other pins as interrupt sources, or even if noise triggers an interrupt, you'd otherwise be stuck in an ISR re-entering loop.

- Not only is the whole set of SSI commands not part of the interrupt latency, it is also not good practice to leave all of that inside an interrupt. It is better to simply set a control flag inside the ISR and then service the SSI communication from the main loop.
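The two pointers above can be sketched roughly as follows. GPIOIntStatus() and GPIOIntClear() are the TivaWare calls; everything in the stand-in section is a host-side substitute so the sketch compiles without the board (an assumption, to be replaced by driverlib/gpio.h on the target), and the flag and helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* ---- Host-side stand-ins (assumption: on the target these come from
 * driverlib/gpio.h; the fake register only mimics the masked status) ---- */
#define GPIO_PORTF_BASE 0xF0u          /* placeholder, not the real address */
#define GPIO_PIN_0      0x01u

static uint32_t g_fakeMaskedStatus;    /* pretend masked interrupt status */

static uint32_t GPIOIntStatus(uint32_t port, bool masked)
{
    (void)port;
    (void)masked;
    return g_fakeMaskedStatus;
}

static void GPIOIntClear(uint32_t port, uint32_t flags)
{
    (void)port;
    g_fakeMaskedStatus &= ~flags;
}
/* ---- End of stand-ins ---- */

static volatile bool g_drdySeen;       /* set in the ISR, consumed in main */

void GPIOPortFIntHandler(void)
{
    /* Read every pending (masked) source, then clear exactly those bits,
     * so an unexpected pin cannot keep re-triggering the handler. */
    uint32_t status = GPIOIntStatus(GPIO_PORTF_BASE, true);
    GPIOIntClear(GPIO_PORTF_BASE, status);

    if (status & GPIO_PIN_0) {
        g_drdySeen = true;             /* defer the SSI work to the main loop */
    }
}

/* Called from the main loop; returns true when a sample should be read. */
bool drdy_take(void)
{
    if (!g_drdySeen) {
        return false;
    }
    g_drdySeen = false;
    return true;
}
```

The main loop would call drdy_take() and run the SSIDataPut()/SSIDataGet() sequence whenever it returns true. With a single flag, the loop has to finish each read before the next /DRDY edge, otherwise a sample is silently dropped.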

Stefan Manoharan said:
in the neighborhood of 1us (12 Cycles of a 120MHz system clock)

Actually that amount of time relates to 120 cycles.
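For anyone who wants to redo the arithmetic, here is a small host-side sketch (plain C, nothing TivaWare-specific; the function names are made up for illustration) that converts between clock cycles and time.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of `cycles` clock cycles in nanoseconds, rounded down. */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t clock_hz)
{
    assert(clock_hz > 0u);
    return (cycles * 1000000000ull) / clock_hz;
}

/* Number of whole clock cycles that fit into `ns` nanoseconds. */
uint64_t ns_to_cycles(uint64_t ns, uint32_t clock_hz)
{
    return (ns * (uint64_t)clock_hz) / 1000000000ull;
}
```

At 120 MHz, cycles_to_ns(12, 120000000) gives 100 ns, 1 us corresponds to 120 cycles, and the 6 us from the first post corresponds to ns_to_cycles(6000, 120000000) = 720 cycles.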

(Saleae has been getting a good amount of free advertisement around here...)

PS: If for some reason you are trying to read the converter "as soon as possible" after the value is available and signaled by /DR line, maybe you should consider polling. Still, that "rush" will probably make no difference on your overall system, for the AD converter has a maximum rate of 144K samples per second. There are other phases that will impact your measurement (including whatever numeric processing you will do with all those samples, if any).

Bruno

0 Chester Gillon over 6 years ago in reply to cb1_mobile

Guru 92251 points

cb1_mobile said:
You "register" the interrupt - rather than placing it w/in the MCU's "Vector Table" - which I believe - achieves faster response

Looking at the TivaWare_C_Series-2.1.4.178 source code shows that the GPIOIntRegister() function does place the interrupt handler in the Cortex-M4 "Vector Table": GPIOIntRegister() updates the Vector Table in RAM, and the first call to an IntRegister function copies the Vector Table from flash to RAM. Therefore, use of GPIOIntRegister() doesn't account for the observed large interrupt latency.

I haven't measured the interrupt latency myself, but found the previous thread "Minimum gpio interrupt delay for TM4C129", where the code used GPIOIntRegister() and, on a TM4C129 with a CPU clock of 120MHz, the measured GPIO interrupt latency was about 230 ns.

From looking at the posted code I can't see what explains the large interrupt latency.

Some further questions for the original poster are:

a) Has the CPU frequency been verified to be set to 120 MHz?

b) Is the CPU put in a sleep mode when waiting for an interrupt?

c) Does the main code disable / re-enable interrupts?

0 Bruno Saraiva over 6 years ago in reply to Chester Gillon

Guru 13040 points

Chester Gillon said:
Therefore, use of GPIOIntRegister() doesn't account for the observed large interrupt latency.

Chester,

It appears that you will have to argue with your colleague who's written the Tivaware User Guide:

So, which statement is true? I would love to adopt only run-time registrations in our company's codes, but the text above has been making us edit the startup file for each project.

Bruno

0 Chester Gillon over 6 years ago in reply to Bruno Saraiva

Guru 92251 points

Bruno Saraiva said:
It appears that you will have to argue with your colleague who's written the Tivaware User Guide

Thank you for pointing me at that part of the Tivaware User Guide. To be honest, my comment about the use of GPIOIntRegister() was only that the function doesn't add any extra software overhead to the interrupt handler, i.e. GPIOIntRegister() doesn't add any software interrupt dispatch logic. As you mention, I didn't consider any hardware overhead from moving the Vector Table from flash to RAM.

Also, since I am not a TI employee, I am unable to directly talk to the author of the Tivaware User Guide.

Bruno Saraiva said:
So, which statement is true? I would love to adopt only run-time registrations in our company's codes, but the text above has been making us edit the startup file for each project.

The Tivaware User Guide makes a vague statement about a small latency without quantifying it.

The ARM Cortex-M4 Interrupt latency documentation says:

There is a maximum of a twelve cycle latency from asserting the interrupt to execution of the first instruction of the ISR when the memory being accessed has no wait states being applied. When the FPU option is implemented and a floating point context is active and the lazy stacking is not enabled, this maximum latency is increased to twenty nine cycles. The first instructions to be executed are fetched in parallel to the stack push.

The ARM statement that the first interrupt handler instructions are fetched in parallel with the stack push implies that the vector table fetch has occurred before the stack push occurs, which is at odds with the Tivaware User Guide. That discrepancy between the ARM and Tivaware documentation suggests a measurement should be performed to see what the difference in interrupt latency is for a vector table in flash vs. RAM.

0 Bruno Saraiva over 6 years ago in reply to Chester Gillon

Guru 13040 points

Chester Gillon said:
since I am not a TI employee, I am unable to directly talk to the author of the Tivaware User Guide.

Sorry, Chester, my mistake. You just have too many points on your counter, hence I failed to properly read your qualification!!! Of course the tone of my message would make much more sense being targeted to a TI employee. I'm sure you will disregard the nonsense.

Nice reply. It's on my (long) to-do list to eventually try and make such a measurement.

Bruno

0 cb1_mobile over 6 years ago in reply to Bruno Saraiva

Guru 117850 points

Responses - (now) numbering SIX - suggest the need for an entirely new measurement - that of, "O.P. Latency!"
From tests we've past run - use of the vector table w/in "start-up file" - w/both LX4F & 4C123 - indeed minimizes "interrupt latency!"

0 Stefan Manoharan over 6 years ago in reply to cb1_mobile

Prodigy 70 points

Hi! Thanks for your suggestions, those are good points. Sorry it took so long to reply, I was trying out all suggestions on this post (which are a lot! :-) )

cb1_mobile said:

You interrupt upon the falling edge of D_RDY - yet if you switched to rising edge - your interrupt would occur sooner! (by D_RDY's pulse width) Goal is response - not true measure of latency!

The DRDY rising edge in this case only occurs at the negative edge of the SCLK provided by the MCU- which is a part of the ISR. This is why the falling edge is necessary for the interrupt.

cb1_mobile said:

You "register" the interrupt - rather than placing it w/in the MCU's "Vector Table" - which I believe - achieves faster response

You are right, and I have been placing the handler in the MCU vector table. Now using the vector table alone has worked for me in the past with timer interrupts, but for some reason, it doesn't work with GPIO interrupts. The code, for some reason I'm trying hard to understand, only works when the interrupt is defined in the vector table AND called in the 'register' function.

cb1_mobile said:

You impose the time for the interrupt to "clear" - PLUS the time for function, "SSIDataPut()" to execute - and class "those two delays" as interrupt latency! Would not a GPIO toggle - as the very first instruction w/in said handler - provide a "truer picture" of interrupt latency?

I tried the GPIO toggle; what I found was that the latency is still around 5us. The time for the SCLK to start after the ISR is entered (that is, the delay due to Clear and DataPut) is only around 0.5us. Therefore, I presume the problem has something to do with the process of actually calling the handler.

cb1_mobile said:

When running above certain speeds - does not that MCU suffer expanded number of "wait-states" and/or other "crippling effects?"

Beyond the above - I would switch (ideally) to a different port - and determine if there may be a "latency penalty" extracted by your chosen port - and pin.

What crippling effects do you mean? I'm pretty new to MCU's, and I'm still trying to figure out how they work.

Also, I switched to Port A from Port F, but the SCLK is not being generated on the falling edge of DRDY anymore. I may have to check what's wrong there. Does it happen that different ports have different speeds? I thought the basic latency for every interrupt is 12 cycles, no matter what.

0 cb1_mobile over 6 years ago in reply to Stefan Manoharan

Guru 117850 points

Stefan Manoharan said:
Does it happen that different ports have different speeds? I thought the basic latency for every interrupt is 12 cycles, no matter what.

Your thought was correct (properly performing ports exhibit similar latency.) Yet - what if (some) form of damage - or loading - was localized to your "initially chosen pin?" That was my motivation for directing your "repetition" via a different port & pin. (I did note the satisfactory rise time of "D_Rdy" - which seemed to "escape" any effect of unwanted loading.)

Now neither firm nor I use 129xx MCU (preferring far faster, more capable) yet I recall reading of "limitations imposed" (specifically wait states added) when the MCU runs "beyond" certain system clock speeds. Depending upon execution - it may be necessary to SLOW the System Clock - to achieve Faster Execution - yet this is very much a "balancing act" - and such is best left to your "deeper, more detailed read/review of the MCU manual." (again - I've little interest in that MCU)

Your point regarding different interrupt performance - based upon "peripheral in play" - has escaped my discovery. Has your MCU a Port equipped with "INDIVIDUAL PIN INTERRUPTS?" If so - I believe you'd be well served to repeat your test - but upon (that) special Port - and report. In the interim - should time allow - I'll have staff run tests upon our "123 MCUs" and see if we can replicate your "latency test results."

Yours was a very complete response - much thanks for that - and if your "discovery" holds true - should prove of use/value to (many) here... Kindly switch to the "special" Port I've suggested - and test & advise. Merci.

One final suggestion - as firm/I work @ the nexus of legal/tech/finance - we find it advisable to "Always respond quickly - even briefly" which enables your "contact" to know that their message has been received - and warrants your interest. "Silence" - on the other hand - may be viewed as: Message's failure to arrive, Receiver's lack of interest, Receiver's lack of effort, Receiver's "overload" - none of those "good!" A far better alternative would model, "Dear Client - thank you - we've received your request - staff are gathering & analyzing data - will be back to you w/in hours, days etc." In the US - the millennial population is almost universally "Guilty" of this "wait till perfect to respond" - and that, "Costs us clients" - not all of whom will be as "revealing" as I am - here... (i.e. firm is "fired" and does not know, "Why!") Thanks...

0 Stefan Manoharan over 6 years ago in reply to Chester Gillon

Prodigy 70 points

Hi Bruno and Chester,

I attempted to use only the vector table, but the code doesn't enter the ISR when I comment out the IntRegister function. This is strange, because I usually don't have to use IntRegister whenever I use a timer interrupt. Do you know if this function is necessary for certain ports like GPIO's?

0 cb1_mobile over 6 years ago in reply to Stefan Manoharan

Guru 117850 points

Kindly "re-read" my (moments ago) posting - suggesting a method for you to "make such discovery" - or add to the knowledge base - via a Special interrupt capable Port. Thank you. (Those fellows are (likely) at work - as am I - and may not be able to respond till (later), which is my condition - right now...)

One other point - please choose a Port which is "isolated" from other usage - so that we best insure that (other factors) are NOT confounding your test! I must tell you that we've used LX4F & 4C123 - and to my knowledge - Never/Ever employed "Int Register" as I found it ADDED LATENCY - AND required extra care/handling - which proved UNJUSTIFIED! And we always interrupted via GPIO Ports - ALWAYS.

[edit] Staff will "kill me" - yet it dawns that you DID interrupt - yet experienced "added latency" - and I "doubt" our past tests (exhaustively) made such measure!

Have you enabled ALL THREE Interrupt Levels? Processor, Peripheral and individual Interrupt? (I "lay odds" that's the bit which you've missed!) Gotta go... [edit] as fate would have it - there is NO "dedicated" Peripheral-Class Interrupt (that's my language/description) attained by GPIO - you'll note Timer, PWM, QEI etc. ALL have such Peripheral Interrupt mechanisms/assignments...

[edit] #3 - Again - I've NEVER used the IntRegister function - and all of our many Ports properly GPIO Interrupt! Unless this is a limitation w/in your MCU (which we do NOT like/nor use).

Linked is a recent post in which I detailed "Interrupt Mechanics" - should prove of (some) use/assistance - good luck.

https://e2e.ti.com/support/microcontrollers/tiva_arm/f/908/t/605306

0 Stefan Manoharan over 6 years ago in reply to cb1_mobile

Prodigy 70 points

cb1_mobile said:

Have you enabled ALL THREE Interrupt Levels? Processor, Peripheral and individual Interrupt? (I "lay odds" that's the bit which you've missed!)

And your odds were indeed in your favor! Yes, the Int Register function indeed was the culprit; I initiated the interrupt using the format below (as per your suggestion, enabling all three interrupt levels) in order to get an initial latency of 3.15us (which I can handle) and a continuing latency of around 1us- this is a huge improvement, compared to that 6us latency I was dealing with earlier! Thanks a ton for your help!

    // Interrupt setup
    GPIOIntDisable(GPIO_PORTF_BASE, GPIO_PIN_0);                // Disable interrupt for PF0 (in case it was enabled)
    GPIOIntTypeSet(GPIO_PORTF_BASE, GPIO_PIN_0, GPIO_FALLING_EDGE);                          // Configure PF0 for falling edge trigger
    GPIOIntClear(GPIO_PORTF_BASE, GPIO_PIN_0);                  // Clear pending interrupts for PF0
    IntEnable(INT_GPIOF);
    GPIOIntEnable(GPIO_PORTF_BASE, GPIO_PIN_0);                 // Enable interrupt for PF0
    IntMasterEnable();
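
A note on why the handler stopped firing once the register call was commented out: as far as the TivaWare 2.1.x source shows, GPIOIntRegister() both installs the handler in the RAM vector table and calls IntEnable() for the port, so removing it also removes the NVIC enable unless IntEnable(INT_GPIOF) is added explicitly, as in the snippet above. The three enables can be pictured with the toy model below; it is plain C for illustration, not TivaWare code, and the struct and function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the three enable levels; not TivaWare code. */
typedef struct {
    bool pin_enabled;     /* GPIOIntEnable(port, pin): GPIO interrupt mask */
    bool nvic_enabled;    /* IntEnable(INT_GPIOF): NVIC line for the port  */
    bool master_enabled;  /* IntMasterEnable(): processor-level enable     */
} IrqGates;

/* True when a detected edge would actually reach the handler. */
bool handler_runs(const IrqGates *g, bool edge_detected)
{
    assert(g != NULL);
    return edge_detected
        && g->pin_enabled
        && g->nvic_enabled
        && g->master_enabled;
}
```

With the pin and master enables set but nvic_enabled false, handler_runs() returns false, which matches the "code doesn't enter the ISR" symptom seen when only the startup-file vector table entry was used.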

0 cb1_mobile over 6 years ago in reply to Stefan Manoharan

Guru 117850 points

Good for you - you persisted - followed suggestions - were methodical - and succeeded! Your verify & write-up are appreciated - should benefit many - even those who "Register Interrupts" - and are willing to "jump thru hoops" - to gain, "delayed interrupt response!"

All is clear but for your presentation of "initial vs. continuing" latency! Would you be so good as to "define & describe?" (my expectation is that each/every "falling edge" would experience (nearly) the same latency - assuming (other) program intrusions are blocked and/or avoided. I'm "lost" by the (apparent) special behavior of the "initial latency" - and have never heard that term used before. Kindly do explain.)

Work day just ended (19:37 CST) and I have (another) glorious date w/the treadmill - which when power "went down" 2 nights ago - (almost) sent my 220 lbs. flying. Big deal - except I may be able to relate this to your experience. One would have "hoped" that the treadmill - sensing "Power Falling" could have had a voltage comparator signal to the "Precor" MCU - "Prepare to Coast!" Instead - the treadmill brake grabbed - all the room's lights went out - and "purely by luck" - I was able to grab the handles - and survive. But to go from 6MPH to Braked - so quickly - has to be questioned. (which I'll do - unless some reader "knows" - and cares to advise, here.)

0 Bruno Saraiva over 6 years ago in reply to Stefan Manoharan

Guru 13040 points

Stefan,

I'm still a bit baffled by the amount of time your interrupt is taking to crank in.

- Are you triple sure your MCU is running at 120MHz?

- Are there other long ISR's in your project?

Look at this really small piece of code. It's the first couple of lines in a TIMER interrupt which is actually triggered by a signal level transition (timer capture), not unlike yours.

void Int_Timer0A(void)

{

uint32_t ui32Status;

encoderInternal[0].encoderLevel[3] = ROM_GPIOPinRead(GPIO_ENCGL_BASE, GPIO_ENCGL_PIN); // Do this as soon as possible! We have 120cycles before a possible change

We've measured the instant after the GPIO reading under several conditions - with several other small interrupts in the system (2 almost continuous UART communications @921600, 2300Hz of SPI readings, among other things), and the worst case we got was around 86 processor cycles, or 0.7us. Several times we got less than half of that, for no other ISR was being served.

Another aspect: I know you are looking at your signal in a logic analyzer, which always shows beautiful vertical transitions... but maybe your actual signal is taking too long to fall down, due to some excessive pull up?

Bruno

0 cb1_mobile over 6 years ago in reply to Bruno Saraiva

Guru 117850 points

It would be hoped that poster's Slave Device employs (normal for SPI) "push-pull" output - avoiding the need for "pull-up."
Note that in an earlier post - I thought similarly - but NOT due to "pull-up" - instead potential capacitance - which would cause a similar effect. To reduce (perhaps eliminate) that likelihood - I asked poster to "Change his Port & Pin" - he did so - and no change in performance was noted.

I cannot tell if your "signal level transition" arrived upon a "pure" GPIO pin (like the poster's) or landed upon a "Timer pin." It proves always best to, "Duplicate experimental conditions" which greatly reduces (often prevents) extraneous factors from "plaguing the measurement."

Lastly - what's happened to, "12 System Clocks between Signal Arrival and Entry to ISR?" (you've just reported 86 SysClks) For the "4C123" MCU - the controlling data authority (MCU manual) states:

"■ Deterministic, fast interrupt processing: always 12 cycles, or just 6 cycles with tail-chaining (these values reflect no FPU stacking)"

And earlier - w/in this thread poster Chester reported (source ARM) that the "burden created by the (unstacked) FPU - raised interrupt latency to 29 cycles!"

Thus both this poster - And Bruno - are reporting latencies (beyond) that expected. Might it be that the (other) interrupts "in play" are of equal or higher priority - thus "extend the time until the ISR is (finally) entered?"

Does the "registration of interrupts" cause even "more latency" than we (long ago) measured - while proving more complex and resource demanding? (slows response, demands SRAM & adds code - "such a deal!") (Might it be that the 74 ADDED SysClks you've noted are the "gift" of your RTOS?) Does not (something) here - "smell rotten?"

0 Bruno Saraiva over 6 years ago in reply to cb1_mobile

Guru 13040 points

cb1_mobile said:
It would be hoped that poster's Slave Device employs (normal for SPI) "push-pull" output - avoiding the need for "pull-up."

I refer to the logic signal trace which tells the MCU that a new measurement is available, not to the SPI communication traces.

cb1_mobile said:
what's happened to, "12 System Clocks between Signal Arrival and Entry to ISR?" (you've just reported 86 SysClks)

This reported 86 cycles is a "real life measurement" in a system with lots of other interrupts enabled. On our systems, while one interrupt is being serviced, the others wait for it to complete. My point is that, even in a more complex system, with other peripherals working at the same time, the "latency" is still much shorter than what poster is facing in his system. Hence, in his case, I would check the three points suggested:

- System clock

- Trigger signal not rising/falling as fast as poster believes

- Other very long interrupts running while the signal triggered.

cb1_mobile said:
Might it be that the 74 ADDED SysClks you've noted are the "gift" of your RTOS?

There was no RTOS involved in these numbers. Again, they are not a scientific experiment, but rather just real numbers measured from signal transition to first line of interrupt. We are not trying to prove that 12 cycles is achievable, but rather to point out that the numbers obtained by poster, around 6-7us, are simply too long.

Bruno

0 cb1_mobile over 6 years ago in reply to Bruno Saraiva

Guru 117850 points

Bruno Saraiva said:

cb1_mobile

It would be hoped that poster's Slave Device employs (normal for SPI) "push-pull" output - avoiding the need for "pull-up."

I refer to the logic signal trace which tells the MCU that a new measurement is available, not to the SPI communication traces.
