TMS320F28388D: CLA1_1 INT Falsely Triggered When Running STL_CLA_runPESTMicro()

Part Number: TMS320F28388D

Issue description:
When executing the CLA STL runtime PEST (STL_CLA_runPESTMicro) on F2838x, an unexpected CLA1-related PIE interrupt is occasionally triggered. The interrupt vector corresponds to PIE Group 11 / INT1 (the CLA1_1 interrupt), even though the application does not enable that interrupt.

In my application, I do not use CLA1 Task 1 and do not enable the CLA1_1 interrupt. When the false trigger happens, I checked PIEIER11 and PIEIFR11, and the INTx1 bits in both registers are zero.

I then found that once the false trigger has happened, STL_CLA_runPESTMicro() keeps returning STL_CLA_FAIL_TIMEOUT.

Question:

  • Is it expected that the CLA STL PEST may assert or route internal signals that can trigger CLA1_1 interrupts, and if so, how do I fix this?

  • When executing the CLA STL runtime PEST (STL_CLA_runPESTMicro) on F2838x, an unexpected CLA1-related PIE interrupt is occasionally triggered. The interrupt vector corresponds to PIE Group 11 / INT1 (the CLA1_1 interrupt), even though the application does not enable that interrupt.

    This behavior is a symptom of using nested interrupts when the code modifies a PIEIERx register outside of a group x interrupt.

    The caution on this page explains it: 

    software-dl.ti.com/.../index.html

  • Hi Lori,

    Thanks for your reply. I will check whether our application has the behavior you mentioned.

    I have another question related to our CLA task triggering mechanism.

    Currently, our application CLA task is triggered by an ePWM event every 50 µs. We observed a situation where, after running PEST a few times, the application CLA task is no longer triggered.

    During debugging, we found that the corresponding ePWM Event Trigger Flag register shows that the INT flag is set.

    Based on my understanding, this might be because the ePWM event occurred while PEST was running, but since no handler (CLA task or ISR) was able to service the event at that time, the event remained pending. As a result, subsequent CLA task triggers no longer occur.

    Could you please help confirm whether this understanding is correct?

    If this is the case, what would be the recommended way to handle this situation?
    For example, should we explicitly disable the ePWM event trigger before running PEST, or is there a better or more recommended approach?

    Thanks in advance for your help.

    Best regards,
    Norman

  • Hi Lori,

    Coming back to the original question: After multiple rounds of investigation, we did not find any code path that modifies PIEIER11 outside of the Group 11 ISR.

    However, I observed another important behavior. Before the unexpected triggering of the unregistered and non-enabled CLA1_1 interrupt, the function STL_CLA_runPESTMicro() returns 0x80 (STL_CLA_FAIL_SPURS_INT) several times, and only after that does the unexpected CLA1_1 interrupt occur.

    I then took a closer look at the behavior of STL_CLA_runPESTMicro(), which can be summarized as follows:

    1. DINT

    2. Disable all CLA tasks

    3. Wait for all CLA tasks to finish execution

    4. Run PEST

    5. Clear the PIE_O_IFR11 bits that are used

    6. Read all PIE_O_IFR11 bits; if any bit is non-zero, return 0x80 (STL_CLA_FAIL_SPURS_INT)

    In our application, CLA Task 3 behaves as follows:

    • Triggered by EPWM every 50 µs

    • Executes its task logic

    • Calls CLA_forceSoftwareInterrupt(CLA1_ONLY_BASE, CLA_TASKFLAG_3) to trigger the CLA1_3 interrupt to the CPU

    • Clears the EPWM event

    • Ends the task

    This led me to consider the following scenario:

    It may be possible that CLA Task 3 has already started executing before the CPU enters DINT inside STL_CLA_runPESTMicro(), and after DINT is executed, CLA Task 3 still calls CLA_forceSoftwareInterrupt().
    At this point, since the CPU interrupts are globally disabled (DINT), but the corresponding PIE_O_IFR11 bit is still set, there is no ISR to service the interrupt. As a result, when STL_CLA_runPESTMicro() later reads PIE_O_IFR11, it detects a non-zero value and returns the 0x80 error.

    To validate this hypothesis, I performed an experiment by commenting out CLA_forceSoftwareInterrupt() in CLA Task 3. After doing so, both the 0x80 error and the unexpected CLA1_1 interrupt no longer occurred.

    Based on this result, I would like to ask for your thoughts on the possible relationship between CLA_forceSoftwareInterrupt(), the 0x80 (STL_CLA_FAIL_SPURS_INT) error, and the unexpected triggering of the CLA1_1 interrupt.

    Any insight would be greatly appreciated.

    Best regards,
    Norman

  • CLA tasks are edge triggered, not level triggered. I've seen cases where the CLA is configured after the interrupts have already begun; the CLA then doesn't respond because it never sees an edge. In that case you could disable the PWM interrupt or manually clear the interrupt flag when appropriate.
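
    To make the edge-versus-level point concrete, here is a minimal host-side C model (not device code). It assumes, as a simplification of the explanation above, that the ePWM interrupt flag is a latch that produces one trigger edge on its 0-to-1 transition and that an edge arriving while the CLA cannot accept it is simply lost. The `Model` struct and the `epwmEvent()` and `simulate()` names are hypothetical. On the device, the flag would be cleared through the ePWM event-trigger clear register, for example with driverlib's EPWM_clearEventTriggerInterruptFlag().

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical host-side model; none of these names are TI APIs. */
    typedef struct {
        bool intFlag;       /* latched ePWM INT flag */
        bool claListening;  /* false while the CLA cannot accept the trigger */
        unsigned taskRuns;  /* completed CLA task runs */
    } Model;

    /* One ePWM event. A trigger edge exists only on a 0 -> 1 flag transition. */
    void epwmEvent(Model *m)
    {
        assert(m != NULL);
        if (m->intFlag) {
            return;              /* flag already set: no new edge is produced */
        }
        m->intFlag = true;       /* 0 -> 1 transition: one edge */
        if (m->claListening) {
            m->taskRuns++;       /* the task runs ... */
            m->intFlag = false;  /* ... and clears the ePWM flag before ending */
        }
    }

    /* One normal period, one event during PEST, then two more periods. */
    unsigned simulate(bool clearFlagAfterPest)
    {
        Model m = { false, true, 0u };

        epwmEvent(&m);            /* normal period: task runs and clears flag */

        m.claListening = false;   /* PEST starts */
        epwmEvent(&m);            /* edge is lost, flag stays set */
        m.claListening = true;    /* PEST finished */

        if (clearFlagAfterPest) {
            m.intFlag = false;    /* manual clear so the next event makes an edge */
        }
        epwmEvent(&m);
        epwmEvent(&m);
        return m.taskRuns;
    }
    ```

    In this model, simulate(false) returns 1 (the task never runs again after PEST), while simulate(true) returns 3 (the task resumes once the stale flag is cleared).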

  • Hi Norman,

    Let me think about this and get back to you in a day or two. 

    -Lori

  • Norman,

    The CLA1_1 interrupt - yes, I feel you are on the right track.

    This occurs due to a race condition. If the CPU receives an INT11, it will "ask" the PIE for a group 11 vector address to take the interrupt. Normally the PIE will respond with the highest-priority group 11 interrupt that is both enabled and flagged.

    If, however, the PIE does not have a group 11 interrupt that is both enabled and flagged, the PIE will default to the vector for the first interrupt in the group, in this case CLA1_1.

    So, if the CLA interrupt "sneaks in" to the CPU side just before DINT takes effect, and the PIEIFR bits are then cleared before the CPU asks for a vector, the PIE will find itself in this situation of having no interrupt that is both flagged and enabled.
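
    The vector selection described above can be sketched as a small host-side C model (not device code). It assumes 16 channels per PIE group with bit 0 mapped to INTx.1 as the highest priority; `pieBit()` and `pieSelectVector()` are hypothetical names used only for illustration.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: bit mask for PIE channel 1..16 within one group. */
    uint16_t pieBit(unsigned channel)
    {
        assert(channel >= 1u && channel <= 16u);
        return (uint16_t)(1u << (channel - 1u));
    }

    /*
     * Hypothetical model of the PIE answering the CPU's vector request for
     * one group. Returns the channel number (1..16) whose vector is used.
     */
    unsigned pieSelectVector(uint16_t pieier, uint16_t pieifr)
    {
        uint16_t pending = (uint16_t)(pieier & pieifr);

        for (unsigned ch = 1u; ch <= 16u; ch++) {
            if ((pending & pieBit(ch)) != 0u) {
                return ch;   /* highest-priority enabled and flagged channel */
            }
        }
        return 1u;           /* nothing enabled and flagged: default to INTx.1 */
    }
    ```

    With CLA1_3 enabled and flagged, pieSelectVector(pieBit(3), pieBit(3)) returns 3. If the flag has been cleared by the time the vector is requested, pieSelectVector(pieBit(3), 0) returns 1, which matches the observation that the CLA1_1 vector is taken while its PIEIER11 and PIEIFR11 bits read zero.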

    Section 3.4.4.3, "Disabling Interrupts," of the TRM has some steps for disabling interrupts to avoid this situation. Also, how are you checking that all tasks have completed? The CPU can read the CLA's MIRUN register to check if any tasks are executing.

    -Lori

  • Hi Lori,

    Thank you very much for your explanation and suggestions — they have been very helpful.

    Following TI’s recommended procedure, I am now disabling the CLA3 interrupt by calling
    Interrupt_disable(INT_myCLA03) before executing STL_CLA_runPESTMicro(),
    and re-enabling it by calling
    Interrupt_enable(INT_myCLA03) after STL_CLA_runPESTMicro() completes.

    After applying this flow, I no longer observe any unexpected CLA1_1 interrupt occurrences.
    This seems to have successfully avoided the race condition issue discussed earlier.

    However, the 0x80 (STL_CLA_FAIL_SPURS_INT) error is still present. Its behavior is somewhat unusual — it is neither purely random nor triggered at a fixed rate. Based on my observations, it follows the pattern below:

    • STL_CLA_runPESTMicro() is executed once every 12 ms

    • After approximately 20,000 executions,
      STL_CLA_FAIL_SPURS_INT (0x80) starts occurring frequently for a short period

      • Roughly 200+ consecutive occurrences

    • After that, the error disappears completely for a while

    • When the total execution count reaches approximately 90,000,
      the same behavior repeats (again ~200 consecutive occurrences)

    • So far, this pattern appears to repeat in a periodic, burst-like manner

    In other words,

    • This error does not occur on every PEST execution, but it is also not randomly scattered.

    • This error seems clearly correlated with long-term execution / accumulated run count, and manifests as bursts of spurious interrupt failures

    I wanted to ask whether you might have any insights or experience with similar spurious interrupt behavior that only appears after long-term execution and in a periodic manner.

    In the meantime, I will continue to investigate this issue further to see if I can find additional clues, and I will report back if I discover anything meaningful.

    Thank you again for your help.

    Best regards,
    Norman

  • Also how are you checking that all tasks have completed? The CPU can read the CLA's MIRUN register to check if any tasks are executing. 

    The check that all tasks have completed, which I mentioned earlier, is done inside STL_CLA_runPESTMicro() in the TI example, and it uses the method you described (reading MIRUN).

  • Hi Norman,

    I think there is another race condition causing STL_CLA_FAIL_SPURS_INT. 

    Am I correct that if CLA Task 3 doesn't interrupt the CPU then this doesn't occur? Also, does the flag correspond to CLA Task 3?

    I suspect that an interrupt is coming from the CLA into the PIE and is flagged just after the SW clears the PIEIFR register. This could be just a cycle or two delta from a "working" case where the clear of the PIEIFR occurs first. Can you try adding a few cycles between checking the MIRUN and clearing the PIEIFR?

    -Lori

  • Hi Lori,

    To answer your question:

    Am I correct that if CLA Task 3 doesn't interrupt the CPU then this doesn't occur? Also, does the flag correspond to CLA Task 3?

    The answer to both questions is yes.

    Following your suggestion, I tried adding a few cycles of delay between checking MIRUN and clearing PIEIFR, but unfortunately this did not improve the situation.



    I then reviewed the TI library implementation more closely and noticed the following behavior:

    • After completing a test, the function clears the PIE_O_IFR11 bit corresponding to the CLA task used in that specific test.

    • However, during the spurious-interrupt check, the code checks whether any bit in PIE_O_IFR11 is set.

    • If any bit is found set, the function reports a 0x80 failure (STL_CLA_FAIL_SPURS_INT).

    Based on this behavior, I believe the 0x80 failure occurs under the following condition:

    • During STL_CLA_runPESTMicro(), CLA Task 3 software-triggers a CPU interrupt after DINT, causing the corresponding PIE_O_IFR11 bit to be set.

    • If the current PEST iteration is not checking CLA Task 3 (for example, it is checking Task 6), then the function only clears the PIE_O_IFR11 bit associated with Task 6.

    • As a result, the PIE_O_IFR11 bit set by Task 3 remains asserted, leading to the 0x80 spurious-interrupt failure.
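
    The mismatch described above (clear one bit, check all bits) can be illustrated with a host-side C model. This is only a sketch of the behavior as I understand it, not the library source: the function names are hypothetical, bits 0 to 7 stand for CLA1 Tasks 1 to 8, and a return value of 0 simply means "no spurious interrupt" in this model.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define FAIL_SPURIOUS_INT 0x80u  /* value reported in this thread */

    /* Hypothetical model: clear only the tested task's flag, then check all. */
    uint16_t postTestCheck(uint16_t *pieifr11, unsigned taskUnderTest)
    {
        assert(pieifr11 != 0);
        assert(taskUnderTest >= 1u && taskUnderTest <= 8u);

        *pieifr11 &= (uint16_t)~(1u << (taskUnderTest - 1u));
        return (*pieifr11 != 0u) ? FAIL_SPURIOUS_INT : 0u;
    }

    /* Hypothetical variant: clear all eight CLA1 flags before the check. */
    uint16_t postTestCheckClearAll(uint16_t *pieifr11)
    {
        assert(pieifr11 != 0);

        *pieifr11 &= (uint16_t)~0x00FFu;
        return (*pieifr11 != 0u) ? FAIL_SPURIOUS_INT : 0u;
    }
    ```

    With the Task 3 and Task 6 flags both set (0x0024) and Task 6 under test, postTestCheck() leaves 0x0004 behind and returns 0x80. The clear-all variant returns 0 for the same input, at the cost of discarding the pending Task 3 flag.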

    To validate this hypothesis, I conducted an experiment where I logged which CLA task was under test when the 0x80 failure occurred.
    The results show that every time the 0x80 failure is triggered, the task being checked is not Task 3, which I believe supports the above analysis.

    I would appreciate your thoughts on this interpretation.

    If this is indeed the root cause, do you have any recommended mitigation strategy?
    For example, would it be reasonable to, within STL_CLA_runPESTMicro(), wait until all CLA tasks have completed execution (by checking CLA_O_MIRUN) and then explicitly clear all CLA-related bits in PIE_O_IFR11 once before performing the spurious-interrupt check?



    Additionally, I would like to confirm one aspect of PIE behavior:

    If a specific interrupt is disabled (i.e., PIE_O_IER_x.bit_y = 0), and the corresponding interrupt event occurs, will the associated PIE_O_IFR_x.bit_y still be set?

    Best regards,
    Norman

  • Hi Lori,

    Based on our discussion and the experiments, I realized that the key to resolving the two issues discussed above (the false CLA1_1 interrupt trigger and the 0x80 (STL_CLA_FAIL_SPURS_INT) error) is to prevent CLA1 from triggering a CPU interrupt while the TI library function STL_CLA_runPESTMicro() is executing.

    To implement this, I added a flag mechanism: I set a flag immediately before calling STL_CLA_runPESTMicro() and clear it immediately after. This flag informs the CLA that the CPU is currently executing the PEST routine. The CLA checks this flag to determine whether it should force-trigger the CPU interrupt. While this approach means some CLA interrupts might be missed, this trade-off is acceptable for our application.
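
    A minimal host-side sketch of this flag mechanism is shown below. All names are hypothetical stand-ins: `claTask3End()` represents the end of the CLA task where CLA_forceSoftwareInterrupt() would be called, `runPestGuarded()` represents the CPU-side wrapper around STL_CLA_runPESTMicro(), and the counters exist only to observe the model. On the device, the flag would need to live in memory that the CPU can write and the CLA can read (for example, the CPU-to-CLA message RAM). The model does not capture the timing between the CPU setting the flag and the CLA reading it.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical shared flag: written by the CPU, read by the CLA task. */
    volatile bool pestInProgress = false;

    /* Counters used only to observe the model's behavior. */
    unsigned cpuInterruptsRaised = 0u;
    unsigned cpuInterruptsSkipped = 0u;

    /* Stand-in for the end of CLA Task 3. */
    void claTask3End(void)
    {
        if (pestInProgress) {
            cpuInterruptsSkipped++;  /* PEST running: do not interrupt the CPU */
        } else {
            cpuInterruptsRaised++;   /* normal case: force the CPU interrupt */
        }
    }

    /* Stand-in for a PEST run during which Task 3 happens to finish. */
    void pestWithTaskEnding(void)
    {
        claTask3End();
    }

    /* Stand-in for the CPU-side wrapper: set flag, run PEST, clear flag. */
    void runPestGuarded(void (*pest)(void))
    {
        assert(pest != NULL);
        pestInProgress = true;
        pest();
        pestInProgress = false;
    }
    ```

    Calling runPestGuarded(pestWithTaskEnding) leaves cpuInterruptsRaised at 0 and cpuInterruptsSkipped at 1; a later claTask3End() outside of PEST raises the interrupt as usual.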

    With this modification, the 0x80 error no longer occurs, and the previously mentioned "CLA1_1 interrupt false trigger" issue has been resolved.

    Thanks again for your assistance; it really helped me a lot.

    Best regards,
    Norman

  • Hi Lori,

    By the way, if you know of a better solution that would allow the CPU to avoid missing CLA interrupts, I would be happy to know.

    Best regards,

    Norman

  • Hi Norman,

    I think this is a good solution since your application can tolerate turning off the CLA interrupt to the CPU. This is cleaner than what we were trying to accomplish.

    Regards

    Lori