TDA4VM: Vision app with MCAN hangs/freezes at 500 kbps

Part Number: TDA4VM

Hello,

This is with reference to https://e2e.ti.com/support/processors/f/791/p/879255/3255498#3255498.

We have a single camera DL vision app that operates at 30 fps.

After every frame is processed, we have to send the output of that frame via CAN.

The output is generated by a kernel running on the C66x_2 DSP.

In the main function, we are using appRemoteServiceRun() to transfer this output to MCU_2_1.

We configured MCAN2 to operate at 500 kbps, by modifying the parameters in the MCAN EVM Loopback example as per TI's recommendation.

When we run this app on the TDA4 EVM, the app hangs/freezes after a few thousand frames (usually within the first 4 minutes).

There is no other visual indication of any errors; the app simply hangs and the debug prints on the UART console simply stop.

Note 1: If we use the original settings from the MCAN EVM Loopback example (1 Mbps bit rate), then the same app works fine.

Note 2: If we use 500 kbps bit rate but transmit output of every alternate frame, then the app works fine.

How can we resolve this issue?

Thank you.

  • Hi Sagar,

    Can you connect a JTAG debugger and see where it is failing? Please also check whether the failing trace is always the same.

    Regards,

    Karan

  • Hello Karan,

    Just to clarify once again, there is no specific frame number at which the app freezes/hangs, but it always does so within the first 1 to 4 minutes.

    Per your suggestion, we debugged the app 3-4 times using CCS.

    The app always appears to fail on the MAIN_Cortex_R5_0_1 (mcu2_1) core.

    The responsible line of code is present within the 'else' condition in the function App_mcanIntr0ISR().

        if (MCAN_INTR_SRC_TRANS_COMPLETE ==
            (intrStatus & MCAN_INTR_SRC_TRANS_COMPLETE))
        {
            gMcanIsrIntr0Flag = 0U;
        }
        else
        {
            printf("\nInterrupt Status: 0x%x\n", intrStatus); // THE APP APPEARS TO FAIL ON THIS LINE
        }

    The CCS console shows the following error message on app hang:

    [MAIN_Cortex_R5_0_1] ti.sysbios.gates.GateMutex: line 99: assertion failure: A_badContext: bad calling context. See GateMutex API doc for details.
    xdc.runtime.Error.raise: terminating execution

    In the attachment below, I have included a screenshot of our CCS debug session and the corresponding UART console log, both gathered just after the app freezes.

    20200225_mcan_500kbps_issue.zip

    Please advise.

    Thank you.

  • Hi Sagar,


    As per https://processors.wiki.ti.com/index.php/GateMutex_Assert (section "Reason why Assert occurs"), the error you see in CCS is caused by printing in interrupt context.

    So you are getting an interrupt that is not caused by the transfer completion event.
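One way to avoid the assert is to keep printf() out of the ISR and report the status later from task context. The following is only a minimal sketch of that idea, not the actual App_mcanIntr0ISR(): DEMO_INTR_TRANS_COMPLETE is a placeholder bit (the real mask comes from the MCAN driver headers), and every demo_* name is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder bit for "transmission complete" (assumption, not the driver value). */
#define DEMO_INTR_TRANS_COMPLETE (1U << 9)

static volatile uint32_t gTxDone = 0U;           /* set by the ISR on TX complete */
static volatile uint32_t gLastOtherStatus = 0U;  /* last status that was not TX complete */
static volatile uint32_t gOtherStatusCount = 0U; /* number of such interrupts since last report */

/* ISR body: no printf() here, only record what happened. */
void demo_mcan_isr(uint32_t intrStatus)
{
    if ((intrStatus & DEMO_INTR_TRANS_COMPLETE) != 0U) {
        gTxDone = 1U;
    } else {
        gLastOtherStatus = intrStatus;
        gOtherStatusCount++;
    }
}

/* Called from task context, where printing is allowed.
 * Returns how many unexpected interrupts were reported.
 * On a real target, the read-and-clear below should run with the
 * MCAN interrupt masked so that no event is lost in between. */
uint32_t demo_report_from_task(void)
{
    uint32_t count = gOtherStatusCount;

    if (count != 0U) {
        printf("Interrupt Status: 0x%x (%u unexpected interrupt(s))\n",
               (unsigned)gLastOtherStatus, (unsigned)count);
        gOtherStatusCount = 0U;
    }
    return count;
}
```

The task that waits for the transmit to finish could call demo_report_from_task() once per frame, so the status value is still visible on the console without printing inside the interrupt.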

    To be clear - you have a task doing CAN transmit and another doing DL vision app?

    Does changing the CAN data rate back to 1 Mbps make the issue go away? (That is, does the problem occur only at 500 kbps?)

    Using a global variable set in an interrupt can be a bad idea in an application with multiple tasks, because I believe you will be running a while(gMcanIsrIntr0Flag) loop in the main thread (please confirm). In that case your CPU will busy-wait in the while() loop and starve your other task. A problem in the CAN task should not affect the other task, which is why you should use a semaphore instead of a global variable.

    With a semaphore, even if the CAN task fails to transmit at some point, the other task will still be scheduled.
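The pattern described here can be sketched as follows. This is an assumption-laden illustration, not TI code: it uses POSIX semaphores on a host so that it compiles and runs anywhere, whereas on the R5F under SYS/BIOS the corresponding calls would be Semaphore_post() in the ISR and Semaphore_pend() with a timeout in the task. All demo_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <stdint.h>
#include <time.h>

static sem_t gTxDoneSem; /* posted once per completed transmit */

/* Create the semaphore with an initial count of 0. Returns 0 on success. */
int demo_tx_sync_init(void)
{
    return sem_init(&gTxDoneSem, 0, 0U);
}

/* ISR side: signal completion instead of clearing a global flag. */
void demo_tx_complete_isr(void)
{
    (void)sem_post(&gTxDoneSem);
}

/* Task side: block (without spinning) until the transmit completes
 * or timeout_ms elapses. Returns 0 on completion, -1 on timeout. */
int demo_wait_tx_done(uint32_t timeout_ms)
{
    struct timespec deadline;
    int rc;

    (void)clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += (time_t)(timeout_ms / 1000U);
    deadline.tv_nsec += (long)(timeout_ms % 1000U) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* Retry if the wait is interrupted by a signal. */
    do {
        rc = sem_timedwait(&gTxDoneSem, &deadline);
    } while ((rc == -1) && (errno == EINTR));

    return rc;
}
```

Because the waiting task blocks in the semaphore call rather than spinning, the scheduler is free to run other tasks, and the timeout gives the caller a chance to handle a transmit that never completes instead of hanging forever.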

    Regards,

    Karan

  • Hello Karan,

    Karan Saxena said:

    To be clear - you have a task doing CAN transmit and another doing DL vision app?

    No. We do not have separate tasks for DL and CAN Tx. A single thread runs both.

    Karan Saxena said:

    Does changing the CAN data rate back to 1 Mbps make the issue go away? (That is, does the problem occur only at 500 kbps?)

    Yes. The issue is observed only at the 500 kbps bit rate, and even then only if we transmit over CAN every 33 ms. If we increase the interval to 66 ms, the app works fine.
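The dependence on both bit rate and transmit interval is consistent with simple bus-time arithmetic: halving the bit rate doubles the time each frame occupies the bus. The sketch below estimates that time under stated assumptions only. It assumes classic CAN data frames with 11-bit identifiers and worst-case bit stuffing; the thread does not say whether CAN FD is used or how many CAN messages each vision frame produces, so the frame count is a free parameter and the demo_* names are hypothetical.

```c
#include <stdint.h>

/* Worst-case length in bits of a classic CAN data frame with an 11-bit ID:
 * 34 stuffable header/CRC bits plus the payload, 13 non-stuffable bits
 * (including the 3-bit interframe space), plus worst-case stuff bits. */
uint32_t demo_can_frame_bits_worst_case(uint32_t payload_bytes)
{
    uint32_t stuffable = 34U + 8U * payload_bytes;

    return stuffable + 13U + (stuffable - 1U) / 4U;
}

/* Bus time in microseconds needed to send frame_count such frames
 * back to back at bitrate_bps (arbitration delays not included). */
uint32_t demo_can_tx_time_us(uint32_t payload_bytes,
                             uint32_t bitrate_bps,
                             uint32_t frame_count)
{
    uint64_t bits = (uint64_t)demo_can_frame_bits_worst_case(payload_bytes)
                    * frame_count;

    return (uint32_t)((bits * 1000000ULL) / bitrate_bps);
}
```

For an 8-byte payload this gives 135 bits per frame, i.e. 135 us at 1 Mbps and 270 us at 500 kbps. Whether the total fits inside a 33 ms frame period depends on the number of messages per vision frame, which would need to be checked against the actual application.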

    Karan Saxena said:

    I believe you will be running a while(gMcanIsrIntr0Flag) in the main thread? (please confirm).

    Yes.

    Karan Saxena said:

    The error you see in CCS is caused by printing in interrupt context. So you are getting an interrupt that is not caused by the transfer completion event.

    This part is a bit confusing to me. If printing causes this assertion, then it should also happen for 1 Mbps at 33 ms, or 500 kbps at 66 ms.

    Karan Saxena said:

    A problem in the CAN task should not affect the other task, which is why you should use a semaphore instead of a global variable. With a semaphore, even if the CAN task fails to transmit at some point, the other task will still be scheduled.

    We are currently attempting this. I will report our progress as soon as we have tested it.

    Thank you.

  • Hi Sagar,

    Were you able to replace the global variable set in the ISR with a semaphore post/pend?

    Regards,

    Karan

  • Hello Karan,

    Due to some other high-priority work over the past few days, I have not had the bandwidth to try out your suggestion.

    I will keep you posted as soon as I have any update.

    Thank you.

  • Hello Karan,

    My colleague informed me that she followed your advice from another active E2E thread, and it appears to have resolved the issue from this thread too.

    We are planning to do some more testing over the course of the next few days to ensure that your suggestions are working perfectly with our DL vision app.

    Until then, I would prefer to keep this thread open.

    Once we are assured there are no issues, then I will close this thread.

    Thank you.

  • Hi Sagar,

    The active thread you mentioned (http://e2e.ti.com/support/processors/f/791/p/884293/3292447#3292447) has a reply posted by your colleague. Is that one resolved too?

    So just to double check: you are now using the semaphore approach, and with it there is neither a Vision app crash nor any CAN frame drop?

    Regards,

    Karan

  • Hello Karan,

    Karan Saxena said:

    So just to double check: you are now using the semaphore approach, and with it there is neither a Vision app crash nor any CAN frame drop?

    1. We used the semaphore approach earlier. But our current code is based on the Task_sleep() alternative you suggested in your post https://e2e.ti.com/support/processors/f/791/p/884293/3291847#3291847.

    2. We were working on many things simultaneously, so we have not yet tested this version with our vision app. So I also cannot comment on the CAN frame drop. As I said earlier, we have planned some testing within this week to do just this. Then I will provide the results to you.
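The linked post with the Task_sleep() suggestion is not reproduced in this thread, so the following is only a guess at the general shape of a sleep-based wait. It keeps the flag convention from the original ISR (the flag is cleared when the transmit completes), but yields the CPU between checks and gives up after a bounded time. It uses nanosleep() on a host as a stand-in for Task_sleep(), and all demo_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

static volatile uint32_t gTxPending = 0U; /* 1 while a transmit is outstanding */

/* Called by the task just before it queues a message for transmit. */
void demo_start_tx(void)
{
    gTxPending = 1U;
}

/* ISR side: clear the flag on transmit complete, as in the original ISR. */
void demo_poll_tx_complete_isr(void)
{
    gTxPending = 0U;
}

/* Host stand-in for Task_sleep(): yields the CPU for roughly ms milliseconds. */
static void demo_sleep_ms(uint32_t ms)
{
    struct timespec ts;

    ts.tv_sec = (time_t)(ms / 1000U);
    ts.tv_nsec = (long)(ms % 1000U) * 1000000L;
    (void)nanosleep(&ts, NULL);
}

/* Poll the flag, sleeping poll_ms between checks instead of spinning.
 * Returns 0 once the transmit completes, -1 after about max_wait_ms. */
int demo_wait_tx_sleeping(uint32_t poll_ms, uint32_t max_wait_ms)
{
    uint32_t waited = 0U;

    if (poll_ms == 0U) {
        poll_ms = 1U; /* avoid a zero-length sleep that would spin */
    }
    while (gTxPending != 0U) {
        if (waited >= max_wait_ms) {
            return -1;
        }
        demo_sleep_ms(poll_ms);
        waited += poll_ms;
    }
    return 0;
}
```

Compared with a bare while(gMcanIsrIntr0Flag) loop, this lets other tasks run during the wait and returns an error when the completion interrupt never arrives, at the cost of up to one polling interval of extra latency.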

    Thank you.