TDA4VL EVM Data captured by DMA+timer P is missing or unable to be captured

Part Number: TDA4VM
Other Parts Discussed in Thread: TDA4VL

Hi, TI expert

We use 12 DMA channels on mcu3-0. After modifying "rm cfg. c" to allocate the DMA resources, all 12 DMA channels worked normally. However, when TimerP is used on mcu3-0 at the same time, the DMA channels frequently miss data or fail to capture it at all. If TimerP is not used, the problem does not occur.

Version information: sdk8.02 Linux&rtos.

Please assist in the investigation, thank you~!

  • Hello,

    Can you share the modifications you have made?

    Also, can you share some more details on your usecase, how are you testing this? Are you using an existing application in the SDK or are you using a custom developed application?

    You mentioned Linux&RTOS, are you using the Vision Apps setup?

    Thanks,

    Erick

  • Hi Jia Wentao,

    Is it the same issue that we are discussing on the other thread, data missing on mcu3_0? What is timer configured with? Which timer are you using on mcu3_0?

    Regards.

    Brijesh

  • Hi, Erick

    Also, can you share some more details on your usecase, how are you testing this? Are you using an existing application in the SDK or are you using a custom developed application?

    We are using a self-developed program. We use TimerP_create() to create a timer that drives a GPIO to send waveforms to the USS.

    1. When the timer is created with TimerP_create(TimerP_ANY, (TimerP_Fxn)timerIoSetIsr, &timerParams), the timer period is often inaccurate, and the DMA occasionally captures incomplete data.

    2. When the timer is created with TimerP_create(1, (TimerP_Fxn)timerIoSetIsr, &timerParams), the timer period appears to be accurate, but once other applications (which may use DMA) are started, the DMA on mcu3-0 frequently captures only part of the data or none at all.

    You mentioned Linux&RTOS, are you using the Vision Apps setup?

    Yes, the code we added in the "tda4/rtos-sdk/vision_apps/platform/j721e/rtos/mcu3_0/" directory.

    Regards.

    jiawentao

  • Hi,Brijesh

    Yes, it is the same problem. As discussed on the other thread, the delay may be inaccurate because of load, so I am trying to use TimerP to send the command waveforms to the USS. However, I ran into the following issues:

    1. When the timer is created with TimerP_create(TimerP_ANY, (TimerP_Fxn)timerIoSetIsr, &timerParams), the timer period is often inaccurate, and the DMA occasionally captures incomplete data.

    2. When the timer is created with TimerP_create(1, (TimerP_Fxn)timerIoSetIsr, &timerParams), the timer period appears to be accurate, but once other applications (which may use DMA) are started, the DMA on mcu3-0 frequently captures only part of the data or none at all.


    Additionally, regarding the other thread: mcu3-0 is an independent CPU core, so why do other CPU cores affect it and increase its load?

    And you said you can use other GTC Timers, which one can you use? We can continue our discussion on that thread.

    Regards.

    jiawentao

  • Hi,Erick & Brijesh

    May I ask how the resources of dmtimer are allocated between different CPU cores?

    Do the dmtimers correspond to the timers shown in the following two figures? What about the associated IO pins?

    If we use dmtimer2 as a regular timer and do not use the dmtimer signal, how should the timer be configured when the "TIMER_IO2" pin (AD23) that corresponds to dmtimer2 is used as a regular GPIO?

    Looking forward to the prompt response from the two TI experts.

    Regards.

    jiawentao

  • Hi jiawentao,

    I will check it tomorrow and get back to you.

    Regards,

    Brijesh

  • Thanks Brijesh

  • Hi jia wentao,

    For timer usage, please refer to configTIMER_ID macro in the FreeRTOS configuration file. For example, for mcu2_0, As per the file packages\ti\kernel\freertos\config\j721e\r5f\FreeRTOSConfig_mcu2_0.h, this macro is set to 0, which is timer12 as per TimerP_mapId API in packages\ti\osal\soc\j721e\TimerP_default.c.

    Yes, dmtimer are the timers mentioned in the TRM. You can ignore IO pins, if timer is used internally. This is used to get the timer signal output on a pin. Please refer to TRM timer section for more details.

    Regards,

    Brijesh

  • Hi, Brijesh

    Okay, thank you for your reply. There are two important questions:

    1. TimerP_create(TimerP_ANY, (TimerP_Fxn)timerIoSetIsr, &timerParams);

    If the ID parameter is TimerP_ANY, we confirmed that the timer ID automatically assigned by the system is 0, which is also timer12. This causes mcu2-0 to get stuck in a board delay call and stop running.

    2. If the timer is created with TimerP_create(6, (TimerP_Fxn)timerIoSetIsr, &timerParams), other applications (which presumably use UDMA) often fail to capture data, and the captured data is 0.

    Regards

    jiawentao

  • hi jia wentao,

    OK, the ANY timer should have worked, because in this case it allocates any free timer and configures it for use. At least in SDK 8.5, I don't see that it even allows using a fixed timer; only the ANY timer is allowed. Let me check with the team here.

    Any specific reason for using timer? Is it to change from GTC timer to DM timer for the issue that we are discussing on the other thread? 

    Regards,

    Brijesh

  • Hi,Brijesh

    OK, the ANY timer should have worked, because in this case it allocates any free timer and configures it for use. At least in SDK 8.5, I don't see that it even allows using a fixed timer; only the ANY timer is allowed. Let me check with the team here.

    We currently use SDK 8.02 (Linux & RTOS). With "TimerP_ANY", the system automatically assigns ID 0, which is DMTimer12; this can be confirmed by printing the ID value.

    Also, why can't it be set to a fixed ID? Isn't the timer created by the freertos system also using a fixed ID? For example:

    ./pdk_jacinto/packages/ti/kernel/freertos/portable/TI_CGT/r5f/port.c

    pTickTimerHandle = TimerP_create(configTIMER_ID, &prvPorttimerTickIsr, &timerParams);

    Looking forward to your team's reply

    Any specific reason for using timer? Is it to change from GTC timer to DM timer for the issue that we are discussing on the other thread? 

    Yes, we want to use timerP, but there are two issues with TimerP. One is that the timerP delay is not accurate, and the other is that it can cause DMA to often fail to catch data. Why is this?

    In addition, I also want to know why psd (TIDL) startup causes an increase in load on MCU 3-0?

  • Hi,Brijesh

    Have you received feedback from your team?
    Regards,

    jiawentao

  • Hi jia wentao,

    Sorry for the late reply. We can definitely use the ANY timer or even a timer with a fixed ID; we just need to make sure that this timer is not being used as the OS tick timer on any other core. Can you please check this and then try using the timer?

    Regards,

    Brijesh

  • Hi, Brijesh

    1.

    As I mentioned earlier, if I use "TimerP_create(TimerP_ANY, ..., ...)" to create a timer, the system automatically assigns timer ID 0, which is DMTimer12. This conflicts with the DMTimer that mcu2-0 uses by default.

    I have verified two ways to solve this:

    1) Change TIMERP_ANY_MASK from 0x00FF to 0x00F0

    2) When creating the timer, use an unused fixed timer ID, such as TimerP_create(6, ..., ...)

    Can you help me confirm if both of the above modifications are correct?
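
    The effect of the mask change in method 1) can be pictured with a small host-side model. This is only a sketch under assumptions, not the OSAL implementation: `model_alloc_any_timer` is a hypothetical helper, and it assumes one mask bit per timer ID (0 to 7) and a lowest-ID-first search, which is at least consistent with ID 0 being returned for TimerP_ANY as reported above.

```c
#include <stdint.h>

#define MODEL_MAX_TIMERS 8u   /* assumption: one mask bit per timer ID 0..7 */

/* Hypothetical model of an "any timer" allocator.
 * Returns the lowest timer ID whose bit is set in any_mask and clear in
 * in_use_mask, or -1 if no such timer exists. */
int model_alloc_any_timer(uint32_t any_mask, uint32_t in_use_mask)
{
    for (uint32_t id = 0u; id < MODEL_MAX_TIMERS; id++) {
        uint32_t bit = (uint32_t)1u << id;
        if (((any_mask & bit) != 0u) && ((in_use_mask & bit) == 0u)) {
            return (int)id;
        }
    }
    return -1;
}
```

    In this model, a mask of 0x00FF with nothing in use hands out ID 0, while 0x00F0 hands out ID 4, so IDs 0 to 3 are never allocated through the ANY path and only four IDs remain available.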

    2.

    If the above modification method is correct,

    After I create a timer with TimerP_create(6, ..., ...), the DMA fails to capture data at runtime. Why is this? Is there a conflict between the timer interrupt and the DMA interrupt?

    Regards

    jiawentao

  • Hi jia wentao,

    #1, yes this method is correct. We are going to change this mask to allow only limited timers on the core.. 

    #2, no, there should not be. Because both of them are using different irq. But now there will also be timer isr, which typically is very short, but wondering if this is interfering with DMA ISR. Any specific reason for using TimerP module for using the timer? What i am thinking is to avoid using these modules and directly configuring timer registers to required frequency and then just use it for sleep.. I guess that's what you want also. isn't it? 

    Regards,

    Brijesh

  • Hi, Brijesh

    #1, yes this method is correct. We are going to change this mask to allow only limited timers on the core.. 

    Ok, but after this modification, only 4 timers can be used.

    #2, no, there should not be. Because both of them are using different irq. But now there will also be timer isr, which typically is very short, but wondering if this is interfering with DMA ISR.

    What do you mean by DMA ISR? Udma_EventPrms --> eventCb? I don't think we actually use it; I think we use polling mode.

    Any specific reason for using TimerP module for using the timer?

    As mentioned on the previous thread, it is used to toggle the GPIO so that the timing is not affected by load and does not become inaccurate.

    What i am thinking is to avoid using these modules and directly configuring timer registers to required frequency and then just use it for sleep.. I guess that's what you want also. isn't it? 

    Yes, I think this is possible, but it requires a microsecond-level delay that is not affected by load. Is that the case?

    Regards

    jiawentao

  • Hi jia wentao,

    Ok, but after this modification, only 4 timers can be used.

    But do you require more than 4 timers? I guess one timer is enough for generating the pattern output, isn't it? 

    What do you mean by DMA ISR? Udma_EventPrms --> eventCb? I don't think we actually use it; I think we use polling mode.

    You are using polling mode for UDMA? But that can be affected by the timer interrupt, can't it?

    Yes, I think this is possible, but it requires a microsecond-level delay that is not affected by load. Is that the case?

    Yes, in this case, since you are not using interrupt for udma. 

    Regards,

    Brijesh

  • Hi, Brijesh,

    I am a colleague of jia wentao, and I hope to continue discussing this issue with you.

    You are using polling mode for UDMA? But that can be affected by the timer interrupt, can't it?
    Yes, in this case, since you are not using interrupt for udma. 

    The following is our UDMA initialization related code, how to modify it to use interrupt for udma?

    /home/user/work/tda4/rtos-sdk/vision_apps/utils/hwa/src/app_gpio_dma_api2.c
    
    static int32_t App_create(App_Obj *appObj)
    {
        int32_t             retVal = UDMA_SOK;
        uint32_t            chType;
        Udma_ChPrms         chPrms;
        Udma_ChTxPrms       txPrms;
        Udma_ChRxPrms       rxPrms;
        Udma_EventHandle    eventHandle;
        Udma_EventPrms      eventPrms;
        SemaphoreP_Params   semPrms;
        int32_t             chIdx;
        App_ChObj          *appChObj;
        Udma_ChHandle       chHandle;
        Udma_DrvHandle      drvHandle = appObj->drvHandle;
    
        for(chIdx = 0U; chIdx < APP_NUM_CH; chIdx++)
        {
            appChObj = &appObj->appChObj[chIdx];
            chHandle = appChObj->chHandle;
    
            SemaphoreP_Params_init(&semPrms);
            appChObj->transferDoneSem = SemaphoreP_create(0, &semPrms);
            if(NULL == appChObj->transferDoneSem)
            {
                App_print("[Error] Sem create failed!!\n");
                retVal = UDMA_EFAIL;
            }
    
            if(UDMA_SOK == retVal)
            {
    
                /* Init channel parameters */
                chType = UDMA_CH_TYPE_TR_BLK_COPY;
                UdmaChPrms_init(&chPrms, chType);
                chPrms.fqRingPrms.ringMem   = appChObj->txRingMem;
                chPrms.cqRingPrms.ringMem   = appChObj->txCompRingMem;
                chPrms.tdCqRingPrms.ringMem = appChObj->txTdCompRingMem;
                chPrms.fqRingPrms.ringMemSize   = APP_RING_MEM_SIZE;
                chPrms.cqRingPrms.ringMemSize   = APP_RING_MEM_SIZE;
                chPrms.tdCqRingPrms.ringMemSize = APP_RING_MEM_SIZE;
    
                chPrms.fqRingPrms.elemCnt   = APP_RING_ENTRIES;
                chPrms.cqRingPrms.elemCnt   = APP_RING_ENTRIES;
                chPrms.tdCqRingPrms.elemCnt = APP_RING_ENTRIES;
    
                /* Open channel for block copy */
                retVal = Udma_chOpen(drvHandle, chHandle, chType, &chPrms);
                if(UDMA_SOK != retVal)
                {
                    App_print("[Error]: UDMA channel open failed!!\n");
                    printf("---------Error:%d\n", retVal);
                }
                else
                {
                    printf("chIdx:%d UDMA channel open success!\n", chIdx);
                }
            }
    
            if(UDMA_SOK == retVal)
            {
                /* Config TX channel */
                UdmaChTxPrms_init(&txPrms, chType);
                txPrms.busPriority     = 0;
                txPrms.busOrderId      = 15;
                txPrms.dmaPriority     = 0;
                retVal = Udma_chConfigTx(chHandle, &txPrms);
                if(UDMA_SOK != retVal)
                {
                    App_print("[Error] UDMA TX channel config failed!!\n");
                }
            }
    
            if(UDMA_SOK == retVal)
            {
                /* Config RX channel - which is implicitly paired to TX channel in
                 * block copy mode */
                UdmaChRxPrms_init(&rxPrms, chType);
                rxPrms.busPriority     = 0;
                rxPrms.busOrderId      = 15;
                rxPrms.dmaPriority     = 0;
                retVal = Udma_chConfigRx(chHandle, &rxPrms);
                if(UDMA_SOK != retVal)
                {
                    App_print("[Error] UDMA RX channel config failed!!\n");
                }
            }
    
            if(UDMA_SOK == retVal)
            {
                /* Register ring completion callback for this channel */
                eventHandle = &appChObj->cqEventObj;
                UdmaEventPrms_init(&eventPrms);
                eventPrms.intrPriority      = 0U;
                eventPrms.eventType         = UDMA_EVENT_TYPE_DMA_COMPLETION;
                eventPrms.eventMode         = UDMA_EVENT_MODE_SHARED;
                eventPrms.chHandle          = chHandle;
                eventPrms.masterEventHandle = Udma_eventGetGlobalHandle(drvHandle);
                eventPrms.eventCb           = &App_eventDmaCb;
                eventPrms.appData           = appChObj;
                retVal = Udma_eventRegister(drvHandle, eventHandle, &eventPrms);
                if(UDMA_SOK != retVal)
                {
                    App_print("[Error] UDMA CQ event register failed!!\n");
                }
                else
                {
                    appChObj->cqEventHandle = eventHandle;
                }
            }
    
            if(UDMA_SOK == retVal)
            {
                /* Register teardown ring completion callback */
                eventHandle = &appChObj->tdCqEventObj;
                UdmaEventPrms_init(&eventPrms);
                eventPrms.eventType         = UDMA_EVENT_TYPE_TEARDOWN_PACKET;
                eventPrms.eventMode         = UDMA_EVENT_MODE_SHARED;
                eventPrms.chHandle          = chHandle;
                eventPrms.masterEventHandle = Udma_eventGetGlobalHandle(drvHandle);
                eventPrms.eventCb           = &App_eventTdCb;
                eventPrms.appData           = appChObj;
                retVal = Udma_eventRegister(drvHandle, eventHandle, &eventPrms);
                if(UDMA_SOK != retVal)
                {
                    App_print("[Error] UDMA Teardown CQ event register failed!!\n");
                }
                else
                {
                    appChObj->tdCqEventHandle = eventHandle;
                }
            }
    
            if(UDMA_SOK == retVal)
            {
                /* Channel enable */
                retVal = Udma_chEnable(chHandle);
                if(UDMA_SOK != retVal)
                {
                    App_print("[Error] UDMA channel enable failed!!\n");
                }
            }
    
            if(UDMA_SOK != retVal)
            {
                break;
            }
        }
    
        /* Setup GPIO Pinmux */
        App_setupPinmux();
    
        /* Initialize GPIO now here */
        App_initGpio();
    
        /* Configure GPIO Mux to output event on Interrupt Aggregrator */
        App_setupGpioMuxIr();
    
        /* Setup L2G register to map local event to Global event */
        //App_setupL2G(appObj, 1);
    
        for(chIdx = 0U; chIdx < APP_NUM_CH; chIdx++)
        {
            appChObj = &appObj->appChObj[chIdx];
            chHandle = appChObj->chHandle;

            GpioDmaSubmitTrpd(chIdx);
            retVal = Udma_chPause(chHandle);
            if(UDMA_SOK != retVal)
            {
                App_print("[Error] UDMA channel pause failed!!\n");
            }
        }
        return (retVal);
    }
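
    For reference, the code above already registers `App_eventDmaCb` for `UDMA_EVENT_TYPE_DMA_COMPLETION`, so one interrupt-driven pattern is for that callback to signal `transferDoneSem` and for the task to wait on that signal instead of polling the ring. The sketch below models only that handoff on a host PC. `ModelChObj`, `model_event_dma_cb`, `model_take_completion` and `MODEL_EVENT_DMA_COMPLETION` are hypothetical stand-ins, and a plain counter replaces the semaphore.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the driver's DMA completion event type (hypothetical value). */
#define MODEL_EVENT_DMA_COMPLETION 1u

typedef struct {
    volatile uint32_t done_count; /* stand-in for transferDoneSem posts */
    uint32_t          consumed;   /* completions already handled by the task */
} ModelChObj;

/* Callback side: runs in interrupt context when an event fires.
 * It only records the completion; it does no ring access and no printing. */
void model_event_dma_cb(uint32_t event_type, void *app_data)
{
    ModelChObj *ch = (ModelChObj *)app_data;
    if ((ch != NULL) && (event_type == MODEL_EVENT_DMA_COMPLETION)) {
        ch->done_count++;
    }
}

/* Task side: takes one recorded completion if available.
 * On the target this would be a blocking semaphore pend rather than a check. */
bool model_take_completion(ModelChObj *ch)
{
    if (ch->done_count == ch->consumed) {
        return false;
    }
    ch->consumed++;
    return true;
}
```

    Keeping the callback this short limits how long it can delay, or be delayed by, the timer ISR.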

    Regards

    jialin xie

  • Hi Brijesh

    directly configuring timer registers to required frequency and then just use it for sleep

    Could you please provide a demo code?
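
    A minimal sketch of the idea, assuming the timer is left running as a 32-bit free-running up-counter and the delay is done by spinning on its count. The hardware register read is replaced by a hypothetical `read_counter` hook, and `clk_hz` must be set to the actual timer input clock; neither of these comes from the SDK.

```c
#include <stdint.h>

/* Convert a delay in microseconds to counter ticks at clk_hz. */
uint32_t delay_us_to_ticks(uint32_t us, uint32_t clk_hz)
{
    return (uint32_t)(((uint64_t)us * clk_hz) / 1000000u);
}

/* Ticks elapsed between two reads of a 32-bit free-running up-counter.
 * Unsigned subtraction stays correct across a single counter wrap. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Spin until at least 'ticks' counter ticks have passed.
 * read_counter is a hypothetical hook that returns the current count. */
void busy_wait_ticks(uint32_t (*read_counter)(void), uint32_t ticks)
{
    uint32_t start = read_counter();
    while (ticks_elapsed(start, read_counter()) < ticks) {
        /* spin */
    }
}
```

    A spin loop like this does not depend on task scheduling, but it can still be stretched by any interrupt that preempts it.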

    Regards

    jialin xie

  • Hi, Brijesh

    I would like to add a test conclusion about frame loss when MCU3_0 uses DMA

    When a Timer created through the TimerP interface is used to toggle GPIOs and send waveforms to the 12 ultrasonic modules, DMA capture works normally as long as the image inference application is not running. Once the image inference application is started, the load on C7x_1 rises to 70%-95% and DMA capture starts dropping frames, with a frame loss rate of 1%-3%.

    Our MCU3_0 application must use a Timer, so I hope you can help solve the problem of DMA capture frame loss when MCU3_0 uses Timer

    Regards

    jialin xie

  • Hi jialin xie,

    But how is the load on DDR BW when the inference application started? Is it too high that DMA is not able to read? or Is the timer getting delayed and so DMA is not working? Could you please give some more information?

    Could you please provide a demo code?

    Sure, i will check and provide it next week.

    Regards,

    Brijesh

  • Hi, Brijesh

    But how is the load on DDR BW when the inference application started?

    Detailed CPU performance/memory statistics,
    ===========================================
    
    CPU: mcu1_0: TASK:           IPC_RX:   0. 0 %
    CPU: mcu1_0: TASK:       REMOTE_SRV:   0.11 %
    CPU: mcu1_0: TASK:        LOAD_TEST:   0. 0 %
    CPU: mcu1_0: TASK:      IPC_TEST_RX:   0. 0 %
    CPU: mcu1_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu1_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu1_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu1_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu1_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu1_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu1_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu1_0: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU: mcu1_0: HEAP:   DDR_SHARED_MEM: size =    8388608 B, free =    8388352 B ( 99 % unused)
    
    CPU: mcu2_0: TASK:           IPC_RX:   0.54 %
    CPU: mcu2_0: TASK:       REMOTE_SRV:   0.13 %
    CPU: mcu2_0: TASK:        LOAD_TEST:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CPU_0:   0. 0 %
    CPU: mcu2_0: TASK:          TIVX_NF:   0. 0 %
    CPU: mcu2_0: TASK:        TIVX_LDC1:   0. 0 %
    CPU: mcu2_0: TASK:        TIVX_MSC1:   0.51 %
    CPU: mcu2_0: TASK:        TIVX_MSC2:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_VISS1:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT1:   1.68 %
    CPU: mcu2_0: TASK:       TIVX_CAPT2:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_DISP1:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_DISP2:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CSITX:   1. 1 %
    CPU: mcu2_0: TASK:       TIVX_CAPT3:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT4:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT5:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT6:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT7:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT8:   0. 0 %
    CPU: mcu2_0: TASK:      TIVX_DISP_M:   1.31 %
    CPU: mcu2_0: TASK:      TIVX_DISP_M:   0. 0 %
    CPU: mcu2_0: TASK:      TIVX_DISP_M:   0. 0 %
    CPU: mcu2_0: TASK:      TIVX_DISP_M:   0. 0 %
    
    CPU: mcu2_0: HEAP:   DDR_SHARED_MEM: size =   16777216 B, free =   16709376 B ( 99 % unused)
    CPU: mcu2_0: HEAP:           L3_MEM: size =     262144 B, free =     261888 B ( 99 % unused)
    
    CPU: mcu2_1: TASK:           IPC_RX:   0. 0 %
    CPU: mcu2_1: TASK:       REMOTE_SRV:   0. 0 %
    CPU: mcu2_1: TASK:        LOAD_TEST:   0. 0 %
    CPU: mcu2_1: TASK:         TIVX_SDE:   0. 0 %
    CPU: mcu2_1: TASK:         TIVX_DOF:   0. 0 %
    CPU: mcu2_1: TASK:       TIVX_CPU_1:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_RX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU: mcu2_1: HEAP:   DDR_SHARED_MEM: size =   16777216 B, free =   16773376 B ( 99 % unused)
    
    CPU: mcu3_0: TASK:           IPC_RX:   0. 0 %
    CPU: mcu3_0: TASK:       REMOTE_SRV:   0.37 %
    CPU: mcu3_0: TASK:        LOAD_TEST:   0. 0 %
    CPU: mcu3_0: TASK:       SNR_IPC_RX:   0. 0 %
    CPU: mcu3_0: TASK:      IPC_TEST_RX:   0. 0 %
    CPU: mcu3_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_0: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_0: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU: mcu3_0: HEAP:   DDR_SHARED_MEM: size =    8388608 B, free =    8388352 B ( 99 % unused)
    CPU: mcu3_0: HEAP:           L3_MEM: size =     262144 B, free =     234496 B ( 89 % unused)
    
    CPU: mcu3_1: TASK:           IPC_RX:   0. 0 %
    CPU: mcu3_1: TASK:       REMOTE_SRV:   0. 3 %
    CPU: mcu3_1: TASK:        LOAD_TEST:   0. 0 %
    CPU: mcu3_1: TASK:      IPC_TEST_RX:   0. 0 %
    CPU: mcu3_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu3_1: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU: mcu3_1: HEAP:   DDR_SHARED_MEM: size =    8388608 B, free =    8388352 B ( 99 % unused)
    
    CPU:  c6x_1: TASK:           IPC_RX:   0.24 %
    CPU:  c6x_1: TASK:       REMOTE_SRV:   0.42 %
    CPU:  c6x_1: TASK:        LOAD_TEST:   0. 0 %
    CPU:  c6x_1: TASK:         TIVX_CPU:  64.39 %
    CPU:  c6x_1: TASK:      IPC_TEST_RX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU:  c6x_1: HEAP:   DDR_SHARED_MEM: size =   16777216 B, free =   16749568 B ( 99 % unused)
    CPU:  c6x_1: HEAP:           L2_MEM: size =     229376 B, free =          0 B (  0 % unused)
    CPU:  c6x_1: HEAP:  DDR_SCRATCH_MEM: size =   50331648 B, free =   50331648 B (100 % unused)
    
    CPU:  c6x_2: TASK:           IPC_RX:   0. 0 %
    CPU:  c6x_2: TASK:       REMOTE_SRV:   0.10 %
    CPU:  c6x_2: TASK:        LOAD_TEST:   0. 0 %
    CPU:  c6x_2: TASK:         TIVX_CPU:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_RX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU:  c6x_2: HEAP:   DDR_SHARED_MEM: size =   16777216 B, free =   16773376 B ( 99 % unused)
    CPU:  c6x_2: HEAP:           L2_MEM: size =     229376 B, free =     229376 B (100 % unused)
    CPU:  c6x_2: HEAP:  DDR_SCRATCH_MEM: size =   50331648 B, free =   50331648 B (100 % unused)
    
    CPU:  c7x_1: TASK:           IPC_RX:   0. 3 %
    CPU:  c7x_1: TASK:       REMOTE_SRV:   0.12 %
    CPU:  c7x_1: TASK:        LOAD_TEST:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_CPU_PR:  72.19 %
    CPU:  c7x_1: TASK:      TIVX_CPU_PR:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_CPU_PR:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_CPU_PR:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_CPU_PR:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_CPU_PR:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_CPU_PR:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_CPU_PR:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_RX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU:  c7x_1: HEAP:   DDR_SHARED_MEM: size =  268435456 B, free =   82390272 B ( 30 % unused)
    CPU:  c7x_1: HEAP:           L3_MEM: size =    8159232 B, free =          0 B (  0 % unused)
    CPU:  c7x_1: HEAP:           L2_MEM: size =     458752 B, free =     458752 B (100 % unused)
    CPU:  c7x_1: HEAP:           L1_MEM: size =      16384 B, free =          0 B (  0 % unused)
    CPU:  c7x_1: HEAP:  DDR_SCRATCH_MEM: size =  385875968 B, free =  385456891 B ( 99 % unused)
    
    
    -------------------------print times--------------------num:29
    
    Summary of CPU load,
    ====================
    
    CPU: mpu1_0: TOTAL LOAD =  91.70 % ( HWI =   3.90 %, SWI =   4.39 % )
    CPU: mcu1_0: TOTAL LOAD =  43. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU: mcu2_0: TOTAL LOAD =   7. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU: mcu2_1: TOTAL LOAD =   0. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU: mcu3_0: TOTAL LOAD =   7. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU: mcu3_1: TOTAL LOAD =   1. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU:  c6x_1: TOTAL LOAD =   0. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU:  c6x_2: TOTAL LOAD =   0. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU:  c7x_1: TOTAL LOAD = 100. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    
    
    HWA performance statistics,
    ===========================
    
    
    
    DDR performance statistics,
    ===========================
    
    DDR: READ  BW: AVG =   6171 MB/s, PEAK =  10246 MB/s
    DDR: WRITE BW: AVG =   3347 MB/s, PEAK =  90850 MB/s
    DDR: TOTAL BW: AVG =   9518 MB/s, PEAK = 101096 MB/s
    

    This is the load test done on May 9th. It shows only the peak C7x load (100.0%); in reality the load fluctuates between 50% and 100%. The excessive load on MPU1_0 can be ignored (at the time, an application with a bug was adding about 40% to the A72 core load). The DDR BW load is:
    DDR performance statistics,
    =============================

    DDR: READ BW: AVG = 6171 MB/s, PEAK = 10246 MB/s
    DDR: WRITE BW: AVG = 3347 MB/s, PEAK = 90850 MB/s
    DDR: TOTAL BW: AVG = 9518 MB/s, PEAK = 101096 MB/s

    Is it too high that DMA is not able to read? or Is the timer getting delayed and so DMA is not working?

    How to verify this reason?

    When the DMA capture drops a frame, the oscilloscope shows that both the waveform sent by the timer through the GPIO and the waveform returned by the ultrasonic module are normal. Our timer period is 50us, and a 100us low-level pulse is sent to each ultrasonic module.
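
    One way to check whether the timer itself is being delayed is to log a microsecond timestamp on every timer ISR entry and examine the spacing afterwards. The sketch below is a host-side helper under that assumption; `IntervalStats` and `interval_stats` are hypothetical names, and the nominal period and tolerance are parameters rather than values taken from the SDK.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t min_us;      /* smallest spacing between consecutive entries */
    uint64_t max_us;      /* largest spacing between consecutive entries */
    size_t   out_of_tol;  /* spacings off nominal by more than tol_us */
} IntervalStats;

/* Compute spacing statistics over n monotonically increasing timestamps.
 * Returns 0 on success, -1 on invalid arguments (needs n >= 2). */
int interval_stats(const uint64_t *ts_us, size_t n,
                   uint64_t nominal_us, uint64_t tol_us, IntervalStats *out)
{
    if ((ts_us == NULL) || (out == NULL) || (n < 2u)) {
        return -1;
    }
    out->min_us = UINT64_MAX;
    out->max_us = 0u;
    out->out_of_tol = 0u;
    for (size_t i = 1u; i < n; i++) {
        uint64_t d = ts_us[i] - ts_us[i - 1u];
        uint64_t dev = (d > nominal_us) ? (d - nominal_us) : (nominal_us - d);
        if (d < out->min_us) { out->min_us = d; }
        if (d > out->max_us) { out->max_us = d; }
        if (dev > tol_us)    { out->out_of_tol++; }
    }
    return 0;
}
```

    If the spacing stays within tolerance while frames are still lost, the timer is less likely to be the cause and DDR or DMA contention becomes the more plausible suspect.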

  • Hi xie,

    But this average BW is too high. It is reaching up to 9.5 GB/s. This can definitely affect GPIO DMA performance. To confirm this, is there a way to reduce the DDR BW, let's say by reducing the fps? Can you try reducing the fps in the inference application, let's say by half, and see if it performs better?

    Regards,

    Brijesh

  • Hi, Brijesh

    Yes, when I reduce the frame rate of the image inference application, the frame drops in the DMA capture improve slightly. However, the colleagues in charge of the image inference application do not want to reduce the frame rate, because obstacles may be missed once it is reduced. It is now 33 frames per second; is this frame rate in the normal range?

    We also have software on MCU3_0 that sends the waveforms using Delay instead of a Timer. With it, after the image inference application is started, the probability of DMA frame loss is very low: no more than 10 frames are lost within 10 minutes. So I think the Timer ISR may be contributing to the DMA capture frame loss. When frame loss occurs, the symptom is the same for both software versions: the buffer returned by DMA is empty.

    Regards,

    jialin xie

  • Hi, Brijesh

    Does C7x's DMA use dru by default? Is it possible that it didn't work with dru? How to confirm whether it is using dru now? We have not modified the RTOS code related to C7x

    Regards,

    jialin xie

  • Hi Xie,

    Yes, when I reduce the frame rate of the image inference application, the frame drops in the DMA capture improve slightly. However, the colleagues in charge of the image inference application do not want to reduce the frame rate, because obstacles may be missed once it is reduced. It is now 33 frames per second; is this frame rate in the normal range?

    That's bit difficult to say. Probably lets first try by making fps to half and see if it improves. We can also try increasing priority of DMA to see if it helps, if the issue is due to DMA. But this can potentially affect other traffic, but we can try.

    We also have software on MCU3_0 that sends the waveforms using Delay instead of a Timer. With it, after the image inference application is started, the probability of DMA frame loss is very low: no more than 10 frames are lost within 10 minutes. So I think the Timer ISR may be contributing to the DMA capture frame loss. When frame loss occurs, the symptom is the same for both software versions: the buffer returned by DMA is empty.

    But i thought this is DMA issue, not the timer issue, since you confirmed that waveforms to USS are sent out correctly even when inference application is started and its only DMA getting delayed, isn't it? If delay is helping, then timer isr is delaying DMA isr (if you are using DMA ISR). 

    Does C7x's DMA use dru by default? Is it possible that it didn't work with dru? How to confirm whether it is using dru now? We have not modified the RTOS code related to C7x

    Well, yes, I think TIDL internally uses DRU channels, whereas GPIO-UDMA uses udma channels.. That should not cause any issue, because if i remember correctly, GPIO-DMA traffic is very less.. Isn't it? 

    Regards,

    Brijesh

  • Hi, Brijesh

    But I thought this was a DMA issue, not a timer issue, since you confirmed that the waveforms to USS are sent out correctly even when the inference application is started, and only the DMA is getting delayed, isn't it?

    Some time ago, I observed the waveforms of the sent waves and echoes through the oscilloscope on the bench, and I did see that the echoes were correct.


    But I just did further tests on the real car. I measured the time difference across the 3 Timer ISR entries that precede each frame drop (the Timer ISR must be entered 3 times to pull the GPIO down for 100us so that the ultrasonic module emits waves: the first two entries pull the GPIO down, and the third entry pulls it up).


    I found that after launching the image inference app, every time a frame was dropped the Timer time difference deviated strongly from 100us, fluctuating between 30us and 180us. It is normally around 100us, but after launching the inference app it can be much smaller or larger.

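    volatile int cnt_front = 0;
    // Timestamps of the three ISR entries of one emit cycle: entries 1 and 2 are
    // stored at [1] and [2], and the third entry is stored at [0].
    volatile uint64_t emit_ts_rec[3] = {0};
    void uss_front_timer_callback(){
    	switch(emit_status){
    		case EMIT_READY:
    			cnt_front ++;
    			if(cnt_front < 3){
    				// First and second ISR entry: write GPIO_PIN_HIGH to all 12 outputs.
    				for(int i = 0; i < 12; i++){
    					uint8_t gpio_send = snrGetGpioIdx(i/4, i%4, GPIO_OUT);
    					GPIOPinWrite_v0(CSL_GPIO0_BASE , gpio_send, GPIO_PIN_HIGH);
    				}
    				emit_ts_rec[cnt_front] = appLogGetTimeInUsec();
    			}else{
    				// Third ISR entry: write GPIO_PIN_LOW to all 12 outputs.
    				for(int i = 0; i < 12; i++){
    					uint8_t gpio_send = snrGetGpioIdx(i/4, i%4, GPIO_OUT);
    					GPIOPinWrite_v0(CSL_GPIO0_BASE , gpio_send, GPIO_PIN_LOW);
    				}
    
    				// Emit phase is done: switch state, reset the counter and
    				// record the third timestamp at index 0.
    				emit_status = ECHO_READY;
    				cnt_front = 0;
    				emit_ts_rec[cnt_front] = appLogGetTimeInUsec();
    			}
    		break;
    		case ECHO_READY:
    			// Next ISR entry after the emit phase: stop the timer.
    			TimerP_stop(FrontUssTimerhandle);
    		break;
    		default:
    		break;
    	}
    }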
    
    
    // Print code that counts lost frames after reading the DMA buffer in the while loop
    	cnt_data_sum ++;
    	if(count<(STATUS_LEN+2)*2 || count%2) //bad signal
    	{
    		cnt_data_lost ++;
    		printf("[%d:%d] lost=%lld sum=%lld count=%d timer_delay=%lld\n",sd, chl, cnt_data_lost, cnt_data_sum, count, emit_ts_rec[0] - emit_ts_rec[1]);
    	}else{
    		printf("[%d:%d] count=%d timer_delay=%lld\n",sd, chl, count, emit_ts_rec[0] - emit_ts_rec[1]);
    	}
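
    To make the 30us to 180us fluctuation easier to quantify, the intervals between consecutive ISR entries could be summarized as in the sketch below. This is only an illustration under stated assumptions: isr_jitter_t and isr_jitter_compute() are hypothetical names that are not part of the SDK, and the timestamps are assumed to be microsecond values such as those returned by appLogGetTimeInUsec(), stored oldest first (unlike emit_ts_rec above, which keeps the third timestamp at index 0).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the intervals between consecutive ISR entries. */
typedef struct {
    uint64_t min_us;       /* shortest interval seen */
    uint64_t max_us;       /* longest interval seen */
    uint64_t worst_dev_us; /* largest |interval - nominal| */
} isr_jitter_t;

/*
 * ts_us: ISR entry timestamps in microseconds, oldest first (assumed to be
 * monotonically increasing).
 * nominal_us: the interval the timer is expected to produce.
 * Returns 0 on success, -1 if the arguments cannot yield an interval.
 */
int isr_jitter_compute(const uint64_t *ts_us, size_t n,
                       uint64_t nominal_us, isr_jitter_t *out)
{
    if (ts_us == NULL || out == NULL || n < 2u) {
        return -1;
    }

    out->min_us = UINT64_MAX;
    out->max_us = 0u;
    out->worst_dev_us = 0u;

    for (size_t i = 1u; i < n; i++) {
        uint64_t d = ts_us[i] - ts_us[i - 1u];
        /* Absolute deviation without going through a signed type. */
        uint64_t dev = (d > nominal_us) ? (d - nominal_us) : (nominal_us - d);

        if (d < out->min_us) { out->min_us = d; }
        if (d > out->max_us) { out->max_us = d; }
        if (dev > out->worst_dev_us) { out->worst_dev_us = dev; }
    }
    return 0;
}
```

    The function should be called from the while loop that prints the statistics, not from the ISR itself, so that the measurement does not add to the ISR execution time.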

    Probably lets first try by making fps to half and see if it improves

    I also reduced the frame rate of the image inference application, increasing the frame period from 30ms to 60ms, then to 100ms, and then to 1s. The frame loss rate dropped significantly: at 1s per frame, the frame loss rate over 5 minutes was less than one in a thousand.

    At present, it seems that the frame drops are not caused by the DMA capture itself. Could the Timer interrupt be affected by the image inference application? Is there a way to guarantee the real-time behavior of the Timer ISR, so that it does not fire too early or too late?

    Regards,

    jialin xie

  • Hi jialin xie,

    That's interesting. Are you running anything else on mcu3_0? The timer should not be affected by the inference function unless the inference function has a dependency on mcu3_0. So are you running some algorithm on mcu3_0 that is required for the inference function?

    One more point: can you please make sure that the entire timer code, and even its data, is stored in internal memory such as TCM? If they are stored in the internal memory of the R5F, they will not be affected by DDR traffic. So is it possible to fit this part of the code in internal memory? If not TCM, you can probably use OCM. Each TCM is 32KB and OCM is 512KB, so a total of 576KB (2 x 32KB + 512KB) is available to store this data and code. I guess that should be sufficient.
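
    As a rough illustration of the placement suggested above, time-critical code and data can be tagged with dedicated section names that the linker command file then maps to TCM or OCM. This is a sketch under assumptions, not the actual SDK configuration: the section names .uss_fast_text and .uss_fast_data, the macros, and the function names are all hypothetical, and the matching linker command file entries are not shown. It uses the GCC-style section attribute; with the TI compiler, the CODE_SECTION and DATA_SECTION pragmas may be the more usual way to express the same thing.

```c
#include <stdint.h>

/*
 * Hypothetical section names. They only take effect on the target if the
 * linker command file maps these sections to TCM or OCM.
 */
#define USS_FAST_CODE __attribute__((section(".uss_fast_text"), noinline))
#define USS_FAST_DATA __attribute__((section(".uss_fast_data")))

/* State touched by the ISR is kept out of DDR together with the code. */
USS_FAST_DATA static volatile uint32_t uss_isr_count = 0u;
USS_FAST_DATA static volatile uint64_t uss_isr_ts[3] = {0u, 0u, 0u};

/* Body of the timer ISR: records the entry time in a 3-slot ring. */
USS_FAST_CODE void uss_timer_isr_body(uint64_t now_us)
{
    uss_isr_ts[uss_isr_count % 3u] = now_us;
    uss_isr_count++;
}

/* Accessors for code outside the ISR (these may live in DDR). */
uint32_t uss_isr_calls(void)
{
    return uss_isr_count;
}

uint64_t uss_isr_timestamp(uint32_t slot)
{
    return uss_isr_ts[slot % 3u];
}
```

    Everything the ISR calls (GPIO writes, timestamp reads) and the stack it runs on would need the same treatment for the ISR to be fully independent of DDR traffic; the linker map file can be used to confirm where each symbol ended up.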

    Regards,

    Brijesh    

  • Hi, Brijesh

    That's interesting. Are you running anything else on mcu3_0? The timer should not be affected by the inference function unless the inference function has a dependency on mcu3_0. So are you running some algorithm on mcu3_0 that is required for the inference function?

    No, mcu3_0 only runs the ultrasonic driver software. There are also some IPC interfaces, which communicate with mcu1_0 and the A72 core.

    can you please make sure that entire timer code and even the data is stored in the internal memory like TCM memory?

    Currently the MCU3_0 code is placed in DDR memory. I will try moving it to OCM when I start work tomorrow.

    Thank you very much for your reply

    Regards,

    jialin xie

  • Sure, thanks Xie, will wait for the update from this experiment.

  • Hi, Brijesh

    1. Putting the timer code into OCM did not significantly improve the situation.

    2. After making the following changes to tda4/rtos sdk/vision_apps/platform/j721e/rtos/c7x_1/main.c and testing for 10 minutes, the timer remains accurate.

    --- a/vision_apps/platform/j721e/rtos/c7x_1/main.c
    +++ b/vision_apps/platform/j721e/rtos/c7x_1/main.c
    @@ -147,9 +147,9 @@ int main(void)
         OS_init();
    -    appC7xClecInitDru();
    +    //appC7xClecInitDru();
    -    setup_dru_qos();
    +    //setup_dru_qos();
         TaskP_Params_init(&tskParams);
         tskParams.priority = 8u;

    Regards,

    jiawentao


    So can we lower the QoS priority of C7x or increase the QoS priority of mcu3-1 to solve this problem?

  • Hi jia wentao,

    We can definitely increase the QoS priority for mcu3_1. It is not currently set up in U-Boot, so it just has the default priority. We can try bumping it up. Let me share some changes for experimentation.

    Regards,

    Brijesh

  • Hi, Brijesh

    Sorry, a correction:

    Commenting out only setup_dru_qos() still leaves the timer period inaccurate.

    But after commenting out appC7xClecInitDru(), the timer becomes accurate.

    So please help me look at how to adjust C7x or mcu3-1 to make the timer accurate.

    Regards,

    Jiawentao

  • Hi Jia wentao,

    Do you mean only the appC7xClecInitDru API is affecting the timer? This is strange, because this API just configures CLEC on C7x to map a few DRU events used by TIDL. Is your algorithm working fine after this change? These events are used inside TIDL for polling, so I am wondering whether TIDL still works after this change.

    Regards,

    Brijesh

  • Hi, Brijesh

    Do you mean only appC7xClecInitDru API is affecting?

    yes

    I will check with my colleagues on the algorithm side and reply to you later.

    Regards,

    Jiawentao

  • Thanks Jia wentao, Will wait for your update.

    If this API is creating the problem, we might have to narrow down further to see which event is creating the issue.. 

  • Hi, Brijesh

    I learned from my algorithm colleagues that the inference program does not work anymore.

    How should we investigate next?

  • Hi, Brijesh

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/968710/compiler-processor-sdk-dra8x-tda4x-why-mcu-s-clock-works-not-accurately-when-i-allocate-the-bss-data-in-ddr-area

    The thread linked above describes a problem similar to ours. Please check whether it helps with our current problem.

    Regards,

    jialin xie

  • Hi jia wentao,

    I learned from my algorithm colleagues that the inference program does not work anymore.

    How should we investigate next?

    That's exactly what I was suspecting. Commenting out this API stops the algorithm from running, and so the timer becomes accurate. This does not look like a solution.

    Hi Xie,

    The thread linked above describes a problem similar to ours. Please check whether it helps with our current problem.

    Do you mean changing QoS for the R5F is helping? We can also try QoS on mcu3_0, but this will give mcu3_0 priority in accessing DDR, and I thought you are not running the code from DDR, isn't it?

    Regards,

    Brijesh

  • and i thought you are not running the code from DDR, isn't it?

    Hi, Brijesh

    The task stack of MCU3_0 is allocated in DDR by default. When I try to move it to OCM, the software fails to run. So we only place key functions, such as the Timer initialization and the Timer ISR, in OCM. Can I try modifying QoS, and how do I modify it?

    Regards,

    jialin xie

  • Hi xie,

    Which boot flow are you running, SBL or SPL? Then I will share the changes to enable QoS for mcu3_0/1, and let's see if it helps.

    Rgds,

    Brijesh

  • Hi, Brijesh

    We are using SPL.

    On another note, can the task stack be assigned to OCM/TCM, or can it only reside in DDR?

    Regards,

    jialin xie

  • Hi xie,

    It's definitely possible to move the stack to OCM or TCM. We just need to make sure that the memory is not being used by another core. Can you just use TCM and see if it works?

    I will add QoS for the mcu3 core in SPL and share the changes with you.

    Regards,

    Brijesh

  • Hi xie, jia wentao,

    Is this on TDA4VL? Because I don't think we support QoS on TDA4VL yet. Can you please refer to the FAQ below to enable the QoS framework first? I will add mcu3_0 on top of these patches.

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1214683/faq-processor-sdk-j721s2-how-to-enable-qos-for-dss-in-sbl-or-in-spl-boot-flow

    Regards,

    Brijesh

  • Hi, Brijesh

    Is this on TDA4VL

    We're using TDA4VMid

    Regards,

    jialin xie

  • Hi, Brijesh

    We use tda4vm

    The relevant patch has already been applied on our tda4vm.

    So please provide us with a method to increase the QoS priority on mcu3-0 and mcu1-0.

    Regards

  • Hi jia wentao,

    Can you please apply the attached patch in the uboot folder, rebuild uboot, and check it out? I have kept the same priority for mcu3 as for mcu2. Let's first see if this helps or improves the situation.

    /cfs-file/__key/communityserver-discussions-components-files/791/QoS_5F00_For_5F00_Mcu3.patch

    Regards,

    Brijesh

  • Hi,Brijesh

    Good news, after testing, the timer cycle is accurate. What side effects will this modification have?

    Can you provide another patch to modify the QoS priority of mcu1-0? We also have similar issues with mcu1-0.

    Regards,

    jiawentao

  • Hi jia wentao,

    Can we please run it for a longer duration, maybe overnight, to confirm that it is working?

    Are you running your inference algorithm along with this change? Can you also please check that it is running fine, at the correct fps and so on? This change gives priority to mcu3_0. I think it is a small amount of traffic, but it will still have priority. This also suggests that the issue is related to bandwidth (BW).

    I will share another patch for mcu1_0.

    Regards,

    Brijesh

  • Hi, Brijesh

    At present, our colleagues working on the inference algorithm have preliminarily confirmed that the inference algorithm runs without problems.

    Yes, the inference algorithm uses the maximum image frame rate and size, so I think it will affect BW.

    We will test it for longer.

    Looking forward to the patch for mcu1_0.

    Regards,

    Jiawentao