TDA4VE-Q1: CSIRX image fragmentation issue

Part Number: TDA4VE-Q1

Tool/software:

Hi expert,

Our customer has found a serious image fragmentation issue on TDA4VE, as shown in the video below. Could you please help us work out how to debug it?

SDK:

ti-processor-sdk-rtos-j721s2-evm-08_06_01_03

Camera:

4 x 1920x1536 @ 25 fps

Issue description:

1. Four cameras are connected to CSI-RX, and the TDA4VE outputs the video to the cockpit over Ethernet. Serious image fragmentation appears within about 5 seconds.

2. In the capture statistics on MCU2_0, the CRC/ECC/OVERFLOW counters stay at 0, but the frame drop count keeps increasing;

3. DDR statistics are as below:

Test and Result:

1. If only 1 camera is connected, there is no issue;

2. We increased the number of capture node buffers from 4 to 6. The frame drops almost disappeared and the issue improved somewhat, but it was not completely fixed;

Capture statistics are as below:

3. If we disable BEV fastray and infopost on C7, the issue does not occur;

4. We modified the CSIRX driver configuration as below, and the reproduction rate dropped from once every 5 s to once per minute;

File: pdk_j721s2_08_06_01_03/packages/ti/drv/csirx/csirx.h

Change: add the following lines

    chCfgPrms->rxChParams.addrType = TISCI_MSG_VALUE_RM_UDMAP_CH_ATYPE_NON_COHERENT;
    chCfgPrms->rxChParams.busOrderId  = 12U;

5. We followed the suggestions in the E2E thread below to modify the CSIRX driver, but an error is reported.

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1308436/tda4vm-8m-camera-run-report-incomplete-frame-error/4989762?tisearch=e2e-quicksearch&keymatch=UDMA_CH_TYPE_RX_UHC#4989762

File: packages/ti/drv/csirx/csirx.h
Function: Csirx_chCfgInit
Change: add "chCfgPrms->rxChParams.addrType = TISCI_MSG_VALUE_RM_UDMAP_CH_ATYPE_NON_COHERENT;" at the end of the function, and change

        UdmaChRxPrms_init(&chCfgPrms->rxChParams, UDMA_CH_TYPE_RX);

    to

        UdmaChRxPrms_init(&chCfgPrms->rxChParams, UDMA_CH_TYPE_RX_UHC);

File: packages/ti/drv/udma/src/udma_ch.c
Function: UdmaChUtcPrms_init
Change:

        utcPrms->addrType       = TISCI_MSG_VALUE_RM_UDMAP_CH_ATYPE_PHYS;

    to

        utcPrms->addrType       = TISCI_MSG_VALUE_RM_UDMAP_CH_ATYPE_NON_COHERENT; //TISCI_MSG_VALUE_RM_UDMAP_CH_ATYPE_PHYS;

File: packages/ti/drv/csirx/src/csirx_drvUdma.c
Function: CsirxDrv_setChUdmaParams
Change:

        chType = UDMA_CH_TYPE_RX;

    to

        chType = UDMA_CH_TYPE_RX_UHC;

After the above modifications, the capture node reports the error below when MCU2_0 starts up.

Regarding the UDMA_CH_TYPE_RX_HC and UDMA_CH_TYPE_RX_UHC channel allocation failure, the customer looked into Udma_rmAllocRxHcCh() and Udma_rmAllocRxUhcCh() and found that numRxHcCh and numRxUhcCh are 0.

The customer traced the code and found that it uses Udma_rmGetSciclientDefaultBoardCfgRmRange() to query all resource ranges from the Sciclient default BoardCfg. If the customer wants to use RX HC and UHC channels, what configuration is needed?

Best Regards,

Xingyu Zhu

  • Hi Xingyu,

    The SDK 08.06 release has no support for QoS settings on TDA4AE/VE devices. Can you please apply the settings attached in the link below and try again? Also, please keep the order ID at 12 in the csirx.h header file.

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1214683/faq-processor-sdk-j721s2-how-to-enable-qos-for-dss-in-sbl-or-in-spl-boot-flow

    Regards,

    Brijesh

  • Hi Brijesh,

    Thanks for your reply. Our customer integrated this QoS patch from the beginning. They have also tried enabling more QoS settings, but neither gave any improvement.

    In pdk_j721s2_08_06_01_03/packages/ti/drv/csirx/csirx.h, the following is set:

        chCfgPrms->rxChParams.busOrderId  = 12U;
        chCfgPrms->rxChParams.addrType = TISCI_MSG_VALUE_RM_UDMAP_CH_ATYPE_NON_COHERENT;

    The QoS setup function in use is:

    static void J721S2_SetupQoS()
    {
        setup_navss_nb();
        /* Workaround to unblock PDK-8359 .
         * setup_main_r5f_qos() results in crashing the UDMA DRU examples on
         * mcu2_0(with SBL uart boot mode) during CSL_REG64_WR(&pRegs->DRUQueues.CFG[queueId], regVal);
         * Hence commenting out the following. */
        setup_main_r5f_qos();
        setup_mcu_r5f_qos();
        setup_vpac_qos();
        setup_dmpac_qos();
        setup_dss_qos();
        /* setup_gpu_qos(); */
        setup_encoder_qos();
    }

    Could you please help check the possible reason? Also, do you have any suggestions for test item 5? Thanks.

  • Hi Xingyu,

    Can we please read back 0x03702010 and 0x03703010 and confirm that the values are set to 0x2 and 0x4, respectively?

    Most likely, CSIRX is detecting short/long frame errors. To confirm this, can we please add the code below to the tivxCaptureDequeueFrameFromDriver API in the ti-processor-sdk-rtos-j721s2-evm-08_06_00_11/tiovx/kernels_j7/hwa/capture/vx_capture_target.c file?

    chId = tivxCaptureGetNodeChannelNum(prms, instIdx, fvid2Frame->chNum);

    if (FVID2_FRAME_STATUS_COMPLETED != fvid2Frame->status)
    {
        VX_PRINT(VX_ZONE_ERROR,
            " CAPTURE: ERROR: Short/Long Frame Detected for ChId%d!!!\n", chId);
    }

    If this error is printed, it essentially means CSIRX is not able to write the data fast enough. Can you try changing busOrderId in the CSIRX driver to 15?

    chCfgPrms->rxChParams.busOrderId  = 15U;

    Regards,

    Brijesh

  • Hi Brijesh,

    Can we please read back 0x03702010 and 0x03703010 and confirm that the values are set to 0x2 and 0x4, respectively?

    root@imotion:/mnt# ./devmem2 0x03702010
    /dev/mem opened.
    Memory mapped at address 0xffff83573000.
    Read at address 0x03702010 (0xffff83573010): 0x00000002
    root@imotion:/mnt# ./devmem2 0x03703010
    /dev/mem opened.
    Memory mapped at address 0xffff8c1c7000.
    Read at address 0x03703010 (0xffff8c1c7010): 0x00000004

    To confirm this, can we please add the code below

    Below is the log.

    [MCU2_0] 25.340497 s: VX_ZONE_ERROR:[tivxCaptureDequeueFrameFromDriver:889] CAPTURE: ERROR: Short/Long Frame Detected for ChId0!!!
    [MCU2_0] 25.340577 s: VX_ZONE_ERROR:[tivxCaptureDequeueFrameFromDriver:889] CAPTURE: ERROR: Short/Long Frame Detected for ChId1!!!
    [MCU2_0] 25.340640 s: VX_ZONE_ERROR:[tivxCaptureDequeueFrameFromDriver:889] CAPTURE: ERROR: Short/Long Frame Detected for ChId2!!!
    [MCU2_0] 25.340700 s: VX_ZONE_ERROR:[tivxCaptureDequeueFrameFromDriver:889] CAPTURE: ERROR: Short/Long Frame Detected for ChId3!!!
    [MCU2_0] 28.036739 s: VX_ZONE_ERROR:[tivxCaptureDequeueFrameFromDriver:889] CAPTURE: ERROR: Short/Long Frame Detected for ChId0!!!
    [MCU2_0] 28.036853 s: VX_ZONE_ERROR:[tivxCaptureDequeueFrameFromDriver:889] CAPTURE: ERROR: Short/Long Frame Detected for ChId1!!!
    [MCU2_0] 28.036919 s: VX_ZONE_ERROR:[tivxCaptureDequeueFrameFromDriver:889] CAPTURE: ERROR: Short/Long Frame Detected for ChId2!!!
    [MCU2_0] 28.036991 s: VX_ZONE_ERROR:[tivxCaptureDequeueFrameFromDriver:889] CAPTURE: ERROR: Short/Long Frame Detected for ChId3!!!

    Can you try changing busOrderId in the CSIRX driver to 15?

    The customer has tried different RX and TX busOrderId combinations; the results are below:

    RX busOrderId   TX busOrderId   Issue frequency
    15              8               1 time/min
    12              8               1 time/min
    12              12              6 times/min
    8               8               30 times/min
    8               12              30 times/min
  • It essentially means CSIRX is not able to write fast enough

    The customer found that the burst size can be adjusted in UdmaChRxPrms_init() in pdk_j721s2_08_06_01_03/packages/ti/drv/udma/src/udma_ch.c, but an HC or UHC channel must be enabled first.

    The customer added some prints to the UDMA driver in the Linux kernel and found that the HC and UHC channel counts there are not 0. But when they configure the CSIRX channel as UDMA_CH_FLAG_HC or UDMA_CH_FLAG_UHC, allocation in Udma_rmAllocRxHcCh() fails because numRxHcCh is 0. See test item 5 for details.

    So does this mean we need to enable the CSIRX UDMA UHC channel first?

  • Hi Xingyu,

    CSIRX internally uses BCDMA channels, so I really doubt it can use the HC/UHC channels of the UDMA.

    Since you are seeing short/long frame errors from CSIRX, it means CSIRX is not able to write the data fast enough, which is why you are seeing artifacts on the display.

    I don't understand the TX ID in the above table. Which parameter are you changing here?

    Essentially, we should only need to change rxChPrms.busOrderId to a value greater than 10. This should help.

    What other components are running in the system? Is it possible to reproduce this issue on the EVM?

    Regards,

    Brijesh

  • Hi Brijesh,

    Thanks for your reply.

    CSIRX internally uses BCDMA channels, so I really doubt it can use the HC/UHC channels of the UDMA.

    BCDMA has 16 normal channels, so it seems HC/UHC channels cannot be used.

    I don't understand the TX ID in the above table. Which parameter are you changing here?

    The customer changed the TX ID via chCfgPrms->txChParams.busOrderId in Csitx_chCfgInit() in pdk_j721s2_08_06_01_03/packages/ti/drv/csitx/src/csitx_drvInit.c.

    What other components are running in the system?

    The customer is running ADAS-related functions. The running applications include:

    1. Capture, LDC/MSC

    2. An NV12-to-tensor operator running on C7x_2

    3. BEV processing divided into three stages, distributed across C7x_1 and C7x_2.

    Is it possible to reproduce this issue on the EVM?

    The customer has checked this. It is very hard to port their application to the EVM.

  • In addition, after enabling CSIRX_DRV_ENABLE_DEBUG in the CSIRX driver, the customer observed the following while reproducing the short/long frame errors: when an error frame occurs, the timestamp difference between the error frame and the previous frame is not the expected 40 ms (the camera frame rate is 25 fps). What could be the reason?

    As shown in the figure below, frame status 0x1 indicates FVID2_FRAME_STATUS_COMPLETED, and frame status 0x4 indicates FVID2_FRAME_STATUS_ERROR. The timestamps (TS) of four consecutive frames are 61659, 61699, 61729, and 61779. The timestamp difference for the error frame is 61729 - 61699 = 30 ms, which is less than the expected 40 ms, and the following frame arrives 61779 - 61729 = 50 ms later.

    Below is the log.

    mcu20_csirx_debug.txt

  • Hi Xingyu,

    It's surprising that you are getting error frames within 30 ms. That shouldn't happen.

    But the main issue is that we shouldn't be seeing these errors at all.

    There is a parameter in the CSIRX node to process multiple pixels at a time; can you please try changing it? Set numPixels in the tivx_capture_inst_params_t structure to 1 and see if it helps.

    I am also reviewing other changes to see whether anything more is required.

    Regards,

    Brijesh

  • Hi Brijesh,

    Thanks for your reply. The customer changed the parameters below and tested twice. There is no obvious improvement, and the error frames still occur.

    1. vision_apps/modules/src/app_capture_module.c configure_capture_params() add captureObj->params.instCfg[id].numPixels = 1U;
    2. pdk_j721s2_08_06_01_03/packages/ti/drv/csirx/csirx.h Csirx_instCfgInit()
    change
    modCfgPrms->numPixelsStrm0 = (uint32_t)0U;
    to
    modCfgPrms->numPixelsStrm0 = (uint32_t)1U;

  • Hi Xingyu,

    OK, frame drops are not the issue; the main issue is that it is reporting short/long frame errors.

    I want to make one more change in the CSIRX driver and see if it helps. I will share the changes tomorrow. 

    Could you please confirm whether the OCM memory in the MCU domain is available for CSIRX to use? That is, are you using SBL or BootApp? Both use OCM memory.

    Regards,

    Brijesh

  • Hi Brijesh,

    That is, are you using SBL or BootApp?

    The customer is using SBL.

    Could you please confirm whether the OCM memory in the MCU domain is available for CSIRX to use?

    The customer's OCM usage is as below:

  • Hi Xingyu,

    Do they also use Main Domain OCM memory?

    Regards,

    Brijesh 

  • Hi Brijesh,

    Do they also use Main Domain OCM memory?

    We are not sure. The customer checked the system_memory_map and listed all the items related to OCMRAM and OCRAM.

  • Hi Xingyu,

    I asked because I see sections such as "Vector", "Common", and "NoCache" allocated in OCM memory, so I was wondering whether Main Domain OCM memory is also used.

    I am assuming that it is not, and will use it to store some internal data of the CSIRX driver.

    Can you also please confirm that the customer is using only CSIRX instance 0, and that they are using it to capture 4 channels/cameras? This memory is small, so it may not be sufficient to hold the data for all instances and channels.

    Regards,

    Brijesh

  • Hi Brijesh,

    Can you also please confirm that the customer is using only CSIRX instance 0, and that they are using it to capture 4 channels/cameras?

    Yes, the customer is using CSIRX0 to capture data from 4 cameras (1920x1536).

  • Thanks Xingyu, I will share the CSIRX driver changes for SDK 08.06 for you to try in a day or two.

    Regards,

    Brijesh

  • Hi Xingyu,

    Can you please apply the attached patches to the PDK drivers and Vision Apps and re-run the use case? These changes move the CSIRX descriptors to internal memory. Let's see if this helps fix the issue.

    /cfs-file/__key/communityserver-discussions-components-files/791/PDK_5F00_Moved_5F00_CSIRX_5F00_Desc_5F00_To_5F00_Internal_5F00_Mem.patch

    /cfs-file/__key/communityserver-discussions-components-files/791/Vision_5F00_Apps_5F00_Moved_5F00_CSIRX_5F00_Desc_5F00_To_5F00_Internal_5F00_Mem.patch

    Regards,

    Brijesh

  • Hi Brijesh,

    Thanks for your reply.

    After applying the provided patch, which moves the CSIRX descriptor data to OCRAM, testing showed that image initialization and streaming do not work correctly. The customer collected the following information:
    1. The output of app_status_monitor.out. There is no output after applying the OCRAM patch.

    app_status_monitor_output_csirx_desc.txt

    app_status_monitor_output_original.txt

    2. The mcu20 boot log. After applying the OCRAM patch, the app gets stuck in ownGraphNodeKernelInit() during the execution of vxVerifyGraph() in tiovx/source/framework/vx_graph_verify.c, which prevents camera initialization and streaming from proceeding.

    mcu20_bootlog_csirx_desc.txt

    mcu20_bootlog_original.txt

    3. vx_app_rtos_linux_mcu2_0.out.map, the linker map file from the mcu20 build.

    vx_app_rtos_linux_mcu2_0.out.map-original

    vx_app_rtos_linux_mcu2_0.out.map-csirx_desc

  • Based on the output of app_status_monitor.out, about 99% of the original 512 KB L3 memory on mcu20 remains unused, but the 256 KB left after the change might be insufficient during initialization. However, the CSIRX descriptors require 0x30900 bytes (about 194 KB, i.e. more than 192 KB), so the available resources are a bit tight.

  • Hi Xingyu,

    Yes, it's a bit tight, but it still fits, doesn't it? If it did not fit, the build would have failed. Since there is no compilation error, it fits in memory. But why did CSIRX stop working? Were there any changes to the sensor/SERDES settings along with this patch? Can we please connect to mcu2_0 using CCS and see where it gets stuck during initialization?

    Regards,

    Brijesh

  • Dear expert,

    Yes, it fits in memory and no compilation error occurred, but mcu2_0 gets stuck, and we have not figured out why.

    By the way, there was NO change to the sensor/SERDES settings along with this OCRAM patch.
    We ran a test: with L3_MEM_SIZE changed from 512 KB to 256 KB (without adding the csirx_desc_mem section), mcu2_0 works fine. So an L3_MEM_SIZE of 256 KB should be enough.
    After applying the OCRAM patch, the app (running on the A core) gets stuck in ownGraphNodeKernelInit() during the execution of vxVerifyGraph() in tiovx/source/framework/vx_graph_verify.c, which prevents camera initialization and streaming from proceeding. We suspect something goes wrong on mcu2_0. We are looking into it and adding more debug info, since we are not very familiar with the CCS tool.

  • Dear expert,
    After applying the OCRAM patch, is vx_app_rtos_linux_mcu2_0.out the only file we need to update on the board, or are other files needed as well?

  • Hi zhao li,

    This only requires changing the mcu2_0 firmware, as this OCM memory is not used on mcu2_1 or anywhere else. Can you please share the map file for mcu2_0?

    Regards,

    Brijesh

  • Dear expert,
    I reproduced the hang with the vx_app_multi_cam.out demo. After applying the OCRAM patch, vx_app_multi_cam.out also gets stuck in app_verify_graph() during the execution of vxVerifyGraph() in vision_apps/apps/basic_demos/app_multi_cam/main.c. The log is listed below:

    This should help you reproduce the hang after applying the OCRAM patch.

    Regards,
    Zhao Li

  • Hi Zhao Li,

    OK, let me try it on the EVM first and share the changes with you.

    Regards,

    Brijesh

  • Dear expert,
    After applying the OCRAM patch, mcu2_0 gets stuck in CSL_udmapCppi5SetDescType(pTrpd, descType), right at the beginning of CsirxDrv_udmaRxTrpdInit() in pdk_j721s2_08_06_01_03/packages/ti/drv/csirx/src/csirx_drvUdma.c.

    Regards,
    Zhao Li
  • Hi Zhao Li,

    That's surprising. Can you please check the pTrpd address and see if it points to the correct OCM memory? Also, can you please check in the CCS memory window whether this area is accessible? This area should most likely be mapped; if it is not, can we please add this mapping in the MPU?

    Regards,

    Brijesh

  • Dear expert,
    "Can you please check the pTrpd address and see if it is pointing to correct OCM memory? Also can you please check from the CCS memory window if this area is accessible?"
    Yes, the address is in the range {0x60040000, 0x60080000}, and it cannot be read or written.
    I tried the following patch, which changes the memory address translation size to MAIN_OCRAM_MCU2_0_SIZE; it fixes the mcu2_0 hang caused by the OCRAM patch.

    A new problem has emerged: the vx_app_multi_cam.out demo gets stuck in vxGraphParameterDequeueDoneRef() after calling vxGraphParameterEnqueueReadyRef() 4 times. I added the debug print GT_0trace(CsirxTrace, GT_ERR, "CsirxDrv_udmaCQEventCb receive dma completion event\r\n"); in CsirxDrv_udmaCQEventCb() in pdk_j721s2_08_06_01_03/packages/ti/drv/csirx/src/csirx_drvUdma.c, and it is printed only 4 times. So I suspect mcu2_0 does NOT receive any further UDMA_EVENT_TYPE_DMA_COMPLETION events.

    Regards,
    Zhao Li

  • Hi Zhao Li,

    Yes, it needs to be mapped, and because it is mapped, we need to handle it a bit differently. The mapped address is valid only for mcu2_0; the DMA engine needs the physical address, so this change is required in the driver.

    When a descriptor is submitted to the ring, and when the ring is configured for the channel, both places require the actual physical address. Can you try making these changes in the CSIRX driver and see if it helps?

    Regards,

    Brijesh

  • Dear expert,
    We have tried several approaches over the last few days, such as adding a memory map entry in arch/arm64/boot/dts/imotion_d4/k3-j721s2-rtos-memory-map.dtsi in the Linux kernel, but we cannot get both mcu2_0 and the DMA engine to work.
    Can you please share the full changes, including the previous OCRAM patch?


    Regards,
    Zhao Li

  • Hi Zhao Li,

    Sure, let me try it out on the EVM and give you the exact patches.

    Regards,

    Brijesh