This thread has been locked.

AM62P: The ads6311 radar camera is dropping frames during capture

Part Number: AM62P


We used the v4l2 framework to capture images while debugging the ads6311 and found that frames were being dropped. Further analysis showed that a memcpy of 3932160 bytes takes about 30 ms, so we suspect the DMA rate is too slow.

ti_csi2rx0: ticsi2rx@30102000 {
    compatible = "ti,j721e-csi2rx";
    dmas = <&main_bcdma_csi 0 0x5000 15>, <&main_bcdma_csi 0 0x5001 15>,
           <&main_bcdma_csi 0 0x5002 15>, <&main_bcdma_csi 0 0x5003 15>;
    dma-names = "rx0", "rx1", "rx2", "rx3";
    reg = <0x00 0x30102000 0x00 0x1000>;
    power-domains = <&k3_pds 182 TI_SCI_PD_EXCLUSIVE>;
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;
    status = "disabled";

    cdns_csi2rx0: csi-bridge@30101000 {
        compatible = "cdns,csi2rx";
        reg = <0x00 0x30101000 0x00 0x1000>;
        clocks = <&k3_clks 182 0>, <&k3_clks 182 3>, <&k3_clks 182 0>,
                 <&k3_clks 182 0>, <&k3_clks 182 4>, <&k3_clks 182 4>;
        clock-names = "sys_clk", "p_clk", "pixel_if0_clk",
                      "pixel_if1_clk", "pixel_if2_clk", "pixel_if3_clk";
        phys = <&dphy0>;
        phy-names = "dphy";
        ...

We changed the priority (the ASEL cell, the last value in each dmas entry) to 15:

dmas = <&main_bcdma_csi 0 0x5000 15>, <&main_bcdma_csi 0 0x5001 15>,
            <&main_bcdma_csi 0 0x5002 15>, <&main_bcdma_csi 0 0x5003 15>;

After the change we can no longer capture data; the buffers contain all zeros. What methods can be used to speed up the DMA?

SDK version: 09.02.01.10

vision tiam62p5 linux6.1.83 memcpy耗时测试.txt (memcpy timing test)
[2022-04-28 17:44:01 920.392][v4l2halapi_cap][callback_process][313]:/dev/video2 copy size 1280 cost: 0 ms 560 us.
[2022-04-28 17:44:01 958.194][v4l2halapi_cap][callback_process][313]:/dev/video0 copy size 3932160 cost: 37 ms 115 us.
[2022-04-28 17:44:01 959.249][v4l2halapi_cap][callback_process][313]:/dev/video3 copy size 1280 cost: 0 ms 510 us.
[2022-04-28 17:44:01 994.379][v4l2halapi_cap][callback_process][313]:/dev/video1 copy size 3932160 cost: 35 ms 370 us.
[2022-04-28 17:44:02 041.317][v4l2halapi_cap][callback_process][313]:/dev/video2 copy size 1280 cost: 0 ms 835 us.
[2022-04-28 17:44:02 072.225][v4l2halapi_cap][callback_process][313]:/dev/video0 copy size 3932160 cost: 30 ms 110 us.
[2022-04-28 17:44:02 072.900][v4l2halapi_cap][callback_process][313]:/dev/video3 copy size 1280 cost: 0 ms 760 us.
[2022-04-28 17:44:02 101.438][v4l2halapi_cap][callback_process][313]:/dev/video1 copy size 3932160 cost: 28 ms 155 us.
[2022-04-28 17:44:02 143.028][v4l2halapi_cap][callback_process][313]:/dev/video2 copy size 1280 cost: 0 ms 255 us.
[2022-04-28 17:44:02 173.251][v4l2halapi_cap][callback_process][313]:/dev/video0 copy size 3932160 cost: 30 ms 825 us.
[2022-04-28 17:44:02 174.039][v4l2halapi_cap][callback_process][313]:/dev/video3 copy size 1280 cost: 0 ms 320 us.
[2022-04-28 17:44:02 202.637][v4l2halapi_cap][callback_process][313]:/dev/video1 copy size 3932160 cost: 28 ms 715 us.
[2022-04-28 17:44:02 245.204][v4l2halapi_cap][callback_process][313]:/dev/video2 copy size 1280 cost: 0 ms 335 us.
[2022-04-28 17:44:02 275.930][v4l2halapi_cap][callback_process][313]:/dev/video0 copy size 3932160 cost: 30 ms 955 us.
[2022-04-28 17:44:02 276.780][v4l2halapi_cap][callback_process][313]:/dev/video3 copy size 1280 cost: 0 ms 920 us.
[2022-04-28 17:44:02 305.340][v4l2halapi_cap][callback_process][313]:/dev/video1 copy size 3932160 cost: 28 ms 335 us.
[2022-04-28 17:44:02 346.875][v4l2halapi_cap][callback_process][313]:/dev/video2 copy size 1280 cost: 0 ms 720 us.
[2022-04-28 17:44:02 376.990][v4l2halapi_cap][callback_process][313]:/dev/video0 copy size 3932160 cost: 29 ms 320 us.
[2022-04-28 17:44:02 377.672][v4l2halapi_cap][callback_process][313]:/dev/video3 copy size 1280 cost: 0 ms 240 us.

Please help. Thanks!

  •
    [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
    [ 0.000000] Linux version 6.6.32 (cd7475193@67f185e364dc) (aarch64-oe-linux-gcc (GCC) 11.4.0, GNU ld (GNU Binutils) 2.38.20220708) #41 SMP PREEMPT Tue Dec 17 07:26:06 UTC 2024
    [ 0.000000] KASLR disabled due to lack of seed
    [ 0.000000] Machine model: Texas Instruments AM62P5 SK
    [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000002800000 (options '')
    [ 0.000000] printk: bootconsole [ns16550a0] enabled
    [ 0.000000] efi: UEFI not found.
    [ 0.000000] [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
    [ 0.000000] Reserved memory: created CMA memory pool at 0x00000000b0000000, size 144 MiB
    [ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
    [ 0.000000] OF: reserved mem: 0x00000000b0000000..0x00000000b8ffffff (147456 KiB) map reusable linux,cma
    [ 0.000000] Reserved memory: created DMA memory pool at 0x000000009b500000, size 3 MiB
    [ 0.000000] OF: reserved mem: initialized node rtos-ipc-memory@9b500000, compatible id shared-dma-pool
    [ 0.000000] OF: reserved mem: 0x000000009b500000..0x000000009b7fffff (3072 KiB) nomap non-reusable rtos-ipc-memory@9b500000
    [ 0.000000] Reserved memory: created DMA memory pool at 0x000000009b800000, size 1 MiB
    [ 0.000000] OF: reserved mem: initialized node mcu-r5fss-dma-memory-region@9b800000, compatible id shared-dma-pool
    [ 0.000000] OF: reserved mem: 0x000000009b800000..0x000000009b8fffff (1024 KiB) nomap non-reusable mcu-r5fss-dma-memory-region@9b800000
    [ 0.000000] Reserved memory: created DMA memory pool at 0x000000009b900000, size 15 MiB
    [ 0.000000] OF: reserved mem: initialized node mcu-r5fss-memory-region@9b900000, compatible id shared-dma-pool
    [ 0.000000] OF: reserved mem: 0x000000009b900000..0x000000009c7fffff (15360 KiB) nomap non-reusable mcu-r5fss-memory-region@9b900000
    [ 0.000000] Reserved memory: created DMA memory pool at 0x000000009c800000, size 1 MiB
    dmesg log

  • Hello Xiangxu,

    Thanks for reporting this issue. We will be investigating it.

    Meanwhile, best practice for building an image processing pipeline is to avoid memory copies by the CPU. Even though setting ASEL to 15 can speed up memcpy, it has side effects, as explained in this FAQ: Why is memcpy so slow on AM6x when copying data from a V4L2 buffer? Therefore the best solution for your case may be to optimize your system so that memcpy is avoided.

    Regards,

    Jianzhong

  • Hi Xiangxu,

    I just checked with the HW team and found out that ASEL=14 or 15 is not supported on AM62P. So you'll have to optimize your application to avoid using memcpy.

    Regards,

    Jianzhong

  • Hi Jianzhong,

    So what else can we do to improve the DMA speed?
    When requesting DMA buffers, is there a way to get a hardware-coherent buffer? Our application needs to operate frequently on the raw data in the buffer after the DMA transfer completes.

  • Also, when ASEL=14 or 15 is not enabled, we count the abnormal-frame flag in the DMA callback and find that DMA error frames occur.

    j721e-csi2rx.c file

    static void ti_csi2rx_dma_callback(void *param,
                                       const struct dmaengine_result *result)
    {
        struct ti_csi2rx_buffer *buf = param;
        struct ti_csi2rx_ctx *ctx = buf->ctx;
        struct ti_csi2rx_dma *dma = &ctx->dma;
        unsigned long flags = 0;

        /*
         * TODO: Derive the sequence number from the CSI2RX frame number
         * hardware monitor registers.
         */
        buf->vb.vb2_buf.timestamp = ktime_get_ns();
        buf->vb.sequence = ctx->sequence++;

        spin_lock_irqsave(&dma->lock, flags);

        WARN_ON(!list_is_first(&buf->list, &dma->submitted));

        if (0 == (ctx->ok_frame_cnt + ctx->ng_frame_cnt)) {
            ctx->rx_start = ktime_get();
            ...

    root@am62pxx-evm:/run/media/data-mmcblk0p5/output/bin# cat /sys/devices/platform/bus@f0000/30102000.ticsi2rx/csi_rx_status
    stream[0] ok: 352, ng: 2, total_fps: 1, ok_fps: 1
    stream[1] ok: 278, ng: 78, total_fps: 1, ok_fps: 1
    stream[2] ok: 353, ng: 0, total_fps: 1, ok_fps: 1
    stream[3] ok: 355, ng: 0, total_fps: 1, ok_fps: 1

  • So what else can you do to improve dma speed?

    It's not the DMA speed that needs to be improved. You're doing the memcpy on the CPU, which is slow due to cache maintenance, as explained in the FAQ mentioned earlier.

    You'll need to optimize your application to avoid the memcpy. For example, if you use the GStreamer framework to stream from the camera to a display, there is zero memcpy.

    Also, when ASEL=14 or 15 is not enabled, we count the abnormal-frame flag in the DMA callback and find that DMA error frames occur

    I assume this happened when you had the memcpy of 3932160 bytes. Can you test if you still see the DMA error if you don't do the memcpy?

    Thank you.

    Jianzhong

  • I assume this happened when you had the memcpy of 3932160 bytes. Can you test if you still see the DMA error if you don't do the memcpy?

    Thank you.

    We don't quite understand the logic: stopping the copy would cut off the stream's output. The frame errors occur intermittently in the middle of the stream, and the stream continues flowing after an error.

  • Can you try something like "v4l2-ctl --stream-mmap" and see if it can operate at the sensor frame rate? That should rule out any issue with the CSI Rx driver (including DMA).

  • Here is our table of frame rates and data sizes in various formats.

    Hi,

    As per today's discussion, we aligned on the customer setup below: the issue occurs during data transfer between two memory locations in DDR (DDR_M1 to DDR_M2).

    The reported issue is that the data transfer from DDR_M1 to DDR_M2 is slow.

    The expected data transfer rate is 40-80 MB per 30 ms.

    Next Steps:

    1. Freddy, please confirm that the issue captured above is the right one
    2. It is not clear why the customer is not using DMA instead of memcpy()
    3. Provide a memory-to-memory data transfer example (40 MB of normal data, not CSI data)
    4. Share the MPU settings of the memory used in the data transfer
    5. Is the data in "DDR_M2" used by an RTOS-based application running on another core?
      1. If not, what is the shared memory location, and between which cores is it shared in your setup?

    Regards

    Ashwani

  • Thanks Freddy, As per discussion in followup meeting, here is the customer setup:

    • Each frame carries only a small amount of data (some bytes), but the frame rate is high
    • The customer triggers a DMA data transfer after receiving multiple frames
      • What is the frequency of the DMA triggers?
      • Small transfers at high frequency incur a penalty (DMA channel re-initialization)

    Some points to consider on the customer side:

    • DMA_BUF should be cacheable
    • Can try cache_inv() after memcpy()
    • Can carve out heap memory
    • A Linux user-space DMA transfer (character copy) example
      • Would be a big effort
    • Use a user-space buffer with scatter-gather DMA to see if it achieves good performance
      • TI already shared a patch for this on 20 Dec

    Regards

    Ashwani

  • Hi Xiangxu,

    To speed up memcpy and buffer exchanges, please enable software-based in-kernel cache maintenance operations by making the necessary changes in the kernel and your user-space application, as described in the steps below:

    1) Apply the attached patch /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_media_2D00_ti_2D00_j721e_2D00_csi2rx_2D00_Allow_2D00_passing_2D00_cache_2D00_hints_2D00_from.patch to the ti-linux-kernel; it enables passing cache hints from user space.

    2) Make the following changes in your v4l2 user-space application:

         i) Pass V4L2_MEMORY_FLAG_NON_COHERENT when requesting buffers (VIDIOC_REQBUFS), as in the snippet below:

    rb.count = nbufs;
    rb.type = dev->type;
    rb.memory = dev->memtype;
    + rb.flags = V4L2_MEMORY_FLAG_NON_COHERENT;
    ret = ioctl(dev->fd, VIDIOC_REQBUFS, &rb);
    if (ret < 0) {
        ...
        

      ii) Pass V4L2_BUF_FLAG_NO_CACHE_CLEAN when enqueuing buffers (VIDIOC_QBUF), as in the snippet below:

    buf.index = index;
    buf.type = dev->type;
    buf.memory = dev->memtype;
    buf.flags = V4L2_BUF_FLAG_NO_CACHE_CLEAN;
    ret = ioctl(dev->fd, VIDIOC_QBUF, &buf);

    You may refer to the attached patch /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_yavta_2D00_hack_2D00_Pass_2D00_cache_2D00_hints_2D00_from_2D00_user_2D00_space_2D00_for_2D00_capt.patch while making the above changes.

    3) To summarize the observations: we see roughly a 5x improvement (5.193 ms -> 1.085 ms, about 4.8x) in memcpy time with the suggested changes, as shown below:

     i) The original average time to copy 614400 bytes was 5.193 ms:
    yavta -c200 -F/run/capture -s 640x480 -f UYVY /dev/video2 > /run/1.txt
    tail /run/1.txt | grep avg
    avg: 0.005193

     ii) With the above changes, the time to copy 614400 bytes dropped to 1.085 ms:

    yavta -c200 -F/run/capture -s 640x480 -f UYVY /dev/video2 > /run/2.txt
    tail /run/2.txt | grep avg
    avg: 0.001085

    Kindly let us know if you face any issues.

    Regards

    Devarsh