SK-AM62P-LP: About memcpy speed

Part Number: SK-AM62P-LP
Other Parts Discussed in Thread: AM62P

Tool/software:

Hello TI team

I'm using the MCU+ SDK display share demo, and I found that the speed of memcpy seems abnormal. The code is below:

        endMs = (ClockP_getTimeUsec() / 1000U);
        DebugP_log("Splash -> Gen 1 buffer: %u ms\r\n", (endMs - gBoot2SplashStartMs));
        {   // copy second frame
            uint8_t *dst  = (uint8_t*)&gFirstPipelineFrameBuf[1];
            uint8_t *src  = (uint8_t*)&gFirstPipelineFrameBuf[0];
            size_t   bytes = 5529600U; // 1920*720*4 = 5,529,600
            memcpy(dst, src, bytes);
            /* Ensure DDR visibility for the second frame */
            CacheP_wb(dst, (uint32_t)bytes, CacheP_TYPE_ALLD);
        }
        endMs = (ClockP_getTimeUsec() / 1000U);
        DebugP_log("Splash -> Gen 2 buffer: %u ms\r\n", (endMs - gBoot2SplashStartMs));

And I got the log below:

Splash -> Gen 1 buffer: 264 ms
Splash -> Gen 2 buffer: 408 ms

That means the memcpy of about 5,529,600 bytes took about 140 ms.

Why so slow?

  • Hi,
    To remove any variability introduced by other function calls apart from memcpy, can you please try something like:

    {   // copy second frame
        uint8_t *dst  = (uint8_t*)&gFirstPipelineFrameBuf[1];
        uint8_t *src  = (uint8_t*)&gFirstPipelineFrameBuf[0];
        size_t   bytes = 5529600U; // 1920*720*4 = 5,529,600
        startMs = (ClockP_getTimeUsec() / 1000U);
        memcpy(dst, src, bytes);
        endMs = (ClockP_getTimeUsec() / 1000U);
        /* Ensure DDR visibility for the second frame */
        CacheP_wb(dst, (uint32_t)bytes, CacheP_TYPE_ALLD);
    }
    DebugP_log("Splash -> Gen 2 buffer: %u ms\r\n", (endMs - startMs));

    Also, is this value consistent over multiple frames?

  • Hello TI team

    I changed the code and got the log below:

    Splash -> Gen 2 buffer: 78 ms

    The modification is in DispApp_splashThread(): the second buffer is now filled with memcpy instead of generating another frame:

    /* Update frame buffers for the pipeline before starting display */
    DispApp_updateSplashFrameBuffer((void*)&gFirstPipelineFrameBuf[0], DISP_SPLASH_IMAGE_XPOSTION, \
                                    DISP_SPLASH_IMAGE_YPOSTION, DISP_SPLASH_IMAGE_WIDTH, \
                                    DISP_SPLASH_IMAGE_HEIGHT, DISP_BYTES_PER_PIXEL);
  • Are you building the image in DEBUG mode or RELEASE mode?

  • It's Debug mode.

  • Please try with RELEASE mode, it is expected to reduce that latency.

  • It helps, but not by much:

    Before switching to Release (this is the Debug-mode log):

    Sciserver Testapp Built On: Apr  3 2025 09:26:45
    Sciserver Version: v2025.04.0.0-REL.MCUSDK.K3.11.00.00.16+
    RM_PM_HAL Version: v11.00.07
    Starting Sciserver..... PASSED
    DispApp_init() - DONE !!!
    Display create complete!!
    Starting display ... !!!
    Display in progress ... DO NOT HALT !!!
    Splash -> Start to gen 1 Frame buffer 56 ms
    Splash -> Gen 1 buffer: 166 ms
    Splash -> memcpy 2 buffer: 140 ms
    Splash -> Gen 2 buffer: 355 ms
    Splash-> Start Fvid2 driver & first flame displayed: 358 ms
    DSS display share Passed!!
    [IPC RPMSG ECHO] Version: REL.MCUSDK.K3.11.00.00.16+ (Aug 27 2025 10:43:02):  

    Release Mode log below:

    Sciserver Version: v2025.04.0.0-REL.MCUSDK.K3.11.00.00.16+
    RM_PM_HAL Version: v11.00.07
    Starting Sciserver..... PASSED
    DispApp_init() - DONE !!!
    Display create complete!!
    Starting display ... !!!
    Display in progress ... DO NOT HALT !!!
    Splash -> Start to gen 1 Frame buffer 51 ms
    Splash -> Gen 1 buffer: 103 ms
    Splash -> memcpy 2 buffer: 140 ms
    Splash -> Gen 2 buffer: 322 ms
    Splash-> Start Fvid2 driver & first flame displayed: 325 ms
    DSS display share Passed!!
    [IPC RPMSG ECHO] Version: REL.MCUSDK.K3.11.00.00.16+ (Aug 27 2025 15:52:05):
    First Number of elapsed frames = 300, elapsed msec = 5005, fps = 59.94

    The frame generation / RLE decode got faster: 110 ms => 52 ms; the memcpy time did not change: 140 ms; but the CacheP_wb time increased from 49 ms to 79 ms.

    This is the log, which is printed from:

     
                }
                endMs = (ClockP_getTimeUsec() / 1000U);
                DebugP_log("Splash -> Generation 1 frame buffer finish: %u ms\r\n", (endMs - gBoot2SplashStartMs));
                {   // copy second frame
                    uint8_t *dst  = (uint8_t*)&gFirstPipelineFrameBuf[1];
                    uint8_t *src  = (uint8_t*)&gFirstPipelineFrameBuf[0];
                    size_t   bytes = 5529600U; // 1920*720*4 = 5,529,600
                    uint32_t startMs = (ClockP_getTimeUsec() / 1000U);
                    memcpy(dst, src, bytes);
                    endMs = (ClockP_getTimeUsec() / 1000U);
                    DebugP_log("Splash -> memcpy 2 buffer time endMs-startMs: %u ms\r\n", (endMs - startMs));
                    /* Ensure DDR visibility for the second frame */
                    CacheP_wb(dst, (uint32_t)bytes, CacheP_TYPE_ALLD);
                }
                endMs = (ClockP_getTimeUsec() / 1000U);
                DebugP_log("Splash -> Gen 2 buffer finished: %u ms\r\n", (endMs - gBoot2SplashStartMs));
    The DDR read/write speed looks very strange: by my calculation the bandwidth is only about 78.99 MB/s (2 x 5,529,600 bytes moved in ~140 ms, counting both the read and the write), which is much lower than the SK-AM62PX DDR speed.
  • Hi,
    Let me look at it internally and get back to you by mid next week.

  • Hello  

    Has there been any progress on this issue?

  • Hi,

    I am working on this, please allow me some time to test this at my end.

    Best Regards,

    Meet.

  • Hi,

    I tested this at my end, and for the same number of bytes memcpy takes 71 ms for me. I will run some more tests to confirm this, but this is what I observe currently.

    One more thing to add here: could you make the startMs and endMs variables of type uint64_t? This is the return type of the ClockP_getTimeUsec function. I am not sure whether this has any impact, but you can test it once.
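
    A minimal sketch of the suggested change, using the variable names from the snippet earlier in this thread:

        uint64_t startMs, endMs;   /* match the uint64_t return type of ClockP_getTimeUsec() */
        startMs = ClockP_getTimeUsec() / 1000U;
        memcpy(dst, src, bytes);
        endMs   = ClockP_getTimeUsec() / 1000U;
        DebugP_log("memcpy took: %u ms\r\n", (uint32_t)(endMs - startMs));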

    Best Regards,

    Meet.

  • Thanks! I will wait for your further reply!

  • One more thing to add here: could you make the startMs and endMs variables of type uint64_t?

    Meanwhile, I request you to try this once.

    Best Regards,

    Meet.

  • Yes,

    I changed them to uint64_t:

    Splash -> Generation 1 frame buffer finish: 552 ms
    Splash -> memcpy 2 buffer time: 119 ms, dst=93DCA000, src=93500000, size=7680 bytes
    Splash -> Gen 2 buffer finished: 683 ms

    No change. Is some driver or device not initialized? Or does memcpy need to be customized for the TI board?

  • Hi,

    Splash -> memcpy 2 buffer time: 119 ms, dst=93DCA000, src=93500000, size=7680 bytes

    I am getting 71 ms for 5529600 bytes. In your code snippet I also see a DebugP_log call; CacheP_wb and some other operations are also included in this 140 ms, so if you want to measure accurate timing you should take timestamps immediately before and after your memcpy instruction.

    memcpy is expensive in time; you can use either DMA or Utils_memcpyWord if you think memcpy is taking too long for your system. Usually you would use a graphics engine to modify certain pixels in the frame, whereas updating the whole frame after each vsync is expected to be expensive.

    Could you elaborate on what the use case is here and in what time you expect the memcpy to complete?

    Best Regards,

    Meet.

  • Hello Pu Jia,

    The points discussed above are valid.
    We did not use any Cache API or Debug API around the memcpy operation during measurement.
    With Meet's current setup, we were able to transfer approximately 5 MB of data in ~71 msec.
    • If you are looking to further reduce this transfer time, the only viable option is to leverage DMA-based data movement.
    • For reference, in earlier tests on the AM62x device, the same 5 MB transfer took ~10 msec using DMA.
    • We recommend performing the same DMA-based test on the AM62P device to measure and validate the results in your setup.

    Regards,

    Anil.

  • Hello TI team

    So it is normal for a CPU-based memcpy on the AM62Px, right? I also tried UDMA; it takes about 40 ms to copy a 1920*1200*4 framebuffer.

    Is this normal or not?

  • Hello Pu Jia,

    There is no problem using memcpy on AM62P devices. However, for large transfers (for example 5 MB or 8.6 MB) memcpy will be noticeably slower than DMA, so we recommend using DMA for large blocks.

    On AM62x we’ve measured DMA throughput on the order of ~500 MB/s for bulk memory transfers. At that rate:
    • 5 MB → ~10.0 ms (decimal MB) / ~10.49 ms (5 MiB)
    • 8.6 MB → ~17.2 ms (decimal MB) / ~18.0 ms (8.6 MiB)
    So a transfer of ~8.6 MB completes in ~17–18 ms with DMA at ~500 MB/s.

    But your profiling shows much larger values for the 8.6 MB transfer, and I suspect you may not be measuring with the right method.

    Recommended DMA example sequence and measurement procedure:

    1. Start your timer immediately before the UDMA queue/start call.
    2. Start the DMA transfer (queue the descriptor / call the UDMA start API).
    3. Stop the timer in the DMA completion callback, after you perform the cache invalidate.
    4. Repeat the measurement multiple times and take the median/percentiles to avoid single-run noise (a minimal sketch follows below).
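
    A minimal sketch of this sequence, assuming a blocking submit-and-wait helper such as the queue_trpd_and_wait() function shared later in this thread (the names and placement are illustrative only, not SDK code):

        /* Hypothetical measurement helper: assumes the TR in g_trpd has already been
         * filled for dst/src/bytes, then times one DMA copy per the steps above and
         * returns the elapsed time in milliseconds. Call it several times and take
         * the median. g_trpd and queue_trpd_and_wait() come from the custom UDMA
         * code posted later in this thread; substitute your own submit/wait
         * mechanism (polled, semaphore, or completion callback). */
        static uint32_t measure_dma_copy_ms(void *dst, const void *src,
                                            uint32_t bytes, uint32_t timeout_ms)
        {
            CacheP_wb((void *)src, bytes, CacheP_TYPE_ALLD);   /* source data out to DDR      */
            uint64_t t1 = ClockP_getTimeUsec();                /* T1: just before queuing     */
            (void)queue_trpd_and_wait(g_trpd, timeout_ms);     /* queue the TR, wait for done */
            CacheP_inv(dst, bytes, CacheP_TYPE_ALLD);          /* destination visible to CPU  */
            uint64_t t2 = ClockP_getTimeUsec();                /* T2: after cache invalidate  */
            return (uint32_t)((t2 - t1) / 1000U);
        }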

    Regards,

    Anil.

  • Hello TI team

    With my code, it takes about 40 ms to copy 8.6 MB of data.

     

    int32_t udma_memcpy_2d(void *dst, const void *src,
                           uint32_t line_bytes, uint32_t lines,
                           uint32_t src_stride, uint32_t dst_stride,
                           uint32_t timeout_ms)
    {
        if (g_ch == NULL || dst == NULL || src == NULL)
            return SystemP_FAILURE;
        if (line_bytes == 0u || lines == 0u)
            return SystemP_FAILURE;
        if (src_stride < line_bytes || dst_stride < line_bytes)
            return SystemP_FAILURE;
    
        CSL_UdmapTR15 *tr = (CSL_UdmapTR15 *)UdmaUtils_getTrpdTr15Pointer(g_trpd, 0U);
        fill_tr15_2d(tr, src, dst, line_bytes, lines, (int32_t)src_stride, (int32_t)dst_stride);
    
        uint32_t src_span = (lines - 1u) * src_stride + line_bytes;
        uint32_t dst_span = (lines - 1u) * dst_stride + line_bytes;
    
        CacheP_wb((void*)src, src_span, CacheP_TYPE_ALLD);
        CacheP_inv(dst, dst_span, CacheP_TYPE_ALLD);
    
        int32_t ret = queue_trpd_and_wait(g_trpd, timeout_ms);
        if (ret != SystemP_SUCCESS) return ret;
    
        CacheP_inv(dst, dst_span, CacheP_TYPE_ALLD);
        return SystemP_SUCCESS;
    }

    Could you please help to improve my code speed?

  • Hello Pu Jia,

    • On which core are you running the above code? (A53, DM R5F, or MCU R5F)

    • If it is running on the A53 core, please confirm the OS being used — Linux or baremetal?

    • From the code flow, it looks more like custom code rather than the standard MCU+ SDK examples. Can you confirm?

    • Are you building and running this in Release mode or Debug mode?

    • How exactly are you taking the performance measurements — where are you starting and stopping the timers?

    • In your test case, which timer are you using in the setup, e.g. the PMU or the Generic Timer?

    Regards,

    Anil.

  • Hello Team

    This is the DM R5 core example.

    The example is located under the MCU+ SDK drivers/dss (display share). I think it uses FreeRTOS.

    The code is my customized memcpy function using UDMA.

    It's Release mode.

    For where I start/stop the timer, please see the top of my post.

    I don't know which timer; I just use the function ClockP_getTimeUsec().

  • Hello Pu Jia,

    If you are using custom code, then it is difficult for me to review your code flow without knowing where exactly you are starting and stopping the timer.

    In the custom UDMA code, how are you differentiating the flow from the MCU+SDK reference flow? Please clarify that part.

    For measurement, I suggest you try using the MCU+SDK application below:
    • Start the timer before the Udma_ringQueueRaw() call.
    • Stop the timer after the CacheP_inv() call.

    This sequence will give proper results. I have also measured in this way, and the numbers are consistent.

    C:\ti\mcu_plus_sdk_am62px_11_01_00_16\examples\drivers\udma\udma_memcpy_interrupt\am62px-sk

    Regards,

    Anil.

  • Hello Team

    You can try replacing lines 231~233 with my code in void DispApp_splashThread(void *args):

                endMs = (ClockP_getTimeUsec() / 1000U);
                DebugP_log("Splash -> Generation 1 frame buffer finish: %u ms\r\n", (endMs - gBoot2SplashStartMs));
                {   // copy second frame
                    uint8_t *dst  = (uint8_t*)&gFirstPipelineFrameBuf[1];
                    uint8_t *src  = (uint8_t*)&gFirstPipelineFrameBuf[0];
                    size_t   bytes = 1920*1200*4U;
                    uint32_t startMs = (ClockP_getTimeUsec() / 1000U);
                    memcpy(dst, src, bytes);
                    endMs = (ClockP_getTimeUsec() / 1000U);
                    DebugP_log("Splash -> memcpy 2 buffer time endMs-startMs: %u ms\r\n", (endMs - startMs));
                    /* Ensure DDR visibility for the second frame */
                    CacheP_wb(dst, (uint32_t)bytes, CacheP_TYPE_ALLD);
                }
                endMs = (ClockP_getTimeUsec() / 1000U);
                DebugP_log("Splash -> Gen 2 buffer finished: %u ms\r\n", (endMs - gBoot2SplashStartMs));

    and try it; the memcpy takes about 120 ms, that's confirmed.
    So I tried to use the UDMA copy, and it takes about 40 ms to copy about 8.7 MB of data.
    That's why I am looking for help from your side. Below is my UDMA copy main code area; could you please check why my code is so much slower than yours?
    static inline void fill_tr15_linear(CSL_UdmapTR15 *pTr, const void *src, void *dst, uint32_t length)
    {
        /* Generate completion event; end-of-packet set. */
        pTr->flags    = CSL_FMK(UDMAP_TR_FLAGS_TYPE,          CSL_UDMAP_TR_FLAGS_TYPE_4D_BLOCK_MOVE_REPACKING_INDIRECTION);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_STATIC,        0U);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_EOL,           CSL_UDMAP_TR_FLAGS_EOL_MATCH_SOL_EOL);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_EVENT_SIZE,    CSL_UDMAP_TR_FLAGS_EVENT_SIZE_COMPLETION);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_TRIGGER0,      CSL_UDMAP_TR_FLAGS_TRIGGER_NONE);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_TRIGGER0_TYPE, CSL_UDMAP_TR_FLAGS_TRIGGER_TYPE_ALL);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_TRIGGER1,      CSL_UDMAP_TR_FLAGS_TRIGGER_NONE);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_TRIGGER1_TYPE, CSL_UDMAP_TR_FLAGS_TRIGGER_TYPE_ALL);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_CMD_ID,        0x00U);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_SA_INDIRECT,   0U);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_DA_INDIRECT,   0U);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_EOP,           1U);
    
        pTr->icnt0    = length;
        pTr->icnt1    = 1U;
        pTr->icnt2    = 1U;
        pTr->icnt3    = 1U;
        pTr->dim1     = (int32_t)pTr->icnt0;
        pTr->dim2     = (int32_t)(pTr->icnt0 * pTr->icnt1);
        pTr->dim3     = (int32_t)(pTr->icnt0 * pTr->icnt1 * pTr->icnt2);
        pTr->addr     = (uint64_t)Udma_defaultVirtToPhyFxn((void *)src, 0U, NULL);
    
        pTr->dicnt0   = pTr->icnt0;
        pTr->dicnt1   = 1U;
        pTr->dicnt2   = 1U;
        pTr->dicnt3   = 1U;
        pTr->ddim1    = (int32_t)pTr->dicnt0;
        pTr->ddim2    = (int32_t)(pTr->dicnt0 * pTr->dicnt1);
        pTr->ddim3    = (int32_t)(pTr->dicnt0 * pTr->dicnt1 * pTr->dicnt2);
        pTr->daddr    = (uint64_t)Udma_defaultVirtToPhyFxn((void *)dst, 0U, NULL);
    
        pTr->fmtflags = 0x00000000U;
    }
    
    static inline void fill_tr15_2d(CSL_UdmapTR15 *pTr,
                                    const void *src, void *dst,
                                    uint32_t line_bytes, uint32_t lines,
                                    int32_t src_stride, int32_t dst_stride)
    {
        pTr->flags    = CSL_FMK(UDMAP_TR_FLAGS_TYPE,          CSL_UDMAP_TR_FLAGS_TYPE_4D_BLOCK_MOVE_REPACKING_INDIRECTION);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_STATIC,        0U);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_EOL,           CSL_UDMAP_TR_FLAGS_EOL_MATCH_SOL_EOL);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_EVENT_SIZE,    CSL_UDMAP_TR_FLAGS_EVENT_SIZE_COMPLETION);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_TRIGGER0,      CSL_UDMAP_TR_FLAGS_TRIGGER_NONE);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_TRIGGER0_TYPE, CSL_UDMAP_TR_FLAGS_TRIGGER_TYPE_ALL);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_TRIGGER1,      CSL_UDMAP_TR_FLAGS_TRIGGER_NONE);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_TRIGGER1_TYPE, CSL_UDMAP_TR_FLAGS_TRIGGER_TYPE_ALL);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_CMD_ID,        0x00U);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_SA_INDIRECT,   0U);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_DA_INDIRECT,   0U);
        pTr->flags   |= CSL_FMK(UDMAP_TR_FLAGS_EOP,           1U);
    
        pTr->icnt0    = line_bytes;
        pTr->icnt1    = lines;
        pTr->icnt2    = 1U;
        pTr->icnt3    = 1U;
        pTr->dim1     = src_stride;
        pTr->dim2     = (int32_t)(src_stride * (int32_t)pTr->icnt1);
        pTr->dim3     = (int32_t)(src_stride * (int32_t)pTr->icnt1 * (int32_t)pTr->icnt2);
        pTr->addr     = (uint64_t)Udma_defaultVirtToPhyFxn((void *)src, 0U, NULL);
    
        pTr->dicnt0   = pTr->icnt0;
        pTr->dicnt1   = pTr->icnt1;
        pTr->dicnt2   = 1U;
        pTr->dicnt3   = 1U;
        pTr->ddim1    = dst_stride;
        pTr->ddim2    = (int32_t)(dst_stride * (int32_t)pTr->dicnt1);
        pTr->ddim3    = (int32_t)(dst_stride * (int32_t)pTr->dicnt1 * (int32_t)pTr->dicnt2);
        pTr->daddr    = (uint64_t)Udma_defaultVirtToPhyFxn((void *)dst, 0U, NULL);
    
        pTr->fmtflags = 0x00000000U;
    }
    int32_t udma_memcpy_2d(void *dst, const void *src,
                           uint32_t line_bytes, uint32_t lines,
                           uint32_t src_stride, uint32_t dst_stride,
                           uint32_t timeout_ms)
    {
        if (g_ch == NULL || dst == NULL || src == NULL)
            return SystemP_FAILURE;
        if (line_bytes == 0u || lines == 0u)
            return SystemP_FAILURE;
        if (src_stride < line_bytes || dst_stride < line_bytes)
            return SystemP_FAILURE;
    
        CSL_UdmapTR15 *tr = (CSL_UdmapTR15 *)UdmaUtils_getTrpdTr15Pointer(g_trpd, 0U);
        fill_tr15_2d(tr, src, dst, line_bytes, lines, (int32_t)src_stride, (int32_t)dst_stride);
    
        uint32_t src_span = (lines - 1u) * src_stride + line_bytes;
        uint32_t dst_span = (lines - 1u) * dst_stride + line_bytes;
    
        CacheP_wb((void*)src, src_span, CacheP_TYPE_ALLD);
        CacheP_inv(dst, dst_span, CacheP_TYPE_ALLD);
    
        int32_t ret = queue_trpd_and_wait(g_trpd, timeout_ms);
        if (ret != SystemP_SUCCESS) return ret;
    
        CacheP_inv(dst, dst_span, CacheP_TYPE_ALLD);
        return SystemP_SUCCESS;
    }
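
    For reference, a minimal timing harness around the udma_memcpy_2d() helper above might look like the sketch below (my own illustration; the frame-buffer pointers are the ones discussed earlier in this thread, and the helper's internal cache maintenance is included in the measured time):

        /* Hypothetical usage: copy one 1920x1200 ARGB8888 frame (9,216,000 bytes,
         * ~8.8 MiB) with the 2D UDMA helper above, time it, and report the effective
         * bandwidth. 1 byte per microsecond equals 1 MB/s (decimal), so a 40 ms copy
         * of this frame corresponds to roughly 230 MB/s. */
        uint8_t *dst = (uint8_t *)&gFirstPipelineFrameBuf[1];   /* frame buffers from the display-share example */
        uint8_t *src = (uint8_t *)&gFirstPipelineFrameBuf[0];
        uint32_t lineBytes = 1920U * 4U;                         /* 7680 bytes per line */
        uint32_t lines     = 1200U;
        uint32_t totalB    = lineBytes * lines;

        uint64_t t1 = ClockP_getTimeUsec();                      /* just before the transfer            */
        int32_t  st = udma_memcpy_2d(dst, src, lineBytes, lines,
                                     lineBytes, lineBytes, 100U /* timeout, ms */);
        uint64_t t2 = ClockP_getTimeUsec();                      /* after completion + cache invalidate */

        if (st == SystemP_SUCCESS)
        {
            uint32_t elapsedUs = (uint32_t)(t2 - t1);
            DebugP_log("UDMA copy: %u bytes in %u us (~%u MB/s)\r\n",
                       totalB, elapsedUs, totalB / elapsedUs);
        }
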
  • Hello,

    To review and provide accurate feedback, we would kindly request the following:
    1. Full Project for Review
    Please share the complete project. With partial code, it is difficult for us to validate whether the timer is started/stopped at the correct locations or whether there are hidden dependencies in your setup.
    2. Timer Placement
    • The timer should be started immediately before queuing the transfer (before queue_trpd_and_wait).
    • The timer should be stopped immediately after the cache invalidate (CacheP_inv) or once the transfer completion is confirmed.
    This way, the measurement reflects the true transfer latency observed by the CPU.
    3. Clarification on queue_trpd_and_wait
     Could you please confirm whether this function only returns after DMA completion, or whether it supports polling mode or interrupt mode? Sharing the implementation would help us confirm.
    We understand that 71 ms was observed for a ~5MB test case, but not for the 8.6 MB transfer.

    Quoting Meet's earlier reply: "I am getting 71 ms for 5529600 bytes. In your code snippet I also see a DebugP_log call; CacheP_wb and some other operations are also included in this 140 ms time. If you want to measure accurate timing, you should take time stamps immediately before and after your memcpy instruction."

    Regards,

    Anil.

  • Yes. For the 6 MB test it was 71 ms, and 8.6 MB should be about 110 ms.

    So we need to try UDMA to do this?

  • Yes. If you go with DMA, you may get results close to the values below:

    • 5 MB →  ~10.49 ms 
    • 8.6 MB → ~18.0 ms 

    Regards,

    Anil.

  • Hello, could you please share DMA copy code with me? You can see my DMA copy code above. How can I improve my code?

  • Hello Pu Jia,

    Please use the example below.

    C:\ti\mcu_plus_sdk_am62px_11_01_00_16\examples\drivers\udma\udma_memcpy_interrupt\am62px-sk

    Configure the source and destination buffers to your required size.

    Before the DMA starts, read the timer to get the T1 value.

    Read the timer again to get the T2 value after the CacheP_inv() call.

    Finally, do T2 - T1.

    The testing needs to be done in the release build.

    Please look at the image below to take the T1 and T2 values.

    Regards,

    Anil.

  • Hello,

    The difference between my code and the example is that I set the CSL_UdmapTR15 fields as:

        pTr->icnt0    = line_bytes;
        pTr->icnt1    = lines;

    Do you know the limitation on pTr->icnt0 = length?
  • Hello Pu Jia,

    The ICNT0 length is a maximum of 64 KB only.

    So, for your requirement to transfer 5 MB and 9 MB, we need to configure both icnt0 and icnt1.

    Make sure that the testing is done in the release build only.

    Regards,

    Anil.

  • Hello

    If I use ICNT1/2/3, how do I handle the rest of the data?

    That means, if I want to copy 64K*15 + 15K of data, do I set ICNT0 = 64K, ICNT1 = 15, plus a second transfer with ICNT0 = 15K, ICNT1 = 1?

  • Hello Pu Jia,

    Yes, the above method works.

    Each ICNT value has a maximum of 64 KB, so you can configure the counts accordingly; one way to split a larger transfer is shown in the sketch below.
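
    To make the splitting concrete, here is a minimal sketch (my own illustration, not SDK code) that breaks a linear copy into fixed-size chunks plus a tail transfer, reusing the udma_memcpy_2d() helper posted earlier in this thread; the chunk size is kept well below the ICNT0 limit discussed above:

        /* Hypothetical decomposition: express the bulk of the copy as 'fullChunks'
         * lines of CHUNK bytes (icnt0 = CHUNK, icnt1 = fullChunks) and copy any
         * remainder as a second transfer (icnt0 = tail, icnt1 = 1). */
        #define CHUNK  (32U * 1024U)

        int32_t udma_memcpy_linear(void *dst, const void *src,
                                   uint32_t total, uint32_t timeout_ms)
        {
            uint32_t fullChunks = total / CHUNK;
            uint32_t tail       = total % CHUNK;
            int32_t  ret        = SystemP_SUCCESS;

            if (fullChunks > 0U)
            {
                /* Contiguous data viewed as a 2D block: 'fullChunks' lines of CHUNK bytes. */
                ret = udma_memcpy_2d(dst, src, CHUNK, fullChunks, CHUNK, CHUNK, timeout_ms);
            }
            if ((ret == SystemP_SUCCESS) && (tail > 0U))
            {
                uint32_t off = fullChunks * CHUNK;
                ret = udma_memcpy_2d((uint8_t *)dst + off, (const uint8_t *)src + off,
                                     tail, 1U, tail, tail, timeout_ms);
            }
            return ret;
        }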

    Regards,

    Anil.

  • Hello

    Is the maximum size of ICNT 64K or 64K - 1?

  • Hello Pu Jia,

    Did you face any problem with ICNT0 = 64K?

    The maximum size of ICNT0 is 64K.

    Regards,

    Anil.

  • Sorry, my AI always told me that the count should be 64K - 1, so I wanted to confirm it with you.

  • Hello Pu Jia,

    Please also confirm this with testing.

    Configure icnt0 to 64K or 63K and confirm the maximum size limit in your testing.

    Regards,

    Anil.