[TDA4M] Why doesn't "Multi CH DDR 1MB to DDR 1MB" seem faster than "1CH DDR 1MB to DDR 1MB"?

Hello

I clicked the "Resolved" button on my previous question by mistake.

    - Link: http://e2e.ti.com/support/processors/f/791/t/837816

I have attached the summarized DRU and UDMA throughput results again.

I received a message from Sivaraj.

I'm sorry, but I don't fully understand the first answer. The second answer is clear.

Here is what I understand from the first answer:

    - UDMA and DRU can run one task on one channel. (Two tasks need two channels.)

    - In other words, if the DMAs (UDMA, DRU) use two channels, there are two tasks.

    - In the first answer, the cache is located in the MSMC. I have also attached the MSMC block diagram (Figure 8-3 in the TRM).

    - If the DRU runs task "a" to transfer 1MB from/to DDR, the datapath would be DDR (read) -> cache (in MSMC) -> DDR (write).

    - But if the DRU runs task "b" to transfer 2MB from/to DDR before task "a" completes, a cache miss will occur in task "a".

    - The DMA will then read the data into the cache again and write it to DDR. An interrupt will also occur at that point.

    - Hence, a DRU transfer using multiple channels is not efficient, and the CPU load could be high because of the many interrupts.

Could you please check whether my understanding above is correct?

It would also be really helpful if you could explain in more detail.

Best regards

Yongsig.

  • Hi,

    I think you got confused when I mentioned "task" in my response.

    The task that I mentioned is the BIOS software task that the UDMA unit test application creates to schedule the DMA operation.

    And the cache misses I mentioned are the CPU program and data cache misses that occur when multiple software ("SW") tasks run on one CPU. They happen because of context switching, since each task uses different data and stack memory.

    This is not about how the DRU/UDMA schedules transfers across channels. When a multi-channel transfer is done, the DRU/UDMA interleaves the transfers based on priority, so they effectively run in pseudo-parallel.

    Hope this clarifies things.

    Regards

    Sivaraj R
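The priority-based interleaving described in the reply above can be pictured with a toy scheduler. This is a simplified mental model, not the DRU/UDMA hardware arbiter: each step moves one fixed-size burst, higher-priority channels (a lower number, by assumption in this sketch) are drained first, and channels of equal priority take turns round-robin. The function name, burst size, and priority convention are all hypothetical.

```python
from collections import deque


def interleave(channels, burst_bytes):
    """Return the order in which bursts are serviced (toy model only).

    channels: list of (name, priority, total_bytes); a lower priority number
    is served first (an assumption of this sketch, not hardware behavior).
    """
    remaining = {name: total for name, _, total in channels}
    # Group channels by priority; each group is served round-robin.
    groups = {}
    for name, prio, _ in channels:
        groups.setdefault(prio, deque()).append(name)

    order = []
    for prio in sorted(groups):
        queue = groups[prio]
        while queue:
            name = queue.popleft()
            chunk = min(burst_bytes, remaining[name])
            remaining[name] -= chunk
            order.append(name)
            # Re-queue the channel at the back if it still has data to move.
            if remaining[name] > 0:
                queue.append(name)
    return order


# Two equal-priority channels alternate bursts ("pseudo-parallel").
print(interleave([("ch0", 0, 4), ("ch1", 0, 4)], burst_bytes=2))
```

Because every burst in this model still passes through the same memory interface, interleaving shares the available bandwidth among channels rather than multiplying it.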

  • Thank you for your reply.
    I think "task" and "context switching" are clear now.
    I would appreciate it if you could check the following again.
       1. Do you mean that one task corresponds to one API?

       2. A DRU or UDMA can transfer data over multiple channels or a single channel in one API. Right?
           (like udma_chaining_test in the PDK)

       3. "Cache Tag Bank_x" in the MSMC is not a data cache but tag memory used for tagging the cache.
       4. In the summarized DRU throughput table, "Multi CH DDR 1MB to DDR 1MB" has a very high CPU load.
           I think the reason is that there are many interrupts and context switches.
           But I was wondering why there is much more context switching with the DRU than with the UDMA.
    Best regards,
    Yongsig.
  • 1. Task here means a BIOS task. Kindly refer to the BIOS user guide or general OS concepts to understand tasks. A task can contain a sequence of multiple API calls.

    2. To support multiple channels, you need to open/create multiple DMA channels. This can be done from one task or from multiple tasks; how you manage it is up to your application. In the test application, we created one task per channel so that the channels can run asynchronously.

    3. Yes

    4. There is more context switching because there are more interrupts with the DRU, as the DRU throughput is higher.
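Point 4 can be made concrete with back-of-the-envelope arithmetic. Assuming one completion interrupt per 1MB transfer and a fixed CPU cost per interrupt (ISR plus task switch), the interrupt rate is throughput divided by transfer size, and the CPU load is that rate times the per-interrupt cost. The throughput figures and the 20-microsecond cost below are illustrative assumptions, not values from the attached tables.

```python
MB = 1 << 20  # bytes


def interrupt_rate_hz(throughput_bytes_per_s, bytes_per_interrupt):
    """Completion interrupts per second if each transfer raises one interrupt."""
    return throughput_bytes_per_s / bytes_per_interrupt


def cpu_load_percent(throughput_bytes_per_s, bytes_per_interrupt, cost_per_interrupt_s):
    """Percent of CPU time spent on completion handling (assumed fixed cost each)."""
    rate = interrupt_rate_hz(throughput_bytes_per_s, bytes_per_interrupt)
    return 100.0 * rate * cost_per_interrupt_s


# Illustrative throughputs only; 20e-6 s is an assumed per-interrupt cost.
for label, throughput in [("slower DMA", 1000 * MB), ("faster DMA", 4000 * MB)]:
    rate = interrupt_rate_hz(throughput, 1 * MB)
    load = cpu_load_percent(throughput, 1 * MB, 20e-6)
    print(f"{label}: {rate:.0f} interrupts/s, ~{load:.1f}% CPU")
```

Under this model, four times the throughput at the same 1MB transfer size gives four times the interrupt rate and roughly 2% versus 8% completion-handling load; larger transfers per interrupt would lower it.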

  • Thanks Sivaraj.

    I think task, context switching, and cache miss are clear now.

    Finally, I have one more question.

    Could you explain the following points from your previous reply in more detail?

         - "In case of DRU, the DDR and CPU was already loaded"

         - "In case of UDMA, the DDR was not fully utilized with 1CH"

    Best regards

    Yongsig.

  • The DRU in the system can deliver very high throughput with DDR as well as with MSMC RAM; this is how the system is designed. Hence we get the maximum DDR throughput, and the DDR is fully loaded. Since data transfer is faster with the DRU, each 1MB transfer completes sooner and the CPU is interrupted at a higher rate, which results in a higher CPU load.

    By design, a single-channel UDMA transfer doesn't provide maximum throughput. We get better throughput when multiple channels are used, but even then it won't be as good as a DRU transfer.
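The two quoted points ("the DDR and CPU was already loaded" for the DRU, "the DDR was not fully utilized with 1CH" for the UDMA) fit a simple saturation model: channels add throughput only until a shared limit, such as DDR bandwidth or the engine's own limit, is reached. This is a simplification for intuition only; the numbers and the separate UDMA engine limit are assumptions, and real per-channel rates and limits come from measurement.

```python
def aggregate_throughput(num_channels, per_channel_mb_s, ceiling_mb_s):
    """Simplified model: per-channel throughputs add up until a shared ceiling
    (for example DDR bandwidth or the DMA engine's own limit) is reached."""
    return min(num_channels * per_channel_mb_s, ceiling_mb_s)


# Hypothetical numbers in MB/s, chosen only to illustrate the two cases.
DDR_LIMIT = 10000
UDMA_ENGINE_LIMIT = 8000  # assumed engine ceiling below the DDR limit

# Case 1 (DRU-like): one channel already saturates DDR, so more channels add nothing.
print(aggregate_throughput(1, 10000, DDR_LIMIT), aggregate_throughput(4, 10000, DDR_LIMIT))

# Case 2 (UDMA-like): one channel reaches only part of the limit, so more
# channels help, but the total stays below the DRU-like case.
udma_ceiling = min(DDR_LIMIT, UDMA_ENGINE_LIMIT)
print(aggregate_throughput(1, 3000, udma_ceiling), aggregate_throughput(4, 3000, udma_ceiling))
```

In this model, the "Multi CH" DRU result cannot exceed the "1CH" result once one channel already hits the DDR ceiling, which matches the behavior asked about in the title.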