I am comparing EDMA data transfer performance between a C64x and a C66x device. The C64x project modifies the EDMA registers directly through CSL macros, while the C66x project uses the CSL functional layer driver calls, as recommended by TI. I am running a C6472 @ 700 MHz with DDR2 and a C6678 @ 1 GHz with DDR3.
Here is my driver design:
In C64x:
Transfer:
- set up the PARAM set registers directly
- set the corresponding bit in ESR to trigger the transfer
Wait for completion:
- poll the corresponding bit in the IPR register
- after transfer completes, set the corresponding bit in ICRH to clear the interrupt pending bit
In C66x:
Transfer:
- Create a local PARAM register struct (localParamSet) and set it up
- Call CSL_edma3MapDMAChannelToParamBlock() to map the DMA channel to the PARAM block
- Call CSL_edma3GetParamHandle() to obtain a handle to the PARAM set
- Call CSL_edma3ParamSetup() to copy localParamSet to the actual PARAM set registers
- Call CSL_edma3DMAChannelEnable() to enable the channel
- Call CSL_edma3SetDMAChannelEvent() to trigger the transfer
Wait for completion:
- Poll the corresponding bit by calling CSL_edma3GetHwStatus() with cmd = CSL_EDMA3_QUERY_INTRPEND
- Call CSL_edma3HwControl() with cmd = CSL_EDMA3_CMD_INTRPEND_CLEAR to clear the interrupt pending bit
- Call CSL_edma3ClearDMAChannelEvent() to clear channel event
- Call CSL_edma3DMAChannelDisable() to disable the channel
I have found that the overhead of the C66x driver is significant compared to the C64x one, especially for small transfers (smaller than 32 KB). The performance is also lower than I expected, given the faster DSP (1 GHz) and the faster DDR memory.
In the C66x test project, I'm using CC1 only. For the LL2-to-DDR test, a 1 KB transfer takes 0.618 us, but the function-call overhead to set up the PARAM set is 0.942 us, so the setup takes longer than the transfer itself.
I am wondering whether I have missed something in my setup that causes this, whether there is a way to speed it up, or whether nothing can be done about it. In the last case, should we conclude that EDMA should not be used for small transfers (32 KB or less) and that memcpy should be used instead?
Thanks,
-- Louis