Hi,
I'm using a TMS320C6455 and trying to transpose a matrix. The original matrix has 512 lines of 16 elements each, and all elements are of type short.
I tried both an EDMA3 method and a for-loop method, and both produce the correct transposed result. However, EDMA3 is slower: it takes about 80000 cycles, while the for-loop takes about 60000 cycles.
I don't understand why EDMA3 is slower here, because it was much faster than a for-loop in a block-move task.
Here are the code snippets I'm using. How can I improve the EDMA3 performance?
(1) EDMA3 configuration code:
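int temp = 0;
DMA_EMCR = 0xFFFFFFFF;                  /* clear any pending event-missed flags */
DCHMAP0 = 0x0;                          /* channel 0 uses PaRAM set 0 */
DMAQNUM0 = 0x0;                         /* channel 0 submits to event queue 0 */
PaRAM0_OPT = 0x00900004;                /* ITCCHEN (chain to TCC = 0 after each frame), TCINTEN, SYNCDIM = AB-sync */
PaRAM0_SRC = srcAddr;
PaRAM0_BCNT_ACNT = 0x00100002;          /* BCNT = 16 arrays, ACNT = 2 bytes (one short) */
PaRAM0_DST = dstAddr;
PaRAM0_DSTBIDX_SRCBIDX = 0x04000002;    /* DSTBIDX = 1024 bytes (512 shorts), SRCBIDX = 2 bytes */
PaRAM0_BCNTRLD_LINK = 0xFFFF;           /* BCNTRLD = 0, LINK = 0xFFFF (NULL link) */
PaRAM0_DSTCIDX_SRCCIDX = 0x00020020;    /* DSTCIDX = 2 bytes, SRCCIDX = 32 bytes (one 16-short line) */
PaRAM0_RSV_CCNT = 0x200;                /* CCNT = 512 frames, one per source line */
DMA_EESR = 0x01;                        /* enable event 0 */
DMA_ESR = 0x01;                         /* manually trigger the first frame */
while (temp == 0)                       /* wait for transfer completion (IPR bit 0) */
{
    temp = (*(volatile int *) DMA_IPR) & 0x01;
}
DMA_ICR = 0x01;                         /* clear the completion flag */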
(2) FOR-LOOP code:
for (i=0; i<512; i++)           /* source line i */
{
    for (j=0; j<16; j++)        /* element j of that line */
    {
        newData[j*512+i] = oldData[i*16+j];
    }
}
Thanks!
Ricky