DM8148/8127: EDMA3 data sorting (missing transfer completion)

Hello TI team,

I'm working with a DM8148 on a customer board. I need to run an algorithm on the C64x DSP core and, to improve performance, the data sorting must be done by EDMA.
My image has XGA resolution: 1024 pixels x 768 lines, with 16 bits per pixel, so the line pitch is 2048 bytes.
The image is split into 96 slices of 1024 pixels x 8 lines each, so that each slice can be processed in SRAM.
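
The geometry above can be cross-checked with a few lines of C. This is only a sketch of the arithmetic; the macro and function names are illustrative and do not come from CSL or from the original code.

```c
#include <stdint.h>

/* Image geometry as stated in the post; all names here are illustrative. */
#define IMG_WIDTH_PX     1024u  /* pixels per line       */
#define IMG_HEIGHT_LINES 768u   /* lines per image (XGA) */
#define BYTES_PER_PIXEL  2u     /* 16-bit pixels         */
#define SLICE_LINES      8u     /* lines per slice       */

/* Bytes from the start of one image line to the start of the next. */
static uint32_t line_pitch_bytes(void)
{
    return IMG_WIDTH_PX * BYTES_PER_PIXEL;
}

/* Number of 8-line slices in one image. */
static uint32_t slice_count(void)
{
    return IMG_HEIGHT_LINES / SLICE_LINES;
}

/* Size of one slice in bytes, i.e. the SRAM buffer needed per slice. */
static uint32_t slice_bytes(void)
{
    return line_pitch_bytes() * SLICE_LINES;
}
```

With these numbers one slice occupies 16384 bytes, which matches a destination of 512 lines x 16 elements x 2 bytes.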

Source image slice in SDRAM (1024 elements per line, 8 lines, 16 bits per element):

L[0]P[0], L[0]P[1], L[0]P[2], L[0]P[3], ... L[0]P[1023]
L[1]P[0], L[1]P[1], L[1]P[2], L[1]P[3], ... L[1]P[1023]
L[2]P[0], L[2]P[1], L[2]P[2], L[2]P[3], ... L[2]P[1023]
....
L[7]P[0], L[7]P[1], L[7]P[2], L[7]P[3], ... L[7]P[1023]

must be transposed into the destination slice in SRAM/L2 (16 elements per line, 512 lines, 16 bits per element):

L[0]P[0], L[0]P[1], L[1]P[0], L[1]P[1], L[2]P[0], L[2]P[1], L[3]P[0], L[3]P[1], ... , L[7]P[0], L[7]P[1]
L[0]P[2], L[0]P[3], L[1]P[2], L[1]P[3], L[2]P[2], L[2]P[3], L[3]P[2], L[3]P[3], ... , L[7]P[2], L[7]P[3]
...................................................................................

where L[x]P[y]:
x - line number in slice
y - pixel number in line
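
For comparison with the EDMA output, the same reordering can be written as a plain CPU loop. This is a minimal reference sketch under the layout described above (pairs of pixels from each of the 8 lines interleaved into 16-element destination rows). The function and macro names are hypothetical; it is meant for verifying results, not as a replacement for the DMA.

```c
#include <stddef.h>
#include <stdint.h>

#define SLICE_WIDTH   1024u                         /* pixels per source line  */
#define SLICE_LINES   8u                            /* source lines per slice  */
#define PIX_PER_GROUP 2u                            /* pixels copied together  */
#define DST_ROW_ELEMS (SLICE_LINES * PIX_PER_GROUP) /* 16 elements per dst row */

/* Reference reordering: destination row r holds pixels 2r and 2r+1
 * of source lines 0..7, in line order. */
static void sort_slice_reference(const uint16_t *src, uint16_t *dst)
{
    for (size_t row = 0; row < SLICE_WIDTH / PIX_PER_GROUP; ++row) {
        for (size_t line = 0; line < SLICE_LINES; ++line) {
            for (size_t k = 0; k < PIX_PER_GROUP; ++k) {
                dst[row * DST_ROW_ELEMS + line * PIX_PER_GROUP + k] =
                    src[line * SLICE_WIDTH + row * PIX_PER_GROUP + k];
            }
        }
    }
}
```

Filling the source with a known pattern (for example, each element equal to its own index) and comparing the DMA result against this function shows which elements, if any, land in the wrong place.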


I use CSL to program the EDMA with the following parameters:

    VOLATILE UInt32 opt= CSL_EDMA_OPT_MAKE
         (
            CSL_EDMACC_OPT_ITCCHEN_ENABLE,
            CSL_EDMACC_OPT_TCCHEN_DISABLE,
            CSL_EDMACC_OPT_ITCINTEN_DISABLE,
            CSL_EDMACC_OPT_TCINTEN_ENABLE,
            chnlTcc,
            CSL_EDMACC_OPT_TCCMODE_NORMAL,
            CSL_EDMACC_OPT_FWID_8,
            CSL_EDMACC_OPT_STATIC_NORMAL,
            CSL_EDMACC_OPT_SYNCDIM_ABSYNC,
            CSL_EDMACC_OPT_DAM_INCR,
            CSL_EDMACC_OPT_SAM_INCR
         );

    param_channel[0] = opt;
    param_channel[1] = (UInt32)pSrc;
    /* BCNT: line size in 32-bit words = 512, ACNT: one 32-bit word */
    param_channel[2] = ((UInt32)(sizeX >> 1) << 16) | sizeof(Int32);
    param_channel[3] = (UInt32)pDst;
    /* DSTBIDX: 32 elements, SRCBIDX: 1 element */
    param_channel[4] = 0x00200001u;
    /* BCNTRLD: no reload, LINK: NULL link */
    param_channel[5] = 0x0000FFFFu;
    /* DSTCIDX: 1 element, SRCCIDX: 2048 */
    param_channel[6] = 0x00010000u | ((sizeX >> 1) << 4);
    /* CCNT: 8 frames */
    param_channel[7] = 0x00000008u;
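
To compare what is actually written into the PaRAM set with the intent stated in the comments, the eight words can be unpacked back into their fields. The sketch below assumes the standard EDMA3 PaRAM layout (two 16-bit fields per word, low field in bits 15:0). The struct and function names are hypothetical and are not part of CSL.

```c
#include <stdint.h>

/* Decoded view of one EDMA3 PaRAM set (hypothetical helper type). */
typedef struct {
    uint16_t acnt, bcnt, ccnt;
    int16_t  srcbidx, dstbidx;  /* signed byte offsets between arrays */
    int16_t  srccidx, dstcidx;  /* signed byte offsets between frames */
    uint16_t bcntrld, link;
} param_fields_t;

/* Sign-extend the low 16 bits of v without relying on
 * implementation-defined narrowing conversions. */
static int16_t s16(uint32_t v)
{
    v &= 0xFFFFu;
    return (int16_t)(v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v);
}

/* Unpack the 8 PaRAM words (w[0] = OPT ... w[7] = CCNT) into fields. */
static param_fields_t decode_param(const uint32_t w[8])
{
    param_fields_t f;
    f.acnt    = (uint16_t)(w[2] & 0xFFFFu);
    f.bcnt    = (uint16_t)(w[2] >> 16);
    f.srcbidx = s16(w[4]);
    f.dstbidx = s16(w[4] >> 16);
    f.link    = (uint16_t)(w[5] & 0xFFFFu);
    f.bcntrld = (uint16_t)(w[5] >> 16);
    f.srccidx = s16(w[6]);
    f.dstcidx = s16(w[6] >> 16);
    f.ccnt    = (uint16_t)(w[7] & 0xFFFFu);
    return f;
}
```

For sizeX = 1024, the words in the listing above decode to ACNT = 4, BCNT = 512, SRCBIDX = 1, DSTBIDX = 32, SRCCIDX = 8192, DSTCIDX = 1 and CCNT = 8. EDMA3 treats the index fields as signed byte offsets, so printing the decoded values next to the intended ones makes any mismatch visible.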

The EDMA physical channel used (chnlTcc) is 0.
The transfer of a single slice (1024 pixels x 8 lines) is triggered manually by setting bit 0 in ESR.
Transfer completion is detected by polling bit 0 in the IPR register.
After the transfer completes, I set bit 0 in the ICR register.
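
The trigger / poll / clear sequence described above can be sketched as follows. This is not the original wait function: the register struct is a hypothetical stand-in for the real memory-mapped ESR, IPR and ICR registers, and the bounded poll count is an added assumption so that a missing completion is reported instead of blocking forever.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the three registers used here. On hardware these
 * would be the memory-mapped EDMA3CC registers, not a plain struct. */
typedef struct {
    volatile uint32_t ESR;  /* event set register         */
    volatile uint32_t IPR;  /* interrupt pending register */
    volatile uint32_t ICR;  /* interrupt clear register   */
} edma_regs_t;

/* Manually trigger channel ch (0..31), then poll its IPR bit at most
 * max_polls times. Returns true and clears the pending bit on completion,
 * false if the completion bit never appeared. */
static bool edma_trigger_and_wait(edma_regs_t *regs, unsigned ch,
                                  uint32_t max_polls)
{
    const uint32_t mask = 1u << ch;

    regs->ESR = mask;               /* manual trigger */
    while (max_polls-- > 0u) {
        if (regs->IPR & mask) {
            regs->ICR = mask;       /* acknowledge completion */
            return true;
        }
    }
    return false;                   /* timed out */
}
```

On a timeout, the caller can dump the channel's PaRAM set and the controller's error and status registers at the moment of the hang, which is usually more informative than an endless wait.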

The transfer to SRAM goes through the L3 memory space: an offset of 0x30000000 is added to the destination pointer.
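
The address adjustment amounts to a single addition. In this sketch the macro and function names are illustrative, and the example local address 0x10800380 is only inferred by subtracting the offset from the destination address 0x40800380 that appears in the log.

```c
#include <stdint.h>

/* Offset added to the destination pointer, as stated in the post. */
#define L3_OFFSET 0x30000000u

/* Convert a DSP-side destination address into the address seen by the
 * EDMA through the L3 memory space (illustrative helper). */
static uint32_t to_l3_address(uint32_t local_addr)
{
    return local_addr + L3_OFFSET;
}
```

For example, to_l3_address(0x10800380u) yields 0x40800380, the destination printed for each slice.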

I noticed that after a few slices transfer successfully (slices 0 to 2 in the log below), the wait function hangs forever polling the bit in the IPR register:
....
 [c6xdsp ] img slice rotate from a8810c80 to 40800380 [line size = 1024 pixels]
 [c6xdsp ] wait for transfer completion of slice 0
 [c6xdsp ] transfer of slice 0 completed
 [c6xdsp ] img slice rotate from a8812c80 to 40800380 [line size = 1024 pixels]
 [c6xdsp ] wait for transfer completion of slice 1
 [c6xdsp ] transfer of slice 1 completed
 [c6xdsp ] img slice rotate from a8814c80 to 40800380 [line size = 1024 pixels]
 [c6xdsp ] wait for transfer completion of slice 2
 [c6xdsp ] transfer of slice 2 completed
 [c6xdsp ] img slice rotate from a8816c80 to 40800380 [line size = 1024 pixels]
 [c6xdsp ] wait for transfer completion of slice 3
....

Could you please tell me the reason for this problem? Do the transfer parameters look OK?

Thank you very much and best regards.