2D to 2D block transfer from image with EDMA3

Hello

I am new to the DM6467, and I am writing simple code for an 8x8 block transfer from one image to another. I wrote the code below:

 

ipDataPtr = (unsigned char *) inBufs->descs[0].buf;
opDataPtr = (unsigned char *) outBufs->descs[0].buf;

blockWidth = 8;

for (i = 0; i < elem.NumLines; i += blockWidth) {
    for (j = 0; j < elem.LineLength; j += blockWidth) {
        tin = ipDataPtr + (i*ipWdthStp + j);
        tout = opDataPtr + (i*opWdthStp + j);
        VCA_ALGORITHMS_MT_doCopy2D2DBlock(obj->dmaHandle2D[0], tin, tout, &elem, blockWidth);
    }
}

Void VCA_ALGORITHMS_MT_doCopy2D1D(IDMA3_Handle dmaHandle, Void *in, Void *out, ElemStruct *elem)
{
    ACPY3_Params params;

    params.transferType = ACPY3_2D1D;
    params.srcAddr = (void *)in;
    params.dstAddr = (void *)out;
    params.elementSize = (elem->LineLength);
    params.numElements = (elem->NumLines);
    params.numFrames = 1;
    params.srcElementIndex = elem->srcLineOffset;
    //params.dstElementIndex = elem->dstLineOffset;
    params.srcFrameIndex = 0;
    params.dstFrameIndex = 0;

    /* Configure logical dma channel */
    ACPY3_configure(dmaHandle, &params, 0);

    /* Use DMA to fcpy input buffer into working buffer */
    ACPY3_start(dmaHandle);

    /* Check that dma transfer has completed before finishing "processing" */
    while (!ACPY3_complete(dmaHandle)) {
        ;
    };

    /*
     * DeActivate Channel scratch DMA channels.
     */
    // ACPY3_deactivate(dmaHandle);
}

Void VCA_ALGORITHMS_MT_doCopy2D2DBlock(IDMA3_Handle dmaHandle, Void *in, Void *out, ElemStruct *elem, int blockWidth)
{
    ACPY3_Params params;

    /*
     * Activate Channel scratch DMA channels.
     */
    // ACPY3_activate(dmaHandle);

    /* Configure the logical channel */
    params.transferType = ACPY3_2D2D;
    params.srcAddr = (void *)in;
    params.dstAddr = (void *)out;
    params.elementSize = blockWidth;
    params.numElements = blockWidth;
    params.numFrames = 1; //(elem->LineLength)*(elem->NumLines)/(blockWidth*blockWidth);
    params.srcElementIndex = elem->srcLineOffset;
    params.dstElementIndex = elem->dstLineOffset;
    params.srcFrameIndex = 0;
    params.dstFrameIndex = 0;

    /* Configure logical dma channel */
    ACPY3_configure(dmaHandle, &params, 0);

    /* Use DMA to fcpy input buffer into working buffer */
    ACPY3_start(dmaHandle);

    /* Check that dma transfer has completed before finishing "processing" */
    while (!ACPY3_complete(dmaHandle)) {
        ;
    };

    /*
     * DeActivate Channel scratch DMA channels.
     */
    // ACPY3_deactivate(dmaHandle);
}

 

Surprisingly, this code is working fine with 32 and 16 byte block transfer, but it's not working with 8x8 and 4x4 block.

I can't understand how this happens. Can anybody tell me why?

Thanks in advance.

  • Naresh850 said:
    this code is working fine with 32 and 16 byte block transfer, but it's not working with 8x8 and 4x4 block

    My first questions are:

    "working fine with 32 and 16 byte block transfer" - Exactly what does this mean? Are you doing these with the 1D1D transfer function above?

    "not working with 8x8 and 4x4 block" - What is not working? Does nothing get transferred? Not enough? To the wrong destination? From the wrong source?

    May I assume the image array is single byte objects?

    And additional questions:

    What is in the elem struct?

    What is the value of ipWdthStp?

    What does the call look like that is "working fine with 32 and 16 byte block transfer"?

    Sorry I do not have immediate answers. Nothing jumps out as wrong.

    If you set a breakpoint at the call to ACPY3_start(dmaHandle);, can you look at the PARAM for the QDMA channel to see what it looks like?

  • Thanks RandyP

    "working fine with 32 and 16 byte block transfer" - Exactly what does this mean? Are you doing these with the 1D1D transfer function above?

    No, it is a 2D2D transfer, and "working fine" means the code runs well: I can see the video frame in VLC. I am copying a video frame from the input buffer to the output buffer using EDMA. It is test code for EDMA. When I copy 16x16 and 32x32 blocks it works, but when my blockWidth is 8 or 4 it does not work.

    "not working with 8x8 and 4x4 block" - What is not working? Does nothing get transferred? Not enough? To the wrong destination? From the wrong source?

    I think the code is stuck somewhere. I do not get video on my VLC display.

    May I assume the image array is single byte objects?

    Yes, the image is made of single-byte objects and it is 704x576.

    What is in the elem struct?

    elem.LineLength = obj->frameWidth;    // 704
    elem.NumLines = obj->frameHeight;     // 576
    elem.srcLineOffset = obj->frameWidth;
    elem.dstLineOffset = obj->frameWidth;

    ipWdthStp = elem.LineLength;    //ipImgPtr->widthStep;
    opWdthStp = elem.LineLength;

    If you set a breakpoint at the call to ACPY3_start(dmaHandle);, can you look at the PARAM for the QDMA channel to see what it looks like?

    No, I cannot put a breakpoint there. I am using an OS to test this code.

    So is there an alignment issue? What values can SAM and DAM take in the OPT field of the PARAM?

    Thanks in advance.

    Regards,

    Naresh

  • Naresh,

    First, welcome to the E2E forum. I just noticed that you have now made two posts to the forum. I hope you find the TI support resources acceptable. Please also keep in mind the search capability within the E2E forum, and the app notes and user guides on www.ti.com, and the helpful articles on the TI Wiki Pages. The E2E forum has provided a lot of answers and support for members like yourself, and additional support comes when you provide the solutions that you eventually find as you continue to work on your application. Your participation here is appreciated.

    Second, there is a feature that I use when replying to emails to copy from the posting to which I am replying. In the window above your reply box, you can simply select text there and click the red "Quote" link, and that text will be copied into your reply. Honestly, I think the way you formatted my questions with the underlines is a more efficient use of space, but it may have been more difficult or more steps. Since you are new to the forum, I get to make suggestions. Soon you will be explaining these things to me.

    Naresh850 said:
    It is test code for EDMA. When I copy 16x16 and 32x32 blocks it works, but when my blockWidth is 8 or 4 it does not work.

    I am a DSP guy and not a Linux guy, so I tend to do all my debugging in CCS with breakpoints. You are more modern, and I salute that. My instinct tells me that you have reached a point where the overhead for initiating the block transfers is too much to allow the DM6467 to keep up with the video frame rate. There is a fixed time required to setup each transfer, as written above, and if the number of bytes transferred is too few, then the overhead plus transfer time will be greater than the available time for the frame of data.
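    To put a rough number on that overhead argument, the sketch below counts how many separate DMA submissions the nested loop in the original post issues per frame for a given block width. The function name transfers_per_frame is made up for illustration, and the sketch assumes the frame dimensions are exact multiples of the block width, which holds for 704x576 with widths of 4, 8, 16 and 32.

```c
#include <assert.h>

/* Number of blockWidth x blockWidth DMA submissions needed to cover one
 * frame. Assumes width and height are exact multiples of block_width
 * (true for 704x576 with block widths 4, 8, 16 and 32). */
static unsigned long transfers_per_frame(unsigned width, unsigned height,
                                         unsigned block_width)
{
    assert(block_width > 0);
    assert(width % block_width == 0 && height % block_width == 0);
    return (unsigned long)(width / block_width) * (height / block_width);
}
```

    For a 704x576 frame this gives 396 submissions at 32x32, 1584 at 16x16, 6336 at 8x8 and 25344 at 4x4, so the fixed setup cost is paid 16 times more often at 8x8 than at 32x32.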

    Naresh850 said:
    I think the code is stuck somewhere. I do not get video on my VLC display.

    I believe that printf's are the way to do debug in the OS environment, so you may want to add a printf every frame or every 60 frames or something to let you know if the transfers are completing as expected or if the program is locked up waiting for a transfer to complete. Unfortunately, "stuck" and "no video" are not necessarily the same thing.
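    One possible way to tell "stuck" apart from "slow" is to bound the completion wait and log periodically. The following is only a sketch: done_fn, wait_with_limit, log_frame and countdown_done are hypothetical names, and the callback stands in for whatever completion check the application uses (ACPY3_complete(dmaHandle) in the code posted above).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical completion callback; in the thread's code this role is
 * played by ACPY3_complete(dmaHandle). Returns nonzero when done. */
typedef int (*done_fn)(void *ctx);

/* Poll is_done up to max_polls times. Returns 1 if the transfer
 * completed, 0 if the poll limit was reached first. */
static int wait_with_limit(done_fn is_done, void *ctx, unsigned long max_polls)
{
    unsigned long n;

    assert(is_done != NULL);
    for (n = 0; n < max_polls; n++) {
        if (is_done(ctx)) {
            return 1;
        }
    }
    fprintf(stderr, "DMA wait gave up after %lu polls\n", max_polls);
    return 0;
}

/* Print a progress line once every `interval` frames so the console
 * is not flooded. An interval of 0 disables logging. */
static void log_frame(unsigned long frame_idx, unsigned long interval)
{
    if (interval != 0 && frame_idx % interval == 0) {
        printf("frame %lu: transfers completed\n", frame_idx);
    }
}

/* Demonstration stub: reports "not done" until the counter that ctx
 * points to reaches zero. */
static int countdown_done(void *ctx)
{
    int *left = (int *)ctx;

    if (*left > 0) {
        (*left)--;
        return 0;
    }
    return 1;
}
```

    If the wait returns 0 for a block size that used to hang, the transfer is genuinely not completing; if the log lines keep appearing but no video shows up, the problem is more likely downstream of the copy.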

    Naresh850 said:
    So is there an alignment issue?

    The EDMA does not really have any alignment issues. It is very good at transferring data from any alignment and any length. If there are exceptions, I cannot think of them right now.

    Naresh850 said:
    What values can SAM and DAM take in the OPT field of the PARAM?

    Both SAM and DAM must be 0 for INCR mode in almost all cases. The CONST mode is only for a limited set of transfers on internal buses that support this special mode. These two fields have always been a source of confusion, and I have not yet used the CONST mode after 5 years of using the EDMA3.

     

    If this answers your question, please click  Verify Answer  on this post; if not, please reply back with more information to help us answer your question.

  • Hi RandyP,

    Thanks for your prompt reply.

    Since I am using an OS with the DM6467, there is no need to maintain the video frame rate. If my processing time is longer than the frame period, it automatically skips the frame.

    And this function, VCA_ALGORITHMS_MT_doCopy2D1D, is a blocking call. What I think is that an exception occurs.

    Is there an alignment issue? Both of my buffers are in external DDR2 memory.

    Regards,

    Naresh

  • Naresh850 said:
    Is there an alignment issue? Both of my buffers are in external DDR2 memory.

    With EDMA3, I do not expect any alignment issues. And since the transfers are word-aligned at worst for the 4x4 case, that would not be a problem for the EDMA3.

    It may be best for you to post a new thread on the Embedded Software - Linux Forum to get comments from the experts there.

  • Hi RandyP,

    I want to use DMA to transfer 32 bits at a time rather than 8 bits. Is it possible to configure the DMA to transfer 32 bits instead of 8 bits (4 bytes instead of one byte)?

    If it is possible, then how? How do I configure the DMA for this type of transfer, and what changes do I have to make?

    Can you please explain it briefly too?

    Thanks in advance.

    Regards,

    Naresh

  • I need a picture to understand what you are asking for. Most likely, what you want to do is possible because EDMA3 is so programmable and flexible.

    As an example, suppose you have an input array of 128 bytes and an output array of 128 bytes, both arranged as 8 rows of 16 bytes each. The values in the input array are 00 through 7F. Which values do you want to transfer first, then second, then third, and so on, and to where do you want those bytes to go in the output array?

    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
    20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
    30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
    40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
    50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
    60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
    70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F

    This will help me understand what you want to do.

    Regards,
    RandyP

  • OK, good.

    I want to use a DMA that transfers 4 bytes at a time to move the 128 bytes.

    Look at the code below. It transfers 4 bytes at a time, not a single byte. I want to use the DMA in that way. If the DMA can transfer 4 bytes as one chunk of data, it takes less time to transfer the whole 128 bytes. Is that possible?

    Can we configure the DMA element size as just 4, keep the number of elements at 8 as usual, and transfer the whole 128 bytes in chunks of 4-byte packets?

    unsigned int *in;
    unsigned int *out;

    in = (unsigned int *)inbf;
    out = (unsigned int *)outbf;

    for (i = 0; i < 8; i++) {
        for (j = 0; j < 16; j += 4) {
            out[i][j] = in[i][j];
        }
    }

    Regards,
    Naresh
  • This is a very good way to describe the transfer you want to do, using the language of C. Thank you.

    The inner loop can be implemented in the EDMA3 by setting the following parameters:

    • ACNT=4, this defines that 4 bytes will be transferred from the src addr to the dest addr, this is called an array in EDMA3 terminology
    • BCNT=16, this defines the number of arrays of ACNT bytes that will be transferred, this group of BCNT arrays is called a frame in EDMA3 terminology
    • SRCBIDX = DSTBIDX = 4 * 4, this defines the distance in bytes from the first byte of one array to the first byte of the next array, in your C code this is ACNT * 'j' increment of 4 (j += 4)
    • CCNT = 8, this defines the number of frames of ACNT*BCNT bytes that will be transferred, this is called a block in EDMA3 terminology
    • SRCCIDX = DSTCIDX = ? because the C code does not define the span of the intended in/out arrays, but this is either the distance between the beginning of each frame (AB-synchronized) or between the last array of a frame and the first array of the next frame (A-synchronized)

    Multiple events must be used to trigger this complete transfer.

    These terms and examples of these transfers are included in the EDMA3 User's Guide. It will be helpful for you to reference portions of that User's Guide that are not clear to you. Section 3.2 on page 62 (sprueq5a for DM646x EDMA3) shows a subframe extraction example.
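    The array / frame / block terminology above can be mimicked in plain C. This is a software model only, not how the hardware is programmed or triggered: XferModel and model_copy are made-up names, and the C index is interpreted as the distance from the start of one frame to the start of the next (the AB-synchronized view described above).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical software model of the counts and indexes discussed
 * above. This is not a TI driver type. */
typedef struct {
    size_t acnt;          /* bytes per array */
    size_t bcnt;          /* arrays per frame */
    size_t ccnt;          /* frames per block */
    ptrdiff_t src_bidx;   /* bytes from one source array to the next */
    ptrdiff_t dst_bidx;   /* bytes from one destination array to the next */
    ptrdiff_t src_cidx;   /* bytes from one source frame start to the next */
    ptrdiff_t dst_cidx;   /* bytes from one destination frame start to the next */
} XferModel;

/* Copy ccnt frames, each made of bcnt arrays of acnt contiguous bytes. */
static void model_copy(const unsigned char *src, unsigned char *dst,
                       const XferModel *p)
{
    size_t b, c;

    assert(src != NULL && dst != NULL && p != NULL);
    for (c = 0; c < p->ccnt; c++) {
        const unsigned char *s = src + (ptrdiff_t)c * p->src_cidx;
        unsigned char *d = dst + (ptrdiff_t)c * p->dst_cidx;

        for (b = 0; b < p->bcnt; b++) {
            memcpy(d, s, p->acnt);   /* one "array" of acnt bytes */
            s += p->src_bidx;
            d += p->dst_bidx;
        }
    }
}
```

    With the 8-row by 16-byte example array from earlier in the thread, setting acnt = 8, bcnt = 8, ccnt = 1 and both B indexes to 16 copies the left 8x8 sub-block and leaves the right half of each destination row untouched.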

  • Thanks RandyP,

    I tested your method with my test code, which transfers an entire frame from inbuff to outbuff. Does it improve the performance (throughput) of the DMA?

    My previous setting is here:

    type: 2D2D
    src addr: (void *)in
    dst addr: (void *)out
    ACNT = 704
    BCNT = 576
    CCNT = 1
    DSTBIDX = 576
    SRCBIDX = 576
    DSTCIDX = SRCCIDX = 0

    And I made the new setting as per your guidelines:

    type: 2D2D
    src addr: (void *)in
    dst addr: (void *)out
    ACNT = 4
    BCNT = 704
    CCNT = 576
    DSTBIDX = 4*176
    SRCBIDX = 4*176
    DSTCIDX = SRCCIDX = 0

    But this setting does not work, meaning I cannot get video on the display. Can you please check and tell me what is wrong with this new setting?

    Regards,

    Naresh

  • What is wrong is that we are not at the same level of understanding of your application. I have not been helping you well because I do not know what you want to do with the video frames.

    You have been asking detailed and low-level questions about 8x8 and 32 bits per transfer and such, but that is how you want to do something and not what you want to happen for the whole frame.

    There is nothing wrong with your ACNT=704 / BCNT=576 / CCNT=1 configuration above. It will work well with the eventual QDMA execution that comes from the dataCopy2D2D call.

    The "new" configuration will not work for several reasons, in particular because CCNT must be 1 for QDMA transfers.

    If you want to simply copy a 576x704 video frame from one place to another, your first configuration is the one to use.

    What is wrong with the ACNT=704 / BCNT=576 / CCNT=1 configuration? Why do you want to do subframe block copies like 8x8 or what you described in C code?

  • My intention was to use the DMA efficiently.

    Now the question is what I mean by "efficiently", correct? The DMA transfers data bytes from one frame to the other, right?

    But the question is in what manner the DMA transfers the data: does it transfer byte by byte, half word by half word, or word by word?

    If we configure the DMA so that it transfers word by word, then we use the external bus bandwidth efficiently and our ACNT is reduced by a factor of 4. Right?

    I want to use the DMA efficiently in this way, so that the data transfer time is reduced by a factor of 4.

    So is that possible with EDMA3?

    Previously I worked with the ADI Blackfin BF561 DSP (3 years), where I used word-by-word DMA transfers and improved performance by 4 times.

    Hope you understand.

    Regards,

    Naresh

  • You have to work hard to use the EDMA3 inefficiently. It will automatically make your transfers be as efficient as possible.

    If you want to transfer a full frame of data from one buffer to another, then use the ACNT=704 / BCNT=576 / CCNT=1 configuration.

    If you are writing a video compression algorithm that requires you to extract 8x8 macroblocks to operate on, then you will need to do something else.

    I think you just want to do the ACNT=704 / BCNT=576 / CCNT=1 configuration but you are worried that it is not efficient enough. It should be just fine. It will transfer all the bytes, but it will take care of optimizing to do the best possible job with the alignment and counts that you have supplied.
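    As a rough sketch of that full-frame setup, the helper below fills in the same fields that the posted doCopy2D2DBlock function sets, but for one whole frame. FrameCopyParams is a stand-in struct that only mirrors the field names used in the thread; it is not the TI ACPY3_Params type, and full_frame_params is a hypothetical helper. The line pitch of 704 bytes is taken from the elem struct values posted earlier.

```c
#include <assert.h>

/* Stand-in for the fields the thread's code sets on ACPY3_Params.
 * This is NOT the TI type; it only mirrors the names used above. */
typedef struct {
    unsigned elementSize;    /* bytes per line (ACNT) */
    unsigned numElements;    /* lines per frame (BCNT) */
    unsigned numFrames;      /* number of frames (CCNT) */
    int srcElementIndex;     /* source line pitch in bytes */
    int dstElementIndex;     /* destination line pitch in bytes */
} FrameCopyParams;

/* Describe a whole-frame 2D-to-2D copy as a single transfer:
 * one line per element, one frame in total. */
static FrameCopyParams full_frame_params(unsigned width, unsigned height,
                                         int src_pitch, int dst_pitch)
{
    FrameCopyParams p;

    assert(width > 0 && height > 0);
    p.elementSize = width;
    p.numElements = height;
    p.numFrames = 1;
    p.srcElementIndex = src_pitch;
    p.dstElementIndex = dst_pitch;
    return p;
}
```

    Calling full_frame_params(704, 576, 704, 704) yields the ACNT=704 / BCNT=576 / CCNT=1 shape recommended above, submitted once per frame instead of once per block.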