Scheduling Issue: High Priority Thread's Venc1_process() call fails to pre-empt Low Priority Thread's memory transfer loop

This question is not answered
Andrew Muehlfeld
Posted by Andrew Muehlfeld
on Dec 07 2011 17:16 PM
Intellectual540 points

I am encountering a scheduling issue with software based on the DM6467 h.264 encode demo.  A call to Venc1_process() in a higher priority thread (thread 1) waits for a memory transfer loop in a lower priority thread (thread 2).

 

In my application, thread 2 can take as long as it needs; its speed does not matter.  Thread 1 is time-critical.  My only guess is that the compiler is doing something clever with my for loop that prevents the higher priority thread from preempting it.  Any ideas?  Thanks.

 

 

****************** Excerpt from thread 1 (SCHED_FIFO Priority = MAX - 1) ***********************************

            if (Venc1_process(hVe1, hCcvOutBuf, hDstBuf) < 0) {
                ERR("Failed to encode video buffer\n");
                cleanup(THREAD_FAILURE);
            }

**************************************************************************************************************************

 

********************Within the Venc1_process() call, defined in Videnc1.c , it stops at this line  ***************

    /* Encode video buffer */
    status = VIDENC1_process(hVe->hEncode, &inBufDesc, &outBufDesc, &inArgs,
                             &outArgs);

**************************************************************************************************************************

 

******************** Excerpt from thread 2 (SCHED_FIFO Priority = MAX - 5) **********************************


        printf("Begin Semi-planar to Planar format conversion\n");
        //Reformat data
        for(k = 0; k < imgBufSize/4; k ++)
        {
            //Copy Cb data
            imgIn422PBufP[imgBufCPOff + k] = imgInBufP[imgBufCPOff + (k*2)];

            //Copy Cr data
            imgIn422PBufP[imgBufCPOff + imgBufCROff + k] = imgInBufP[imgBufCPOff + (k*2) + 1];
        }
        printf("Completed Semi-planar to Planar format conversion\n");

***********************************************************************************************************
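Both excerpts above rely on SCHED_FIFO priorities, but the thread-creation code is not shown. As a minimal sketch (the helper name `init_fifo_attr` is hypothetical, and this is not taken from the demo), the following shows one way to prepare pthread attributes for a SCHED_FIFO thread. One detail worth checking in any such setup is `PTHREAD_EXPLICIT_SCHED`: without it, a new thread inherits the creator's policy and priority, and the values stored in the attribute object are ignored.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: fill `attr` so that a thread created with it runs
 * under SCHED_FIFO at (maximum priority - offset_from_max).
 * Returns 0 on success or an errno-style code from the pthread call. */
int init_fifo_attr(pthread_attr_t *attr, int offset_from_max)
{
    struct sched_param param;
    int rc;

    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Use the policy and priority stored in `attr` instead of inheriting
     * them from the creating thread. */
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);

    if (rc == 0)
        rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);

    if (rc == 0) {
        param.sched_priority =
            sched_get_priority_max(SCHED_FIFO) - offset_from_max;
        rc = pthread_attr_setschedparam(attr, &param);
    }

    if (rc != 0)
        pthread_attr_destroy(attr);   /* do not leak a half-built attribute */
    return rc;
}
```

With this helper, thread 1 would use an offset of 1 and thread 2 an offset of 5, each attribute then being passed to `pthread_create()`. Creating real-time threads normally requires root or an equivalent capability.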

All Replies
  • Robert Tivy
    Posted by Robert Tivy
    on Dec 07 2011 17:52 PM
    Genius11090 points

    Andrew Muehlfeld
    A call to Venc1_process() in a higher priority thread (thread 1) waits for a memory transfer loop in a lower priority thread (thread 2).

    Is this a remote codec?  If so, do you know that the VIDENC1_process() call isn't just stuck waiting for a return from the remote core?

    If you remove the low-pri thread loop, does the VIDENC1_process() call return normally?

    Does the VIDENC1_process() call return after the loop is done?

    Sorry for the basic questions, just need to set the table correctly in order to get to the bottom of the issue.

    Regards,

    - Rob

     

  • Andrew Muehlfeld
    Posted by Andrew Muehlfeld
    on Dec 08 2011 09:32 AM
    Intellectual540 points

    Q: Is this a remote codec?  If so, do you know that the VIDENC1_process() call isn't just stuck waiting for a return from the remote core?

    A: Yes, this is a remote codec.  VIDENC1_process() is called from the ARM on a DM6467T.  It is calling TI's h264enc v01.20.02, using the DMAI interface.  The h264enc codec runs on the DSP core.  It's certainly possible that VIDENC1_process() call is waiting for a return from the remote core.  At first glance, that seems pretty likely.  If that's the case, the question becomes: why doesn't the remote core return immediately, as it does without the loop?  The VIDENC1_process() call is part of a 30 frames per second video encoding application.  It stalls for about one second during the memory transfer loop, noted both by watching the video, and from debug printf() statements.  Are there some resources VIDENC1_process() needs that may be used by the loop?  Are there certain conditions under which a remote call cannot be made?

    Q: If you remove the low-pri thread loop, does the VIDENC1_process() call return normally?

    A: Yes. If I remove the loop, the VIDENC1_process() call returns normally.

    Q: Does the VIDENC1_process() call return after the loop is done?

    A: Yes, the VIDENC1_process() call returns after the loop is done.
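The stall above was judged from the video and from debug printf() output. A small timing helper can put a number on how long a single call blocks. This is only a sketch: `monotonic_now` and `elapsed_ms` are hypothetical names, and the choice of `CLOCK_MONOTONIC` is an assumption, not something the original code uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Read the monotonic clock, which is unaffected by wall-clock changes. */
struct timespec monotonic_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Milliseconds elapsed between two timestamps taken with monotonic_now(). */
double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1000.0
         + (double)(end->tv_nsec - start->tv_nsec) / 1.0e6;
}
```

Taking one timestamp immediately before Venc1_process() and one immediately after, then logging `elapsed_ms()` for every frame, would show whether the extra time is spent inside that call or elsewhere in the video thread.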

  • Satish Arora
    Posted by Satish Arora
    on Dec 09 2011 13:48 PM
    Prodigy670 points

    Two things come to mind:

    1) The compiler may be producing tight loop code for the second thread that does not allow interrupts. To check this, try compiling the second thread with debug enabled and without optimization flags.

    2) The second thread may be keeping DDR much busier and so delaying the encode, which needs the same DDR bandwidth. What size are the buffers that the second thread operates on, and are they by any chance non-cached?

    Thanks,

    Satish

  • Andrew Muehlfeld
    Posted by Andrew Muehlfeld
    on Dec 09 2011 16:21 PM
    Intellectual540 points

    Thanks for your thoughts Satish.  I think I have a lot left to learn here...

    1) I compiled with debugging and no optimization flags, with no change in results.

    2) Yes, the buffers in the second thread are non-cached.  Both buffers are DMAI buffers created with Buffer_create(), using the default values for memParams(type = Memory_CONTIGPOOL, flags = Memory_NONCACHED, align = Memory_DEFAULTALIGNMENT, seg = 0).  Both buffers are 21073920 bytes (~20MB).

    Here is my understanding of cached vs. non-cached buffers.  Please correct me.

    Using non-cached buffers in this case increases the time for the loop to complete, since each byte is read and written individually to DDR2, rather than reading and writing whole cache lines at a time.  The reason DMAI buffers default to non-cached has something to do with the ability to pass buffers between the ARM core and the DSP core.  I'm confused, however.  I thought Codec Engine handled all cache management requirements, for the very purpose of allowing the ARM and DSP cores to share cached buffers.  I even encountered a problem where one of my non-cached buffers became corrupted because I wasn't calling XDM_SETACCESSMODE_WRITE(outBufs->descs[0].accessMask).  What am I missing?

    I changed the two buffers in the second thread to cached by setting gfxAttrs.bAttrs.memParams.flags = 0; prior to calling Buffer_create().  This increased the speed of the loop's completion, but the 1st thread still stalls while the 2nd thread's loop completes.

    3) I don't know exactly what the compiler does when you turn on debugging.  I tried forcing the loop to give up the CPU by adding usleep(1) as the last statement in the loop.  This allowed the 1st thread to run without interruption, which is my ultimate goal, but then it takes the loop several minutes to complete.  When I originally said thread 2 had no speed requirements, I meant that it could take a few seconds, not a few minutes.

    4) What else can I try to determine whether the loop is blocking interrupts, or the two threads are fighting for memory bandwidth?

    5) Is there any way to monitor DDR2 bandwidth?

    6) Should I be using VDCE for this memory operation, instead of C code on the ARM?
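On point 3: the loop runs imgBufSize/4 = 5,268,480 times for a 21,073,920-byte buffer, so a sleep on every iteration adds millions of sleeps, which is consistent with the several minutes reported. A middle ground is to pause once per chunk rather than once per iteration. The sketch below assumes a hypothetical function `deinterleave_chunked` and uses `nanosleep()` in place of `usleep()`; the chunk size and pause length are illustrative and untested on the DM6467.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Split interleaved Cb/Cr bytes into separate Cb and Cr planes, pausing
 * briefly after every `chunk` pairs so that other threads can run.
 * `pairs` is the number of Cb/Cr pairs; `sleep_ns` is the pause per chunk
 * and must be below 1,000,000,000. A `chunk` of 0 disables chunking. */
void deinterleave_chunked(const uint8_t *cbcr, uint8_t *cb, uint8_t *cr,
                          size_t pairs, size_t chunk, long sleep_ns)
{
    struct timespec pause = { 0, sleep_ns };
    size_t done = 0;

    if (chunk == 0)
        chunk = pairs;

    while (done < pairs) {
        size_t end = (pairs - done > chunk) ? done + chunk : pairs;
        size_t k;

        for (k = done; k < end; k++) {
            cb[k] = cbcr[2 * k];       /* even bytes are Cb */
            cr[k] = cbcr[2 * k + 1];   /* odd bytes are Cr  */
        }
        done = end;

        /* Give up the CPU between chunks, but not after the last one. */
        if (done < pairs && sleep_ns > 0)
            nanosleep(&pause, NULL);
    }
}
```

With a chunk of 65,536 pairs, the 5,268,480 pairs split into 81 chunks, so there are 80 pauses; at a nominal 1 ms each that adds on the order of 80 ms plus scheduling overhead. Whether this removes the stall depends on whether the cause is CPU scheduling or memory bandwidth, which this thread does not settle.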

  • Chris Ring
    Posted by Chris Ring
    on Dec 09 2011 16:35 PM
    Genius16190 points

    Andrew Muehlfeld
    I'm confused, however.  I thought Codec Engine handled all cache management requirements, for the very purpose of allowing the ARM and DSP cores to share cached buffers.  I even encountered a problem where one of my non-cached buffers became corrupted because I wasn't calling XDM_SETACCESSMODE_WRITE(outBufs->descs[0].accessMask).  What am I missing?

    This article may help - http://processors.wiki.ti.com/index.php/Cache_Management

    In particular, the Codec Engine section toward the bottom describes what [little] cache management CE does/doesn't do.

    Chris

  • Satish Arora
    Posted by Satish Arora
    on Dec 11 2011 14:59 PM
    Prodigy670 points

     

    2) Yes, the buffers in the second thread are non-cached.  Both buffers are DMAI buffers created with Buffer_create(), using the default values for memParams(type = Memory_CONTIGPOOL, flags = Memory_NONCACHED, align = Memory_DEFAULTALIGNMENT, seg = 0).  Both buffers are 21073920 bytes (~20MB).

    A 20 MB buffer! What does it contain? Does it go through DMAI to a regular video decoder/encoder?

    Whether it can be cached on the ARM side depends on several factors, such as whether the ARM/DSP modifies the data in this buffer through CPU accesses or only through DMA. I can't comment much without knowing everything that happens to this buffer.

     

    3) I don't know exactly what the compiler does when you turn on debugging.  I tried forcing the loop to give up the CPU by adding usleep(1) as the last statement in the loop.  This allowed the 1st thread to run without interruption, which is my ultimate goal, but then it takes the loop several minutes to complete.  When I originally said thread 2 had no speed requirements, I meant that it could take a few seconds, not a few minutes.

    Not a solution, but can you experiment with SCHED_RR instead of SCHED_FIFO? This should give your first thread a chance to run.

    Another wild guess: if the first thread depends on any kernel thread, the second thread, being scheduled as SCHED_FIFO, will have a higher priority than that kernel thread and will preempt it.

  • Andrew Muehlfeld
    Posted by Andrew Muehlfeld
    on Dec 12 2011 11:48 AM
    Intellectual540 points

    The 20MB buffer contains YCbCr data for a 10 megapixel image (each pixel gets 8 bits Y and 8 bits C).  I included a larger excerpt of the code below.

    From the video thread (previously referred to as thread 1), a YUV422 Semi-Planar image is memcpy'd to a buffer that gets passed to the still thread (previously referred to as thread 2).  It is memcpy'd, rather than passed directly, because the original buffer continues to be used in the video thread.

    The still thread (thread 2) copies the data from the newly populated buffer, to another 20MB buffer, reformatting it to YUV422 Planar (separating interleaved Cb and Cr bytes into two separate blocks).  The resulting buffer is passed to TI's jpeg encoder, running on the DSP, using DMAI calls.

    With default non-cached buffers, the de-interleaving takes 1.7 seconds.  With cached buffers, the de-interleaving takes 700ms.  I read up on cache coherency, as recommended by Chris.  The first buffer is never touched by the DSP, so I believe it can be cached with no cache management required.  The second buffer gets sent to the DSP, after being written by the ARM CPU, so I need to call Memory_cacheWbInv() prior to process().  Is that correct?  Using cached buffers is an improvement, but not a solution.
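Independently of caching, the de-interleave loop touches memory one byte at a time. A word-wise variant reads two 32-bit words and writes one Cb word and one Cr word per step, which reduces the number of loads and stores. The sketch below is an illustration under stated assumptions: the function name `deinterleave_words` is hypothetical, the buffers are assumed to be 4-byte aligned, and the byte shuffling assumes a little-endian CPU. It has not been measured on the DM6467, so any gain is unverified.

```c
#include <stddef.h>
#include <stdint.h>

/* De-interleave Cb/Cr pairs using 32-bit accesses (little-endian only).
 * `cbcr` holds `pairs` interleaved Cb/Cr byte pairs; `cb` and `cr` receive
 * `pairs` bytes each. All three buffers must be 4-byte aligned. */
void deinterleave_words(const uint32_t *cbcr, uint32_t *cb, uint32_t *cr,
                        size_t pairs)
{
    size_t groups = pairs / 4;   /* 4 pairs = 2 source words */
    size_t g, k;

    for (g = 0; g < groups; g++) {
        uint32_t w0 = cbcr[2 * g];       /* bytes: Cb0 Cr0 Cb1 Cr1 */
        uint32_t w1 = cbcr[2 * g + 1];   /* bytes: Cb2 Cr2 Cb3 Cr3 */

        cb[g] = (w0 & 0x000000ffu)
              | ((w0 >> 8) & 0x0000ff00u)
              | ((w1 & 0x000000ffu) << 16)
              | ((w1 << 8) & 0xff000000u);

        cr[g] = ((w0 >> 8) & 0x000000ffu)
              | ((w0 >> 16) & 0x0000ff00u)
              | ((w1 << 8) & 0x00ff0000u)
              | (w1 & 0xff000000u);
    }

    /* Handle the 0 to 3 leftover pairs one byte at a time. */
    {
        const uint8_t *s = (const uint8_t *)cbcr;
        uint8_t *db = (uint8_t *)cb;
        uint8_t *dr = (uint8_t *)cr;

        for (k = groups * 4; k < pairs; k++) {
            db[k] = s[2 * k];
            dr[k] = s[2 * k + 1];
        }
    }
}
```

In the still thread, the source pointer would correspond to the chroma half of the semi-planar buffer, and the two destinations to the Cb and Cr regions of the planar buffer.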

    I experimented with SCHED_RR, both with the original priorities, and elevating thread 2's priority to the same as thread 1.  The delay to thread 1 was unchanged.

    Can you elaborate on the kernel thread idea?  Both threads call DMAI functions which I believe use the dsplink.ko kernel module.  Could that be related?


    **********************************************************************
    // Video Thread (thread 1) - Fill and pass buffer
    int takeStill(VideoEnv *stillEnvp, Buffer_Handle *pPreProcessedBuf, Buffer_Handle *pImgInBuf, int imgBufSize)
    {
        Int stillFifoRet;
        char                    *imgInBufP;
        char                    *diOutBufP;

        printf("Beginning takeStill\n");
        if(video_still_count > 0)
        {
            printf("Calling Fifo_get on hStillOutFifo, will print return after\n");
            stillFifoRet = Fifo_get(stillEnvp->hStillOutFifo, &(*pImgInBuf));
            printf("Returned from Fifo_get on hStillOutFifo\n");
            if (stillFifoRet < 0)
            {
                ERR("Failed to get buffer from still thread\n");
            }
        }

        imgInBufP = Buffer_getUserPtr(*pImgInBuf);
        diOutBufP = Buffer_getUserPtr(*pPreProcessedBuf);

        printf("takeStill: imgBufSize: %d\n", imgBufSize);
        memcpy(imgInBufP, diOutBufP, imgBufSize);

        Buffer_setNumBytesUsed(*pImgInBuf, imgBufSize);

        printf("Calling Fifo_put on hStillInFifo, will print return after\n");
        Fifo_put(stillEnvp->hStillInFifo, *pImgInBuf);
        printf("Returned from Fifo_put on hStillInFifo\n");

        //Clear flag
        takePushPin = 0;
        video_still_count ++;
        return 0;
    }


    **********************************************************************
    // Still thread (thread 2) -
        while (!gblGetQuit()) {
            /* Pause processing? */
            Pause_test(envp->hPauseProcess);


            /* Get a buffer to encode from the capture thread */
            fifoRet = Fifo_get(envp->hInFifo, &hImgInBuf);

            if (fifoRet < 0) {
                ERR("Failed to get buffer from video thread\n");
                cleanup(THREAD_FAILURE);
            }

            printf("completed Fifo_get()\n");

            /* Did the capture thread flush the fifo? */
            if (fifoRet == Dmai_EFLUSH) {
                cleanup(THREAD_SUCCESS);
            }

            int imgBufSize;
            int imgBufCPOff;
            int imgBufCROff;

            Int8                    *imgInBufP;
            Int8                    *imgIn422PBufP;

            imgIn422PBufP = Buffer_getUserPtr(hImgIn422PBuf);
            imgInBufP     = Buffer_getUserPtr(hImgInBuf);

            imgBufSize = Buffer_getNumBytesUsed(hImgInBuf);
            imgBufCPOff = imgBufSize/2;  //Color Plane Offset
            imgBufCROff = imgBufSize/4;  //CR Offset (referenced from CP Offset)

            Buffer_setNumBytesUsed(hImgIn422PBuf, imgBufSize);

            printf("still.c: Doing memcpy\n");
            memcpy(imgIn422PBufP, imgInBufP, imgBufSize/2);

            printf("Beginning Semi-planar to Planar format conversion\n");
            //Reformat data
            for(k = 0; k < imgBufSize/4; k ++)
            {
                //Copy Cb data
                imgIn422PBufP[imgBufCPOff + k] = imgInBufP[imgBufCPOff + (k*2)];

                //Copy Cr data
                imgIn422PBufP[imgBufCPOff + imgBufCROff + k] = imgInBufP[imgBufCPOff + (k*2) + 1];
            }
            printf("Completed Semi-planar to Planar format conversion\n");

            //Return buffer to video thread for next still
            Fifo_put(envp->hOutFifo, hImgInBuf);

            if(Buffer_getNumBytesUsed(hImgIn422PBuf) == 691200)
            {
                //printf("Little jpeg\n");
                iEncDynamicParams->inputHeight = 480;
                iEncDynamicParams->inputWidth = 720;
                iEncDynamicParams->captureWidth = 720;

            }
            else
            {
                //printf("Big jpeg\n");
                iEncDynamicParams->inputHeight = TEN_MP_HEIGHT;
                iEncDynamicParams->inputWidth = TEN_MP_WIDTH;
                iEncDynamicParams->captureWidth = TEN_MP_WIDTH;
            }

            BufferGfx_getDimensions(hImgIn422PBuf, &tmpDimensions);
            //printf("Original hImgInBuf Dimensions --- Width: %d, Height: %d, Line Length: %d, X: %d, Y: %d\n", tmpDimensions.width, tmpDimensions.height, tmpDimensions.lineLength, tmpDimensions.x, tmpDimensions.y);

             tmpDimensions.width = iEncDynamicParams->inputWidth;
             tmpDimensions.height = iEncDynamicParams->inputHeight;
             tmpDimensions.lineLength = iEncDynamicParams->inputWidth;

             BufferGfx_setDimensions(hImgIn422PBuf, &tmpDimensions);
             BufferGfx_getDimensions(hImgIn422PBuf, &tmpDimensions);
             //printf("New     hImgInBuf Dimensions --- Width: %d, Height: %d, Line Length: %d, X: %d, Y: %d\n", tmpDimensions.width, tmpDimensions.height, tmpDimensions.lineLength, tmpDimensions.x, tmpDimensions.y);



            iEncStatus.size = sizeof(IMGENC1_Status);
            iEncStatus.data.buf = NULL;
            //It is required to call IMGENC1_control() after each Ienc1_process() call to
            // reinitialize the jpeg encoder
            if(IMGENC1_control(hIMGENC1, XDM_SETPARAMS, iEncDynamicParams, &iEncStatus))
            {
              printf("IMGENC1_control(XDM_SETPARAMS) failed\n");
            }

            if(IMGENC1_control(hIMGENC1, XDM_GETBUFINFO, iEncDynamicParams, &iEncStatus))
            {
              printf("IMGENC1_control(XDM_GETBUFINFO) failed\n");
            }


            printf("Calling Ienc1_process on image # %d\n", still_count);
            if(Ienc1_process(hIe1, hImgIn422PBuf, hImgOutBuf) != 0)
            {
                printf("Ienc1 Failed\n");
            }
            else
            {
                printf("Ienc1 Succeeded\n");
            }
            //printf("JPEG encoder used: %d bytes, updated\n", Buffer_getNumBytesUsed(hImgOutBuf));

            printf("Returned from Ienc1_process on image # %d\n", still_count);

            printf("Writing jpeg image to file\n");

            sprintf(dynamic_file_names, "/opt/dvsdk/dm6467/test_%d.jpg", still_count);
            outFile = fopen(dynamic_file_names, "wb");
            if (outFile == NULL)
            {
                printf("Failed to open %s for writing\n", dynamic_file_names);
            }
            else
            {
                fwrite(Buffer_getUserPtr(hImgOutBuf), 1, Buffer_getNumBytesUsed(hImgOutBuf), outFile);
                fclose(outFile);
            }
            still_count ++;
            printf("Completed writing jpeg image to file\n");
        }
    **********************************************************************

  • Satish Arora
    Posted by Satish Arora
    on Dec 12 2011 13:36 PM
    Prodigy670 points

    Thanks for the explanation. Here is what I understood from it:

    • There is a video thread (thread 1) which receives a captured frame in a buffer of around 20 MB.
    • The video thread copies it into another buffer, passes the copy to the still thread (thread 2), and continues to use the original buffer for video encoding.
    • The still thread receives the YUV422 semi-planar buffer, converts it to YUV422 planar, and returns the buffer to the video thread.
    • The still thread then encodes the 422 planar buffer using the JPEG encoder.

    A couple of points:

    • Is the rate of video encoding different from that of still encoding? That is, out of all the frames processed by the video encoder, are only a few passed to the still thread?
    • You do a memcpy in the video thread so that you can pass the copied buffer to the still thread. Since you are making a copy anyway, you might as well convert to planar at that point; this would save the extra conversion/copy in thread 2.
    • I might be wrong here, but I see that thread 1 has a dependency on thread 2: it needs to get the buffer (required for the copy) back from thread 2 before it can progress further. Have you confirmed whether thread 1 is actually stalled there rather than in the video encode call?
    • I also wonder whether it is the copy in the second thread that holds up your first thread, or the JPEG encoding. I ask because the JPEG encoder also runs on the DSP and so could degrade the video encoder, which also needs the DSP. To confirm this, you can try commenting out the copy and leaving the JPEG encoder part as is.
    • Also, if the stall is in the video encoder, roughly how much additional stall do you see because of thread 2?

    Thanks

    Satish

  • Andrew Muehlfeld
    Posted by Andrew Muehlfeld
    on Dec 12 2011 15:23 PM
    Intellectual540 points

    Satish Arora

    Thanks for the explanation. Here is what I understood from it:

    • There is a video thread (thread 1) which receives a captured frame in a buffer of around 20 MB.
    • The video thread copies it into another buffer, passes the copy to the still thread (thread 2), and continues to use the original buffer for video encoding.
    • The still thread receives the YUV422 semi-planar buffer, converts it to YUV422 planar, and returns the buffer to the video thread.
    • The still thread then encodes the 422 planar buffer using the JPEG encoder.

    All correct.

     

    Satish Arora
    • Is the rate of video encoding different from that of still encoding? That is, out of all the frames processed by the video encoder, are only a few passed to the still thread?

    Yes, the rate of still encoding is much lower than that of video encoding.  The user can request a still frame, which calls takeStill().  I anticipate a request once every few minutes, with a maximum burst of a few consecutive still images at 5-second intervals.

     

    Satish Arora
    • You do a memcpy in the video thread so that you can pass the copied buffer to the still thread. Since you are making a copy anyway, you might as well convert to planar at that point; this would save the extra conversion/copy in thread 2.

    That's what I did originally, but the conversion to planar took too long.  The memcpy() is quick.  The motivation for creating a separate thread in the first place was to allow the conversion to run as a low priority thread, without blocking the video thread.

     

    Satish Arora
    • I might be wrong here, but I see that thread 1 has a dependency on thread 2: it needs to get the buffer (required for the copy) back from thread 2 before it can progress further. Have you confirmed whether thread 1 is actually stalled there rather than in the video encode call?

    Thread 1 would wait on thread 2 if the user requested a second still image before the first completed.  In my test case, I only request one, and have verified that takeStill() is only called once. 

    Satish Arora
    • I also wonder whether it is the copy in the second thread that holds up your first thread, or the JPEG encoding. I ask because the JPEG encoder also runs on the DSP and so could degrade the video encoder, which also needs the DSP. To confirm this, you can try commenting out the copy and leaving the JPEG encoder part as is.

    This comment led me down a good path.  I realized that the JPEG codec and H264 codec were in the same group on the remote server, with the same priority.  I moved the JPEG codec to a new group, with its own scratch memory, and a lower priority.  The JPEG encoder did introduce a delay that varied from 100ms to 300ms.  That has been eliminated.  I tried your suggestion of commenting out the transfer loop and leaving the JPEG encoder loop intact.  Once I fixed the priorities in codec.cfg and server.cfg, there was no delay with the transfer loop commented out.  With the transfer loop re-introduced, the delay came back, even with the fixed server priorities.

     

    Satish Arora
    • Also, if the stall is in the video encoder, roughly how much additional stall do you see because of thread 2?

    memcpy (y data): 60-70ms
    deinterleave (cb and cr data): 700-1100ms

     

     

     

  • Satish Arora
    Posted by Satish Arora
    on Dec 13 2011 14:01 PM
    Prodigy670 points

    Andrew Muehlfeld
    This comment led me down a good path.  I realized that the JPEG codec and H264 codec were in the same group on the remote server, with the same priority.  I moved the JPEG codec to a new group, with its own scratch memory, and a lower priority.  The JPEG encoder did introduce a delay that varied from 100ms to 300ms.  That has been eliminated.  I tried your suggestion of commenting out the transfer loop and leaving the JPEG encoder loop intact.  Once I fixed the priorities in codec.cfg and server.cfg, there was no delay with the transfer loop commented out.  With the transfer loop re-introduced, the delay came back, even with the fixed server priorities.

    Good to see it helped somewhere.

    Andrew Muehlfeld
    That's what I did originally, but the conversion to planar took too long.  The memcpy() is quick.  The motivation for creating a separate thread in the first place was to allow the conversion to run as a low priority thread, without blocking the video thread.

    Understood. In general, copying big video buffers with the CPU is not a good idea. You would want to use DMA (EDMA on the DM6467) for such a copy. Not just for a simple copy: you should be able to use EDMA even for the YUV422SP to YUV422P conversion. You can look at the VDCE driver, which uses EDMA to copy the luma buffer. Using EDMA you should also be able to do the SP to P conversion for chroma. See a few examples of using DMA at http://www.ti.com/lit/ug/sprueq5b/sprueq5b.pdf.

    If you are able to do this copy/conversion fast enough using EDMA, you might be able to avoid two copies, since you could then do it all in thread 1.
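To make the idea concrete, the chroma de-interleave can be described as two strided transfers: one that picks every second byte starting at offset 0 (Cb) and one starting at offset 1 (Cr). The sketch below is a plain C software model, not driver code. The struct `xfer2d` and both functions are hypothetical; the field names only echo the EDMA3 PaRAM entries ACNT, BCNT, SRCBIDX and DSTBIDX.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Software model of a two-dimensional strided transfer. Illustrative only:
 * this is not a type from any EDMA driver. */
struct xfer2d {
    const uint8_t *src;
    uint8_t       *dst;
    size_t         acnt;      /* contiguous bytes per element            */
    size_t         bcnt;      /* number of elements                      */
    ptrdiff_t      src_bidx;  /* byte step between source elements       */
    ptrdiff_t      dst_bidx;  /* byte step between destination elements  */
};

/* Perform the transfer in software, one element at a time. */
void xfer2d_run(const struct xfer2d *t)
{
    size_t b;

    for (b = 0; b < t->bcnt; b++)
        memcpy(t->dst + (ptrdiff_t)b * t->dst_bidx,
               t->src + (ptrdiff_t)b * t->src_bidx,
               t->acnt);
}

/* Express the Cb/Cr split as two strided transfers. */
void deinterleave_as_transfers(const uint8_t *cbcr, uint8_t *cb, uint8_t *cr,
                               size_t pairs)
{
    /* 1-byte elements, source step 2, destination step 1. */
    struct xfer2d cb_xfer = { cbcr,     cb, 1, pairs, 2, 1 };
    struct xfer2d cr_xfer = { cbcr + 1, cr, 1, pairs, 2, 1 };

    xfer2d_run(&cb_xfer);
    xfer2d_run(&cr_xfer);
}
```

On real hardware the count fields are 16 bits wide, so a transfer of several million elements would need a third dimension (CCNT) or linked transfers, and single-byte elements may not use the bus efficiently. Those details are outside this model and should be checked against the EDMA3 documentation.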

    Thanks,

    Satish


  • Satish Arora
    Posted by Satish Arora
    on Dec 14 2011 06:23 AM
    Prodigy670 points

    Satish Arora
    Understood. In general, copying big video buffers with the CPU is not a good idea. You would want to use DMA (EDMA on the DM6467) for such a copy. Not just for a simple copy: you should be able to use EDMA even for the YUV422SP to YUV422P conversion. You can look at the VDCE driver, which uses EDMA to copy the luma buffer. Using EDMA you should also be able to do the SP to P conversion for chroma. See a few examples of using DMA at http://www.ti.com/lit/ug/sprueq5b/sprueq5b.pdf.

    If you are able to do this copy/conversion fast enough using EDMA, you might be able to avoid two copies, since you could then do it all in thread 1.

     

    I found another thread on e2e where somebody used EDMAs to convert from YUV422 SP to YUV422 Planar. - http://e2e.ti.com/support/embedded/multimedia_software_codecs/f/356/t/56639.aspx

    Thanks

    Satish

  • Andrew Muehlfeld
    Posted by Andrew Muehlfeld
    on Dec 14 2011 11:07 AM
    Intellectual540 points

    Thanks for the ideas.  I can see that this operation would be better done with DMA.  There seem to be many ways to perform DMA on the DM6467T, but very little documentation on any of them.

    One method is to use the ACPY3 API.  Another is to use the VDCE driver.  A third is to use the EDMA driver directly.  Which method is most appropriate in this case?  Is there any documentation available?

  • mike soso
    Posted by mike soso
    on Dec 21 2011 08:38 AM
    Intellectual930 points

    Did you solve the 422SP to 422P conversion with EDMA3 ? I need to do the same thing...

  • Andrew Muehlfeld
    Posted by Andrew Muehlfeld
    on Dec 21 2011 08:46 AM
    Intellectual540 points

    No, I did not modify my 422SP to 422P conversion to use EDMA3.  I sped up the conversion on the ARM by using cached buffers, and optimized some other application-specific things surrounding the conversion, but the video stream still pauses.  I'm still curious why the Linux thread priorities aren't functioning as expected, and I might still try EDMA3 sometime, but the hiccup has been reduced to a tolerable length, at least for the short term, and eliminating it completely isn't a top priority.  If you do figure out how to do it, please share.

  • mike soso
    Posted by mike soso
    on Dec 22 2011 03:15 AM
    Intellectual930 points

    It's not a priority for now, but I think I will need to do it within the next 3 months. I will keep you informed ;) !

    Mika
