H264 Encoder on dm6467 (CE version 2.26.2.11 hangs after single program execution)

Hi!

I have been working on a question in the Linux forum. Now that I think about it, that was probably not the right spot, but I was being helped anyway (which was nice). The TI employee came to the conclusion that my problem might actually be that the codec version I am using is returning a buffer handle that should be kept internal to the codec. He suggested posting here and linking back to our conversation.

The basic story is this:

I was working with DVSDK 3.10.0.19, which came with DMAI 2.10.00.12, CE 2.25.5.16 and cs2dm6467 v1.0.0.10. I created a codec server with a single codec in it: H264enc from the cs2dm6467 folder. I then used the DMAI application video_encode_io1 to begin working with the codec and file I/O. It seemed to be working fine. I modified the application to create threads (as many as 10) and it was able to encode frames like a champ, except that after enough runs it would just hang (maybe after an hour or so of running). I found a forum post saying that CE 2.25.5.16 had a memory leak that would cause the program to hang (the log shown in that post matched my log). So I started the process over with CE 2.26.2.11. After integrating this version, my project would not work at all. So I posted in a forum, and Rob Tivvy suggested explicitly registering the buffers that DMAI was using. This did not work either. After a closer look at the logs, Rob now thinks there might be a problem with my codec.

Could you please help me confirm his assumption, recommend a codec version to use instead, and/or recommend another test to run for debugging?

Unfortunately, I don't have the JTAG environment set up, so the only debugging I can do is with logs and printf.

Here is the link:

http://e2e.ti.com/support/embedded/f/354/p/87684/304901.aspx#304901

thanks so much,

Brandy

  • Here is a bit more information:

    I found this post: http://e2e.ti.com/support/embedded/f/354/p/63547/229656.aspx regarding the contig memory not getting freed. I assume this must relate to it not getting registered, but just for fun I reinstated CE 2.25.5.16 and checked to see if my codec was also not freeing this buffer. Conclusion: it's not. However, as far as I can tell, my code is properly registering the buffer and allocating it with the correct size. Here is the log:

    5850.log_oldCE_successful_encode.txt

    The forum never posted a solution to the problem, and I am using a newer codec as well (version 1.20.2.0), but given the log, I think the problem still applies.

    Thanks again!

    Brandy

  • Hi Brandy,

    Was there a reason why you had to create your own codec server as opposed to using the original codec server that was provided? It seems it'd have been simpler to just use the server that was provided for you out-of-box. It'd be good to use the original server if possible to eliminate any possibility of a problem introduced while creating the new one.

    So it seems your app worked fine until you started modifying the video_encode_io1 app to run more threads. Could you expand on what these threads are doing? Are they all running the same encoder? When using multiple threads to call into Codec Engine, make sure you call Engine_open in each thread so that each thread gets its own engine handle.

    If you think that the above thread is related to your problem, you can try to do what they did by freeing the buffer after closing the encoder. Though I don't really see how this would help the freeze, given I presume you do not repeatedly create and close the encoder in your app.

    You should try posting a log from CE 2.25.05.16, now that you have gone back to that version.

    I will let the codec team comment on potential locking issues with this revision of the encoder on the DM6467, as I am not too familiar with it.

    Best regards,

    Vincent

     

     

  • Hi Vincent,

    Let me explain more:

    I created my own server in an effort to simplify what I was looking at. Plus, with genserver.exe (the Davinci Software for Dummies) it was quite simple to point it at the codec I wanted and create a server. In addition, when I am maintaining code, I don't want to have to maintain the other codecs as well, even though we are not using them. In the end, I need to somehow create a standalone folder structure that anyone in my office can download and then compile with (assuming they have the compiler).

    I am not sure whether the app worked fine with only 1 thread. I chose not to run just one thread a hundred times in a row because that was not value-add for me. To be honest, I would have assumed that was done by TI to test the DMAI applications.

    I don't repeatedly open and close the engine. I open it once in each thread, and each thread closes it when it's done.

    Ok regarding the log that I posted just last night:  It showed two errors:

    @2,738,213us: [+2 T:0x40afe490] ti.sdo.dmai - [Buffer] Free Buffer of size 518400 at 0x41b7f000 (0x8b5c2000 phys)
    @2,738,323us: [+7 T:0x40afe490] OM - Memory_contigFree> Error: buffer (addr=1102573568, size=518400) not found in translation cache
    @2,738,430us: [+2 T:0x40afe490] ti.sdo.dmai - [Buffer] Free Buffer of size 345600 at 0x41c09000 (0x8b678000 phys)
    @2,738,529us: [+7 T:0x40afe490] OM - Memory_contigFree> Error: buffer (addr=1103138816, size=345600) not found in translation cache

    This is because I was unregistering the buffers before I freed them. When I switch the order, CE 2.25.5.16 works without errors.

    What you would like is a failing log from 2.25.5.16, which I will try to get, but it takes a while. In the meantime, here is the failing log from 2.26.2.11, the Buffer.c code that I modified according to the thread with Rob Tivvy, and pseudocode for my threading scheme. What I need is a) a reason why 2.25.5.16 hangs on occasion and b) why 2.26.2.11 doesn't work at all (and whether this is because of the codec version I am using).

    Thanks in advance for your help. Please see the files listed below.
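
    As a rough illustration of why the order matters, here is a toy model in C. It is not Codec Engine's actual translation cache: every toy_* name is hypothetical, and the only assumption is the one the error message suggests, namely that the free path has to find the buffer in the cache before it can release it. The address is the first buffer from the log above, which the error line prints in decimal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: NOT Codec Engine's real translation cache.
 * Every toy_* name here is hypothetical. */
#define TOY_MAX_ENTRIES 16

typedef struct {
    uintptr_t addr;    /* user-space address of the buffer */
    size_t    size;    /* buffer size in bytes */
    int       in_use;  /* 1 while the entry is registered */
} ToyEntry;

static ToyEntry toy_cache[TOY_MAX_ENTRIES];

/* Find the slot holding (addr, size), or NULL if it is not registered. */
static ToyEntry *toy_lookup(uintptr_t addr, size_t size)
{
    for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
        ToyEntry *e = &toy_cache[i];
        if (e->in_use && e->addr == addr && e->size == size) {
            return e;
        }
    }
    return NULL;
}

/* Register a buffer. Returns 0 on success, -1 if the cache is full. */
int toy_register(uintptr_t addr, size_t size)
{
    for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
        if (!toy_cache[i].in_use) {
            toy_cache[i].addr = addr;
            toy_cache[i].size = size;
            toy_cache[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Unregister a buffer. Returns 0 if it was registered, -1 otherwise. */
int toy_unregister(uintptr_t addr, size_t size)
{
    ToyEntry *e = toy_lookup(addr, size);
    if (e == NULL) {
        return -1;
    }
    e->in_use = 0;
    return 0;
}

/* Free a buffer. The lookup comes first, so this returns -1 when the
 * entry has already been removed from the cache. */
int toy_contig_free(uintptr_t addr, size_t size)
{
    return toy_lookup(addr, size) != NULL ? 0 : -1;
}

/* Replays both orderings. Returns 1 if they behave as described. */
int toy_demo(void)
{
    const uintptr_t addr = 0x41b7f000u;   /* "Free Buffer ... at 0x41b7f000" */
    const size_t    size = 518400;

    /* The error line prints the same address in decimal. */
    assert(addr == 1102573568u);

    toy_register(addr, size);
    toy_unregister(addr, size);
    int wrong_order = toy_contig_free(addr, size);  /* unregister, then free */

    toy_register(addr, size);
    int right_order = toy_contig_free(addr, size);  /* free, then unregister */
    toy_unregister(addr, size);

    return wrong_order == -1 && right_order == 0;
}
```

    In this model, unregistering first makes the later free fail its lookup, which mirrors the "not found in translation cache" message; freeing first and unregistering afterwards succeeds.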

    Brandy

    Failing Log:

    2772.output.txt

    Buffer.c

    0121.Buffer.c.txt 

    Partial code for threading (no error checking):

    void * encodeThrFxn(void *thrArgs)
    {
        Args *args = (Args *)thrArgs;

        CERuntime_init();
        Dmai_init();

        inFile = fopen(args->inFile, "rb");
        outFile = fopen(outfilename, "wb");
        hEngine = Engine_open(args->engineName, NULL, NULL);

        hVe1 = Venc1_create(hEngine, args->codecName, &params, &dynParams);

        /* Size the input and output buffers from what the codec asks for */
        inBufSize = Venc1_getInBufSize(hVe1);
        outBufSize = Venc1_getOutBufSize(hVe1);
        gfxAttrs.bAttrs.memParams.align = BUFSIZEALIGN;
        hInBuf = Buffer_create(Dmai_roundUp(inBufSize, BUFSIZEALIGN), (Buffer_Attrs *)&gfxAttrs);
        hOutBuf = Buffer_create(Dmai_roundUp(outBufSize, BUFSIZEALIGN), (Buffer_Attrs *)&gfxAttrs);

        while (numFrame < args->numFrames && sigHandle_quit == 0)
        {
            numFrame++;

            /* Read one frame from the input file */
            if (readFrameGray_make420SP(hInBuf, inFile) < 0)
            {
                goto cleanup;
            }

            BufferGfx_resetDimensions(hInBuf);

            /* Encode the video buffer */
            if (Venc1_process(hVe1, hInBuf, hOutBuf) < 0)
            {
                printf("Failed to encode video buffer\n");
                goto cleanup;
            }

            /* Write the encoded frame, if the codec produced any output */
            if (Buffer_getNumBytesUsed(hOutBuf))
            {
                if (fwrite(Buffer_getUserPtr(hOutBuf), Buffer_getNumBytesUsed(hOutBuf), 1, outFile) != 1)
                {
                    printf("Failed to write encoded video data to file\n");
                    goto cleanup;
                }
            }
        }

    cleanup:
        /* Clean up the application */
        printf("... exiting thread %i\n", (int)pthread_self());
        return NULL;
    }

    void appMain(Args * args)
    {
        status = EXIT_SUCCESS;

        /* Start one encode thread per requested stream */
        for (i = 0; i < args->numThreads; i++)
        {
            if (pthread_create(&thrID[i], NULL, encodeThrFxn, args))
            {
                /* thrID[i] is not valid when creation fails, so report the index */
                printf("Failed to create video thread %d\n", i);
                status = EXIT_FAILURE;
                goto cleanup;
            }
        }

        /* Wait for every thread and collect its result */
        for (i = 0; i < args->numThreads; i++)
        {
            if (pthread_join(thrID[i], &ret) == 0)
            {
                if (ret == THREAD_FAILURE)
                {
                    status = EXIT_FAILURE;
                    goto cleanup;
                }
            }
        }

    cleanup:
        exit(status);
    }

  • Hello Vincent,

    I got CE version 2.25.5.16 to fail.  I ran my threaded code with 10 threads, each encoding 100 frames.  It was successful around 70 times in a row and then failed.

    Here is a successful log:

    8360.output_2.txt

    Here is the failed log:

    2134.output_1.txt

    Thanks again!

     

  • Hi Brandy,

    Looking at your code snippet, I'd suggest you call CERuntime_init() and Dmai_init() just once in your main function. These functions are meant to be called just once in the system, not once per thread.
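
    A minimal sketch of that structure, written so it builds without the DVSDK: demo_runtime_init() is a hypothetical stand-in for the CERuntime_init() and Dmai_init() pair, and demo_engine_open() is a hypothetical stand-in for Engine_open(). Only the shape is the point here: initialize once on the main thread before any worker exists, then let every worker obtain its own handle.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 10

/* Hypothetical stand-ins so the sketch builds without the DVSDK. */
static int demo_init_calls = 0;       /* times the one-time init ran */
static int demo_engines_opened = 0;   /* handles issued so far */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of CERuntime_init() + Dmai_init(). It is only ever
 * called from the main thread, before any worker has been created. */
void demo_runtime_init(void)
{
    demo_init_calls++;
}

/* Plays the role of Engine_open(): returns a fresh handle per caller. */
int demo_engine_open(void)
{
    pthread_mutex_lock(&demo_lock);
    int handle = ++demo_engines_opened;
    pthread_mutex_unlock(&demo_lock);
    return handle;
}

/* Worker: no global init here, only the per-thread engine handle. */
static void *encode_thread(void *arg)
{
    int *handle_out = arg;
    *handle_out = demo_engine_open();
    return NULL;
}

/* Initializes once, starts the workers, waits for them.
 * Returns 0 on success, -1 if a thread could not be created. */
int demo_run(int handles[NUM_THREADS])
{
    pthread_t tid[NUM_THREADS];
    int started = 0;

    assert(handles != NULL);

    demo_runtime_init();   /* exactly once per process */

    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&tid[i], NULL, encode_thread, &handles[i]) != 0) {
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
    }
    return started == NUM_THREADS ? 0 : -1;
}
```

    After demo_run() returns, the init counter is 1 no matter how many workers ran, while each worker holds a distinct handle.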

    Looking at the failing trace from CE 2.25.5.16, it looks like the creation of the 10th codec instance never returned. The 10th thread is waiting for the DSP to tell it that the creation of the 10th instance succeeded, but the confirmation never came. The DSP seems to be 'stuck' at that point. Here are a few things to try to troubleshoot this:

    - Run your app with only 9 threads. Maybe you are hitting a condition where resources are low, ultimately resulting in memory corruption (e.g. stack overflow comes to mind). See if you still hit this problem. If the app still hangs, keep decreasing the number of threads to the minimum value where this hang occurs. This would help simplify your system and its troubleshooting. Then create the CE_DEBUG log again for the simplified failing case.

    - I noticed that the system has all kinds of calls happening at the same time. To simplify this, study the DVSDK demos and learn how to use the Rendezvous module. This module comes from DMAI, and it can be used to sync up all your threads. Then set up a sync point for all 10 of your threads immediately after the Venc1_create() function call in each thread:

        hVe1 = Venc1_create(hEngine, args->codecName, &params, &dynParams);

        Rendezvous_meet(hRendezvousInit);

    This would make sure all codec instances are created prior to proceeding to processing. This would help to see if the 10th codec creation still fails if processing isn't involved. Maybe you have hit a race condition between codec creation and processing, and this would help to isolate it.

    Again, create the CE_DEBUG log for the simplified failing case (if it still fails).

    - If you have access to CCS and JTAG, refer to http://processors.wiki.ti.com/index.php/Debugging_the_DSP_side_of_a_CE_application_using_CCS for instructions on how to connect to the DSP. Don't bother with adding the spinLock loop. Just run your app until it hangs, and connect to it using CCS. Then you can observe in which function the DSP code is stuck. This would be a valuable data point.
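
    The sync point suggested in the second item can be sketched with plain pthreads. This is not the DMAI Rendezvous module: MeetPoint, meet_init() and meet_wait() are hypothetical names for a small mutex-and-condition-variable meeting point that plays the role of Rendezvous_meet(), and incrementing a counter stands in for Venc1_create(). The sketch only shows the two-phase shape, where no thread starts processing until every instance exists.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 10

/* One-shot meeting point; a hypothetical stand-in for DMAI's Rendezvous. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_here;
    int expected;   /* threads that must arrive */
    int arrived;    /* threads that have arrived so far */
} MeetPoint;

void meet_init(MeetPoint *m, int expected)
{
    assert(expected > 0);
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->all_here, NULL);
    m->expected = expected;
    m->arrived = 0;
}

/* Blocks until the expected number of threads have called it. */
void meet_wait(MeetPoint *m)
{
    pthread_mutex_lock(&m->lock);
    m->arrived++;
    if (m->arrived >= m->expected) {
        pthread_cond_broadcast(&m->all_here);
    } else {
        while (m->arrived < m->expected) {
            pthread_cond_wait(&m->all_here, &m->lock);
        }
    }
    pthread_mutex_unlock(&m->lock);
}

static MeetPoint init_done;
static int instances_created = 0;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int *seen = arg;

    /* Creation phase: stand-in for Venc1_create(). */
    pthread_mutex_lock(&count_lock);
    instances_created++;
    pthread_mutex_unlock(&count_lock);

    meet_wait(&init_done);   /* role of Rendezvous_meet(hRendezvousInit) */

    /* Processing phase: record how many instances exist at this point. */
    pthread_mutex_lock(&count_lock);
    *seen = instances_created;
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Runs both phases. Returns 0 on success, -1 if a thread failed to start. */
int run_phased(int seen[NUM_THREADS])
{
    pthread_t tid[NUM_THREADS];
    int started = 0;

    instances_created = 0;
    meet_init(&init_done, NUM_THREADS);

    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&tid[i], NULL, worker, &seen[i]) != 0) {
            break;
        }
        started++;
    }
    if (started < NUM_THREADS) {
        /* Release the threads that did start so they are not stranded. */
        pthread_mutex_lock(&init_done.lock);
        init_done.expected = started;
        pthread_cond_broadcast(&init_done.all_here);
        pthread_mutex_unlock(&init_done.lock);
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
    }
    pthread_cond_destroy(&init_done.all_here);
    pthread_mutex_destroy(&init_done.lock);
    return started == NUM_THREADS ? 0 : -1;
}
```

    Every worker observes all ten instances once it is past the meeting point, so a hang that still occurs before that point can be attributed to creation alone.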

    Let us know how this goes.

    Best regards,

    Vincent

  • Hello Vincent,

    Well, since I moved the Dmai_init() and CERuntime_init() calls out of the thread routine, I have yet to see a failure. I ran 9 threads 150 times with no failure, and 10 threads 150 times with no failure. Here is a passing log for 10 threads:

    0550.output_50.txt

    Can you please explain the significance of this result? Would each thread be initializing too much memory? Please also explain the importance of the threads rendezvousing before continuing execution. (I have not added this code yet, but certainly will.)

    Later today, I will see if this change helped with CE 2.26.2.11. Do you have any other suggestions about why 2.26.2.11 does not work for my application? I understand the question: if v2.25 is working, why bother to upgrade? My thought is that eventually something will not be backwards compatible and this will force an upgrade, and then I will be in the same place except that my application will have gotten more complicated.

    Thanks!

    Brandy 

     

  • Hi Brandy,

    Glad to hear that you were able to fix the issue. CERuntime_init and Dmai_init are "component-wide" initialization functions that are meant to be called just once. Given that your application was repeatedly calling these functions from multiple threads, it is possible that a later call corrupted the state of a variable that was already being used by a previously created thread, depending on your timing. That would explain why you saw failures only occasionally.

    The rendezvous suggestion was only to make the debugging easier: if there were bad code in the codec creation phase, we could then better isolate the problem. By putting in the rendezvous you can make sure the creation phase is distinct from the processing phase, which is good practice, even though CE itself does not have such a requirement.

    As for CE 2.26.2.11, I'd rather not comment on it, as it has yet to be system-tested with all the other components in a DVSDK setting. As a developer, I'd advise anyone to try to stick to the components that have been tested and shipped together in a DVSDK. It may actually be less work to upgrade the DVSDK as a whole (instead of just CE) at a later point in time.

    Best regards,

    Vincent

  • Thanks for your help! I guess since I can't figure out what is going on with CE 2.26.2.11, I will just have to take your advice, although it is still odd that it does not work at all.

    Thanks again!

  • Hi Vincent,

    I am sorry, but today did not go so well. When I went to do some benchmark loops, I found I could only get through 1 to at most 15 cycles before the system locked up. What is interesting is that I was doing my benchmark without any CE_DEBUG value set. When I went to repeat the situation with CE_DEBUG=2, I could not get the system to fail.

    What does this mean? I have run numerous tests on my network stability, thinking that perhaps the NFS file writing was not keeping up with the DaVinci, but that does not seem to be the problem. What's more, the more frames I try to encode, the harder the system crashes. In other words, when I tried to encode 300 frames in each thread, the DaVinci hang caused my NFS and VirtualBox to freeze. When I do just 100 frames, it only locks up the DaVinci.

    Also, when I run the 'top' command while running my program, it shows that the program is using more than 100% of the memory. Is that normal?

    Also, look at the total amount of free memory. Do you think this could have an effect? Here is my loadmodules command for the memory module:

    insmod /kernel_binaries/dm6467/cmemk.ko phys_start=0x84C00000 phys_end=0x8ba00000 pools=2x921600,1x460800,1x1048576,1x345600,2x86400,11x564528,5x677376,14x5396480,3x4147200,4x1451520,4x1843200

    Like I have already posted, I have literally not changed anything in the memory map, and I am not really sure or confident about how to do that. I set up my bootargs and loadmodules to match the default:

    # Default Memory Map
    #
    # Start Addr    Size    Description
    # -------------------------------------------
    # 0x80000000     76 MB  Linux
    # 0x84C00000    110 MB  CMEM
    # 0x8ba00000     70 MB  CODEC SERVER
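
    As a quick sanity check on the insmod line above, the requested pools can be added up and compared with the window between phys_start and phys_end. The sketch below is plain arithmetic on the numbers from this post; the Pool type and the function names are made up for the example, and it does not model any per-buffer rounding the CMEM driver may apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One "COUNTxSIZE" entry from the pools= argument. */
typedef struct {
    unsigned      count;  /* number of buffers in the pool */
    unsigned long size;   /* size of each buffer in bytes */
} Pool;

/* Copied from the insmod line in this post. */
static const Pool cmem_pools[] = {
    {  2,  921600 }, {  1,  460800 }, { 1, 1048576 }, { 1,  345600 },
    {  2,   86400 }, { 11,  564528 }, { 5,  677376 }, { 14, 5396480 },
    {  3, 4147200 }, {  4, 1451520 }, { 4, 1843200 },
};

#define CMEM_PHYS_START 0x84C00000UL
#define CMEM_PHYS_END   0x8BA00000UL

/* Bytes available between phys_start and phys_end. */
unsigned long cmem_window_bytes(void)
{
    return CMEM_PHYS_END - CMEM_PHYS_START;
}

/* Bytes requested by all pools together, without any rounding. */
unsigned long cmem_pool_bytes(void)
{
    unsigned long total = 0;
    for (size_t i = 0; i < sizeof cmem_pools / sizeof cmem_pools[0]; i++) {
        total += cmem_pools[i].count * cmem_pools[i].size;
    }
    return total;
}

/* Prints the comparison; returns 1 if the pools fit in the window. */
int cmem_report(void)
{
    unsigned long window = cmem_window_bytes();
    unsigned long pools  = cmem_pool_bytes();

    assert(CMEM_PHYS_END > CMEM_PHYS_START);
    printf("window: %lu bytes, pools: %lu bytes\n", window, pools);
    return pools <= window;
}
```

    With these values the pools add up to 114,638,864 bytes against a 115,343,360-byte (110 MB) window, leaving 704,496 bytes, so the configuration is very close to full.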

    Also, when I am running the "top" command, I can't get it to fail either.

    I am going to the Linux Embedded System Design Workshop next week. Are there any specific questions or skills you can recommend that I focus on to help solve this problem?

    In case you are asking yourself why it is so important that I have 10 threads running, it is due to the application and risk reduction. We were told that the chip could encode 10 streams of data simultaneously at faster than 8 Hz. I am trying to see how fast it can encode 10 data streams. When I can get the data correctly, it seems like it is fast enough. But if I can't get a stable system, then that is a moot point.
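
    To put a number on that target, here is a small C sketch of the arithmetic. It assumes, and this is only an assumption for the estimate, that the ten encodes end up sharing the same hardware one frame at a time; the function names are made up for the example.

```c
#include <assert.h>

/* Time available per frame, in milliseconds, if `streams` streams each
 * running at `rate_hz` are served one frame at a time. */
double frame_budget_ms(int streams, double rate_hz)
{
    assert(streams > 0 && rate_hz > 0.0);
    return 1000.0 / (streams * rate_hz);
}

/* Per-stream frame rate actually achieved in a benchmark run. */
double achieved_rate_hz(long frames_per_stream, double elapsed_s)
{
    assert(frames_per_stream >= 0 && elapsed_s > 0.0);
    return frames_per_stream / elapsed_s;
}
```

    Under that assumption, 10 streams at 8 Hz leave 12.5 ms per frame, and a run that encodes 100 frames per stream meets the target if it finishes in less than 12.5 seconds.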

    I'll keep trying different ideas tomorrow, so any hints would be great.

  • Hi Brandy,

    The fact that you are seeing the issue only when CE_DEBUG is turned off indicates that the problem is likely to be timing related. When CE_DEBUG is turned on, the code runs slower, and that may have masked a race condition in the application. These types of problems are typically hard to debug. You will need to first find out at what point the application is hanging. You said the device locked up after 100 frames. Then it should not be able to proceed further when you ask it to process 300.
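
    One low-overhead way to find that point without turning CE_DEBUG back on is to have each thread record the last stage it entered in a shared array, and inspect the array once the application stops making progress. The sketch below uses C11 atomics; the stage names and helper functions are hypothetical, and it assumes something (another thread, a debugger) can still read the array after the hang, which may not hold if the whole board locks up.

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_THREADS 10

/* Hypothetical stage labels for one pass through the encode loop. */
enum Stage { ST_IDLE = 0, ST_CREATE, ST_READ, ST_ENCODE, ST_WRITE, ST_DONE };

static atomic_int  stage_of[NUM_THREADS];  /* last stage each thread entered */
static atomic_long frame_of[NUM_THREADS];  /* frame it was working on */

/* Called by a worker right before it enters a stage. Two atomic stores
 * cost far less than a trace line, so timing is disturbed much less. */
void mark(int thread, enum Stage stage, long frame)
{
    assert(thread >= 0 && thread < NUM_THREADS);
    atomic_store(&frame_of[thread], frame);
    atomic_store(&stage_of[thread], (int)stage);
}

/* Last stage recorded for a thread. */
enum Stage last_stage(int thread)
{
    assert(thread >= 0 && thread < NUM_THREADS);
    return (enum Stage)atomic_load(&stage_of[thread]);
}

/* Last frame number recorded for a thread. */
long last_frame(int thread)
{
    assert(thread >= 0 && thread < NUM_THREADS);
    return atomic_load(&frame_of[thread]);
}

/* Index of the first thread that has not reached ST_DONE, or -1 if
 * every thread finished. */
int first_unfinished(void)
{
    for (int i = 0; i < NUM_THREADS; i++) {
        if (atomic_load(&stage_of[i]) != ST_DONE) {
            return i;
        }
    }
    return -1;
}
```

    A worker would call mark(id, ST_ENCODE, numFrame) just before the encode call, for example; if the run stalls, first_unfinished() together with last_stage() and last_frame() shows which thread stopped and where.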

    I don't think I have seen %MEM go over 100% before. Have you tried running with fewer threads to see if you can reduce this number to less than 100? Maybe it is worth checking to see if this helps with the hang. In any case, reducing the number of threads is going to help with your debugging. You should make sure the app can run with 2 threads before you increase the number. I know your goal is to run it with 10, but it is best to proceed a step at a time.

    To learn more about how to change the memory map, you may want to read this article: http://processors.wiki.ti.com/index.php/Changing_the_DVEVM_memory_map. It may be easiest to start by shrinking the CMEM area, if possible, and increasing the memory assigned to Linux by the same amount.

    At the workshop, learn as much as you can about debugging multi-threaded programs, and of course any detail about Linux memory management would also be useful.

    Best regards,

    Vincent