
Codec Engine / H.264 Encoder memory leak

Hello TI!

When deleting an H.264 encoder instance with "VIDENC1_delete", we run into the problem that one CMEM buffer is not freed. It appears "Memory_contigFree" is called with the wrong buffer size. This looks very similar to this post: http://e2e.ti.com/support/embedded/f/354/p/63547/229656.aspx#229656.

When starting our app with CE_DEBUG=3 here is the allocation of this buffer:

@3,031,640us: [+0 T:0x453d7460 S:0x453d5f84] OM - Memory_contigAlloc> Enter(size=146, align=4, cached=FALSE, heap=FALSE)
@3,031,871us: [+4 T:0x453d7460 S:0x453d5f84] OM - Memory_contigAlloc> CMEM_alloc(146) = 0x4b022000.
@3,032,032us: [+4 T:0x453d7460 S:0x453d5f84] OM - Memory_contigAlloc> CMEM_getPhys(0x4b022000) = 0x89e95000.
@3,032,694us: [+0 T:0x453d7460 S:0x453d5f84] OM - Memory_contigAlloc> return (0x4b022000)

And here is the matching free:

@28,716,807us: [+0 T:0x453d7460 S:0x453d674c] OM - Memory_contigFree> Enter(addr=1258430464, size=48)
@28,717,119us: [+7 T:0x453d7460 S:0x453d674c] OM - Memory_contigFree> Error: buffer (addr=1258430464, size=48) not found in translation cache

Our cmem driver is loaded with these arguments:

phys_start=0x89c00000 phys_end=0x90000000 pools=4x3846400,8x2880000,2x1440000,2x2880512,1x2880512,4x12499968,16x4096,8x8192,4x24576,4x36864,16x53248,8x65536,256x56,1x320,1x640,2x608,1x296,1x28,2x24 allowOverlap=1 phys_start_1=0x00001000 phys_end_1=0x00008000 pools_1=1x28672

We use these versions of the CE and H.264 encoder:

Codec Engine: 2_25_03_13
H.264 encoder: 2_10_00_00_production

The application runs on a DM368 device.

It seems like this (or something similar) also happens with the MPEG4 encoder, but I don't have detailed dumps for that.

I have attached a complete alloc/free dump from "CE_DEBUG=3 our_app | grep Memory_contig". It shows an H.264 encoder instance being created; after a few seconds it is deleted and a JPEG encoder is created.

3125.cmem_alloc_free.log

As this bug delays our release, a quick response would be appreciated.

Regards,

 Andreas

  • We chased something similar with another customer... the solution was to increase the size of the internal virt/phys map cache table.  Try adding this to your cfg script and see if it helps:

    xdc.useModule('ti.sdo.ce.osal.linux.Settings').maxCbListSize = 200;  // default size is 100

    If that works, I'll explain what it's doing.  :)

    Chris

  • Are you using the DVSDK, specifically the DMAI module?  I believe there is a problem with the Buffer module in there, where it passes a size to Memory_contigFree that differs from what was allocated, hence the problem.  Does the other thread you point to not explain your problem too?

    Regards,

    - Rob

  • IIRC, the DMAI Buffer module issue was fixed in the latest DMAI release.

    If you are using DMAI, have you tried running your app with the latest DVSDK 4.0 [1]?

    [1] http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/dvsdk/DVSDK_4_00/latest/index_FDS.html

    Thanks

    Brijesh

  • Hi Chris,

    Thanks for this hint. It doesn't fix the problem I posted, but it does fix problems we discovered on Friday after my posting ;-)

    Please explain what it is doing. After looking into Memory_cmem.c, I would actually suggest setting the value to UINT32_MAX. Or, even better, removing this whole Memory_maxCbListSize check, because it does not work as intended. There is only one reason I can think of why this check would be necessary, and thinking about that does not make me happy.

    Also, IMHO silently ignoring allocation errors is the worst of all solutions. If this serious allocation problem had at least printed an ERROR on the console, we would have discovered it weeks ago.

    Could you please give me a bug number for this error in TI's bug-tracking system, so I can track whether the problem is solved?

    Regards,

     Andreas.

  • Hello Brijesh,

    Could you specify which SVN commit of DMAI you mean? As we're not using DMAI, we cannot simply switch to the newest version, but we could check whether our code makes the same mistake.

    Regards,

     Andreas.

  • Hi Chris,

    After a little testing, it seems the wrong sizes on free come from the H.264 encoder release we're using. Even a simple test program that only creates an encoder instance and then deletes it produces the same error message.

    I have attached the test code I used, together with the cfg file and an execution log.

    Will this bug be fixed in the next release of the H.264 encoder?

    Regards,

     Andreas.

    7506.h264_leak.zip

  • Andreas,

    The Memory_maxCbListSize check is there to keep the performance of Memory_getBufferPhysicalAddress() and Memory_getBufferVirtualAddress() at a reasonable level.  Before it was added, the list would grow unbounded due to automatic insertion during Memory_getBufferPhysicalAddress(), and search operations would take more and more time as the list grew, eventually eating up so many CPU cycles that the application started missing its realtime deadlines.

    This would not be a problem if the list were restricted to Memory-managed buffers, i.e., buffers that are allocated and freed by Memory_contigAlloc()/Memory_contigFree().  The problem creeps in because Codec Engine was doing a "favor" for users that passed non-Memory buffers to codec algorithms, which is a perfectly valid use case.  To explain: a user app passes a buffer allocated outside of Codec Engine (CE) to a VISA-class "process()" or "control()" function (e.g., VIDENC1_process()).  The VISA-class stub needs to pass the physical address to the codec algorithm, so it uses the CE function Memory_getBufferPhysicalAddress(), which is designed to work for non-Memory buffers.  On the way "back" from the process() function, the stub needs to translate a physical address from the codec algorithm into a virtual address that can be used by the Linux app, and the Memory module's translation cache is used for this purpose.  Since the addition to the translation cache was automatic in this case, and since there is no corresponding automatic removal, the list would grow and grow with entries for these non-Memory buffers, retaining translations for buffers that were no longer in use.

    There is a bug that we filed that is related to this: SDOCM00076172 - "CE's OSAL API Memory_getBufferPhysicalAddress() should not add buffer to its physical/virtual address translation cache"

    Our "fix" for this is to not automatically add a translation entry to the cache during Memory_getBufferPhysicalAddress() and also require the user to register/unregister the buffer using the existing functions Memory_registerContigBuf()/Memory_unregisterContigBuf() before calling the VISA-class process() function.  Memory_contigAlloc()/Memory_contigFree() will continue to add/remove translation entries, and Memory_getBufferPhysicalAddress() will still work for non-Memory, non-CMEM buffers but without adding them to the cache.  Basically, we need the application to tell CE when the buffer is active and when it's done, which is just basic good programming practice (CE was allowing the application to be lazy about buffer management).

    With this new behaviour the list should stay reasonably small and not grow unbounded.
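    To make the intended register/unregister discipline concrete, here is a toy model in plain C. This is written just for this post; it is not the actual CE source, and all the names are made up. Entries enter the cache only via an explicit register call and leave only via an explicit unregister, so the list stays bounded by what the app is actually using. It also shows why a free with a mismatched size reports "not found":

```c
/* Toy model of a virt/phys translation cache with explicit
 * register/unregister, loosely inspired by the CE behaviour
 * described above. NOT the real TI implementation. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CB_LIST_SIZE 100   /* mirrors the Memory_maxCbListSize idea */

typedef struct {
    uintptr_t virt;
    uintptr_t phys;
    size_t    size;
    int       used;
} CacheEntry;

static CacheEntry cache[MAX_CB_LIST_SIZE];   /* zero-initialized */

/* Add an entry; the caller must later unregister with the SAME size. */
int cache_register(uintptr_t virt, size_t size, uintptr_t phys)
{
    for (int i = 0; i < MAX_CB_LIST_SIZE; i++) {
        if (!cache[i].used) {
            cache[i] = (CacheEntry){ virt, phys, size, 1 };
            return 0;
        }
    }
    return -1;   /* cache full: report it, never fail silently */
}

/* Remove an entry; an addr/size mismatch yields "not found". */
int cache_unregister(uintptr_t virt, size_t size)
{
    for (int i = 0; i < MAX_CB_LIST_SIZE; i++) {
        if (cache[i].used && cache[i].virt == virt && cache[i].size == size) {
            cache[i].used = 0;
            return 0;
        }
    }
    return -1;   /* matches the "not found in translation cache" error */
}

/* Translate any address inside a registered range; 0 means not found. */
uintptr_t cache_virt_to_phys(uintptr_t virt)
{
    for (int i = 0; i < MAX_CB_LIST_SIZE; i++) {
        if (cache[i].used && virt >= cache[i].virt &&
            virt < cache[i].virt + cache[i].size)
            return cache[i].phys + (virt - cache[i].virt);
    }
    return 0;
}
```

    Using the numbers from the log at the top of this thread: register the buffer with size 146, then an unregister with size 48 fails exactly like the "not found in translation cache" error, while size 146 succeeds.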

    This "fix" also contains a big fat error "hint" when Memory_getBufferVirtualAddress() can't find an entry:

        if (virtualAddress == 0) {
            Log_print2(Diags_USER7, "[+7] Memory_getVirtualAddress> "
                "ERROR: buffer (physAddr=0x%x, size=0x%x) not found in translation "
                "cache\n\nEnsure that you have registered this buffer with "
                "Memory_registerContigBuf()\n", (IArg)physicalAddress, (IArg)sizeInBytes);
        }

    We will probably be addressing this in a better fashion some time soon (maybe 2011 Q1), perhaps by using a hash table instead of a list, so we would appreciate any suggestions that you have regarding a better solution.

    Regards,

    - Rob

     

  • Hi Robert,

    Thanks for the detailed info. Now I understand what's going on.

    May I suggest using a binary tree (maybe an rb-tree) instead of a hash table for the translation cache? That way you won't have to decide what size your table should have, but performance would still be great. Also, it may be a good idea to separate the data structures for memory allocation (CMEM_alloc & CMEM_free) and the translation cache (Memory_getBufferPhysicalAddress). That way you can easily limit the amount of data in the translation cache without risking corruption in the memory management.

    Did you have a look at "struct mm_struct" in the Linux kernel? There, the memory areas are linked into a) a simple list for easy in-order traversal, b) an rb-tree for fast lookup, and c) the most recently used area is also cached for even faster repeated lookups of the same area.
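    To illustrate the lookup structure I have in mind, here is a minimal sketch I wrote for this post (a plain unbalanced BST; an rb-tree version would only add rebalancing on insert, the lookup logic stays the same). None of this is CE code:

```c
/* BST keyed on the virtual start address of each buffer.
 * Lookup finds the node whose [virt, virt + size) range
 * contains the queried address. */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Node {
    uintptr_t virt, phys;
    size_t size;
    struct Node *left, *right;
} Node;

Node *tree_insert(Node *root, uintptr_t virt, size_t size, uintptr_t phys)
{
    if (root == NULL) {
        Node *n = calloc(1, sizeof *n);
        if (n == NULL)
            return root;   /* out of memory: caller should check and report */
        n->virt = virt;
        n->phys = phys;
        n->size = size;
        return n;
    }
    if (virt < root->virt)
        root->left = tree_insert(root->left, virt, size, phys);
    else
        root->right = tree_insert(root->right, virt, size, phys);
    return root;
}

/* O(log n) on a balanced tree, versus O(n) for a linear list scan. */
uintptr_t tree_virt_to_phys(const Node *root, uintptr_t addr)
{
    while (root != NULL) {
        if (addr < root->virt)
            root = root->left;
        else if (addr >= root->virt + root->size)
            root = root->right;
        else
            return root->phys + (addr - root->virt);
    }
    return 0;   /* not in cache */
}
```

    With a structure like this there is no table-size tuning at all, and deleting a node on unregister keeps the cache exactly as large as the set of live buffers.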

    Regards,

     Andreas.

     

  • Hello Chris,

     

    I ran into the same problem.

    My application was working well with DVSDK 2.1.

    After upgrading to DVSDK 3.1, Venc1_delete() couldn't free CMEM buffers.

    Fortunately, your solution works for me. (Thank you~)

    Please let me know what it's doing.

     

    Best Regards,

    Jinkyu Park

  • Hi Chris, the solution works for us, thanks!