
CMEM allocations and page size question

I'm using DVSDK4 and the DM368.  Is it correct to assume that each allocation from CMEM occupies a separate 4K page?  In other words, 5 x 56 bytes really means 5 x 4K?

If so, then wouldn't it make more sense to add up all the misc allocations that are equal to or smaller than 4K and just allocate a bunch of 4K memory blocks?

John A

  • John Anderson said:

    I'm using DVSDK4 and the DM368.  Is it correct to assume that each allocation from CMEM occupies a separate 4K page?  In other words, 5 x 56 bytes really means 5 x 4K?

    Yep, that's true.

    John Anderson said:

    If so, then wouldn't it make more sense to add up all the misc allocations that are equal to or smaller than 4K and just allocate a bunch of 4K memory blocks?

    Each individual buffer needs to occupy its own unique page since each buffer can have different caching dynamics.  When allocating CMEM buffers you can specify CMEM_CACHED or CMEM_NONCACHED, and this caching policy applies to the whole 4K page.

    Also, CMEM is somewhat simplified if it doesn't have to manage sub-pages within pages (keeping track of page reference counts, etc.).
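
    For illustration, a minimal sketch along these lines (assuming the DVSDK-era CMEM user API from cmem.h; error handling trimmed) shows the cache policy being chosen per allocation:

        /* Sketch: per-buffer cache selection with the CMEM user API.
         * Assumes the DVSDK-era cmem.h; error handling trimmed for brevity. */
        #include <stdio.h>
        #include <cmem.h>

        int main(void)
        {
            CMEM_AllocParams cached    = CMEM_DEFAULTPARAMS;
            CMEM_AllocParams noncached = CMEM_DEFAULTPARAMS;
            void *a, *b;

            cached.flags    = CMEM_CACHED;     /* whole 4K page mapped cacheable  */
            noncached.flags = CMEM_NONCACHED;  /* whole 4K page mapped non-cached */

            if (CMEM_init() < 0) return -1;

            /* Each allocation occupies its own page, so the two cache
             * policies never share a page. */
            a = CMEM_alloc(56, &cached);
            b = CMEM_alloc(56, &noncached);
            printf("cached buf %p, non-cached buf %p\n", a, b);

            if (a) CMEM_free(a, &cached);
            if (b) CMEM_free(b, &noncached);
            return CMEM_exit();
        }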

    Regards,

    - Rob

     

  • Robert Tivy said:

    Each individual buffer needs to occupy its own unique page since each buffer can have different caching dynamics.  When allocating CMEM buffers you can specify CMEM_CACHED or CMEM_NONCACHED, and this caching policy applies to the whole 4K page.

    Also, CMEM is somewhat simplified if it doesn't have to manage sub-pages within pages (keeping track of page reference counts, etc.).

    This didn't really answer the quoted question.  If the cache spec is at the buffer level, I could simply add up all the buffers that are less than 4K and allocate them all as 4K blocks in one big pool, right?  Or are you saying that the cache is specified at the pool level?

    If it's at the pool level, then a pool that has buffers allocated as CMEM_CACHED would deny a buffer request that was CMEM_NONCACHED, correct?  I'm still thinking that a buffer request is sometimes denied even if there are available buffers.  That would be explained if the caching specification is at the pool level, and not the buffer level.

    John A

     

  • John Anderson said:
    Or are you saying that the cache is specified at the pool level?

    The cache attribute is specified during user allocation, at the individual buffer level, so there is no pre-knowledge of cacheable vs. non-cacheable buffers and associated grouping.

    I'm not sure what you're looking for here.  CMEM probably could be implemented in ways such as what you suggest, but at what cost, and for how much benefit?  The memory savings offered by your suggestion come at the price of complexity (and inflexibility) in configuration.  There is a large benefit in allowing simple pool geometry configuration - a given geometry can fit a wide range of needs, debugging is easier, and probably more.  There are also the run-time costs of sub-page handling, ensuring that cache line boundaries are respected, etc.

    TI is/has been aware of these issues and weighed customer needs/concerns in reaching the current CMEM design.  There is no one solution to fit everyone's needs, and customer experience has shown that the current CMEM design is a good balance of simplicity and functionality.

    John Anderson said:
    I'm still thinking that a buffer request is sometimes denied even if there are available buffers.

    You're free to believe what you want, but what you state is simply not true.  The CMEM architecture/implementation allows for *any* larger buffer to be chosen, and chosen in a best-fit way.  CMEM has been used extensively, so I would expect that a problem of this nature (i.e., not honoring design features) would have been identified by now.  If you can demonstrate otherwise then I'm happy to look into it, but unless you have a clear, reproducible situation I would suggest that you are misinterpreting your results.

    Regards,

    - Rob

     

  • Robert Tivy said:
    The cache attribute is specified during user allocation, at the individual buffer level, so there is no pre-knowledge of cacheable vs. non-cacheable buffers and associated grouping.

    I'm not sure what you're looking for here.

    That's what I wanted to know.  The reason I'm asking is that a lot of allocation is being done, and the user is expected to understand how to tell CMEM to create pools of buffers.  It seems that there is no drawback to allocating all the <4K buffers from a single pool of 4K buffers.  I asked the question because I'm trying to understand the pros and cons.

    Robert Tivy said:
    CMEM probably could be implemented in ways such as what you suggest, but at what cost, and for how much benefit?  The memory savings offered by your suggestion come at the price of complexity (and inflexibility) in configuration.

    We don't seem to be on the same page.  I'm thinking that allocating one single pool of 4K buffers is simpler and more flexible.  The command line that loads cmem is full of miscellaneous buffer allocations.

    However, I could see that allocating each size in its own pool narrows the "can't get a buffer" error down to that specific pool, and won't interact with other different-sized buffer allocations.  Perhaps this is the "con" I'm looking for to shoot down my suggestion.

    Robert Tivy said:
    You're free to believe what you want, but what you state is simply not true.

    I don't ask these questions so I can believe what I want.

    Robert Tivy said:
    The CMEM architecture/implementation allows for *any* larger buffer to be chosen, and chosen in a best-fit way.  CMEM has been used extensively, so I would expect that a problem of this nature (i.e., not honoring design features) would have been identified by now.  If you can demonstrate otherwise then I'm happy to look into it, but unless you have a clear, reproducible situation I would suggest that you are misinterpreting your results.

    That's good to know.  It's difficult when you get an error saying no buffer is available, then look in /proc/cmem and find a bunch free.

    The simple answer is that after the failure, the API calls have already freed the other buffers, so I'm not looking at cmem's usage at the point that it denied a request.  I'm asking these questions because I want to make sure I understand what's going on.  It's not like I believe it's a sure thing that all software acts like you think it does.

    John A

     

  • John Anderson said:
    We don't seem to be on the same page.  I'm thinking that allocating one single pool of 4K buffers is simpler and more flexible.  The command line that loads cmem is full of miscellaneous buffer allocations.

    I guess I'm on a "cached" page and you're on a "non-cached" page :)

    OK, I'm starting to see where your question is coming from, but I'm not sure.  Could you please offer some concrete pool geometry alternatives that you're considering?  I'm assuming that you see some pre-canned configurations in a 'loadmodules.sh' script and are wondering if there is a more efficient geometry specification for those pool needs.

    As a guess, though, are you wondering if, say:
        pools=1x56,2x128
    might be better specified as:
        pools=3x4096
    ?  I don't think it makes a difference here; either one would result in the same memory usage.  I would favor the former (exact specification) over the latter (general specification), since it exactly states the pool buffer needs, and it could result in memory savings if the host system had pages smaller than 4096.

    John Anderson said:

    However, I could see that allocating each size in its own pool narrows the "can't get a buffer" error down to that specific pool, and won't interact with other different-sized buffer allocations.  Perhaps this is the "con" I'm looking for to shoot down my suggestion.

    The "can't get a buffer" error applies to the whole of CMEM buffers, not just the size that you're requesting (due to the "promotion of requests to larger sizes" we have already discussed), so your "con" is not really there.  Also, when you configure a pool of small buffers < page size, internally each individual buffer size gets rounded up to the page size, so if you configure "pools=1x56,2x128" then you are actually configuring 3x4096, and any one of those 3 buffers becomes a candidate for any allocation <= 4096 in size.
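
    As a quick hypothetical check of that behavior (assuming CMEM_getPool() from the DVSDK-era cmem.h, which reports the best-fit pool for a given size), you could print which pool would serve requests of different sizes:

        /* Hypothetical check: with "pools=1x56,2x128" loaded, every pool buffer
         * is really a full page, so any request <= 4096 has three candidates.
         * CMEM_getPool() is assumed from the DVSDK-era cmem.h. */
        #include <stdio.h>
        #include <cmem.h>

        int main(void)
        {
            if (CMEM_init() < 0) return -1;

            /* Both requests fit within a page, so both map onto one of the
             * page-sized pool buffers. */
            printf("pool id for a 56-byte request:   %d\n", CMEM_getPool(56));
            printf("pool id for a 2000-byte request: %d\n", CMEM_getPool(2000));

            return CMEM_exit();
        }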

    John Anderson said:

    That's good to know.  It's difficult when you get an error saying no buffer is available, then look in /proc/cmem and find a bunch free.

    The simple answer is that after the failure, the API calls have already freed the other buffers, so I'm not looking at cmem's usage at the point that it denied a request.

    You're probably right.  Stalling the application at the point of failure, before cleanup, and then doing '% cat /proc/cmem' would help you see the state of CMEM at the failure point.
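
    Something along these lines would do it (a sketch only; the helper name is illustrative, and the CMEM user API from cmem.h is assumed):

        /* Sketch: allocate a CMEM buffer and, on failure, stall before any
         * cleanup frees other buffers, so that '% cat /proc/cmem' (run from
         * another shell) shows the state of CMEM at the moment of failure. */
        #include <stdio.h>
        #include <cmem.h>

        void *allocOrStall(size_t size, CMEM_AllocParams *params)
        {
            void *buf = CMEM_alloc(size, params);

            if (buf == NULL) {
                fprintf(stderr, "CMEM_alloc(%u) failed; run 'cat /proc/cmem' "
                        "now, then press Enter to continue cleanup\n",
                        (unsigned)size);
                getchar();   /* hold here until the pool state has been read */
            }
            return buf;
        }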

    John Anderson said:
    I'm asking these questions because I want to make sure I understand what's going on.  It's not like I believe it's a sure thing that all software acts like you think it does.

    Fair enough, and I'm just making sure that you do understand things correctly.

    Regards,

    - Rob

  • Robert Tivy said:
    so if you configure "pools=1x56,2x128" then you are actually configuring 3x4096, and any one of those 3 buffers becomes a candidate for any allocation <= 4096 in size.

    OK, this is an interesting piece of info.  So any declared buffer, no matter how small, is still eligible for a request of up to 4096 bytes.

    The reason for all this questioning is that I have been having trouble restarting my encoder and decoder several times without running out of buffers.  I've gone over my code numerous times looking for what I may not be freeing or closing, but haven't had any luck.

    Apparently today is my lucky day, because I searched the forum and found a post regarding buffers not getting freed.  One of the suggestions was that it was a problem in the DMAI, and the fix was to put this in your app's config file....

    xdc.useModule('ti.sdo.ce.osal.linux.Settings').maxCbListSize = 200;  // default size is 100

    After doing this I've been able to restart my encoder and decoder (both run simultaneously in the same app) numerous times and I haven't had a buffer allocation failure yet.  I'm still keeping my fingers crossed, but this looks like the solution to my problem.

    Thanks,

    John A

  • John Anderson said:
    So any declared buffer, no matter how small, is still eligible for a request of up to 4096 bytes.

    That's right.  The requested size really only gets used for the purpose of display in /proc/cmem.  When parsing the pool geometry from the 'insmod' command line, the buffer size gets immediately rounded up to the next PAGE_SIZE boundary and that rounded-up size is used from then on.
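
    In effect, each configured buffer size goes through a rounding like this (a sketch for illustration, not the actual cmemk.ko source; PAGE_SIZE is 4096 on the DM368):

        /* Sketch of the page rounding described above; not the actual
         * cmemk.ko source.  PAGE_SIZE assumed to be 4096. */
        #define PAGE_SIZE 4096u

        static unsigned int roundUpToPage(unsigned int size)
        {
            return (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
        }

        /* roundUpToPage(56)   == 4096  (a "1x56" pool buffer is really 4096) */
        /* roundUpToPage(128)  == 4096                                        */
        /* roundUpToPage(4312) == 8192                                        */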

    John Anderson said:

    Apparently today is my lucky day, because I searched the forum and found a post regarding buffers not getting freed.  One of the suggestions was that it was a problem in the DMAI, and the fix was to put this in your app's config file....

    xdc.useModule('ti.sdo.ce.osal.linux.Settings').maxCbListSize = 200;  // default size is 100

    After doing this I've been able to restart my encoder and decoder (both run simultaneously in the same app) numerous times and I haven't had a buffer allocation failure yet.  I'm still keeping my fingers crossed, but this looks like the solution to my problem.

    I'm hopeful that that is your issue, since it fits your situation.

    It's usually very useful to set the env var CE_DEBUG=2 or 3 to see this sort of issue (without going back through your previous CMEM threads, I'm not sure whether you already did that).  CE_DEBUG produces lots of output that's hard to parse, but it's good to paste here in the forum for perusal by a TI engineer who knows what to look for.

    Regards,

    - Rob

     

  • Here's my quick response.  I ran the encode/decode app and restarted the encoder once, then tried to restart my decoder.

    In the app compiled with the maxCbListSize = 200, I got two of these statements....

    @100,513,274us: [+7 T:0x44cf7490] OM - Memory_contigFree> Error: buffer (addr=1139363840, size=48) not found in translation cache

    In the app not compiled with that statement, I got those two plus the following...

    @40,179,780us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1137483776, size=1788) not found in translation cache
    @40,180,005us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1137487872, size=4312) not found in translation cache
    @40,180,214us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1137496064, size=4312) not found in translation cache
    @40,180,403us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1137504256, size=20480) not found in translation cache
    @40,180,593us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1137524736, size=2048) not found in translation cache
    @40,180,781us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1137528832, size=1327872) not found in translation cache
    @40,513,021us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1138913280, size=896) not found in translation cache
    @40,513,307us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1138917376, size=51492) not found in translation cache
    @40,513,523us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1138970624, size=51492) not found in translation cache
    @40,513,721us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1139023872, size=64896) not found in translation cache
    @40,513,910us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1139089408, size=32768) not found in translation cache
    @40,514,159us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1139122176, size=65536) not found in translation cache
    @40,514,353us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1139187712, size=512) not found in translation cache
    @40,514,543us: [+7 T:0x43cc9490] OM - Memory_contigFree> Error: buffer (addr=1139191808, size=49152) not found in translation cache

    Plus about 30 of the following (48, 56, and 74 bytes):

    @76,652,819us: [+7 T:0x45f7f490] OM - Memory_contigFree> Error: buffer (addr=1176444928, size=56) not found in translation cache

    John A

  • John Anderson said:

    In the app compiled with the maxCbListSize = 200, I got two of these statements....

    @100,513,274us: [+7 T:0x44cf7490] OM - Memory_contigFree> Error: buffer (addr=1139363840, size=48) not found in translation cache

    The presence of this message means you might still have a CMEM memory leak, although at a much slower rate than before.  You can test this out by doing "%cat /proc/cmem" both before the test run and afterwards - the results should be identical.  For a "quiet" system there should be only "free" buffers in that list (no "busy" buffers).

    Regards,

    - Rob

  • Robert Tivy said:

    In the app compiled with the maxCbListSize = 200, I got two of these statements....

    @100,513,274us: [+7 T:0x44cf7490] OM - Memory_contigFree> Error: buffer (addr=1139363840, size=48) not found in translation cache

     

    The presence of this message means you might still have a CMEM memory leak, although at a much slower rate than before.  You can test this out by doing "%cat /proc/cmem" both before the test run and afterwards - the results should be identical.  For a "quiet" system there should be only "free" buffers in that list (no "busy" buffers).

    Regards,

    - Rob

    I need to amend my post above ...

    Due to auto-cleanup of un-freed buffers, you probably won't see any "busy" buffers after the application run finishes.  However, you could look for auto-freeing messages displayed by a "debug" build of cmemk.ko (you can build a "debug" version by descending into <linuxutils>/ti/sdo/linuxutils/cmem/src/module and typing "%make debug" and the resultant cmemk.ko will produce lots of debug output on the console).  Such a message will contain the text:
        CMEMK Debug: busy entry(s) found

    Regards,

    - Rob