From where does dma_alloc_coherent() attempt to get memory?

Helmut Forren

From where does dma_alloc_coherent() attempt to get memory and how does it relate to the memory map described at http://processors.wiki.ti.com/index.php/Changing_the_DVEVM_memory_map ?

Thanks,

Helmut

over 14 years ago

Robert Tivy over 14 years ago

dma_alloc_coherent() is a Linux kernel API. It allocates memory suitable for a DMA operation. The memory would come from somewhere in the physical address space that's granted to the Linux kernel via the u-boot bootargs parameter mem=##. So, the memory returned by that API doesn't relate to the wiki doc to which you linked.

Regards,

- Rob

Helmut Forren over 14 years ago in reply to Robert Tivy

Rob (Chris),

This is very helpful. I'm having a dma_alloc_coherent() failure and now I know I need to increase the kernel memory.

Please clarify one thing for me. Figure 4 from my link shows example "Linux (52MB)" running from 0x80000000 to 0x83400000. You implied that the kernel memory and this Linux memory are separate. If so, and if the kernel memory starts at 0x80000000, then for my purposes I should revise figure 4 in my mind's eye to show kernel memory starting at 0x80000000 and then Linux (runtime, not kernel?) having much less. Or, contrary to all that, is the "Linux (52MB)" on figure 4 the same memory as that referred to in the u-boot bootargs mem=##?

I ask because, in trying to get the canny edge detection example working (http://processors.wiki.ti.com/index.php/C64x%2B_iUniversal_Codec_Creation_-_from_memcpy_to_Canny_Edge_Detector#Memory_Management), I find that memory is very tight. If the kernel memory is truly outside figure 4, then it's going to be tough. If the kernel memory is actually the same as the yellow "Linux (52MB)", then my life will be easier.

Thanks very much,

Helmut

Helmut Forren over 14 years ago in reply to Helmut Forren

Rob (Chris):

I deduce from cmemk experiments that the kernel memory and the "Linux" memory from figure 4 must be the same memory. Lowering the phys_start parameter for cmemk.ko in loadmodules.sh quickly begins to give an error about overlapping kernel. So either the "Linux" memory from figure 4 is near zero (doubtful) or it's the same memory as referenced by the U-boot bootargs mem=## setting (likely).

I changed my bootargs from mem=112M to mem=128M, which pushed me right up below the maximum possible cmemk phys_start=0x88000000, without having to reduce the cmemk pools or server.tcf memory map as defined by the canny example. (Note the original canny loadmodules.sh had phys_start=0x80000000, but adding up the pools suggested I could raise it as much as I did, at the expense of cmemk heap I guess.)

Now my original dma_alloc_coherent() failure PERSISTS unchanged: "vpif_capture vpif_capture: dma_alloc_coherent size 4149248 failed". My bootargs change from 112M to 128M should have made an additional 16M available, far more than the 4M dma_alloc_coherent request. However, I suspect the bootargs is a minimum, and per my reading of the memory map link I posted here, Linux/kernel is getting everything left over. So for a given server.tcf memory map and cmemk setup, the memory available to dma_alloc_coherent doesn't change, in spite of changes to bootargs.

Either way, my dma_alloc_coherent() still fails. I'm not sure if it should even be trying to allocate this much. I'm not sure if this might be a linux memory fragmentation problem (where dma_alloc_coherent wants contiguous memory (note "coherent" may just be an erroneous choice of terms, and "contiguous" is implied by the name)).

QUESTION: I'm wondering about setting back to bootargs mem=112M, loading cmemk earlier to sit right on top of that, then unloading and reloading cmemk with a higher phys_start... all in an attempt to make sure there's a big free block of contiguous memory for the 4M alloc. Might that work, or is it nonsense? (Going out on a limb here, maybe the Linux ?mmu? gets set up by the bootargs and does NOT have the ability to dynamically follow the impact of cmemk and adjust.)

Note: My EVM won't boot to NFS as root, so I've been booting with /dev/hda1 as root, then mounting the NFS elsewhere. I then test canny that way. Just in case this NFS has been eating too much memory, I copied my current stuff to the hard disk and rebooted without NFS mounted. Didn't help. Same dma_alloc_coherent error.

QUESTION: I'm able to get the following "backtrace". Does it suggest anything?

canny: page allocation failure. order:10, mode:0xd0
Backtrace:
[<c00307d0>] (dump_backtrace+0x0/0x114) from [<c030af14>] (dump_stack+0x18/0x1c)
r7:0000000a r6:000000d0 r5:00000000 r4:00000000
[<c030aefc>] (dump_stack+0x0/0x1c) from [<c007af00>] (__alloc_pages_nodemask+0x49c/0x4fc)
[<c007aa64>] (__alloc_pages_nodemask+0x0/0x4fc) from [<c00320dc>] (__dma_alloc+0x164/0x400)
[<c0031f78>] (__dma_alloc+0x0/0x400) from [<c0032404>] (dma_alloc_coherent+0x58/0x64)
[<c00323ac>] (dma_alloc_coherent+0x0/0x64) from [<c0247258>] (__videobuf_mmap_mapper+0x118/0x224)
r7:c57f8880 r6:c519720c r5:c47f5a64 r4:c560e630
[<c0247140>] (__videobuf_mmap_mapper+0x0/0x224) from [<c02452ac>] (videobuf_mmap_mapper+0x64/0x94)
r8:00000000 r7:c57f8880 r6:4404d000 r5:c560e630 r4:c519720c
[<c0245248>] (videobuf_mmap_mapper+0x0/0x94) from [<c0249440>] (vpif_mmap+0x1c/0x20)
r5:c560e630 r4:c57f8880
[<c0249424>] (vpif_mmap+0x0/0x20) from [<c023cc88>] (v4l2_mmap+0x40/0x4c)
[<c023cc48>] (v4l2_mmap+0x0/0x4c) from [<c008f76c>] (mmap_region+0x220/0x42c)
r5:c560e630 r4:000000ff
[<c008f54c>] (mmap_region+0x0/0x42c) from [<c008fc4c>] (do_mmap_pgoff+0x2d4/0x33
[<c008f978>] (do_mmap_pgoff+0x0/0x334) from [<c0030008>] (do_mmap2+0x94/0xc4)
[<c002ff74>] (do_mmap2+0x0/0xc4) from [<c002cec0>] (ret_fast_syscall+0x0/0x28)
Mem-info:
DMA per-cpu:
CPU 0: hi: 42, btch: 7 usd: 0
active_anon:175 inactive_anon:175 isolated_anon:0
active_file:16 inactive_file:241 isolated_file:0
unevictable:0 dirty:3 writeback:9 unstable:0 buffer:26
free:23333 slab_reclaimable:239 slab_unreclaimable:994
mapped:44 shmem:25 pagetables:58 bounce:0
DMA free:93332kB min:1348kB low:1684kB high:2020kB active_anon:700kB inactive_anon:700kB active_file:64kB inactive_file:964kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:113792kB mlocked:0kB dirty:12kB writeback:36kB mapped:176kB shmem:100kB slab_reclaimable:956kB slab_unreclaimable:3976kB kernel_stack:360kB pagetables:232kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 17*4kB 164*8kB 117*16kB 79*32kB 52*64kB 32*128kB 23*256kB 25*512kB 22*1024kB 19*2048kB 0*4096kB 0*8192kB 0*16384kB = 93332kB
282 total pagecache pages
28672 pages of RAM
23414 free pages
2933 reserved pages
841 slab pages
111 pages shared
0 pages swap cached
vpif_display vpif_display: dma_alloc_coherent size 4149248 failed
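One way to read the report above: "order:10" means the page allocator was asked for 2^10 physically contiguous pages. Assuming 4 KiB pages (an assumption, though the 4kB bucket in the free list suggests it), the 4149248-byte request is 1013 pages, which rounds up to a single 4096 kB block. The free list shows 0*4096kB and nothing larger, so the request cannot be satisfied even though 93332 kB is free in total. The sketch below only reproduces that arithmetic; the function names are illustrative and are not kernel APIs.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, consistent with the smallest (4kB) free-list bucket. */
#define PAGE_SIZE_BYTES 4096u

/* Smallest order n such that (PAGE_SIZE_BYTES << n) >= size. */
unsigned int alloc_order(size_t size)
{
    unsigned int order = 0;
    size_t block = PAGE_SIZE_BYTES;

    while (block < size) {
        block <<= 1;
        order++;
    }
    return order;
}

/* Size in KiB of one contiguous block of the given order. */
size_t order_block_kib(unsigned int order)
{
    return ((size_t)PAGE_SIZE_BYTES << order) / 1024u;
}
```

With the size from the log, alloc_order(4149248) gives 10 and order_block_kib(10) gives 4096, matching the empty 4096kB bucket in the free list.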

Helmut Forren over 14 years ago in reply to Helmut Forren

I'm at the end of my rope... again. Is it possible for someone to call me at 678-919-1093?

Thanks very much,

Helmut

Helmut Forren over 14 years ago in reply to Helmut Forren

Got past this roadblock and onto the next...

Without any documentation, but after reading http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/100/p/7946/310202.aspx#310202, I deduced that the first parameter to Capture_create() is NULL in canny's capture.c, so it relies on buffers already created by the driver. By comparison, the working encodedecode capture.c passes an hBufTab as the first parameter. I looked over the code, took a chance, and wholesale replaced canny's capture.c with encodedecode's. The alloc error went away. I now see repeating "UNIVERVAL_process" status output on the terminal (stdout, I guess). It looks like the thing is running.

Of course, my output is black. So this is my next roadblock. Please feel free to make suggestions.

Thanks,

Helmut

Helmut Forren over 14 years ago in reply to Helmut Forren

FYI, went into video.c and bypassed call to UNIVERSAL_process() and replaced it with a simple memcpy() to copy the input buffer to the destination buffer.

My video monitor now shows my raw camera input. Yippee! This means my capture and display threads and setups are working properly.

This also means UNIVERSAL_process() is not working properly. Of course, the canny edge detector example defaults to D1 PAL and has comments that it won't work for 720P. I've modified the front end to allow my 720P 60Hz settings through, and now my capture/display is supporting it. But obviously UNIVERSAL_process() itself in the codec is not supporting it properly. I see comments outside the UNIVERSAL_process() call about "luma" and "420P". It's now time to get into how the codec actually works, and figure out how to make it work for 720P 60Hz, which I assume is coming across in YUV422 format. Dunno yet.

Next expected roadblock: Getting the universal_canny_ires codec source code to compile properly. Note I believe I've been using a pre-made package that came in iUniversal_Canny_C64x_release_1_0\dvsdk_2_00_00_22_Canny_iUniversal\dm6467_dvsdk_combos_2_05\packages\ti\fae and not source code. Note dvsdk 2.00. The only source is in iUniversal_Canny_C64x_release_1_0\dvsdk_1_11_00_00_universal_codecs_source\dsp_alg\universal_canny_ires. Note dvsdk 1.11. We'll see what happens...

-Helmut

Iain Hunter over 14 years ago in reply to Helmut Forren

Helmut,

Based on your experience the wiki has been updated with some new comments

http://processors.wiki.ti.com/index.php/C64x%2B_iUniversal_Codec_Creation_-_from_memcpy_to_Canny_Edge_Detector#Migrating_the_demo_applications_to_another_DVSDK

Iain

Robert Tivy over 14 years ago in reply to Helmut Forren

Helmut,

"Linux memory" and "kernel memory" typically refer to the same thing. When booting Linux, the mem=## parameter determines the total of memory managed by the Linux kernel. This memory total includes pages granted to user processes, as well as all sorts of subdivision among specific kernel usage. "mem=##" just determines total memory size, and the base of this memory block is the start of DDR memory, which typically is 0x80000000. So, when you say mem=52M, that means that Linux is being given all memory from 0x80000000 -> 0x83400000. The kernel parcels this memory up as it sees fit, but doesn't handle *any* memory outside that block, including all memory above 0x83400000. In essence, the kernel doesn't even know that memory goes beyond that.

The non-Linux memory is subdivided into the regions specified in the Figure 4 that you reference. Each subdivided section can go anywhere, for the most part, and Figure 4 shows a scheme which places CMEM immediately after the Linux memory. The CMEM kernel module (cmemk.ko) does tell Linux about the CMEM memory block, but really only for the purpose of memory ownership, so Linux will not assume management or ownership of the CMEM memory - it is completely managed by cmemk.ko.
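As a quick check of the mem=## arithmetic described above, the end of the Linux-managed block is simply the DDR base plus the mem= size. This is only a sketch: the function name is made up for illustration, and the 0x80000000 base is the typical value quoted in this post, so it should be confirmed for a given board.

```c
#include <stdint.h>

/* Assumption: DDR starts at 0x80000000, the typical base quoted above. */
#define DDR_BASE 0x80000000u

/* First address above the memory granted to Linux by mem=<mem_mib>M. */
uint32_t linux_mem_end(uint32_t base, uint32_t mem_mib)
{
    return base + (mem_mib << 20); /* 1 MiB = 2^20 bytes */
}
```

For mem=52M this gives 0x83400000, and for mem=128M it gives 0x88000000, which is why a CMEM phys_start of 0x88000000 sits directly on top of a 128M Linux block.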

The memory managed by dma_alloc_coherent() would definitely come from the block granted to Linux. In one of your follow-on posts you ask why increasing mem= from 112M to 128M didn't help this allocation failure. I can't really answer that since I'm not familiar with the details of memory subdividing in Linux, but it's quite possible that the extra memory you granted to Linux was put in some memory pool other than the one managed by dma_alloc_coherent(). The trick for your issue is to find out how to specifically increase the memory available to dma_alloc_coherent(). Unfortunately I don't have that knowledge and would have to research it extensively, but perhaps someone else reading this thread could comment.

Regards,

- Rob

Helmut Forren over 14 years ago in reply to Robert Tivy

Rob,

Thanks very much for your fine explanation.

One additional question about this, then. If CMEM and the others (ref server.tcf (spelling?)) start at a GAP above the mem=## implied ending, then is that memory UNUSED (aka WASTED)? I deduce that this is the case. This means that when playing with the memory map, one definitely needs to consider all those things described in my first link, PLUS the bootargs mem=## value. Because of this, to help others understand, I strongly suggest that an appropriate person (you?) edit the wiki link to point this out. Please note that, as I understand it and as likely other readers would understand it, but which you have still not explicitly agreed, the yellow block on figure 4 is indeed the memory out of which the linux kernel gets a mem=## amount. And if ## is less than the size of the yellow block, then there will be unused (wasted) memory -- color it gray -- at the top of the yellow block. I believe pointing this out is important.

For myself, please note I fixed the problem by avoiding it. I had copied the canny edge detection sample and, while I did examine it and compare it to encodedecode, I didn't yet know which differences were needed and which not. I later discovered capture.c was calling Capture_create() with no buffer pointer, and this was leading to the dma_alloc_coherent() call. My understanding from elsewhere is that doing so causes the [driver] to use some pre-allocated space. I believe the cause of the alloc failure was that no space was pre-allocated. This is a little cockeyed from your saying dma_alloc_coherent memory comes from linux [in general, my inference]. More precisely, I think it's coming from something pre-allocated that I wasn't successfully setting up. Instead of fixing all this, I found that the encodedecode sample, the original from which canny was derived and now upgraded to DVSDK3.10, allocates its own buffers and provides them to Capture_create(). By adopting the encodedecode's capture.c wholesale, I "upgraded" myself and the problem went away.

Just a few minutes ago, I finally got me-processed video on my output monitor, where "me-processing" was a simple top/bottom sub-frame swap of the image. (The canny edge detection still isn't working because it's deeply coded to depend on video format, and in upgrading to DM6467T, that original format is not available (at least not with my video source).) But the point is, I've finally gotten to the point of doing what I'm supposed to be doing, and that's working on my own codec, rather than installing linux, building, fighting with env vars and setups, etc, etc, etc. Happy time.
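For readers who want to try a similar sanity check, a top/bottom swap can be sketched as two memcpy() calls. This is not the code from the post: the function name, the single-plane layout with a fixed line pitch, and the even-height requirement are all assumptions made for illustration, and the source and destination buffers must not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy one frame from src to dst with its top and bottom halves swapped.
 * Assumptions: a single plane, `pitch` bytes per line, an even number of
 * lines, and non-overlapping buffers. Returns 0 on success, -1 on bad input.
 */
int swap_top_bottom(const uint8_t *src, uint8_t *dst, size_t pitch, size_t height)
{
    size_t half;

    if (src == NULL || dst == NULL || height % 2 != 0)
        return -1;

    half = (height / 2) * pitch;
    memcpy(dst, src + half, half); /* bottom half of input -> top of output */
    memcpy(dst + half, src, half); /* top half of input -> bottom of output */
    return 0;
}
```

A multi-plane format would need the same swap applied per plane with that plane's own pitch and height.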

-Helmut

Robert Tivy over 14 years ago in reply to Helmut Forren

Helmut Forren said:
One additional question about this, then. If CMEM and the others (ref server.tcf (spelling?)) start at a GAP above the mem=## implied ending, then is that memory UNUSED (aka WASTED)? I deduce that this is the case.

Yes, any gap between the end of Linux memory (implied by mem=##) and the start of the next "user" of DDR memory (CMEM, as is the case for that Figure 4) is completely wasted.

Helmut Forren said:
Please note that, as I understand it and as likely other readers would understand it, but which you have still not explicitly agreed, the yellow block on figure 4 is indeed the memory out of which the linux kernel gets a mem=## amount. And if ## is less than the size of the yellow block, then there will be unused (wasted) memory -- color it gray -- at the top of the yellow block. I believe pointing this out is important.

The yellow block, as I see it, *is* the memory specified by the mem=## bootargs parameter. If one were to reduce mem=## and not change anything else, I would draw that figure as having a smaller yellow block, followed by some gap representation, then the CMEM memory block.
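The gap described here can be checked numerically before editing bootargs or loadmodules.sh. The sketch below is a hypothetical helper, not part of cmemk or U-Boot: it takes the DDR base, the mem= value in MiB, and the CMEM phys_start, and reports how many bytes lie unused between them. A negative result means phys_start falls inside Linux memory, which is the kind of overlap cmemk complained about earlier in the thread.

```c
#include <stdint.h>

/*
 * Bytes between the end of Linux memory (ddr_base + mem_mib MiB) and the
 * start of the CMEM block. Positive: wasted gap. Zero: the blocks are
 * adjacent. Negative: phys_start overlaps Linux memory.
 */
int64_t gap_bytes(uint32_t ddr_base, uint32_t mem_mib, uint32_t cmem_phys_start)
{
    uint64_t linux_end = (uint64_t)ddr_base + ((uint64_t)mem_mib << 20);

    return (int64_t)cmem_phys_start - (int64_t)linux_end;
}
```

Using figures mentioned in this thread as example inputs, a base of 0x80000000 with mem=112M and phys_start=0x88000000 leaves a 16 MiB gap, while mem=128M with the same phys_start leaves none.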

Helmut Forren said:
I believe the cause of the alloc failure was that no space was pre-allocated.

Either that, or all dma_alloc_coherent() memory has already been taken by some other kernel entity.

It's great to hear that it's "Happy time" for you :) Thank you for all the perspective you've provided. It has already enabled us to improve some documentation, and more improvements will follow shortly.

Regards,

- Rob

Helmut Forren over 14 years ago in reply to Robert Tivy

Rob,

My understanding of this is now complete. In hindsight, I see where the wiki link DOES mention "bootargs" and "kernel" in the section "Changing the boot argument in your Linux bootloader". However, it's low down and not at all obvious. I added a comment regarding wasted space.

In reviewed hindsight, pointing out wasted space if bootargs mem=## is too small has parallels with the equally wasted space if, in the context of the link, CMEM and dsplink areas were not equally adjacent. So the "wasted space" question is bigger than just bootargs. Meanwhile, all the memory usage specifications EXCLUDING bootargs are discussed throughout the article and in depth, while bootargs is only addressed by two meager lines.

I STRONGLY recommend this article be revised. Search for all occurrences of "linux" (my browser counts 52), and in frequent cases where appropriate, rewrite the text slightly to imply that the bootargs mem=## setting controls here. Then, promote or duplicate the section that does detail bootargs to a position equivalent to the detailed descriptions of the others. And finally, there's a general UPDATE needed anyway. Just like Jsarao commented, I was in the dark and having to assume regarding dsplink 1.64.

Thanks,

Helmut

Marlon Smith over 14 years ago in reply to Helmut Forren

Hi Helmut,

I was wondering if you could give me a quick example of a set of bootargs and a set of arguments for insmod cmemk.ko that will allow gstreamer to play back a 720p file. I have tried many different settings, but I keep ending up with errors like this:

VICP Error: init: Failed to open
CMEMK Error: Failed to find a pool which fits 28672

even when I think I have allocated several pools larger than that.

Thanks!

Marlon

Helmut Forren over 14 years ago in reply to Marlon Smith

I've never run gstreamer, but here's what I recommend. This is most likely a "human error", where you've missed something in your count of buffer usage. It could also be a "swiss cheese" or "knapsack" problem, where the order in which buffer requests got satisfied caused enough fragmentation of your buffers that there's no longer a large enough one available.

Rely on the command "cat /proc/cmem" to get an output of current cmem buffer usage. Put calls in your script as close to immediately before and after your failure. This may help you see what's not correct, after which you may follow through with calls elsewhere. If getting "close" means inside a program for which you have source, use "system("cat /proc/cmem");" in C language.
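As an alternative to system("cat /proc/cmem"), the same dump can be done with plain stdio, which avoids spawning a shell from inside the application. This is a minimal sketch: the function name and the label argument are made up for illustration, and it assumes /proc/cmem is readable as ordinary text, as the cat command above implies.

```c
#include <stdio.h>

/*
 * Copy a text file (for example /proc/cmem) to `out`, preceded by a label
 * line so that "before" and "after" dumps can be told apart in the log.
 * Returns 0 on success, -1 if the file cannot be opened.
 */
int dump_text_file(const char *path, const char *label, FILE *out)
{
    char line[256];
    FILE *in = fopen(path, "r");

    if (in == NULL)
        return -1;

    fprintf(out, "---- %s: %s ----\n", label, path);
    while (fgets(line, sizeof line, in) != NULL)
        fputs(line, out);

    fclose(in);
    return 0;
}
```

Calling dump_text_file("/proc/cmem", "before alloc", stderr) just before the failing call and again just after it gives two snapshots of pool usage to compare.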

Helmut Forren over 14 years ago in reply to Helmut Forren

Oh, also, I don't think it's about bootargs. I believe all you do with bootargs that's germane is set up the Linux memory usage. You're not getting an error from CMEM that you're overlapping that. You may, however, find you need to reduce the "mem=##" setting in the bootargs, and then lower the cmemk.ko "phys_start" address to match.

Also, study http://processors.wiki.ti.com/index.php/Changing_the_DVEVM_memory_map and http://processors.wiki.ti.com/index.php/CMEM_Overview.

Marlon Smith over 14 years ago in reply to Helmut Forren

Thanks Helmut, I really appreciate the help! I'll look into those things.

Cheers

Marlon
