Linux/AM5728: remoteproc resource failure when loading DSP firmware

Part Number: AM5728
Other Parts Discussed in Thread: BEAGLEBOARD-X15

Tool/software: Linux

Hi!

We have a couple of BeagleBoard-X15 boards in a Continuous Integration setup, where we use the boards to run unit tests both on one of the ARM A15 cores and on one of the TI C66 DSPs.

Running ARM unit tests remotely over SSH works flawlessly.

Running on the DSP works most of the time, but loading the DSP firmware intermittently fails with the following traces in dmesg:

 

[13082.619143] omap-rproc 40800000.dsp: assigned reserved memory node dsp1_cma@88c00000
[13082.634519] remoteproc remoteproc2: 40800000.dsp is available
[13082.653586] remoteproc remoteproc2: powering up 40800000.dsp
[13082.661259] remoteproc remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 4550324
[13082.676767] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
[13082.682661] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
[13082.688629] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
[13082.706083] alloc_contig_range: 4 callbacks suppressed
[13082.706088] alloc_contig_range: [88c00, 88c03) PFNs busy
[13082.719361] alloc_contig_range: [88c00, 88c03) PFNs busy
[13082.725893] alloc_contig_range: [88d00, 89100) PFNs busy
[13082.731932] alloc_contig_range: [88e00, 89200) PFNs busy
[13082.738122] alloc_contig_range: [88e00, 89300) PFNs busy
[13082.743757] alloc_contig_range: [89000, 89400) PFNs busy
[13082.754792] alloc_contig_range: [89500, 8c500) PFNs busy
[13082.761321] alloc_contig_range: [89600, 8c600) PFNs busy
[13082.767575] alloc_contig_range: [89600, 8c700) PFNs busy
[13082.773787] alloc_contig_range: [89800, 8c800) PFNs busy
[13082.781919] omap-rproc 40800000.dsp: failed to allocate dma memory: len 0x3000000
[13082.789479] remoteproc remoteproc2: Failed to process resources: -12
[13082.803299] omap_hwmod: mmu1_dsp1: _wait_target_disable failed
[13082.815916] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
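
To make the trace above easier to read, here is a minimal sketch that turns the "PFNs busy" ranges into physical addresses and the failed allocation length into MiB. It assumes 4 KiB pages (not stated in the log), and the function names are made up for illustration. With that assumption, the range [89500, 8c500) spans 0x3000 pages, which is 48 MiB, the same size as the failed 0x3000000 request.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages; the log itself does not state this
MIB = 1024 * 1024

BUSY_RE = re.compile(r"alloc_contig_range: \[([0-9a-f]+), ([0-9a-f]+)\) PFNs busy")
FAIL_RE = re.compile(r"failed to allocate dma memory: len (0x[0-9a-fA-F]+)")


def busy_ranges(dmesg_text):
    """Return (start, end) physical address pairs, one per 'PFNs busy' line."""
    return [
        (int(start, 16) * PAGE_SIZE, int(end, 16) * PAGE_SIZE)
        for start, end in BUSY_RE.findall(dmesg_text)
    ]


def failed_lengths(dmesg_text):
    """Return the byte length of every failed DMA allocation in the log."""
    return [int(length, 16) for length in FAIL_RE.findall(dmesg_text)]


# Three lines copied from the trace above.
sample = """\
[13082.706088] alloc_contig_range: [88c00, 88c03) PFNs busy
[13082.754792] alloc_contig_range: [89500, 8c500) PFNs busy
[13082.781919] omap-rproc 40800000.dsp: failed to allocate dma memory: len 0x3000000
"""

for start, end in busy_ranges(sample):
    print(f"busy 0x{start:08x}-0x{end:08x} ({(end - start) // 1024} KiB)")
for length in failed_lengths(sample):
    print(f"failed allocation of {length // MIB} MiB")
```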

All unit tests are compiled with the same config.bld and resource table (we are using a custom resource table).

In total we run ~120 unit tests, and in a normal run 0-5 of them fail during load with messages similar to the above. (The trace "Failed to process resources: -12" is always there!)

We use the remoteproc filesystem interface to load the firmware meaning that:

  1. Copy firmware binary to /lib/firmware/dra7-dsp1-fw.xe66
  2. bind firmware by echoing "40800000.dsp" to /sys/bus/platform/drivers/omap-rproc/bind
  3. Poll /sys/kernel/debug/remoteproc/remoteproc2/state until it's "running". If it isn't within 15 seconds, unbind the firmware (same echo, but to unbind), wait 20 seconds and then start over from step 1. (We do at most 5 retries like this.)
  4. Monitor trace0 until some "end-string" is received.
  5. Unbind by echoing "40800000.dsp" to /sys/bus/platform/drivers/omap-rproc/unbind
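
The bind, poll and retry part of the steps above (steps 2 and 3) can be sketched roughly as follows. This is not our actual script: the function names are hypothetical, the firmware copy in step 1 is left out, and the state check uses a substring match on the assumption that the debugfs file may contain more than the bare word "running".

```python
import time

DRIVER_DIR = "/sys/bus/platform/drivers/omap-rproc"
STATE_FILE = "/sys/kernel/debug/remoteproc/remoteproc2/state"
DEVICE = "40800000.dsp"


def read_text(path):
    """Read a small text file; return '' if it is missing (e.g. while unbound)."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


def write_text(path, text):
    with open(path, "w") as handle:
        handle.write(text)


def wait_for_state(read_state, wanted, timeout_s, poll_s=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll read_state() until it contains `wanted`; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if wanted in read_state():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def load_firmware(bind, unbind, is_running, retries=5, backoff_s=20,
                  sleep=time.sleep):
    """Bind and wait for 'running'; on failure unbind, back off and retry.

    Returns the 1-based attempt that succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, retries + 1):
        bind()
        if is_running():
            return attempt
        unbind()
        sleep(backoff_s)
    return 0


def load_dsp1():
    """Wire the helpers to the paths used in this post (touches sysfs/debugfs)."""
    return load_firmware(
        bind=lambda: write_text(f"{DRIVER_DIR}/bind", DEVICE),
        unbind=lambda: write_text(f"{DRIVER_DIR}/unbind", DEVICE),
        is_running=lambda: wait_for_state(
            lambda: read_text(STATE_FILE), "running", timeout_s=15),
    )
```

On the board, `load_dsp1()` would be called once per test binary. Passing the bind, unbind and state-check steps in as callables keeps the retry logic testable off-target.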

All of the above commands are executed from a remote server using SSH.

Most of the 120 unit tests load flawlessly, and sometimes one retry is enough, but on too many occasions all 5 retries fail...

We are using Processor SDK 04.03.00.05 with CMEM enabled and the DSP1 CMA pool reconfigured via a DTSI file to 64 MiB starting at 0x88c00000.

  • Hi, Marcus,

    Do the failure messages show up for a particular test case? If so, what kind of test is it?
    Can you narrow it down to a single test or a few cases where it happens?

    In the logs, I am interested in the error showing that the DSP failed to allocate DMA memory of len 0x3000000.
    Could you check which area this allocation comes from, and whether it falls outside the area defined in the resource table?
    It would make more sense if this error happened for a particular type of test case, where the resource table covers most cases
    but overlooks a few.

    The SDK 4.3 is a bit old. The latest is 5.2 (kernel 4.14). You may want to consider upgrading to the latest.

    Rex
  • No, it can show up in any test case and for any binary, before the actual test is launched. This happens during the firmware load, before any execution has started.

    Yes, that 0x3000000 is the size of the DATA part of the memory, set to 48 MiB in the resource table and exactly the same size in config.bld. It is a carveout in the resource table. What is even more interesting is that sometimes (much less often, we have only seen it a couple of times in ~500 runs) it has a problem with some other section of size 1 MiB. To me this looks like a resource cleanup issue in the Linux platform, because it eventually resolves itself without restarting the board, although it may take a couple of failed tests (tests failing to load with the same problem) before it recovers.

    What do you mean by "when resource table covers most, but overlooks a few cases"? What does the resource table not cover? All our tests are compiled and linked with the same config.bld and the same resource table.

    Yes, I know that this SDK is a bit old, but we are stuck on it because our client does not want to upgrade... If we can confirm that this problem is fixed in a newer SDK, then perhaps we would have more arguments towards our client. We have tried a newer SDK ourselves but ran into other problems...

  • Hi, Marcus,

    Another thing to try is a 15~30 second wait between step 5 and the next step 1. I want to make sure the shutdown is complete before the next start.
    I'll be out of office tomorrow, so my response may be slow, but I will be back in the office on Monday.

    Rex
  • That could be an idea worth trying. Is there any filesystem object that could be monitored after unbind, instead of a delay, to make sure that the DSP is fully "unloaded" and all resources are freed?

    Currently steps 1-3 can take as much as (15+20)*5 seconds, although all the retries might put some load on the resource allocation system, postponing free jobs...
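
For reference, the worst-case figure quoted above works out as follows. This is a tiny sketch with a hypothetical helper name; it only counts the poll timeout and the back-off, not the time for the copy, bind and unbind commands themselves.

```python
def worst_case_seconds(poll_timeout_s, backoff_s, retries):
    """Upper bound on time spent in steps 1-3 when every attempt fails."""
    return (poll_timeout_s + backoff_s) * retries


# 15 s poll timeout, 20 s back-off, 5 attempts, as described in the first post.
print(worst_case_seconds(15, 20, 5))
```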
  • Hi, Marcus,

    In the past, we had another customer doing unbind/bind tests in a for loop. Though the system does its best to release all resources, it isn't designed for such frequent unbinding and binding to reload the DSP. At the time, we suggested that the customer do everything gracefully, with a wait. The delay between steps 1 and 3 is longer than I would expect, which is a bit surprising, but it could add up from a first miss followed by a retry after 15 secs. It may add to the elapsed time of a complete unit test run, but could you start with a 30 sec wait to see if all tests pass, then reduce the wait to find the minimal time required? If adding a wait doesn't improve anything, then we should look into it further.

    Rex

  • Hi!

    I tried always waiting 30 seconds after an unbind, but this gave no obvious improvement. Then I tried 60 seconds instead; still no improvement.

    /Marcus

  • Hi, Marcus,

    Let me discuss this internally to see if anyone has a clue. It seems to me that something is preventing the pages from being isolated. What I can't understand is why the error is not tied to a particular case; if it were, it could be a configuration error, etc. I'll see what I can get internally.

    Rex
  • Hi Marcus,

    Also, as an experiment, can you please try to reproduce the issue with the standard IPC test applications ? - processors.wiki.ti.com/index.php

    Thanks and Regards,
    Piyali
  • Hi Marcus,

    Please share the resource table you are using for the DSP binary and the corresponding DTB entry in the kernel device tree.

    regards,
    Venkat
  • Hi, Marcus,

    Do you use the kernel as released in the SDK, or have you made changes or done stable merges to it?
    Are you using CMA in HighMem or in linear memory (<760 MB from 0x80000000)? From the PFN numbers it appears to be lowmem only.

    This particular "PFNs busy" trace is a classic symptom of CMA allocation failures (unable to re-allocate those big chunks of memory). It tends to happen on systems with high memory utilization and repeated remoteproc reboots (memory is freed during stop/remove and re-allocated during probe/start).

    We have seen a similar report on the 4.14 kernel, which was caused by a stable merge commit,
    commit af2f729e5d890965c3f1a6e557660e7653452718. If you have done a stable merge, is your kernel suffering from a similar patch/issue?

    Please also check whether you still experience the issue when switching to the sysfs start/stop method instead of the sysfs bind/unbind method.
    echo stop > /sys/class/remoteproc/remoteprocX/state
    echo start > /sys/class/remoteproc/remoteprocX/state

    The names of the processors can be looked up in debugfs to find out which remoteproc you need to control:
    cat /sys/kernel/debug/remoteproc/remoteprocX/name
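
A minimal sketch of the lookup and the start/stop writes, for anyone scripting this. The function names are hypothetical, and it assumes that the debugfs name file holds the device name (such as "40800000.dsp", as seen in the dmesg output) followed by a newline. The demo at the end runs against a throwaway directory, so it can be tried off-target.

```python
import os
import tempfile

DEBUGFS_ROOT = "/sys/kernel/debug/remoteproc"
SYSFS_ROOT = "/sys/class/remoteproc"


def find_remoteproc(device_name, debugfs_root=DEBUGFS_ROOT):
    """Return the 'remoteprocN' entry whose name file matches device_name."""
    for entry in sorted(os.listdir(debugfs_root)):
        try:
            with open(os.path.join(debugfs_root, entry, "name")) as handle:
                if handle.read().strip() == device_name:
                    return entry
        except OSError:
            continue  # entry without a readable name file
    return None


def set_state(entry, command, sysfs_root=SYSFS_ROOT):
    """Write 'start' or 'stop' to the state file of one remoteproc."""
    if command not in ("start", "stop"):
        raise ValueError(f"unsupported command: {command!r}")
    with open(os.path.join(sysfs_root, entry, "state"), "w") as handle:
        handle.write(command)


# Off-target demo: a made-up tree that mimics the debugfs/sysfs layout.
with tempfile.TemporaryDirectory() as root:
    for index, name in enumerate(["example-a", "example-b", "40800000.dsp"]):
        entry_dir = os.path.join(root, f"remoteproc{index}")
        os.makedirs(entry_dir)
        with open(os.path.join(entry_dir, "name"), "w") as handle:
            handle.write(name + "\n")
    found = find_remoteproc("40800000.dsp", debugfs_root=root)
    missing = find_remoteproc("no-such-device", debugfs_root=root)
    set_state(found, "stop", sysfs_root=root)
    with open(os.path.join(root, found, "state")) as handle:
        written = handle.read()

print(found, written)
```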

    We fixed a few critical issues related to DSP CMA pools in HighMem and incorrect cleanup as part of another e2e forum thread, e2e.ti.com/.../710086. Please see the commits listed as part of the resolution.

    One obvious solution, of course, is not to use CMA pools for the remoteprocs if you know for sure that you are always going to use that memory. That can be done by changing the "reusable" property in the corresponding reserved-memory nodes to "no-map". This removes the contention for that memory from the Linux kernel altogether.

    Rex
  • Marcus,

    If you are implementing the last suggestion in Rex's reply above, below is an example you could use to not use CMA pools.

    git.omapzoom.org/

    At the end of your dts file, do

    #define NO_MAP(label) &label { \
    /delete-property/ reusable; \
    no-map; }
    NO_MAP(dsp2_cma_pool);

    for each core you are using in your tests.

    regards,
    Venkat
  • We have tried with
    echo stop > /sys/class/remoteproc/remoteprocX/state
    echo start > /sys/class/remoteproc/remoteprocX/state

    We see the exact same behaviour.

    We build our processor SDK like this:

    git clone git://arago-project.org/git/projects/oe-layersetup.git tisdk
    cd tisdk

    # For processor sdk 04_03_00_05
    ./oe-layertool-setup.sh -f configs/processor-sdk/processor-sdk-04.03.00.05-config.txt

    We then apply two patch-files before we build.

    The first patch we apply in tisdk/sources/meta-ti. This patch adds 12MiB of CMEM @ 0x88000000 and reconfigures DSP1 CMA to 64 MiB @ 0x88C00000 for both cmem-am572x.dtsi and cmem-dra72x.dtsi

    From 08d0fd011fac8737b5c783b162364c62a86184f4 Mon Sep 17 00:00:00 2001
    From: Enver Sultanov <enver.sultanov@smarteye.se>
    Date: Tue, 4 Dec 2018 13:47:51 +0100
    Subject: [PATCH] META-TI: Changes to build yocto for am57xx/dra7xx
    
    ---
     recipes-kernel/linux/cmem.inc                      |  4 +-
     recipes-kernel/linux/files/dra7xx/cmem-am572x.dtsi | 43 ++++++++++++++++++++++
     recipes-kernel/linux/files/dra7xx/cmem-dra72x.dtsi | 21 +++++++++--
     3 files changed, 63 insertions(+), 5 deletions(-)
     create mode 100644 recipes-kernel/linux/files/dra7xx/cmem-am572x.dtsi
    
    diff --git a/recipes-kernel/linux/cmem.inc b/recipes-kernel/linux/cmem.inc
    index 64d3264..80b922e 100644
    --- a/recipes-kernel/linux/cmem.inc
    +++ b/recipes-kernel/linux/cmem.inc
    @@ -12,11 +12,12 @@ CMEM_DTSI = "cmem.dtsi"
     CMEM_DTSI_am571x = "cmem-am571x.dtsi"
     CMEM_DTSI_dra71x = "cmem-dra71x.dtsi"
     CMEM_DTSI_dra72x = "cmem-dra72x.dtsi"
    +CMEM_DTSI_am572x = "cmem-am572x.dtsi"
     
     # Split device trees between variants
     CMEM_DEVICETREE = "${KERNEL_DEVICETREE}"
     CMEM_DEVICETREE_am571x = "am571x-idk.dtb am571x-idk-lcd-osd101t2045.dtb am571x-idk-lcd-osd101t2587.dtb"
    -CMEM_DEVICETREE_am572x = "am57xx-beagle-x15.dtb am57xx-beagle-x15-revb1.dtb \
    +CMEM_DEVICETREE_am572x = "am57xx-beagle-x15.dtb am57xx-beagle-x15-revb1.dtb am57xx-beagle-x15-revc.dtb \
                               am57xx-evm.dtb am57xx-evm-cam-mt9t111.dtb am57xx-evm-cam-ov10635.dtb \
                               am57xx-evm-reva3.dtb am57xx-evm-reva3-cam-mt9t111.dtb am57xx-evm-reva3-cam-ov10635.dtb \
                               am572x-idk.dtb am572x-idk-lcd-osd101t2045.dtb am572x-idk-lcd-osd101t2587.dtb"
    @@ -50,7 +51,6 @@ python do_unpack() {
     
     python do_setup_cmem() {
         import shutil
    -
         old_overrides = d.getVar('OVERRIDES', False)
     
         if d.getVar('RESERVE_CMEM', True) is '1':
    diff --git a/recipes-kernel/linux/files/dra7xx/cmem-am572x.dtsi b/recipes-kernel/linux/files/dra7xx/cmem-am572x.dtsi
    new file mode 100644
    index 0000000..a685da2
    --- /dev/null
    +++ b/recipes-kernel/linux/files/dra7xx/cmem-am572x.dtsi
    @@ -0,0 +1,43 @@
    +/ {
    +        reserved-memory {
    +                #address-cells = <2>;
    +                #size-cells = <2>;
    +                ranges;
    +
    +                cmem_block_mem_0: cmem_block_mem@88000000 {
    +                        reg = <0x0 0x88000000 0x0 0x00c00000>;
    +                        no-map;
    +                        status = "okay";
    +                };
    +
    +                /delete-node/ dsp1_cma@99000000;
    +                dsp1_cma_pool: dsp1_cma@88c00000 {
    +			compatible = "shared-dma-pool";
    +			reg = <0x0 0x88c00000 0x0 0x4000000>;
    +			reusable;
    +			status = "okay";
    +		};
    +        };
    +
    +        ocp {
    +          dsp1 {
    +                  memory-region = <&dsp1_cma_pool>;
    +                  };
    +        };
    +
    +        cmem {
    +                compatible = "ti,cmem";
    +                #address-cells = <1>;
    +                #size-cells = <0>;
    +
    +		#pool-size-cells = <2>;
    +
    +                status = "okay";
    +
    +                cmem_block_0: cmem_block@0 {
    +                        reg = <0>;
    +                        memory-region = <&cmem_block_mem_0>;
    +                        cmem-buf-pools = <3 0x0 0x00400000>;
    +                };
    +        };
    +};
    diff --git a/recipes-kernel/linux/files/dra7xx/cmem-dra72x.dtsi b/recipes-kernel/linux/files/dra7xx/cmem-dra72x.dtsi
    index ebd6129..db96704 100644
    --- a/recipes-kernel/linux/files/dra7xx/cmem-dra72x.dtsi
    +++ b/recipes-kernel/linux/files/dra7xx/cmem-dra72x.dtsi
    @@ -4,11 +4,26 @@
                     #size-cells = <2>;
                     ranges;
     
    -                cmem_block_mem_0: cmem_block_mem@a0000000 {
    -                        reg = <0x0 0xa0000000 0x0 0x0c000000>;
    +                /delete-node/ cmem_block_mem@a0000000;
    +                cmem_block_mem_0: cmem_block_mem@88000000 {
    +                        reg = <0x0 0x88000000 0x0 0x00c00000>;
                             no-map;
                             status = "okay";
                     };
    +
    +                /delete-node/ dsp1_cma@99000000;
    +                dsp1_cma_pool: dsp1_cma@88c00000 {
    +			compatible = "shared-dma-pool";
    +			reg = <0x0 0x88c00000 0x0 0x4000000>;
    +			reusable;
    +			status = "okay";
    +		};
    +        };
    +
    +        ocp {
    +          dsp1 {
    +                  memory-region = <&dsp1_cma_pool>;
    +                  };
             };
     
             cmem {
    @@ -23,7 +38,7 @@
                     cmem_block_0: cmem_block@0 {
                             reg = <0>;
                             memory-region = <&cmem_block_mem_0>;
    -                        cmem-buf-pools = <1 0x0 0x0c000000>;
    +                        cmem-buf-pools = <3 0x0 0x00400000>;
                     };
             };
     };
    -- 
    2.7.4
    
    

    The second patch we apply in tisdk/sources/meta-arago. This patch enables a couple of packages, for example avahi, cmem and openssh.

    From 5f076df7d4b0080bf0fc3841eaca413de8ce3f1e Mon Sep 17 00:00:00 2001
    From: Enver Sultanov <enver.sultanov@smarteye.se>
    Date: Tue, 4 Dec 2018 13:56:03 +0100
    Subject: [PATCH] META-ARAGO: Changes to build yocto for am57xx/dra7xx
    
    ---
     meta-arago-distro/recipes-core/images/arago-base-tisdk-image.bb | 5 +++++
     1 file changed, 5 insertions(+)
    
    diff --git a/meta-arago-distro/recipes-core/images/arago-base-tisdk-image.bb b/meta-arago-distro/recipes-core/images/arago-base-tisdk-image.bb
    index e926724..31d3709 100644
    --- a/meta-arago-distro/recipes-core/images/arago-base-tisdk-image.bb
    +++ b/meta-arago-distro/recipes-core/images/arago-base-tisdk-image.bb
    @@ -11,6 +11,11 @@ IMAGE_INSTALL += "\
         packagegroup-arago-console \
         packagegroup-arago-base-tisdk \
         packagegroup-arago-test \
    +    packagegroup-core-ssh-openssh \
    +    avahi-daemon avahi-utils \
    +    cmem \
    +    dsptop \
    +    packagegroup-core-tools-debug \
         ${VIRTUAL-RUNTIME_initramfs} \
         "
     
    -- 
    2.7.4
    
    

    Processor SDK 04_03_00_05 uniquely identifies a Linux kernel version as well, I hope. We have not made any changes to the platform, including the Linux kernel, other than the above two patches.

  • Hi Marcus,

    I have looked through your OE patch, where you are adding a DSP CMA pool at 0x88c00000. Are you not seeing any CMA initialization failures in your boot log? Your new DSP CMA pool is not on an 8 MB aligned address, but on a 4 MB aligned address instead. I would recommend swapping your CMEM and DSP CMA pools, as that would give a better alignment for your DSP CMA pool size and your starting address.

    Looking through the 4.9 stable kernels, there is also one commit that may be of use to you.
    714c19ef57a5 ("cma: fix calculation of aligned offset") from v4.9.79 stable release.
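
The alignment point can be checked with a few lines of arithmetic. This is only a sketch: the 8 MiB figure is taken from this reply, the helper names are made up, and the "swapped" layout is simply the 64 MiB pool placed first with the 12 MiB CMEM block directly after it.

```python
MIB = 1024 * 1024


def natural_alignment(address):
    """Largest power of two dividing a non-zero address (its lowest set bit)."""
    return address & -address


def is_aligned(address, alignment):
    return address % alignment == 0


def overlaps(base_a, size_a, base_b, size_b):
    """True if the two [base, base + size) regions share any byte."""
    return base_a < base_b + size_b and base_b < base_a + size_a


# (base address, size) per region
original = {"cmem": (0x88000000, 12 * MIB), "dsp1_cma": (0x88C00000, 64 * MIB)}
swapped = {"dsp1_cma": (0x88000000, 64 * MIB), "cmem": (0x8C000000, 12 * MIB)}

for label, layout in (("original", original), ("swapped", swapped)):
    base, size = layout["dsp1_cma"]
    print(
        f"{label}: DSP1 CMA at 0x{base:08x},"
        f" 8 MiB aligned: {is_aligned(base, 8 * MIB)},"
        f" natural alignment: {natural_alignment(base) // MIB} MiB,"
        f" overlaps CMEM: {overlaps(base, size, *layout['cmem'])}"
    )
```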

    regards
    Suman
  • Thanks!
    We are trying with aligning our CMA to 8 MiB. Can't wait for the results.

    Didn't know about the alignment requirement...
  • It didn't work.

    We did configure 64 MiB DSP1 CMA @ 0x88000000 and 12 MiB CMEM at 0x8C000000.

    This is the dmesg output when it fails:

    [ 4071.495354] omap-rproc 40800000.dsp: assigned reserved memory node dsp1_cma@88000000
    [ 4071.508007] remoteproc remoteproc2: 40800000.dsp is available
    [ 4071.573727] remoteproc remoteproc2: powering up 40800000.dsp
    [ 4071.579461] remoteproc remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 49785832
    [ 4071.593962] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    [ 4071.599845] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
    [ 4071.605807] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
    [ 4071.612074] alloc_contig_range: [88000, 88003) PFNs busy
    [ 4071.617647] alloc_contig_range: [88004, 88007) PFNs busy
    [ 4071.623279] alloc_contig_range: [88000, 88003) PFNs busy
    [ 4071.628848] alloc_contig_range: [88004, 88007) PFNs busy
    [ 4071.651310] alloc_contig_range: [8bf00, 8c000) PFNs busy
    [ 4071.656659] omap-rproc 40800000.dsp: failed to allocate dma memory: len 0x100000
    [ 4071.664110] remoteproc remoteproc2: Failed to process resources: -12
    [ 4071.686220] omap_hwmod: mmu1_dsp1: _wait_target_disable failed
    [ 4071.698707] omap_hwmod: mmu0_dsp1: _wait_target_disable failed

  • Hi Marcus,

    Is this result also with the stable fix patch applied? Is your system executing any memory-intensive applications while running these tests?

    I see your firmware image is quite large (~47 MB). The remoteproc ELF loader only processes the program sections, and the built firmware file has a lot of sections that are unused by the loader code. The firmware class code needs to allocate the size of the firmware file to read the file contents and give a pointer to the driver. Please reduce your firmware size by using the corresponding C6000 compiler's strip6x command, and use the smaller stripped image. It shouldn't affect the DSP functionality (you can compare the readelf -l output on both the images).

    Use

    <ti-cgt-c6000>/bin/strip6x -p <input firmware> -o <output firmware>

    regards

    Suman
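
To see how much of a firmware file the loader would actually use, one option is to sum the file sizes of the PT_LOAD program headers and compare that with the total file size. The sketch below is hedged: it handles only 32-bit ELF images (which is assumed here for the C66x firmware), the function name is hypothetical, and the demo input is a synthetic image built in memory rather than a real .xe66 file.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def loadable_bytes(image):
    """Sum p_filesz over the PT_LOAD program headers of a 32-bit ELF image."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[4] != 1:
        raise ValueError("only 32-bit ELF images are handled in this sketch")
    endian = "<" if image[5] == 1 else ">"
    # ELF32 header: e_phoff at byte 28, e_phentsize at 42, e_phnum at 44.
    (e_phoff,) = struct.unpack_from(endian + "I", image, 28)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", image, 42)
    total = 0
    for index in range(e_phnum):
        offset = e_phoff + index * e_phentsize
        # ELF32 program header: p_type at byte 0, p_filesz at byte 16.
        (p_type,) = struct.unpack_from(endian + "I", image, offset)
        (p_filesz,) = struct.unpack_from(endian + "I", image, offset + 16)
        if p_type == PT_LOAD:
            total += p_filesz
    return total


def demo_image():
    """Synthetic little-endian ELF32: one PT_LOAD (16 bytes), one PT_NOTE."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    header = ident + struct.pack(
        "<HHIIIIIHHHHHH", 2, 0, 1, 0, 52, 0, 0, 52, 32, 2, 0, 0, 0)
    load = struct.pack("<8I", PT_LOAD, 116, 0, 0, 16, 16, 5, 4)
    note = struct.pack("<8I", 4, 132, 0, 0, 8, 8, 4, 4)
    padding = bytes(1000)  # stands in for sections the loader never reads
    return header + load + note + bytes(16 + 8) + padding


image = demo_image()
print(f"file size {len(image)} bytes, loadable {loadable_bytes(image)} bytes")
```

For a real image the bytes would come from reading the firmware file in binary mode, and the result should agree with the FileSiz column of readelf -l.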

  • Hello Anna,

    I work with Marcus and I'm currently working on solving this issue.

    We have run the tests with the patch applied and CMA memory pool aligned (8MB). No improvement.

    I have also run the tests with stripped binaries (they are indeed significantly smaller) but the error still persists:

    [16180.025698] remoteproc remoteproc2: releasing 40800000.dsp
    [16180.741896] omap-rproc 40800000.dsp: assigned reserved memory node dsp1_cma@88000000
    [16180.758778] remoteproc remoteproc2: 40800000.dsp is available
    [16180.766491] remoteproc remoteproc2: powering up 40800000.dsp
    [16180.776465] remoteproc remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 395944
    [16180.791512] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    [16180.797415] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
    [16180.803371] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
    [16180.810815] alloc_contig_range: [88004, 88007) PFNs busy
    [16180.822958] alloc_contig_range: [88008, 8800b) PFNs busy
    [16180.829073] alloc_contig_range: [88100, 88500) PFNs busy
    [16180.834879] alloc_contig_range: [88200, 88600) PFNs busy
    [16180.840605] alloc_contig_range: [88200, 88700) PFNs busy
    [16180.846113] alloc_contig_range: [88400, 88800) PFNs busy
    [16180.875937] alloc_contig_range: [88400, 88500) PFNs busy
    [16180.881327] omap-rproc 40800000.dsp: failed to allocate dma memory: len 0x100000
    [16180.888760] remoteproc remoteproc2: Failed to process resources: -12
    [16180.914526] omap_hwmod: mmu1_dsp1: _wait_target_disable failed
    [16180.927510] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    
    

    There is nothing else, except typical idle Linux services, running on the processor when the tests run.

  • Hi Michal,

    OK, good to know the environment. So, I take it your error frequency has gone down a little bit, based on your kernel timestamp?
    Can you share the script you are using, if you are just invoking repeated binds and unbinds? Also, are you changing firmware in between your tests, or is it the same firmware?

    regards
    Suman
  • I can't confirm the frequency of the error went down. We are not measuring the frequency so saying that would be guessing on my part. We did suspect that the size of the binary caused the problem but the error occurs for the small binaries too.

    Yes we are changing the firmware in between tests. We are basically going through a list of around 120 different binaries and analyzing the output. Sometimes, some of the binaries cause this problem. We didn't observe a pattern, like for example correlation of the size with the error frequency.

    Unfortunately I can't share the scripts since there is a lot there to unpack. The original post explains well which commands get executed on the board side. I'll try to think of any other piece of information we could provide you to solve this issue.

  • Hi Michal,

    OK, understood. So, are the resource tables also different between each of these binaries? It helps to see if there is a pattern for when the error shows up in terms of the firmware switches and check the resource tables.

    regards

    Suman 

  • Hi Suman!

    All these binaries share the exact same resource table.

    /Marcus

  • Hi Marcus,

    OK, can you please share your resource table then? Rex can try to reproduce on our SDK using your resource table with our images.

    regards

    Suman

  • Hi, Marcus,

    Did you also modify the dts file to reflect the DSP memory usage on linux side? Can I have it as well?

    Rex
  • Yes, we have modified the DTS. See reply in https://e2e.ti.com/support/processors/f/791/p/772040/2881581#2881581 for how we patch the original TI processor SDK.

    Note! The resource table I previously attached and the DTSI changes described in the reply linked above have the CMA configured at 0x88C00000, which isn't 8 MiB aligned, as discussed earlier. Aligning to 8 MiB had no effect, as stated in previous replies in this thread.

    /Marcus

  • I'm attaching the dtsi with unaligned CMA.

    / {
            reserved-memory {
                    #address-cells = <2>;
                    #size-cells = <2>;
                    ranges;
    
                    /delete-node/ cmem_block_mem@a0000000;
                    cmem_block_mem_0: cmem_block_mem@88000000 {
                            reg = <0x0 0x88000000 0x0 0x00c00000>;
                            no-map;
                            status = "okay";
                    };
    
                    /delete-node/ dsp1_cma@99000000;
                    dsp1_cma_pool: dsp1_cma@88c00000 {
    			compatible = "shared-dma-pool";
    			reg = <0x0 0x88c00000 0x0 0x4000000>;
    			reusable;
    			status = "okay";
    		};
            };
    
            ocp {
              dsp1 {
                      memory-region = <&dsp1_cma_pool>;
                      };
            };
    
            cmem {
                    compatible = "ti,cmem";
                    #address-cells = <1>;
                    #size-cells = <0>;
    
    		#pool-size-cells = <2>;
    
                    status = "okay";
    
                    cmem_block_0: cmem_block@0 {
                            reg = <0>;
                            memory-region = <&cmem_block_mem_0>;
                            cmem-buf-pools = <3 0x0 0x00400000>;
                    };
            };
    };
    

  • We have in the meantime experimented with the DSP's CMA memory. We tried the no-map option suggested earlier for the CMA memory space.
    The problem with this solution is that the memory space gets bound to the DMA driver.

    Below you can see that a DMA memory pool has been created at 0x88c00000, as opposed to the other pools, which are CMA pools.

    [    0.000000] Reserved memory: created DMA memory pool at 0x0000000088c00000, size 64 MiB
    [    0.000000] OF: reserved mem: initilized node dsp1_cma@88c00000, compatible id shared-dma-pool
    [    0.000000] Reserved memory: created CMA memory pool at 0x0000000095800000, size 56 MiB
    [    0.000000] OF: reserved mem: initilized node ipu2_cma@95800000, compatible id shared-dma-pool
    [    0.000000] Reserved memory: created CMA memory pool at 0x000000009d000000, size 32 MiB
    [    0.000000] OF: reserved mem: initilized node ipu1_cma@9d000000, compatible id shared-dma-pool
    [    0.000000] Reserved memory: created CMA memory pool at 0x000000009f000000, size 8 MiB
    [    0.000000] OF: reserved mem: initilized node dsp2_cma@9f000000, compatible id shared-dma-pool
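
A small sketch for pulling the pool type, base and size out of such a boot log, which makes the DMA versus CMA difference easy to spot in a script. The regular expression is written against the exact wording of the lines above and is an assumption for any other kernel version; the function name is made up.

```python
import re

POOL_RE = re.compile(
    r"Reserved memory: created (DMA|CMA) memory pool at (0x[0-9a-f]+), size (\d+) MiB"
)


def reserved_pools(boot_log):
    """Return (kind, base_address, size_mib) for every reserved pool line."""
    return [
        (kind, int(base, 16), int(size))
        for kind, base, size in POOL_RE.findall(boot_log)
    ]


# Two lines copied from the boot log above.
sample = """\
[    0.000000] Reserved memory: created DMA memory pool at 0x0000000088c00000, size 64 MiB
[    0.000000] Reserved memory: created CMA memory pool at 0x0000000095800000, size 56 MiB
"""

for kind, base, size in reserved_pools(sample):
    print(f"{kind} pool at 0x{base:08x}, {size} MiB")
```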
    

    But running the binary ends up with omap-rproc failing to allocate the memory:

    [ 303.636256] omap_hwmod: mmu1_dsp1: _wait_target_disable failed
    [ 303.648909] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    [ 323.923043] remoteproc remoteproc2: releasing 40800000.dsp
    [ 324.666031] omap-rproc 40800000.dsp: assigned reserved memory node dsp1_cma@88c00000
    [ 324.685557] remoteproc remoteproc2: 40800000.dsp is available
    [ 324.693205] remoteproc remoteproc2: powering up 40800000.dsp
    [ 324.702570] remoteproc remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 395936
    [ 324.720848] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    [ 324.726748] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
    [ 324.732722] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
    [ 324.742576] omap-rproc 40800000.dsp: failed to allocate dma memory: len 0x3700000
    [ 324.750132] remoteproc remoteproc2: Failed to process resources: -12
    [ 324.763805] omap_hwmod: mmu1_dsp1: _wait_target_disable failed
    [ 324.776816] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    

    The device tree looks as follows:

    / {
    		reserved-memory {
    				#address-cells = <2>;
    				#size-cells = <2>;
    				ranges;
    
    				cmem_block_mem_0: cmem_block_mem@88000000 {
    						reg = <0x0 0x88000000 0x0 0x00c00000>;
    						no-map;
    						status = "okay";
    				};
    
    				/delete-node/ dsp1_cma@99000000;
    				dsp1_cma_pool: dsp1_cma@88c00000 {
    						compatible = "shared-dma-pool";
    						reg = <0x0 0x88c00000 0x0 0x4000000>;
    						no-map;
    						status = "okay";
    				};
    		};
    
    		ocp {
    		  dsp1 {
    				  memory-region = <&dsp1_cma_pool>;
    				  };
    		};
    
    		cmem {
    				compatible = "ti,cmem";
    				#address-cells = <1>;
    				#size-cells = <0>;
    
    				#pool-size-cells = <2>;
    
    				status = "okay";
    
    				cmem_block_0: cmem_block@0 {
    						reg = <0>;
    						memory-region = <&cmem_block_mem_0>;
    						cmem-buf-pools = <3 0x0 0x00400000>;
    				};
    		};
    };
    

    The resource table for this memory:

    {
        TYPE_CARVEOUT,
        DSP_MEM_TEXT, 0,
        DSP_MEM_TEXT_SIZE, 0, 0, "DSP_MEM_TEXT",
    },
    
    {
        TYPE_CARVEOUT,
        DSP_MEM_DATA, 0,
        DSP_MEM_DATA_SIZE, 0, 0, "DSP_MEM_DATA",
    },
    
    {
        TYPE_CARVEOUT,
        DSP_MEM_HEAP, 0,
        DSP_MEM_HEAP_SIZE, 0, 0, "DSP_MEM_HEAP",
    },
    
    {
        TYPE_CARVEOUT,
        DSP_MEM_IPC_DATA, 0,
        DSP_MEM_IPC_DATA_SIZE, 0, 0, "DSP_MEM_IPC_DATA",
    },
    

    We also tried using

    TYPE_DEVMEM

    but that doesn't help. Any pointers on how to access no-map memory?

  • Hi, Marcus,

    Sorry for the slow response. I tried to reproduce the issue using your resource table with IPC ex02_messageq example from PSDK 5.2 release.
    Though I am getting PFNs busy messages, but DSP image is always loaded successfully. Though I used 8MB alignment for the CMA configuration, but I don't think it makes difference. Earlier I used non-8MB alignment configuration. DSP image was loaded fine, but I didn't check if the messageq example still runs after the PFNs busy message happened. Either case, I don't get dma allocation failure. I am re-running non-8MB alignment images. The following logs are from 8-MB CMA alignment configuration. My CMA is configured at 0x8900 0000 but not DMA area though. I wonder if some other changes you have caused it.

    [ 0.000000] Booting Linux on physical CPU 0x0
    [ 0.000000] Linux version 4.14.79-gbde58ab01e (oe-user@oe-host) (gcc version 7.2.1 20171011 (Linaro GCC 7.2-2017.11)) #1 SMP PREEMPT Thu Dec 20 04:51:24 UTC 2018
    [ 0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=30c5387d
    [ 0.000000] CPU: div instructions available: patching division code
    [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
    [ 0.000000] OF: fdt: Machine model: TI AM5728 EVM
    [ 0.000000] Memory policy: Data cache writealloc
    [ 0.000000] efi: Getting EFI parameters from FDT:
    [ 0.000000] efi: UEFI not found.
    [ 0.000000] Reserved memory: created CMA memory pool at 0x0000000089000000, size 64 MiB
    [ 0.000000] OF: reserved mem: initialized node dsp1-memory@99000000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created CMA memory pool at 0x0000000095800000, size 56 MiB
    [ 0.000000] OF: reserved mem: initialized node ipu2-memory@95800000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created CMA memory pool at 0x000000009d000000, size 32 MiB
    [ 0.000000] OF: reserved mem: initialized node ipu1-memory@9d000000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created CMA memory pool at 0x000000009f000000, size 8 MiB
    [ 0.000000] OF: reserved mem: initialized node dsp2-memory@9f000000, compatible id shared-dma-pool
    [ 0.000000] cma: Reserved 24 MiB at 0x00000000fe400000

    I ran unbind/bind for 15 minutes and saw PFNs busy messages. I then stopped the script and verified that the messageq example still runs:

    root@am57xx-evm:~# echo 40800000.dsp > /sys/bus/platform/drivers/omap-rproc/unbind
    [ 1058.337355] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    [ 1058.343978] remoteproc remoteproc2: stopped remote processor 40800000.dsp
    [ 1058.352201] remoteproc remoteproc2: releasing 40800000.dsp
    root@am57xx-evm:~#
    root@am57xx-evm:~#
    root@am57xx-evm:~# echo 40800000.dsp > /sys/bus/platform/drivers/omap-rproc/bind
    [ 1078.541390] omap-rproc 40800000.dsp: assigned reserved memory node dsp1-memory@99000000
    [ 1078.551796] remoteproc remoteproc2: 40800000.dsp is available
    root@am57xx-evm:~# [ 1078.563986] remoteproc remoteproc2: powering up 40800000.dsp
    [ 1078.569709] remoteproc remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 4447856
    [ 1078.584405] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    [ 1078.590292] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
    [ 1078.596237] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
    [ 1078.602740] alloc_contig_range: [89000, 89003) PFNs busy
    [ 1078.608604] alloc_contig_range: [89004, 89007) PFNs busy
    [ 1078.614232] alloc_contig_range: [89008, 8900b) PFNs busy
    [ 1078.619726] alloc_contig_range: [89000, 89003) PFNs busy
    [ 1078.625265] alloc_contig_range: [89004, 89007) PFNs busy
    [ 1078.630886] alloc_contig_range: [89008, 8900b) PFNs busy
    root@am57xx-evm:~# [ 1078.636410] alloc_contig_range: [89010, 89013) PFNs busy
    [ 1078.666719] virtio_rpmsg_bus virtio3: rpmsg host is online
    [ 1078.669255] virtio_rpmsg_bus virtio3: creating channel rpmsg-proto addr 0x3d
    [ 1078.679382] remoteproc remoteproc2: registered virtio3 (type 7)
    [ 1078.688148] remoteproc remoteproc2: remote processor 40800000.dsp is now up

    root@am57xx-evm:~# ./app_host_8MB_alignment DSP1
    --> main:
    [ 1091.977559] omap-iommu 58882000.mmu: 58882000.mmu: version 2.1
    [ 1092.024177] omap_hwmod: mmu0_dsp2: _wait_target_disable failed
    [ 1092.030066] omap-iommu 41501000.mmu: 41501000.mmu: version 3.0
    [ 1092.035987] omap-iommu 41502000.mmu: 41502000.mmu: version 3.0
    --> Main_main:
    --> App_create:
    App_create: Host is ready
    <-- App_create:
    --> App_exec:
    App_exec: sending message 1
    App_exec: sending message 2
    App_exec: sending message 3
    App_exec: message received, sending message 4
    App_exec: message received, sending message 5
    App_exec: message received, sending message 6
    App_exec: message received, sending message 7
    App_exec: message received, sending message 8
    App_exec: message received, sending message 9
    App_exec: message received, sending message 10
    App_exec: message received, sending message 11
    App_exec: message received, sending message 12
    App_exec: message received, sending message 13
    App_exec: message received, sending message 14
    App_exec: message received, sending message 15
    App_exec: message received
    App_exec: message received
    App_exec: message received
    <-- App_exec: 0
    --> App_delete:
    <-- App_delete:
    <-- Main_main:
    <-- main:

    Rex
  • My changes to the dts files:

    user@udbuser:~/work/ti-processor-sdk-linux-am57xx-evm-05.02.00.10/board-support/linux-4.14.79+gitAUTOINC+bde58ab01e-gbde58ab01e/arch/arm/boot/dts$ git diff
    diff --git a/arch/arm/boot/dts/am57xx-beagle-x15-common.dtsi b/arch/arm/boot/dts/am57xx-beagle-x15-common.dts
    index ff74e31..cc9c2bc 100644
    --- a/arch/arm/boot/dts/am57xx-beagle-x15-common.dtsi
    +++ b/arch/arm/boot/dts/am57xx-beagle-x15-common.dtsi
    @@ -50,7 +50,7 @@

    dsp1_memory_region: dsp1-memory@99000000 {
    compatible = "shared-dma-pool";
    - reg = <0x0 0x99000000 0x0 0x4000000>;
    + reg = <0x0 0x89000000 0x0 0x4000000>;
    reusable;
    status = "okay";
    };
    diff --git a/arch/arm/boot/dts/am57xx-evm-cmem.dtsi b/arch/arm/boot/dts/am57xx-evm-cmem.dtsi
    index c7781c4..b5e59da 100644
    --- a/arch/arm/boot/dts/am57xx-evm-cmem.dtsi
    +++ b/arch/arm/boot/dts/am57xx-evm-cmem.dtsi
    @@ -4,8 +4,9 @@
    #size-cells = <2>;
    ranges;

    - cmem_block_mem_0: cmem_block_mem@a0000000 {
    - reg = <0x0 0xa0000000 0x0 0x0c000000>;
    + /* cmem_block_mem_0: cmem_block_mem@a0000000 { */
    + cmem_block_mem_0: cmem_block_mem@88000000 {
    + reg = <0x0 0x88000000 0x0 0x00C00000>;
    no-map;
    status = "okay";
    };
    @@ -29,7 +30,7 @@
    cmem_block_0: cmem_block@0 {
    reg = <0>;
    memory-region = <&cmem_block_mem_0>;
    - cmem-buf-pools = <1 0x0 0x0c000000>;
    + cmem-buf-pools = <3 0x0 0x00400000>;
    };

    cmem_block_1: cmem_block@1 {
  • Hi Michal,

    The allocation order and alignment within the DMA allocator and CMA allocator are slightly different. These entries are also used for the MMU programming, so the allocated address and your requested memory address dictate whether it can be a 16 MB entry or multiple 1 MB entries. The larger the size, the larger the alignment becomes, especially if the underlying logic uses logarithmic page orders.

    I would suggest that you split up your 55 MB carveout into multiple carveout entries, say 16 MB, 16 MB, 16 MB and then 7 MB (or 4 MB and 3 MB).

    Rex,

    You can also try the same experiment with our IPC images.

    regards

    Suman

  • Sorry for the slow response. I'll try to split up the carveout and see what I get.

    Rex
  • Hi, Marcus,

    I don't know if you tried what Suman suggested. Does it help?
    I don't see any difference.

    Rex
  • Unfortunately we didn't have time to test this solution, but we will experiment with this soon.

    Can you reproduce our problems, Rex?

  • Not with the Processor SDK 5.2 release.

    Rex

  • Hi, Michael,

    I'll close this thread for now until there is further information to update.

    Rex