DSPLink driver appears to (kinda) not work on 2.6.33 kernel

Other Parts Discussed in Thread: OMAP-L137

Hi folks.  I have a series of programs that use the DSP to grab a chunk of L3 cache memory and share it between a few applications.  All is fine and dandy running on the 2.6.29 kernel, but we've decided to upgrade to 2.6.33 for some "enhancements" in the driver architecture.  Unfortunately, when I do this, my DSPLink applications fail.  They read garbage out of the DSP instead of the values I know should be there.

The fun thing is that the dsplink example programs seem to run fine.  I tried messagegpp and loopgpp (dsplink examples) which do similar things to what I'm doing.

Any suggestions from the TI Linux community?  Has anyone had a similar problem?

Thanks,
Chris 

  • Chris,

    What device are you using? What DSPLink release are you using? Have you updated to the latest DSPLink 1.65.01.06 release? There were some driver changes needed to support newer Linux kernels. You can get the download from the following link.

    http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/

    Did you move to an SDK release? Some SDK releases run demo-ware at boot time which load and run the DSP. This demo-ware might have configured the DSP's cache differently from what you are expecting. Maybe you need to disable the demo?

    In case you don't have it, here is another good DSPLink web site.

    http://processors.wiki.ti.com/index.php/Category:DSPLink

    ~ Ramsey

  • Thanks for the advice Ramsey.  We were using 1.65.0.2.  I'm currently working on getting the latest release working on our system.  I'll verify your answer if it works out.

    Thanks,
    Chris

  • Hi Ramsey,

    I upgraded to the latest DSPLink (1.65.01.06) and the program still fails to execute properly.  We are using a new Linux distribution from our vendor, so it's possible that they are configuring the cache one way... but I thought that the DSPLink tconf files would re-configure that when the program was run.  Is that not the case?

    Thanks,
    Chris

  • Chris,

    You are correct, when the DSP executable runs, it should reconfigure the DSP cache as specified in the tconf files.

    I have some questions to better understand the context of your application. What device are you running on? Where is the L3 cache located? When you share the L3 cache, I'm not clear on what you mean. The memory backing the DSP cache cannot be shared or accessed directly by anyone, the cache controller has full control of this memory. Maybe you are partitioning the L3 RAM into some cache and some non-cache memory? Then you are sharing the non-cache part? Who are you sharing it with, another task on the DSP or a thread on the host processor? If it's the host processor, how are you mapping the L3 addresses into the Linux process address space?

    ~ Ramsey

  • Good questions.  Let's see if I can answer them for you:

    1) I'm on an OMAP-L137 on a custom board we've designed.
    2) I have changed CFG_OMAPL1XX_SHMEM.c to change the following defines:
    #define POOLMEMORYADDR 0x80000000u
    #define POOLMEMORYSIZE 0x00020000u

    This sets the pool up to use the L3 memory...   I believe the l3 cache is off by default (but I could be very wrong about that.)

    My tconf has the following:

      prog.module("GBL").C64PLUSL2CFG = "0k";

    prog.module("GBL").C64PLUSL1DCFG = "32k";
    prog.module("GBL").C64PLUSL1PCFG = "32k";

    var IRAM = prog.module("MEM").instance("IRAM");
    IRAM.len = 262114;
    IRAM.createHeap = 1;

    //the line below turns ON the L3 cache. We leave it off.
    //bios.GBL.C64PLUSMAR128to159 = 0x000000ff;

    bios.setMemCodeSections(prog, IRAM);
    bios.setMemDataNoHeapSections(prog, IRAM);
    bios.setMemDataHeapSections(prog, IRAM);
    When I run our program, I can verify that the memory address being used is L3 shared ram, which, in this case, is set up to be non-cache.   This works on the 2.6.29 kernel.
    3) What I'm trying to do is create a memory space that can be shared by the DSP and any and all processes running on the ARM.  I implement my own locking mechanisms. 
    4) The memory allocated through the use of POOL_alloc and the dsp address is captured through POOL_translateAddr.  
    This is stored into a linux shared object.  Other processes grab this shared object, get the dsp memory address and then open the pool and use 
    POOL_translateAddr to map the dsp address to an arm address for that process space.
    
    
    Does that make sense?
    
    
    Thanks,
    Chris 
  • Chris,

    I think there is some confusion regarding terminology. There is no L3 Cache on the OMAPL137. The DSP has L1P Cache (level one program cache), L1D Cache (level one data cache), and L2 Cache (level two cache used for both program and data). When configuring the DSP cache, the amount of cache is configurable. The cache memory is taken from the DSP internal memories. Any internal memory not used for cache is available for program use. From your configuration above, it looks like you have turned off the L2 cache, so all 256 KB of L2 memory is available for your program to use. You have configured 32 KB for both L1P and L1D caches, which is the maximum, so there is no L1 RAM left over for your program.

    Regarding the external memory (sometimes called L3 Memory), there are MAR registers which control the cacheability of external memory. What this means is that when using external memory, if it is configured as cacheable (MAR bit = 1), then the contents of that memory may be pulled into the cache (L1 in your case) for future access. If the external memory is configured as non-cacheable (MAR bit = 0), then the contents will never be pulled into the cache (i.e. all memory accesses are long-distance memory transactions to external memory). By default, any external memory used by your DSP program will have its MAR bit = 1, making it cacheable. You need to explicitly set the MAR bit = 0 to prevent the contents of memory from being pulled into the cache.

    In your case, I think if you uncomment the MAR configuration above and set it to zero, it will prevent the pool data from being pulled into the cache and thus will make the memory content always coherent from both the DSP's and ARM's point of view. Each bit in the MAR register controls 16 MB of memory. The C64PLUSMAR128to159 setting controls the 128th to 159th blocks of 16 MB of memory, in other words 0x8000_0000 - 0x9FFF_FFFF (512 MB). Since you placed the pool memory at address 0x8000_0000 with a size of 128 KB, you only need to clear bit 0 of that MAR register. Since you probably don't care about the rest of memory, you can just set the entire MAR register to 0.

    bios.GBL.C64PLUSMAR128to159 = 0x00000000;
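
The address arithmetic behind that setting can be sketched in a few lines of C. This is only an illustration of the "one bit per 16 MB block" rule described above; the function names are hypothetical and are not part of DSP/BIOS or DSPLink, and the bit numbering (bit 0 corresponds to block 128) is taken from the description in this post.

```c
#include <stdint.h>

/* Each MAR bit governs one 16 MB block, so the block index is simply
 * the top 8 bits of the 32-bit address. */
unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> 24);
}

/* Bit position of an address inside the 32-bit C64PLUSMAR128to159 value
 * (assumed: bit 0 = block 128 ... bit 31 = block 159). Returns -1 if the
 * address lies outside 0x80000000-0x9FFFFFFF. */
int mar128to159_bit(uint32_t addr)
{
    unsigned idx = mar_index(addr);
    if (idx < 128u || idx > 159u)
        return -1;
    return (int)(idx - 128u);
}

/* Return the config value with the cacheability bit for addr cleared
 * (unchanged if addr is outside the range this setting covers). */
uint32_t mar128to159_clear(uint32_t value, uint32_t addr)
{
    int bit = mar128to159_bit(addr);
    if (bit < 0)
        return value;
    return value & ~(UINT32_C(1) << bit);
}
```

For the 128 KB pool at 0x8000_0000, both the first and last byte map to block 128, so clearing bit 0 covers the whole pool.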

    When the pool memory is cacheable, it is your responsibility to maintain cache coherency. In other words, when you are done writing data to the pool memory from the DSP, you must flush the cache to ensure all data in the cache is written back out to external memory before the ARM accesses that pool memory. The ARM cannot see the data in the DSP cache. When the ARM is done writing data to the pool, the DSP should invalidate the cache to make sure that it reads the pool memory from external memory. For debugging purposes, it's easiest to make the pool memory non-cacheable (by setting the MAR bit = 0). When you want to improve performance, you can make the pool memory cacheable (by setting the MAR bit = 1). If you are accessing the data only once, then it's best to make it non-cacheable. But if you access the data multiple times, then making it cacheable will improve performance. Note that when using MSGQ, the cache coherency is maintained for you by the MSGQ module, but this is not the case for POOL.
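
To make the write-back and invalidate steps concrete, here is a toy model in plain C. It is not DSPLink or DSP/BIOS code: the struct and function names are hypothetical, and a single "cache line" stands in for the DSP cache sitting in front of the shared pool memory.

```c
#include <string.h>

#define LINE_WORDS 4

/* Toy model: one cache line in front of one line of shared memory. */
struct toy_cache {
    unsigned line[LINE_WORDS];
    int valid;   /* line holds a copy of memory */
    int dirty;   /* line has writes that are not yet in memory */
};

/* Fill the line from memory on a miss. */
static void toy_fill(struct toy_cache *c, const unsigned *mem)
{
    if (!c->valid) {
        memcpy(c->line, mem, sizeof c->line);
        c->valid = 1;
    }
}

/* A cached write only updates the cache line, not memory. */
void cached_write(struct toy_cache *c, unsigned *mem, int i, unsigned v)
{
    toy_fill(c, mem);
    c->line[i] = v;
    c->dirty = 1;
}

/* A cached read returns the cached copy, even if memory has changed. */
unsigned cached_read(struct toy_cache *c, const unsigned *mem, int i)
{
    toy_fill(c, mem);
    return c->line[i];
}

/* Write-back: push dirty data out so the other processor can see it. */
void cache_writeback(struct toy_cache *c, unsigned *mem)
{
    if (c->valid && c->dirty) {
        memcpy(mem, c->line, sizeof c->line);
        c->dirty = 0;
    }
}

/* Invalidate: drop the cached copy so the next read goes to memory. */
void cache_invalidate(struct toy_cache *c)
{
    c->valid = 0;
    c->dirty = 0;
}
```

In this model, a value written through `cached_write` is invisible in `mem` until `cache_writeback` runs, and a value stored directly in `mem` (as the other processor would do) is invisible to `cached_read` until `cache_invalidate` runs.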

    From the ARM perspective, I believe the pool memory is always non-cacheable. I'll have to confirm this.

    One question: you have placed the pool memory at 0x8000_0000, which is traditionally the load address for Linux. I'm guessing that your Linux kernel boot args specify a different address for Linux, but it would be good to confirm there is no overlap with Linux. Would you post your Linux kernel boot args (cat /proc/cmdline)?

    For another point of reference, there is an audio example on the processors wiki which uses DSPLink and pools. Maybe there is a gem there for you to use.

    http://processors.wiki.ti.com/index.php/Audio_Soc_example

    If this does not fix the problem, please attach the DSP's map file to this thread so I can inspect the memory map.

    ~ Ramsey

  • Ramsey,

    Thanks for the clarification.  I am indeed talking about L3 Memory... I swear somewhere the documentation uses the word "cache", but I can understand how that's not technically correct.  :)

    I tried setting the MAR to all 0's before and it didn't work, but on your suggestion I tried again -- and no luck.  :/   I agree that there's probably something around this issue that's causing the problem.  Either I'm not doing something right or this version of Linux is doing something funny.

    I did try putting in HAL_cacheInv and HAL_cacheWb functions on the DSP side -- this does let me see the DSP values on the ARM side of the equation.  Using POOL_invalidate and POOL_writeback on the ARM doesn't seem to allow things to be seen on the DSP, though -- unless I'm doing something wrong.

    The buffers we're putting in L3 are definitely transient... they get loaded by the ARM, processed by the DSP and the results read back by the ARM (generally.)  It's not something that works well with a cache, and we'd have to do a lot of cache flushes.

    From what I understand, 0x80000000 is the beginning of L3.  Our Linux kernel gets loaded at 0xc7000000.  Here's our /proc/cmdline:

      console=ttyS1,115200n8 rw ip=172.16.67.52 root=/dev/nfs nfsroot=172.16.67.3:/home/projects/OMAP-L137/timesys/rfs_006

    and here's our entire uboot environment:

    bootdelay=3
    baudrate=115200
    ethaddr=00:0f:9a:65:68:96
    filesize=1D2FE9
    fileaddr=C0700000
    netmask=255.255.252.0
    bootcmd=tftp;bootm
    nfshost=172.16.67.3
    serverip=172.16.67.3
    ipaddr=172.16.67.52
    bootargs=console=ttyS1,115200n8 rw ip=172.16.67.52 root=/dev/nfs nfsroot=172.16.67.3:/home/projects/OMAP-L137/timesys/rfs_006
    bootfile=uImage-synchrony-633-2.6.33
    stdin=serial
    stdout=serial
    stderr=serial
    ver=U-Boot 1.3.3-00009-g0717822-dirty (Oct 27 2011 - 11:32:54)

    Environment size: 457/131068 bytes

    I've attached our memory map.

    Thanks,
    Chris 

  • A second attempt at attaching the memory map....

    4478.dspload.map.txt

    Thanks,

    Chris

  • Chris,

    On the Linux side, it looks like the kernel boot args do not specify the location or size of Linux memory. This means it's using some default values as configured into the kernel. You are probably correct in your assumption that Linux memory starts at 0xc700_0000, but I would still like to confirm this. Would you attach the output of the iomem configuration (cat /proc/iomem)? The System RAM entry in that file should state the size and location of Linux memory.
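
As a sketch of how the System RAM entry could be read programmatically, the helper below parses a single line in the usual /proc/iomem layout ("start-end : name", hexadecimal, inclusive range). The function names are hypothetical; a real tool would feed it each line read from /proc/iomem.

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/iomem line such as "c0000000-c3ffffff : System RAM".
 * On a match, store the inclusive range and return 1; otherwise return 0. */
int parse_system_ram(const char *line, unsigned long *start, unsigned long *end)
{
    unsigned long s, e;
    int consumed = 0;

    /* %n records how many characters were consumed up to the name field. */
    if (sscanf(line, " %lx-%lx : %n", &s, &e, &consumed) != 2)
        return 0;
    if (strncmp(line + consumed, "System RAM", 10) != 0)
        return 0;
    *start = s;
    *end = e;
    return 1;
}

/* Size in bytes of an inclusive address range. */
unsigned long range_size(unsigned long start, unsigned long end)
{
    return end - start + 1;
}
```

The size follows from the inclusive range as end - start + 1, so an entry covering 0x4000000 bytes corresponds to 64 MB.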

    Would you also post the Linux version number (uname -a)? I'm curious what version you are using.

    On the DSP side, let's see if we can confirm that the DSP reads and writes are working as expected. Are you able to use CCS with a JTAG connection to the DSP? If so, would you point a memory window at the start of your data buffer. The coloring of the window should indicate if any data is in the cache. Hopefully, with the MAR bits=0, none of the data should be cached. Then, step through the DSP code and see if the CPU reads the same values from memory that you see in CCS. This would confirm that the DSP reads are successful. Likewise, as the DSP writes data, you should see it update in your memory window. This would confirm the DSP writes are succeeding.

    You could then try a similar test from the ARM side. Setup the ARM program to write some data to memory, you should see the data land in the memory window. You will need to halt the DSP and refresh the memory window.

    You mentioned having changed CFG_OMAPL1XXGEM_SHMEM.c, to set the POOLMEMORYADDR and POOLMEMORYSIZE. Did you also make the corresponding change in the DSP configuration file? The POOLMEM configuration needs to match.

    dsplink/dsp/inc/DspBios/5.XX/OMAPL1XXGEM/dsplink-omapl1xxgem-base.tci

    var POOLMEM = prog.module("MEM").create("POOLMEM");
    POOLMEM.base        = DSPLINKMEM.base + DSPLINKMEM.len ;
    POOLMEM.len         = 0xD0000 ;
    POOLMEM.createHeap  = false;
    POOLMEM.comment     = "POOLMEM";

    Hopefully, this will give us some more clues to work with.

    ~ Ramsey

  • Hi Ramsey, 
    
    
    Here are the things you were looking for:

    1) cat /proc/iomem:
    01c00000-01c07fff : edma_cc0
    01c00000-01c07fff : edma
    01c08000-01c083ff : edma_tc0
    01c08400-01c087ff : edma_tc1
    01c21000-01c21fff : watchdog
    01c21000-01c21fff : watchdog
    01c23000-01c23fff : omap_rtc
    01c23000-01c23fff : omap_rtc
    01c41000-01c41fff : spi_davinci.0
    01c41000-01c41fff : spi_davinci
    01c42000-01c4201f : serial
    01d0c000-01d0c01f : serial
    01d0d000-01d0d01f : serial
    01e00000-01e005ff : musb_hdrc
    01e20000-01e24fff : davinci_emac.1
    01e20000-01e24fff : eth0
    01e25000-01e25fff : ohci.0
    01e25000-01e25fff : ohci_hcd
    01e28000-01e28fff : i2c_davinci.2
    01e28000-01e28fff : i2c_davinci
    60000000-60007fff : latch-addr-flash.0
    60000000-60007fff : latch-addr-flash
    62000000-62000fff : latch-addr-flash.0
    62000000-62000fff : DA830 UI NOR address latch
    68000000-68007fff : latch-addr-flash.0
    68000000-68007fff : AEMIF control
    c0000000-c3ffffff : System RAM
    c0026000-c0354fff : Kernel text
    c0356000-c0391cd3 : Kernel data

    This should be ok, don't you think? 
    2) uname -a
    Linux synchrony_633 2.6.33-ts-armv5l #1 PREEMPT Wed Jan 25 12:04:16 EST 2012 armv5tejl GNU/Linux
    3) I'll be working on testing memory writes with CCS and JTAG next.
    4) The DSP configuration file was NOT changed.  When I changed it to read:

    var POOLMEM = prog.module("MEM").create("POOLMEM");
    /*POOLMEM.base = DSPLINKMEM.base + DSPLINKMEM.len ;
    POOLMEM.len = 0xD0000 ;
    */
    POOLMEM.base = 0x80000000;
    POOLMEM.len = 0x00020000;
    POOLMEM.createHeap = false;
    POOLMEM.comment = "POOLMEM";
    and build, I get the following error:
    js: "./loop_tsk.tcf", line 80: MEM segment L3_CBA_RAM: overlaps with another segment or cache configuration.
    MEM segment POOLMEM: overlaps with another segment or cache configuration.
    What's the best way forward here?
    
    
    An interesting note: I did *not* change this file for the 2.6.29 kernel and it works in that environment.
    
    
    Thanks,
    Chris
  • Chris,

    It looks to me that your dsp memory map overlaps with the Linux kernel memory map. Contrary to our earlier expectation, I think Linux is using the following memory.

    C000_0000 - C3FF_FFFF  400_0000  (64 MB)  System RAM

    The dsp map from 0xC3E0_0000 - 0xC3FF_FFFF overlaps with the Linux System RAM (the dsp overlaps with the last two MB). I think you should move the dsp memory map from 0xC3E0_0000 up to 0xC400_0000. Remember to update both the host and dsp memory configuration and to update the MAR bit as needed.
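
The overlap test itself is a simple comparison of inclusive ranges. The sketch below uses hypothetical names, and the 2 MB size for the relocated DSP region is taken from the "last two MB" figure above; whether RAM is actually populated at 0xC400_0000 on a given board is an assumption to verify separately.

```c
#include <stdint.h>

/* Inclusive address range [first, last]. */
struct mem_range {
    uint32_t first;
    uint32_t last;
};

/* Two inclusive ranges overlap when each one starts no later than the
 * other one ends. */
int ranges_overlap(struct mem_range a, struct mem_range b)
{
    return a.first <= b.last && b.first <= a.last;
}
```

With Linux at 0xC000_0000 - 0xC3FF_FFFF, a DSP region at 0xC3E0_0000 - 0xC3FF_FFFF overlaps, while the same 2 MB moved to 0xC400_0000 - 0xC41F_FFFF does not.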

    Linux 2.6.33 should be fine.

    Just to confirm again, which memory are you trying to share with your program, L3_CBA_RAM or POOLMEM?

    8000_0000 - 8001_FFFF    2_0000 (128 KB)  L3_CBA_RAM
    C3F3_0000 - C3FF_FFFF    D_0000 (832 KB)  POOLMEM

    I suggest updating the dsp memory map and trying to get the JTAG debugging working. Then you should be able to make some progress.

    ~ Ramsey

  • Ramsey, 

    I see what you're saying.  I'm not sure of how I should change the memory, though.  I think I need to change:

    1) dsplink/config/all/CFG_OMAPL1XXGEM_SHMEM.c

    What do I do here?  the "RESETCTRLADDR" is set to 0xC3E00000 and the other variables are derived from it.  RESETCTRLADDR sounds awfully important, are you sure that's ok to change?

    I have POOLMEMORYADDR changed to 0x80000000u and the POOLMEMORYSIZE set to 0x00020000u.  This was supposed to make DSPLink's pools come from the same memory space as the L3_CBA_RAM  definition.

    Should I change that back?  What should this file look like?

    2)  dsplink/dsp/inc/DspBios/5.XX/OMAPL1XXGEM/dsplink-omapl1xxgem-base.tci

    Change this file to match the settings in #1.  Doing this may cause problems because I want POOLMEM to live in L3, and there appears to be an L3_CBA_RAM setting in the ti.platforms.evmOMAPL137 configuration file, wherever that lives (in the bios setup, I'm guessing).

    To answer your question, I want DSPLink to allocate pools from the l3 shared cache space (0x8000 0000).  I think that means that POOLMEM should be pointing to that space.

    Also (like I haven't asked enough already :)  -  I'm having problems getting CCS to connect to the DSP.  I think this is because the chip is already up and running OMAP, and the DSP is being held in reset.  How do I get around this?  Should I load a DSP image first (I think of this as I type...) and then try to connect?  I'll try that.

    Thanks,
    Chris 

  • Ok, I think I have CCS up and debugging the DSP.   Apparently it helps to have the JTAG emulator attached to the device before attempting to use it.  Who knew? :)

    The memory window has values in all black.  They look very similar to the first time I tried to look at the data from the ARM side of things (by running my program that inspects this data), but when I run that program again, the data is different from what appears in the memory window -- and it doesn't change.

    Thanks,
    Chris 

  • Ramsey,

    Here's some notes on my last test.

    After connecting to DSP, before running dspload

    0x80000000 C1501BA2 6B1D6951
    0x80000008 8FF1991E 274F1917
    0x80000010 69CF5CB9 44CB4583

    After running DSP load

    0x80000000 00000000 00000000
    0x80000008 00000000 00000000
    0x80000010 00000000 00000000

    Clicked run, and it was working for a bit on the ARM (right values showing up, counter was ticking... this is a first!) I couldn't see memory in CCS, couldn't reconnect because it was running. Pause button not highlighted.

    Went away for 15 mins or so, came back, was able to reconnect. memory is now:

    0x80000000 00000000 00000000
    0x80000008 0000000A 004CB195 (both these in red)
    0x80000010 00000001 00000002

    Does not match what's showing up in hal:

    0x80000000 4017b16c 4017b16c
    0x80000008 0002f328 00030328
    0x80000010 0002f328 0002f328

    Clicked run in CCS again,

    Hal is reporting the values I expect to see, and I'm seeing the simple counter count up.
    Can't inspect memory in CCS because it's running.

    I halt, and I get garbage again in HAL. This garbage, by the way, is similar to what I see when I think the system isn't working.
  • Chris,

    Sounds like you are making good progress. I'll try to catch up with your questions.

    1. dsplink/config/all/CFG_OMAPL1XXGEM_SHMEM.c

     It's okay to change RESETCTRLADDR but you need to keep it aligned on a 10-bit boundary (i.e. the low 10-bits must all be zero). I'm guessing you changed it to the following.

    #define RESETCTRLADDR 0xC4000000

    Your pool memory settings look correct. Setting POOLMEMORYADDR to 0x80000000 should locate DSPLink pool memory to that address which you are calling L3_CBA_RAM in your memory map.

    2. dsplink/dsp/inc/DspBios/5.XX/OMAPL1XXGEM/dsplink-omapl1xxgem-base.tci

    I'm guessing you made the following changes.

    RESET_VECTOR.base = 0xC4000000;
    POOLMEM.base = 0x80000000;
    POOLMEM.len = 0x20000;

    It's okay to place the pool memory there. The L3_CBA_RAM memory entry simply defines the memory address, you can choose to use the memory as you wish. The platform tci file is located in the xdctools product, but I prefer to look at the documentation instead.

    xdctools_3_22_01_21/packages/ti/platforms/evmOMAPL137/Platform.tci

    To view the documentation, do the following.

    Open xdctools_3_22_01_21/docs/xdctools.chm
    Open API Reference > ti > Platforms > evmOMAPL137 > Platform

    Here you will find the external memory map definition: SDRAM. This is just the default, you are overriding it in your dsplink-omapl1xxgem-base.tci file. For the internal memory map, you need to open the device. On the current page, note the following entries in the Platform.DSP definition.

    catalogName: ti.catalog.c6000
    deviceName: OMAPL137

    Now open the device file as follows.

    API Reference > ti > catalog > c6000 > OMAPL137

    Scroll down to params.memMap, this is your device's internal memory map. You cannot change these entries, but you can place code/data in some of them.

    3. Looks like you got CCS working. You are correct in guessing that the DSP must be released from reset before you can attach CCS. I often add the following spin loop to make this easier.

    { volatile int spin = 1; while (spin); }

    Once you attach with CCS, just set the spin variable to zero and the program will continue. Don't place this code too early because the DSPLink handshake will time out and place the DSP back into reset. It needs to be after the DSPLink handshake. Somewhere in your application code should be a good place.

    I'm not clear on the issue you are having with CCS. Once you connect to the DSP, you should be able to run and pause as you wish. You don't need to disconnect from the DSP as long as your program is running, but remember to disconnect before your application terminates.

    4. I'm guessing that your DSP access to L3_CBA_RAM is successful but the HOST access is inconsistent. I'll have to ask around if there is some special consideration needed for the HOST to access this memory. I'll follow up with my findings.
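
The alignment rule for RESETCTRLADDR from point 1 (low 10 bits all zero, i.e. a multiple of 1024 bytes) can be checked with a mask. This is a minimal sketch with hypothetical helper names, not code from DSPLink.

```c
#include <stdint.h>

/* True when the low 10 bits of addr are zero, i.e. addr is a multiple
 * of 1024. */
int is_1k_aligned(uint32_t addr)
{
    return (addr & UINT32_C(0x3FF)) == 0;
}

/* Round addr up to the next 1 KB boundary (unchanged if already aligned). */
uint32_t align_up_1k(uint32_t addr)
{
    return (addr + UINT32_C(0x3FF)) & ~UINT32_C(0x3FF);
}
```

Both 0xC3E00000 and 0xC4000000 pass this check; an address such as 0xC4000200 would not.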

    ~ Ramsey

  • Ramsey,

    First, I want to say thank you for all your help so far!  Your replies have been a great help.

    I've made the changes you suggested above, but when I compile the tconf, I get the following error:

    js: "./tconf/dspload_tsk.tcf", line 100: MEM segment L3_CBA_RAM: overlaps with another segment or cache configuration.
    MEM segment POOLMEM: overlaps with another segment or cache configuration.
    
    
    Thanks,
    Chris 
  • Chris,

    You're welcome, and thanks. It's nice to hear positive feedback.

    My last suggestion seems to be off base. The POOLMEM configuration is used to reserve external memory for pool, but since you are using Shared RAM for your pool storage, I don't think the POOLMEM configuration is actually needed. I think the runtime configuration is the next place to look. Would you compare your calls to POOL_makePoolId and POOL_open against the examples in the ProgrammersGuide Section 4?

    dsplink/doc/ProgrammersGuide.pdf

    I don't have a system to replicate your setup but I can simulate one on a different board. It will take some time to setup. Does it work when you place the pool storage in external memory? Say above 0xC400_0000? That would eliminate any issues with the Shared RAM memory.

    Do you have access to the OMAPL137 Data Sheet? Maybe there is some information there regarding the Shared RAM?

    http://www.ti.com/lit/ds/symlink/omap-l137.pdf

    ~ Ramsey

  • Ramsey,

    I took out the POOLMEM configuration from dsplink/dsp/inc/DspBios/5.XX/OMAPL1XXGEM/dsplink-omapl1xxgem-base.tci.  It's back to its original form.

    I think I'm using POOL_makePoolId and POOL_open correctly.

    // in my dspload, which is run at start up, these functions are used to open the pool initially and setup the pools
    #define PROCESSOR_ID    0
    #define POOL_ID 0
    #define NUMBUFFERPOOLS 2
    #define NUMBUFS_POOL0 1
    #define NUMBUFS_POOL1 1
    #define FLAG_IO_DSPLINK_SIZE DSPLINK_ALIGN(sizeof(FlagIOBuffer), DSPLINK_BUF_ALIGN)
    #define HIGH_SPEED_DSPLINK_SIZE DSPLINK_ALIGN(sizeof(HighSpeedBuffer), DSPLINK_BUF_ALIGN)

    DSP_STATUS openPool(DSP_STATUS status) {
        Uint32 numBufs[NUMBUFFERPOOLS] = {NUMBUFS_POOL0, NUMBUFS_POOL1}; // huh?
        Uint32 size[NUMBUFFERPOOLS];
        SMAPOOL_Attrs poolAttrs;

        if(DSP_SUCCEEDED(status)) {
            size[0] = FLAG_IO_DSPLINK_SIZE;
            size[1] = HIGH_SPEED_DSPLINK_SIZE;
            poolAttrs.bufSizes = (Uint32 *) &size;
            poolAttrs.numBuffers = (Uint32 *) &numBufs;
            poolAttrs.numBufPools = NUMBUFFERPOOLS;
            poolAttrs.exactMatchReq = TRUE;

            status = POOL_open(POOL_makePoolId(PROCESSOR_ID, POOL_ID), &poolAttrs);
            checkStatus("POOL_open()", status);
        }

        return status;
    }

    DSP_STATUS createBuffer(DSP_STATUS status, Pvoid *armAddress, Pvoid *dspAddress, Uint32 bufferSize) {
        if(DSP_SUCCEEDED(status)) {
            status = POOL_alloc(POOL_makePoolId(PROCESSOR_ID, POOL_ID),
                                armAddress,
                                bufferSize);
            checkStatus("POOL_alloc()", status);
        }

        if(DSP_SUCCEEDED(status)) {
            status = POOL_translateAddr(POOL_makePoolId(PROCESSOR_ID, POOL_ID),
                                        dspAddress,
                                        AddrType_Dsp,
                                        *armAddress,
                                        AddrType_Usr);
            checkStatus("POOL_translateAddr()", status);
        }

        return status;
    }

    //in the main configure routine the above functions are called like so...
    status = openPool(status);
    checkStatus("openPool()", status);

    status = createBuffer(status, &armHighSpeedAddress, &dspHighSpeedAddress, HIGH_SPEED_DSPLINK_SIZE);
    checkStatus("createBuffer() - High Speed", status);

    status = createBuffer(status, &armFlagIOAddress, &dspFlagIOAddress, FLAG_IO_DSPLINK_SIZE);
    checkStatus("createBuffer() - Command", status);
    
    
    ---
    // In my application framework, the following routine is called.  This is how I get a pointer to pool data to manipulate.
    #define PROCESSOR_ID 0
    #define POOL_ID 0

    #define USE_DSPLINK 1

    #define NUMBUFFERPOOLS 2
    #define NUMBUFS_POOL0 1
    #define NUMBUFS_POOL1 1

    /**
    * Attach to the DSP and map the DSP shared memory section
    * into our process space.
    */
    int Application::setupDspLink() {
    #ifdef USE_DSPLINK
        DSP_STATUS status = PROC_setup(NULL);

        if(DSP_SUCCEEDED(status)) {
            status = PROC_attach(0, NULL);

            if(DSP_SUCCEEDED(status)) {
                debug("controlCom->armFlagIOAddress: %x\n",
                      (unsigned int)controlCom->armFlagIOAddress);
                debug("flagIO dsplink size: %d\n", FLAG_IO_DSPLINK_SIZE);

                Uint32 numBufs[NUMBUFFERPOOLS] = {NUMBUFS_POOL0, NUMBUFS_POOL1};
                Uint32 size[NUMBUFFERPOOLS];

                SMAPOOL_Attrs poolAttrs;

                size[0] = HIGH_SPEED_DSPLINK_SIZE;
                size[1] = FLAG_IO_DSPLINK_SIZE;
                poolAttrs.bufSizes = (Uint32 *) &size;
                poolAttrs.numBuffers = (Uint32 *) &numBufs;
                poolAttrs.numBufPools = NUMBUFFERPOOLS;
                poolAttrs.exactMatchReq = TRUE;

                // poolAttrs is filled in above, but NULL is what gets passed here.
                status = POOL_open(POOL_makePoolId(0, 0), NULL);
                if(DSP_SUCCEEDED(status)) {
                    status = POOL_translateAddr(POOL_makePoolId(0, 0),
                                                (void **)&highSpeedBuffer,
                                                AddrType_Usr,
                                                controlCom->dspHighSpeedAddress,
                                                AddrType_Dsp);
                    if(DSP_FAILED(status)) {
                        perror("POOL_translateAddr(dspHighSpeedBufferAddress)");
                        error("Error calling POOL_translate: 0x%x\n", status);
                    }
                    else {
                        status = POOL_translateAddr(POOL_makePoolId(0, 0),
                                                    (void **)&flagIOBufferAddress,
                                                    AddrType_Usr,
                                                    controlCom->dspFlagIOAddress,
                                                    AddrType_Dsp);
                        if(DSP_FAILED(status)) {
                            perror("POOL_translateAddr(dspFlagIOBufferAddress)");
                            error("Error calling POOL_translate: 0x%x\n", status);
                        }
                    }
                }
                else {
                    perror("POOL_open");
                    error("Error calling POOL_open: 0x%x\n", status);
                }
            }
            else {
                error("Error calling PROC_attach: 0x%x\n", status);
            }
        }
        else {
            error("Error calling PROC_setup: 0x%x\n", status);
        }

        return status;
    #else // Don't use DSPLINK
        // temporary hack to allow programs to run.
        highSpeedBuffer = new HighSpeedBuffer();
        flagIOBufferAddress = new FlagIOBuffer();
        return DSP_SOK;
    #endif // USE_DSPLINK
    }

    Thanks,
    Chris
  • I'm pretty sure that the DSP is creating the pools in l3 cache.  Here's the DSP addresses being reported by DSPLoad:

    dspHighSpeedAddress: 0x80000480

    dspFlagIOAddress: 0x80000000
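
The two addresses are 0x480 (1152) bytes apart, which is a multiple of 128. That would be consistent with the first buffer's size being rounded up to an alignment boundary, as the DSPLINK_ALIGN usage in my defines suggests. Here is a sketch of that kind of round-up; the function name is hypothetical and the 128-byte alignment is an assumption, since I have not checked the actual value of DSPLINK_BUF_ALIGN in the headers.

```c
#include <stddef.h>

/* Round size up to the next multiple of align (align must be non-zero). */
size_t round_up(size_t size, size_t align)
{
    return ((size + align - 1) / align) * align;
}
```

Under that assumption, any FlagIOBuffer size from 1025 to 1152 bytes would round up to 1152 and put the next buffer at offset 0x480.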

    These addresses are mapped using POOL_translateAddr in other programs -- is it possible that the mapping to an ARM address is somehow affecting the cacheability?  I'm grasping at straws....

    I'm almost certain this is somehow related to cache.  If I put HAL_cacheInv and HAL_cacheWb commands in the dsp side of the dsplink code, I can see the data being reflected back on the ARM (Although I can't seem to go the other way, write from the arm to the DSP.)

    Thanks,

    Chris

  • Chris,

    Are you doing similar cache handling on the ARM side?  In the POOL case, there is a POOL_writeback() API to ensure the data in memory is valid for the other cores (DSP) to see.

  • Hi Arnie!

    I did indeed try that -- but it didn't seem to work.  Maybe I should try again with a more simple example.  I should point out that my code does work with a 2.6.29 Kernel (but for various reasons we want to go with a higher version #.)

    Thanks,
    Chris 

  • Ok, here's a recap of what I think I know:

    1) The Pool is being allocated out of the 0x8000 0000 memory region (L3_CBA_RAM, or "shared memory")

    2) If I leave out HAL_cacheInv and HAL_cacheWbInv statements, my program running on the ARM fails to read data from the DSP correctly.  The data is often garbled, or simply not correct.

    3) I DO have POOL_invalidate and POOL_writeBack statements in my program.

    4) If I put in HAL_cacheInv and HAL_cacheWbInv statements in my DSP code, then the program seems to work correctly.

    5) If I look at the MAR 128-159 registers, they are all 0.  This should mean that the cache is supposed to be DISABLED, right?

    6) So why does it seem that the cache is enabled?

    Thanks,
    Chris 

    How are you confirming the MAR bit setting?  As you observed, it still seems that the cache is enabled on the DSP side.  The MAR bits only affect the caching on the DSP side.  Double check that your DSP/BIOS configuration (*.tcf, *.tci) isn't setting the MAR bits.  Is there some system software, besides DSP/BIOS, in your application that may be enabling the MAR bits?

    --Arnie

  • Arnie,

    I'm reading the values for the mar registers from memory on the dsp side and shuttling them over to the ARM for output.  (Using the cache invalidate commands to make sure the values are getting through.)
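
For reference, this is roughly how the register addresses work out. It is a sketch under assumptions: the MAR0 base address of 0x01848000 and the cacheability flag in bit 0 are my reading of the C64x+ documentation and should be checked against the device reference guide, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* ASSUMPTION: MAR0 lives at 0x01848000 in the C64x+ memory map, with one
 * 32-bit register per 16 MB block. Verify against the device documentation. */
#define MAR_BASE_ADDR UINT32_C(0x01848000)

/* Address of MAR register n (n = 128..159 covers 0x80000000-0x9FFFFFFF). */
uint32_t mar_reg_addr(unsigned n)
{
    return MAR_BASE_ADDR + 4u * (uint32_t)n;
}

/* ASSUMPTION: bit 0 of a MAR register decides whether the block may be
 * cached (1 = cacheable, 0 = non-cacheable). */
int mar_is_cacheable(uint32_t mar_value)
{
    return (int)(mar_value & 1u);
}
```

On the DSP the value would be read through a volatile pointer to the computed address; that read is left out here so the sketch stays runnable on a host machine.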

    I am pretty sure that our DSP/BIOS configuration is ok, in fact, it is actively turning OFF the MAR:

    bios.GBL.C64PLUSMAR128to159 = 0x00000000;

    There is a Linux environment running, we're trying to determine what's going on there.  The difference is the kernel, I think.

    Thanks,

    Chris

    Could the problem be on the ARM side?  Is there some cache setting on the ARM that I need to look at?  Where would I find that, and how would I edit it?  I'm running a Linux kernel on the ARM.

    Thanks,

    Chris