EDMA hangup on DSP

Other Parts Discussed in Thread: OMAP-L138, SYSBIOS, OMAPL138

I have a SYSBIOS DSP application that runs perfectly stably (and has for a while) when a bare-metal application runs on the ARM of an OMAP-L138 on custom hardware. The ARM sets up the chip, starts VPIF, and sends CHIPSIG2 interrupts to the DSP on frame completions. The DSP takes care of the rest of the peripherals (UART and SPI) while running the algorithm, which makes heavy use of EDMA and EDMA completion ISRs.

I am trying to move the bare-metal ARM application to SYSBIOS, and I am most of the way through, but now the DSP algorithm no longer runs stably. The UART communication appears intermittent, and the algorithm quickly hangs in EDMA, with a call stack that looks like this:

        edma3ComplHandler(struct unknown *)() at edma3resmgr.c:5,273 0xC078D2B8     
        lisrEdma3ComplHandler0(unsigned int)() at edma3resmgr.c:5,344 0xC07AB54C     
        ti_sysbios_family_c64p_EventCombiner_dispatch__F(unsigned int)() at EventCombiner.c:143 0xC07A23AC     
        ti_sysbios_family_c64p_EventCombiner_dispatch__E(unsigned int)() at st_pe674.c:16,649 0xC07AD240     
        edma3ComplHandler(struct unknown *)() at edma3resmgr.c:5,241 0xC078D1E8     
        main() at main.c:948 0xC0748004     
        _c_int00() at boot.c:173 0xC0000084  (the entry point was reached)    

where it apparently just loops on the EDMA IPR flags, which never get fully cleared. It smells to me like frequent, unhandled interrupts are being fired, but I don't know how to look for that.
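
(For reference, here is my rough sketch of what that handler loop is doing. This is not the LLD source, just the general IPR/ICR pattern; EDMA3CC_IPR and EDMA3CC_ICR are placeholders for the real memory-mapped registers.)

    /* Sketch only -- not the EDMA3 LLD source.  The completion handler reads the
     * EDMA3 CC interrupt-pending register (IPR), services and clears each set bit
     * through the interrupt-clear register (ICR), and repeats until IPR reads
     * zero.  If something keeps re-setting IPR faster than it can be cleared,
     * the handler never leaves this loop. */
    extern volatile unsigned int EDMA3CC_IPR;   /* interrupt pending register (placeholder) */
    extern volatile unsigned int EDMA3CC_ICR;   /* interrupt clear register (placeholder)   */

    void edma_completion_sketch(void)
    {
        while (EDMA3CC_IPR != 0) {
            unsigned int pending = EDMA3CC_IPR;
            unsigned int ch;
            for (ch = 0; ch < 32; ch++) {
                if (pending & (1u << ch)) {
                    /* ...invoke the transfer-completion callback for channel ch... */
                    EDMA3CC_ICR = (1u << ch);   /* clear that pending bit */
                }
            }
        }
    }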

Now, again, this code works perfectly fine when the bare-metal ARM code is used to bring up the system. I have verified that I have EXACTLY the same settings for SYSCFG0, SYSCFG1, PSC0, and PSC1 between the two setups, but there MUST be something else going on.

I recognize that this is not enough information to solve my problem:  what I am looking for is suggestions and pointers on how to proceed, what debugging techniques I should apply, etc.

   Thanks,

      Jay

  • Jay,

    Which version of SYSBIOS are you using?

    Am I correct in understanding that the DSP code is unchanged from its working version? Only the ARM code is different?

    Which platform are you building the ARM application with? Is there any chance that the memory regions overlap with the DSP in some fatal way?

    Alan

  • Alan,

    I am using SYSBIOS 6.33.03.33, PSP 3.0.01, XDC 3.23.01.43, and EDMA LLD 2.11.04.01 on both the DSP and the ARM (although I had to rebuild the EDMA LLD for the ARM, since it doesn't come pre-built, and the PSP won't build at all for the ARM as far as I can tell). I am using CCS 5.1.1.00031.

    The DSP code is unchanged from its working version; all I change is whether the ARM code that brings it up, captures images via VPIF, and sends CHIPSIG2 interrupts to the DSP on completion is my working bare-metal version or the new SYSBIOS port. The SYSBIOS code that does all of that (DSP bringup, VPIF capture, raising CHIPSIG2) is ported from the original bare-metal version, using the CSL header files from the PSP for register locations and contents instead of more "home grown" header files.

    I am building the ARM application with a custom platform derived from the OMAPL138 platform (as I did on the DSP).  This was primarily to change the system clock to 405MHz and muck with the memory map.

    I don't think I have overlapping memory regions.

    The "MEMORY CONFIGURATION" from the .map file for the DSP is

                          name            origin    length      used     unused   attr    fill
    ----------------------  --------  ---------  --------  --------  ----  --------
      IROM                  11700000   00100000  00000000  00100000  R  X
      IRAM                  11800000   00020000  0001fffc  00000004  RW X
      L1PSRAM               11e00000   00008000  00000000  00008000  RW X
      L1DSRAM               11f00000   00008000  00000000  00008000  RW 
      ALGORITHM             80011000   0000ec00  0000afc8  00003c38  RW X
      DSP_BOOT              c0000000   00001000  000000c0  00000f40  RW X
      DDR_DSP               c0001000   02fff000  007caa0e  028345f2  RW X
      DDR_ARM               c3000000   01000000  00000000  01000000  RW X
      DDR_SHARED            c4000000   03000000  00000000  03000000  RW X
      DSP_LOG               c7000000   01000000  00010006  00fefffa  RW X

    and the MEMORY CONFIGURATION part of my ARM application map file is,

              name            origin    length      used     unused   attr    fill
    ----------------------  --------  ---------  --------  --------  ----  --------
      IROM                  11700000   00100000  00000000  00100000  R  X
      L3_ARM                80000000   00011000  00000000  00011000  RW X
      ALGORITHM             80011000   0000ec00  00000000  0000ec00  RW X
      SHAREDVARS            8001fc00   00000400  00000000  00000400  RW X
      DDR_ARM               c3000000   01000000  0003f3ec  00fc0c14  RW X
      DDR_SHARED            c4000000   03000000  025d0000  00a30000  RW X
      INTVECS               ffff0000   00001000  00000368  00000c98  RW X
      ARMRAM                ffff1000   00001000  00001000  00000000  RW X

    I don't see any conflict between these two memory maps.

    BTW, the memory configuration that works for the bare metal ARM application is,

             name            origin    length      used     unused   attr    fill
    ----------------------  --------  ---------  --------  --------  ----  --------
      SHAREDRAM             80000000   00011000  00010bef  00000411  RWIX
      DSP_EXT_RAM           c0000000   03000000  00000000  03000000  RWIX
      ARM_EXT_RAM           c3000000   01000000  00000000  01000000  RWIX
      SHARED_EXT_RAM        c4000000   03000000  025d0000  00a30000  RWIX
      INTVECS               ffff0000   00000030  00000030  00000000  RWIX
      ARMRAM                ffff0030   00001fd0  00001800  000007d0  RWIX

    You might note that one significant difference between the two ARM memory maps is that the SYSBIOS one uses a GEL file to initialize DDR, so the ARM code and data can live in DDR, while the bare-metal setup uses no GEL file, so all of its code and data are in on-chip memory.

    My original application does some EDMA with a peripheral at startup, and then brings up the DSP.   I implemented this for the SYSBIOS application, but I have it commented out for now and the "poor" behavior on the DSP still happens, so as far as I can tell, the EDMA registers are completely untouched before the DSP starts up.

    Again, it still seems like somehow, someway, there are unhandled interrupts being blasted at the system, but I need some guidance on how to diagnose that.

    Thanks,

       Jay

  • One thing to note is that by default SYS/BIOS sets up the ARM's MMU and enables both program and data caching for all memory regions defined in the platform.

    You will probably need to override this default behavior for memory shared between the two cores.

    To test whether this is even part of the problem, you can disable caching by adding the following to your config script:

    var Cache = xdc.useModule('ti.sysbios.family.arm.arm9.Cache');

    Cache.enableCache = false;

    If this turns out to resolve the issue, you can selectively override the MMU table entries for specific memory regions by following the example in the CDOC for the ti.sysbios.family.arm.arm9.Mmu module.

    Alan

  • Turning the cache on and off didn't seem to help anything.

     I've also disabled the MMU by including,

    var Mmu = xdc.useModule('ti.sysbios.family.arm.arm9.Mmu');

    Mmu.enableMMU = false;

    and that way, for now, I don't have to muck with the specific memory regions I need for setting up peripherals. This is the same mode (I believe) that my bare-metal code is in.

     

     

  • What is the ARM code doing when the DSP code gets bogged down in the EDMA interrupt handler?

    What other interrupts do you have defined for the DSP? You can see these by examining the Hwi module's ROV 'basic' or 'detailed' views with CCS.

    All non-configured interrupts (i.e., those not created using the Hwi module) will vector to a function that loops on itself forever if they accidentally get enabled and go off.

    Since you're not seeing this behavior, I don't think the problem is due to an "unhandled" interrupt.

    Alan

    The ARM code is in a loop:

    1. Wait on a semaphore released in a VPIF completion ISR
    2. Raise a CHIPSIG2 to wake up the DSP
    3. Do some polling-based I2C reads/writes

    and that is it.
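
    In code, that loop is roughly the following (a sketch only: the SYSCFG0 CHIPSIG address is taken from the OMAP-L138 TRM and should be checked against your CSL headers, and frame_sem is a placeholder name for the semaphore posted by the VPIF completion ISR):

    /* ARM-side loop sketch (SYS/BIOS).  Writing bit 2 of the SYSCFG0 CHIPSIG
     * register raises CHIPSIG2 to the DSP.  If SYSCFG writes are KICK-protected
     * on your silicon revision, unlock KICK0R/KICK1R first. */
    #include <xdc/std.h>
    #include <ti/sysbios/BIOS.h>
    #include <ti/sysbios/knl/Semaphore.h>

    #define CHIPSIG      (*(volatile unsigned int *)0x01C14174)  /* SYSCFG0 CHIPSIG (verify) */
    #define CHIPSIG2_BIT (1u << 2)

    extern Semaphore_Handle frame_sem;    /* posted by the VPIF completion ISR */

    Void arm_frame_loop(UArg a0, UArg a1)
    {
        for (;;) {
            Semaphore_pend(frame_sem, BIOS_WAIT_FOREVER);  /* 1. wait for a frame */
            CHIPSIG = CHIPSIG2_BIT;                        /* 2. wake the DSP     */
            /* 3. ...polling-based I2C reads/writes...                            */
        }
    }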

    The DSP is waiting on the CHIPSIG2 interrupt (semaphored in an Hwi ISR), running the algorithm, managing UART communications (using a custom EDMA LLD based driver) and managing SPI communications (using the PSP driver in interrupt mode).
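
    On the DSP side, the handshake is essentially this (frame_sem and algorithm_task are placeholder names; image_acq_isr is my CHIPSIG2 ISR):

    /* DSP-side sketch: the CHIPSIG2 Hwi ISR only clears the signal and posts a
     * semaphore; the algorithm task pends on it and does the real work. */
    #include <xdc/std.h>
    #include <ti/sysbios/BIOS.h>
    #include <ti/sysbios/knl/Semaphore.h>

    extern Semaphore_Handle frame_sem;

    Void image_acq_isr(UArg arg)
    {
        /* ...clear CHIPSIG2 via the SYSCFG0 CHIPSIG_CLR register... */
        Semaphore_post(frame_sem);                   /* wake the algorithm task */
    }

    Void algorithm_task(UArg a0, UArg a1)
    {
        for (;;) {
            Semaphore_pend(frame_sem, BIOS_WAIT_FOREVER);
            /* ...run the algorithm, manage UART (EDMA driver) and SPI traffic... */
        }
    }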

    The ROV reports these interrupts,

    ,0xc0a17948,,,7,ti_sysbios_family_c64p_EventCombiner_dispatch__E,0x00000000,0x00000000,0,0x80,0x80
    ,0xc0a17960,,,8,ti_sysbios_family_c64p_EventCombiner_dispatch__E,0x00000001,0xc09d07f0,1,0x100,0x100
    ,0xc0a17978,,,9,ti_sysbios_family_c64p_EventCombiner_dispatch__E,0x00000002,0x00000000,2,0x200,0x200
    ,0xc0a17990,,,10,ti_sysbios_family_c64p_EventCombiner_dispatch__E,0x00000003,0x00000000,3,0x400,0x400
    ,0xc0a179a8,0xc0a18050,,14,ti_sysbios_knl_Clock_doTick__I,0x00000000,0xc09f5a56,4,0x4000,0x4000
    ,0xc07b1360,0xc07b1350,,4,image_acq_isr,0x00000000,0xc09d7f92,5,0x10,0x10
    ,0xc07f2660,0xc07f2650,,5,viop_gpio_isr,0x00000000,0xc09f835a,75,0x20,0x20

    I'm a little suspicious that the GPIO Bank 8 interrupt is not being handled correctly, but it should have been "not handled correctly" in the case where the ARM was bare metal as well.

     

  • When you say "semaphored in an Hwi ISR" does this mean you have an algorithm task blocked on a semaphore that is posted by the CHIPSIG2 interrupt (which is sent by the ARM)?

    Can you instrument the viop_gpio_isr() function to determine if it is being invoked excessively?
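
    For example, something as simple as a counter you can watch from an Expressions window (the counter name is only illustrative):

    /* Minimal instrumentation: count ISR entries and watch the variable from CCS
     * (or compare it against Clock ticks) to see how often the ISR really fires. */
    #include <xdc/std.h>

    volatile UInt32 viopGpioIsrCount = 0;

    Void viop_gpio_isr(UArg arg)
    {
        viopGpioIsrCount++;
        /* ...existing handler body... */
    }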

    Which clock source is being routed to the timers by the ARM code? Is your DSP code configured for a non-default timer clock frequency (i.e., is it expecting the timers to be clocked at 32 kHz)? This might explain the source of excessive interrupts. When you halt the DSP, does the Clock module's ROV view show a 'ticks' value that seems reasonable for the length of time the DSP was running (assuming 1000 ticks per second)?

    Is the GEL file configuring all the PLLs the same way as your bare metal application?

    Alan

    I was using "semaphored" as shorthand for exactly that: the ARM posts a CHIPSIG2, the DSP ISR releases a semaphore that a DSP thread is waiting on, and we proceed to process the image.

    It did not appear that the viop_gpio_isr routine is being called too many times.

    The clock source is an interesting thing: when you bring SYSBIOS up on the ARM, it clears the upper bits of the SYSCFG0->SUSPSRC register. In my bare-metal code, the SUSPSRC register is all 1s. I tried the "right" method of changing SUSPSRC (using Timer.timerSettings[0].ownerCoreId = 1), but it didn't affect anything. In desperation, I long ago just reached into the SYSCFG registers and jammed the values to all 1's so that SYSCFG0 matches perfectly between my bare-metal and SYSBIOS setups. It did not seem to matter one way or the other.

    The ROV Clock ticks values seem to be going up by what I expect (1000 ticks per second).  

    Also, I know that the GEL file is configuring the PLLs in the same way.

    Now I am at a crossroads. I changed the order of the initialization on the DSP and things work. In my application, the ARM starts up, boots the DSP, and then, without any coordination, goes on to start digitizing and blasting the DSP with CHIPSIG2 interrupts. My theory was that if I set up the CHIPSIG2 Hwi handler earlier in my DSP boot process, life would get better. And it did: now the DSP comes up and works, acquiring CHIPSIG2, running the algorithm, and working with the UART and SPI appropriately. But I am worried that this is just accidental, as it seemed to work "before." That tells me I have some kind of subtle timing situation happening.
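
    Concretely, the reordering amounts to installing the CHIPSIG2 Hwi at the very top of main(), before any of the driver bring-up, so the window in which the ARM can raise CHIPSIG2 with no handler installed is as small as possible. A sketch (interrupt 4 / event 5 are the numbers from the ROV dump above; error handling omitted):

    /* Reordered DSP bring-up sketch: install the CHIPSIG2 handler first. */
    #include <xdc/std.h>
    #include <ti/sysbios/BIOS.h>
    #include <ti/sysbios/hal/Hwi.h>

    extern Void image_acq_isr(UArg arg);

    Int main(Void)
    {
        Hwi_Params hp;

        Hwi_Params_init(&hp);
        hp.eventId = 5;                            /* CHIPINT2 event to the DSP */
        Hwi_create(4, image_acq_isr, &hp, NULL);   /* install the handler early */

        /* ...then EDMA, UART, SPI, task creation, etc...                       */

        BIOS_start();
        return 0;
    }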

    So, do I try to break it again to find the real source of the problem, or just stay happy that it works and move on? For all I know, I changed something else in my other work over the last few days and "fixed" it that way.

    Sigh.

  • As an engineer, I must advise you to figure out precisely what the problem was and how it got fixed.

    As a project manager, I'd probably advise you to 'move on' and come back to this after you've got things more or less working,  time permitting...

    Alan