ARM memory transfer hanging DSP

Other Parts Discussed in Thread: OMAPL138, OMAP-L138

I am running DSP/BIOS on the DSP and Linux on the ARM on an OMAPL138 (MityDSP board).

The DSP receives data from McBSP1 via EDMA using the SIO_issue()/SIO_reclaim() mechanism from the TI example. After processing, the data is stored in a section of DDR memory that is shared with the ARM. This shared area is not cached.

The ARM periodically copies a chunk of data from the shared memory (also not cached on the ARM side) into an area that is cached (buffer1), then writes it to a buffer (buffer2) in smaller chunks. This last buffer (buffer2) is then written to a SATA drive using fwrite().

Here's the problem: I would like to write larger chunks of data to the SATA drive to be more efficient. However, I find that if buffer2 is 1K or greater the DSP hangs in a SIO_reclaim() call that never returns. This stops the data flow in its tracks. I have checked the McBSP1 and EDMATC1DSP registers on the DSP side and they never change once configured. The mcbspInputTask is blocked pending on a semaphore. The SIO_reclaim() call has a timeout, but execution never reaches the line following the call. No timeout, no return, so no data flow.

On the ARM side I have isolated the code that creates the hung DSP by commenting sections of code. It's not the fwrite() to the SATA drive. It's the reformatting and copying of data from buffer1 to buffer2. If buffer2 is ~70 bytes, all is well. If buffer2 is 1024 bytes the DSP hangs. If no copying is done between buffer1 and buffer2 (just write junk to SATA), all is well and the DSP never hangs (even if buffer2 is 74K).

I can't think of a reason why transferring data from one buffer to another on the ARM would cause the SIO_reclaim() call to hang on the DSP. Any thoughts?

Mary

  • Hi Mary,

    Which SDK are you using with your board, and with which version of DSP/BIOS (5.xx.xx.xx)?

    Where is the buffer that SIO_reclaim() is trying to reclaim coming from? Buffer1? I'm wondering how long it's taking for the ARM to transfer data from buffer1 to buffer2. Also, if buffer2 blocks are really big in comparison to buffer1 blocks, you might be "hanging on to" several empty buffer1 blocks before giving them all up to the DSP.

  • BIOS 5.41.10.36

    Linux SDK is from Critical Link MDK_2012-08-10  (Angstrom version v2012.05 Kernel 3.2.0)

    Perhaps I didn't explain well enough.

    On the DSP side:

    Issue 2 buffers (BufA and BufB, used only by the DSP)

    while(1)

    {

        reclaim a buffer (this will alternate between BufA and BufB)

        process and store results in SharedBuffer (non-cached DDR)

       issue buffer that was reclaimed above

    }

    On the ARM side:

    Periodically copy most recent data from SharedBuffer (non-cached DDR) to buffer1 (cached DDR)

    Reformat as per custom protocol and store to buffer2 (cached DDR)

    write buffer2 to SATA drive (using fwrite)

    Hope that is clearer.

    Mary

  • Mary,

    It is very puzzling that an ARM memory transfer will hang the DSP!  A few questions:

    How is the ARM transfer done?  A CPU copy, or a DMA?  If a DMA, is it possible that it is colliding with DSP-side DMA use?

    Are you doing cache coherency operations (writebacks, invalidates, etc.) when moving the data in/out of the cached regions?

    Can you tell where within SIO_reclaim() the DSP gets “stuck”?

    You said “SIO_reclaim has a timeout, but execution never reaches the line following the call”.  Do you mean you’ve *specified* a timeout for the call, or that you can see the timeout happens?  Can you explain this more?  Also, do you know that system clock ticks are firing on the DSP (which is required for the timeout)?

    Thanks,
    Scott

  • How is the ARM transfer done?  A CPU copy, or a DMA?  If a DMA, is it possible that it is colliding with DSP-side DMA use?

    The ARM transfer is done using a CPU copy, from the buffer called dataIn[][] to the buffer called fileBuf[]; both are in cached DDR memory.  See code below.

    ARM Code:

    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    long Id;

    #define COPYBUF_NUM_SAMPLES 244

    #define NUM_CHANNELS 16

    #define FILEBUF_SIZE ((8 + 1 + (4 * NUM_CHANNELS)) * COPYBUF_NUM_SAMPLES) // (Id, status, channel data) * #samples

    uint8_t fileBuf[FILEBUF_SIZE];

    // Cached buffers
    uint32_t status[COPYBUF_NUM_SAMPLES]; // status in cached memory
    float dataIn[COPYBUF_NUM_SAMPLES][NUM_CHANNELS]; // copy of data in cached memory

    float *pSharedData;   // points to non-cached shared memory written by DSP

    void periodicTask()        // called when the DSP has written 244 samples of 16 channels to shared memory
    {
        // Copy shared data into cached memory
        data_get(&dataIn[0][0], pSharedData, COPYBUF_NUM_SAMPLES, NUM_CHANNELS * sizeof(float));
        writeData((uint8_t *)&dataIn[0][0], status, COPYBUF_NUM_SAMPLES);
    }

    void writeData(uint8_t *pSrc, uint32_t *pSrc2, int n)
    {
        int i, ch;
        int size, samples;
        uint16_t active;
        uint8_t *pBuf = fileBuf;
        uint8_t input;
        ssize_t nBytes;

        if (gFileData == NULL)
            return;
        // if (gFdData <= 0)
        //     return;
        if (start >= NUM_SAMPLES)
            return;

        size = 0;
        samples = 0;
        pBuf = fileBuf;
        for (i=0; i<n; i++)
        {
            Id++;
            samples++;
            memcpy(pBuf, &Id, sizeof(Id));
            pBuf += sizeof(Id);
            size += sizeof(Id);
            *pBuf++ = *pSrc2++ >> 16; // status bit is bit 16
            size++;
            // get selected channels
            active = chActive;
            for (ch=0; ch<NUM_CHANNELS; ch++)
            {
                if (active & 0x0001) // Active?
                {
                    *pBuf++ = *pSrc++;          // write channel data,  If I comment out these 4 lines the DSP doesn't hang
                    *pBuf++ = *pSrc++;
                    *pBuf++ = *pSrc++;
                    *pBuf++ = *pSrc++;
                    size += 4;
                }
                else
                    pSrc += 4; // next channel
                active >>= 1; // shift active channel mask
            } // endfor: ch
            // IF I CALL THE FWRITE() HERE, THE DSP DOESN'T HANG, BUT IT'S INEFFICIENT
        } // endfor: i
        fwrite(fileBuf, 1, size, gFileData);
        //    nBytes = write(gFdData, fileBuf, size);       // NO DIFFERENCE USING write() INSTEAD OF fwrite()
        //    if (nBytes < 0)
        //     error(0, errno, "ARM: write failed: ");
        //    lseek(gFdData, 0, SEEK_END);    // move to end
        FileSize += size;
        numSamplesRec += samples;
    } // end: writeData
    -----------------------------------------------------------------------------------------------------------------------------------------------------
    Are you doing cache coherency operations (writebacks, invalidates, etc.) when moving the data in/out of the cached regions?

    No.  See code above.

    Can you tell where within SIO_reclaim() the DSP gets “stuck”?

    Not really.  Using ROV I can see the task that calls SIO_reclaim() is blocked pending on a semaphore.  The firmware never reaches a breakpoint set on the line after SIO_reclaim().

    You said “SIO_reclaim has a timeout, but execution never reaches the line following the call”.  Do you mean you’ve *specified* a timeout for the call, or that you can see the timeout happens?  Can you explain this more?  Also, do you know that system clock ticks are firing on the DSP (which is required for the timeout)?

    I have specified a timeout.  How would you verify that system clock ticks are firing on the DSP?  Why wouldn't they fire?
    DSP Code:
    --------------------------------------------------------------------------------------------------------------------------------------
    /* issue/reclaim buffers     */
    #define NUM_BUFS 2
    #pragma DATA_ALIGN(bufIn, 128);
    unsigned char bufIn[NUM_BUFS][MAX_BUFSIZE];
    Void mcbspInputTask(Void)
    {
        Ptr rcv     = NULL;
        Int32 * pData = NULL;
        Int32   nmadus1 = 0;
        Int32 samplesReceived = 0;
        Uint32  i       = 0;
        Uint32  j       = 0;
        Uint32 index   = 0;
        Int32 val = 0;
        Int err = 0;
       unsigned int bufSize = 0;
        // Init EDMA
        /* initialize the edma library                                            */
        if (IOM_COMPLETED != edma3init())
        {
            printf( "\n\rEDMA intialization failed");
            return;
        }
        else
        {
            LOG_printf(&trace, "\n\rEDMA intialized");
     
            /* update the edma handle to the channel parameters                   */
            mcbspChanparamIn.edmaHandle = hEdma[0];
        }
        
        /* create the streams required for the transactions */
        mcbspCreateStreamsIn();

        /* prime the driver with buffers  */
        data_NumSamples = 4;
        bufSize = data_NumSamples * NUM_CHANNELS * BYTESPERSAMPLE;
        mcbspDriverPrimeIn(bufSize);

        /* reclaim each packet and resubmit   */
        while(1)
        {
            /* Reclaim FULL buffer from the input stream to be reused             */
            nmadus1 = SIO_reclaim(mcbspInHandle,(Ptr *)&rcv,NULL);
            // SET BREAKPOINT HERE
            if (nmadus1 < 0) // error
            {
                LOG_printf(&trace, "\n\rError reclaiming full buffer from the input stream. \nError = %d  count = %d rcv = 0x%x", nmadus1, count, rcv);
                SIO_idle(mcbspInHandle);
            }
            else
            {
                //  Process data, store results into shared non-cached data (code removed)
                pktCountIn++;
            }// endelse

            /* issue the buffer back to the input stream */
            if (rcv == NULL)
                rcv = (count & 1) ? bufIn[1] : bufIn[0];
            err = SIO_issue(mcbspInHandle, rcv, bufSize, NULL);
            if (IOM_COMPLETED != err)
            {
                LOG_printf(&trace, "\nError %d issuing buffer 0x%x to the stream.", err, rcv);
                SIO_idle(mcbspInHandle);
                err = SIO_issue(mcbspInHandle, rcv, bufSize, NULL);
            }
        } // endwhile: 1
         
    } // end: mcbspInputTask()
    /*
     * \brief   Function to submit the requests to the driver
     *
     * \param   bufSize - size of buffer (bytes) to issue
     *
     * \return  None
     */
    static Void mcbspDriverPrimeIn(unsigned int bufSize)
    {
        Uint32 count = 0;
        
        for (count = 0; count < NUM_BUFS; count++)
        {
            if (IOM_COMPLETED
                != SIO_issue(mcbspInHandle, (Ptr)(bufIn[count]), bufSize, NULL))
            {
                 LOG_printf(&trace, "\n\rIssue to input stream failed.");
                 SYS_abort("Issue to input stream failed\n");
            }
        }
        LOG_printf(&trace, "\n\rIssue requests (count = %d) submitted to Mcbsp Input driver.", count);
    }
    /*
     * \brief    Function to create the required streams for the reception of
     *           Mcbsp data.
     *
     * \params   None
     *
     * \return   None
     */
    static Void mcbspCreateStreamsIn(Void)
    {
        SIO_Attrs      sioAttrs  = SIO_ATTRS;
        sioAttrs.nbufs = NUM_BUFS;
        sioAttrs.align = BUFALIGN;
        sioAttrs.model = SIO_ISSUERECLAIM;
        sioAttrs.timeout = 1250; // in ticks (1ms)
     
        /* create the channel for the RX operation                                */
        mcbspInHandle = SIO_create("/dioMcbspIN", SIO_INPUT, MAX_BUFSIZE, &sioAttrs);
        if (NULL == mcbspInHandle)
        {
            LOG_printf(&trace, "\n\rRX Stream creation Failed\n");
            SYS_abort("Stream Creation Failed\n");
        }
    }

    --------------------------------------------------------------------------------------------------------------------------------------

    Hope this clarifies.

    Mary
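
The per-sample record layout that writeData() builds (Id, one status byte, then 4 bytes for each active channel) can be sketched as a standalone helper with an explicit bounds check. This is only an illustration under stated assumptions: pack_record() and MAX_RECORD_SIZE are hypothetical names, and the Id is stored as a fixed 32-bit value, whereas the posted code uses a long and budgets 8 bytes for it in FILEBUF_SIZE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_CHANNELS 16

/* Bytes needed for one record when every channel is active:
 * 4-byte Id + 1 status byte + 4 bytes per channel (hypothetical layout). */
#define MAX_RECORD_SIZE (4 + 1 + 4 * NUM_CHANNELS)

/*
 * Pack one sample into dst: Id, one status byte, then the raw 4 bytes of
 * each channel whose bit is set in `active` (bit 0 = channel 0).
 * Returns the number of bytes written, or 0 if dst cannot hold a full record.
 */
size_t pack_record(uint8_t *dst, size_t dst_len, uint32_t id,
                   uint32_t status, const float *channels, uint16_t active)
{
    size_t n = 0;
    int ch;

    if (dst_len < MAX_RECORD_SIZE)
        return 0;                        /* refuse to overrun the buffer */

    memcpy(dst + n, &id, sizeof id);     /* Id in native byte order */
    n += sizeof id;

    dst[n++] = (uint8_t)(status >> 16);  /* status flag is bit 16 of the word */

    for (ch = 0; ch < NUM_CHANNELS; ch++, active >>= 1) {
        if (active & 1u) {
            memcpy(dst + n, &channels[ch], sizeof(float));
            n += sizeof(float);
        }
    }
    return n;
}
```

With all 16 channels active and a 4-byte float, one record is 69 bytes, which is close to the "~70 bytes" case described at the start of the thread.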

  • Mary,

    Thanks for the details.  

    You can see if the CLK module is ticking by setting a breakpoint on the symbol “CLK_F_isr” to see if the underlying timer is actually ticking or not.  If you do this does the breakpoint activate?  Or, you can look at the “systemTick” value shown in the KNL view in ROV.

    There are several possible reasons that CLK might not be working, e.g., if the CLK module is not enabled, if the timer is clock-gated or has an improper functional clock frequency, etc.  

    If the timeouts aren’t working this should be easier to resolve, versus a situation where the DSP becomes non-functional during memory access.

    Thanks,
    Scott
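
Besides the breakpoint and ROV checks, the same question ("is the tick counter moving?") can be asked from code. The following is a minimal sketch, not a DSP/BIOS API: ticks_are_advancing() is a hypothetical helper that takes the tick source as a function pointer so it can be compiled anywhere. On DSP/BIOS 5 the source would presumably be a thin wrapper around CLK_getltime(); the two sources below are stand-ins for trying the helper off-target.

```c
#include <stdint.h>

/* A function that returns the current tick count. On the DSP this role
 * could be filled by a wrapper around CLK_getltime() (assumption). */
typedef uint32_t (*tick_source_fn)(void);

/*
 * Poll the tick source up to max_polls times and report whether the count
 * ever moves away from its first reading. Returns 1 if ticks advanced,
 * 0 if the counter appears frozen. It busy-polls on purpose: a sleep-based
 * wait would itself depend on the clock being alive.
 */
int ticks_are_advancing(tick_source_fn now, uint32_t max_polls)
{
    uint32_t first = now();
    uint32_t i;

    for (i = 0; i < max_polls; i++) {
        if (now() != first)
            return 1;
    }
    return 0;
}

/* Stand-in tick sources for off-target experiments (hypothetical). */
static uint32_t fake_ticks;

uint32_t running_source(void) { return fake_ticks++; }  /* advances on every read */
uint32_t frozen_source(void)  { return 42u; }           /* never changes */
```

On target, max_polls would need to be large enough to span at least one tick period, and the call has to run in a context the timer interrupt can preempt (a task or the idle loop, with interrupts enabled).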

  • I find that if I break on CLK_F_isr, it will only break once and then never again and systemTick doesn't change either.  Also, when systemTick is 'frozen' a breakpoint just after the SIO_reclaim() call doesn't activate either.

    Just looking at systemTick with ROV I can see it changing at startup.  Then I put the ARM in "Data Recording" mode, which invokes the code I posted earlier, and it continues to change for a while and then gets stuck.  At this point I cannot break at either CLK_F_isr or after SIO_reclaim().  I also cannot break on the ARM side in my periodic task.  The periodic task blocks on sigwait(), which is pending on an interval timer signal.  Using the DSP debugger, I can then restart just the DSP (leaving the ARM side running) and cause the systemTick to resume changing and the periodic task on the ARM to break.

    Mary

  • Mary,

    Is it possible that both the ARM and the DSP are using the same timer peripheral?  If the ARM uses it to trigger its data recording, maybe the DSP’s SIO_reclaim() is just blocked waiting for ARM-side processing to complete.  The SIO_reclaim() timeout would then never happen, because the ARM side has modified the timer configuration that CLK depends on: CLK stops ticking, so no timeout occurs to allow a return from the SIO_reclaim() call.  Since the ticks work again after you restart the DSP, I wonder if this is what is going on.

    Scott

  • Yes, I had that thought.  I changed the CLK configuration for BIOS to use Timer1 instead of Timer0.  I no longer lose the systemTick or hang.

    Now my problem is keeping up with the data rate.  I think the ARM and DSP are having conflicts over the DDR memory.  The DSP can keep up with the rate until the ARM starts trying to record data.  Which CPU has priority over DDR access?  Or is it first come, first served?

    Mary

  • Mary,

    OK, glad to hear the hang issue is gone!

    There are several ways to change transfer priority levels.  For example, see “11.3 Master Priority Control”, and the DDR controller chapter (15) in the OMAP-L138 Technical Reference Manual: http://www.ti.com/lit/ug/spruh77a/spruh77a.pdf

    If you have detailed questions about configuring device priorities you can post them to the OMAP-L13x processors forum: http://e2e.ti.com/support/dsp/omap_applications_processors/f/42.aspx

    Scott
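
As a rough illustration of what a master priority change involves: the priority registers hold one small field per bus master, and changing one master means a read-modify-write of just that field. The helper below is a hypothetical sketch that assumes 3-bit fields with 0 as the highest priority and 7 as the lowest; the register address and the bit position of each master's field are not shown here and must be taken from the TRM.

```c
#include <stdint.h>

/*
 * Return reg_value with the 3-bit priority field that starts at bit
 * position `shift` replaced by `priority` (assumed: 0 = highest,
 * 7 = lowest). All other fields are left untouched.
 * `shift` must be less than 32.
 */
uint32_t set_master_priority(uint32_t reg_value, unsigned shift,
                             unsigned priority)
{
    uint32_t mask = (uint32_t)0x7u << shift;

    return (reg_value & ~mask) | (((uint32_t)priority & 0x7u) << shift);
}
```

On target, the result would be written back through a volatile pointer to the relevant priority register, for example `*reg = set_master_priority(*reg, shift, 0);` with `reg` and `shift` looked up in the TRM for the master in question.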

    Changing the Master Priority Control to favor the DSP had no effect.  I resolved this last issue by putting the DSP code's .far and .bss sections into IRAM, which the ARM does not use.  Things are running fine now.

    Thanks for the help.

    Mary