Speed problem when TC interrupt enabled for EDMA3



I am having a problem when I enable the transfer completion interrupt in my ping-pong DMA code. With the interrupt enabled the code works, but other threads are starved for time. I started with the code from edma3_lld_01_10_00_01\examples\edma3_driver\evm6747, then modified dma_ping_pong_test so that it is event triggered instead of manually triggered (see attached zip). I'm using GPIO4_3 to trigger the events. This works, but if I feed in a GPIO signal faster than about 15 kHz as the EDMA trigger, everything bogs right down. For example, I don't see any Heartbeat text or any update of my CPU Load Graph in CC3. If I disable the TC interrupt, I can run the GPIO signal much faster, e.g. >100 kHz. My ultimate goal is to use the EDMA to transfer ADC data from a FIFO on the EMIF into ping-pong buffers. I'm expecting the FIFOs to half-fill at a rate of about 40 kHz. I will then alternate processing between ping and pong. I am running on the SpectrumDigital C6747 EVM.

Does anyone have any idea why the TC interrupt would slow things down so much?  Is this normal?  Is there a reliable way to test for completion of the transfers without using the interrupt?

Thanks,

Lopi

 

TriggeredDMAevm6747version2.zip
  • Since your GPIO can run at 15 kHz with the interrupt, the time per event is about 66 us. Since the GPIO can run at 100 kHz without the interrupt, the time per transfer is 10 us. This suggests that you are spending about 56 us in your complete interrupt process.
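
The arithmetic behind that estimate can be written out as a small sketch. The function names are made up for illustration, and the two rates are simply the ones reported in the question; the result is only as good as the assumption that each rate is the highest one the system can sustain.

```c
#include <stdio.h>

/* Period of one event in microseconds for a trigger rate given in hertz. */
double period_us(double rate_hz)
{
    return 1.0e6 / rate_hz;
}

/* Extra time per event attributable to the completion interrupt: the
 * difference between the shortest sustainable period with the interrupt
 * and the shortest sustainable period without it. */
double interrupt_cost_us(double max_rate_with_irq_hz,
                         double max_rate_without_irq_hz)
{
    return period_us(max_rate_with_irq_hz) - period_us(max_rate_without_irq_hz);
}

/* Prints the estimate for the rates reported in this thread. */
void print_estimate(void)
{
    printf("cost per interrupt: %.1f us\n",
           interrupt_cost_us(15.0e3, 100.0e3));
}
```

Because the 100 kHz figure was only the highest rate tried, not a measured limit, the true cost per interrupt may be somewhat higher than this estimate.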

    • What clock frequency are you running the DSP at?
    • Are you using DSP/BIOS?
    • What are ALL the steps that the interrupt goes through to get to the ISR? HWI_Dispatcher? EDMA dispatcher? Rts interrupt wrapper?
    • What do you do in the ISR?

    A good test I would like you to try is to connect your GPIO to the TC ISR rather than to the EDMA event. Measure the frequency you can run this interrupt and compare that to the 56us when adding the TC to cause the same interrupt.

  • What clock frequency are you running the DSP at?  I'm presuming it's 300MHz.  I'm using the GEL file from the SpectrumDigital site.  It sets up the PLL.  What register would I need to look at to confirm this?

    Are you using DSP/BIOS?  Yes

    What are ALL the steps that the interrupt goes through to get to the ISR? HWI_Dispatcher? EDMA dispatcher? Rts interrupt wrapper? I'm using the EDMA3 LLD driver.  It seems to make use of the ECM_Dispatcher.  When I break my code it is often in the routine "edma3ComplHandler" from "edma3resmgr.c".  Sorry I haven't figured out enough of the profiling tools or EDMA driver code to say more.

    What do you do in the ISR? Very little - it's the callback1 function from the edma3 examples. It just tests whether the status is EDMA3_RM_XFER_COMPLETE, sets the irqRaised1 flag, and returns.

    A good test I would like you to try is to connect your GPIO to the TC ISR rather than to the EDMA event.  - I haven't had a chance to try this yet.

    I'm also tempted to try just running the EDMA without using the driver.  Thanks for the help.

    Lopi.

  • Lopi said:

    I'm using the EDMA3 LLD driver.  It seems to make use of the ECM_Dispatcher.  When I break my code it is often in the routine "edma3ComplHandler" from "edma3resmgr.c".  Sorry I haven't figured out enough of the profiling tools or EDMA driver code to say more.

    Is the LLD specifically for the C6747 or did it come from another device? The reason I ask is that the C6747 does not implement the aaaH EDMA3 registers since it only has 32 channels. If the high-word registers were being accessed, this could cause a delay or mistaken results.

    Is there a reason that you are using the ECM_Dispatcher? This should only be invoked if you set your HWI_INTn interrupt selection number to 0-3; any higher number would just go straight to the ISR rather than through the ECM module. I assume you are using EDMA Region 1, since that appears to be the only one for which you can get a completion interrupt to the DSP. If you set the HWI_INTn interrupt selection number to 8 for EDMA3 CC0 Region 1 interrupt, the ECM module should not be used, but edma3ComplHandler will still be used, as it currently is.

    How are you observing that your test program will not work above 15KHz? I understand that you are toggling the GPIO at that rate and that if you increase the toggle frequency above that it does not work. So how do you observe that it does not work? Does the program crash? Are you watching some external bus activity and see that it will not go any faster?

    If your test program is just setting up the EDMA and then going into a while(1) loop forever, then you could measure the time taken in the ISR as follows (not tested, so forgive any typos)

    #include <c6x.h> // required for TSCL/TSCH register symbols

    void test_task(void)
    {
        Uint64 u64MaxDelay = 0;
        Uint64 u64LastTime;
        Uint64 u64ThisTime;

        /* ... set up the EDMA here so everything is ready ... */

        TSCL = 0;  // start the TSC timer

        u64LastTime  = TSCL;
        u64LastTime |= ((Uint64)TSCH) << 32;
        while (1)
        {
            u64ThisTime  = TSCL;
            u64ThisTime |= ((Uint64)TSCH) << 32;
            if ( u64MaxDelay < ( u64ThisTime-u64LastTime ) )
                u64MaxDelay = u64ThisTime-u64LastTime;
            u64LastTime = u64ThisTime;
        }
    }

    This will confirm the amount of time you spend getting interrupted, but it will also include DSP/BIOS things like timer ticks, if any. If you get a max number and want to learn more about it, you could make a histogram array and count the number of times you are in different delay ranges. Just a thought. Please note that if you decide to use TSC in an interrupt routine, it will mess up the TSCH read here, so you would need to protect the TSCL/H reads from an interrupt happening between them.
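
The histogram idea mentioned above could look something like the following sketch. It is plain C with `uint64_t` standing in for `Uint64`, and the type and function names (`DelayHistogram`, `delay_bin`, `delay_histogram_add`) are invented for this example. Bins are powers of two so that a single array covers everything from a few cycles to very long stalls.

```c
#include <stdint.h>

#define DELAY_BINS 32u

/* One counter per power-of-two range of cycle counts:
 * bin 0 holds deltas of 0 or 1 cycle, bin n holds 2^n .. 2^(n+1)-1,
 * and the last bin also collects anything larger. */
typedef struct {
    uint32_t count[DELAY_BINS];
} DelayHistogram;

/* Position of the highest set bit, clamped to the last bin. */
unsigned delay_bin(uint64_t delta_cycles)
{
    unsigned bin = 0u;
    while (delta_cycles > 1u && bin < DELAY_BINS - 1u) {
        delta_cycles >>= 1;
        bin++;
    }
    return bin;
}

/* Record one measured gap between two successive timestamp reads. */
void delay_histogram_add(DelayHistogram *h, uint64_t delta_cycles)
{
    h->count[delay_bin(delta_cycles)]++;
}
```

In the loop above, the call would replace (or sit next to) the max-delay update, for example `delay_histogram_add(&hist, u64ThisTime - u64LastTime);` with `hist` zero-initialised before the loop starts.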

  • Is the LLD specifically for the C6747 or did it come from another device? The reason I ask is that the C6747 does not implement the aaaH EDMA3 registers since it only has 32 channels. If the high-word registers were being accessed, this could cause a delay or mistaken results. 

    I took the code from edma3_lld_01_10_00_01\examples\edma3_driver\evm6747.  However all of the processors listed  in the examples\edma3_driver directory (dsk6455...) use the same src directory.  It appears that device-specific info comes from the file bios_edma3_drv_sample_c6747_cfg.c.  There's definitely references to the high-word registers in the code with no special precautions as far as I can tell for the c6747, but it would take some major changes to the driver and resource manager to get rid of this.

    Is there a reason that you are using the ECM_Dispatcher? This should only be invoked if you set your HWI_INTn interrupt selection number to 0-3; any higher number would just go straight to the ISR rather than through the ECM module.

    The example code uses ECM_Dispatcher.  In bios_edma3_drv_sample_init it calls the function registerEdma3Interrupts.  I suppose I could change the driver code and recompile the libraries but I'll need ECM eventually in my project.  Although, to be honest I find the ECM pretty confusing.

    How are you observing that your test program will not work above 15KHz? I understand that you are toggling the GPIO at that rate and that if you increase the toggle frequency above that it does not work. So how do you observe that it does not work? Does the program crash? Are you watching some external bus activity and see that it will not go any faster?

    I am observing that my Heartbeat task (see below) doesn't update in CC3 when I toggle the GPIO faster than 15kHz.  If I slow it down then I start getting text again.  The program never crashes it just seems that the EDMA hogs the CPU.  I had observed similar symptoms in my main project where I'm reading data via the Jungo USB msd using a custom application on my PC (ie when I run the EDMA test code then my USB task doesn't get serviced) . 

    void tskHeartBit()
        {
        unsigned int counter = 0u;

        while (counter < 0x1000000u)
            {
            printf("\r\n\r\n!!! EDMA3 LLD HrtBt %x", counter);
            counter++;
            TSK_sleep(1000);
            }
        }

    I wonder if you could take a look at the code where I set up the EDMA channels? This is where I've made the most changes compared to the example code, so I'm thinking it is the most likely place for an error:

    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

    EDMA3_DRV_Result edma3_test_ping_pong_mode(EDMA3_DRV_Handle hEdma)
        {
        EDMA3_DRV_Result result = EDMA3_DRV_SOK;
        EDMA3_DRV_PaRAMRegs paramSet = {0,0,0,0,0,0,0,0,0,0,0,0};
        /* One master channel */
        unsigned int chId = 0;
        /* Two link channels */
        unsigned int lChId1 = 0;
        unsigned int lChId2 = 0;
        unsigned int tcc = 0;
        int i;
        unsigned int count;
        unsigned int Istestpassed = 0u;
        unsigned int BRCnt = 0;
        int srcbidx = 0, desbidx = 0;
        int srccidx = 0, descidx = 0;
        /* PaRAM Set handle */
        unsigned int phyaddress = 0;
        EDMA3_DRV_ParamentryRegs *param_handle = NULL;
        /* Number of triggers for EDMA3. */
        unsigned int numenabled = PING_PONG_NUM_TRIGGERS;

        pingpongSrcBuf = (signed char*)_pingpongSrcBuf;
        pingpongDestBuf = (signed char*)_pingpongDestBuf;
        pingpongSrcBufCopy = pingpongSrcBuf;
        pingpongDestBufCopy = pingpongDestBuf;
        dstL1DBuff1 = (signed char*)_dstL1DBuff1;
        dstL1DBuff2 = (signed char*)_dstL1DBuff2;


        /* Initalize source buffer for PING_PONG_DDR_BUFFER_SIZE bytes of data */
        for (count = 0u; count < PING_PONG_DDR_BUFFER_SIZE; count++)
            {
            pingpongSrcBuf[count] = (count % 0xFF);
            }


    #ifdef EDMA3_ENABLE_DCACHE
        /*
        * Note: These functions are required if the buffer is in DDR.
        * For other cases, where buffer is NOT in DDR, user
        * may or may not require the below functions.
        */
        /* Flush the Source Buffer */
        if (result == EDMA3_DRV_SOK)
            {
            result = Edma3_CacheFlush((unsigned int)pingpongSrcBuf, PING_PONG_DDR_BUFFER_SIZE);
            }

        /* Invalidate the Destination Buffers */
        if (result == EDMA3_DRV_SOK)
            {
            result = Edma3_CacheInvalidate((unsigned int)pingpongDestBuf, PING_PONG_DDR_BUFFER_SIZE);
            }

        /**
         * Since the ping/pong buffers are in IRAM, there is no need of invalidating
         * them. If they are in DDR, invalidate them.
         */

        /*
        if (result == EDMA3_DRV_SOK)
            {
            result = Edma3_CacheInvalidate((unsigned int)dstL1DBuff1, PING_PONG_L1D_BUFFER_SIZE);
            }
        if (result == EDMA3_DRV_SOK)
            {
            result = Edma3_CacheInvalidate((unsigned int)dstL1DBuff2, PING_PONG_L1D_BUFFER_SIZE);
            }
        */
    #endif  /* EDMA3_ENABLE_DCACHE */


        /* Set B count reload as B count. */
        BRCnt = PING_PONG_BCNT;

        /* Setting up the SRC/DES Index */
        srcbidx = (int)PING_PONG_ACNT;
        desbidx = (int)PING_PONG_ACNT;

        /* AB Sync Transfer Mode */
        srccidx = ((int)PING_PONG_ACNT * (int)PING_PONG_BCNT);
        descidx = ((int)PING_PONG_ACNT * (int)PING_PONG_BCNT);

        /* Setup for DMA Channel 1*/
        tcc = EDMA3_DRV_TCC_ANY;
    #ifdef GPIOTRIGGERED
        chId = CSL_EDMA3_CHA_GPIO_BNKINT4; 
    #else
        chId = EDMA3_DRV_DMA_CHANNEL_ANY;
    #endif

        /* Request any DMA channel and any TCC */
        if (result == EDMA3_DRV_SOK)
            {
            result = EDMA3_DRV_requestChannel (hEdma, &chId, &tcc,
                                                (EDMA3_RM_EventQueue)0,
                                                &callback1, NULL);
            }

        /* If successful, allocate the two link channels. */
        if (result == EDMA3_DRV_SOK)
            {
            lChId1 = EDMA3_DRV_LINK_CHANNEL;
            lChId2 = EDMA3_DRV_LINK_CHANNEL;

            result = (
                        (EDMA3_DRV_requestChannel (hEdma, &lChId1, NULL,
                                                (EDMA3_RM_EventQueue)0,
                                                &callback1, NULL))
                        ||
                        (EDMA3_DRV_requestChannel (hEdma, &lChId2, NULL,
                                                (EDMA3_RM_EventQueue)0,
                                                &callback1, NULL))
                        );
            }


        /**
         * Fill the PaRAM Sets associated with all these channels with transfer
         * specific information.
         */
        if (result == EDMA3_DRV_SOK)
            {
            paramSet.srcBIdx    = srcbidx;
            paramSet.destBIdx   = desbidx;
            paramSet.srcCIdx    = srccidx;
            paramSet.destCIdx   = descidx;

            paramSet.aCnt       = PING_PONG_ACNT;
            paramSet.bCnt       = PING_PONG_BCNT;
            paramSet.cCnt       = PING_PONG_CCNT;

            /* For AB-synchronized transfers, BCNTRLD is not used. */
            paramSet.bCntReload = BRCnt;

            /* Src & Dest are in INCR modes */
            paramSet.opt &= 0xFFFFFFFCu;
            /* Program the TCC */
            paramSet.opt |= ((tcc << OPT_TCC_SHIFT) & OPT_TCC_MASK);

            /* Enable the final transfer completion interrupt only;
             * the intermediate completion interrupt is not wanted. */
            //paramSet.opt |= (1 << OPT_ITCINTEN_SHIFT);
            paramSet.opt |= (1 << OPT_TCINTEN_SHIFT);

            /* AB Sync Transfer Mode */
            paramSet.opt |= (1 << OPT_SYNCDIM_SHIFT);


            /* Program the source and dest addresses for master DMA channel */
            paramSet.srcAddr    = (unsigned int)(pingpongSrcBuf);
            paramSet.destAddr   = (unsigned int)(dstL1DBuff1);


            /* Write to the master DMA channel first. */
            result = EDMA3_DRV_setPaRAM(hEdma, chId, &paramSet);
            }


        /* If write is successful, write the same thing to first link channel. */
        if (result == EDMA3_DRV_SOK)
            {
            result = EDMA3_DRV_setPaRAM(hEdma, lChId1, &paramSet);
            }


        /**
         * Now modify the dest addresses and write the param set to the
         * second link channel.
         */
        if (result == EDMA3_DRV_SOK)
            {
            paramSet.destAddr   = (unsigned int)(dstL1DBuff2);

            result = EDMA3_DRV_setPaRAM(hEdma, lChId2, &paramSet);
            }

     

        /**
         * Do the linking now.
         * Master DMA channel is linked to IInd Link channel.
         * IInd Link channel is linked to Ist Link channel.
         * Ist Link channel is again linked to IInd Link channel.
         */
        if (result == EDMA3_DRV_SOK)
            {
            result = (
                        (EDMA3_DRV_linkChannel (hEdma, chId, lChId2))
                        ||
                        (EDMA3_DRV_linkChannel (hEdma, lChId2, lChId1))
                        ||
                        (EDMA3_DRV_linkChannel (hEdma, lChId1, lChId2))
                        );
            }

        /**
         * Save the handle to the master dma channel param set.
         * It will be used later to modify the source address quickly.
         */
        if (result == EDMA3_DRV_SOK)
            {
            result = EDMA3_DRV_getPaRAMPhyAddr(hEdma, chId, &phyaddress);
            }

        /*
        - Algorithm used in the ping pong copy:
        1. Application starts EDMA of first image stripe into ping buffer in L1D.
        2. Application waits for ping EDMA to finish.
        3. Application starts EDMA of next image stripe into pong buffer in L1D.
        4. Application starts processing ping buffer.
        5. Application waits for pong EDMA to finish.
        6. Application starts EDMA of next image stripe into ping buffer in L1D.
        7. Application starts processing pong buffer.
        8. Repeat from step 3, until image exhausted.
        - EDMA re-programming should be minimized to reduce overhead (PaRAM
            accesses via slow config bus), i.e. use 2 reload PaRAM entries, and
            only change src address fields.
        */

        if (result == EDMA3_DRV_SOK)
            {
            /* Param address successfully fetched. */
            param_handle = (EDMA3_DRV_ParamentryRegs *)phyaddress;

            /* Step 1 */
    #ifdef GPIOTRIGGERED
            result = EDMA3_DRV_enableTransfer (hEdma, chId,
                                                EDMA3_DRV_TRIG_MODE_EVENT);
    #else
            result = EDMA3_DRV_enableTransfer (hEdma, chId,
                                                EDMA3_DRV_TRIG_MODE_MANUAL);
    #endif
            /**
             * Every time a transfer is triggered, numenabled is decremented.
             */
            numenabled--;

            /**
             * Every time a transfer is triggered, pingpongSrcBufCopy is
             * incremented to point it to correct source address.
             */
            pingpongSrcBufCopy += PING_PONG_L1D_BUFFER_SIZE;
            }

    #ifdef GPIOTRIGGERED
    #ifdef EDMA3_DRV_DEBUG
        EDMA3_DRV_PRINTF("edma3_test_ping_pong_mode gpio triggered started\r\n");
    #endif
        /* Return without checking the status of irqRaised1. The source address
         * doesn't get incremented and process_ping_pong_buffer doesn't get
         * called either. */
        return result;
    #endif
    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

    Thanks,

    Lopi

    PS I included my whole project as an attachment on my first post.

     

     

  • Lopi said:
    I suppose I could change the driver code and recompile the libraries but I'll need ECM eventually in my project.  Although, to be honest I find the ECM pretty confusing.

    It should be easier than that to change one of the HWIs to be dedicated to your EDMA completion interrupt, but the ECM will only add a small amount of extra overhead. We tried to explain how it works in the "C64x+ Megamodule Features" training video for the C6474 in the Training section of TI.com. You can find the complete video set at http://focus.ti.com/docs/training/catalog/events/event.jhtml?sku=OLT110002 . The INTC and ECM function basically the same in the C674x, with any differences explained in the C6747 datasheet or User's Guides.

    Lopi said:
    I am observing that my Heartbeat task (see below) doesn't update in CC3 when I toggle the GPIO faster than 15kHz.  If I slow it down then I start getting text again.  The program never crashes it just seems that the EDMA hogs the CPU.

    This is the second time you have said CC3. Do you mean CCS? We used to have CC, then changed to CCS, and there is also CCE, which I do not know much about.

    I guess your heartbeat is valid, although for realtime systems we usually recommend against using printf since it stops the CPU when it comes time to dump data. I know the LLD makes use of it for debug, and I use it a lot for benchmarking. Just keep in mind that it will cause a big hit in real-time latency every second when it runs. For your case, the fact that it never runs means that you are staying in interrupt code all the time, which is what you wanted to know anyway, so who am I to criticize it?

    Lopi said:
    I included my whole project as an attachment on my first post.

    I don't see anything wrong in the red-highlighted lines. I did not go through all the other lines in detail. They look reasonable from just skimming over them.

    Lopi said:
    tcc = EDMA3_DRV_TCC_ANY;

    The only comment I have from the code is that this line should perhaps be changed to tcc = CSL_EDMA3_CHA_GPIO_BNKINT4; to match your channel number. I think the LLD tends to associate each TCC with the corresponding Channel, even though that is not a requirement. But it could be a source of confusion when trying to tell the difference between the two.

     

    Your pjt file has a bunch of test C files, so I have no idea where the ISR for your routine is. You said it is not going much. Do you need all the other test files in there, or are you sure they are not getting involved? I am not going to try to debug your code, but I will try to help you figure out where to look.

    When the interrupt occurs, the ECM Dispatcher will get called. I do not know if it should be used along with the HWI_Dispatcher - I have never used the ECM Dispatcher, sorry. But if you are getting into and out of your ISR okay, then it must be configured right. Have you set a breakpoint in your ISR to make sure you get there? If not, that would be good to at least know you are getting there.

    After the ECM Dispatcher, the LLD's EDMA dispatcher will get called to figure out how to get to your ISR in particular. All of these dispatchers do represent interrupt overhead, but should not be taking anywhere near the 56us that you seem to be experiencing.

    If you look at the CPU Load Graph tool, do you see the load going to 100% when you get close to 15KHz? What does it say at 10KHz and 5KHz? How about when you do not use the interrupt?

    Why is there no .map file?

    Your linker cmd file looks like you have all of L1D and L1P allocated as cache, but you have buffer names with L1D in them. Is this just historical and confusing, or am I missing something in where these are located?

    Are your transfers going to external memory? If so, then can you look at some signal on a scope to see how much time the actual transfers are taking when you are at full CPU load?

  • Randy,

    Thanks for the quick reply.

    I meant to say that I am using Code Composer Studio 3.3. 

    Yes, I've set an interrupt in the ISR and confirmed that I get there.  When I run my GPIO at 1kHz I have 15% CPU load, 5kHz gives 38% CPU load and 10 kHz gives 65% CPU load.  When I don't use the interrupt I have 10% CPU load with my GPIO triggering at 500 kHz (I think when I mentioned successfully running at 100kHz earlier that was as high as I had tried it).
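
As a cross-check, these load figures can be fitted with a straight line (load = baseline + rate × cost per interrupt). The sketch below uses invented names (`LoadModel`, `fit_load`, `predict_load`, `saturation_rate_hz`) and assumes the load really is linear in the trigger rate, which the three measurements only roughly support.

```c
/* Straight-line model: load = baseline + rate_hz * cost_s */
typedef struct {
    double baseline;  /* load fraction with no interrupts (0..1) */
    double cost_s;    /* CPU seconds consumed per interrupt */
} LoadModel;

/* Fit the line through two (rate, load) measurements. */
LoadModel fit_load(double rate1_hz, double load1,
                   double rate2_hz, double load2)
{
    LoadModel m;
    m.cost_s = (load2 - load1) / (rate2_hz - rate1_hz);
    m.baseline = load1 - rate1_hz * m.cost_s;
    return m;
}

/* Load fraction the model predicts at a given trigger rate. */
double predict_load(LoadModel m, double rate_hz)
{
    return m.baseline + rate_hz * m.cost_s;
}

/* Trigger rate at which the model reaches 100% load. */
double saturation_rate_hz(LoadModel m)
{
    return (1.0 - m.baseline) / m.cost_s;
}
```

Fitting through the 1 kHz and 10 kHz points, `fit_load(1000.0, 0.15, 10000.0, 0.65)` gives a cost of about 56 us per interrupt and a baseline of about 9%, predicts roughly 37% at 5 kHz (38% was measured), and reaches 100% a little above 16 kHz. That lines up with both the 56 us estimate and the observed stall near 15 kHz.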

    I've re-attached my project with just the relevant files (this time with the map file).  I haven't made any changes to the linker cmd or buffer names so the confusion there must originate with the example that I started with. 

    I've tried to eliminate the ECM_Dispatcher, but so far it's not working (I don't get to the breakpoint in the ISR). I commented out this code in the attached files.

    Lopi

    evm6747 version3.zip
  • I meant to say "I set a breakpoint in the ISR". 

  • Many of my comments came before I looked at the LLD examples that you started from. You have done a fine job of using all the techniques that are used in the examples.

    Looking through the LLD EDMA dispatcher and the callback1 function, I do not see any reason for the big delay you are experiencing. Hopefully, someone else will see this thread and jump in with a straight-forward answer.

    To try to dig into what is going on, I would like to ask you to try to narrow down the delay(s) involved to see exactly where the problem might be. Here are some recommended steps:

    1. Change CCNT to 0. This will make the DMA transfer become a DUMMY transfer. This means no actual transfer will occur, but it will still set IPR and generate an interrupt.
    2. Use TSCL/TSCH (see previous post above) to measure the time in CPU cycles around some test code with and without triggering the EDMA manually. A sample of code that might work is below.
    3. Assuming that this does prove out the >50us delay to run the interrupt code, add another TSCL/TSCH read inside callback1 and compare the time before the trigger, inside the ISR, and after returning. This could be helpful in figuring out where the problem is.
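
For step 1, it may help to be explicit about what counts as a dummy transfer. The sketch below encodes the distinction as it is usually described for EDMA3: a PaRAM set with all three counts at zero is a null set, while one with at least one count at zero and at least one nonzero is a dummy set. The enum and function names are invented here, and the definitions should be checked against the EDMA3 user's guide for the device.

```c
#include <stdint.h>

typedef enum {
    PARAM_NULL,    /* ACNT, BCNT and CCNT are all zero */
    PARAM_DUMMY,   /* at least one count is zero, at least one is not */
    PARAM_NORMAL   /* all counts nonzero: data is actually moved */
} ParamKind;

/* Classify a PaRAM set from its three count fields. */
ParamKind classify_param(uint16_t acnt, uint16_t bcnt, uint16_t ccnt)
{
    if (acnt == 0u && bcnt == 0u && ccnt == 0u) {
        return PARAM_NULL;
    }
    if (acnt == 0u || bcnt == 0u || ccnt == 0u) {
        return PARAM_DUMMY;
    }
    return PARAM_NORMAL;
}
```

With CCNT changed to 0 and ACNT and BCNT left nonzero, the set falls into the dummy case, which is the one intended by step 1.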

    In dma_ping_pong_test.c, I think this code would be best located in here replacing the #ifdef GPIOTRIGGERED after /* Step 1 */, since everything should be ready to run. I am assuming that interrupts are enabled at this point, which they should be for the code in Step 2 to work if you were not returning early.

    Uint64 u64ISRTime = 0;

    {
        Uint64 u64MaxDelay = 0;
        Uint64 u64LastTime;
        Uint64 u64ThisTime;
        Uint64 u64Calibrate;
        Uint64 u64TimeBeforeISR;
        Uint64 u64TimeAfterISR;
        volatile int i;


        TSCL = 0;  // start the TSC timer

        u64LastTime  = TSCL;
        u64LastTime |= ((Uint64)TSCH) << 32;
        for ( i = 0; i < 100; i++ );  // delay to make sure there is time for the int to occur
        u64ThisTime  = TSCL;
        u64ThisTime |= ((Uint64)TSCH) << 32;
        u64Calibrate = u64ThisTime - u64LastTime;

        u64TimeBeforeISR  = TSCL;
        u64TimeBeforeISR |= ((Uint64)TSCH) << 32;
        result = EDMA3_DRV_enableTransfer (hEdma, chId,
                                          EDMA3_DRV_TRIG_MODE_MANUAL);
        for ( i = 0; i < 100; i++ );  // delay to make sure there is time for the int to occur
        u64TimeAfterISR  = TSCL;
        u64TimeAfterISR |= ((Uint64)TSCH) << 32;

    }

    In callback1, if needed:

    extern Uint64 u64ISRTime;
        u64ISRTime  = TSCL;
        u64ISRTime |= ((Uint64)TSCH) << 32;
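
Once the three timestamps are captured, turning them into a time is simple arithmetic. The helpers below are a host-side sketch with invented names; `uint64_t` stands in for `Uint64`, and the 300 MHz clock used in the usage note is only the presumed value mentioned earlier in the thread, not a confirmed one.

```c
#include <stdint.h>

/* Cycles spent on the interrupt path: the interval measured around the
 * trigger minus the same interval measured with no trigger. Returns 0 if
 * the calibration run happened to take longer. */
uint64_t isr_cycles(uint64_t before, uint64_t after, uint64_t calibrate)
{
    uint64_t elapsed = after - before;
    return (elapsed > calibrate) ? (elapsed - calibrate) : 0u;
}

/* Convert a cycle count to microseconds for a given CPU clock in hertz. */
double cycles_to_us(uint64_t cycles, double cpu_hz)
{
    return (double)cycles * 1.0e6 / cpu_hz;
}
```

For example, `cycles_to_us(isr_cycles(u64TimeBeforeISR, u64TimeAfterISR, u64Calibrate), 300.0e6)` would report the interrupt overhead in microseconds; about 16800 cycles at 300 MHz corresponds to the 56 us estimated earlier. If the interrupt takes longer than the 100-iteration delay loop, the result only shows that the delay loop was stretched, so the loop count may need to be raised.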

     

  • Here is what I think the problem is. You are running everything out of external memory without the benefit of cache. That is how the LLD example was written - it functions well and demos all the different ways you can run the EDMA3 features, but it is not system-optimized for performance benchmarking.

    You will learn a lot about the benefits of cache and of using internal memory by trying the following steps and checking the GPIO rate you can reach after each. I would be interested to see those results.

    1. Set the MAR bits for the external memory and shared memory in your tcf config file. The datasheet will tell you exactly which bits need to be set to 1 for which address ranges. Note that the description field in the tcf GUI config lists the numbers from low-to-high while the actual MAR bits in the 32-bit hex value go from high-to-low. MAR32-63 means that MAR32 is bit 0 and MAR63 is bit 31.

    2. Enable some L2 cache. This is not a clear tradeoff for this case, but it could be a good plan to configure a large cache and try combining that with the other changes.

    3. You have several logging buffers and such in IRAM. Move these to SDRAM to make room for more critical code and data.

    4. Move .text, .bss, .stack, to IRAM or Shared RAM. There may be a tradeoff between L2 cache vs. L2 SRAM here, so try some things.

    The most critical code for your particular test case is the set of routines that implement the full path to the ISR and back. The HWI_Dispatcher and ECM_Manager are in BIOS and the lisrEdma3ComplHandler is in the LLD. There are ways to put just portions of .text into specific memory areas, but you can worry about that later if you need to do fine tuning to the performance.
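
The MAR bit numbering in step 1 can be sketched in code. On C64x+/C674x devices each MAR register controls cacheability for one 16 MB block, so the MAR number is the top eight bits of the address. The helper names below are invented, and the address range in the usage note is only an illustration; the datasheet memory map is the authority for which ranges actually hold external memory.

```c
#include <stdint.h>

/* Each MAR register governs one 16 MB block, so the MAR number is the
 * top eight address bits. */
unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> 24);
}

/* Mask for a 32-register group (for example MAR192 to MAR223) that marks
 * every 16 MB block touched by [start, start + size) as cacheable.
 * group_first is the MAR number that maps to bit 0. Blocks outside the
 * group are ignored; a range that wraps past 0xFFFFFFFF is not handled. */
uint32_t mar_group_mask(uint32_t start, uint32_t size, unsigned group_first)
{
    uint32_t mask = 0u;
    unsigned first, last, n;

    if (size == 0u) {
        return 0u;
    }
    first = mar_index(start);
    last = mar_index(start + (size - 1u));
    for (n = first; n <= last; n++) {
        if (n >= group_first && n < group_first + 32u) {
            mask |= (uint32_t)1u << (n - group_first);
        }
    }
    return mask;
}
```

As an illustration, a 64 MB region starting at 0xC0000000 spans MAR192 to MAR195, so `mar_group_mask(0xC0000000u, 64u * 1024u * 1024u, 192u)` yields 0x0000000F for the MAR192-223 group.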

  • Randy,

    You're right, playing around with the memory does seem to be the key. I've gotten rid of the HeartBeat task and now I'm benchmarking my performance with the CPU load. I've set my GPIO frequency to 40 kHz. With these conditions I was running 96% CPU load. Now for your suggestions:

    1.  Set MAR192to223 = 0x0000000f; - this makes a huge difference - my CPU load drops to 36%

    2.  Enable L2 cache - I tried 32K and 128K - didn't detect any difference in CPU load .

    3.  Move logging buffers out of IRAM (to SDRAM) - it turns out that I have lots of room for code in IRAM for this simple example so no need to do this.  Either way it makes no difference to my CPU load.

    4.  Move .text, .bss, .stack to IRAM - again I didn't detect any difference in CPU load.  I experimented with moving some other things in the tcf from SDRAM to IRAM but didn't see any difference.

    I can reduce my CPU load to about 6% CPU  by hacking lots of code out of edma3ComplHandler.  For example I don't need to loop through checking the interrupt flags for all the channels since I know that I've only enabled transfers for GPIO BANK 4.  I think that what may be really slowing it down is that the EDMA variables, like resMgrObj, are stored in external memory. 
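
The single-channel shortcut described here, and the earlier question about testing for completion without the interrupt, both come down to looking at one bit of the interrupt pending register. The sketch below is not the LLD's code. It assumes the usual EDMA3 behaviour (a completed transfer with TCINTEN set raises its TCC bit in IPR, and writing a 1 to the same bit in ICR clears it), takes the register addresses as parameters instead of naming them, and uses an invented function name.

```c
#include <stdint.h>

/* Check one transfer completion code and acknowledge it if pending.
 * ipr: pointer to the interrupt pending register (read)
 * icr: pointer to the interrupt clear register (write 1 to clear)
 * tcc: completion code to look for, 0..31
 * Returns 1 if the completion was pending, 0 otherwise. */
int tcc_check_and_clear(volatile uint32_t *ipr, volatile uint32_t *icr,
                        unsigned tcc)
{
    uint32_t mask = (uint32_t)1u << tcc;

    if ((*ipr & mask) == 0u) {
        return 0;          /* nothing pending for this TCC */
    }
    *icr = mask;           /* acknowledge only this TCC */
    return 1;
}
```

Called from a polling loop, this avoids the interrupt path altogether; called from a stripped-down handler, it replaces the scan over every channel. On real hardware the write to ICR is what clears the IPR bit; with ordinary variables on a host, as in a unit test, the pending bit stays set until the test clears it.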

    Thanks for the help,

    Lopi

  • Without editing the code, you reduced the CPU load from 96% to 36%? That is a big advantage from a fairly small amount of L1P/L1D cache. Since most of your load seems to be inside the ISR, you may be retaining all or most of that ISR code in the cache between interrupts. That is not a normal situation, but for a "capabilities" test like yours, it makes sense.

    The fact that L2 cache did not help much tells me that enough code is fitting into L1P and L1D so that the extra space does not help.

    From your comment on resMgrObj being stored in external memory, an easy question is whether you have enough room to move those EDMA variables to IRAM. There is a compiler switch that tells the compiler where to put aggregate data, like struct and arrays. If you change this to put them in near memory instead of far, that might move them all to IRAM. Or you could manually move them using the linker command file and/or #pragma DATA_SECTION directives.
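
A minimal sketch of the #pragma DATA_SECTION route, for a variable the application owns, is shown below. The structure, the variable and the section name `.fast_data` are all made up for the example. Objects defined inside the prebuilt LLD libraries, such as resMgrObj, would need the library rebuilt or a linker command file entry that places the library's data sections instead.

```c
#include <stdint.h>

/* Hypothetical application-owned state touched by the completion callback. */
typedef struct {
    volatile uint32_t pingReady;
    volatile uint32_t pongReady;
} TransferFlags;

/* On the TI compiler, place the object in a named section. The linker
 * command file then has to map that section to internal memory, e.g.
 *     .fast_data > IRAM
 * inside its SECTIONS block. Other compilers skip the pragma, so the
 * sketch still builds on a host. */
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_SECTION(gTransferFlags, ".fast_data")
#endif
TransferFlags gTransferFlags;

/* Clear both flags before the first transfer is enabled. */
void transfer_flags_reset(void)
{
    gTransferFlags.pingReady = 0u;
    gTransferFlags.pongReady = 0u;
}
```

After linking, the .map file shows whether the section really ended up in IRAM.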

    You may or may not agree with this, but there is safety in leaving edma3ComplHandler untouched. Subtle issues may be handled without being clearly documented in the code, or future system changes might make some of the hacks get in the way later. But 36% down to 6% is a huge change.