TMS320C6424: Problem with cache invalidate

Part Number: TMS320C6424

Hi TI

I have an EDMA memory transfer from one memory location to another, both in external memory. The size is about 19 kB. I use cache invalidation to ensure the cache is flushed, but tests have shown that some of the memory isn't flushed.

Here is what I do:

1) memset(&stDataFile[ui8DataFileIndex].stData, 0, sizeof(stDataFile[ui8DataFileIndex].stData));

2) BCACHE_wbInv(&stDataFile[ui8DataFileIndex].stData, sizeof(&stDataFile[ui8DataFileIndex].stData), TRUE);

3) Initiate EDMA transfer

4) Wait for the EDMA transfer to complete by checking the bit in the IPRH register

5) BCACHE_inv(&stDataFile[ui8DataFileIndex].stData, sizeof(stDataFile[ui8DataFileIndex].stData), TRUE);

If I check the data right after step 5, there are random ranges in the memory with the value 0 (from the memset in step 1).

If I wait a couple of ms (TSK_sleep) or longer, the data is correct.

Any suggestions of what might cause this?
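A side note on step 2 as posted: the length argument is `sizeof(&stDataFile[ui8DataFileIndex].stData)`, which is the size of a pointer rather than of the buffer. That may only be a typo in the post, but if the real code reads the same way, the write-back/invalidate before the transfer would cover just a few bytes. Below is a minimal host-side sketch of the difference. `demo_data_t` and the helper names are hypothetical stand-ins, not the real structure or any TI API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real stData member (19200 bytes). */
typedef struct {
    uint8_t raw[19200];
} demo_data_t;

static demo_data_t demo_buf;

/* Length a cache call should receive: the size of the object itself. */
static size_t len_of_object(void)
{
    return sizeof(demo_buf);
}

/* Length that sizeof(&object) yields: the size of a pointer. */
static size_t len_of_address(void)
{
    return sizeof(&demo_buf);
}

/* Guard that could sit in front of a cache call in a debug build. */
static void require_full_length(size_t len)
{
    assert(len == sizeof(demo_buf));
}
```

Calling `require_full_length(len_of_object())` passes, while `require_full_length(len_of_address())` would trip the assertion.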

  • Hi,

    I've notified the RTOS team. Their feedback will be posted here.

    Best Regards,
    Yordan
  • Can you check the buffer alignment as specified here:

    e2e.ti.com/.../35570

    These problems have been described here:
    processors.wiki.ti.com/.../Cache_Management

    Regards,
    Rahul
  • Hi Rahul

    Thank you for replying.

    The buffer is aligned on a 128-byte boundary (using #pragma DATA_ALIGN), and the buffer size is 19200, which is divisible by 128.
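The alignment condition can also be checked at run time. This is a small sketch that assumes a 128-byte L2 line size; `is_line_aligned` and `require_line_aligned` are illustrative names, not part of DSP/BIOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2_LINE_BYTES 128u  /* assumed L2 cache line size */

/* Returns 1 when both the start address and the length are whole
 * multiples of the cache line size, 0 otherwise. */
static int is_line_aligned(uintptr_t addr, size_t len)
{
    return (addr % L2_LINE_BYTES == 0u) && (len % L2_LINE_BYTES == 0u);
}

/* Debug-build guard to place in front of a block cache operation. */
static void require_line_aligned(const volatile void *buf, size_t len)
{
    assert(is_line_aligned((uintptr_t)buf, len));
}
```

For the buffer in this thread, 19200 / 128 = 150 whole lines, so the length condition holds as stated.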

    I've made a test using the BCACHE_wbInvAll() followed by the BCACHE_wait() and that solved the problem but is not a usable solution.

    Are there any known issues with BCACHE_inv in DSP/BIOS_Kernel 5,2,5,25 11-20-2009 ?

  • Thomas,

    What exactly does "check the data right after step 5" mean? How are you checking memory?

    Can you try changing the Wait parameter in step 5 to FALSE and add a step 5a to call BCACHE_wait() explicitly?

    I do not know if there is an equivalent CSL function for the C6424 like what is called out in the Cache User Guide, CACHE_inv. If there is, try that instead of BCACHE_inv. If not, there are CSL register-level macros that would allow you to perform the cache invalidate command directly, following the instructions in the Megamodule Reference Guide Example 4-2. "Block Coherence Operation Example"; it is pretty simple. If there is an issue with the wait operation, then you may need to try various delays before testing L2IWC.

    An odd question, perhaps, but what is the value of TRUE in your program?

    The symptoms you describe definitely appear to be a failure of the Wait feature. The BCACHE commands have been in wide use for many years, so it is not so likely that they will be failing like this. Yet they are. If you can try these tests to see what changes, if any, you see in your operation, it may help point us to the problem.

    Regards,
    RandyP
  • Hi RandyP

    Thank you for replying.

    RandyP said:
    What exactly does "check the data right after step 5" mean? How are you checking memory?

    I check the data by calculating the checksum right after step 5, and then again later before writing the data to FLASH storage. The two don't match. The checksum calculation is inherited from other projects with many years in the field (i.e. a proven track record).

    During debugging I also do several memcpy calls on the buffer to visually inspect the data through JTAG (Lauterbach Trace32). I can see that there are random blocks of '0' (from the memset in step 1). These blocks shrink after each memcpy.

    If I place the buffers in non-cached memory it all works fine.

    RandyP said:
    Can you try changing the Wait parameter in step 5 to FALSE and add a step 5a to call BCACHE_wait() explicitly?

    No luck. I still get different checksum calculations.

    RandyP said:
    I do not know if there is an equivalent CSL function for the C6424 like what is called out in the Cache User Guide, CACHE_inv. If there is, try that instead of BCACHE_inv. If not, there are CSL register-level macros that would allow you to perform the cache invalidate command directly, following the instructions in the Megamodule Reference Guide Example 4-2. "Block Coherence Operation Example"; it is pretty simple. If there is an issue with the wait operation, then you may need to try various delays before testing L2IWC.

    I'm not familiar with CSL functions for the C6424, but I found the Registers and tried this:

    CACHE_L2IBAR = (uint32_t)&stDataFile[ui8DataFileIndex].stData;

    CACHE_L2IWC = 600; //19200/32

    while(CACHE_L2IWC != 0);

    But unfortunately no luck. I still get different checksum calculations.
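One thing that may be worth double-checking in the snippet above is the unit of the value written to `CACHE_L2IWC`. As far as I understand the C64x+ megamodule documentation, the block word-count registers count 32-bit words (please verify against the reference guide). Under that assumption, 19200 bytes would need a count of 4800, and a count of 600 would cover only 2400 bytes. The helper names below are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte count to a count of 32-bit words, rounding up.
 * Assumption: the count register is expressed in 4-byte words. */
static uint32_t bytes_to_words(uint32_t byte_count)
{
    assert(byte_count <= UINT32_MAX - 3u);
    return (byte_count + 3u) / 4u;
}

/* Number of bytes covered by a given word count (4 bytes per word). */
static uint32_t words_to_bytes(uint32_t word_count)
{
    assert(word_count <= UINT32_MAX / 4u);
    return word_count * 4u;
}
```

This only covers the direct register test; `BCACHE_inv` takes a byte count, so the earlier steps are not affected by this.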

    RandyP said:
    An odd question, perhaps, but what is the value of TRUE in your program?

    TRUE is defined as 1

    RandyP said:
    The symptoms you describe definitely appear to be a failure of the Wait feature. The BCACHE commands have been in wide use for many years, so it is not so likely that they will be failing like this. Yet they are. If you can try these tests to see what changes, if any, you see in your operation, it may help point us to the problem.

    We are using a rather old version of the DSP/BIOS kernel (see the version in my previous post). I've also found an erratum for another device (I cannot remember which) that had a bug when using the invalidate functionality, and the recommendation was to disable interrupts (and a few other things) when invalidating. This is, however, not an option for us at this stage. Can you confirm that there isn't a similar issue with the C6424 device?

  • Thomas,

    Unfortunately, I do not have any access into the BIOS to confirm anything. I am more of a user like you trying to find the problem and a solution. I did not see anything about cache like this in the device errata, which is about all the access I have.

    Would you please try this to capture some early values of L2IWC?

    int CaptureL2IWC[50];

    CACHE_L2IBAR = (uint32_t)&stDataFile[ui8DataFileIndex].stData;

    CACHE_L2IWC = 600; //19200/32

    for ( int i = 0; i < 50; i++ )
        CaptureL2IWC[i] = CACHE_L2IWC;

    while(CACHE_L2IWC != 0);

    Then take a look at the CaptureL2IWC array to see what the early values are, please?

    Regards,
    RandyP
  • /*
     *  ======== bcache_wait.c ========
     *
     *! Revision History
     *! ================
     *! 04-Jun-2010 jv	Fix SDOCM00071000: hole in BCACHE_wait(). 
     *! 19-Sep-2006 jv	fix SDSCM00003322 - do extra read on 2430, 3430. 
     *! 10-May-2006	jv	modify emifAddr name.
     *! 22-Mar-2006	jv	Move BCACHE_wait function here.
     *! 23-Jan-2006	jv	Support 6488 (Faraday) chip
     *! 23-Jan-2006	jv	Support 6486 (Tomahawk) chip
     *! 20-Dec-2005	jv	Change the MAXWC to 0xFF00 to be safe.
     *! 14-Dec-2005	jv	move #pragma's next to function. Implement second
     *!			review changes.  Remove EMIF workaround for omap2430.
     *! 09-Dec-2005	jv	Put all code into .bios section. Remove CSL.
     *! 07-Dec-2005	jv	Moved all sources into one source file also made fixes
     *!			from code review feedback.
     *! 10-Nov-2005	jv	Created.
     */
    
    #include <std.h>
    #include <bcache.h>
    #include "_bcache.h"
    #include <hwi.h>
    
    
    /* emif configuration address needed to insure writes made it externally */
    volatile Uint32 *_BCACHE_emifAddr = NULL;
    
    #if defined(_2430_) || defined(_3430_)
    /* address table to be read during BCACHE_wait() */
    extern far volatile Uint32 *_BCACHE_readTable[];
    #endif
    
    
    #pragma CODE_SECTION(BCACHE_wait, ".bios")
    /*
     *  ======== BCACHE_wait ========
     *  Wait for the L2 count to complete.  This function needs only to wait
     *  for L2 word count since all block cache operations in BIOS are done
     *  through the L2 registers and all global cache operations must already
     *  wait until the operation completes.  Note:  Its sufficient to wait
     *  on one of the L2 count registers since all 3 count registers are
     *  mirrors of one another and map to the same bits.
     */
    Void BCACHE_wait()
    {
    #if defined(_2430_) || defined(_3430_)
        Int i;
    #else
        Uns mask;
    #endif
    
        /* wait for L2 word count to be zero */
        while (*_BCACHE_L2WWC != 0) {
    	;
        }
    
        /*
         *  SDSCM03322 - dummy read to the addresses specified in BCACHE_readTable
         */
    #if defined(_2430_) || defined(_3430_)
        for (i = 0; i < 3; i++) {
    	if (_BCACHE_readTable[i] != NULL) {
    	    *_BCACHE_readTable[i];
    	}
        }
    
    #else
    
        /*
         *  make a dummy write and read to emif config register to
         *  insure that data made it out to external memory, otherwise
         *  its possible that the data is out of the Master's view but
         *  has not reached its final destination.
         */
        mask = HWI_disableI();
        if (_BCACHE_emifAddr != NULL) {
    	*_BCACHE_emifAddr = 0;
    	*_BCACHE_emifAddr;
    	_BCACHE_emifAddr = NULL;
        }
        HWI_restoreI(mask);
    
    #endif
    }
    

    I looked at the BCACHE code. There was one small change to eliminate a race condition. I've attached the source file. You should be able to include this in your project. The linker should pick up this version instead of the kernel's library. The change was to put the HWI_disable/restore around the if statement instead of inside the if statement.

    Todd

  • Hi RandyP

    RandyP said:
    Then take a look at the CaptureL2IWC array to see what the early values are, please?

    Here is the result:

    I wonder why the initial value is 608 and not 600. The data should be aligned at 128.

  • Hi Todd

    Thank you for contributing.

    ToddMullanix said:
    I looked at the BCACHE code. There was one small change to eliminate a race condition. I've attached the source file. You should be able to include this in your project. The linker should pick up this version instead of the kernel's library. The change was to put the HWI_disable/restore around the if statement instead of inside the if statement.

    I've included the file but had to comment out the inclusion of "_bcache.h" (not found) and change _BCACHE_L2WWC to CACHE_L2WWC, which is the register definition in our project (I've checked with the datasheet that it points to the correct address).

    Unfortunately I still get different checksum calculations, so it didn't solve the problem.

  • Thomas,

    Please put (uint32_t)&stDataFile[ui8DataFileIndex].stData in the first location of the array so we can see it.

    With that loop in the code, which shows that the count finally reaches 0 before exiting, do you get correct data reads after that?

    Regards,
    RandyP
  • RandyP said:
    Please put (uint32_t)&stDataFile[ui8DataFileIndex].stData in the first location of the array so we can see it.

    Actually during my recent tests I've moved the data out of the stDataFile struct, so the stData is defined as:

    #pragma DATA_ALIGN(stData, 128);
    static volatile stAdcData_t stData[600];

    I can memcpy the data (before each of my checksum calculations) and show you the problem. Is that what you had in mind?

    RandyP said:

    With that loop in the code, which shows that the count finally reaches 0 before exiting, do you get correct data reads after that?

    No, same problem persists (different checksum calculations).

    I'm almost out of time for this issue, so I might need to fall back and place the arrays in non-cached memory (which solves the problem), and live with the performance degradation.

  • Thomas,

    Please put (uint32_t)stData in the first location of the array so we can see it, or if the compiler prefers, put (uint32_t)&stData[0]. Also, please put (uint32_t)CACHE_L2IBAR in the second location of the array.

    And, I would like to see the 32-bit hex values in the 8 DMA PARAM entries for the EDMA3 transfer from just prior to triggering the transfer. I want to confirm addresses and Early/Normal and how you have everything setup for the transfer.

    Please display these all as 32-bit hex.

    The block size for WC has to be a multiple of 32 bytes, so when you write 600 to WC, it rounds up to 608 which is the next higher multiple of 32. Although I have not seen this problem before, I have seen cases where other variables located past the end of an array can get corrupted by unintended cache coherence commands. It would be good to look at what is in that space, although unlikely to be part of the problem.
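The 600 to 608 rounding described above can be reproduced with a small helper. This is a host-side sketch of the arithmetic only; `round_up_to_multiple` is an illustrative name and nothing here touches the hardware registers.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next whole multiple of `multiple`.
 * Values that are already a multiple are returned unchanged. */
static uint32_t round_up_to_multiple(uint32_t value, uint32_t multiple)
{
    uint32_t rem;

    assert(multiple != 0u);
    rem = value % multiple;
    if (rem == 0u) {
        return value;
    }
    assert(value <= UINT32_MAX - (multiple - rem));
    return value + (multiple - rem);
}
```

With a granularity of 32, a requested count of 600 becomes 608 (19 x 32), which matches the first value captured from L2IWC.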

    Are you using the latest silicon revision of the C6424, rev 1.3?

    What is the content (data values) of the memory being copied to stData? At what address is it stored?

    If you get a checksum of the original data from its original location prior to the copy, how does that checksum compare with the varying checksum values of stData after the transfer?

    Can you post the checksum code? Is it C or assembly? If C, can you include the Disassembly listing for it?

    How do you determine that the DMA transfer has completed?

    Since you are not dealing with a race condition on reading WC as 0 prematurely, the cache will have been completely invalidated and is unlikely to be contributing to this problem. This is why I am asking all these questions. The capture loop eliminates the cache coherence command from being at fault, as far as I can see. The symptoms still point there but we will not get better confidence than from your capture loop.

    Regards,
    RandyP
  • Hi RandyP

    Sorry for the delayed answer.

    RandyP said:
    Please put (uint32_t)stData in the first location of the array so we can see it, or if the compiler prefers, put (uint32_t)&stData[0]. Also, please put (uint32_t)CACHE_L2IBAR in the second location of the array.

    For reference:

    CACHE_L2IBAR = (uint32_t)&sData;
    CaptureL2IWC[0] = (uint32_t)&stData;
    CaptureL2IWC[1] = CACHE_L2IBAR;
    
    CACHE_L2IWC = 600;
    for (i = 2; i < 50; i++ )
    {
    	CaptureL2IWC[i] = CACHE_L2IWC;
    }
    while(CACHE_L2IWC != 0);

    The result:

    I'm not sure why the values at index 0 and 1 aren't the same.

    RandyP said:
    And, I would like to see the 32-bit hex values in the 8 DMA PARAM entries for the EDMA3 transfer from just prior to triggering the transfer. I want to confirm addresses and Early/Normal and how you have everything setup for the transfer.

    Please display these all as 32-bit hex.

    RandyP said:
    The block size for WC has to be a multiple of 32 bytes, so when you write 600 to WC, it rounds up to 608 which is the next higher multiple of 32. Although I have not seen this problem before, I have seen cases where other variables located past the end of an array can get corrupted by unintended cache coherence commands. It would be good to look at what is in that space, although unlikely to be part of the problem.

    Currently the CaptureL2IWC array is placed after stData. And the problem is still there.

    RandyP said:
    Are you using the latest silicon revision of the C6424, rev 1.3?

    Yes.

    RandyP said:
    What is the content (data values) of the memory being copied to stData? At what address is it stored?

    It's a chunk of ADC data from a circular buffer stored in non-cached external memory. The buffer starts at address 0x80400000.

    RandyP said:

    If you get a checksum of the original data from its original location prior to the copy, how does that checksum compare with the varying checksum values of stData after the transfer?

    Unfortunately no checksum prior to the copy.

    RandyP said:
    Can you post the checksum code? Is it C or assembly? If C, can you include the Disassembly listing for it?

    I'm sorry, I cannot share the code with you, but here's the prototype:

    uint32_t CheckSum_Adler32(const uint32_t ui32Adler, const uint8_t *pui8buf, const uint32_t ui32Len);

    I had a slight concern about qualifying the pointer as const when stData is volatile. I've made some tests (one without const and another with volatile), but the checksum still differs depending on when it is calculated (which it doesn't when the data is placed in non-cached memory).
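Since the actual checksum code was not shared, here is a sketch of the standard Adler-32 algorithm written against the same prototype, for anyone who wants to reproduce the test. It assumes `ui32Adler` is the running checksum (seeded with 1 for a new buffer, as in zlib's `adler32`); the poster's implementation may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */

/* Standard Adler-32. ui32Adler carries the running state: low 16 bits
 * are sum A, high 16 bits are sum B. Seed with 1 for a fresh buffer. */
uint32_t CheckSum_Adler32(const uint32_t ui32Adler, const uint8_t *pui8buf, const uint32_t ui32Len)
{
    uint32_t a = ui32Adler & 0xFFFFu;
    uint32_t b = (ui32Adler >> 16) & 0xFFFFu;
    uint32_t i;

    assert(pui8buf != NULL || ui32Len == 0u);

    for (i = 0u; i < ui32Len; i++) {
        a = (a + pui8buf[i]) % ADLER_MOD;  /* A: running byte sum */
        b = (b + a) % ADLER_MOD;           /* B: running sum of A */
    }
    return (b << 16) | a;
}
```

Passing the `volatile` buffer to this function needs a cast, as discussed above. The qualifier only constrains what the compiler may optimise; it has no effect on cache coherence.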

    RandyP said:
    How do you determine that the DMA transfer has completed?

    I poll and wait until the corresponding bit is set in the Interrupt pending register i.e. TCC 43 in the IPRH register (bit 11 (43-32)).
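The register and bit selection for a given TCC can be sketched as follows. EDMA3 completion codes 0 to 31 map to IPR and 32 to 63 map to IPRH; the struct and function names here are hypothetical, and the sketch computes only the mask without reading any hardware register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor: which pending register to poll and which bit. */
typedef struct {
    int      use_iprh;  /* 0: poll IPR, 1: poll IPRH */
    uint32_t bit;       /* bit position within that register */
    uint32_t mask;      /* 1 << bit */
} tcc_pending_t;

/* Map a transfer completion code (0..63) to its pending register and bit. */
static tcc_pending_t tcc_to_pending(uint32_t tcc)
{
    tcc_pending_t p;

    assert(tcc < 64u);
    p.use_iprh = (tcc >= 32u) ? 1 : 0;
    p.bit = tcc % 32u;
    p.mask = (uint32_t)1u << p.bit;
    return p;
}
```

For TCC 43 this gives IPRH, bit 11, mask 0x00000800, which agrees with the description above.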

  • Thomas,

    The first two entries in CaptureL2IWC are different because you write a different value to L2IBAR than what you wrote to the first entry in the array: sData vs. stData.

    Since the WC register is counting down to 0, the cache is invalidated and that operation is done. There cannot be any more side effects of that cache operation.

    If you are waiting for the IPR bit to be set, then the EDMA operation has completed and all the data has landed. It cannot be continuing after that.

    I do not see any way to make happen what you are describing. There have to be other things going on in your application that are causing this odd behavior, but I do not see how the cache could be a part of it.

    Regards,
    RandyP
  • Hi RandyP

    RandyP said:
    The first two entries in CaptureL2IWC are different because you write a different value to L2IBAR than what you wrote to the first entry in the array: sData vs. stData.

    That must be an error while editing the post. In my code it's correct.

    RandyP said:
    Since the WC register is counting down to 0, the cache is invalidated and that operation is done. There cannot be any more side effects of that cache operation.

    If you are waiting for the IPR bit to be set, then the EDMA operation has completed and all the data has landed. It cannot be continuing after that.

    I do not see any way to make happen what you are describing. There have to be other things going on in your application that are causing this odd behavior, but I do not see how the cache could be a part of it.

    Okay. We have decided to move the data to non-cached memory, which apparently solves the problem but is slower when accessing the data.

    Thank you for the support :)

    Best regards
    Thomas

  • Thomas,

    Thomas Jespersen93 said:
    That must be an error while editing the post. In my code it's correct.

    If it is correct in your code, then there is something else wrong since the first two values in the array are different. That needs to be debugged.

    Thomas Jespersen93 said:
    We have decided to move the data to non-cached memory, which apparently solves the problem but is slower when accessing the data.

    There has to be something else going wrong in your system or code, since the cache should be cleanly invalidated. Attached is a standalone test program that does the several steps you have listed and successfully reads the memory. If you try running it, please report whether you also get correct values - the printf's should show a pre-test checksum, a second array's checksum, then a series of post DMA checksums that are all valid. The checksum is written in assembly and is very fast. The code is ugly and has left-overs from other tests I have run, so a lot of comments and #defines and #if's will have to be ignored.

    I do not find anything wrong with the C6424 cache invalidation hardware. Please try this code and if it works then integrate it into your code.

    Regards,
    RandyP

    C6424CacheTest.zip
