Odd behavior when enabling C6657 L2 cache



While trying to enable L2 cache on DDR3 memory, I've run across odd behavior that I can neither explain nor eliminate.  I'm working with a C6657 DSP on both a TMSXEV6657LE evaluation card and our production card.

In DDR3 memory, I have marked the first two regions (0x80000000 - 0x81ffffff) as non-cached.  The rest of DDR3 memory is cached.  Later, I read a block from NAND flash (on the EMIF16 bus) into memory at 0x82000000, which is the start of the first cached area in the DDR3 region.  The data comes into DDR3 correctly.  However, it also shows up in program memory.  Here's where it gets really strange (well, strange to me, anyhow).

Two consecutive cache lines of data (128 bytes) in DDR3 space correspond to the same amount of memory in the program area.  The next 128 bytes in DDR3 space also correspond to 128 bytes in program memory, but not directly after the previous 128-byte chunk.  Instead, they appear 512 bytes before the previous chunk.

Specifically, the first four 128-byte chunks map like this:

0x82000000 --> 0x008fff80
0x82000080 --> 0x008ffd80
0x82000100 --> 0x008ffb80
0x82000180 --> 0x008ff980

Some extra experiments show that this continues for the entire size of the NAND flash block (128 KB).
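
The pattern in the four addresses above can be written down as a small calculation. This is only a sketch: the function name `observed_l2_alias` and the constants are hypothetical, the linear rule is extrapolated from the four listed points plus the statement that it continues for the whole 128 KB block, and the assumption that bytes keep their order inside each 128-byte chunk is based on the chunks matching DDR3.

```c
#include <assert.h>
#include <stdint.h>

#define DDR3_CACHED_BASE 0x82000000u /* first cached DDR3 address in the post */
#define L2_FIRST_CHUNK   0x008fff80u /* where the first 128-byte chunk showed up */
#define CHUNK_SIZE       0x80u       /* 128 bytes */
#define L2_STRIDE        0x200u      /* each later chunk lands 512 bytes lower */
#define NAND_BLOCK_SIZE  0x20000u    /* 128 KB NAND flash block */

/* Returns the low-memory address where a DDR3 byte was observed to appear.
 * Extrapolated from the observations only; not taken from TI documentation. */
static uint32_t observed_l2_alias(uint32_t ddr3_addr)
{
    assert(ddr3_addr >= DDR3_CACHED_BASE);
    uint32_t offset = ddr3_addr - DDR3_CACHED_BASE;
    assert(offset < NAND_BLOCK_SIZE);

    uint32_t chunk = offset / CHUNK_SIZE;
    return L2_FIRST_CHUNK - chunk * L2_STRIDE + (offset % CHUNK_SIZE);
}
```

Under this rule the 1024 chunks of one NAND block are spread over a 512 KB span, all of it inside the 0x00800000 - 0x008fffff range.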

If I modify the cached DDR3 by hand, using CCS's memory browser, the program memory changes as well.  When I modify the first location in DDR3, the entire 128-byte chunk in program memory changes to match DDR3.  It's as if two cache lines were fetched from DDR3 memory.

In CCS's memory browser, I have cache-coloring enabled.  The DDR3 memory is shaded with a light-green color, which indicates that the region is in L2 cache.  The program memory has no shading, which says that it is not in any cache.

Here is the code I used to initialize and enable L2 cache.  Can anyone see what I've done wrong?

typedef unsigned long WORD32;

#define DDR3_BASE 0x80000000
#define DDR3_SIZE 0x20000000
#define MAR_REGION_SIZE 0x01000000
#define MAR_INDEX(addr) ((addr) / MAR_REGION_SIZE)

#define SRIO_HEAP_SIZE 0x02000000
#pragma DATA_SECTION(srio_heap,"noncached_ddr3");
BYTE srio_heap[SRIO_HEAP_SIZE];

static void ddr3_cache_init (void)
{
    uint mar;
    Uint8 pcx, pfx;
    WORD32 offset, addr, nc_rgn_start, nc_rgn_end;

    // Get the start and end of the non-cached region buffer.
    nc_rgn_start = (WORD32)srio_heap;
    nc_rgn_end = (WORD32)srio_heap + SRIO_HEAP_SIZE;

    // Set DDR3 caching.
    for (offset = 0; offset < DDR3_SIZE; offset += MAR_REGION_SIZE)
    {
        // Get the address of this DDR3 region. Calculate which MAR
        // controls this region.
        addr = DDR3_BASE + offset;
        mar = MAR_INDEX (addr);

        // If this region overlaps the non-cached buffer, disable caching.
        // Otherwise, turn on caching for this region.
        // nc_rgn_end is one past the last byte of the buffer, so a region
        // that starts exactly at nc_rgn_end does not overlap it.
        if ((nc_rgn_end <= addr) || (nc_rgn_start > ((addr + MAR_REGION_SIZE) - 1)))
            CACHE_enableCaching (mar);
        else
            CACHE_disableCaching (mar);

        // Ensure memory region is not prefetchable.
        // NOTE: It behaves the same if these two lines are deleted.
        CACHE_getMemRegionInfo (mar, &pcx, &pfx);
        CACHE_setMemRegionInfo (mar, pcx, 0);
    }

    // Enable L2 cache.
    CACHE_setL2Size (CACHE_1024KCACHE);

    // Make caches coherent.
    CSL_XMC_invalidatePrefetchBuffer(); // NOTE: It behaves the same if this line is deleted
    CACHE_invAllL1p (CACHE_WAIT); // NOTE: I've tried multiple combinations of invalidation
    CACHE_wbInvAllL1d (CACHE_WAIT); // and writeback. They didn't make any difference.
    CACHE_wbInvAllL2 (CACHE_WAIT);
}

  • Jeff,

    Can you dump the Associated MAR registers before and after you believe they're disabled?

    These would be MAR 128 and 129 (0184 8200h and 0184 8204h)
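
For reference, the MAR numbers and register addresses quoted above follow from simple arithmetic: each MAR covers a 16 MB region (matching `MAR_REGION_SIZE` in the original code), and the registers are 4 bytes apart. The sketch below assumes a MAR base address of 0x01848000, which is inferred from the two addresses given for MAR 128 and MAR 129; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAR_BASE_ADDR    0x01848000u /* assumed: inferred from MAR128 = 0x01848200 */
#define MAR_REGION_SHIFT 24          /* each MAR covers 16 MB (0x01000000 bytes) */
#define MAR_COUNT        256u

/* Index of the MAR that controls the 16 MB region containing addr. */
static unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_REGION_SHIFT);
}

/* Address of the memory-mapped register for a given MAR index. */
static uint32_t mar_reg_addr(unsigned index)
{
    assert(index < MAR_COUNT);
    return MAR_BASE_ADDR + 4u * index;
}
```

With these definitions, `mar_reg_addr(mar_index(0x80000000u))` gives 0x01848200 and `mar_reg_addr(mar_index(0x81ffffffu))` gives 0x01848204.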

    Also, please open the Disassembly window in your function ddr3_cache_init and let me know where in memory it resides (not where it's cached).  You can also get this from the .map file.

    One last thing: you use the term "Program Memory" a lot, but that could mean several things.  Can you tell me what you mean (e.g., the L1P memory space, the space where your currently executing code resides, a section of memory you've named 'program memory', etc.)?

    Best Regards,

    Chad

  • Chad,

    Thanks for the response.  Here is the information you requested.

    These are all of the MARs, both before and after the call to ddr3_cache_init.  The gaps indicate sections where the contents of a line were equal to the previous line.  I've highlighted the specific registers you asked for.

    Cache registers before ddr3_cache_init:
      C6657 L2 Config register: 03000000
      C6657 Memory attribute registers:
        (00000000)   0: 00000001 00000000 00000000 00000000
        (04000000)   4: 00000000 00000000 00000000 00000000
        (08000000)   8: 00000000 00000000 00000000 00000000
        (0c000000)  12: 0000000d 0000000d 0000000d 0000000d
        (10000000)  16: 0000000c 0000000c 0000000c 0000000c
                          ...      ...      ...      ...
        (7c000000) 124: 0000000c 0000000c 0000000c 0000000c
        (80000000) 128: 0000000d 0000000d 0000000d 0000000d
        (84000000) 132: 0000000d 0000000d 0000000d 0000000d
                          ...      ...      ...      ...
        (9c000000) 156: 0000000d 0000000d 0000000d 0000000d
        (a0000000) 160: 0000000c 0000000c 0000000c 0000000c
                          ...      ...      ...      ...
        (fc000000) 252: 0000000c 0000000c 0000000c 0000000c
    
    Cache registers after ddr3_cache_init:
      C6657 L2 Config register: 03000006
      C6657 Memory attribute registers:
        (00000000)   0: 00000001 00000000 00000000 00000000
        (04000000)   4: 00000000 00000000 00000000 00000000
        (08000000)   8: 00000000 00000000 00000000 00000000
        (0c000000)  12: 0000000d 0000000d 0000000d 0000000d
        (10000000)  16: 0000000c 0000000c 0000000c 0000000c
                          ...      ...      ...      ...
        (7c000000) 124: 0000000c 0000000c 0000000c 0000000c
        (80000000) 128: 00000004 00000004 00000005 00000005
        (84000000) 132: 00000005 00000005 00000005 00000005
                          ...      ...      ...      ...
        (9c000000) 156: 00000005 00000005 00000005 00000005
        (a0000000) 160: 0000000c 0000000c 0000000c 0000000c
                          ...      ...      ...      ...
        (fc000000) 252: 0000000c 0000000c 0000000c 0000000c
    

    My routine (ddr3_cache_init) resides in L1P memory at 0x8580b0.  The CCS Memory Browser has shaded it to indicate that it's in the L1P cache.

    By "Program Memory", I mean the location where my program resides, which also is L1P memory space.  According to the map file, the .text section is located at 0x850000 for 0x35e60 bytes.

  • This is from the TMS320C66x DSP Cache User's Guide.

    Cacheability - The cacheability settings of external memory addresses (through MAR bits) only affect L1D and L2 caches on
    C66x devices; that is, program fetches to external memory addresses are always cached in L1P regardless of the
    cacheability setting. This is not the case on C64x devices, where the settings affects all caches, L1P, L1D, and L2.

    Also, the 0x008xxxxx space is the L2 space, not the L1P memory space, though its contents are cached into L1P when they are fetched as program code.

    The MAR values look fine: MAR 128 and 129 show caching and prefetching enabled before the call, then caching and prefetching disabled after it.  That said, L1P will still cache the program code.

    Is it only L1P that you're observing caching data out of DDR after the MARs are disabled?

    Best Regards,
    Chad 

  • That is a little confusing to me.  What/where is L1P memory space?  I assumed that it was 0x008xxxxx, but apparently not.

    Regardless, we're not fetching code from DDR3, so I shouldn't think it would affect L1P at all.  These are data-only accesses.

    Furthermore, when the DDR3 writes cause memory in the 0x00800000-0x008fffff region to change, CCS indicates that the memory is not in L1P at all (i.e., the memory browser pane does not shade that region).  In other words, I write to DDR3, I see data change in the lower range, and CCS says that data is not in L1P (or L1D, for that matter - it is not shaded at all).

    Thanks,
    jw 

  • The memory space is defined in the Data Manual, L1P is 0x00E00000 - 0x00E07FFF.

    Writes to the DDR3 in the disabled MAR region will not cause the 0x00800000 - 0x008FFFFF region to change if the cache is disabled.

    Chapter 5 of the Data Manual goes over some details regarding how the cache is broken down and into what spaces.

    That said, from what you've stated above, it sounds like the writes to DDR3 locations 0x82000000 - 0x82000xxx are resulting in changes to 0x008FFxxx.  The L2 cache is at the end of the L2 space, so if you had any L2 set to cache mode, it would include this space.  Note that MAR 130 went from 0x0000000D to 0x00000005, which only disabled prefetch; it didn't disable caching.
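
The MAR values in the dumps can be decoded with two bit tests. This is a minimal sketch based only on what the values in this thread imply: bit 0 controls cacheability (0x5 versus 0x4) and bit 3 controls prefetching (0xD versus 0x5). The mask and function names are hypothetical, and the other bits are deliberately left uninterpreted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAR_PC_MASK  0x00000001u /* bit 0: region is cacheable */
#define MAR_PFX_MASK 0x00000008u /* bit 3: region is prefetchable */

static bool mar_is_cacheable(uint32_t mar)
{
    return (mar & MAR_PC_MASK) != 0;
}

static bool mar_is_prefetchable(uint32_t mar)
{
    return (mar & MAR_PFX_MASK) != 0;
}

/* Checks the three DDR3 values that appear in the register dumps. */
static void check_thread_mar_values(void)
{
    /* Before init: cached and prefetchable. */
    assert(mar_is_cacheable(0x0000000du) && mar_is_prefetchable(0x0000000du));
    /* MAR 128/129 after init: neither cached nor prefetchable. */
    assert(!mar_is_cacheable(0x00000004u) && !mar_is_prefetchable(0x00000004u));
    /* MAR 130 and up after init: still cached, prefetch off. */
    assert(mar_is_cacheable(0x00000005u) && !mar_is_prefetchable(0x00000005u));
}
```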

    Can you dump the L2CFG register at 0x01840000?

    Best Regards,

    Chad

  • Chad,

    The L2CFG register has the value 0x03000000 before ddr3_cache_init and the value 0x03000006 after.
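
The low three bits of these L2CFG values select how much of L2 is used as cache. The sketch below decodes them; the thread itself only confirms that `CACHE_1024KCACHE` produced a value of 6, so the rest of the table is an assumption based on the usual C66x L2MODE encoding and should be checked against the CorePac documentation. Mode 7 ("maximum") is assumed to mean 1024 KB on this device. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L2CFG_L2MODE_MASK 0x00000007u /* assumed: L2MODE is bits 2:0 */

/* Returns the amount of L2 configured as cache, in KB. */
static unsigned l2cfg_cache_kbytes(uint32_t l2cfg)
{
    /* Assumed encoding; only entry 6 is confirmed by this thread. */
    static const unsigned kbytes[8] = { 0, 32, 64, 128, 256, 512, 1024, 1024 };
    return kbytes[l2cfg & L2CFG_L2MODE_MASK];
}

/* Checks the two L2CFG values reported before and after ddr3_cache_init. */
static void check_thread_l2cfg_values(void)
{
    assert(l2cfg_cache_kbytes(0x03000000u) == 0);    /* all L2 is SRAM */
    assert(l2cfg_cache_kbytes(0x03000006u) == 1024); /* all L2 is cache */
}
```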

    I believe that I may have had an epiphany that might help solve this problem.  Based upon what you've said so far, and upon deeper reading of the documents, this is how I think it works.  Please correct me if I'm wrong:

    The L1P memory at 0x00e00000 can be used as SRAM or as cache.  If cache is selected, then the cache controller uses it to store cache lines.  Any data that was in a cache-line location prior to enabling cache may be (and probably will be) overwritten when the cache pulls data from an external source.

    I've loaded my code into the on-chip SRAM region at 0x00800000.  When it executes out of there, the cache controller reads cache lines and stores them somewhere in the 0x00e00000 (L1P) region, which is configured solely as cache, not as general-use SRAM.

    The L2 region at 0x00800000 initially is set to be SRAM.  Our code can reside there, since the cache controller is not using it to store cache lines.  When I set the L2 cache size to 1024K, that entire region becomes cache, and the data in SRAM is no longer valid.  Fetches from anywhere in L2 space cause the cache controller to store cache lines in the L2 region, potentially writing over code that is executing out of that region.

    That makes sense, according to the behavior that I see.  I don't know why I didn't see that before; I suppose I assumed that the cache memory was somewhere else where it wasn't visible to the programmer.

    Potential solutions to my problem, then, are:

    1. Move all of the code/data out of the L2 SRAM.  That's not acceptable, since I have to initialize the DDR3 before I can use it.  The init code, at least, has to reside in the on-board RAM.
    2. Set the L2 cache size to a smaller value, and locate our code/data in the non-cache part of L2 SRAM.  This may be a workable solution.
    3. Put the DDR3 init code in L2 SRAM, and the remaining code in DDR3 memory.  I suppose that the secondary boot loader can handle that.  If so, then that's probably what I'll go with.

    Please confirm whether what I have stated above is correct.  If it is, I'll move on with one of the solutions.

    Thanks for your assistance!

    jw

  • Jeff,

    All of what you said was completely correct.  

    I would suggest option 2, placing the majority of your most-used code into L2 space.  I'd also suggest placing as much of the remaining code as will fit into MSMC SRAM (shared L2 space), if you're not already using it for data or something else; access to it will be faster than to the DDR3.
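
Whether a given cache size leaves room for the code under option 2 can be checked with simple arithmetic, since the cache occupies the end of the 1 MB L2 space at 0x00800000 - 0x008FFFFF. The sketch below uses the .text placement reported earlier (0x850000, 0x35e60 bytes); the function names are hypothetical, and a real check would need to cover every section placed in L2, not just .text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define L2_BASE 0x00800000u
#define L2_SIZE 0x00100000u /* 1 MB of local L2 */

/* First address used as cache; usable SRAM is [L2_BASE, returned value). */
static uint32_t l2_sram_end(uint32_t cache_bytes)
{
    assert(cache_bytes <= L2_SIZE);
    return L2_BASE + L2_SIZE - cache_bytes;
}

/* True if [start, start + size) lies entirely in the non-cache part of L2. */
static bool section_fits_in_l2_sram(uint32_t start, uint32_t size,
                                    uint32_t cache_bytes)
{
    return start >= L2_BASE && start + size <= l2_sram_end(cache_bytes);
}
```

With the numbers from this thread, .text ends at 0x885e60, so it would fit with a 256 KB cache (SRAM ends at 0x8C0000) but not with a 512 KB cache (SRAM ends at 0x880000) unless the section were moved lower.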

    Best Regards,

    Chad