cache coherence when L1D is cacheable on C6670

Other Parts Discussed in Thread: TMS320C6670

Hi all,

I have run into a cache coherence problem when L1D is cacheable on the C6670.

My code is based on SYS/BIOS 6.33, the C6670, and CCSv5.3.

***************************************************************************************************

I want Core0 to write a 256-byte buffer in MSMC SRAM and then notify Core1, which prints the buffer to the CCS console.

**************************************************************************************************

To do this, my code flow is as follows:

1). Define a global buffer MSMC_test

    #define MSMC_test_len 256
    #pragma DATA_ALIGN   (MSMC_test, 64)  // to align with the L1D cache line
    #pragma DATA_SECTION (MSMC_test, ".shareMemotest");
    unsigned char MSMC_test[MSMC_test_len];

    In the linker command (.cmd) file:

    .shareMemotest  load >> MSMCSRAM

2). Before the tasks run, I make L1P and L1D cacheable but not L2:

    CACHE_setL1PSize(CACHE_L1_32KCACHE);
    CACHE_setL1DSize(CACHE_L1_32KCACHE);
    CACHE_setL2Size(CACHE_0KCACHE);

3). Core0 writes MSMC_test[0~255], then write-back-invalidates the buffer in Core0's L1D:

    for(i=0;i<MSMC_test_len;i++)
         MSMC_test[i] = test_cnt;

    CACHE_wbInvL1d ((void *) MSMC_test, MSMC_test_len, CACHE_WAIT);

4). Then Core0 sends a notification to Core1.

5). Then Core1 invalidates the buffer in its own L1D and prints MSMC_test[0~255] to the console:

    CACHE_invL1d((void *) MSMC_test, MSMC_test_len, CACHE_WAIT);

    for(i=0;i<MSMC_test_len;i++)
    {
        System_printf("%d ",MSMC_test[i]);
        if(i%20 == 0)
            System_printf("\n");
    }

**************************************************************************************************

My problem:

1. In step 5), while Core1 prints MSMC_test[0~127], everything is correct: the console output is right and the Memory Browser shows the expected values.

2. But when Core1 reaches MSMC_test[128], the Memory Browser shows MSMC_test[128~191] reverting to the data from the previous write (the first time, Core0 writes 0x01 to MSMC_test[0~255]; the second time, it writes 0x02).

So the console output is wrong.

***************************************************************************************************

Could anyone help me?

Regards,

Feng

  • Feng,

    It looks like your buffer is aligned to an 'odd' 64B boundary, which would result in the first 64B spanning the 2nd half of a 128B cache line, and the last 64B spanning the 1st half of a 3rd 128B cache line.

    Your 256B writeback-invalidate is hitting only the first two cache lines, which cover only the first 192B of data.

    I would suggest aligning your buffer on a 128B boundary, but you could also extend the writeback-invalidate to cover a third cache line.

    EDIT: Actually, that's not on an 'odd' 64B boundary. Something else must be going on.
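The alignment arithmetic behind this first hypothesis can be checked with a short sketch. The helper names are hypothetical; the code only counts which 128-byte L2 lines a byte range touches and how many bytes of the buffer fall inside the first two of them. A 256-byte buffer that starts 64 bytes into a line spans three lines, and the first two cover 192 bytes, whereas a 128-byte-aligned buffer spans exactly two. As the EDIT notes, this was not the cause in this case.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines of size line_size (a power of two) touched by
 * the byte range [addr, addr + len). Assumes len > 0. */
size_t lines_spanned(uintptr_t addr, size_t len, size_t line_size)
{
    uintptr_t mask  = ~(uintptr_t)(line_size - 1);
    uintptr_t first = addr & mask;             /* line holding first byte */
    uintptr_t last  = (addr + len - 1) & mask; /* line holding last byte  */
    return (size_t)((last - first) / line_size) + 1;
}

/* Bytes of the buffer that fall inside its first n cache lines.
 * Assumes n >= 1 and len > 0. */
size_t bytes_in_first_lines(uintptr_t addr, size_t len,
                            size_t line_size, size_t n)
{
    uintptr_t first   = addr & ~(uintptr_t)(line_size - 1);
    uintptr_t end     = first + n * line_size; /* end of the nth line */
    uintptr_t buf_end = addr + len;
    if (end > buf_end)
        end = buf_end;                         /* buffer ends earlier */
    return (size_t)(end - addr);
}
```

For example, a buffer at 0x0C000040 (64 bytes into a 128-byte line) gives `lines_spanned(...) == 3` and `bytes_in_first_lines(..., 2) == 192`.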

    Best Regards,

    Chad

  • Hi Chad,

    ************************************************************************************************

    I changed '#pragma DATA_ALIGN   (MSMC_test, 64)' to '#pragma DATA_ALIGN   (MSMC_test, 128)' as you suggested, but the problem remains.

    ************************************************************************************************

    From what I understand, the L2 cache line is 128 bytes, not 128 bits.

    To make my post clearer, here is a summary of my code flow:

    1) Core0 writes the 256-byte buffer in MSMC.

    2) Core0 write-back-invalidates its L1D.

    3) Core0 sends an interrupt to Core1.

    4) Core1 receives the interrupt and invalidates its L1D.

    5) Core1 prints buffer[0~255].

    My problem is as follows:

    When Core1 starts to print the 256-byte buffer, the Memory Browser shows the data correctly whether or not the L1D/L2 cache boxes are selected.

    But when Core1 prints buffer[128], the Memory Browser shows buffer[128~191] as incorrect with the L1D/L2 cache boxes selected. At the same time, if I deselect the L1D/L2 cache boxes, buffer[128~191] is correct.

    And when Core1 prints buffer[192], the Memory Browser shows buffer[192~255] as incorrect with the L1D/L2 cache boxes selected. At the same time, if I deselect the L1D/L2 cache boxes, buffer[192~255] is correct.

    ************************************************************************************************

    I searched some old threads and found the solution.

    When I invalidate the XMC's prefetch buffer before Core1 reads the MSMC, everything works correctly.

    This is also mentioned in the CorePac datasheet, which also notes that invalidating the prefetch buffer degrades the performance of reads and writes to MSMC/DDR3.
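A toy model can make this failure mode easier to picture. The sketch below is hypothetical and deliberately simplified (a one-entry buffer that copies the next line on every miss); it is not the real XMC prefetcher. It only illustrates why invalidating L1D alone may not be enough: data held in a separate prefetch buffer survives the L1D invalidate, so a later read can return the old value until that buffer is invalidated too.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_LINE 64u   /* toy line size, chosen for illustration only */
#define TOY_MEM  256u  /* size of the toy shared buffer */

/* Hypothetical toy model, NOT the real XMC: a one-entry prefetch
 * buffer that, on every miss, copies the line after the one being read. */
typedef struct {
    bool     valid;
    uint32_t line;             /* start offset of the buffered line */
    uint8_t  data[TOY_LINE];
} toy_prefetch_t;

uint8_t toy_mem[TOY_MEM];      /* stands in for the MSMC buffer */

uint8_t toy_read(toy_prefetch_t *pf, uint32_t addr)
{
    uint32_t line = addr & ~(TOY_LINE - 1u);
    if (pf->valid && pf->line == line)
        return pf->data[addr - line];          /* hit: may be stale */

    /* Miss: prefetch the following line (if any), then read memory. */
    uint32_t next = line + TOY_LINE;
    if (next < TOY_MEM) {
        pf->valid = true;
        pf->line  = next;
        memcpy(pf->data, &toy_mem[next], TOY_LINE);
    }
    return toy_mem[addr];
}

/* Similar in spirit to writing 1 to XPFCMD: drop the buffered line. */
void toy_invalidate(toy_prefetch_t *pf)
{
    pf->valid = false;
}
```

In this model, a read at offset 64 prefetches the line at offset 128 while it still holds 0x01; after the memory is overwritten with 0x02, a read at offset 128 still returns 0x01 until `toy_invalidate()` runs.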

    But I do not want to degrade the performance of accessing external memory.

    So do you have any other suggestions?

    ************************************************************************************************

    Another question:

    In my setup, Core0 and Core1 run the same code, and DNUM is used to tell them apart. I define MSMC_test as follows:

      #define     MSMC_test_len 256

      #pragma   DATA_ALIGN   (MSMC_test, 256)  // to align with the L1D cache line

      #pragma   DATA_SECTION (MSMC_test, ".shareMemotest");

      unsigned char MSMC_test[MSMC_test_len];

    Since both Core0 and Core1 run the code above, I think Core0 has its own MSMC_test[0~255] and Core1 also has one.

    Do these two MSMC_test[0~255] arrays (one for Core0 and one for Core1) refer to the same physical location in the SoC?
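One way to reason about this question is through the C66x address map. Assuming the default XMC/MPAX mapping, MSMC SRAM at 0x0C00 0000 is a single memory shared by all cores, so an object the linker places in MSMCSRAM already has a system-wide address; when the same image runs on both cores, both refer to the same bytes. Only core-local L1/L2 RAM (0x0080 0000 to 0x00FF FFFF) is private, with a per-core global alias at 0x1N80 0000 for core N. The sketch below, with a hypothetical helper name, expresses that rule; verify the ranges against the C6670 data manual before relying on them.

```c
#include <stdint.h>

/* Assumed KeyStone I (C66x) ranges: core-local L1/L2 RAM occupies
 * 0x0080_0000-0x00FF_FFFF on every core. */
#define LOCAL_RAM_START 0x00800000u
#define LOCAL_RAM_END   0x01000000u   /* exclusive */

/* Convert an address as seen by core `core` into a system-wide address.
 * Core-local addresses get the per-core alias 0x1N......; everything
 * else (MSMC SRAM, DDR3, peripherals) is returned unchanged because it
 * is already global under the default mapping (assumption). */
uint32_t to_global_addr(uint32_t addr, unsigned core)
{
    if (addr >= LOCAL_RAM_START && addr < LOCAL_RAM_END)
        return 0x10000000u | ((uint32_t)core << 24) | addr;
    return addr;
}
```

Under these assumptions, a local L2 address such as 0x0080 0000 maps to different global addresses on Core0 and Core1, while an MSMC address such as 0x0C00 0000 maps to itself on both.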

    ************************************************************************************************

    Regards,

    Feng

  • Feng,

    You're correct; I meant bytes and wrote bits by mistake. I have corrected that statement. Also, I looked again, and it's not on an 'odd' 64B boundary as I suspected.

    Question: is Core 0 still running and writing while Core 1 is active?

    Can you uncheck the L1D cache box and see if the 0x01010101 values are in L2?

    Can you paste the actual code for this routine here?

    Best Regards,

    Chad

  • Hi Chad,

    Sorry for the late reply!

    "Is Core 0 still running and writing while Core 1 is active?"

    No. While Core1 reads MSMC_test[0~255], Core0 does not write to it; it just runs in the while() loop.

    My actual code is as follows:

    CACHE_setL1PSize(CACHE_L1_32KCACHE);
    CACHE_setL1DSize(CACHE_L1_32KCACHE);
    CACHE_setL2Size(CACHE_1024KCACHE);
    CACHE_invAllL1p(CACHE_WAIT);
    CACHE_wbInvAllL1d(CACHE_WAIT);
    CACHE_wbInvAllL2(CACHE_WAIT);	
    

    Then

    else if(packet_choose == 4 && CoreNum ==0)
    {
        //Core0 starts
        //JF: Core0 writes the MSMC_test[256]
        for(i=0;i<MSMC_test_len;i++)
            MSMC_test[i] = test_cnt;//for test purpose

        //JF: write-back-invalidate core0's L1D and L2
        CACHE_wbInvL1d ((void *) MSMC_test, MSMC_test_len, CACHE_WAIT);
        CACHE_wbInvL2 ((void *) MSMC_test, MSMC_test_len, CACHE_WAIT);

        //JF: Then the Core0 sends a doorbell to the SRIO switch and back to the DSP itself.
        //JF: Then the doorbell triggers the Core1 doorbell interrupt service routine
        if (SRIO_Func (hDrvManagedSrioDrv, Srio_Ftype_DOORBELL, 0) < 0)
        {
            System_printf ("Error: Doorbell operation failed\n");
            Task_exit();
        }
        test_cnt++;//for test purpose
        //JF: Core0 stops
        packet_choose = 5;
    }
    else if(CoreNum ==1)
    {
        //Core1 starts
        //JF: In the Doorbell interrupt service routine, the core1_print_flag is set
        if(core1_print_flag == 1)
        {
            //JF: Invalidate core1's L1D and L2
            CACHE_invL1d((void *) MSMC_test, 2048, CACHE_WAIT);
            CACHE_invL2((void *) MSMC_test, 2048, CACHE_WAIT);
            //JF: Invalidate core1's Prefetch Buffer
            XPFCMD = 1;//if no this, the problem what I describe in the above posts will come
            //JF: Core1 prints
            for(i=0;i<MSMC_test_len;i++)
            {
                System_printf("%d ",MSMC_test[i]);
                if(i%20 == 0)
                    System_printf("\n");
            }
            System_printf("\n");
            //JF: Core1 stops
            core1_print_flag = 0;
        }
    }

    Core0 writes MSMC_test, then sends a doorbell out through the IDT SRIO switch and back to the DSP itself, which triggers Core1's doorbell interrupt service routine. It sounds odd, but I do it this way for test purposes.

    The key part is:

    //JF: Invalidate core1's Prefetch Buffer, see the CorePac datasheet
    XPFCMD = 1;//if no this, the problem what I describe in the above posts will come

    With this code in place, the problem I described in my earlier posts goes away.
    XPFCMD is defined at the top of the .c file.

    Now I would like to understand exactly what is going on.
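The definition of XPFCMD is not shown in the post. A common way to define a memory-mapped register in C is a volatile pointer dereference, sketched below. The address 0x0800 0300 and the meaning of bit 0 (INV) are assumptions based on the C66x CorePac XMC register map and should be checked against the CorePac user guide; `xmc_invalidate_prefetch` is a hypothetical helper that takes the register pointer as a parameter so it can be exercised off-target.

```c
#include <stdint.h>

/* Assumed XMC prefetch command register address on C66x (verify). */
#define XPFCMD_ADDR 0x08000300u
/* Assumed meaning of bit 0 (INV): write 1 to invalidate the buffer. */
#define XPFCMD_INV  0x1u

/* Typical register macro, so `XPFCMD = 1;` compiles on the target as in
 * the post. Dereferencing it on a host PC would fault. */
#define XPFCMD (*(volatile uint32_t *)XPFCMD_ADDR)

/* Write the invalidate command to the given register. On the DSP this
 * would be called as xmc_invalidate_prefetch(&XPFCMD); in a host test,
 * a pointer to an ordinary variable stands in for the register. */
void xmc_invalidate_prefetch(volatile uint32_t *xpfcmd)
{
    *xpfcmd = XPFCMD_INV;
}
```

Passing the register pointer in, rather than hard-coding the macro inside the helper, is only a testability choice; on the target both forms perform the same single volatile write.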

    Regards,
    Feng

     

  • Feng,

    It looks like you're running into the issue described in Advisory 33 of the TMS320C6670 errata. The workaround is to disable prefetching for the MAR range 0x0C00 0000 - 0x0FFF FFFF (i.e., effectively the MSMC range).

    You can do this by clearing the PFX bit of MARs 12-15 (it is 1 by default). This disables prefetching for those addresses.
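The MAR change can be sketched as plain bit manipulation. Each MAR controls one 16 MB region, so the MAR index for an address is its top byte, and 0x0C00 0000 - 0x0FFF FFFF maps to MARs 12-15. The layout used below (bit 0 = PC, bit 3 = PFX, MAR0 at 0x0184 8000) reflects the C66x CorePac register map as commonly documented, but treat it as an assumption to verify; `mar_disable_prefetch` is a hypothetical helper that takes the MAR array base as a parameter so it can be tested off-target.

```c
#include <stdint.h>

/* Assumed C66x MAR layout: one 32-bit MAR per 16 MB region. */
#define MAR_BASE_ADDR 0x01848000u   /* MAR0 on the target (assumption) */
#define MAR_PC        (1u << 0)     /* region is cacheable */
#define MAR_PFX       (1u << 3)     /* region is prefetchable */

/* MAR index controlling a 32-bit address: one MAR per 16 MB. */
unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> 24);
}

/* Clear PFX for every MAR covering [start, end] (inclusive), leaving
 * the other bits, including PC, unchanged. */
void mar_disable_prefetch(volatile uint32_t *mar, uint32_t start, uint32_t end)
{
    for (unsigned i = mar_index(start); i <= mar_index(end); i++)
        mar[i] &= ~MAR_PFX;
}
```

On the target, the call would look like `mar_disable_prefetch((volatile uint32_t *)MAR_BASE_ADDR, 0x0C000000u, 0x0FFFFFFFu);`, which touches MARs 12 through 15 and keeps the region cacheable in L1D.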

    Best Regards,

    Chad

  • Thanks Chad, I checked that. It makes everything clear!

    Regards,

    Feng