Cache coherency operations do not appear to operate properly on C6670 in TMDXEVM6670LE

Other Parts Discussed in Thread: TMS320C6670

Hi

While writing a network driver for LwIP, I discovered that the cache coherency operations sometimes do not work correctly, specifically CACHE_invL2.  Attached is a simple program that demonstrates the issue using an EDMA transfer. (There also appears to be an issue where the IPR for the EDMA3 is never set when the TCC is 16 or larger, whereas a TCC value of 15 or lower does set the IPR upon transfer completion.)

The following #define macros in the program listing control which operations are used:

1)  The DEFAULT_CACHE_WAIT macro controls the wait parameter passed to the CSL CACHE_xxx operations when USE_ERRATA_CACHE_FUNCTIONS is defined.  It overrides the wait parameter supplied by the caller.

#define DEFAULT_CACHE_WAIT CACHE_WAIT

2)  For some reason, an EDMA3 TCC code above 15 does not show up in the IPR (Interrupt Pending Register).

#define EDMA_TCC_DESIRED  15

3)  Define the following macro to manually widen the CACHE_invL2 range to a whole (128-byte) cache line instead of the 32 bytes that were transferred.

#define DO_CACHE_INVALID_FULL_LINE

4)  Define USE_ERRATA_CACHE_FUNCTIONS to wrap the cache coherence operations between _disable_interrupts() and (MFENCE, 16x NOP, _restore_interrupts()), as suggested in SPRZ332.pdf (TMS320C6670 Errata, Silicon Revision 1.0).

#define USE_ERRATA_CACHE_FUNCTIONS
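
For reference on item 2: the EDMA3 has 64 transfer completion codes, with TCC 0 to 31 reported in IPR and TCC 32 to 63 in IPRH, so a TCC of 16 would still be expected to appear in the low register. The host-side sketch below only illustrates that mapping. `tcc_bit_t` and `tcc_pending_bit` are hypothetical names chosen for this example and are not part of the CSL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type (not part of the CSL): which pending register
 * reports a given TCC, and the bit mask to test in that register. */
typedef struct {
    int      high;  /* 0 = IPR (TCC 0..31), 1 = IPRH (TCC 32..63) */
    uint32_t mask;  /* single bit set for this TCC */
} tcc_bit_t;

/* Map a transfer completion code (0..63) to its pending-register bit. */
static tcc_bit_t tcc_pending_bit(unsigned tcc)
{
    tcc_bit_t r;
    assert(tcc < 64u);                      /* EDMA3 has 64 completion codes */
    r.high = (tcc >= 32u) ? 1 : 0;          /* upper half lives in IPRH */
    r.mask = (uint32_t)1u << (tcc & 31u);   /* bit position within the register */
    return r;
}
```

Under this mapping, TCC 15 corresponds to mask 0x00008000 and TCC 16 to mask 0x00010000, both in IPR, which is the register the polling loop in main.c reads before testing `1 << EDMA_TCC_DESIRED`.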

=====================

The program first sets up the caches and makes all 512 MB of the onboard DDR3 cacheable.  It then uses memset to zero 512 * 32 bytes starting at 0x80000200, followed by a wbInvL2 operation on that range.  In each iteration of the main loop, it prepares an EDMA3 transfer, changes the data at the source address (32 bytes at 0x90000000, the upper half of DDR3), performs a wbL2 operation on the source memory, and then issues the SET command on the EDMA3 channel.  It waits for the EDMA3 transfer to complete by monitoring the IPR register (see the note above regarding the TCC code), then issues an invL2 operation (adjustable with the macros described above, so that either the expected 32-byte range or the whole L2 cache line is invalidated) and reads the destination address.  If the value read at the destination address is not the expected value, it outputs two lines of text (one from a printf and one from the ASSERT macro).  Then it continues with the next iteration of the loop.
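
The address arithmetic described above can be checked on a host PC. The sketch below assumes the 128-byte L2 line size implied by the program's `& 0xFFFFFF80` mask. The names `dst_addr`, `l2_line_base` and `same_l2_line` are illustrative only and are not CSL functions. It shows that four consecutive 32-byte destination buffers fall into one L2 line, so invalidating a full line also covers the neighbouring buffers.

```c
#include <assert.h>
#include <stdint.h>

#define DST_BASE      0x80000200u  /* first destination buffer (from main.c) */
#define NUM_BUFS      512u         /* loop iterations in main.c */
#define L2_LINE_BYTES 128u         /* assumed L2 line size, matches the 0xFFFFFF80 mask */

/* Destination address used for loop iteration `index` (32 bytes per buffer). */
static uint32_t dst_addr(uint32_t index)
{
    assert(index < NUM_BUFS);
    return DST_BASE + (index << 5);   /* index * 32 */
}

/* Start address of the L2 cache line that contains `addr`. */
static uint32_t l2_line_base(uint32_t addr)
{
    return addr & ~(L2_LINE_BYTES - 1u);
}

/* Non-zero when two loop iterations write into the same L2 line. */
static int same_l2_line(uint32_t index_a, uint32_t index_b)
{
    return l2_line_base(dst_addr(index_a)) == l2_line_base(dst_addr(index_b));
}
```

For example, `dst_addr(8)` evaluates to 0x80000300, which matches the address printed for index 8 in the program output, and iterations 0 to 3 all map to the line starting at 0x80000200.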

-------------

The expected program output is empty, since nothing should fail.

============

However, a typical run on our EVM (we tested on 3 EVM boards) is shown below. Notice that the index jumped from 23 back to 8 and again from 27 back to 8, even though, according to the linker command file (shown below), the index is located in Core1 L2. (When we run the program on CORE0, all other cores are suspended.)  More importantly, the program got incorrect values from the destination DDR3 address.

When we step through the loop, we see that immediately after the EDMA transfer SET command, DDR3 (with L1D Cache and L2 Cache unchecked in the Memory viewer) shows the correct value.  After checking the L1D and L2 cache check boxes and stepping over the CACHE_invL2 line, we see that the destination address range is NOT in cache, and the correct value is still in DDR3.  But as soon as the destination address is accessed (presumably causing a cache miss) by the statement "if (words->WORD8[6] != index + 0x60) {", the Memory viewer shows that the destination address range is in BOTH L1D and L2, but the value is all 0 (the value left by the initial memset of 512 * 32 bytes at 0x80000200).  When the L1D and L2 check boxes are unchecked, DDR3 still shows the correct/expected value.

>>>>>>>>>>>>>  WHAT can we do to WORK AROUND the issue if what's in the errata (SPRZ332.pdf) is not sufficient?

____ TYPICAL PROGRAM OUTPUT ____

[C66xx_0] At addr 80000200 + 6, index 0, expected 00000060, got 6935D6B5
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000220 + 6, index 1, expected 00000061, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000300 + 6, index 8, expected 00000068, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000320 + 6, index 9, expected 00000069, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000340 + 6, index 10, expected 0000006A, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000360 + 6, index 11, expected 0000006B, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000380 + 6, index 12, expected 0000006C, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 800003A0 + 6, index 13, expected 0000006D, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 800003C0 + 6, index 14, expected 0000006E, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 800003E0 + 6, index 15, expected 0000006F, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000400 + 6, index 16, expected 00000070, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000420 + 6, index 17, expected 00000071, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000440 + 6, index 18, expected 00000072, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000460 + 6, index 19, expected 00000073, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000480 + 6, index 20, expected 00000074, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 800004A0 + 6, index 21, expected 00000075, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 800004C0 + 6, index 22, expected 00000076, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 800004E0 + 6, index 23, expected 00000077, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000500 + 6, index 24, expected 00000078, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000520 + 6, index 25, expected 00000079, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000540 + 6, index 26, expected 0000007A, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000560 + 6, index 27, expected 0000007B, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 313 in ../main.c
[C66xx_0] At addr 80000300 + 6, index 8, expected 00000068, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000320 + 6, index 9, expected 00000069, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000340 + 6, index 10, expected 0000006A, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000360 + 6, index 11, expected 0000006B, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000380 + 6, index 12, expected 0000006C, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800003A0 + 6, index 13, expected 0000006D, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800003C0 + 6, index 14, expected 0000006E, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800003E0 + 6, index 15, expected 0000006F, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000400 + 6, index 16, expected 00000070, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000420 + 6, index 17, expected 00000071, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000440 + 6, index 18, expected 00000072, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000460 + 6, index 19, expected 00000073, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000480 + 6, index 20, expected 00000074, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800004A0 + 6, index 21, expected 00000075, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800004C0 + 6, index 22, expected 00000076, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800004E0 + 6, index 23, expected 00000077, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000300 + 6, index 8, expected 00000068, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000320 + 6, index 9, expected 00000069, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000300 + 6, index 8, expected 00000068, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000320 + 6, index 9, expected 00000069, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000340 + 6, index 10, expected 0000006A, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000360 + 6, index 11, expected 0000006B, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000380 + 6, index 12, expected 0000006C, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800003A0 + 6, index 13, expected 0000006D, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800003C0 + 6, index 14, expected 0000006E, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800003E0 + 6, index 15, expected 0000006F, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000400 + 6, index 16, expected 00000070, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000420 + 6, index 17, expected 00000071, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000440 + 6, index 18, expected 00000072, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000460 + 6, index 19, expected 00000073, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000480 + 6, index 20, expected 00000074, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800004A0 + 6, index 21, expected 00000075, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800004C0 + 6, index 22, expected 00000076, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 800004E0 + 6, index 23, expected 00000077, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000500 + 6, index 24, expected 00000078, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000520 + 6, index 25, expected 00000079, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000540 + 6, index 26, expected 0000007A, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000560 + 6, index 27, expected 0000007B, got 00000000
[C66xx_0] Assertion "Valid value" failed at line 305 in ../main.c
[C66xx_0] At addr 80000580 + 6, index 28, expected 0000007C, got 00000000

-------------

Linker command file:

------------

-stack          0x00002000      /* Stack Size */
-heap           0x0000A000      /* Heap Size */

MEMORY
{
    BOOTLDR_RSV:    o = 0x10800000  l = 0x00001000
    CORE0_L2:        o = 0x10801000  l = 0x000BF000
    CORE1_L2:        o = 0x11800000  l = 0x000C0000
    DDR2:            o = 0x80000100  l = 0x0FFFFF00  // Exclude 0x90000000 because we are using it for MEMLOG
}

SECTIONS
{
    .boot         load = 0x00800000
    .bss        >   CORE0_L2 fill = 0x00, align = 8
    .cinit      >   CORE0_L2, align = 8
    .cio        >   CORE0_L2, align = 8
    .const      >   CORE0_L2, align = 8
      .far        >   CORE0_L2 fill = 0x00, align = 8
    .neardata   >   CORE0_L2 fill = 0x00, align = 8
    .fardata    >   CORE0_L2 fill = 0x00, align = 8
    .rodata        >   CORE0_L2, align = 8
    .stack      >   CORE1_L2, align = 8
    .switch     >   CORE0_L2, align = 8
    .sysmem     >   CORE1_L2, align = 8
    .text       >   CORE0_L2, align = 8
}

--------

main.c attached as main.c.txt

#include <ti/csl/src/intc/csl_intc.h>
#include <ti/csl/csl_edma3.h>
#include <ti/csl/csl_edma3Aux.h>
#include <ti/csl/csl_cache.h>
#include <ti/csl/csl_cacheAux.h>
#include <ti/csl/cslr_device.h>

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define DEFAULT_CACHE_WAIT CACHE_WAIT

#define CACHE_PRE_OP_STEPS \
unsigned intStatus = _disable_interrupts();

#define CACHE_POST_OP_STEPS \
asm(" MFENCE "); \
asm(" NOP "); asm(" NOP "); asm(" NOP "); asm(" NOP "); \
asm(" NOP "); asm(" NOP "); asm(" NOP "); asm(" NOP "); \
asm(" NOP "); asm(" NOP "); asm(" NOP "); asm(" NOP "); \
asm(" NOP "); asm(" NOP "); asm(" NOP "); asm(" NOP "); \
_restore_interrupts(intStatus);

#define ERRATA_CACHE_invAllL1dWait()

#define ERRATA_CACHE_invAllL1d(wait) { \
CACHE_PRE_OP_STEPS \
CACHE_invAllL1d(DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_wbAllL1dWait()

#define ERRATA_CACHE_wbAllL1d(wait) { \
CACHE_PRE_OP_STEPS \
CACHE_wbAllL1d(DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_wbInvAllL1dWait()

#define ERRATA_CACHE_wbInvAllL1d(wait) { \
CACHE_PRE_OP_STEPS \
CACHE_wbInvAllL1d(DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_invL1dWait()

#define ERRATA_CACHE_invL1d(block,size,wait) { \
CACHE_PRE_OP_STEPS \
CACHE_invL1d(block,size,DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_wbL1dWait()

#define ERRATA_CACHE_wbL1d(block,size,wait) { \
CACHE_PRE_OP_STEPS \
CACHE_wbL1d(block,size,DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_wbInvL1dWait()

#define ERRATA_CACHE_wbInvL1d(block,size,wait) { \
CACHE_PRE_OP_STEPS \
CACHE_wbInvL1d(block,size,DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_wbL2Wait()

#define ERRATA_CACHE_wbL2(block,size,wait) { \
CACHE_PRE_OP_STEPS \
CACHE_wbL2(block,size,DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_invL2Wait()

#define ERRATA_CACHE_invL2(block,size,wait) { \
CACHE_PRE_OP_STEPS \
CACHE_invL2(block,size,DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_wbInvL2Wait()

#define ERRATA_CACHE_wbInvL2(block,size,wait) { \
CACHE_PRE_OP_STEPS \
CACHE_wbInvL2(block,size,DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_wbAllL2Wait()

#define ERRATA_CACHE_wbAllL2(wait) { \
CACHE_PRE_OP_STEPS \
CACHE_wbAllL2(DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_invAllL2Wait()

#define ERRATA_CACHE_invAllL2(wait) { \
CACHE_PRE_OP_STEPS \
CACHE_invAllL2(DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define ERRATA_CACHE_wbInvAllL2Wait()

#define ERRATA_CACHE_wbInvAllL2(wait) { \
CACHE_PRE_OP_STEPS \
CACHE_wbInvAllL2(DEFAULT_CACHE_WAIT); \
CACHE_POST_OP_STEPS \
}

#define PLATFORM_ASSERT(x) do {printf("Assertion \"%s\" failed at line %d in %s\n", x, __LINE__, __FILE__); *(int*)0x00000000 = 0;} while(0)

#define ASSERT(message, assertion) do { if(!(assertion)) PLATFORM_ASSERT(message); } while(0)

typedef struct _wordstruct {
	unsigned WORD8[8];
} wordstruct;

static void ti_cache_init() {
	unsigned char mar;
	CACHE_setL1PSize(CACHE_L1_32KCACHE);		//	set L1 to max					// SPRU871J 3.4.3.1
	CACHE_setL1DSize(CACHE_L1_32KCACHE);		//	set L1 to max					// SPRU871J 3.4.3.1

	// MAR0 - MAR15 are READ-ONLY
	// MAR16 - MAR127 are for addresses from 0x10000000 - 0x7FFFFFFF
	for (mar = 16; mar < 128; mar++)
		CACHE_disableCaching(mar);
	// 512 MB = 0x20000000, correspond to MAR128 to MAR159: 0x80000000 - 0x9FFFFFFF
	for (; mar < 160; mar++)
		CACHE_enableCaching(mar);
	// Disable ALL OTHER addresses
	for (; mar < 255; mar++)
		CACHE_disableCaching(mar);
	CACHE_disableCaching(mar);
	CACHE_setL2Size(CACHE_256KCACHE);			// L2 cache size to 256k (maximum)	// SPRU871J 4.4.5
}

// For some reason, TCC code above 15 DOES NOT SHOW UP IN IPR (Interrupt Pending Register)
#define EDMA_TCC_DESIRED  15

// UNCOMMENT THE FOLLOWING LINE IF MANUALLY ADJUST CACHE_invL2 to CACHE LINE ONLY
#define DO_CACHE_INVALID_FULL_LINE

#define USE_ERRATA_CACHE_FUNCTIONS

void main(void) {
	CSL_Edma3Handle dmaHandle;
	CSL_Edma3Obj edmaObj;
	CSL_Edma3HwSetup setup;
    CSL_Edma3HwDmaChannelSetup dmahwSetup[64];  // 64 DMA + 8 QDMA for tpcc1 and tpcc2
    CSL_Edma3HwQdmaChannelSetup qdmahwSetup[8];
    CSL_Edma3ChannelAttr chAttr;
    CSL_Edma3ChannelObj chObj;
    CSL_Edma3ChannelHandle hChannel;
    CSL_Edma3ParamHandle hParam;
    CSL_Edma3ParamSetup paramSetup;
    CSL_Edma3ChannelErr chErr;
    CSL_InstNum edmaInstance = 0;
    wordstruct *words;

	CSL_Status cslStatus;
	unsigned index, u32val, ddr3Addr;

	ti_cache_init();

	cslStatus = CSL_edma3Init(NULL);
	ASSERT("CSL_edma3Init returns CSL_SOK", CSL_SOK == cslStatus);

	dmaHandle = CSL_edma3Open(&edmaObj, edmaInstance, NULL, &cslStatus);
	ASSERT("CSL_edma3Open returns CSL_SOK", CSL_SOK == cslStatus && dmaHandle != NULL);

	setup.dmaChaSetup = dmahwSetup;
	setup.qdmaChaSetup = qdmahwSetup;

	// 512 param for 64 DMA + 8 QDMA => 7 param per
	u32val = 0;
	for (index = 0; index < 64; index++) {
		dmahwSetup[index].que = index & 7;
		dmahwSetup[index].paramNum = u32val;
		u32val += 7;
	}
	for (index = 0; index < 8; index++) {
		qdmahwSetup[index].que = index & 7;
		qdmahwSetup[index].paramNum = u32val;
		qdmahwSetup[index].triggerWord = 7;
		u32val += 7;
	}
	cslStatus = CSL_edma3HwSetup(dmaHandle, &setup);
	ASSERT("CSL_edma3HwSetup returns CSL_SOK", CSL_SOK == cslStatus);

	chAttr.regionNum = CSL_EDMA3_REGION_GLOBAL;
	chAttr.chaNum = 0;
	hChannel = CSL_edma3ChannelOpen(&chObj, edmaInstance, &chAttr, &cslStatus);
	ASSERT("CSL_edma3ChannelOpen returns CSL_SOK", CSL_SOK == cslStatus && hChannel != NULL);

	hParam = CSL_edma3GetParamHandle(hChannel, 0, &cslStatus);
	ASSERT("CSL_edma3GetParamHandle returns CSL_SOK", CSL_SOK == cslStatus && hParam != NULL);

	cslStatus = CSL_edma3HwChannelControl(hChannel, CSL_EDMA3_CMD_CHANNEL_CLEAR , NULL);
	ASSERT("CSL_edma3HwChannelControl(CLEAR) returns CSL_SOK", CSL_SOK == cslStatus);

	cslStatus = CSL_edma3HwChannelControl(hChannel, CSL_EDMA3_CMD_CHANNEL_CLEARERR, &chErr);
	ASSERT("CSL_edma3HwChannelControl(CLEARERR) returns CSL_SOK", CSL_SOK == cslStatus);

	cslStatus = CSL_edma3HwChannelControl(hChannel, CSL_EDMA3_CMD_CHANNEL_ENABLE, NULL);
	ASSERT("CSL_edma3HwChannelControl(ENABLE) returns CSL_SOK", CSL_SOK == cslStatus);

	CSL_edma3ClearLoPendingInterrupts(dmaHandle, CSL_EDMA3_REGION_GLOBAL, 0xFFFFFFFF);
	CSL_edma3ClearHiPendingInterrupts(dmaHandle, CSL_EDMA3_REGION_GLOBAL, 0xFFFFFFFF);

	CSL_edma3InterruptLoDisable(dmaHandle, CSL_EDMA3_REGION_GLOBAL, 0xFFFFFFFF);
	CSL_edma3InterruptHiDisable(dmaHandle, CSL_EDMA3_REGION_GLOBAL, 0xFFFFFFFF);

	memset((void*)0x80000200, 0, 512 << 5);
#ifdef USE_ERRATA_CACHE_FUNCTIONS
	ERRATA_CACHE_wbInvL2((void*)0x80000200, 512 << 5, CACHE_WAIT);
#else
	CACHE_wbInvL2((void*)0x80000200, 512 << 5, CACHE_WAIT);
#endif

	for (index = 0; index < 512; index++) {
		ddr3Addr = 0x80000200 + (index << 5);
		paramSetup.option = CSL_EDMA3_OPT_MAKE(CSL_EDMA3_ITCCH_DIS,
				CSL_EDMA3_TCCH_DIS,
				CSL_EDMA3_ITCINT_DIS,
				CSL_EDMA3_TCINT_EN,
				EDMA_TCC_DESIRED,
				CSL_EDMA3_TCC_NORMAL,
				CSL_EDMA3_FIFOWIDTH_NONE,
				CSL_EDMA3_STATIC_DIS,
				CSL_EDMA3_SYNC_A,
				CSL_EDMA3_ADDRMODE_INCR,
				CSL_EDMA3_ADDRMODE_INCR);
		paramSetup.srcAddr = 0x90000000;
		paramSetup.aCntbCnt = CSL_EDMA3_CNT_MAKE(32,1);
		paramSetup.dstAddr = ddr3Addr;
		paramSetup.srcDstBidx = CSL_EDMA3_BIDX_MAKE(1,1);
		paramSetup.linkBcntrld = CSL_EDMA3_LINKBCNTRLD_MAKE(CSL_EDMA3_LINK_NULL,0);
		paramSetup.srcDstCidx = CSL_EDMA3_CIDX_MAKE(0,1);
		paramSetup.cCnt = 1;
		cslStatus = CSL_edma3ParamSetup(hParam, &paramSetup);
		ASSERT("CSL_edma3ParamSetup returns CSL_SOK", CSL_SOK == cslStatus);

		words = (wordstruct*)0x90000000;
		words->WORD8[0] = (index << 16) | index;
		words->WORD8[1] = index + 0x10;
		words->WORD8[2] = index + 0x20;
		words->WORD8[3] = index + 0x30;
		words->WORD8[4] = index + 0x40;
		words->WORD8[5] = index + 0x50;
		words->WORD8[6] = index + 0x60;
		words->WORD8[7] = index + 0x70;
#ifdef USE_ERRATA_CACHE_FUNCTIONS
		ERRATA_CACHE_wbL2((void*)0x90000000, 0x20, CACHE_WAIT);
#else
		CACHE_wbL2((void*)0x90000000, 0x20, CACHE_WAIT);
#endif

		cslStatus = CSL_edma3HwChannelControl(hChannel, CSL_EDMA3_CMD_CHANNEL_SET, NULL);
		ASSERT("CSL_edma3HwChannelControl(SET) returns CSL_SOK", CSL_SOK == cslStatus);

		do {
			asm(" NOP ");
			asm(" NOP ");
			asm(" NOP ");
			asm(" NOP ");
			asm(" NOP ");
			asm(" NOP ");
			asm(" NOP ");
			asm(" NOP ");
			CSL_edma3GetLoPendingInterrupts(dmaHandle, CSL_EDMA3_REGION_GLOBAL, &u32val);
			if ((u32val & (1 << EDMA_TCC_DESIRED)) != 0) {
				CSL_edma3ClearLoPendingInterrupts(dmaHandle, CSL_EDMA3_REGION_GLOBAL, 0xFFFFFFFF);
				break;
			}
		} while (1); // Transfer not yet completed

		words = (wordstruct*)ddr3Addr;
#ifdef DO_CACHE_INVALID_FULL_LINE
	#ifdef USE_ERRATA_CACHE_FUNCTIONS
		ERRATA_CACHE_invL2((void*)(ddr3Addr & 0xFFFFFF80), 0x80, CACHE_WAIT); // Invalidate 128 bytes
	#else
		CACHE_invL2((void*)(ddr3Addr & 0xFFFFFF80), 0x80, CACHE_WAIT); // Invalidate 128 bytes
	#endif
#else
	#ifdef USE_ERRATA_CACHE_FUNCTIONS
		ERRATA_CACHE_invL2((void*)ddr3Addr , 32, CACHE_WAIT); // Invalidate 32 bytes
	#else
		CACHE_invL2((void*)ddr3Addr , 32, CACHE_WAIT); // Invalidate 32 bytes
	#endif
#endif

		if (words->WORD8[6] != index + 0x60) {
			printf("At addr %08X + 6, index %u, expected %08X, got %08X\n", ddr3Addr, index, index + 0x60, words->WORD8[6]);
			ASSERT("Valid value", words->WORD8[6] == index + 0x60);
		}
	}

	cslStatus = CSL_edma3Close(dmaHandle);
	ASSERT("CSL_edma3Close returns CSL_SOK", CSL_SOK == cslStatus);
}
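
As a cross-check of the MAR loop bounds in ti_cache_init: each MAR register covers a 16 MB region, so the MAR index for an address is its top 8 bits. The host-side sketch below uses hypothetical helper names, `mar_index` and `mar_region_base`, which are not part of the CSL.

```c
#include <assert.h>
#include <stdint.h>

#define MAR_REGION_SHIFT 24u   /* each MAR covers 2^24 bytes = 16 MB */
#define MAR_COUNT        256u  /* MAR0..MAR255 span the 32-bit address space */

/* Index of the MAR register that controls cacheability of `addr`. */
static unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_REGION_SHIFT);
}

/* First address covered by MAR register number `mar`. */
static uint32_t mar_region_base(unsigned mar)
{
    assert(mar < MAR_COUNT);
    return (uint32_t)mar << MAR_REGION_SHIFT;
}
```

With these helpers, 0x80000000 maps to MAR128 and 0x9FFFFFFF to MAR159, which agrees with the range enabled in ti_cache_init. The EDMA source buffer at 0x90000000 maps to MAR144, inside that cacheable range.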


  • Hi,

    I tried moving the invL2 operation before triggering the EDMA. The result is the same, except that the first error is reported at index=8.

    Before reading words->WORD8[6], the value in DDR3 is correct and the L1D and L2 caches are both invalidated. But after the read request, the value in DDR3 is unchanged while the cache line value becomes all zeros. According to the cache mechanism, the value in DDR3 should be fetched into the L2 and L1D caches in this situation.

    I would also like to know the reason. Thanks.

    Allen

  • Thanks Allen

    Further testing (in my LwIP driver) shows that the issue also persists in MSM (Multicore Shared Memory) when it is accessed in the L2 address space, where only L1D would be used for caching it.  So it appears that the cache modules (L1D and L2) invalidate the cache line correctly, but when a lookup misses and the line is about to be fetched from memory, they just revive the invalidated lines and mark them as valid.

    The linux-c6x-2.0-beta2 project includes a Keystone NetCP driver, and in it they appear to specifically turn off caching for a 16 MB chunk in the 0x2Cxxxxxx range and use XMC to remap the MSM to this area, so it appears that they have skirted the problem.  What I want to know now is whether that was done by chance or intentionally.  The driver was written by TI employees.

  • We are digging deeper into this issue and will get back to you soon. Please also keep us posted if you have any updates. Thanks.

    Sincerely,

    Steven

  • Min,

    In your example code, before the comparison, could you please try to writeback-invalidate 32 bytes instead of invalidating 128 bytes, as follows?

            //ERRATA_CACHE_invL2((void*)(ddr3Addr & 0xFFFFFF80), 0x80, CACHE_WAIT); // Invalidate 128 bytes
            ERRATA_CACHE_wbInvL2((void*)(ddr3Addr), 0x20, CACHE_WAIT); // Writeback-invalidate 32 bytes

    This modification does not change the intent of your code, and it appears to pass in our testing. I have also attached the modified code (only one place is updated, as shown above). Please give it a try and let us know if it works for you. Thanks a lot.

    Sincerely,

    Steven

    #include <ti/csl/src/intc/csl_intc.h>
    #include <ti/csl/csl_edma3.h>
    #include <ti/csl/csl_edma3Aux.h>
    #include <ti/csl/csl_cache.h>
    #include <ti/csl/csl_cacheAux.h>
    #include <ti/csl/cslr_device.h>
    
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define DEFAULT_CACHE_WAIT CACHE_WAIT
    
    #define CACHE_PRE_OP_STEPS \
    unsigned intStatus = _disable_interrupts();
    
    #define CACHE_POST_OP_STEPS \
    asm(" MFENCE "); \
    asm(" NOP "); asm(" NOP "); asm(" NOP "); asm(" NOP "); \
    asm(" NOP "); asm(" NOP "); asm(" NOP "); asm(" NOP "); \
    asm(" NOP "); asm(" NOP "); asm(" NOP "); asm(" NOP "); \
    asm(" NOP "); asm(" NOP "); asm(" NOP "); asm(" NOP "); \
    _restore_interrupts(intStatus);
    
    #define ERRATA_CACHE_invAllL1dWait()
    
    #define ERRATA_CACHE_invAllL1d(wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_invAllL1d(DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_wbAllL1dWait()
    
    #define ERRATA_CACHE_wbAllL1d(wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_wbAllL1d(DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_wbInvAllL1dWait()
    
    #define ERRATA_CACHE_wbInvAllL1d(wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_wbInvAllL1d(DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_invL1dWait()
    
    #define ERRATA_CACHE_invL1d(block,size,wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_invL1d(block,size,DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_wbL1dWait()
    
    #define ERRATA_CACHE_wbL1d(block,size,wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_wbL1d(block,size,DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_wbInvL1dWait()
    
    #define ERRATA_CACHE_wbInvL1d(block,size,wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_wbInvL1d(block,size,DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_wbL2Wait()
    
    #define ERRATA_CACHE_wbL2(block,size,wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_wbL2(block,size,DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_invL2Wait()
    
    #define ERRATA_CACHE_invL2(block,size,wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_invL2(block,size,DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_wbInvL2Wait()
    
    #define ERRATA_CACHE_wbInvL2(block,size,wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_wbInvL2(block,size,DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_wbAllL2Wait()
    
    #define ERRATA_CACHE_wbAllL2(wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_wbAllL2(DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_invAllL2Wait()
    
    #define ERRATA_CACHE_invAllL2(wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_invAllL2(DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define ERRATA_CACHE_wbInvAllL2Wait()
    
    #define ERRATA_CACHE_wbInvAllL2(wait) { \
    CACHE_PRE_OP_STEPS \
    CACHE_wbInvAllL2(DEFAULT_CACHE_WAIT); \
    CACHE_POST_OP_STEPS \
    }
    
    #define PLATFORM_ASSERT(x) do {printf("Assertion \"%s\" failed at line %d in %s\n", x, __LINE__, __FILE__); *(int*)0x00000000 = 0;} while(0)
    
    #define ASSERT(message, assertion) do { if(!(assertion)) PLATFORM_ASSERT(message); } while(0)
    
    typedef struct _wordstruct {
    	unsigned WORD8[8];
    } wordstruct;
    
    static void ti_cache_init() {
    	unsigned char mar;
    	CACHE_setL1PSize(CACHE_L1_32KCACHE);		//	set L1 to max					// SPRU871J 3.4.3.1
    	CACHE_setL1DSize(CACHE_L1_32KCACHE);		//	set L1 to max					// SPRU871J 3.4.3.1
    
    	// MAR0 - MAR15 are READ-ONLY
    	// MAR16 - MAR127 are for addresses from 0x10000000 - 0x7FFFFFFF
    	for (mar = 16; mar < 128; mar++)
    		CACHE_disableCaching(mar);
    	// 512 MB = 0x20000000, correspond to MAR128 to MAR159: 0x80000000 - 0x9FFFFFFF
    	for (; mar < 160; mar++)
    		CACHE_enableCaching(mar);
    	// Disable ALL OTHER addresses
    	for (; mar < 255; mar++)
    		CACHE_disableCaching(mar);
    	CACHE_disableCaching(mar);
    	CACHE_setL2Size(CACHE_256KCACHE);			// L2 cache size to 256k (maximum)	// SPRU871J 4.4.5
    }
    
    // For some reason, TCC code above 15 DOES NOT SHOW UP IN IPR (Interrupt Pending Register)
    #define EDMA_TCC_DESIRED  15
    
    // UNCOMMENT THE FOLLOWING LINE IF MANUALLY ADJUST CACHE_invL2 to CACHE LINE ONLY
    #define DO_CACHE_INVALID_FULL_LINE
    
    #define USE_ERRATA_CACHE_FUNCTIONS
    
    void main(void) {
    	CSL_Edma3Handle dmaHandle;
    	CSL_Edma3Obj edmaObj;
    	CSL_Edma3HwSetup setup;
        CSL_Edma3HwDmaChannelSetup dmahwSetup[64];  // 64 DMA + 8 QDMA for tpcc1 and tpcc2
        CSL_Edma3HwQdmaChannelSetup qdmahwSetup[8];
        CSL_Edma3ChannelAttr chAttr;
        CSL_Edma3ChannelObj chObj;
        CSL_Edma3ChannelHandle hChannel;
        CSL_Edma3ParamHandle hParam;
        CSL_Edma3ParamSetup paramSetup;
        CSL_Edma3ChannelErr chErr;
        CSL_InstNum edmaInstance = 0;
        wordstruct *words;
    
    	CSL_Status cslStatus;
    	unsigned index, u32val, ddr3Addr;
    
    	ti_cache_init();
    
    	cslStatus = CSL_edma3Init(NULL);
    	ASSERT("CSL_edma3Init returns CSL_SOK", CSL_SOK == cslStatus);
    
    	dmaHandle = CSL_edma3Open(&edmaObj, edmaInstance, NULL, &cslStatus);
    	ASSERT("CSL_edma3Open returns CSL_SOK", CSL_SOK == cslStatus && dmaHandle != NULL);
    
    	setup.dmaChaSetup = dmahwSetup;
    	setup.qdmaChaSetup = qdmahwSetup;
    
    	// 512 param for 64 DMA + 8 QDMA => 7 param per
    	u32val = 0;
    	for (index = 0; index < 64; index++) {
    		dmahwSetup[index].que = index & 7;
    		dmahwSetup[index].paramNum = u32val;
    		u32val += 7;
    	}
    	for (index = 0; index < 8; index++) {
    		qdmahwSetup[index].que = index & 7;
    		qdmahwSetup[index].paramNum = u32val;
    		qdmahwSetup[index].triggerWord = 7;
    		u32val += 7;
    	}
    	cslStatus = CSL_edma3HwSetup(dmaHandle, &setup);
    	ASSERT("CSL_edma3HwSetup returns CSL_SOK", CSL_SOK == cslStatus);
    
    	chAttr.regionNum = CSL_EDMA3_REGION_GLOBAL;
    	chAttr.chaNum = 0;
    	hChannel = CSL_edma3ChannelOpen(&chObj, edmaInstance, &chAttr, &cslStatus);
    	ASSERT("CSL_edma3ChannelOpen returns CSL_SOK", CSL_SOK == cslStatus && hChannel != NULL);
    
    	hParam = CSL_edma3GetParamHandle(hChannel, 0, &cslStatus);
    	ASSERT("CSL_edma3GetParamHandle returns CSL_SOK", CSL_SOK == cslStatus && hParam != NULL);
    
    	cslStatus = CSL_edma3HwChannelControl(hChannel, CSL_EDMA3_CMD_CHANNEL_CLEAR , NULL);
    	ASSERT("CSL_edma3HwChannelControl(CLEAR) returns CSL_SOK", CSL_SOK == cslStatus);
    
    	cslStatus = CSL_edma3HwChannelControl(hChannel, CSL_EDMA3_CMD_CHANNEL_CLEARERR, &chErr);
    	ASSERT("CSL_edma3HwChannelControl(CLEARERR) returns CSL_SOK", CSL_SOK == cslStatus);
    
    	cslStatus = CSL_edma3HwChannelControl(hChannel, CSL_EDMA3_CMD_CHANNEL_ENABLE, NULL);
    	ASSERT("CSL_edma3HwChannelControl(ENABLE) returns CSL_SOK", CSL_SOK == cslStatus);
    
    	CSL_edma3ClearLoPendingInterrupts(dmaHandle, CSL_EDMA3_REGION_GLOBAL, 0xFFFFFFFF);
    	CSL_edma3ClearHiPendingInterrupts(dmaHandle, CSL_EDMA3_REGION_GLOBAL, 0xFFFFFFFF);
    
    	CSL_edma3InterruptLoDisable(dmaHandle, CSL_EDMA3_REGION_GLOBAL, 0xFFFFFFFF);
    	CSL_edma3InterruptHiDisable(dmaHandle, CSL_EDMA3_REGION_GLOBAL, 0xFFFFFFFF);
    
    	memset((void*)0x80000200, 0, 512 << 5);
    #ifdef USE_ERRATA_CACHE_FUNCTIONS
    	ERRATA_CACHE_wbInvL2((void*)0x80000200, 512 << 5, CACHE_WAIT);
    #else
    	CACHE_wbInvL2((void*)0x80000200, 512 << 5, CACHE_WAIT);
    #endif
    
    	for (index = 0; index < 512; index++) {
    		ddr3Addr = 0x80000200 + (index << 5);
    		paramSetup.option = CSL_EDMA3_OPT_MAKE(CSL_EDMA3_ITCCH_DIS,
    				CSL_EDMA3_TCCH_DIS,
    				CSL_EDMA3_ITCINT_DIS,
    				CSL_EDMA3_TCINT_EN,
    				EDMA_TCC_DESIRED,
    				CSL_EDMA3_TCC_NORMAL,
    				CSL_EDMA3_FIFOWIDTH_NONE,
    				CSL_EDMA3_STATIC_DIS,
    				CSL_EDMA3_SYNC_A,
    				CSL_EDMA3_ADDRMODE_INCR,
    				CSL_EDMA3_ADDRMODE_INCR);
    		paramSetup.srcAddr = 0x90000000;
    		paramSetup.aCntbCnt = CSL_EDMA3_CNT_MAKE(32,1);
    		paramSetup.dstAddr = ddr3Addr;
    		paramSetup.srcDstBidx = CSL_EDMA3_BIDX_MAKE(1,1);
    		paramSetup.linkBcntrld = CSL_EDMA3_LINKBCNTRLD_MAKE(CSL_EDMA3_LINK_NULL,0);
    		paramSetup.srcDstCidx = CSL_EDMA3_CIDX_MAKE(0,1);
    		paramSetup.cCnt = 1;
    		cslStatus = CSL_edma3ParamSetup(hParam, &paramSetup);
    		ASSERT("CSL_edma3ParamSetup returns CSL_SOK", CSL_SOK == cslStatus);
    
    		words = (wordstruct*)0x90000000;
    		words->WORD8[0] = (index << 16) | index;
    		words->WORD8[1] = index + 0x10;
    		words->WORD8[2] = index + 0x20;
    		words->WORD8[3] = index + 0x30;
    		words->WORD8[4] = index + 0x40;
    		words->WORD8[5] = index + 0x50;
    		words->WORD8[6] = index + 0x60;
    		words->WORD8[7] = index + 0x70;
    #ifdef USE_ERRATA_CACHE_FUNCTIONS
    		ERRATA_CACHE_wbL2((void*)0x90000000, 0x20, CACHE_WAIT);
    #else
    		CACHE_wbL2((void*)0x90000000, 0x20, CACHE_WAIT);
    #endif
    
    		cslStatus = CSL_edma3HwChannelControl(hChannel, CSL_EDMA3_CMD_CHANNEL_SET, NULL);
    		ASSERT("CSL_edma3HwChannelControl(SET) returns CSL_SOK", CSL_SOK == cslStatus);
    
    		do {
    			asm(" NOP ");
    			asm(" NOP ");
    			asm(" NOP ");
    			asm(" NOP ");
    			asm(" NOP ");
    			asm(" NOP ");
    			asm(" NOP ");
    			asm(" NOP ");
    			CSL_edma3GetLoPendingInterrupts(dmaHandle, CSL_EDMA3_REGION_GLOBAL, &u32val);
    			if ((u32val & (1 << EDMA_TCC_DESIRED)) != 0) {
    				CSL_edma3ClearLoPendingInterrupts(dmaHandle, CSL_EDMA3_REGION_GLOBAL, 0xFFFFFFFF);
    				break;
    			}
    		} while (1); // Transfer not yet completed
    
    		words = (wordstruct*)ddr3Addr;
    #ifdef DO_CACHE_INVALID_FULL_LINE
    	#ifdef USE_ERRATA_CACHE_FUNCTIONS
    		//ERRATA_CACHE_invL2((void*)(ddr3Addr & 0xFFFFFF80), 0x80, CACHE_WAIT); // Invalidate 128 bytes
    		ERRATA_CACHE_wbInvL2((void*)(ddr3Addr), 0x20, CACHE_WAIT); // Writeback-invalidate 32 bytes
    	#else
    		CACHE_invL2((void*)(ddr3Addr & 0xFFFFFF80), 0x80, CACHE_WAIT); // Invalidate 128 bytes
    	#endif
    #else
    	#ifdef USE_ERRATA_CACHE_FUNCTIONS
    		ERRATA_CACHE_invL2((void*)ddr3Addr , 32, CACHE_WAIT); // Invalidate 32 bytes
    	#else
    		CACHE_invL2((void*)ddr3Addr , 32, CACHE_WAIT); // Invalidate 32 bytes
    	#endif
    #endif
    
    		if (words->WORD8[6] != index + 0x60) {
    			printf("At addr %08X + 6, index %u, expected %08X, got %08X\n", ddr3Addr, index, index + 0x60, words->WORD8[6]);
    			ASSERT("Valid value", words->WORD8[6] == index + 0x60);
    		}
    	}
    
    	cslStatus = CSL_edma3Close(dmaHandle);
    	ASSERT("CSL_edma3Close returns CSL_SOK", CSL_SOK == cslStatus);
    }
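
The DO_CACHE_INVALID_FULL_LINE path in the listing above widens the invalidate to one whole 128-byte L2 line with `ddr3Addr & 0xFFFFFF80`. For a buffer of arbitrary address and length, the same rounding might be sketched as follows. The helper names are hypothetical and are not part of the CSL; the 128-byte line size is the one used in the listing.

```c
#include <stdint.h>

#define L2_LINE_BYTES 128u  /* L2 line size used by the listing above */

/* Hypothetical helper: start address of the L2 line that contains addr. */
static uint32_t l2_line_base(uint32_t addr)
{
    return addr & ~(L2_LINE_BYTES - 1u);
}

/* Hypothetical helper: number of bytes needed to cover every L2 line
 * touched by the range [addr, addr + len). Assumes len > 0. */
static uint32_t l2_line_span(uint32_t addr, uint32_t len)
{
    uint32_t first = l2_line_base(addr);
    uint32_t last  = l2_line_base(addr + len - 1u);
    return (last - first) + L2_LINE_BYTES;
}
```

With these, the call in the listing would become something like `CACHE_invL2((void*)l2_line_base(ddr3Addr), l2_line_span(ddr3Addr, 32), CACHE_WAIT)`. Note that a widened invalidate also discards any dirty data that happens to share those lines.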
    

  • Hi Steven,

    That does appear to answer the question, although passing the whole cache line (128 bytes, aligned on a cache-line boundary) works. It seems, then, that I have to ensure that the cache line is not dirty (e.g., the user modified received data from the last transfer) before I read the address range.

    Can you confirm that CACHE_invL2 / CACHE_invL1 would be unreliable and should be avoided?

    Best Regards

  • Hi Steven,

    So according to your explanation, the error occurs because there is "dirty" data in the L2 cache before the read request?

    Even so, if I skip the WB operation and do only the INV operation, the only consequence should be that the dirty data is discarded; it should not make the whole line read as all zeros.

    Could you give me a more detailed analysis of this phenomenon?

    Thanks very much!

    Allen

  • All,

    Sorry for the long delay in providing this explanation.

    The L2 invalidation issue is related to the prefetch buffer in XMC.

    If you disable prefetching (clear the PFX bit, bit 3) in the MAR register for the cacheable region you are testing, the L2 block invalidation should work.

    Alternatively, you can manually invalidate the prefetch buffer (XPFCMD.INV=1) along with the L2 block invalidation; the cache behavior should then be correct as well.
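
    As a rough illustration of the first option, the sketch below shows only the bit arithmetic involved: which MAR covers a given address, and what a MAR value looks like with PFX cleared. The macro and function names are hypothetical; the sketch assumes each MAR covers a 16 MB window and that PC is bit 0, and it does not touch any hardware register.

```c
#include <stdint.h>

#define MAR_PC_BIT   (1u << 0)  /* PC: region is cacheable (assumed bit 0) */
#define MAR_PFX_BIT  (1u << 3)  /* PFX: region is prefetchable (bit 3, as stated above) */

/* Each MAR covers 16 MB, so the MAR number is the top 8 address bits. */
static unsigned mar_index(uint32_t addr)
{
    return addr >> 24;
}

/* Returns the MAR value with prefetch disabled and all other bits unchanged. */
static uint32_t mar_without_prefetch(uint32_t mar_value)
{
    return mar_value & ~MAR_PFX_BIT;
}
```

    Under these assumptions the destination buffer at 0x80000200 falls under MAR128 and the source at 0x90000000 under MAR144, and the 512 MB of DDR3 spans MAR128 through MAR159.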

     

    The detailed explanation is as follows:

    The prefetcher in XMC will see the cache misses from the current iteration of the loop, and will send prefetches for the next iteration.  The prefetcher likely gets well ahead of the EDMA.

    The prefetcher kicks in because the accesses form a nice linear sequence from the CPU (e.g. 0x80000200, 0x80000280, 0x80000300, etc.).  It takes two consecutive misses before the prefetcher kicks in, which is why it takes a couple iterations before it fails.
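
    To make the linear sequence concrete, here is a small sketch of the address arithmetic in the test loop: 32-byte destination slots starting at 0x80000200, grouped into 128-byte L2 lines. The function names are hypothetical and only restate the arithmetic from the program listing.

```c
#include <stdint.h>

#define DST_BASE    0x80000200u  /* first destination address in the test */
#define SLOT_BYTES  32u          /* bytes transferred per loop iteration */
#define L2_LINE     128u         /* L2 cache line size */

/* Destination address used by loop iteration `index`. */
static uint32_t slot_addr(unsigned index)
{
    return DST_BASE + index * SLOT_BYTES;
}

/* Base address of the L2 line that contains addr. */
static uint32_t line_of(uint32_t addr)
{
    return addr & ~(L2_LINE - 1u);
}

/* Nonzero when iteration `index` is the first to touch its L2 line. */
static int starts_new_line(unsigned index)
{
    return index == 0 || line_of(slot_addr(index)) != line_of(slot_addr(index - 1));
}
```

    Iterations 0, 4 and 8 are the first to touch lines 0x80000200, 0x80000280 and 0x80000300 respectively, which is the sequence of line addresses quoted above.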

    The reason writeback-invalidate "fixes" the problem is that it sends a dummy write at the end of the block writeback for the last address in the block, and that forces XMC to invalidate the prefetch buffer for that line.  If your buffer were two lines instead of one, the wbinv version should still show a problem for the first line in the buffer, which is why "fixes" is in quotes.

     You can think of the XMC prefetch buffer as being a tiny cache, with similar potential for coherence issues.  You can manually invalidate the XMC's prefetch buffer at any time, which is probably the better approach in general, unless you have large ranges of DDR that should never be prefetched.  In those cases, disabling prefetch on those ranges makes more sense.
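
    The "tiny cache" picture can be illustrated with a toy model in plain C. This is only a sketch of the explanation above, not a description of the XMC hardware: every type and function name is hypothetical, and the model assumes that a read which misses L2 is served from the prefetch buffer whenever the buffer holds that line.

```c
#include <stdint.h>
#include <string.h>

#define LINE_WORDS 32  /* one 128-byte line as 32-bit words */

typedef struct {
    uint32_t ddr[LINE_WORDS];       /* backing DDR3 contents for one line */
    uint32_t prefetch[LINE_WORDS];  /* copy held by the modeled prefetch buffer */
    int      prefetch_valid;        /* nonzero while the buffer holds the line */
} toy_line;

/* Prefetcher runs ahead and captures whatever DDR holds right now. */
static void toy_prefetch(toy_line *t)
{
    memcpy(t->prefetch, t->ddr, sizeof t->ddr);
    t->prefetch_valid = 1;
}

/* DMA writes go to DDR only; the prefetched copy is not updated. */
static void toy_dma_write(toy_line *t, unsigned word, uint32_t value)
{
    t->ddr[word] = value;
}

/* Models a manual prefetch-buffer invalidate (XPFCMD.INV=1). */
static void toy_invalidate_prefetch(toy_line *t)
{
    t->prefetch_valid = 0;
}

/* CPU read right after an L2 invalidate: the L2 miss is served from the
 * prefetch buffer if it holds the line, otherwise from DDR. */
static uint32_t toy_cpu_read_after_l2_inv(const toy_line *t, unsigned word)
{
    return t->prefetch_valid ? t->prefetch[word] : t->ddr[word];
}

/* Returns what the CPU reads at word 6 after the DMA wrote 0x64 there. */
static uint32_t toy_demo(int invalidate_prefetch_too)
{
    toy_line t;
    memset(&t, 0, sizeof t);          /* destination starts zeroed, as in the test */
    toy_prefetch(&t);                 /* prefetcher gets ahead of the DMA */
    toy_dma_write(&t, 6, 0x64u);      /* new data lands in DDR */
    if (invalidate_prefetch_too)
        toy_invalidate_prefetch(&t);
    return toy_cpu_read_after_l2_inv(&t, 6);
}
```

    In this model `toy_demo(0)` returns the stale zero captured before the transfer, while `toy_demo(1)` returns 0x64, which matches the zeros seen in the failing case where the destination was cleared before the loop.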

     

    Please take a look at Section 7.5, Prefetch Buffer, in the C66x CorePac User’s Guide for more details.

     

    Hope it helps.

    Sincerely,

    Steven

  • I recommend looking at the examples that come with CCS before writing code, such as pdk_C6678_1_0_0_XX/packages/ti/drv/pcie/example/sample/pcie_sample.c, where we see the example code using CSL to invalidate the prefetch buffer (and disabling interrupts).

          /* Disable Interrupts */
          key = _disable_interrupts();

          /*  Cleanup the prefetch buffer also. */
          CSL_XMC_invalidatePrefetchBuffer();

          CACHE_invL1d ((void *)dstBuf.buf,  PCIE_EXAMPLE_DSTBUF_BYTES, CACHE_FENCE_WAIT);
          CACHE_invL2  ((void *)dstBuf.buf,  PCIE_EXAMPLE_DSTBUF_BYTES, CACHE_FENCE_WAIT);

          /* Reenable Interrupts. */
          _restore_interrupts(key);

    When using BIOS, the BIOS cache APIs should be used instead, as they handle some extra interrupt issues related to tasking and provide workarounds for the cache silicon errata.

     

  • Hi Steven,

    Thanks for the reply; it truly solves the problem.

    According to the user guide, invalidating the prefetch buffer seems to be the better approach for shared buffers, though it will cost some efficiency.

    Allen