Invalidate cache on C6701 DSP

willshad

Other Parts Discussed in Thread: SPRC090

I've read the other posts on this forum about invalidating the cache, but I could not make out what the final conclusion was, so I am re-posting.

I'm using a C6701 DSP with external SRAM. Our system is required to run on orbit in a science satellite, and we expect to be clobbered with single-event upsets (SEUs). My solution is to place all my code and data in SRAM that is protected by EDAC, but the execution time running from SRAM will not allow us to meet our system specs. If I enable cache in the DSP IPRAM, the execution time is acceptable, but DSP IPRAM is not protected from SEUs. I'm thinking that I could invalidate the cache periodically so that the DSP gets a clean copy, so to speak, from SRAM, and that it would be best to invalidate all of the L1 & L2 caches.

I did try inserting the following code block but no luck. Not sure if I missed a step or the calls don't work or both. Using CCS 5.3.

#include <csl_cache.h>

CACHE_invalidate(CACHE_L1DALL, (void*)0x00000000, 0x00000000);
CACHE_invalidate(CACHE_L1PALL, (void*)0x00000000, 0x00000000);
CACHE_invalidate(CACHE_L2ALL, (void*)0x00000000, 0x00000000);

Any suggestions?

over 8 years ago

Rahul Prabhu over 8 years ago

Willshad,

As specified in Appendix C of this document, can you please switch to the latest Cache CSL APIs?

Are all of the calls failing, or are some of the calls taking effect?

Regards,

Rahul

RandyP over 8 years ago

Will,

What happens when you execute these lines? I apologize, but 'no luck' is not much to go on for figuring out a problem.

The instructions you used are supposed to still work, but it will be good to see what results you get when you follow Rahul's request to use the latest APIs.

It can be dangerous to simply invalidate your L1D and L2 data caches. If anything has changed in the data held in those caches, you will lose that data. This could include stacks and heaps and any variables or arrays. Unfortunately, doing a writeback-invalidate will not protect you from any SEU that occurred in the internal data caches, since the upset data will be written to SRAM as if it were correct and EDAC would then be protecting the corrupted value.

If you have some smaller regions of data memory that you can use wb-inv with and then invalidate the remainder, that might improve your retention plus protection. There might be games you can play with preloading data cache differently each time to move the way-storage each time. But for the data portion, there is no perfect way to protect from SEUs while using the internal caches. Program will be helped a lot by your periodic invalidate, which might also need a block-invalidate of L2 for all of the program space that might be cached there. Study the cache user guide(s) to be sure.

You should have full source for the CACHE_* CSL commands, so you can look through them to see what they will do with the *_*ALL arguments. There might be something clear to you that explains the results you are seeing or not.

Regards,
RandyP

willshad over 8 years ago in reply to RandyP

Hi Rahul & RandyP,

Thank you for responding. I apologize for "no luck". Went into great detail on other items except that one. By no luck, I meant that I don't see my execution times change after invalidating the caches. If I step thru the code with an emulator, I cannot step into the API code either whereas I can step into the other csl calls that I have.

I did try the newer API calls listed in the SPRU401 doc but they don't seem to work any better.

I have test points on the card that I monitor with a scope and control from the GPIO pins on my DSP. These test points are driven on entry to and exit from the module A code; the module B code performs the invalidation.

On the 1st execution of module A code, the time to execute is 10.4us, and subsequent executions take 7.2us. I figure this means that module A code is in cache and now runs faster. If I execute module B code to invalidate, module A code still runs at the faster 7.2us; I thought it would drop back to the slower time.

I tried "CACHE_wbInvAllL2(CACHE_WAIT);" to invalidate. I know that sometimes, code compiles successfully with this compiler but will not execute...is skipped by the PC...because I'm missing an include file. Wondering if that's the case here.

Researched this DSP (6701) and it doesn't appear to have an L1 data cache, just a program cache in L2. Does this sound right?

Thx, Bill.

RandyP over 8 years ago in reply to willshad

Bill,

willshad said:
Researched this DSP (6701) and it doesn't appear to have an L1 data cache, just a program cache in L2. Does this sound right?

Yes, almost. This is a part that has been around for a while, and although I knew its architecture really well 15 years ago, I have not spent as much time with it lately. The internal data memory is memory-mapped SRAM only, not used as cache. The internal program memory can be configured as either memory-mapped program SRAM or as program cache. These internal memories are very close to the DSP logically and performance-wise, so we would refer to them as L1 today; we did not use those terms early in the C6000 series, but the L1 term is more appropriate in this case, I believe.

Do you have a C6701-specific GEL file that you use with emulation and CCS to load and debug your program? Or how are you doing your code development? I assume this is on your own board and not one of our EVMs, right?

The right CSL for the C6701 is from SPRC090, and in it I do not see any cache commands. Clearing the cache must not have been a common thing to do with the first generation of parts. Our opinion back then was that the program cache could not hurt your program execution but could easily help it whenever anything was cached. If there are cache commands in the CSL, I did not find them in a quick look; if the C6701 is supported in newer CSL versions as suggested above, you would want to look at the CSL source to find out, in my opinion. The source might be available during debug but you should be able to find it on your computer as C source files and headers.

In the C670x DSP Program and Data Memory Controller/ Reference Guide SPRU577, you will find detailed information on the internal memories. First, look at Section 1.2.4.2 Cache Invalidation to understand how the cache can be invalidated. This is very different than how our newer parts are invalidated. Second, look at Section 1.2.1 Internal Program Memory Modes where there is a detailed Note: that includes assembly code you can use to set the cache mode field of the CSR.

You can cut-and-paste that code into an assembly file to create a function you can call from C to invalidate the program cache. You will need to go through the assembly sequence twice, once to change the mode to mapped then to change it back to cache-enabled.

A device reset puts the program memory in memory-mapped mode, so something in the initialization code or in your main program must be setting cache-enabled mode. Do you know where this is happening?

If I am missing something by looking at these things so quickly, I am sorry. I am simply reading the documents with a familiar eye and reporting what I believe and recall.

Please let us know if this gets things working the way you expect them to work.

Regards,
RandyP

willshad over 8 years ago in reply to RandyP

Hi RandyP,

I'm using CCS V5.3 for code development. Can't remember the last time I used a GEL file, maybe CCS V3.3. In the TCF file, I can configure the Program Space as memory-mapped RAM or as cache. I am developing s/w for a custom card.

I was already using libraries from SPRC090. I can successfully breakpoint prior to entry to the cache API call, but trying to step in just takes me to the next line of C code. So I agree that there are probably no cache commands in the CSL for the C6701 DSP. I found the source code in both the lib_2x and lib_3x folders, in csl6000.src.

Not sure which one to hijack code from?

I understand that the only way to flush the cache is by switching modes, i.e. flip from cache-enabled to memory-mapped and then back to cache-enabled.

I encountered two issues that I need to clarify. In memory-mapped mode, the CSR contains 0x02020183. So the PCC (bits 7:5) = 100 and according to SPRU733A, this is "Reserved". In cache-enabled mode, the CSR contains 0x02020143. So the PCC (bits 7:5) = 010 and according to SPRU733A, this is "Direct-mapped cache enabled". According to the same document, PCC = 000 also means "Direct-mapped cache enabled".

Is this a typo? Where is the bit pattern for memory-mapped mode defined?

So, when changing the PCC bits to flush the cache, should I set the value to '100' and then back to '010'?
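The PCC values quoted above can be double-checked with a few lines of host-side C. This is only a sketch of the bit arithmetic, assuming PCC occupies CSR bits 7:5 as stated in the post; the helper names `pcc_field` and `pcc_self_check` are hypothetical and not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract the PCC field (CSR bits 7:5). */
static unsigned pcc_field(uint32_t csr)
{
    return (unsigned)((csr >> 5) & 0x7u);
}

/* Checks the two CSR values reported in the post. */
static void pcc_self_check(void)
{
    assert(pcc_field(0x02020183u) == 0x4u); /* 100b, read in memory-mapped mode */
    assert(pcc_field(0x02020143u) == 0x2u); /* 010b, read in cache-enabled mode */
}
```

Both readings agree with the bit patterns given in the post, so the open question is only what the documentation says those patterns mean on this device.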

As mentioned in the 1st para, the TCF file is setting the cache to enabled mode.

Thx, Bill.

RandyP over 8 years ago in reply to willshad

Bill,

My recommendation is to use the code shown in the Note in SPRU577, and make your own cache code. At least do this as a test to see if you get the measurements you expect. And I recommend using the values 000b for mapped mode and 010b for cache enabled mode.
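As a rough illustration of the read-modify-write those two values imply, here is a host-side C sketch. It only models the arithmetic on a CSR value and does not touch the real register; the macro and function names are hypothetical, and the PCC field position (bits 7:5) is taken from the earlier post.

```c
#include <assert.h>
#include <stdint.h>

#define CSR_PCC_SHIFT     5u
#define CSR_PCC_MASK      (0x7u << CSR_PCC_SHIFT) /* bits 7:5 */
#define PCC_MAPPED        0x0u                    /* 000b, as recommended above */
#define PCC_CACHE_ENABLE  0x2u                    /* 010b, as recommended above */

/* Hypothetical helper: return csr with only the PCC field replaced. */
static uint32_t csr_with_pcc(uint32_t csr, uint32_t pcc)
{
    return (csr & ~CSR_PCC_MASK) | ((pcc << CSR_PCC_SHIFT) & CSR_PCC_MASK);
}

/* Models one flush cycle: cache-enabled -> mapped -> cache-enabled. */
static void csr_pcc_self_check(void)
{
    uint32_t csr = 0x02020143u;                  /* cache-enabled value from the thread */
    csr = csr_with_pcc(csr, PCC_MAPPED);
    assert(csr == 0x02020103u);
    csr = csr_with_pcc(csr, PCC_CACHE_ENABLE);
    assert(csr == 0x02020143u);
}
```

On the target, the actual register write still has to follow the assembly sequence from the SPRU577 Note; this sketch only shows which bits change.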

Unfortunately, I do not have hardware to confirm this for you. If you have any difficulty writing the enable and disable functions, please let us know.

Regards,
RandyP

willshad over 8 years ago in reply to RandyP

Hi RandyP,

I'm oh so close.

1. Start my DSP with cache enabled via the TCF file, then call an assembler routine to flip to memory-mapped mode. The DSP keeps running but execution is slower. That's expected. If I then call an assembler routine to flip to cache-enabled, the DSP hangs. I know it hangs because the unit's telemetry is no longer updated by the DSP. When I try to pause emulation I get an error message, and the only way to proceed is to disconnect.

2. Same results if I start the DSP with memory-map enabled on startup.

3. If I change my code to start with memory-mapped mode and change the assembler code to flip to memory-mapped followed by flip to cache-enable then the DSP does not hang when calling the assembler code. But not sure cache is being flushed because I don't see a change in the execution time.

The following is my assembler code. The comments don't quite match the code. Tracing the code, it seems that when I call _mapped, the PC runs through all of the _mapped routine as well as the _cache routine. Maybe I need to add a return in there to make it come out. But when the routines are separated into separate files, the DSP crashes as indicated above. Note that I also had to change the code from the SPRU to make it compile successfully. I wonder if I also need a delay between the two routines to give the DSP a chance to flush?

I see from other posts that you highly recommend avoiding writing assembler routines and I can see why.

 .global _mapped
 .global _cache

 .text
 .align 32
_mapped:
	MVC CSR,B5 		;copy control status register
 	MVK 0xff1f,A5
	AND A5,B5,A5 	;clear PCC field of CSR value
	MVK 0x0000,B5 	;set cache enable mask
	OR A5,B5,B5 	;set cache enable bit
	MVC B5,CSR 		;update CSR to enable cache
	NOP 4
	NOP

_cache:
	MVC CSR,B5 		;copy control status register
 	MVK 0xff1f,A5
	AND A5,B5,A5 	;clear PCC field of CSR value
	MVK 0x0040,B5 	;set cache enable mask
	OR A5,B5,B5 	;set cache enable bit
	MVC B5,CSR 		;update CSR to enable cache
	NOP 4
	NOP

RandyP over 8 years ago in reply to willshad

Bill,

Excellent work!

In both assembly routines, replace

NOP 4
NOP

with

B B3
NOP 5

The B B3 is the return instruction to get back to the C code. The NOP 5 uses up the next 5 cycles in the pipeline to keep unwanted code from executing. If you leave the NOP 4/NOP in the _mapped routine, it will just pass on to the _cache code and do both in one call. It just depends on how you want to do it. [Ed: it may be important to keep the 32-byte alignment for the _cache routine, and keeping exactly 8 instructions in _mapped makes that happen. This alignment is especially important when doing cache manipulation.]

You should be safe with the registers being used, according to the C Compiler Guide. A3/B3 may need to be saved, and A10-A15/B10-B15 need to be saved, but since you are not using any of those, you are okay.

Regards,
RandyP

willshad over 8 years ago in reply to RandyP

Hi RandyP,

Thank you for all your help. I made the changes to perform the return to caller, and the routines run successfully. I made a few other tweaks to eliminate the compiler warnings on the MVK expressions and fixed the comments. I attached the code snippet just in case someone else tries to follow this thread.

I am using a 10 millisecond delay in C code to delay between each call to give the CPU time to flush the program cache. Do you think that's too much or too little?

; code was ported from SPRU577A page 1-7 and modified to build for C6701
; place this file in the asm folder
; to call the code, add the following to your C code:
; #ifdef __cplusplus
; extern "C" {
; #endif
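; 	extern far void mapped(void);	/* C name has no leading underscore; the assembly label is _mapped */
; 	extern far void cache(void);	/* assembly label is _cache */
; #ifdef __cplusplus
; }
; #endif /* extern "C" */

; // Disable global interrupts
; IRQ_globalDisable();
; // switch to memory-mapped mode, then back to cache-enabled mode, to invalidate the program cache
; mapped();
; cache();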
; // Enable global interrupts
; IRQ_globalEnable();

 .global _mapped
 .global _cache

 .text
 .align 32
_mapped:
	MVC CSR,B5 			; make a copy of control status register
 	MVKL 0xffffff1f,A5	; create mask to clear PCC bits
 	MVKH 0xffffff1f,A5
	AND A5,B5,A5 		; clear PCC field of CSR
	MVKL 0x00000000,B5	; set memory-mapped enable mask (000)
	MVKH 0x00000000,B5
	OR A5,B5,B5 		; OR value with CSR
	MVC B5,CSR 			; update CSR
	B B3				; return to caller
	NOP 5

_cache:
	MVC CSR,B5 			; make a copy of control status register
 	MVKL 0xffffff1f,A5	; create mask to clear PCC bits
 	MVKH 0xffffff1f,A5
	AND A5,B5,A5 		; clear PCC field of CSR
	MVKL 0x00000040,B5	; set cache enable mask (010)
	MVKH 0x00000040,B5
	OR A5,B5,B5 		; OR value with CSR
	MVC B5,CSR 			; update CSR
	B B3				; return to caller
	NOP 5

RandyP over 8 years ago in reply to willshad

Bill,

Thank you for sharing your code, and we are glad to hear you have it working now.

What was the compiler message you got for the MVK instructions?

The 32-byte alignment for the functions is not vital for the C67x architecture (I think) but at some point we changed the compiler to enforce that for branch targets, which includes these two functions. It would be safest (maybe only habit for me?) to remove the MVKH instructions so the 32-byte alignment will remain. You could add the .align 32 before _cache, but that would insert 6 words of NOPs.
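The "6 words of NOPs" figure follows from simple arithmetic, sketched here in host-side C. It assumes each instruction occupies one 32-bit word (4 bytes) and that _mapped itself starts on a 32-byte boundary; the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: number of 4-byte padding words needed to reach
 * the next align_bytes boundary after bytes_used bytes of code. */
static unsigned pad_words_to_align(unsigned bytes_used, unsigned align_bytes)
{
    unsigned rem = bytes_used % align_bytes;
    unsigned pad_bytes = rem ? align_bytes - rem : 0u;
    return pad_bytes / 4u;
}

static void align_self_check(void)
{
    /* 10 instructions (with the MVKL/MVKH pairs): 40 bytes, 24 bytes of padding. */
    assert(pad_words_to_align(10u * 4u, 32u) == 6u);
    /* 8 instructions (without the MVKH lines): exactly 32 bytes, no padding. */
    assert(pad_words_to_align(8u * 4u, 32u) == 0u);
}
```

This is why dropping the two MVKH instructions keeps _cache aligned without any explicit .align directive.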

Also, if you put the || back in on the two MVKL instructions, the code will run 2 cycles faster.

Sorry for the minor points, but since only the lower 16 bits of the CSR are writable, you can get away without initializing the upper 16 bits of the registers.

Regards,
RandyP

willshad over 8 years ago in reply to RandyP

Hi RandyP,

The compiler warning was "Value out of range; converted to -225".

I see your point that the CSR register is only 16 writable bits so I will make the changes.

Any thoughts on my question from the previous post concerning the delay between each call?

Thx, Bill.

RandyP over 8 years ago in reply to willshad

Bill,

I do not think you need to allow any time delay for the invalidation. The SPRU577a User Guide says

1.2.4.2 Cache Invalidation said:
The PMEMC halts the CPU while it initializes its tags.

To me, this means the _cache() function will not return until after all the invalidation has been completed. That means you are safe to start running immediately, and it means interrupts could have been stalled for a short time. Of course, you should make sure no interrupts can occur during these functions anyway, as mentioned in the Note with the code sample.

According to the CPU & Instruction Set Reference Guide, the MVK instruction takes a 16-bit signed constant and sign extends it. So the original value in the code sample 0xff1f would be sign extended to 0xffffff1f, which is exactly what you want. But the Ref Guide goes on to say that a 16-bit hex value with a 1 in the msb will generate a warning. That seems silly to me, but that is what it does. And the Ref Guide says you can avoid that warning by using a 32-bit hex value with the 16-bit value's sign extended, such as 0xffffff1f. That might also work with MVKL, which also sign-extends from a 16-bit value.
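The sign-extension behavior described here, and the -225 in the warning text, can be reproduced with a small host-side C sketch. It models only the arithmetic of sign-extending a 16-bit constant to 32 bits; the function names are hypothetical and this is not the assembler's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: take the low 16 bits of a constant and
 * sign-extend them to 32 bits, as described for MVK above. */
static int32_t sign_extend16(uint32_t value)
{
    uint32_t low = value & 0xFFFFu;
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}

static void mvk_self_check(void)
{
    /* 0xff1f has its msb set, so it is read as the negative value -225. */
    assert(sign_extend16(0xff1fu) == -225);
    /* As a 32-bit pattern that is 0xffffff1f, the mask that clears PCC. */
    assert((uint32_t)sign_extend16(0xff1fu) == 0xffffff1fu);
    /* Writing the constant as 0xffffff1f yields the same register value. */
    assert(sign_extend16(0xffffff1fu) == sign_extend16(0xff1fu));
    /* A constant with the msb clear, such as 0x0040, is unchanged. */
    assert(sign_extend16(0x0040u) == 0x40);
}
```

So the 16-bit and 32-bit spellings load the same mask; the longer form just avoids the out-of-range warning.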

Regards,
RandyP

willshad over 8 years ago in reply to RandyP

Good take on section 1.2.4.2. I gotta read between the lines more often.

That's exactly what I did, code is now || MVKL 0xffffff1f,A5 and no compiler warnings.

Again, thanks for all your help. E2E forum rocks.
Bill.
