How to perform a software reset without using DSP/BIOS?

adnan khan14207

Hello folks,

I am working on a system (a custom board based on the TMS320C6713, programmed in C++ using Code Composer Studio 3.0, NOT using DSP/BIOS) with two applications, each with a separate bootloader, located at two distinct locations in program flash. On startup, I want the system to boot the first application and monitor the UART for commands. If a special command is received, it should switch to the second application.

Now the problem is that my system is inconsistent: sometimes it successfully switches to the second application, and sometimes it does not. Sometimes it hangs, and sometimes the first application restarts instead of the second application starting.

In my opinion, the problem is that I don't reset the processor state. While executing the first application, when the processor receives a specific command from the UART, it simply copies the first 1K of code of the second application to RAM at address 0x00000000 and branches to address _c_int00. Since the processor is not reset, some of the registers are not in their default state, so the system responds inconsistently.

Question 1:

Is there anything wrong with what I think is the real problem? If I am right about the problem, what is the best way to do a software reset of the DSP, given that I want to avoid toggling the RESET pin in hardware? (I searched the internet and didn't find anything related to my problem.)

Question 2:

Is there any better way of doing this same thing? If yes then please explain.

NOTE :

1. Both of my applications work fine if no switching is involved. The problems start when we switch applications.

2. I am not using DSP/BIOS.

3. All my development is in C++ with Code Composer Studio 3.0, on a custom board based on the TMS320C6713 DSP.

If you have any other questions, please ask!

I am running a little short on my deadline, so please help me ASAP!

Regards

Adnan

over 15 years ago

RandyP, over 15 years ago

It sounds like you are performing exactly the right warm reset by branching to _c_int00. If your application only did CPU processing and had no interaction with peripherals in any way, then there would be nothing different between this warm reset and a cold reset from the RESET pin. Starting from _c_int00 takes care of initializing all of your variables, as long as you linked with the -c Run-Time Autoinitialization switch, which is the default.

If your restart problem is caused by a memory variable that holds the wrong value during initialization, that would mean the variable is not initialized properly in the source code and that it always happens to hold an acceptable (non-zero?) value after a power-on reset. This is very unlikely, but it is one possible scenario.

My first guess at a cause is that one or more peripherals should not be re-initialized after the warm-reset. Or a more thorough resetting of some peripherals may be required. For example, if you change the PLL frequency during cold reset, it may need to be handled much differently during a warm reset since the PLL would already be running; this could lead to erratic behavior and could be avoided by not performing that initialization after a warm reset.

Other peripherals, like any serial ports or the EMIF (especially for SDRAM), may need to have their configuration registers used to force them through a reset, if available, to make sure they are in a stopped and known state before continuing with their initialization. You will want to make sure they are not interacting with an outside device during re-initialization. One way to debug this, since there are probably many peripherals that get initialized, would be to wrap each peripheral initialization sequence inside "if ( !warmreset ) {}" to try to eliminate each one.
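A minimal host-side sketch of the "if ( !warmreset )" idea, assuming the first application leaves a marker in a RAM location that neither application's start-up code overwrites before the check (on the target that would mean a dedicated section placed at the same fixed address by both linker command files). The struct, the magic value, and all function names are hypothetical; the counters stand in for the real PLL, EMIF, and UART initialization routines so the control flow can be checked on a PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical handoff marker. On the target it would sit at a fixed RAM
 * address that neither application's start-up code overwrites; on a PC it
 * is just an ordinary variable. */
typedef struct {
    uint32_t magic;
    uint32_t magic_inv;   /* complement: random power-up contents are less likely to match */
} handoff_t;

#define WARM_RESET_MAGIC 0x5741524Du   /* arbitrary value chosen for this sketch */

/* First application: call just before branching to the second one. */
void mark_warm_reset(handoff_t *h)
{
    assert(h);
    h->magic = WARM_RESET_MAGIC;
    h->magic_inv = (uint32_t)~WARM_RESET_MAGIC;
}

/* Second application: call once, early. Clears the marker so the next
 * boot counts as cold unless the marker is written again. */
bool consume_warm_reset(handoff_t *h)
{
    assert(h);
    bool warm = h->magic == WARM_RESET_MAGIC &&
                h->magic_inv == (uint32_t)~WARM_RESET_MAGIC;
    h->magic = 0;
    h->magic_inv = 0;
    return warm;
}

/* Counters standing in for the real initialization routines. */
typedef struct { int pll, emif, uart; } init_counts_t;

void init_peripherals(bool warmreset, init_counts_t *c)
{
    assert(c);
    if (!warmreset) {
        c->pll++;    /* init_pll(): after a warm reset the PLL is already running */
        c->emif++;   /* init_emif(): avoid reprogramming while SDRAM may be mid-refresh */
    }
    c->uart++;       /* init_uart(): re-initialized in both cases in this sketch */
}
```

The complemented copy only lowers the chance that random power-up RAM contents look like a valid marker; it does not rule it out. Which peripherals belong inside the "if" is exactly what the debugging approach above is meant to find out.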

My second and preferred guess is the state of the on-chip memory caches. The cache is a very important feature to control, and a very likely place for the problem to originate. When you artificially write to program space, such as during your copy of 1K of code to address 0x00000000, there can be coherency issues with the L1P cache, and there can also be coherency issues with the L1D and L2 caches. The Two-Level Internal Memory Reference Guide (SPRU609B) discusses ways to change the cache configuration. I suspect a typo in the descriptions in Tables 16 and 17 (both values do the same thing?), but Table 18 has a good description of the steps to take to reduce the L2 cache. I believe all of L2 is SRAM at reset, so this, together with invalidating L1P and L1D, could bring you closer to the true cold-reset state of the chip.

Please let me know if any of this is helpful, or if your investigation reveals any new clues.

adnan khan14207, over 15 years ago, in reply to RandyP

Randy, thank you very much for such a detailed and prompt reply. Here are a few things I wanted to say about your post.

In my project, the -c Run-Time Autoinitialization option is on.

I have carefully checked all my variables; they are always initialized.

My second application re-initializes the PLL, EMIF, and FPGA. The FPGA code resides in the program flash. Although the peripheral configuration does not change, we still re-initialize all the peripherals so that they return to their default state. But after reading your post, I have doubts about this re-initialization. Please advise whether or not I should re-initialize my peripherals in the second application.

I am not enabling the cache. That means my on-chip cache works like IRAM. I am not sure whether I still need to care about the coherence of the L1 and L2 caches. If not, what could be the cause of this issue? Please comment!

Regards

Adnan

RandyP, over 15 years ago, in reply to adnan khan14207

adnan khan said:

My second application re-initializes the PLL, EMIF, and FPGA. The FPGA code resides in the program flash. Although the peripheral configuration does not change, we still re-initialize all the peripherals so that they return to their default state. But after reading your post, I have doubts about this re-initialization. Please advise whether or not I should re-initialize my peripherals in the second application.

If the PLL, EMIF, and FPGA do not need to be changed, then there could be problems that happen when you re-initialize them. If the SDRAM were in the middle of a refresh cycle when you reset the EMIF, it could cause some problems with the SDRAM.

The PLL User's Guide or the datasheet will describe the original initialization procedure, which you are already following. I have not looked closely at it yet, and will not unless you determine that this is the problem, but usually our PLL powers up in bypass mode so that you can start the PLL from a well-known condition. If you change its setting while it is already running, that could distort the clock signal and lead to erratic behavior.

If the FPGA has some state machines that should be reset, you probably have a way to do that, but there is no way for me to predict that part. I can see it being reasonable either to just leave the FPGA alone or to send it a warm-reset type of signal from a GPIO or other means.

adnan khan said:

I am not enabling the cache. That means my on-chip cache works like IRAM. I am not sure whether I still need to care about the coherence of the L1 and L2 caches. If not, what could be the cause of this issue? Please comment!

It should be considered a rule to do an L1P invalidate just after downloading the new program and before branching to 0x00000000. There may be an invalidate-all command that would work, but I am pretty sure you can at least invalidate L1P. It should also be a rule to invalidate L1D, for the same protection, whenever you download any .const or other initialized data.
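The order described above can be sketched as follows. This is a host-runnable sketch only: the three cache/branch hooks are hypothetical function pointers (on the target they would wrap the CSL cache calls and the jump to address 0), and the L1D step is a writeback-invalidate rather than a plain invalidate, so that any part of the copy that landed in L1D reaches L2 before the program fetch side needs it. The recording stubs just log the call order; disabling interrupts before the copy is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hooks; on the target these would wrap the cache calls and the jump. */
typedef struct {
    void *ctx;
    void (*wbinv_l1d)(void *ctx, void *addr, size_t len);  /* write back + invalidate L1D over the range */
    void (*inv_l1p)(void *ctx);                             /* invalidate L1P */
    void (*branch_to)(void *ctx, void *entry);              /* does not return on the target */
} boot_ops_t;

void switch_application(const boot_ops_t *ops, void *dst, const void *src, size_t len)
{
    assert(ops && dst && src);
    memcpy(dst, src, len);               /* 1. copy the new loader; part of it may sit in L1D */
    ops->wbinv_l1d(ops->ctx, dst, len);  /* 2. push the copied words out to L2 */
    ops->inv_l1p(ops->ctx);              /* 3. discard stale instructions cached from the old code */
    ops->branch_to(ops->ctx, dst);       /* 4. only now execute the new code */
}

/* Host-side recording stubs used to check the call order. */
typedef struct {
    char log[8];
    int n;
    unsigned char first_byte_at_wbinv;   /* shows the copy happened before step 2 */
} trace_t;

static void trace_wbinv(void *ctx, void *addr, size_t len)
{
    trace_t *t = ctx;
    (void)len;
    t->first_byte_at_wbinv = *(const unsigned char *)addr;
    t->log[t->n++] = 'W';
}

static void trace_inv_l1p(void *ctx)
{
    trace_t *t = ctx;
    t->log[t->n++] = 'P';
}

static void trace_branch(void *ctx, void *entry)
{
    trace_t *t = ctx;
    (void)entry;
    t->log[t->n++] = 'B';
    t->log[t->n] = '\0';
}
```

Passing the cache operations in as hooks is only a device to make the ordering testable off-target; on the DSP the calls would simply be made inline in that order.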

adnan khan14207, over 15 years ago, in reply to RandyP

RandyP, thank you very much for your detailed post.

I followed both of your suggestions.

1. Not re-initializing the peripherals (EMIF, FPGA, PLL).

2. Invalidating L1D and L1P.

As I told you, I am not using the cache in my code. Even so, on your suggestion, I studied the cache manual and tried to use the CACHE_wbInvL1D/CACHE_InvL1P functions. Unfortunately, my code halted as soon as I switched from the first application to the second. So I removed the cache invalidation code and went ahead with just the other change, and to my surprise, things worked fine!

Now, theoretically speaking, the L1D and L1P invalidate functions are necessary before branching to execute the second application. But the question is: why is my application working fine without this change?

Secondly, I think I am making some mistake in the use of these CACHE_XXXX functions. Could you please guide me on the arguments of these functions?

Regards

adnan

Brad Griffis, over 15 years ago, in reply to adnan khan14207

Copying code to that first 1K of memory can be tricky. One concern I have is that some of that code may land in the L1D cache rather than in L2 SRAM. The L1D is a read-allocate cache. That means that unless data was already cached in L1D, writing something to that address would not "land" in the cache. So, for example, if you have data from the first application located at address 0x00000000, and that data is cached at the time you write the second application, the new instructions will actually end up in the L1D cache rather than in L2! I don't know if that's your issue, but it's something I would be concerned about. Also, since you mention erratic behavior, this could explain it: at times when the data is cached things get screwed up, and at other times they don't. To avoid this issue, I think you should simply make sure that neither application allocates any data in the first 1KB of memory. Otherwise it gets a bit hairy in terms of all the cache operations.
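The read-allocate behavior described above can be illustrated with a toy model. The sizes are deliberately tiny and are not the real C6713 L1D geometry; the model only captures the rules that matter here: a read miss allocates a line, a write miss bypasses the cache and goes straight to L2, and a write hit updates only the cached copy until it is written back. The names are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS   64
#define LINE_WORDS  8          /* toy line size, not the real L1D geometry */
#define NUM_LINES   2          /* toy direct-mapped cache */

typedef struct {
    bool valid, dirty;
    uint32_t tag;              /* word index of the first word in the line */
    uint32_t data[LINE_WORDS];
} line_t;

typedef struct {
    uint32_t l2[MEM_WORDS];    /* backing memory: what instruction fetch sees */
    line_t   l1d[NUM_LINES];
} toy_mem_t;

static line_t *lookup(toy_mem_t *m, uint32_t addr, uint32_t *base)
{
    *base = addr - addr % LINE_WORDS;
    line_t *ln = &m->l1d[(*base / LINE_WORDS) % NUM_LINES];
    return (ln->valid && ln->tag == *base) ? ln : NULL;
}

/* CPU read: allocates a line on a miss (read-allocate). */
uint32_t cpu_read(toy_mem_t *m, uint32_t addr)
{
    assert(addr < MEM_WORDS);
    uint32_t base;
    line_t *ln = lookup(m, addr, &base);
    if (!ln) {
        ln = &m->l1d[(base / LINE_WORDS) % NUM_LINES];
        if (ln->valid && ln->dirty)            /* evict: write the old line back first */
            memcpy(&m->l2[ln->tag], ln->data, sizeof ln->data);
        memcpy(ln->data, &m->l2[base], sizeof ln->data);
        ln->valid = true;
        ln->dirty = false;
        ln->tag = base;
    }
    return ln->data[addr - base];
}

/* CPU write: a hit updates only the cached copy; a miss goes straight to L2. */
void cpu_write(toy_mem_t *m, uint32_t addr, uint32_t value)
{
    assert(addr < MEM_WORDS);
    uint32_t base;
    line_t *ln = lookup(m, addr, &base);
    if (ln) {
        ln->data[addr - base] = value;
        ln->dirty = true;
    } else {
        m->l2[addr] = value;                   /* no allocation on a write miss */
    }
}

/* Write back and invalidate every line (whole-cache stand-in for a block wbinv). */
void wbinv_all(toy_mem_t *m)
{
    for (int i = 0; i < NUM_LINES; i++) {
        line_t *ln = &m->l1d[i];
        if (ln->valid && ln->dirty)
            memcpy(&m->l2[ln->tag], ln->data, sizeof ln->data);
        ln->valid = false;
        ln->dirty = false;
    }
}
```

In the scenario Brad describes, application A reads address 0 (the line is allocated), the copy of B's loader then writes address 0 (a hit, so the new word stays in L1D), and L2 only holds the new word after the writeback-invalidate.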

RandyP, over 15 years ago, in reply to adnan khan14207

Adnan,

The most important thing is that you have it working. Good job. And it is evidence of your dedication and your skills that you are still concerned with "why" for the cache commands. You will have many continuing successes, I am sure.

adnan khan said:

Now, theoretically speaking, L1D and L1P invalidate functions are quite necessary to use before branching to execute second application. but the question is, Why My application is working fine without this change??

This is a very difficult question to answer without complete understanding of your software flow, of what code had executed for program and data prior to the download and invalidate procedures, and what the program is executing when the halt occurs. Here are some thoughts:

As Brad explained, data may or may not land in the L1D cache when the DSP does a write operation. If this is how you are writing the new application to memory (using the DSP to write data), then doing an L1D writeback would be required if any of the new application landed in L1D. Otherwise, it could not possibly be read by the DSP as an instruction. You will want to refer to the Two-Level Internal Memory User's Guide SPRU609B for more information. The next to the last paragraph on page 54 talks about this situation and recommends the block wbinv and inv commands.
If you use the EDMA to do the write, the data will definitely not land in the L1D, but there can still be cache coherency problems. In this case, you would want to do an L1D wbinv before starting the EDMA copy.
If the L1P inv is what is causing your problem, you may be violating some of the cautions in SPRU609B. For example, it is recommended that you wait until the cache command is completed before moving on with the program.
If the L1D wbinv is what is causing your problem, then you might also need to wait for the cache command to complete before continuing. Also for this case, there may be old data that overwrites new data, most likely to happen if you use the EDMA to do the writing.
There may or may not be an issue with invalidating a region out of L1P while executing code from that same region. I am sorry for the vague comment, but this is a small but important detail that I could not find explicitly stated in SPRU609B. Usually, that means there is no problem, or else it would be documented how to handle it. But an extra-safe method would be to have two code parts that each invalidate L1P and then wait for that operation to complete. The first would be located in the upper half of L2; it would invalidate L1P for the addresses in the bottom half of L2 and wait for that operation to complete. It would then branch to the second code part, located in the lower half of L2, which would invalidate L1P for the addresses in the upper half of L2 and wait for that operation to complete.

adnan khan said:

Secondly, I think I am making some mistake in the use of these CACHE_XXXX functions. Could you please guide me on the arguments of these functions?

I have used the following in benchmarking applications on similar DSPs just before running a routine that I wanted to benchmark. The syntax should be correct.

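    #include <csl.h>
    #include <csl_cache.h>

    CACHE_invAllL1p();   /* invalidate all of L1P (no arguments, as in the call used later in this thread) */
    CACHE_wbInvL1d( (void*)0x00000000 /*start of L2*/, CACHE_getL2SramSize(), CACHE_WAIT );
    CACHE_wbInvL1d( (void*)0x80000000 /*SDRAM*/, 0x2000000 /*length of SDRAM*/, CACHE_WAIT );
    CACHE_wbInvAllL2( CACHE_WAIT );   /* write back and invalidate all of L2 cache */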

I hope some of this will help, or will at least help you understand how the cache works and how it may relate to your application reload operation.

Best of continuing success,
RandyP

Victor Kazmirenko, over 15 years ago

Hello,

I am doing almost the same thing, with the minor difference that my application uses DSP/BIOS. I have not performed enough experiments yet and am still reading in the hope of understanding more.

adnan khan said:
The processor while executing First application when encounters a specific command from UART, It simply copies the first 1K Code of the second application to RAM at address 0x0000000 and branches to address _c_int00.

This description is not very clear. Since you boot the very first program from flash, I suspect that the 1K is a secondary bootloader, which copies the application image from flash to DSP memory, presumably starting at 0x0400. I think that the assembly instruction

    b _c_int00

branches to the address of _c_int00, which is calculated at link time for each application. Needless to say, this address could be different for App.A and App.B. Now let's look at a possible scenario. App.A is written to flash at the address that is read on reset. On reset, the first 1K of App.A's image, which is effectively App.A's bootloader, is read. It loads the rest of the program image and branches to _c_int00 of App.A. Then you receive the command to reboot. You copy the 1K of App.B, which is App.B's bootloader, to address 0x00000000. Then, according to your description, you branch to _c_int00, which is still the address of _c_int00 in App.A. Moreover, App.B's code has not been loaded into DSP memory; the image of App.A is still there. This, I think, explains why you still saw App.A after the reboot. My guess is that after copying App.B's bootloader, we have to branch to address 0x00000000. This would execute App.B's bootloader, which copies App.B's image into DSP memory and then branches to _c_int00 of App.B.

Please comment on this guess. And would you mind sharing your solution too? Thanks.

Brad Griffis, over 15 years ago, in reply to Victor Kazmirenko

I don't think it's a good idea to have multiple bootloaders. It adds an additional and in my opinion unnecessary layer of complexity to the software. If you want to have multiple applications and/or the ability to do firmware updates I would recommend the following:

One block of memory designated as the secondary bootloader. It would ALWAYS run at power on. Its responsibilities would be to configure PLL(s), external memory, etc and then either load one of the applications or to perform a firmware update, i.e. overwrite one of the applications with a new one. This block of memory would NEVER be erased. That way, even if you had a power loss or some other catastrophic event in the middle of updating an application you would not have a "bricked" board.
One block of memory for each application. These blocks would each be formatted as a giant boot table. This formatting is achieved by TI's hex conversion utility. Your secondary bootloader would be set up to read from the start of this block, copy all the code/data to its proper location in memory, and branch to the entry point.
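A sketch of the copy loop such a secondary bootloader might run over one application block. The boot table layout here is an assumption based on the C6000 hex conversion utility's -boot output as I understand it (a 32-bit entry point, then records of byte count, destination address, and data padded to 32 bits, ending with a zero byte count); please check the Assembly Language Tools User's Guide for your tools version before relying on it. On a PC the destination "addresses" are offsets into a RAM image, and the entry point is returned rather than branched to. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *base;   /* host buffer standing in for target RAM */
    size_t   size;
} ram_image_t;

/* Copies every section into the RAM image and returns the entry point via *entry.
 * Returns false on a malformed table or a section that does not fit. */
bool load_boot_table(const uint32_t *table, size_t n_words, ram_image_t *ram, uint32_t *entry)
{
    assert(table && ram && entry);
    if (n_words == 0)
        return false;
    size_t i = 0;
    *entry = table[i++];                           /* assumed: first word is the entry point */
    for (;;) {
        if (i >= n_words)
            return false;                          /* ran off the end: no terminator */
        uint32_t count = table[i++];               /* section size in bytes */
        if (count == 0)
            return true;                           /* zero size ends the table */
        if (i >= n_words)
            return false;
        uint32_t dest = table[i++];                /* section load address */
        size_t words = ((size_t)count + 3u) / 4u;  /* data padded to whole 32-bit words */
        if (words > n_words - i)
            return false;
        if (dest > ram->size || count > ram->size - dest)
            return false;
        memcpy(ram->base + dest, &table[i], count);
        i += words;
    }
}
```

On the target, the bootloader would then perform the cache maintenance discussed earlier in this thread and branch to the returned entry point.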

Victor Kazmirenko, over 15 years ago, in reply to Brad Griffis

Brad,

Thank you for the valuable comment.

My situation is more complicated because of two things:

To control the DSP and communicate with the host PC, we use a rather complex infrastructure with a USB connection. The code required to do that is far bigger than the 1K that is automatically loaded on reset in the C6414.
I have to keep compatibility with legacy systems.

At present, we already have 2 applications burned into flash. One, which we call "emergency boot", is very close in concept to what you described. Though it uses a 2-stage boot process, its only role is to provide a USB connection to the host PC and to upgrade the main application.

The main application is burned at the origin of the flash memory chip. That chip is arranged to appear at address 0x6400 0000, which is CE1 on EMIF B, where the DSP starts to boot from. The emergency boot application is burned at higher addresses. Switching to the emergency boot application is done with a hardware switch, which effectively permutes the flash memory chip's address lines so that the emergency boot image appears at 0x6400 0000 instead.

For compatibility reasons, I cannot change this scheme. Rather, I have to add the ability to load more than one "main application". The point is that I have to run one application and, on receipt of a certain command, switch to the second application, and vice versa. Technically, this also means copying all data/code to memory and branching to the new entry point. But I cannot do it from the running application: copying the new one into DSP memory would overwrite the running application and destroy the copying code. I have to run the copying code from a location that will not be overwritten during this process.

I am trying to establish another strategy. As I mentioned before, we use a 2-step boot process, so there is a 1K hole at 0x0000 0000 that is used solely for the bootloader. Its function is to copy the application from flash (yes, from the hardcoded 0x6400 0000) and branch to _c_int00. After that jump, the 1K bootloader area is no longer used and is safe to overwrite. So I want to reconfigure the loader so that it loads the second, third, and further applications; the running application would branch to 0x0000 0000 and let the loader do the rest. Branching seems to work with a simple

typedef void (*pFunction)(void);
pFunction Jump_To_Application;
Jump_To_Application = (pFunction) 0x00000000;
Jump_To_Application();

So my question is how to copy the new loader from a known location in flash. Earlier in this thread there were concerns about caching issues, so more feedback is still very much appreciated.

Victor Kazmirenko, over 15 years ago, in reply to Victor Kazmirenko

Well, my case seems to be resolved. Several things helped:

Reading this thread about cache operation (thank you, guys).
spru656a.pdf, section 2.4, "Self-Modifying Code and L1P Coherence".

Here is a brief summary of my solution.

I store complete images of the applications in flash. Each image consists of a 1K bootloader and the rest of the application. One image is located at address 0x6400 0000, where the DSP starts booting from. The bootloader of the second application knows where its image is located.

To reboot into the new application, I perform the following sequence:

1. INTR_GLOBAL_DISABLE(); for the obvious reason.
2. IER = 1; for the same reason.
3. ICR = 0x0000FFFF; which might be redundant.
4. Copy the bootloader of the second application with

    memcpy((u_int8 *)0x00000000 /*destination*/, (u_int8 *)0x64080000/*new app is here*/, 0x0400/*length*/);

5. Coherency operations:

    CACHE_wbInvL1d((void *)0x00000000, 0x400, CACHE_WAIT);
    CACHE_invAllL1p();

6. Jump to address zero with

    typedef void (*pFunction)(void);
    pFunction Jump_To_Application;
    Jump_To_Application = (pFunction) 0x00000000;
    Jump_To_Application();

Perhaps inline assembly could do the same as the last four lines. I hope it will save someone a day. Comments are still very welcome.

Francesco Danilo De Vita, over 15 years ago, in reply to Brad Griffis

Brad Griffis said:

I don't think it's a good idea to have multiple bootloaders. It adds an additional and in my opinion unnecessary layer of complexity to the software. If you want to have multiple applications and/or the ability to do firmware updates I would recommend the following:

One block of memory designated as the secondary bootloader. It would ALWAYS run at power on. Its responsibilities would be to configure PLL(s), external memory, etc and then either load one of the applications or to perform a firmware update, i.e. overwrite one of the applications with a new one. This block of memory would NEVER be erased. That way, even if you had a power loss or some other catastrophic event in the middle of updating an application you would not have a "bricked" board.

One block of memory for each application. These blocks would each be formatted as a giant boot table. This formatting is achieved by TI's hex conversion utility. Your secondary bootloader would be set up to read from the start of this block, copy all the code/data to its proper location in memory, and branch to the entry point.

Hello Brad.
I am developing an application with the same structure that you described. I have a program (the first program) that runs at start-up and performs some operations (for example, receiving the second program over the UART in Intel HEX format, decoding it, and placing it into SDRAM).
The two applications are mapped into two different areas of the SDRAM. Both use DSP/BIOS.
After running its code, the first program jumps to the memory location of the _c_int00 routine of the second program.
The second program doesn't work.
Is this operation conceptually correct? Can I start the operating system a second time without a hardware reset of the processor?
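For the Intel HEX transfer mentioned above, a minimal sketch of decoding and checksum-checking one record line. It follows the standard Intel HEX record layout (":", byte count, 16-bit address, record type, data, checksum, where all bytes including the checksum sum to zero modulo 256); type 04 records carry the upper 16 address bits, which matters for a load address such as 0xC3100000. Receiving lines over the UART and writing the data to SDRAM are left out, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  type;        /* 00 data, 01 end of file, 04 extended linear address, ... */
    uint16_t offset;      /* 16-bit address field of the record */
    uint8_t  len;         /* number of data bytes */
    uint8_t  data[255];
} ihex_record_t;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

static bool hex_byte(const char *s, uint8_t *out)
{
    int hi = hex_nibble(s[0]), lo = hex_nibble(s[1]);
    if (hi < 0 || lo < 0)
        return false;
    *out = (uint8_t)(hi << 4 | lo);
    return true;
}

/* Parses one ":LLAAAATT<data>CC" line; returns false on bad syntax or checksum. */
bool ihex_parse_line(const char *line, ihex_record_t *rec)
{
    assert(line && rec);
    size_t n = strlen(line);
    while (n > 0 && (line[n - 1] == '\r' || line[n - 1] == '\n'))
        n--;                                       /* ignore the line ending */
    if (n < 11 || line[0] != ':' || (n - 1) % 2 != 0)
        return false;

    uint8_t bytes[260];                            /* len + addr(2) + type + 255 data + checksum */
    size_t count = (n - 1) / 2;
    if (count > sizeof bytes)
        return false;
    uint8_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        if (!hex_byte(line + 1 + 2 * i, &bytes[i]))
            return false;
        sum = (uint8_t)(sum + bytes[i]);
    }
    if (sum != 0 || count != (size_t)bytes[0] + 5)
        return false;                              /* bad checksum or wrong length */

    rec->len = bytes[0];
    rec->offset = (uint16_t)(bytes[1] << 8 | bytes[2]);
    rec->type = bytes[3];
    memcpy(rec->data, &bytes[4], rec->len);
    return true;
}
```

For example, a type 04 record with data C3 10 followed by a data record at offset 0x0030 places that record's bytes at 0xC3100030.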

Thanks

Danilo F. De Vita

Victor Kazmirenko, over 15 years ago, in reply to Francesco Danilo De Vita

Though I'm not Brad, let me share some of my experience ;-)

I have found that a hardware reset of the processor is not required. I am able to boot multiple applications without it.

Running _c_int00() should be sufficient. One concern of mine is how do you know that address in the first application? Another one is how big is your first application? What memory location it uses? It should not overlap with the memory of the second program, at least in the copied sections. And don't forget to perform the cache coherency operations described in this thread.

If you still have trouble, consider providing more details or even code fragments.

Good luck!

Francesco Danilo De Vita, over 15 years ago, in reply to Victor Kazmirenko

Thank you, rrlagic, for your rapid answer.

rrlagic said:
One concern of mine is how do you know that address in the first application?

I assigned the .sysinit section to a fixed memory area, so I know where _c_int00 is located.

rrlagic said:
how big is your first application? What memory location it uses?

In the first program I defined the RAM section from 0xc3000000 to 0xc3100000.
In the second application I defined the RAM from 0xc3100000 to 0xc3F00000.
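A small sketch of the overlap check Victor suggests, treating each region as half-open [start, end) so that 0xc3100000 is the first address of the second program (an assumption about how the ranges above are meant). The list of regions "in use" should include everything the first program still touches while it copies (its code, stack, and the copy routine itself), not only its RAM section. The types and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t start;   /* first byte of the region */
    uint32_t end;     /* one past the last byte */
} mem_range_t;

static bool ranges_overlap(mem_range_t a, mem_range_t b)
{
    return a.start < b.end && b.start < a.end;
}

/* True if any region still in use by the running (first) program overlaps
 * any region the loader will write for the second program. */
bool load_would_clobber(const mem_range_t *in_use, size_t n_use,
                        const mem_range_t *to_load, size_t n_load)
{
    assert((in_use || n_use == 0) && (to_load || n_load == 0));
    for (size_t i = 0; i < n_use; i++)
        for (size_t j = 0; j < n_load; j++)
            if (ranges_overlap(in_use[i], to_load[j]))
                return true;
    return false;
}
```

With the two RAM ranges given above, the check reports no overlap; moving the second program's start below 0xc3100000 would make it report one.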

In any case, I haven't yet tried performing the cache coherency operations.

I will inform you about my next tests.

Bye

Brad Griffis, over 15 years ago, in reply to Francesco Danilo De Vita

Personally, I try to steer people away from simply branching to _c_int00 to perform a software reset. In doing so, the peripherals (including the cache controller) will be in an unknown state, i.e. however the first application left them. So unless you go to great lengths to put the peripherals into a known state in software, you might see some flaky behavior.

Did you get it all working? It's not impossible, just a little more problematic in general than starting from a nice clean hardware reset.
