Updating firmware in any FLASH bank on the TMS570LC4357.

Marek Christer

Other Parts Discussed in Thread: TMS320F2812, HALCOGEN, TMS570LC4357

I have finished writing a boot loader that will update the second FLASH bank just fine. Now I need to make it work with either FLASH bank, and I figured that would be easy! Same procedure as on the TMS320F2812: just load the updating and communication routines into RAM and execute from there. IT IS NOT WORKING!!! The linker creates something that it calls a "trampoline table". Apparently the addressing distance from the FLASH to the RAM is greater than can be handled in a single jump, so an intermediate address in the table is used to complete the RAM based calls. The table is in FLASH, so there goes the FLASH upgrade down the tubes! Any suggestions???

The second issue is that the CPU is refusing to execute code from RAM. I did specify the section as (RX), but that does not help. The first fetch attempt from RAM causes a prefetch failure interrupt. Do I need to reprogram the MMU here in some way to accommodate execution out of RAM???

I will be looking at the TI bootloader example next, but the comments are telling me that it is not capable of updating itself, only the other FLASH bank.

over 9 years ago

0 Chuck Davenport over 9 years ago

TI__Guru 59540 points

Hello Marek,

I have forwarded your post to one of our bootloader experts. They should get back with you shortly.

0 Salome Ramirez Carrion over 8 years ago in reply to Chuck Davenport

TI__Intellectual 2180 points

Additional details:

Issue #1 is the “trampoline”. The jump table is placed in flash and this makes it impossible to share the SRAM routines between the SRAM based code and the flash based code.

Issue #2 Any attempt to execute instructions out of SRAM is creating a bus error interrupt indicating that instructions are fetched from invalid memory locations.

-Salome

0 Zhaohong Zhang over 8 years ago in reply to Salome Ramirez Carrion

TI__Mastermind 22715 points

Hi, there.

If you build a test based on Halcogen, it will be very hard to run it from RAM. There are too many diagnostic tests during startup. I would suggest using the approach discussed in the following forum thread.

e2e.ti.com/.../413345

Thanks and regards,

Zhaohong

0 Marek Christer over 8 years ago in reply to Zhaohong Zhang

Intellectual 360 points

Dear Zhaohong,

Unfortunately I am not building a test. I am converting our software that used to run on '2812 and '28335 based systems to the TMS570LC4357 environment. I cannot swap RAM into the FLASH area. It has to stay where it is at. Yes, I used Halcogen to create some initialization files, but solely to look at what initializations were done and how. I am not using any other files from Halcogen, except for the definitions files for the memory map (*.h).
We have a bootloader application compliant with the automotive UDS specification. This is what I am finishing right now. The application initializes the necessary portion of the system, tests some of the functionality and, if there is a valid motor control application present, passes control to that application.
Per the UDS specification, if the diagnostics server requests firmware update, I need to download new firmware via CAN and program it into FLASH. I must be able to do it at any address within the memory map. Swapping the memory space between RAM and FLASH does not do me any good in this case.

What I did on the previous systems was to place all the routines needed for CAN download and FLASH burning into RAM. Many of those routines are also used during normal system operation, so they need to be called from within FLASH. The one and only time when execution is solely from RAM is during a firmware update. This could be ANY one of the firmware components: application, bootloader one, bootloader two or basic startup code. I have been able to implement everything as planned to update Flash bank 0 while running out of Flash bank 1. Now I need to create the ability to update firmware in Flash bank 1, and for that I need to run out of RAM. I have no issue placing the needed routines in RAM at all. The compiler creates what it calls a "trampoline table" with addresses to the routines in RAM. There are two issues that I am running into. One: the "trampoline table" is in Flash bank 1 and thus prevents me from erasing the needed sectors unless I create a duplicate set of the needed routines in FLASH and RAM, which I do not believe I can afford given the space constraints. Issue number two is much worse. The very moment that the address of a RAM routine is loaded into the PC (this works just fine up to this point) and the processing unit attempts to fetch instructions from RAM, I get a bus error interrupt, which from then on repeats forever and locks the system.

The only alternative that I have left is to possibly duplicate the download and FLASH burning routines in Flash bank 0. Unfortunately this will put me in conflict with the UDS specification, as I need to be able to download the full FLASH range if needed in one download.

I can probably play some tricks with the "trampoline table", but I MUST be able to execute code from RAM at any system address and as of right now I have not been able to figure out how to do this.

Regards,

Marek

0 Zhaohong Zhang over 8 years ago in reply to Marek Christer

TI__Mastermind 22715 points

Marek,

Would you please take a look at information about TMS570 bootloader at the following link?

processors.wiki.ti.com/.../TMS570_Hercules_MCU_Bootloader

We normally put the bootloader in the first sector of Flash bank 0 (starting at address 0x0). If bootloading is needed, the bootloader will first copy the code for data downloading and flash programming from Flash to RAM. Then you can program any flash bank you wish. However, we do not recommend updating the bootloader sector, because if the update of this sector fails for some reason, you will have to reprogram the unit via JTAG. I think that you can use the same approach as the TI released bootloader with a different version of the Flash API library.

There is no issue in calling functions in RAM from the Flash. Your failure in jumping to RAM must be caused by something in your code. Did you enable the MPU? What are the MPU settings? Is the RAM configured as executable?

Thanks and regards,

Zhaohong

0 Marek Christer over 8 years ago in reply to Zhaohong Zhang

Intellectual 360 points

Thank you for the reply! Unfortunately due to both UDS specifications as well as my company requirements I cannot follow the TI recommendation. I looked at a boot loader suggested by TI about a month ago, but that version could only update the opposite Flash bank. I just downloaded the code that you recommended and will look at it tonight.

When you ask about RAM being configured as executable, do you mean in the linker command file (in this case the answer is yes!), or is there some register within the Hercules CPU that needs to be set properly? I have been unable to locate that information in the Technical Reference Manual for TMS570LC43x.
Also, there is no failure jumping to RAM. The problem happens immediately after the jump, when the first instruction to be executed from RAM is fetched.

Another question: is there a way to locate the "trampoline tables" in RAM? I have not found any reference in the documentation as to how to make that happen.

Regards,

Marek

0 Zhaohong Zhang over 8 years ago in reply to Marek Christer

TI__Mastermind 22715 points

Marek,

There is an MPU inside the R5F. If it is enabled, the available memory is divided into several regions, and each region is given attributes that determine the access rights. Please check the Cortex-R5 user guide.

Would you please provide more details about "trampoline tables"? We may use different terminology. In the approach discussed in my earlier reply, two blocks of memory are defined in the linker command file for the code to be run in RAM: one in Flash with the load address and one in RAM with the run address. The compiler generates the function at the run address so that the function can be called from anywhere. It is the user's responsibility to copy the code from the load address in Flash to the run address in RAM before executing this function.
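
The load address / run address scheme described above can be sketched in C roughly as follows. This is only an illustration: on the target the three values would come from linker-generated symbols (their names depend on the linker command file), whereas here they are plain, hypothetical parameters so the routine can also be tried on a host. The copy is assumed to be done in whole 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy a block of code from its load address (in Flash) to its run
 * address (in RAM). Illustrative sketch: load_start, run_start and
 * size_bytes are hypothetical parameters that would normally be filled
 * in from linker-generated symbols.
 *
 * Returns the number of bytes copied, or 0 if the arguments are unusable.
 */
size_t copy_ram_code(const uint32_t *load_start,
                     uint32_t *run_start,
                     size_t size_bytes)
{
    size_t words = size_bytes / sizeof(uint32_t);
    size_t i;

    /* Reject null pointers and sizes that are not a whole number of words. */
    if (load_start == NULL || run_start == NULL ||
        (size_bytes % sizeof(uint32_t)) != 0u) {
        return 0u;
    }

    for (i = 0u; i < words; i++) {
        run_start[i] = load_start[i];
    }
    return size_bytes;
}
```

The copy has to be complete before the first call into the RAM region is made.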

We can take a look at your project if you can share it.

Thanks and regards,

Zhaohong

0 Marek Christer over 8 years ago in reply to Zhaohong Zhang

Intellectual 360 points

Well, let's look at the "trampoline" first. I had never heard of this term until working with the TI ARM compiler. Apparently the branch instruction in the ARM CPU has a limited reach within the address space, so the TI compiler is creating what TI calls a "trampoline table" for FAR functions. The first time I saw the term was inside the 560Pro Trace debugger telling me "this function is a trampoline function". This happened the first time when I placed some routines in RAM. Basically, all FAR function addresses are stored in this table and the PC is loaded from the table+offset to the function address. Again this is a TI invention, I cannot help you any more than that in this regard.

The first thing I did when I ran into the execution problem from RAM was to look at the MPU unit. All of the sub-modules are disabled. What is possible, since you are mentioning the MPU again, is that I am looking at code that locks the MPU registers against access, and not code that disables the MPU functionality. I will check on this a little bit further tomorrow. The explanation in the Technical Reference Manual about the MPU is somewhat vague.

I have no problems copying the code to RAM and all of the function entry addresses in the table are correct. Also the disassembly of the functions in RAM as well as the C listing are correct. Sorry to repeat myself here, but my only issue is the fetching of the instructions from RAM for execution purposes. I am getting a bus error interrupt. Somewhere in the Technical Manual I found a statement that the FLASH controller WILL issue a bus error interrupt for instruction access outside of the FLASH area. I will look for it tomorrow and post it here for you.

The only hope that I have left right now is that the "disabled" codes that I was looking at in the MPU register set were not meant for "MPU disable", but for "access to the MPU registers disable".

Regards,

Marek

0 Zhaohong Zhang over 8 years ago in reply to Marek Christer

TI__Mastermind 22715 points

Marek,

I do not think that you need this table any more on the TMS570LC4357. I have a test which starts in RAM and executes in SDRAM without any issue. Can you provide more details about the bus error? Is it some kind of abort? As I proposed earlier, we can take a look at your project if you can share it. I believe that there is something wrong in your code.

Thanks and regards,

Zhaohong

0 Marek Christer over 8 years ago in reply to Zhaohong Zhang

Intellectual 360 points

Zhaohong,

I still do not believe that you understand quite everything that is happening.

I am NOT creating this table by myself. I do NOT WANT this table! The TI ARM compiler is doing this automatically by itself when I have the main code running out of the default FLASH location (0x00000000 - 0x003FFFFF) and there are subroutine calls to RAM at the default RAM address (0x08000000 and higher). The ARM CPU call and branch instructions have a limited address range, so TI has implemented these tables within the ARM compiler to support branches and calls that are beyond the limits of the assembler instructions. This is done completely automatically by the TI compiler without any requests from me at all!!!
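
For reference, the range limit mentioned here can be checked with a little arithmetic. The sketch below assumes the ARM-state (A32) `BL` encoding, whose signed 24-bit word offset reaches roughly +/-32 MB from the instruction address plus 8; Thumb encodings have different limits. The function name and parameters are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if an ARM-state (A32) BL located at 'from' can reach 'to'
 * directly. The instruction encodes a signed 24-bit word offset relative
 * to the instruction address plus 8, i.e. -0x02000000 .. +0x01FFFFFC bytes.
 * Illustrative helper, not part of any TI tool.
 */
bool arm_bl_in_range(uint32_t from, uint32_t to)
{
    int64_t offset = (int64_t)to - ((int64_t)from + 8);

    /* ARM-state instructions are word aligned, so the distance must be too. */
    if (((to - from) & 3u) != 0u) {
        return false;
    }
    return offset >= -0x02000000LL && offset <= 0x01FFFFFCLL;
}
```

With FLASH at 0x00000000 and RAM at 0x08000000 the distance is about 128 MB, so a direct `BL` cannot reach. A call through a register (for example via a function pointer) has no such limit.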

The bus error is the same as a Prefetch Abort (PABT per TI terminology) - a precise abort.

Paragraph 7.1.2 in the Technical Reference Manual for TMS570LC43x (spnu563.pdf), page 333: "Bus Error - L2FMC will generate a bus error to the bus master on certain accesses for example, writes to flash on Port A/Port B or access to addresses beyond the available flash space." This last sentence is written in the TI reference manual for the TMS570LC43x.

I need to find a way to ignore this bus error (again - a TI definition, not mine) when executing out of RAM. My last and only hope right now is that I misunderstood something about the MPU module, that it is not disabled, and it is actually trying to protect the CPU from an invalid access. As to your code example, you are executing completely out of RAM. I need to do both - FLASH and RAM.

We are going to have a very similar problem with our application code as it requires to execute out of FLASH and access some subroutines located in RAM at all times.

Tomorrow I will try to play more with the MPU subsystem. So far you have only convinced me that there is nothing wrong with my code, and the only hopeful possibility to solve our problems is that I may have an issue with the MPU initialization (or lack thereof).

I do not even want to think about a potential silicon problem that will prevent us from executing code out of FLASH and RAM at the same time. Hopefully it is simply something that I failed to initialize in this system, which I must admit is relatively complex.

Regards,

Marek Christer

0 Zhaohong Zhang over 8 years ago in reply to Marek Christer

TI__Mastermind 22715 points

Marek,

A prefetch abort indicates that the memory location either does not exist or is not executable. As I offered earlier, I can take a look at your project if you can share it.

Thanks and regards,

Zhaohong

0 Marek Christer over 8 years ago in reply to Zhaohong Zhang

Intellectual 360 points

Sorry, can't share it!

I did come to the conclusion that the location is not executable 4 weeks ago. It exists for sure, so that is not even up for discussion. I can open the area in a debugger memory window and I see the code in RAM just as it was copied from FLASH to RAM by my routines. What I need help with is to find out why this location is not executable!

We are back to my question #2 (#1, that TI "trampoline table" I can most likely deal with in some way). What can possibly make the RAM not executable in this MPU? I have already specified the area in the linker command file as RWX, so that is not it. The MPU is disabled. What else can be preventing execution from RAM in this device??? Oh, and the memory swap (RAM <> FLASH) will not work as only the first 512KB of FLASH becomes visible and the remainder cannot be updated.

By the way, the sample boot loader code that I downloaded yesterday from the TI website has the same problem. It crashes upon the first attempt to execute from RAM with a Prefetch Abort interrupt. I am looking at the PC and at the location to which the PC is pointing (which contains the next fully valid instruction code located in RAM), and upon a single assembler step command in the debugger I land in the Prefetch Abort interrupt in the next step.

Regards,

Marek Christer

0 Zhaohong Zhang over 8 years ago in reply to Marek Christer

TI__Mastermind 22715 points

Marek,

You cannot run the TMS570 bootloader as is because it was built for TMS570LS317, which has a different flash wrapper. I wanted to use this example to illustrate the idea. Are you sure that the MPU is disabled in your application? To move forward, I would suggest building a simplified project with the IP sensitive content removed and sharing it with us.

Thanks and regards,

Zhaohong

0 Marek Christer over 8 years ago in reply to Zhaohong Zhang

Intellectual 360 points

Zhaohong,

It is pretty obvious that I am fighting an issue with the flash or the RAM controller here. And yes the MPU is disabled. It is not even initialized.

I was using your boot loader to simply verify whether I could run any code out of RAM using your code and it failed the test. I never went beyond single stepping through the initialization and an attempt to execute the first RAM based instruction.

I highly recommend that instead of making me modify my code to a minimal size with no IP (which is probably going to take a week), you simply try to implement your boot loader in the TMS570LC4357 environment with the L2FMC and L2RAMW controllers. You stated yourself that your code is not created for this "flash wrapper".
Once you have that, then we can compare apples to apples and not apples to oranges (or something else unrelated).

Regards,

Marek Christer

0 Salome Ramirez Carrion over 8 years ago in reply to Marek Christer

TI__Intellectual 2180 points

(Additional information)
I was able to stop the Prefetch Abort; however, now when the MPU gets to the routines in RAM it does not execute the instructions. It simply increments the Program Counter one 32-bit instruction at a time and skips to the next address, and next, and next and so on.
I can see the proper instructions in RAM at the correct addresses using a debugger memory window, but it seems like the execution unit does not get the correct data when it is fetching instructions or it is potentially ignoring the instructions for some reason. I do not see any parity or ECC errors (both are disabled), and the disassembly window is showing the proper assembler instructions.
I am still making some more progress on my own. I have been looking at all status registers in the MPU to find if errors are reported. I found that even if the ECC is not enabled, there were many ECC errors logged. It took me a little while, but I figured out that if the data cache is enabled too early in the basic initialization code, it creates these errors. According to the data sheet it should not happen because the cached data is actually not used, but unfortunately it does. The data cache is reading data from locations that will not be used by some of the built in compiler initialization routines.
Unfortunately, it did not solve my problem yet.

0 Salome Ramirez Carrion over 8 years ago in reply to Salome Ramirez Carrion

TI__Intellectual 2180 points

(More information)
This problem may be an issue with the IC itself (TMS570LC4357) and/or the emulator (XDS560v2 PRO TRACE).
I have been trying to run to the beginning of the RAM based code and then see if any errors are created upon attempt to execute from RAM. During those attempts, I did set a breakpoint in the RAM code and the system did actually run to the breakpoint and stopped there. I also do know that at least two of the assembler instructions in the RAM based subroutine were executed, as two of the core registers had proper values loaded after the breakpoint was reached. This has never happened while I was single stepping.
As soon as I tried to single step after the breakpoint the CCS6 software locked up. I tried to reset the target, the emulator and the computer multiple times and could not access the upper half of the internal CPU RAM (I had located the RAM code at 0x08040000 – upper half of RAM). After every restart, the CCS6 was stating that it could not access RAM at 0x08040000 and above. I thought for a moment that some kind of damage happened internally in the MPU. This kept happening until I deleted the breakpoint located at 0x08040004, and now the system is back to where it used to be. It is loading the RAM code at 0x08040000, but it is not executing the code from there.
This may be just the emulator, the MPU or a combination of both. We bought the XDS560v2 from Texas Instruments, but I believe that it is developed by Spectrum Digital. Any suggestions how to proceed here?

0 Zhaohong Zhang over 8 years ago in reply to Salome Ramirez Carrion

TI__Mastermind 22715 points

Salome,

If you can share your project, I am willing to take a look.

Thanks and regards,

Zhaohong

0 Marek Christer over 8 years ago in reply to Zhaohong Zhang

Intellectual 360 points

Zhaohong,

Salome is working at TI Technical Support and she posted the contents of my email to her.

Sorry, but this project still cannot be shared.

Regards,

Marek Christer

0 Zhaohong Zhang over 8 years ago in reply to Marek Christer

TI__Mastermind 22715 points

Marek,

I am attaching a CCS project which starts in Flash and runs the fee library in RAM. Halcogen configuration file is also in the project. I would like to bring your attention to the following.

(1) You need to configure the Cortex R5 MPU correctly. The default configures RAM as non-executable.

(2) There is an issue with the cacheEnable() function generated by Halcogen. A delay is required between the instructions invalidating caches. I will inform the Halcogen team to correct the issue.

As I said in another forum thread, I do not like the idea of updating the bootloader because if the update fails you will have to reprogram the unit via JTAG. If you really want to do so, you can make a big function which includes top level control, data communication and flash program/erase. You will need to make sure that this big function and everything it uses runs from RAM.

4113.TMS570LC4357_FEE.zip

Thanks and regards,

Zhaohong

0 Marek Christer over 8 years ago in reply to Zhaohong Zhang

Intellectual 360 points

Zhaohong,

Thank you for your reply! I have been searching through the Technical Reference Manual for a long time trying to find a register or a bit somewhere configuring RAM as executable/non-executable. I cannot find it! I just looked through the manual again after reading your latest post and I am still failing to find anything about this.
Can you please point me to a page in the manual where that information is located? Thank you!

Is there an errata notice about the delay in cacheEnable() specifying the delay time required?

As to updating the bootloader, I have no choice. That is one of the requirements from our customer.

You have given me some hope here - thank you! I just hope to find the information in the manual somewhere. While I am waiting for your reply, I will check if there is a newer version of the Technical Reference Manual available online.

Regards,

Marek

0 Zhaohong Zhang over 8 years ago in reply to Marek Christer

TI__Mastermind 22715 points

It is the XN bit in Figure 4-36 on page 4-57 of Cortex R5 r1p2 TRM.
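
As a rough illustration of where that bit sits, the sketch below packs the fields of the Cortex-R5 MPU Region Access Control Register as I understand them from the ARM TRM (XN in bit 12, AP in bits 10:8, TEX in bits 5:3, and S, C, B in bits 2:0). The helper name is hypothetical, and on the target the value still has to be written to CP15 for the selected region, so please check the layout against the TRM before relying on it.

```c
#include <stdint.h>

/* Field positions of the MPU Region Access Control Register (assumed
 * from the Cortex-R5 TRM; verify against your TRM revision). */
#define MPU_RACR_XN        (1u << 12)   /* 1 = instruction fetches not allowed */
#define MPU_RACR_AP(x)     (((uint32_t)(x) & 0x7u) << 8)
#define MPU_RACR_TEX(x)    (((uint32_t)(x) & 0x7u) << 3)
#define MPU_RACR_S         (1u << 2)
#define MPU_RACR_C         (1u << 1)
#define MPU_RACR_B         (1u << 0)

#define MPU_AP_FULL_ACCESS 0x3u         /* read/write in all modes */

/*
 * Build an access-control value for a RAM region with full read/write
 * access and Normal non-cacheable memory (TEX=001, C=0, B=0).
 * 'executable' selects whether XN is left clear. Hypothetical helper.
 */
uint32_t mpu_ram_region_attr(int executable)
{
    uint32_t attr = MPU_RACR_AP(MPU_AP_FULL_ACCESS) | MPU_RACR_TEX(1u);

    if (!executable) {
        attr |= MPU_RACR_XN;
    }
    return attr;
}
```

A region that must hold executable code needs XN clear, so `mpu_ram_region_attr(1)` leaves bit 12 at zero.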

There is no formal errata on the Halcogen cache enable function. I spent almost a day yesterday trying to figure out why the Halcogen based test does not work once the cache is enabled. I have some validation tests which do not use Halcogen and never saw this kind of issue. Then I compared the source code and ran a couple of tests to figure out what was missing.

Thanks and regards,

Zhaohong

0 Marek Christer over 8 years ago in reply to Zhaohong Zhang

Intellectual 360 points

Zhaohong,

I have more of an understanding of what is causing the problem, but not why. I tried different delays during the cache initialization with no result. It is still possible that some gate propagation delays within the cache hardware setup system may be causing a problem.

As of right now I can clearly state that if the data cache is enabled, code execution out of RAM will not happen on the TMS570LC4357. If that bit is cleared and the data cache is disabled, I can execute out of RAM just fine. I can perform this switch at any time during program execution.

I have not designed semiconductor level electronics for almost 30 years, but my guess in this case is that when the data cache is enabled, any fetches from RAM are directed solely into the data cache and prevented from getting into the instruction prefetch queue.

If you can help me out as to what kind of delays are recommended and between which accesses, I am willing to try that. If that does not solve the issue, we may have a silicon design problem...

Regards,

Marek Christer

0 Charles Tsai over 8 years ago in reply to Marek Christer

TI__Guru**** 159345 points

Hi Marek,

Zhaohong is on vacation and will not be back for another two weeks. In the meantime I will try to help you make progress. Looks like it is not a simple problem.

First, when you get a prefetch abort from the code that runs out of RAM, the CPU will record the instruction fault address and status. Can you please tell me the values of the below three registers? Look for them in register window under Cp15.

CP15_INSTRUCTION_FAULT_STATUS, CP_15_AUX_INSTRUCTION_FAULT_STATUS and CP15_INSTRUCTION_FAULT_ADDRESS

Since you are getting a bus error when you run code out of RAM, the L2RAMW controller might also have logged some information. Can you please tell me the value of the RamErrStat register in the L2RAMW module?

Let's start from here and see if we can work our way to the root cause.

0 Marek Christer over 8 years ago in reply to Charles Tsai

Intellectual 360 points

Charles,
I am way past the RAM and exception issues right now... I finally figured out that I had to initialize the NMPU registers to something, and then disable the NMPU. Prior to that I was not initializing anything in the NMPU submodule. It was never enabled, but in some way it was causing problems anyway.

My next issue was that the code in RAM would not execute. The PC would increment but none of the instructions were either fetched or executed. I finally figured out that it has to do with the data cache. If I enable the data cache, the code in RAM will not execute. If data cache does not get enabled, the RAM code is working. This will cause a problem for our applications group when they start working on the high speed modules, but for me it is OK... Right now I am simply enabling the pre-fetch, but no data cache.

Zhaohong mentioned earlier that the routine from Halcogen initializing cache had a problem and needed some delays in-between register accesses. I asked him where the delays were needed and how long those should be but he never replied to that.

My current issue is the _memInit_() routine from Halcogen that I am using. It worked fine before I started to put code in the RAM and it still seems to work fine after initial power up. When I try to re-initialize the system (interrupts disabled) I crash into a continuous interrupt (I have not yet determined which one, but it is one of the system interrupts that use vectors in the beginning of the FLASH). This happens only when the software is executing at full speed. If I single step through the _memInit_() routine using the debugger (assembler single step), everything works fine. That one seems like another timing issue in the silicon to me.

Regards,

Marek Christer

0 Marek Christer over 8 years ago in reply to Charles Tsai

Intellectual 360 points

Charles,

The interrupt in question is a data abort. The CP15_INSTRUCTION_FAULT_STATUS, CP_15_AUX_INSTRUCTION_FAULT_STATUS and CP15_INSTRUCTION_FAULT_ADDRESS are all 0x00000000. The CP15_DATA_FAULT_STATUS is 0x00001008, CP_15_AUX_DATA_FAULT_STATUS is 0x00000000 and CP15_DATA_FAULT_ADDRESS is 0x08001505. I cannot find any references to "DATA_FAULT" in the Hercules manual nor in the Cortex-R5 manual.

0 Charles Tsai over 8 years ago in reply to Marek Christer

TI__Guru**** 159345 points

Hello Marek,

Can you please tell me the RamErrStat value in the L2RAMW module? Please see the screenshot below.

Search for Data Fault Status Register (DFSR) or Data Fault Address Register (DFAR) in the Cortex-R5 TRM and you will find the definition of these registers. The values you have shown indicate that there is a precise external abort. This means that the abort is due to a bus error generated by the external system. Since the DFAR captured the value 0x08001505, this error is most likely coming from the L2RAMW. If you can find the RamErrStat value then we might have some clue as to what exactly caused the error.
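
To make the reading of that value concrete, here is a small decoder for the DFSR fields as I recall them from the Cortex-R5 TRM (fault status in bit 10 together with bits 3:0, RW in bit 11, SD in bit 12). The struct and function names are invented for this sketch, and the field layout should be double-checked against the TRM.

```c
#include <stdint.h>

/* Decoded view of a Data Fault Status Register value (illustrative). */
typedef struct {
    uint32_t status;    /* 5-bit fault status: bit 10 joined with bits 3:0 */
    int      is_write;  /* RW, bit 11: 1 = write access, 0 = read access */
    int      sd;        /* SD, bit 12: external abort qualifier */
} dfsr_fields_t;

#define DFSR_STATUS_SYNC_EXTERNAL_ABORT 0x08u  /* "precise external abort" */

/* Split a raw DFSR value into its fields (layout assumed, see above). */
dfsr_fields_t dfsr_decode(uint32_t dfsr)
{
    dfsr_fields_t f;

    f.status   = (((dfsr >> 10) & 0x1u) << 4) | (dfsr & 0xFu);
    f.is_write = (int)((dfsr >> 11) & 0x1u);
    f.sd       = (int)((dfsr >> 12) & 0x1u);
    return f;
}
```

Applied to 0x00001008 this gives status 0x08 on a read access with SD set, which matches the precise external abort reading above.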

0 Charles Tsai over 8 years ago in reply to Marek Christer

TI__Guru**** 159345 points

Marek,

You have sent two posts today. In the first post you mentioned that you are in an infinite interrupt that has something to do with _memInit_(). In the second post you gave me the data fault status and address values. Are these two posts related to each other? The data fault status and address registers are meaningful to look at when you have a data abort. This is when the CPU is trapped at vector 0x10. The interrupt you referred to is an IRQ or FIQ interrupt, right? The exception vectors for these two are at 0x18 and 0x1C. If the two posts are unrelated then maybe we should separate them out so we don't get confused. But if they are related to each other please let me know.
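
For reference, the standard ARM exception vector offsets mentioned above can be written down as a small lookup, which helps when the debugger shows the PC parked at one of them. The sketch assumes the vector table sits at base address 0x00000000, and the function name is made up.

```c
#include <stdint.h>

/*
 * Map an address in the ARM exception vector table (base 0x00000000 is
 * assumed here) to the name of the exception. Illustrative helper only.
 */
const char *arm_vector_name(uint32_t vector_addr)
{
    switch (vector_addr) {
    case 0x00u: return "Reset";
    case 0x04u: return "Undefined instruction";
    case 0x08u: return "Supervisor call";
    case 0x0Cu: return "Prefetch abort";
    case 0x10u: return "Data abort";
    case 0x14u: return "Reserved";
    case 0x18u: return "IRQ";
    case 0x1Cu: return "FIQ";
    default:    return "Not a vector";
    }
}
```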

0 Marek Christer over 8 years ago in reply to Charles Tsai

Intellectual 360 points

Hello Charles,
All of my postings in this set are related to each other. There is one issue that I am trying to solve - how to execute code out of RAM. My problem has been that so far I have been fighting these issues pretty much on my own, and every time when I solve an issue - something else gets in the way. Zhaohong has not been of much help, with one exception. He stated that the Halcogen code was not initializing the cache properly, but when I asked what the initialization needed to be like, I did not get a reply.

I have gotten so far that I am running code out of RAM right now, but without the use of data cache. The moment when I enable data cache, the code stops executing out of RAM. If there is some special initialization sequence required with specific delays, etc, I still have not been notified about it. I cannot find anything about it in the documentation for the Hercules MPU.

Right now I am fighting the issue that when I am re-initializing the system after a download of our updated software, the system crashes during the call of the Halcogen _memInit_() subroutine. If I single step through the assembler lines of _memInit_(), everything works fine. Any other full speed execution crashes. I can even run to the line in _c_int00() that is calling _memInit_() and then try to perform a debugger stepover(F6) command. The execution never returns from _memInit_() and I find the PC in one of the abort interrupts.

Yesterday, I had always been running into an error in valid RAM (0x08001505). Today it is slightly different since I changed my development board (still our design, just a different board with a different TMS570LC4357 on it). I will mention more about this later. The L2ramw error code is 0x00002000. The MIE bit is set, which tells me that the Halcogen group must not have written code that waits properly for the RAM initialization to finish (unless something is wrong with the silicon itself). I will look into this and try to figure out what to do and how to modify the _memInit_() to function properly. Maybe you have a suggestion what needs to be done?

Today's problem is very similar to yesterday. The _memInit_() call is still the one crashing with an abort interrupt as a result, but this time my stack pointer is all messed up and I have invalid access to FLASH instead at address 0x00400080 (sometimes at 0x00400048). The data fault status is 0x00001008, but there are no errors in the Flashwrapper. This is relatively clear as the address 0x00400080 does not access any valid peripherals and I have not yet figured out where it is coming from.

I will try to figure out how to upload screen shots of Flashwrapper and CP15 registers.

Regards,

Marek Christer

0 Charles Tsai over 8 years ago in reply to Marek Christer

TI__Guru**** 159345 points

Hi Marek,

I understand your frustration. Let's hope your problem can be quickly resolved.

Let's start with the first one about not being able run code from RAM with cache enabled. It seems like there is some data coherency issue between the cache and the external memory when you copy your bootloader code to RAM and run from it. Let's try one experiment. Just before you execute your code from RAM, try to invalidate the cache. You can still keep the cache enabled. To invalidate the cache, call _iCacheInvalidate_() and _dCacheInvalidate_(). I tend to think that _iCacheInvalidate_() is sufficient since the problem is instruction execution from RAM. But let's invalidate both data and instruction cache for now.

For the second problem, with _memInit_(): the L2RAMW asserted the bus error and also logged it in the MIE flag. This indicates that a bus master tried to read the L2RAM while the memory initialization was ongoing. The memory initialization takes 4096 HCLK cycles to complete. When you single-step, the CPU is in debug mode before executing the next instruction, and seconds may pass before you step to the next instruction. This gives the memory plenty of time to finish initialization. In free run, it looks like the CPU is accessing the L2RAM before the memory initialization completes. I have never seen the default HALCoGen-generated _memInit_() with a problem like yours. Normally HALCoGen has _memInit_() called at the very beginning of the startup file. When the memory initialization completes, the L2RAMW sends a completion signal to the SYS module. Bit 0 of the MinitStat register in the SYS module will indicate this. _memInit_() is supposed to poll and wait for this bit to be set before continuing. Can you answer the two questions below?

1. Can you please tell me where you call _memInit_()?

2. Is your cache disabled throughout your bootloader operation?

0 Marek Christer over 8 years ago in reply to Charles Tsai

Intellectual 360 points

Charles,
This is a little bit more complex than just the bootloader. I should probably have mentioned that. At this moment I have temporarily given up on the code in RAM and I am trying to run solely out of FLASH. Maybe we should create a different question for that; if so, please tell me.
The system consists of three totally independent applications (more apps will be added later), and each one restarts from scratch upon entry and re-initializes the system. I am wondering right now whether your suggestion to invalidate the cache will solve the issue. That is very possible.
As to the completion of the RAM initialization, I verified a long time ago that the HALCoGen-generated _memInit_() does check for the completion of the initialization. That is why I have been wondering about potential timing issues in the initialization state machine. I had to modify this routine slightly to preserve R4 and R5:

;-------------------------------------------------------------------------------
; Initialize RAM memory

        .def _memInit_
        .asmfunc

_memInit_

        mov r8, r4              ; Save r4 and r5 in r8/r9 (they carry the two
        mov r9, r5              ; 32-bit words that must survive the RAM init)

        ldr r12, MINITGCR       ; Load MINITGCR register address
        mov r4, #0xA
        str r4, [r12]           ; Enable global memory hardware initialization

        ldr r11, MSIENA         ; Load MSIENA register address
        mov r4, #0x1            ; Bit position 0 of MSIENA corresponds to SRAM
        str r4, [r11]           ; Enable auto hardware initialization for SRAM
mloop                           ; Loop until memory hardware initialization completes
        ldr r5, MSTCGSTAT
        ldr r4, [r5]
        tst r4, #0x100
        beq mloop

        mov r4, #5
        str r4, [r12]           ; Disable global memory hardware initialization

        mov r4, r8              ; Restore r4 and r5
        mov r5, r9

        bx lr
        .endasmfunc

If I set a breakpoint on "ldr r11, MSIENA" and continue the code execution, the system fails. If I set a breakpoint on "mov r4, #0x1", everything works fine. Please notice that, due to the multiple applications, this is the third time that _c_int00() is being executed to create a brand new and clean start. Almost all of the applications use common initialization code, with the exception of the basic startup code, which is the one starting from the reset vector. This startup code collects some information from the environment and stores the data in two 32-bit words at address 0x08001500. The next thing it does is check for the presence and validity of two alternate bootloaders. It then calls one of these two bootloaders depending on validity, revision and other needs. Both bootloaders are compiled as independent applications with a complete clean system restart (_c_int00()). The _memInit_() call is not the first thing done in that initialization, but almost. The two 32-bit words at 0x08001500 are stored into R4 and R5 to preserve this data until the re-initialization is completed.
The bootloader that is started up checks the environment based on specific FLASH contents and the two 32bit words mentioned earlier. After that it determines which application to launch and validates that the application code is intact. The launched application re-initializes the system again using _c_int00(). This is where the code is failing inside the _memInit_().
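The choice between the two alternate bootloaders described above could look roughly like the following C sketch. It is only an illustration under assumptions: the `boot_image_t` descriptor, the `select_bootloader` function, and the "highest valid revision wins" policy are hypothetical, and the validity check itself and the "other needs" mentioned above are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor for one bootloader image. How `valid` is
 * established (for example a checksum over the image) is not shown. */
typedef struct {
    bool     valid;
    uint32_t revision;
} boot_image_t;

/* Pick which of two bootloader images to launch.
 * Returns 0 for image a, 1 for image b, or -1 if neither is valid.
 * Assumed policy: a valid image beats an invalid one, and between two
 * valid images the higher revision wins (ties go to image a). */
static int select_bootloader(const boot_image_t *a, const boot_image_t *b)
{
    if (a->valid && b->valid) {
        return (b->revision > a->revision) ? 1 : 0;
    }
    if (a->valid) {
        return 0;
    }
    if (b->valid) {
        return 1;
    }
    return -1;
}
```

Returning -1 when neither image is valid leaves the fallback decision to the caller, for example staying in the basic startup code.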

As I mentioned to you earlier today, I have a different failure on each of two PCBs here (same design). On one I have the RAM failure (no RAM code present at this time); on the other something happens with the core registers. When I stop the execution using the debugger, I am always in the data abort interrupt, but the stack pointer (SP) may be pointing to FLASH or to a non-existent memory area.

I will try your suggestion with invalidating the cache, but first I will try to put a delay between the two assembler instructions that I mentioned to you above. Yesterday, I did already try to put in a delay loop just after _memInit_() is testing for the init completion. I used 0x02500 for the loop counter, which should more than satisfy the requirement of 4096 HCLK cycles that you mentioned.

Regards,

Marek Christer

0 Marek Christer over 8 years ago in reply to Charles Tsai

Intellectual 360 points

Charles,
To make a long story short, I put a delay loop between "ldr r11, MSIENA" and "mov r4, #0x1" in the _memInit_() routine and things started to work. I will lower the loop value until I find the low limit. At this moment I started out with 2.5 million loops.

Marek

0 Charles Tsai over 8 years ago in reply to Marek Christer

TI__Guru 159345 points

Marek,

I wonder whether the cache is enabled at any time before the launched application calls _memInit_()? Also, when you do a system restart, do you just move the program counter to _c_int00(), or do you really do a CPU reset? If you restart just by moving the PC to _c_int00(), then it is possible that some unfinished read/write bus transaction to the L2RAM is left over from the bootloader when _memInit_() is called, in which case the L2RAMW will assert the bus error. I guess that putting in a manual delay to wait out the 4096 cycles is a temporary solution for you. It will surely work if you wait long enough.

0 Marek Christer over 8 years ago in reply to Charles Tsai

Intellectual 360 points

Charles,
At this moment I never enable the data cache on this chip. If you mean the instruction pre-fetch, then yes, it is enabled; however, it should not cause data caching errors.

Also, I am not really doing a system restart. I am launching a new application that starts with a _c_int00() to perform the full initialization needed for that particular application. The starting address of the application-specific _c_int00() is different depending on which application I am launching. I cannot do a CPU reset. That would throw me back into the basic boot software, going into one of the bootloaders again and then launching one of the applications, which would reset the CPU and continue in an endless loop over and over again.

As to the delay, I think that you misunderstood. If I put a delay after the RAM initialization starts, it is useless. The delay that I am putting in to solve the issue is between writing to the MINITGCR register (enable global memory initialization) and the start of the initialization itself by writing to the MSIENA register (specifying that RAM should be initialized). I have a tight loop but I have not checked how many HCLK cycles it requires. The counter in the loop must be greater than 0x500 and it works pretty reliably at 0x550. The 4096 cycles to complete the initialization is completely separate from this. It does not start until the MSIENA register is written to. That is where the check for the completion bit is used in MSTCGSTAT.
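For clarity, the ordering described above can be written out as a C sketch. This is a host-testable model under assumptions, not the actual startup code: the `meminit_regs_t` struct, the function names, and the bounded poll (`poll_limit`) are hypothetical. The key values 0xA, 0x1, 0x100 and 5 are taken from the assembly listing earlier in the thread, and 0x550 is the loop count reported above as working reliably.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register view. Field names follow the thread; on the
 * target these would be the SYS module registers at their real addresses. */
typedef struct {
    volatile uint32_t MINITGCR;   /* global memory init enable key      */
    volatile uint32_t MSIENA;     /* per-memory init enable, bit 0=SRAM */
    volatile uint32_t MSTCGSTAT;  /* status, 0x100 = init complete      */
} meminit_regs_t;

#define MINIT_ENABLE_KEY   0xAu
#define MINIT_DISABLE_KEY  0x5u
#define MSIENA_SRAM        0x1u
#define INIT_DONE_MASK     0x100u

/* Simple busy-wait; volatile keeps the loop from being optimized away. */
static void settle_delay(uint32_t count)
{
    for (volatile uint32_t i = 0; i < count; ++i) {
    }
}

/* Enable global init, wait `settle` loop iterations, then start the SRAM
 * init and poll for completion. Returns false if the completion bit is
 * not seen within `poll_limit` polls (the limit is an addition here). */
static bool mem_init_with_settle(meminit_regs_t *r, uint32_t settle,
                                 uint32_t poll_limit)
{
    r->MINITGCR = MINIT_ENABLE_KEY;
    settle_delay(settle);            /* the workaround: delay goes HERE */
    r->MSIENA = MSIENA_SRAM;         /* initialization starts with this */

    for (uint32_t n = 0; n < poll_limit; ++n) {
        if ((r->MSTCGSTAT & INIT_DONE_MASK) != 0u) {
            r->MINITGCR = MINIT_DISABLE_KEY;
            return true;
        }
    }
    return false;
}
```

On the target this would be called as something like `mem_init_with_settle(regs, 0x550u, limit)` before any RAM is in use; the sketch only fixes where the delay sits in the sequence and does not explain why the delay helps.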

I need to run out from here right now, but I will check the invalidation of cache tomorrow morning. What I still do not understand is why the stack pointer would be getting so completely messed up.

Regards,

Marek

0 Charles Tsai over 8 years ago in reply to Marek Christer

TI__Guru 159345 points

Marek,

Honestly, I cannot explain why putting the wait loop before MSIENA would work. Can you check the values of MINITGCR, MSIENA, and MINITSTAT just before _memInit_() is called, in the case where _memInit_() results in the bus error? Is it possible that MINITSTAT already had the flag set from an earlier call to _memInit_(), so that on the final call _memInit_() thought the initialization had already completed? In that case, it did not really wait for the 4096 cycles.
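One way to test that hypothesis is to capture the registers just before the call and check whether a completion flag is already set. The C sketch below is illustrative only: `meminit_snapshot_t`, `snapshot_meminit`, and `completion_already_flagged` are hypothetical names, bit 0 of MINITSTAT is the bit mentioned earlier in this thread, and 0x100 in MSTCGSTAT is the bit the posted _memInit_() listing polls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values of the memory-init registers captured before _memInit_() runs. */
typedef struct {
    uint32_t minitgcr;
    uint32_t msiena;
    uint32_t minitstat;
    uint32_t mstcgstat;
} meminit_snapshot_t;

#define MINITSTAT_SRAM_DONE  0x1u    /* bit 0, as described in the thread */
#define MSTCGSTAT_INIT_DONE  0x100u  /* bit polled by the posted listing  */

/* Read the four registers through the given pointers. On the target the
 * pointers would be the SYS module register addresses. */
static meminit_snapshot_t snapshot_meminit(const volatile uint32_t *minitgcr,
                                           const volatile uint32_t *msiena,
                                           const volatile uint32_t *minitstat,
                                           const volatile uint32_t *mstcgstat)
{
    meminit_snapshot_t s;
    s.minitgcr  = *minitgcr;
    s.msiena    = *msiena;
    s.minitstat = *minitstat;
    s.mstcgstat = *mstcgstat;
    return s;
}

/* True if a completion flag is set before initialization has even been
 * started, which would let a poll loop fall through without waiting. */
static bool completion_already_flagged(const meminit_snapshot_t *s)
{
    return ((s->minitstat & MINITSTAT_SRAM_DONE) != 0u) ||
           ((s->mstcgstat & MSTCGSTAT_INIT_DONE) != 0u);
}
```

On the target the snapshot would have to be kept in CPU registers or read with the debugger, since the RAM initialization that follows clears SRAM.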
