Here's my scenario....
- bootloader based on uboot for NOR flash
- Wanting to configure the L2 first thing in bootloader and boot into our custom OS with only the OS switching in a new MMU mapping table.
- On GPMC CS0 - NOR flash (executing bootloader from this flash, sets up cache, MMU, etc.)
- On GPMC CSx - deviceX has slower access times than the NOR flash and is synchronous, using the wait signals on reads/writes (the wait lines can delay anywhere from 1us to 2ms for a single word read/write access)
- GPMC is configured with no timeout and the L3 timeout is setup for max. So this allows the wait lines on CSx to stall the processor until data is ready. (blocking GPMC and Initiator target access from ARM core to GPMC (through L3))
- MMU configured with NOR flash as normal write through caching, DDR and SRAM as normal write through cached, and the deviceX as strong ordered non cached.
So everything functions OK in this configuration if I don't turn on L2 (L1 caching only). I believe the reason is that the L2 module manages all memory accesses when cache is enabled (cached and not cached). In this configuration, is it possible that the L2 can't handle the stalls that the CSx is causing?
ISSUE:
After doing only 2-6 accesses to CSx while running out of flash on CS0, I start to get a few different failure states if I keep power cycling. The first is random memory corruption in DDR. I'm loading an image from flash into DDR and doing a CRC that passes right after the load. Then I do an access to CSx (while instructions are executing out of CS0) and some words in the image's DDR location change, so the CRC check I added after the CSx call fails. The second is the GPMC error type/address registers being populated with an invalid address (a memory address, not a GPMC address). The address isn't valid for any place on the system, and it looks like it must have come from a failed GPMC access attempted by the L2 controller to GPMC CS0 while I had a CSx access stalling.
If I setup the MMU entry for NOR flash to be strong ordered and not cached the issue doesn't occur, but that drastically slows down my system when running from flash. We're not sure if this is the correct way to resolve the issue (as it could just be we slow down execution enough it works).... If that is the correct configuration, we were planning to move more of our boot code to DDR to speed it up and should be able to make it work.
So I guess the simple question is how long of a GPMC wait signal stall can the L2 controller handle when it's managing the non cached memory access to that device? Since if I disable L2 this all works and has worked for months....
We have applied the following ARM errata....
460075, 430973, 458693
Did you utilize this ROM service prior to enabling the L2?
Can you post the code that enables the L2 cache for review?
We are currently not using the ROM service. I will look into getting that enabled.
Below is the code we are using to enable the L2 cache and MMU.
// Invalidate I-Cache
__asm__ volatile ("mcr p15,0,%0,c7,c5,0" : : "r" (0x0));
__asm__ volatile ("isb");

/*
 * Setup the MMU Table here. Leaving code out to save space.
 */

/* Drain write buffer */
__asm__ volatile ("mcr p15,0,%0,c7,c10,4" : : "r" (0));

/* TLB Flush */
__asm__ volatile ("mcr p15,0,%0,c8,c7,0" : : "r" (0));

/* Load TTBR0 */
__asm__ volatile ("mcr p15,0,%0,c2,c0,0" : : "r" (TTBR0_BASEADDR((uint32 )trans_table) | TTBR0_C | TTBR0_RGN_WB_WA));

// L2 Cache enable
__asm__ volatile ("mrc p15,0,%0,c1,c0,1" : "=r" (ctrlReg) :);
ctrlReg = ctrlReg | 0x2;
__asm__ volatile ("mcr p15,0,%0,c1,c0,1" : : "r" (ctrlReg));

/* Set default domain access (all manager) */
__asm__ volatile ("mcr p15,0,%0,c3,c0,0" : : "r" (0xfffffffd));

/* Enable MMU, alignment, instruction cache, branch prediction, data cache */
__asm__ volatile ("mrc p15,0,%0,c1,c0,0" : "=r" (mmu_ctrl) :);
mmu_ctrl |= (SCTLR_I | SCTLR_Z | SCTLR_C | SCTLR_A | SCTLR_M);
__asm__ volatile ("mcr p15,0,%0,c1,c0,0" : : "r" (mmu_ctrl));
__asm__ volatile ("isb");

// Invalidate I-Cache
__asm__ volatile ("mcr p15,0,%0,c7,c5,0" : : "r" (0x0));
__asm__ volatile ("isb");
Looked more into using the ROM service to invalidate the L2 cache. We are currently using the following code to achieve the same functionality. We are disabling the L2 at this point and enabling it a little later when we set up the MMU.
    /* Invalidate L1 */
    mov r0, #0
    mcr p15, 0, r0, c8, c7, 0    /* Invalidate TLBs */
    isb
    mcr p15, 0, r0, c7, c5, 0    /* Invalidate I-Cache */
    isb

    /* Invalidate L2 */
    mrc p15, 1, r0, c0, c0, 1
    ands r3, r0, #0x7000000
    mov r3, r3, lsr #23
    beq finished
    mov r10, #0
loop1:
    add r2, r10, r10, lsr #1
    mov r1, r0, lsr r2
    and r1, r1, #7
    cmp r1, #2
    blt skip
    mcr p15, 2, r10, c0, c0, 0
    isb
    mrc p15, 1, r1, c0, c0, 0
    and r2, r1, #0x7
    add r2, r2, #4
    ldr r4, =0x3ff
    ands r4, r4, r1, lsr #3
    clz r5, r4
    ldr r7, =0x00007fff
    ands r7, r7, r1, lsr #13
loop2:
    mov r9, r4
loop3:
    orr r11, r10, r9, lsl r5
    orr r11, r11, r7, lsl r2
    mcr p15, 0, r11, c7, c6, 2
    subs r9, r9, #1
    bge loop3
    subs r7, r7, #1
    bge loop2
skip:
    add r10, r10, #2
    cmp r3, r10
    bgt loop1
finished:
    /* Turn off L2 Cache */
    isb
    MRC p15, 0, r0, c1, c0, 1
    and r0, r0, #0xFFFFFFFD
    MCR p15, 0, r0, c1, c0, 1
    isb
Clayton Shotwell Looked more into using the ROM service to invalidate the L2 cache. We are currently using the following code to achieve the same functionality.
It is absolutely required to use the ROM service. It's not possible for you to achieve the same result because you cannot run in secure mode like the ROM code.
I added the following lines just above the assembly I posted earlier.
/* Use the ROM to invalidate the L2 cache */
moveq r12, #0x1
smc #1
I have been working with the MMU configuration to see if there is a different configuration that might work better with our hardware setup. I have tried configuring the NOR flash as strongly ordered or device memory, and that allowed the bootloader to run but it slowed everything way down. Next I tried to configure the L1 and L2 cache models separately to see if I could get better results. I ended up configuring the L1 as non-cacheable and the L2 as write-through, no write allocate (write back with allocate or no-allocate also work) and not shareable (enabling the shareable bit in the MMU table slows everything down considerably). This allows the bootloader to run without errors. I found the configuration table I referenced on page 7-5 of the Cortex-A8 TRM rev. r3p2 for normal memory and another configuration table in the ARMv7 Architecture Reference Manual in section B3.8.2 that details the register values I've set to configure the MMU. I am not sure why I am having to disable the caching in L1 to avoid the GPMC bad address errors, but it seems to help.

To backtrack just a little bit for some more information on the errors I am seeing. The errors occur on the GPMC interface during reads and writes. I am seeing bad address errors when I check the device before I try a read or a write to the deviceX (not the NOR flash device). I checked the addresses from many different failures and none of the addresses fall into a valid address range that was configured in the GPMC config7_i register. What's really odd is the address I am trying to access is not the address in the GPMC error address register (not even close).
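For anyone trying to reproduce the split L1/L2 configuration described above, here is a minimal sketch of how a 1MB section entry could be built in the ARMv7 short-descriptor format. With TEX[2] set, TEX[1:0] carries the outer policy and C/B carry the inner policy. It assumes, as the post does, that the inner attributes drive L1 and the outer attributes drive L2. The function names, the full-access AP value, and domain 0 are illustrative choices, not taken from the poster's actual table code.

```c
#include <stdint.h>

/* Cache policy encodings used when TEX[2] = 1 (ARMv7 short-descriptor format). */
enum cache_policy {
    POLICY_NC    = 0x0, /* non-cacheable */
    POLICY_WB_WA = 0x1, /* write-back, write-allocate */
    POLICY_WT    = 0x2, /* write-through, no write-allocate */
    POLICY_WB    = 0x3  /* write-back, no write-allocate */
};

#define SECTION_TYPE  0x2u         /* bits [1:0] = 0b10: 1MB section */
#define SECTION_AP_RW (0x3u << 10) /* AP[1:0] = 0b11: full access (illustrative) */

/* Normal memory with separate inner and outer cache policies, domain 0. */
static uint32_t section_normal(uint32_t phys,
                               enum cache_policy inner,
                               enum cache_policy outer)
{
    uint32_t tex = 0x4u | (uint32_t)outer; /* TEX[2] = 1, TEX[1:0] = outer policy */
    uint32_t cb  = (uint32_t)inner;        /* C = inner bit 1, B = inner bit 0 */

    return (phys & 0xFFF00000u) | (tex << 12) | SECTION_AP_RW |
           (cb << 2) | SECTION_TYPE;
}

/* Strongly ordered memory: TEX = 0b000, C = 0, B = 0, domain 0. */
static uint32_t section_strongly_ordered(uint32_t phys)
{
    return (phys & 0xFFF00000u) | SECTION_AP_RW | SECTION_TYPE;
}
```

Under these assumptions, section_normal(0x08000000, POLICY_NC, POLICY_WT) would describe the first NOR section with L1 non-cacheable and L2 write-through, and section_strongly_ordered(0x18000000) would describe the first deviceX section.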
Clayton Shotwell I added the following lines just above the assembly I posted earlier.
/* Use the ROM to invalidate the L2 cache */
moveq r12, #0x1
smc #1
I was hoping that would fix the issue, but in any case it's definitely the right thing to do so keep it there.
Clayton Shotwell I checked the addresses from many different failures and none of the addresses fall into a valid address range that was configured in the GPMC config7_i register.
Will you please provide the address you accessed (virtual and physical) and the address of the failure? A few register values would be good too, e.g. GPMC_config, any registers pertinent to the error you're seeing, etc.
One other issue comes to mind that I should mention. When you enable cache for the bootloader you need to be very careful not to have data (i.e. the instructions) "stuck" inside the cache. If you have configured write-through cache then that shouldn't be an issue as all levels of memory would be updated immediately. However, in the case of write-back cache I think you would need to manually flush the data in order to "push" the instructions to the physical memory.
Have you done any profiling of the boot sequence, i.e. back when everything worked fine but was slow, what was the biggest offender? Was it the actual copying of the application from NOR flash to DDR? Or was it the execution of some piece of initialization code?
In order to narrow down your issue I think we should try to come up with a specific configuration and then work to debug it. Right now there are a lot of variables changing and it's hard for me to understand what all is happening. For example, I was thinking something like this:
Here are a few of the address errors I have seen. There doesn't seem to be any pattern in the values. These errors occur on both reads and writes with no pattern, and they do not always happen at the same point.

Address Accessed in GPMC    Error Address from GPMC register
0x18600006                  0x15F1ECC0
0x18086020                  0x02BEDB40
0x18600002                  0x35C7ED80
0x18600006                  0x38FD6D90
0x18600006                  0x17C7ED80
0x18600002                  0x03726690

Below are the address ranges I have mapped in the GPMC and the MMU mappings that are configured.

Device     GPMC Address    GPMC Size    MMU Address    MMU Size
NOR        0x08000000      64MB         0x08000000     16MB
                                        0x09000000     16MB
DeviceX    0x18000000      128MB        0x18000000     16MB
                                        0x1C000000     16MB
                                        0x1D000000     16MB

GPMC is being configured with smart idle and auto idle enabled. DeviceX uses its wait pin while the NOR flash does not use its wait pin.
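As a quick sanity check of the claim that none of the logged error addresses land in a configured GPMC window, here is a small sketch that tests an address against the two chip-select ranges listed above. The struct and function names are made up for illustration; only the base addresses and sizes come from the post.

```c
#include <stdint.h>
#include <stddef.h>

struct window {
    const char *name;
    uint32_t base;
    uint32_t size; /* in bytes */
};

/* GPMC chip-select windows as listed in the post. */
static const struct window gpmc_windows[] = {
    { "NOR",     0x08000000u,  64u << 20 },
    { "DeviceX", 0x18000000u, 128u << 20 },
};

/* Returns the name of the window containing addr, or NULL if none does. */
static const char *gpmc_window_for(uint32_t addr)
{
    for (size_t i = 0; i < sizeof gpmc_windows / sizeof gpmc_windows[0]; i++) {
        const struct window *w = &gpmc_windows[i];
        /* Subtract the base first so the range test cannot overflow. */
        if (addr >= w->base && addr - w->base < w->size)
            return w->name;
    }
    return NULL;
}
```

Calling gpmc_window_for() on each accessed address returns a window name, while each of the six logged error addresses returns NULL, which matches the observation that the error addresses are outside every configured range.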
Brad, Clayton is out this week and I'll be filling in.
To continue with what you guys had already checked out.... We believe all the setup for the cache and MMU is correct (very similar to other projects we've done and per the TRM for the most part). Do we want to go through our use case of the GPMC interface and the really long processor stalls we're causing with the wait lines? I believe that is our root cause, but I haven't been able to find enough supporting evidence to figure out how to work around or fix it.
I have confirmed with my hardware guy that the GPMC CS3 wait line can be held from 10s of uSec to max ~2.5ms.
If I set the GPMC module to time out, my transactions fail on CS3 (the one using the waits). That tells me the wait line is being held for longer than 6us (the max GPMC wait that can be configured), so this is in essence canceling my transactions. So I have to stay configured at the GPMC level with no timeout. Next I look at the L3 timeout settings. When I calculate my max timeout on that interface it's about 1.5ms. I don't know how to catch that interface's failure case if it's ever exceeded. So my theory is the GPMC stalls out the L3 beyond its timeout, but the GPMC transaction still attempts to complete... which in this state possibly leaves some unknown values locked in for the address, causing the GPMC access error? (Can you confirm the behavior of each module if a wait line is held beyond the GPMC and L3 timeouts? Any idea why this works with only L1 enabled?)
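To make the timeout comparison above concrete, here is a rough sketch of the arithmetic. The 511-cycle counter value and 83 MHz clock used in the usage note are illustrative assumptions, picked only because they give a window of roughly 6us; the real counter width and GPMC clock should be taken from the TRM and the board's clock setup. The function names are hypothetical.

```c
#include <stdint.h>

/* Timeout window in nanoseconds for a counter of `count` cycles at `clk_hz`. */
static uint64_t timeout_ns(uint32_t count, uint32_t clk_hz)
{
    return ((uint64_t)count * 1000000000ull) / clk_hz;
}

/*
 * Returns 1 if a stall of `stall_ns` would outlast the timeout window.
 * A window of 0 is treated as "timeout disabled", so it never fires.
 */
static int stall_exceeds(uint64_t stall_ns, uint64_t window_ns)
{
    return window_ns != 0 && stall_ns > window_ns;
}
```

With those assumed values, timeout_ns(511, 83000000) comes out a little over 6us, so a 2.5ms wait (2500000 ns) exceeds both that window and a 1.5ms L3 window, while a stall of a few tens of microseconds exceeds only the GPMC one. Only a disabled window (0) tolerates the worst case.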
Section 5.2.3.4.2 "Time-Out" of the OMAP3530 TRM discusses the various registers associated with configuring and detecting a timeout on the L3 interconnect. Do you see a timeout error being logged in the L3_TA_AGENT_STATUS register for GPMC (0x6800 2428)? Note that it is a 64-bit register, i.e. get the whole thing.
What is the value of L3_TA_AGENT_CONTROL for the GPMC (0x6800 2420)? Note that it is a 64-bit register, i.e. get the whole thing.
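Since both registers are 64 bits wide, one way to capture the whole value is to read the two 32-bit halves and combine them. The sketch below assumes the low word sits at the lower address and that the register block is accessible through a plain pointer (for example with an identity mapping); the helper name is hypothetical.

```c
#include <stdint.h>

/* Combine two consecutive 32-bit words into one 64-bit value, low word first. */
static uint64_t read_reg64(const volatile uint32_t *reg)
{
    uint32_t lo = reg[0];
    uint32_t hi = reg[1];

    return ((uint64_t)hi << 32) | lo;
}
```

On the target this might be called as read_reg64((const volatile uint32_t *)0x68002428) for the status register, again assuming that address is mapped one-to-one.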
Hopefully looking at the registers will show that the root cause of the issue is truly an L3 timeout. If that's the case it seems like you would simply want to turn off the timeout.
The last time we checked, I believe it was showing one of the 5 MPU agent ids as the timeout reason in the status reg. I'll get a dump of those for you.
Can you completely turn off the timeout? Would it truly stall the L2 controller or is there a level of inherent timeout in that controller that we're going to run into?
1962332 Can you completely turn off the timeout? Would it truly stall the L2 controller or is there a level of inherent timeout in that controller that we're going to run into?
It looks like there are a number of different timeouts. As far as I can tell both the GPMC and L3 Interconnect timeouts can be turned off. I did some quick searching of the ARM documentation and didn't see anything related to a timeout mechanism. That said, it seems plausible that there might still be some kind of timeout mechanism in the Cortex-A8 that I've not yet discovered. Maybe I'm using the wrong word in my search or something... That would be the only thing that would make any sense out of the fact that things work/break depending on how the cache is configured, etc. We both need to dig some more to try and figure that out!
In the mean time I think the register dumps related to the interconnect are a good place to start. Once we can get rid of the timeouts at the Interconnect then we can worry about the Cortex A8.
Brad, I'm back from vacation and back to work.
I did a register dump after disabling the L3 timeouts for the MPU, GPMC, and the RT (this one should disable all of the timeouts). The timeouts are getting disabled right after the clocks are set up. I get the following register values when I get a GPMC error. I have two sets below. One is for a GPMC write error and the other is for a GPMC read error. From what I have read, this looks like an error with an MPU transaction. The MPU L3 Error Log register points to an initiator ID of 0x19, which is the MPU subsystem. I'll start digging through the ARM documentation to see what I can find.
IO Write Access Post Err @[0x18082042]
GPMC ERR Addr   [0x0c9e8b80]
GPMC ERR Type   [0x00000111]
GPMC Status     [0x00000001]
GPMC L3 Ctrl    [0x0000000003000000]
GPMC L3 Status  [0x0000000000000000]
GPMC L3 Error   [0x0000000000000000]
GPMC L3 ErrAddr [0x0000000000000000]
MPU L3 Ctrl     [0x000000003e000000]
MPU L3 Status   [0x0000000010000010]
MPU L3 Error    [0x0000000082001901]
MPU L3 ErrAddr  [0x0000000047f20ac0]
SI Flg Sts0     [0x0000000000000004]
SI Flg Sts1     [0x0000000000000000]

IO Read Access Post Err @[0x18082028]
GPMC ERR Addr   [0x3f7ab580]
GPMC ERR Type   [0x00000111]
GPMC Status     [0x00000001]
GPMC L3 Ctrl    [0x0000000003000000]
GPMC L3 Status  [0x0000000000000000]
GPMC L3 Error   [0x0000000000000000]
GPMC L3 ErrAddr [0x0000000000000000]
MPU L3 Ctrl     [0x000000003e000000]
MPU L3 Status   [0x0000000010000010]
MPU L3 Error    [0x0000000084001900]
MPU L3 ErrAddr  [0x0000000000000000]
SI Flg Sts0     [0x0000000000000004]
SI Flg Sts1     [0x0000000000000000]
Here are some notes I'm taking related to your first register dump.
Looking at the GPMC related registers I see the following:
Clayton Shotwell GPMC ERR Type [0x00000111]
I see the following errors: ILLEGALMCMD and ERRORNOTSUPPADD, i.e. no timeout.
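For reference, the decode above could be automated with something like the sketch below. The bit positions are an assumption on my part (ERRORVALID at bit 0, ERRORTIMEOUT at bit 2, ERRORNOTSUPPMCMD at bit 3, ERRORNOTSUPPADD at bit 4, ILLEGALMCMD in bits 10:8); they are consistent with the reading given above for 0x111 but should be checked against the GPMC_ERR_TYPE description in the TRM. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Decoded GPMC_ERR_TYPE fields (bit layout is an assumption, see lead-in). */
struct gpmc_err_type {
    unsigned error_valid;         /* bit 0 */
    unsigned error_timeout;       /* bit 2 */
    unsigned error_not_supp_mcmd; /* bit 3 */
    unsigned error_not_supp_add;  /* bit 4 */
    unsigned illegal_mcmd;        /* bits 10:8, command that caused the error */
};

static struct gpmc_err_type decode_gpmc_err_type(uint32_t v)
{
    struct gpmc_err_type e;

    e.error_valid         = (v >> 0) & 0x1u;
    e.error_timeout       = (v >> 2) & 0x1u;
    e.error_not_supp_mcmd = (v >> 3) & 0x1u;
    e.error_not_supp_add  = (v >> 4) & 0x1u;
    e.illegal_mcmd        = (v >> 8) & 0x7u;
    return e;
}
```

Under that layout, 0x00000111 decodes to a valid error with the unsupported-address flag set, a command value of 1, and the timeout flag clear.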
Looking at the L3 interconnect related registers I see the following:
Clayton Shotwell SI Flg Sts0 [0x0000000000000004]
According to Table 5-29 "L3_SI_FLAG_STATUS_0 for Application Error" this is considered a "Functional Inband Error".
Clayton Shotwell MPU L3 Status [0x0000000010000010]
I see INBAND_ERROR_PRIMARY and REQ_ACTIVE are set.
Clayton Shotwell MPU L3 Error [0x0000000082001901]
This decodes as:
MULTI=1 (There are other errors in addition to this one.)
SECONDARY=0
CODE=2 (Address hole according to Table 5-26 "CODE Field Definition")
INITID=0x19
CMD=1
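The field decode above can be sketched in code as follows. The bit positions (MULTI at bit 31, SECONDARY at bit 30, CODE in bits 27:24, INITID in bits 15:8, CMD in bits 2:0) are inferred from how 0x82001901 breaks down into the values listed, so treat them as an assumption to confirm against the TRM. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Decoded L3 error log fields (bit layout inferred from the decode above). */
struct l3_error_log {
    unsigned multi;     /* bit 31: more than one error was logged */
    unsigned secondary; /* bit 30 */
    unsigned code;      /* bits 27:24: error code */
    unsigned initid;    /* bits 15:8: initiator ID */
    unsigned cmd;       /* bits 2:0: command */
};

static struct l3_error_log decode_l3_error_log(uint64_t v)
{
    struct l3_error_log e;

    e.multi     = (unsigned)((v >> 31) & 0x1u);
    e.secondary = (unsigned)((v >> 30) & 0x1u);
    e.code      = (unsigned)((v >> 24) & 0xFu);
    e.initid    = (unsigned)((v >> 8) & 0xFFu);
    e.cmd       = (unsigned)(v & 0x7u);
    return e;
}
```

Running the same function on the read-error value from the second dump, 0x84001900, gives MULTI=1, CODE=4, INITID=0x19 and CMD=0 under this layout.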
Clayton Shotwell MPU L3 ErrAddr [0x0000000047f20ac0]
This looks like a Reserved/invalid address. This doesn't even map to the GPMC which I expect is related to why we have MULTI=1 (i.e. there was probably another error that did actually get to the GPMC).