Possible TM4C123 hardware bug

Andrew G

Other Parts Discussed in Thread: TM4C123GH6PM, EK-TM4C123GXL, TM4C1294NCPDT, EK-TM4C1294XL

---------------------------------------------

Summary of this long thread as of 2/4/2015:

There is a serious bug in Tiva TM4C123 microcontrollers that may lead to undefined behavior (such as stack corruption) under the following conditions:

- A flash write or erase operation is performed

- The code performing the flash operations is itself running from flash (i.e. not copied to SRAM)

- The system clock is faster than 40 MHz

The problem is not currently mentioned in the datasheet or the errata. The suggested workarounds are to lower the clock frequency to 40 MHz during flash operations, or to copy the code performing flash operations to SRAM.

----------------------------------------------------------------

Original post:

I think I may have stumbled across a hardware bug in Tiva's flash implementation. I checked the errata and did not see this issue mentioned there.

It appears that when a flash erase operation is in progress, fetching and execution of instructions from flash may be broken. The expected behavior is for instruction fetching to stall while the erase is in progress, and then to resume normally. To be clear, I am talking about erasing pages that do not contain executable code.

The manifestation of this bug is that code randomly fails due to stack corruption. Timing is important -- inserting a single NOP often causes the bug to go away. Running the code in a debugger may also produce a different outcome.

The code in question looks like this:

bool Flash::Busy() const {
  return FLASH_FMC_R &
         (FLASH_FMC_WRITE | FLASH_FMC_ERASE | FLASH_FMC_MERASE);
}

void Flash::StartErasePage(uint32_t address) {
  CHECK(((address & 1023) == 0) && (address < end_address()));
  while (Busy()) {}

  FLASH_FMA_R = address;

  FLASH_FMC_R = (FLASH_FMC_WRKEY | FLASH_FMC_ERASE);

  num_erased_pages_++;

  asm("nop");
}

void Flash::ErasePage(uint32_t address) {
  StartErasePage(address);
  while (Busy()) {}
}
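The register protocol that ErasePage() follows can be exercised off-target. The sequence is: poll until idle, write the page address to FMA, write the keyed ERASE command to FMC, then poll until the ERASE bit clears.

The sketch below is a hypothetical host-side model under stated assumptions:
- sim_fma, sim_fmc_status and sim_last_cmd stand in for the memory-mapped registers.
- In this model, a command is only accepted when it carries the 0xA442 key.
- The "erase" completes after three status reads.

It models only the register handshake. It cannot reproduce the instruction-fetch stall that this thread is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in TI's tm4c123gh6pm.h. */
#define FLASH_FMC_WRKEY  0xA4420000u
#define FLASH_FMC_MERASE 0x00000004u
#define FLASH_FMC_ERASE  0x00000002u
#define FLASH_FMC_WRITE  0x00000001u
#define FLASH_PAGE_SIZE  1024u /* TM4C123 erase granularity */

/* Hypothetical host-side stand-ins for the FMA and FMC registers. */
static uint32_t sim_fma;        /* last address written to FMA */
static uint32_t sim_fmc_status; /* WRITE/ERASE/MERASE status bits */
static uint32_t sim_last_cmd;   /* last full value written to FMC */
static int sim_polls_left;      /* status reads until the "erase" completes */

static uint32_t sim_read_fmc(void) {
  if (sim_polls_left > 0 && --sim_polls_left == 0)
    sim_fmc_status &= ~FLASH_FMC_ERASE; /* erase finished: bit clears */
  return sim_fmc_status;
}

static void sim_write_fmc(uint32_t value) {
  sim_last_cmd = value;
  if ((value & 0xFFFF0000u) == FLASH_FMC_WRKEY && (value & FLASH_FMC_ERASE)) {
    sim_fmc_status |= FLASH_FMC_ERASE; /* keyed erase request accepted */
    sim_polls_left = 3;
  }
}

static bool flash_busy(void) {
  return (sim_read_fmc() &
          (FLASH_FMC_WRITE | FLASH_FMC_ERASE | FLASH_FMC_MERASE)) != 0;
}

/* Same order as Flash::ErasePage(): validate, wait, set the address, issue
   the keyed command, then wait for completion. Returns false on a bad address. */
static bool erase_page(uint32_t address, uint32_t end_address) {
  if ((address % FLASH_PAGE_SIZE) != 0 || address >= end_address)
    return false;
  while (flash_busy()) {}
  sim_fma = address;
  sim_write_fmc(FLASH_FMC_WRKEY | FLASH_FMC_ERASE);
  while (flash_busy()) {}
  return true;
}
```

For example, erase_page(64 * 1024, 256 * 1024) leaves 0x10000 in sim_fma and 0xA4420002 in sim_last_cmd. An address that is not 1 KB aligned is rejected before any register is touched.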

Invoking ErasePage(64 * 1024) causes stack corruption in my test. This is apparently caused by an instruction, "mov sp, r7" not working correctly. That is, if I break before this mov instruction, and then step over it, the debugger shows the value of sp as unchanged (and not equal to r7). My guess is that the instruction is not executed, or executed incorrectly, due to the pending flash erase operation. Replacing

FLASH_FMC_R = (FLASH_FMC_WRKEY | FLASH_FMC_ERASE);

with

FLASH_FMD_R = (FLASH_FMC_WRKEY | FLASH_FMC_ERASE);

results in an identical binary, except that the erase operation is replaced by a nop. In this case, the "mov sp, r7" instruction works as expected.

-------------------------------------------

A few more details:

The function StartErasePage() is compiled by gcc as follows:

00000300 <_ZN5Flash14StartErasePageEm>:

void Flash::StartErasePage(uint32_t address) {
300: b580 push {r7, lr}
302: b082 sub sp, #8
304: af00 add r7, sp, #0
306: 6078 str r0, [r7, #4]
308: 6039 str r1, [r7, #0]
CHECK(((address & 1023) == 0) && (address < end_address()));
30a: 683b ldr r3, [r7, #0]
30c: ea4f 5383 mov.w r3, r3, lsl #22
310: ea4f 5393 mov.w r3, r3, lsr #22
314: 2b00 cmp r3, #0
316: d105 bne.n 324 <_ZN5Flash14StartErasePageEm+0x24>
318: f000 f898 bl 44c <_ZN5Flash11end_addressEv>
31c: 4602 mov r2, r0
31e: 683b ldr r3, [r7, #0]
320: 429a cmp r2, r3
322: d802 bhi.n 32a <_ZN5Flash14StartErasePageEm+0x2a>
324: f04f 0301 mov.w r3, #1
328: e001 b.n 32e <_ZN5Flash14StartErasePageEm+0x2e>
32a: f04f 0300 mov.w r3, #0
32e: 2b00 cmp r3, #0
330: d000 beq.n 334 <_ZN5Flash14StartErasePageEm+0x34>
332: be02 bkpt 0x0002
while (Busy()) {}
334: 6878 ldr r0, [r7, #4]
336: f7ff ffcd bl 2d4 <_ZNK5Flash4BusyEv>
33a: 4603 mov r3, r0
33c: 2b00 cmp r3, #0
33e: d1f9 bne.n 334 <_ZN5Flash14StartErasePageEm+0x34>

FLASH_FMA_R = address;
340: f44f 4350 mov.w r3, #53248 ; 0xd000
344: f2c4 030f movt r3, #16399 ; 0x400f
348: 683a ldr r2, [r7, #0]
34a: 601a str r2, [r3, #0]

FLASH_FMC_R = (FLASH_FMC_WRKEY | FLASH_FMC_ERASE);
34c: f24d 0308 movw r3, #53256 ; 0xd008
350: f2c4 030f movt r3, #16399 ; 0x400f
354: f04f 0202 mov.w r2, #2
358: f2ca 4242 movt r2, #42050 ; 0xa442
35c: 601a str r2, [r3, #0]

num_erased_pages_++;
35e: 687b ldr r3, [r7, #4]
360: 681b ldr r3, [r3, #0]
362: f103 0201 add.w r2, r3, #1
366: 687b ldr r3, [r7, #4]
368: 601a str r2, [r3, #0]

asm("nop");
36a: bf00 nop
}
36c: f107 0708 add.w r7, r7, #8
370: 46bd mov sp, r7
372: bd80 pop {r7, pc}
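The register addresses and the command word in the listing are built from two 16-bit halves:
- MOVW (or, here, an equivalent MOV.W with the same value) sets the low halfword and clears the upper one.
- MOVT then replaces the upper halfword.

A small host-side check, with illustrative helper names, confirms the constants:
- 0x400FD000 is FLASH_FMA_R.
- 0x400FD008 is FLASH_FMC_R.
- 0xA4420002 is FLASH_FMC_WRKEY | FLASH_FMC_ERASE.

```c
#include <assert.h>
#include <stdint.h>

/* MOVW loads a 16-bit immediate and clears the upper halfword. The listing's
   MOV.W #53248 and MOV.W #2 load the same values, so movw() models them too. */
static uint32_t movw(uint16_t imm16) { return imm16; }

/* MOVT replaces the upper halfword and keeps the lower halfword. */
static uint32_t movt(uint32_t reg, uint16_t imm16) {
  return (reg & 0xFFFFu) | ((uint32_t)imm16 << 16);
}
```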

Using the launchpad, OpenOCD, and arm-none-eabi-gdb, it can be shown that the instruction at offset 0x370 has no effect:

(gdb) b *0x370
Breakpoint 1 at 0x370: file tiva_flash.cc, line 30.
(gdb) display/i $pc
1: x/i $pc
=> 0x0 <_stack_ptr>: ldrb r4, [r7, #31]
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.

Breakpoint 1, 0x00000370 in Flash::StartErasePage (this=0x387, address=536903588) at tiva_flash.cc:30
30 }
1: x/i $pc
=> 0x370 <Flash::StartErasePage(unsigned long)+112>: mov sp, r7
(gdb) info registers
r0 0x0 0
r1 0xc00 3072
r2 0x1 1
r3 0x20007fb8 536903608
r4 0x0 0
r5 0x0 0
r6 0x0 0
r7 0x20007f9c 536903580
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
sp 0x20007f94 0x20007f94
lr 0x33b 827
pc 0x371 0x371 <Flash::StartErasePage(unsigned long)+112>
xpsr 0x0 0
(gdb) ni
Info : halted: PC: 0x00000372
0x00000372 30 }
1: x/i $pc
=> 0x372 <Flash::StartErasePage(unsigned long)+114>: pop {r7, pc}
(gdb) info registers
r0 0x0 0
r1 0xc00 3072
r2 0x1 1
r3 0x20007fb8 536903608
r4 0x0 0
r5 0x0 0
r6 0x0 0
r7 0x20007f9c 536903580
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
sp 0x20007f94 0x20007f94
lr 0x33b 827
pc 0x373 0x373 <Flash::StartErasePage(unsigned long)+114>
xpsr 0x0 0

Note how after stepping through "mov sp, r7", the value of sp does not change. This of course results in stack corruption.

The above is not completely deterministic -- sometimes this code works OK in the debugger -- and, as I said, inserting a single NOP somewhere in the code may also change the above behavior. But clearly something fishy is going on here.

If this is a previously unknown issue, and a TI engineer is interested in investigating this further, I will be happy to share a couple of .elf files (with debugging turned on) that demonstrate this problem.

A workaround appears to be waiting for the erase operation to complete before making any modifications to the SP register.
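Replaying the function's stack handling shows why a single ineffective "mov sp, r7" is fatal. The model below is a hypothetical host-side sketch with these assumptions:
- The entry SP 0x20007FA4 is chosen so that r7 and sp match the gdb session above.
- The lr and this values are made up.

If the move is lost, "pop {r7, pc}" reads 8 bytes too low. It picks up the stored address argument as r7 and the this pointer as the return address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Word-addressed model of a small SRAM window around the stack. */
#define MEM_BASE 0x20007F80u
static uint32_t mem[16];
static uint32_t *word_at(uint32_t addr) { return &mem[(addr - MEM_BASE) / 4u]; }

/* Replays StartErasePage()'s prologue and epilogue from the listing and
   returns the value that "pop {r7, pc}" would load into pc. */
static uint32_t popped_pc(uint32_t sp_on_entry, uint32_t lr, uint32_t this_ptr,
                          uint32_t address, bool mov_sp_r7_takes_effect) {
  uint32_t sp = sp_on_entry;
  uint32_t r7 = 0xCAFEF00Du;   /* caller's r7, arbitrary */

  sp -= 8;                     /* push {r7, lr} */
  *word_at(sp) = r7;
  *word_at(sp + 4) = lr;
  sp -= 8;                     /* sub sp, #8 */
  r7 = sp;                     /* add r7, sp, #0 */
  *word_at(r7 + 4) = this_ptr; /* str r0, [r7, #4] */
  *word_at(r7) = address;      /* str r1, [r7, #0] */

  r7 += 8;                     /* add.w r7, r7, #8 */
  if (mov_sp_r7_takes_effect)
    sp = r7;                   /* mov sp, r7 */
  return *word_at(sp + 4);     /* pop {r7, pc}: pc = [sp + 4] */
}
```

With the move executed, the popped pc is the saved lr. Without it, the popped pc is the this pointer, so execution jumps to an arbitrary address.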

------------------------------------------------------------------------------

- Crystal frequency: 16 MHz, but running at 80 MHz via the PLL

- Silicon revision: latest (multiple Tiva LaunchPad boards bought directly from TI in the last couple of months)

- Tools: arm-none-eabi-gcc / gdb on Linux; OpenOCD.

over 9 years ago

Petrei over 9 years ago

Guru 26105 points

Hi,

It seems you are using a stripped-down version of the original TI-provided flash erase function, so in this case anything could happen...

You could write a small test program using the original function and observe the behaviour.

Petrei

QJ Wang over 9 years ago in reply to Petrei

TI__Guru**** 186196 points

Hi Andrew,

I tried the TI compiler (5.1.6) in CCS 6.0 on a TM4C123 LaunchPad and didn't see the issue you mentioned. I will try the compiler you mentioned tomorrow.

Regards,

Andrew G over 9 years ago in reply to Petrei

Intellectual 900 points

Thank you for investigating this QJ. The bug is sensitive to code layout and possibly alignment, so I am not surprised that you haven't been able to reproduce it right away.

Here is a bare-bones test case; its C++ code is below:

----------------------------------------------------------------

#include <stdint.h>
#include "inc/tm4c123gh6pm.h"  // TivaWare register definitions

class Trap {
 public:
  void Nop() {}
  void NeverReturn() { for (;;); }
};

class FlashTest {
 public:
  bool Busy() {
    return FLASH_FMC_R & FLASH_FMC_ERASE;
  }

  void ErasePage(uint32_t address) {
    while (Busy()) {}
    Trap trap;
    trap.Nop();
    FLASH_FMA_R = address;
    FLASH_FMC_R = (FLASH_FMC_WRKEY | FLASH_FMC_ERASE);
    trap.NeverReturn();  // this line has no effect
  }
};

void SetupClock() {
  // Run at 80 MHz using internal oscillator
  // Use RCC2 instead of RCC
  SYSCTL_RCC2_R |= SYSCTL_RCC2_USERCC2;
  // Disable PLL
  SYSCTL_RCC2_R |= SYSCTL_RCC2_BYPASS2;
  SYSCTL_RCC2_R |= SYSCTL_RCC2_OSCSRC2_IO;
  // Power on the 400 MHz PLL
  SYSCTL_RCC2_R &= ~SYSCTL_RCC2_PWRDN2;
  SYSCTL_RCC2_R |= SYSCTL_RCC2_DIV400;
  // Set up system divider (2 * SYSDIV + 1 + LSB)
  SYSCTL_RCC2_R = (SYSCTL_RCC2_R &
                   ~(SYSCTL_RCC2_SYSDIV2_M | SYSCTL_RCC2_SYSDIV2LSB)) |
                  (2 << 23) | (0 << 22);
  // Wait for the PLL to lock by polling PLLLRIS
  while ((SYSCTL_RIS_R & SYSCTL_RIS_PLLLRIS) == 0) {}
  // Enable PLL
  SYSCTL_RCC2_R &= ~SYSCTL_RCC2_BYPASS2;
}

int main(void) {
  SetupClock();
  FlashTest flash;
  flash.ErasePage(73 * 1024);  // should never return

  // Blink PF3 - we should never get here
  SYSCTL_RCGCGPIO_R |= SYSCTL_RCGCGPIO_R5;
  asm("nop; nop; nop");  // 3 cycle delay
  GPIO_PORTF_DIR_R |= (1 << 3);
  GPIO_PORTF_DEN_R |= (1 << 3);
  for (;;) {
    for (int i = 0; i < 1000000; i++) {}  // delay
    GPIO_PORTF_DATA_R ^= (1 << 3);        // toggle LED
  }
}

------------------------------------------------------------------------------

Basically, this starts an erase operation and then enters an infinite loop via trap.NeverReturn(). However, if NeverReturn() is not executed, or is executed incorrectly, the code that follows it starts blinking an LED.

The code is compiled with gcc down to a 1232 byte binary, including the interrupt table and startup code. If you flash this code to a launchpad, you will see that an LED does in fact blink -- the infinite for loop in NeverReturn() is never executed!

Now, making almost any change to the above code makes the bug go away. For example, if I remove trap.Nop(), which does nothing, the LED does not blink. Inserting an asm("nop") in many parts of the code has a similar effect.

I am attaching a few files below:

- A .bin file for flashing to the launchpad
- A commented disassembly of the .bin file
- Another set of files (with .working suffix) where FLASH_FMC_R is replaced with FLASH_FMD_R -- this results in an almost identical binary where the flash erase operation never takes place. This binary works as expected.

1070.flashbug2.zip

QJ Wang over 9 years ago in reply to Andrew G

TI__Guru**** 186196 points

Hi Andrew,

I modified my CCS (TI IDE) project to erase and program the flash by reading and writing the registers directly instead of using TivaWare. I compiled the project using the TI compiler with optimization off. Flash pages 10 through 127 were erased correctly (50 times), 36 bytes of data were written to page 64 without any issue, and the LED blinked after the flash erase/program. The code I used for testing is enclosed below; it is the same as yours. Please check your code optimization level.

Regards,

#include <stdint.h>
#include "inc/hw_types.h"

#define SYSCTL_RCC2_R (*((volatile uint32_t *)0x400FE070))
#define SYSCTL_RCC2_USERCC2 0x80000000 // Use RCC2
#define SYSCTL_RCC2_DIV400 0x40000000 // Divide PLL as 400 MHz vs. 200
// MHz
#define SYSCTL_RCC2_SYSDIV2_M 0x1F800000 // System Clock Divisor 2
#define SYSCTL_RCC2_SYSDIV2LSB 0x00400000 // Additional LSB for SYSDIV2
#define SYSCTL_RCC2_PWRDN2 0x00002000 // Power-Down PLL 2
#define SYSCTL_RCC2_BYPASS2 0x00000800 // PLL Bypass 2
#define SYSCTL_RCC2_OSCSRC2_IO 0x00000010 // PIOSC

#define SYSCTL_RIS_R (*((volatile uint32_t *)0x400FE050))
#define SYSCTL_RIS_PLLLRIS 0x00000040 // PLL Lock Raw Interrupt Status

#define FLASH_FMA_R (*((volatile uint32_t *)0x400FD000))
#define FLASH_FMD_R (*((volatile uint32_t *)0x400FD004))
#define FLASH_FMC_R (*((volatile uint32_t *)0x400FD008))
#define FLASH_FCRIS_R (*((volatile uint32_t *)0x400FD00C))
#define FLASH_FCIM_R (*((volatile uint32_t *)0x400FD010))
#define FLASH_FCMISC_R (*((volatile uint32_t *)0x400FD014))
#define FLASH_FWBVAL_R (*((volatile uint32_t *)0x400FD030))
#define FLASH_FWBN_R (*((volatile uint32_t *)0x400FD100))
#define FLASH_FMC2_R (*((volatile uint32_t *)0x400FD020))
#define FLASH_FWBN 0x400FD100 // Flash Write Buffer n

#define FLASH_FMC_WRKEY 0xA4420000 // FLASH write key
#define FLASH_FMC_COMT 0x00000008 // Commit Register Value
#define FLASH_FMC_MERASE 0x00000004 // Mass Erase Flash Memory
#define FLASH_FMC_ERASE 0x00000002 // Erase a Page of Flash Memory
#define FLASH_FMC_WRITE 0x00000001 // Write a Word into Flash Memory
#define FLASH_FMC2_WRBUF 0x00000001 // Buffered Flash Memory Write
#define FLASH_FMC2_WRKEY 0xA4420000 // FLASH write key

#define SYSCTL_RCGCGPIO_R (*((volatile uint32_t *)0x400FE608))
#define GPIO_PORTF_DATA_R (*((volatile uint32_t *)0x400253FC))
#define GPIO_PORTF_DIR_R (*((volatile uint32_t *)0x40025400))
#define GPIO_PORTF_DEN_R (*((volatile uint32_t *)0x4002551C))
#define SYSCTL_RCGCGPIO_R5 0x00000020 // GPIO Port F Run Mode Clock

void SetupClock() {
// Run at 80 mhz using internal oscillator
// Use RCC2 instead of RCC
SYSCTL_RCC2_R |= SYSCTL_RCC2_USERCC2;
// Disable PLL
SYSCTL_RCC2_R |= SYSCTL_RCC2_BYPASS2;
SYSCTL_RCC2_R |= SYSCTL_RCC2_OSCSRC2_IO;
// Power on the 400 Mhz PLL
SYSCTL_RCC2_R &= ~SYSCTL_RCC2_PWRDN2;
SYSCTL_RCC2_R |= SYSCTL_RCC2_DIV400;

// Set up system divider (2 * SYSDIV + 1 + LSB)
SYSCTL_RCC2_R = (SYSCTL_RCC2_R &
~(SYSCTL_RCC2_SYSDIV2_M | SYSCTL_RCC2_SYSDIV2LSB)) |
(2 << 23) | (0 << 22);

// Wait for the PLL to lock by polling PLLLRIS
while ((SYSCTL_RIS_R & SYSCTL_RIS_PLLLRIS) == 0) {}

// Enable PLL
SYSCTL_RCC2_R &= ~SYSCTL_RCC2_BYPASS2;
}

void Erase_FlashPage(uint32_t address) {
// while(FLASH_FMC_R & FLASH_FMC_ERASE) {
// }

FLASH_FMA_R = address;
FLASH_FMC_R = (FLASH_FMC_WRKEY | FLASH_FMC_ERASE);

while(FLASH_FMC_R & FLASH_FMC_ERASE) {
}
}

void Program_Flash(uint32_t *pui32Data, uint32_t ui32Address, uint32_t ui32Count) {

while(ui32Count)
{
//
// Set the address of this block of words.
//
FLASH_FMA_R = ui32Address & ~(0x7f);

//
// Loop over the words in this 32-word block.
//
while(((ui32Address & 0x7c) || (FLASH_FWBVAL_R == 0)) && (ui32Count != 0))
{
//
// Write this word into the write buffer.
//
HWREG(FLASH_FWBN + (ui32Address & 0x7c)) = *pui32Data++;
ui32Address += 4;
ui32Count -= 4;
}

//
// Program the contents of the write buffer into flash.
//
FLASH_FMC2_R = FLASH_FMC2_WRKEY | FLASH_FMC2_WRBUF;

//
// Wait until the write buffer has been programmed.
//
while(FLASH_FMC2_R & FLASH_FMC2_WRBUF) {
}
}
}

void main(void)
{
uint32_t pui32Data[9];
uint32_t i, j, ledVal;

pui32Data[0] = 0x12345678;
pui32Data[1] = 0x56789abc;
pui32Data[2] = 0xA0A1A2A3;
pui32Data[3] = 0xA4A5A6A7;
pui32Data[4] = 0xA8A9B0B1;
pui32Data[5] = 0xB2B3B4B5;
pui32Data[6] = 0xB6B7B8B9;
pui32Data[7] = 0xC0C1C2C3;
pui32Data[8] = 0xC4C5C6C7;

//
// Run at 80 MHz from the internal oscillator via the PLL.
//
SetupClock();

//
// Enable the GPIO portf.
//
SYSCTL_RCGCGPIO_R |= SYSCTL_RCGCGPIO_R5;

GPIO_PORTF_DIR_R |= (1 << 3);
GPIO_PORTF_DEN_R |= (1 << 3);

for (j=0; j<50; j++)
{
for (i = 10; i < 128; i++)
{
Erase_FlashPage(i*1024);
}
}

Program_Flash(pui32Data, 0x10000, sizeof(pui32Data));

//
// Loop forever: toggle LED.
//
while(1)
{
for(i = 0; i < 400000; i++)
{
}
ledVal = GPIO_PORTF_DATA_R;
GPIO_PORTF_DATA_R = ~ledVal; // toggle LED
}

}
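Program_Flash() above fills the 32-word write buffer and issues one FMC2 WRBUF commit per 128-byte block it touches. The sketch below counts those commits on a host under these assumptions:
- FWBVAL is modeled as a plain bitmask that reads zero after each commit.
- The count is in bytes and is a multiple of 4, as in the loop above.

The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the FMC2 WRBUF commits the Program_Flash() loop would issue.
   Assumption: FWBVAL is zero when each new block starts. */
static unsigned count_buffer_commits(uint32_t address, uint32_t count) {
  unsigned commits = 0;
  while (count) {
    uint32_t fwbval = 0; /* write buffer empty after the previous commit */
    while (((address & 0x7c) || fwbval == 0) && count != 0) {
      fwbval |= 1u << ((address & 0x7c) >> 2); /* mark this word slot valid */
      address += 4;
      count -= 4;
    }
    commits++; /* FLASH_FMC2_R = FLASH_FMC2_WRKEY | FLASH_FMC2_WRBUF */
  }
  return commits;
}
```

With QJ's arguments (36 bytes at 0x10000) everything fits in one block, so there is a single commit. The same 36 bytes starting at 0x10070 straddle a 128-byte boundary and take two commits.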

Andrew G over 9 years ago in reply to QJ Wang

Intellectual 900 points

QJ -- thank you very much for doing this testing. I think what you have shown is that in some circumstances flash operations work correctly, which I don't doubt.

What I am trying to figure out is why _my_ binary misbehaves. Again, to summarize, I do the following three things sequentially in the test that I previously uploaded:

1. Erase a flash page
2. Enter an infinite loop
3. Blink an LED

My test binary, a total of 250 instructions, ends up blinking an LED when uploaded to a LaunchPad. Somehow the infinite loop (step 2 above) is not executed. There can only be three explanations for this:

1. There is a bug in my code (I don't think so -- the whole program is a handful of simple functions).
2. There is a bug in the compiler (I don't think so either -- I have gone through the 250 instructions one by one and they all seem to make sense).
3. There is a bug in the microcontroller. My hunch is that under some unknown circumstances, the CPU skips an instruction.

If it is indeed a hardware bug, it is a pretty scary one, as any app that does flash operations has the potential to malfunction. I hope a person familiar with Tiva's internals can take a look at the binary that I uploaded and try to figure out why it doesn't work as expected.

Please do not try to recreate the test case by recompiling C++ code, instead treat the binary as a 250-line assembly program. The disassembled code is commented and should only take a few minutes to go through.

Also, to be clear, I am not trying to "fix" my program. I am trying to characterize what appears to be a previously unknown bug. Hence suggestions to try different optimization levels, use different compilers, or rely on TI's libraries miss the point of why I am posting this.

Thanks again.

Andrew G over 9 years ago in reply to Andrew G

Intellectual 900 points

I have a suspicion that this may be related to the following published design exception (http://www.ti.com/lit/er/spmz849d/spmz849d.pdf):

SYSCTL#04 Device May not Wake Correctly From Sleep Mode Under Certain Circumstances
Revision(s) Affected: 6 and 7.
Description: With a certain configuration, the device may not wake correctly from Sleep mode
because invalid data may be fetched from the prefetch buffer. The configuration that
causes this issue is as follows:
• The system clock must be at least 40 MHz
• Interrupts must be disabled

In my test case, the clock frequency is above 40 MHz and interrupts are disabled. During a flash erase operation, instruction fetching is suspended, which sounds similar to sleep mode. Invalid data in the prefetch buffer would explain why the infinite loop in my binary is not executed.

If this is true, any flash operation can cause undefined behavior (not just erases). From the Tiva manual:

During a Flash memory operation (write, page erase, or mass erase) access to the Flash memory
is inhibited. As a result, instruction and literal fetches are held off until the Flash memory operation
is complete.

The workaround for the known bug is to insert a sacrificial instruction, such as "mov r0, #0" after wakeup. Perhaps this is also what needs to be done after starting any flash operation.

Chester Gillon over 9 years ago in reply to Andrew G

Guru 92251 points

Andrew G said:
The code is compiled with gcc down to a 1232 byte binary, including the interrupt table and startup code. If you flash this code to a launchpad, you will see that an LED does in fact blink -- the infinite for loop in NeverReturn() is never executed!

Just to confirm: I can repeat the failure by flashing flash_bug.bin onto a LaunchPad with a TM4C123GH6PM. The Device Identification Registers are:

SYSCTL_DID0 0x18050101 Device Identification 0 [Memory Mapped]
SYSCTL_DID0_VER 001 DID0 Version
SYSCTL_DID0_CLASS 00000101 Device Class
SYSCTL_DID0_MAJ 00000001 Major Revision
SYSCTL_DID0_MIN 00000001 - Minor Revision
SYSCTL_DID1 0x10A1606C Device Identification 1 [Memory Mapped]
SYSCTL_DID1_VER 0001 DID1 Version
SYSCTL_DID1_FAM 0000 Family
SYSCTL_DID1_PRTNO 10100001 Part Number
SYSCTL_DID1_PINCNT 011 Package Pin Count
SYSCTL_DID1_TEMP 011 Temperature Range
SYSCTL_DID1_PKG 01 Package Type
SYSCTL_DID1_ROHS 1 RoHS-Compliance
SYSCTL_DID1_QUAL 00 - Qualification Status
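The debugger's breakdown of DID0 and DID1 can be reproduced from the two raw values. The sketch below uses field positions that match the breakdown above, following the data sheet's register layout. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [hi:lo] of a register value (field width < 32). */
static uint32_t field(uint32_t reg, unsigned hi, unsigned lo) {
  return (reg >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

struct did0 { uint32_t ver, cls, maj, min; };
struct did1 { uint32_t ver, fam, prtno, pincnt, temp, pkg, rohs, qual; };

static struct did0 decode_did0(uint32_t r) {
  struct did0 d = { field(r, 30, 28), field(r, 23, 16),
                    field(r, 15, 8), field(r, 7, 0) };
  return d;
}

static struct did1 decode_did1(uint32_t r) {
  struct did1 d = { field(r, 31, 28), field(r, 27, 24), field(r, 23, 16),
                    field(r, 15, 13), field(r, 7, 5), field(r, 4, 3),
                    field(r, 2, 2), field(r, 1, 0) };
  return d;
}
```

For 0x18050101 this yields class 0x05, major revision 1 and minor revision 1. For 0x10A1606C it yields part number 0xA1, pin count 3, temperature 3, package 1, RoHS 1 and qualification 0, as shown above.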

I agree that it appears to be a timing bug, since when running in the debugger:

- If I set a breakpoint at 0x28c, which is the branch to trap.NeverReturn(), I can single-step from that point; the code stays in trap.NeverReturn() and the LED doesn't blink.

- If I set a breakpoint at 0x290, which is after the branch to trap.NeverReturn(), the failure occurs.

The TM4C123GH6PM is fitted with an Embedded Trace Macrocell (ETM) for instruction trace capture, so it would be useful if someone with a trace-capable emulator could capture the instruction trace when the program fails, to see whether the trace identifies which instruction has been skipped.

[I don't have a trace capable emulator]

Andrew G over 9 years ago in reply to Chester Gillon

Intellectual 900 points

I have verified that the bug also happens when writing to flash, not just erasing.

I find it a little surprising that there has been almost no attention to this thread from TI in more than two weeks. By all looks of it, the microcontroller can quietly skip CPU instructions, leading to a crash or incorrect program execution. Am I the only one who thinks that this is kind of a big deal?

If someone who works at TI is reading this and can forward a link to this thread to Tiva's core team (and especially people responsible for the flash subsystem), I would really appreciate that.

Andrew G over 9 years ago in reply to Andrew G

Intellectual 900 points

Here is an even simpler case: a 472 byte binary, 166 instructions. This one simply writes a single word to flash, which causes the microcontroller to ignore an infinite loop and proceed to turning on an LED.

6886.flashbug3.zip

Chester Gillon over 9 years ago in reply to Andrew G

Guru 92251 points

Andrew G said:
Here is an even simpler case: a 472 byte binary, 166 instructions. This one simply writes a single word to flash, which causes the microcontroller to ignore an infinite loop and proceed to turning on an LED.

I can confirm that I can repeat the failure with the simpler case.

[When I attach the debugger after the failure, the Program Counter has an invalid value of 0x6078AF00 - but that occurs when ResetHandler() attempts to pop the stack to return to a non-existent caller]

So far, with these examples and no ETM trace-capable emulator, I haven't been able to work out exactly which instruction is skipped or fails to execute correctly. I will try following the flash write/erase operation with a series of instructions that set registers to specific values, to try to track down which instructions fail to get executed.

Chester Gillon over 9 years ago in reply to Andrew G

Guru 92251 points

Andrew G said:
3. There is a bug in the microcontroller. My hunch is that under some unknown circumstances, the CPU skips an instruction.

To investigate that, I created the attached project 1055.TIVA_flash_write_bug.zip in CCS 6 using the TI ARM compiler, targeting an EK-TM4C123GXL. The code is:

/*
 * main.c
 */

#include <stdbool.h>
#include <stdint.h>

#include <inc/tm4c123gh6pm.h>

#define CLOCK_FREQ_80MHz_RCC2_DIVISORS     (2 << SYSCTL_RCC2_SYSDIV2_S)
#define CLOCK_FREQ_66_67MHz_RCC2_DIVISORS ((2 << SYSCTL_RCC2_SYSDIV2_S) | SYSCTL_RCC2_SYSDIV2LSB)
#define CLOCK_FREQ_50MHz_RCC2_DIVISORS    ((3 << SYSCTL_RCC2_SYSDIV2_S) | SYSCTL_RCC2_SYSDIV2LSB)
#define CLOCK_FREQ_44_44MHz_RCC2_DIVISORS  (4 << SYSCTL_RCC2_SYSDIV2_S)
#define CLOCK_FREQ_40MHz_RCC2_DIVISORS    ((4 << SYSCTL_RCC2_SYSDIV2_S) | SYSCTL_RCC2_SYSDIV2LSB)

#define RCC2_DIVISORS CLOCK_FREQ_80MHz_RCC2_DIVISORS

bool busy (void)
{
	return FLASH_FMC_R & FLASH_FMC_ERASE;
}

void erase_page (uint32_t address)
{
	while (busy())
	{

	}

	asm ("	mov r5, #5");
	asm ("	mov r6, #6");
	asm ("	mov r7, #7");
	asm ("	mov r8, #8");
	asm ("	mov r9, #9");
	asm ("	mov r10, #10");
	asm ("	mov r11, #11");
	asm ("	mov r12, sp");

    FLASH_FMA_R = address;
    FLASH_FMC_R = (FLASH_FMC_WRKEY | FLASH_FMC_ERASE);

    /* sacrificial instruction after flash erase */
    asm (" mov r0, #0");

    asm ("	sub sp, #4");
    asm ("	sub sp, #8");
    asm ("	sub sp, #16");
    asm ("	sub sp, #32");
    asm ("	sub sp, #64");
    asm ("	sub sp, #128");
    asm ("	sub sp, #256");
    asm ("	sub sp, #512");

    for (;;)
    {

    }
}

void SetupClock() {
  // Run using internal oscillator
  // Use RCC2 instead of RCC
  SYSCTL_RCC2_R |= SYSCTL_RCC2_USERCC2;
  // Disable PLL
  SYSCTL_RCC2_R |= SYSCTL_RCC2_BYPASS2;
  SYSCTL_RCC2_R |= SYSCTL_RCC2_OSCSRC2_IO;
  // Power on the 400 Mhz PLL
  SYSCTL_RCC2_R &= ~SYSCTL_RCC2_PWRDN2;
  SYSCTL_RCC2_R |= SYSCTL_RCC2_DIV400;

  // Set up system divider (2 * SYSDIV + 1 + LSB)
  SYSCTL_RCC2_R = (SYSCTL_RCC2_R &
                   ~(SYSCTL_RCC2_SYSDIV2_M | SYSCTL_RCC2_SYSDIV2LSB)) |
                		   RCC2_DIVISORS;

  // Wait for the PLL to lock by polling PLLLRIS
  while ((SYSCTL_RIS_R & SYSCTL_RIS_PLLLRIS) == 0) {}

  // Enable PLL
  SYSCTL_RCC2_R &= ~SYSCTL_RCC2_BYPASS2;
}

int main(void)
{
	SetupClock ();
	erase_page (73 * 1024);
	
	return 0;
}

The erase_page() function is set to:

- Save the current stack pointer in R12

- Erase a flash page

- A series of instructions which decrement the stack pointer by different numbers of bytes

- Spin in an infinite loop

The idea was that after running the program, I would halt it in the debugger. If the program operates as expected, the program counter will be in the infinite loop at the end of the erase_page() function, with the stack pointer 0x3FC bytes less than the R12 value. If an instruction was skipped following the flash erase, it was thought that the value of the stack pointer would indicate which instruction(s) had been skipped.
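The decoding idea behind this experiment can be made concrete. The eight "sub sp" immediates are distinct powers of two that sum to 0x3FC, so the total SP adjustment relative to R12 uniquely identifies which SUBs took effect.

The sketch below assumes instructions are only skipped, never repeated or corrupted. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 = 0x3FC. */
#define ALL_SUBS 0x3FCu

/* Returns a bit mask of the immediates whose "sub sp, #imm" did not take
   effect, given R12 (the saved SP) and the SP observed when halted. */
static uint32_t skipped_subs(uint32_t saved_sp, uint32_t observed_sp) {
  uint32_t applied = saved_sp - observed_sp;
  return ALL_SUBS & ~applied;
}
```

A result of 0 means every SUB ran. A result of 32 would point at "sub sp, #32". If the applied total has bits outside 0x3FC, the skip-only assumption does not hold.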

However, when running the program I found that:

a) When single-stepping the erase_page() function, it operated as expected regardless of whether the CPU frequency had been set to 40MHz, 44.44MHz, 50MHz, 66.67MHz or 80MHz.

b) If the CPU frequency had been set to 40MHz or 44.44MHz, then the free-running program worked as expected.

c) If the CPU frequency had been set to 50MHz, 66.67MHz or 80MHz, then the free-running program failed. When halted in the debugger, the CPU had stopped on a hard fault caused by an Undefined Instruction usage fault, and the first 1K page of flash (addresses 0x0 to 0x3ff, which contain the interrupt vectors and part of the program) had been erased.

Therefore, the problem doesn't appear to be simple skipping of an instruction; rather, an incorrect instruction gets executed.

The fact that the fault only occurs with CPU frequencies above a certain value means the cause may be the flash prefetch buffer, since the data sheet states that the flash prefetch buffer is only enabled above a certain CPU frequency:

The Flash memory controller has a prefetch buffer that is automatically used when the CPU frequency
is greater than 40 MHz.

I'm not sure why the problem doesn't occur with a CPU frequency of 44.44MHz, though.
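The CLOCK_FREQ_*_RCC2_DIVISORS macros in the test program follow the divider formula in SetupClock(): the 400 MHz PLL output divided by (2 * SYSDIV2 + 1 + LSB). A host-side check of that arithmetic is below. It assumes DIV400 is set and the PLL is not bypassed, and sysclk_khz() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

#define SYSCTL_RCC2_SYSDIV2_S  23
#define SYSCTL_RCC2_SYSDIV2_M  0x1F800000u
#define SYSCTL_RCC2_SYSDIV2LSB 0x00400000u

/* System clock in kHz with DIV400: 400 MHz / (2 * SYSDIV2 + 1 + LSB). */
static uint32_t sysclk_khz(uint32_t rcc2_divisors) {
  uint32_t sysdiv2 = (rcc2_divisors & SYSCTL_RCC2_SYSDIV2_M) >> SYSCTL_RCC2_SYSDIV2_S;
  uint32_t lsb = (rcc2_divisors & SYSCTL_RCC2_SYSDIV2LSB) ? 1u : 0u;
  return 400000u / (2u * sysdiv2 + 1u + lsb);
}
```

Only the 40 MHz setting is at or below the 40 MHz prefetch threshold quoted above. That is consistent with the failures at 50, 66.67 and 80 MHz, but it does not explain why 44.44 MHz passed.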

Andrew G said:
The workaround for the known bug is to insert a sacrificial instruction, such as "mov r0, #0" after wakeup. Perhaps this is also what needs to be done after starting any flash operation.

The above test case does contain a sacrificial "mov r0, #0" immediately after the flash erase, but still fails. i.e. the sacrificial instruction doesn't avoid this problem.

Amit Ashara over 9 years ago in reply to Chester Gillon

TI__Guru**** 244380 points

Hi Chester and Andrew,

We looked at the code, the execution and the forum, and it seems that code is being executed from flash while the flash operation is in progress. As per the data sheet

However, the code expects to do some more fetches before being held off, which "may" be the cause of the incorrect code fetch. It has always been suggested to run flash programming from SRAM to avoid such issues.

Regards

Amit

Chester Gillon over 9 years ago in reply to Amit Ashara

Guru 92251 points

Amit Ashara said:
However the code expects to do some more fetches before being held off which "may" be the cause of the incorrect code fetch.

I have changed the example to wait for the flash erase to complete, but with a number of instructions inserted between starting the flash erase and calling a function that waits for the erase to complete. The updated project is attached: 0257.TIVA_flash_write_bug.zip

The main function is now:

/*
 * main.c
 */

#include <stdbool.h>
#include <stdint.h>

#include <inc/tm4c123gh6pm.h>

#define CLOCK_FREQ_80MHz_RCC2_DIVISORS     (2 << SYSCTL_RCC2_SYSDIV2_S)
#define CLOCK_FREQ_66_67MHz_RCC2_DIVISORS ((2 << SYSCTL_RCC2_SYSDIV2_S) | SYSCTL_RCC2_SYSDIV2LSB)
#define CLOCK_FREQ_50MHz_RCC2_DIVISORS    ((3 << SYSCTL_RCC2_SYSDIV2_S) | SYSCTL_RCC2_SYSDIV2LSB)
#define CLOCK_FREQ_44_44MHz_RCC2_DIVISORS  (4 << SYSCTL_RCC2_SYSDIV2_S)
#define CLOCK_FREQ_40MHz_RCC2_DIVISORS    ((4 << SYSCTL_RCC2_SYSDIV2_S) | SYSCTL_RCC2_SYSDIV2LSB)

#define RCC2_DIVISORS CLOCK_FREQ_80MHz_RCC2_DIVISORS

void prvGetRegistersFromStack( uint32_t *pulFaultStackAddress )
{
/* These are volatile to try and prevent the compiler/linker optimising them
away as the variables never actually get used.  If the debugger won't show the
values of the variables, make them global by moving their declaration outside
of this function. */
volatile uint32_t r0;
volatile uint32_t r1;
volatile uint32_t r2;
volatile uint32_t r3;
volatile uint32_t r12;
volatile uint32_t lr; /* Link register. */
volatile uint32_t pc; /* Program counter. */
volatile uint32_t psr;/* Program status register. */

    r0 = pulFaultStackAddress[ 0 ];
    r1 = pulFaultStackAddress[ 1 ];
    r2 = pulFaultStackAddress[ 2 ];
    r3 = pulFaultStackAddress[ 3 ];

    r12 = pulFaultStackAddress[ 4 ];
    lr = pulFaultStackAddress[ 5 ];
    pc = pulFaultStackAddress[ 6 ];
    psr = pulFaultStackAddress[ 7 ];

    /* When the following line is hit, the variables contain the register values. */
    for( ;; );
}

void HardFault_Handler(void)
{
    __asm volatile
    (
        " tst lr, #4                                                \n"
        " ite EQ                                                    \n"
        " mrseq r0, msp                                             \n"
        " mrsne r0, psp                                             \n"
        " ldr r1, [r0, #24]                                         \n"
        " ldr r2, handler2_address_const                            \n"
        " bx r2                                                     \n"
        "handler2_address_const: .word prvGetRegistersFromStack    \n"
    );
}

void await_not_busy (void)
{
	while (FLASH_FMC_R & FLASH_FMC_ERASE)
	{

	}
}

void erase_page (uint32_t address)
{
	await_not_busy ();

	asm ("	mov r5, #5");
	asm ("	mov r6, #6");
	asm ("	mov r7, #7");
	asm ("	mov r8, #8");
	asm ("	mov r9, #9");
	asm ("	mov r10, sp");
	asm ("	mov r11, sp");
	asm ("	mov r12, sp");

    FLASH_FMA_R = address;
    FLASH_FMC_R = (FLASH_FMC_WRKEY | FLASH_FMC_ERASE);

    asm ("	sub r11, #4");
    asm ("	sub r11, #8");
    asm ("	sub r11, #16");
    asm ("	sub r11, #32");
    asm ("	sub r11, #64");
    asm ("	sub r11, #128");
	await_not_busy ();
    asm ("	sub r11, #256");
    asm ("	sub r11, #512");

    for (;;)
    {

    }
}

void SetupClock() {
  // Run using internal oscillator
  // Use RCC2 instead of RCC
  SYSCTL_RCC2_R |= SYSCTL_RCC2_USERCC2;
  // Disable PLL
  SYSCTL_RCC2_R |= SYSCTL_RCC2_BYPASS2;
  SYSCTL_RCC2_R |= SYSCTL_RCC2_OSCSRC2_IO;
  // Power on the 400 Mhz PLL
  SYSCTL_RCC2_R &= ~SYSCTL_RCC2_PWRDN2;
  SYSCTL_RCC2_R |= SYSCTL_RCC2_DIV400;

  // Set up system divider (2 * SYSDIV + 1 + LSB)
  SYSCTL_RCC2_R = (SYSCTL_RCC2_R &
                   ~(SYSCTL_RCC2_SYSDIV2_M | SYSCTL_RCC2_SYSDIV2LSB)) |
                		   RCC2_DIVISORS;

  // Wait for the PLL to lock by polling PLLLRIS
  while ((SYSCTL_RIS_R & SYSCTL_RIS_PLLLRIS) == 0) {}

  // Enable PLL
  SYSCTL_RCC2_R &= ~SYSCTL_RCC2_BYPASS2;
}

int main(void)
{
	SetupClock ();
	erase_page (73 * 1024);
	
	return 0;
}

When free-running, this program fails with a hard fault caused by an "Undefined Instruction" usage fault. The disassembly of the erase_page() function is:

          erase_page():
000002c8:   B508     PUSH            {R3, LR}
000002ca:   9000     STR             R0, [SP]
000002cc:   F7FFFFF7 BL              await_not_busy
000002d0:   F04F0505 MOV.W           R5, #5
000002d4:   F04F0606 MOV.W           R6, #6
000002d8:   F04F0707 MOV.W           R7, #7
000002dc:   F04F0808 MOV.W           R8, #8
000002e0:   F04F0909 MOV.W           R9, #9
000002e4:   46EA     MOV             R10, R13
000002e6:   46EB     MOV             R11, R13
000002e8:   46EC     MOV             R12, R13
000002ea:   4927     LDR             R1, $C$CON2
000002ec:   9800     LDR             R0, [SP]
000002ee:   6008     STR             R0, [R1]
000002f0:   4826     LDR             R0, $C$CON3
000002f2:   4924     LDR             R1, $C$CON1
000002f4:   6008     STR             R0, [R1]
000002f6:   F1AB0B04 SUB.W           R11, R11, #4
000002fa:   F1AB0B08 SUB.W           R11, R11, #8
000002fe:   F1AB0B10 SUB.W           R11, R11, #16
00000302:   F1AB0B20 SUB.W           R11, R11, #32
00000306:   F1AB0B40 SUB.W           R11, R11, #64
0000030a:   F1AB0B80 SUB.W           R11, R11, #128
0000030e:   F7FFFFD6 BL              await_not_busy
00000312:   F5AB7B80 SUB.W           R11, R11, #256
00000316:   F5AB7B00 SUB.W           R11, R11, #512
          $C$L3:
0000031a:   E7FE     B               $C$L3

When the hard fault occurs:

- The Program Counter was 0x30E, which is the call to await_not_busy() after the flash erase was started.

- R11 is 0x20003EF4, which equals R12 (0x20003FF0) - 4 - 8 - 16 - 32 - 64 - 128. This means the six SUB.W R11, R11 instructions between the start of the flash erase and the call to await_not_busy() executed correctly.

Therefore, in this case six 32-bit instructions executed successfully after the flash erase was started, which gives an indication of how many bytes/instructions can be prefetched.
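The arithmetic above can be cross-checked against the disassembly. The STR at 0x2F4 is presumably the write to FMC (it stores the value loaded from $C$CON3 to the address loaded from $C$CON1), the first SUB.W follows at 0x2F6, and the fault PC is 0x30E, so 0x30E - 0x2F6 = 24 bytes, i.e. six 4-byte SUB.W instructions, ran after the erase was requested. The sketch below just redoes that bookkeeping on a host PC; the helper names are illustrative and not part of the original program.

```c
#include <assert.h>
#include <stdint.h>

/* Immediates of the eight SUB.W R11, R11, #imm instructions in erase_page(),
   in program order (taken from the disassembly). */
static const uint32_t sub_imms[] = {4, 8, 16, 32, 64, 128, 256, 512};
#define NUM_SUBS (sizeof sub_imms / sizeof sub_imms[0])

/* R11 after the first n SUBs, starting from the copy of SP also held in R12. */
static uint32_t r11_after(uint32_t r11_start, unsigned n)
{
    assert(n <= NUM_SUBS);
    uint32_t r11 = r11_start;
    for (unsigned i = 0; i < n; i++)
        r11 -= sub_imms[i];
    return r11;
}

/* Number of SUBs consistent with an observed R11, or -1 if none match. */
static int subs_executed(uint32_t r11_start, uint32_t r11_observed)
{
    for (unsigned n = 0; n <= NUM_SUBS; n++)
        if (r11_after(r11_start, n) == r11_observed)
            return (int)n;
    return -1;
}

/* Bytes of code between the first instruction after the FMC write and the
   faulting PC (each SUB.W here is a 4-byte instruction). */
static uint32_t bytes_between(uint32_t first_addr, uint32_t fault_pc)
{
    assert(fault_pc >= first_addr);
    return fault_pc - first_addr;
}
```

With the register values from the fault, r11_after(0x20003FF0u, 6) gives 0x20003EF4 and bytes_between(0x2F6u, 0x30Eu) gives 24, which agree with each other.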

The program can be made to execute correctly by either:

a) Reducing the CPU frequency from 80MHz to 40MHz

b) Moving the call to await_not_busy() one instruction closer to the flash erase being started, i.e.:

    FLASH_FMA_R = address;
    FLASH_FMC_R = (FLASH_FMC_WRKEY | FLASH_FMC_ERASE);

    asm ("	sub r11, #4");
    asm ("	sub r11, #8");
    asm ("	sub r11, #16");
    asm ("	sub r11, #32");
    asm ("	sub r11, #64");
	await_not_busy ();
    asm ("	sub r11, #128");
    asm ("	sub r11, #256");
    asm ("	sub r11, #512");

Amit Ashara said:
It has always been suggested to run Flash programming from SRAM to avoid such issues.

My reading of the datasheet was that flash programming from SRAM is only necessary if the CPU needs to continue executing instructions while the flash programming is in progress. For example, looking at the TivaWare flash.c, the flash erase/programming code appears to execute entirely from flash and to block until the flash operation is complete.

Amit Ashara over 9 years ago in reply to Chester Gillon

Hello Chester,

Setting the flash operation in the FMC register is not the true start of the flash operation: there is an internal delay between setting FMC and the actual flash operation. Please note that a branch instruction executes at the point of the actual flash operation, which requires the flash prefetch buffer to be flushed. The CPU is not aware that the flash prefetch buffer has been flushed with respect to its instruction execution. Hence the data sheet states that, for consistent results, code execution must move to SRAM, as it is difficult to synchronize the actual flash operation with code execution.

Note that the flush occurs only when the prefetch buffer is in use, so operation at a lower frequency always succeeds, because the CPU then accesses the flash memory without the buffer.

Regards

Amit

Chester Gillon over 9 years ago in reply to Amit Ashara

Amit Ashara said:
Setting the flash operation in the FMC register is not the true start of the flash operation: there is an internal delay between setting FMC and the actual flash operation. Please note that a branch instruction executes at the point of the actual flash operation, which requires the flash prefetch buffer to be flushed. The CPU is not aware that the flash prefetch buffer has been flushed with respect to its instruction execution. Hence the data sheet states that, for consistent results, code execution must move to SRAM, as it is difficult to synchronize the actual flash operation with code execution.

Thanks for the information.

I can confirm that when my example program was changed to execute the erase_page and await_not_busy functions from SRAM rather than flash, the program no longer failed.

However, in the TivaWare 2.1.0.12573 qs-logger example for the EK-LM4F232, the TivaWare FlashProgram and FlashErase functions are executed from flash rather than from SRAM. Should the TivaWare examples be changed to match the data sheet by executing FlashProgram and FlashErase from SRAM rather than flash?

Andrew G over 9 years ago in reply to Amit Ashara

Amit -

We looked at the code and the execution and the forum and it seems that the code execution is being done from the Flash while the Flash Operation is being done.

Yes, that is exactly what I reported in my initial message a month ago. The expected behavior is that execution from flash blocks during a flash operation and then resumes normally. This is how all other microcontrollers that I am familiar with function, and what most developers expect. For example, your own MSP430 series explicitly allows flash updates while executing code from flash:

slau144j, page 311:

If CPU execution is required during [flash] write or erase, the code to be executed must be in RAM. Any flash update can be initiated from within flash memory or RAM.

It has always been suggested to run Flash programming from SRAM to avoid such issues.

Can you please point me to where you suggest this in your documentation? The paragraph from the datasheet that you quoted simply says that flash cannot be read and written at the same time (access is inhibited), and that instruction fetching is held off (delayed) until the flash operation is finished. If blocking is not desirable, code can be run from SRAM. All of this makes sense and is expected. Nowhere does the datasheet say that running flash operations without copying code to SRAM will lead to incorrect program execution.

Also, as Chester pointed out, all of the code samples published by TI appear to initiate flash operations from code running in flash.

I believe that simply "we already told you to do flash updates from SRAM" is not a sufficient resolution for this issue. What I would like to see is along the following lines:

1) Ideally, a fix for this bug in the next hardware iteration. Flash updates are a fairly common operation (built-in flash is a good place to keep an application's persistent state), and at least with the tools that I use, copying individual functions to SRAM is a hassle and can lead to subtle errors when not done carefully.

2) An errata that mentions this problem and suggests a reliable workaround (preferably other than "copy your code to SRAM")

3) A rewording of the datasheet. How about: "NOTE: if you perform any flash operations, your code must be executed from SRAM. Otherwise, your program will fail".

Andrew G over 9 years ago in reply to Andrew G

Also, I am still not clear about what exactly happens when flash is updated. Can you please answer a few questions:

1) The prefetch buffer is 2 words (64 bits). Is this buffer updated one word at a time on 32-bit boundary?

2) You mentioned that flash operations don't happen instantly. Is the delay deterministic, and if so, how many cycles occur before the update starts?

3) My understanding is that flash operations cause the prefetch buffer to be flushed, while the CPU relies on instructions that have already been buffered. Is this correct?

4) What sort of corruption is caused by flushing the buffer in this way? Does data in the prefetch buffer get zeroed out, replaced by random values, or removed entirely (so that a few bytes in the instruction stream are missing)?

5) Is there something special about the SP register? I am pretty sure I tried the example where Chester incremented SP by several values using R0 instead, and didn't see corruption in that case.

6) Is there a workaround where the prefetch buffer corruption is harmless (e.g. inserting a few nops after modifying the FMC register)?

7) What other microcontrollers in the TI ARM family are affected by this issue?

Amit Ashara over 9 years ago in reply to Andrew G

Hello Andrew,

As of now we are not sure whether the data sheet will be updated with a note or an erratum will be issued. I agree with you that the data sheet is not clear that flash operations must be done from SRAM (unlike the MSP data sheet).

Put simply, while the delay may be deterministic, an interrupt can cause a jump in code execution, so just adding NOPs may not be an adequate solution: in the worst case, an interrupt taken after FMC is written can run into the same issue even with the NOPs in place. Only TM4C123 devices may be affected, as other TI ARM products specify SRAM execution during flash operations.

The issue is that the prefetch buffer zeroes out its data when the flash operation is performed, so it is not a random instruction on every run.

Regards

Amit

Andrew G over 9 years ago in reply to Amit Ashara

Only TM4C123 devices may be affected, as other TI ARM products specify SRAM execution during flash operations.

This doesn't seem to be the case for TM4C1294 at least, as the flash description there is almost identical:

When a Flash memory operation write, page erase, or mass erase is executed in a Flash bank,
access to that particular bank pair is inhibited. As a result, instruction and literal fetches to the bank
pair are held off until the Flash memory operation is complete. If instruction execution is required
during a Flash memory operation, the code that is executing must be placed in SRAM and executed
from there while the flash operation is in progress.

(http://www.ti.com/lit/ds/symlink/tm4c1294ncpdt.pdf, page 612). Do you think TM4C1294 might be similarly affected, or is its flash implementation different?

Thank you for your attention to this matter, and many thanks to Chester for chipping in and helping to diagnose the issue.

Sounds like the only safe way to do flash updates on Tiva is to run code from SRAM, which we will start doing.

Is it out of the question to change the flash implementation in the next silicon revision so that the CPU is notified when a flash operation takes place and the prefetch buffer is zeroed out? I guess the interaction between TI's proprietary peripherals and the ARM core can be problematic, as you may not have full control over the latter.

Amit Ashara over 9 years ago in reply to Andrew G

Hello Andrew,

TM4C129 has a different flash architecture, but in light of the current situation it would be worthwhile for us to do due diligence on it as well.

At this point no new revision is planned for the device, so a write-up or erratum may be the way to mitigate the issue.

Regards

Amit

Andrew G over 9 years ago in reply to Amit Ashara

Amit,

What about the following flash-based workaround?

1) Disable interrupts

2) Start flash operation

3) Add 10 (or however many are necessary) asm("movs r0, r0") instructions. Can you please clarify how many cycles it takes for the flash operation to begin after the FMC register is written?

4) Wait for flash operation to complete

5) Re-enable interrupts

"movs r0, r0" encodes to "00 00", so zeroing out of the prefetch buffer would have no effect.

I would prefer to find a workaround that doesn't involve copying code to RAM (which is a big hassle).
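The reasoning behind the "movs r0, r0" padding rests on two properties of the Thumb-2 encoding: a zero halfword can never be the first half of a 32-bit instruction, and 0x0000 decodes as LSLS r0, r0, #0, i.e. MOVS r0, r0. The sketch below checks both on a host PC; the helper names are illustrative. Two caveats: MOVS updates the N and Z flags, so the padding is only harmless where the flags are not live, and later in the thread TI advises that the NOP approach is not reliable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first
   half of a 32-bit Thumb-2 instruction; any other halfword is a complete
   16-bit instruction. */
static bool is_32bit_prefix(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* True if hw is the 16-bit "LSLS Rd, Rm, #0" encoding (000 00 imm5 Rm Rd with
   imm5 == 0) and Rd == Rm, i.e. MOVS Rd, Rd. Outside an IT block this leaves
   every register unchanged and only updates the N and Z flags. */
static bool is_movs_same_reg(uint16_t hw)
{
    if ((hw & 0xFFC0u) != 0u)   /* bits 15:11 must be 00000 (LSL) and imm5 == 0 */
        return false;
    unsigned rm = (hw >> 3) & 0x7u;
    unsigned rd = hw & 0x7u;
    return rm == rd;
}

/* True if every halfword in a run (e.g. a zeroed prefetch line) decodes as
   MOVS Rd, Rd. */
static bool all_movs_same_reg(const uint16_t *hw, size_t n)
{
    assert(hw != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!is_movs_same_reg(hw[i]))
            return false;
    return true;
}
```

Note that the 16-bit NOP (0xBF00) is not itself zero. If a zeroed buffer replaces NOPs, it simply executes MOVS r0, r0 in their place.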

Chester Gillon over 9 years ago in reply to Andrew G

Andrew G said:
Do you think TM4C1294 might be similarly affected, or is its flash implementation different?

I ported the program which failed on the TM4C123 device to an XM4C1294NCPDT (prototype silicon) on an EK-TM4C1294XL. I have been unable to make the TM4C1294 fail with any of the following:

- Setting the CPU frequency to 120MHz, which should automatically enable the flash prefetch

- Forcing the flash prefetch on in the FLASH_CONF register

- Forcing the flash prefetch off in the FLASH_CONF register

Edit: Added test of 120MHz CPU frequency

Amit Ashara over 9 years ago in reply to Andrew G

Hello Andrew,

The NMI still needs to be considered even if the other interrupt sources are disabled.

As for the number of cycles, I would need to check and confirm.

Regards

Amit

Andrew G over 9 years ago in reply to Amit Ashara

Hi Amit,

Can you please give an update on this bug? I see that the errata for the microprocessor has not been updated yet.

Can you also please comment on how many cycles we need to wait before re-enabling interrupts? We don't use NMI, so I am hoping we can get away with a solution that doesn't involve copying code into SRAM.

Thanks.

Amit Ashara over 9 years ago in reply to Andrew G

Hello Andrew,

The minimum number of NOPs would vary. It is not anywhere between 1 and 100, but more like 10-11 (with 12 as a safe value). However, this is a rather uncomfortable workaround. Instead, we suggest using a 40MHz system clock when performing a flash program/erase operation while executing from flash, as the most reliable solution.

Regards

Amit
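The 40MHz workaround means changing the system divider around each flash operation. As a sketch of the divider arithmetic in SetupClock() (with DIV400 set, sysclk = 400MHz / (2 * SYSDIV2 + SYSDIV2LSB + 1)), the hypothetical helper below computes the SYSDIV2/SYSDIV2LSB bits for a target frequency: 80MHz needs a divisor of 5 and 40MHz a divisor of 10. The field positions are our reading of tm4c123gh6pm.h and should be checked against your own header and the datasheet. The safe runtime switching sequence is not shown, and, as Andrew points out, switching affects anything timed from the system clock.

```c
#include <assert.h>
#include <stdint.h>

/* RCC2 divider fields as we read them from tm4c123gh6pm.h
   (assumption: verify against your own header). */
#define RCC2_SYSDIV2_S   23u
#define RCC2_SYSDIV2_M   0x1F800000u  /* SYSDIV2, bits 28:23 */
#define RCC2_SYSDIV2LSB  0x00400000u  /* SYSDIV2LSB, bit 22 */

#define PLL_HZ 400000000u  /* 400 MHz PLL output, divided directly when DIV400 is set */

/* Hypothetical helper: SYSDIV2/SYSDIV2LSB bits for a target system clock,
   using sysclk = PLL_HZ / (2 * SYSDIV2 + SYSDIV2LSB + 1). */
static uint32_t rcc2_divisor_bits(uint32_t sysclk_hz)
{
    assert(sysclk_hz > 0u && PLL_HZ % sysclk_hz == 0u);
    uint32_t divisor = PLL_HZ / sysclk_hz;  /* 5 for 80 MHz, 10 for 40 MHz */
    /* 80 MHz (divisor 5) is the TM4C123 maximum; 6-bit SYSDIV2 plus LSB allows up to 128. */
    assert(divisor >= 5u && divisor <= 128u);
    uint32_t sysdiv2 = (divisor - 1u) / 2u;
    uint32_t lsb = (divisor - 1u) % 2u;
    return (sysdiv2 << RCC2_SYSDIV2_S) | (lsb ? RCC2_SYSDIV2LSB : 0u);
}
```

For example, RCC2_DIVISORS in the posted program (its definition is not shown) would presumably equal rcc2_divisor_bits(80000000u), i.e. 0x01000000 (SYSDIV2 = 2, SYSDIV2LSB = 0), while 40MHz gives 0x02400000 (SYSDIV2 = 4, SYSDIV2LSB = 1).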

Andrew G over 9 years ago in reply to Amit Ashara

Amit,

The workaround of inserting 12 nops after commencement of a flash operation and disabling all interrupts doesn't seem to work. We found this out after upgrading GCC to version 4.8.2, which produced slightly different code for our flash library. Now, 11 nops works, but not 12 or 13. 14 nops works again. It seems that the previous characterization of the bug, which was zeroing out of the prefetch buffer with a delay of a few instructions, is inaccurate. Something else must be going on in the microprocessor during flash operations.

It appears that neither the errata nor the datasheet have been updated with information about this bug after several months. This is rather surprising. If incorrect code execution during a common operation is not considered a serious issue, I don't know what is.

Amit Ashara over 9 years ago in reply to Andrew G

Hello Andrew

A new errata document is being prepared for release. As I mentioned in my previous post, using NOPs is not a reliable workaround; rather, use 40MHz for flash operations when executing from flash.

Regards
Amit

Andrew G over 9 years ago in reply to Amit Ashara

Switching to 40MHz is not feasible for us, as we use the master clock for timekeeping. Switching it back and forth between 40MHz and 80MHz to perform flash operations would throw off our timing logic.

If 12 nops is not safe, is there a value that is safe? Or is the problem more complex than zeroing out of bytes in the prefetch buffer?

Amit Ashara over 9 years ago in reply to Andrew G

Hello Andrew,

That is exactly why we did not want to use NOPs as the workaround. The issue is the zeroing of bytes in the prefetch buffer, but interrupt conditions may cause the code to fail in the same way even with additional NOPs.

Regards
Amit

Andrew G over 9 years ago in reply to Amit Ashara

I don't understand this. We disable all interrupts during flash operations, and we don't use NMIs, yet still see problems with 12 nops. Are you saying that prefetch buffer corruption somehow affects pending interrupts?

Amit Ashara over 9 years ago in reply to Andrew G

Hello Andrew,

I am saying that if interrupts are enabled, they can potentially defeat the NOP workaround. That is why the errata will state 40MHz operation or SRAM code execution during flash program/erase as the only workarounds.

Regards
Amit

Andrew G over 9 years ago in reply to Amit Ashara

I see. However, if we do disable interrupts, is there a safe number of nops? You suggested 12 before, but that wasn't sufficient.

Amit Ashara over 9 years ago in reply to Andrew G

Hello Andrew,

I am not sure the next number of NOPs would remain good until another compiler optimization invalidates it.

Regards
Amit

Andrew G over 9 years ago in reply to Amit Ashara

What do compiler optimizations have to do with whether the hardware corrupts past 12 nops or not? I give up, won't push this any further.

Chester Gillon over 9 years ago in reply to Andrew G

Andrew G said:
We found this out after upgrading GCC to version 4.8.2, which produced slightly different code for our flash library. Now, 11 nops works, but not 12 or 13. 14 nops works again.

Some thoughts, which perhaps only someone at TI can answer:

1) Is use of the flash programming functions in the Tiva ROM a suitable work-around?

i.e. if the flash programming software is executed from ROM with a CPU frequency above 40MHz, does that avoid the issue with the flash prefetch buffer?

2) To see if the bug is sensitive to alignment, is it possible to leave the number of NOPs (and the rest of the code) alone, modify the linker script to shift the start address used in flash by, say, two bytes at a time, and see if the fault comes and goes?

[I am not sure if the flash pre-fetch buffer interacts with how the Cortex-M4F core prefetches / pipelines instructions]
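One way to plan the alignment experiment is to note which instructions cross a prefetch-line boundary at each code offset. The sketch below assumes an 8-byte (two-word) line, based only on the buffer size mentioned earlier in the thread; it is a bookkeeping aid, not a model of the fault. In the failing build the 32-bit BL at 0x30E spans bytes 0x30E-0x311 and therefore two 8-byte lines, while a two-byte shift to 0x310 would keep it inside one line. Whether line crossings matter at all is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a 64-bit (8-byte) flash prefetch line. */
#define PREFETCH_LINE_BYTES 8u

/* Start address of the prefetch line containing addr. */
static uint32_t line_base(uint32_t addr)
{
    return addr & ~(PREFETCH_LINE_BYTES - 1u);
}

/* True if a Thumb instruction of 'size' bytes (2 or 4) starting at 'addr'
   spans two prefetch lines. */
static bool straddles_line(uint32_t addr, uint32_t size)
{
    assert(size == 2u || size == 4u);
    return line_base(addr) != line_base(addr + size - 1u);
}
```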

Amit Ashara over 9 years ago in reply to Chester Gillon

Hello Chester,

Thanks again. I think (1) would be a valid workaround (though it has to be tested). However, the data sheet's description of the ICODE+DCODE stall during flash operations still needs to be clarified.

Regards
Amit

JR Simma over 9 years ago

I found this post today and think it relates to a problem we've been seeing for a week. I'm pointing this out because I think the underlying issue also occurs with the CPU running at 16MHz. We see the problem at both 16MHz and 80MHz, which are the only two speeds we've tested.

Here is the post we made:

e2e.ti.com/.../404282

Our fix is to run the flash routines and interrupts from RAM.

Amit Ashara over 9 years ago in reply to JR Simma

Hello JR,

The workaround is to disable the master interrupt when running from flash; otherwise the code will still try to access the flash.

Regards
Amit
