Data integrity in a multi task system

Randy Bulloch

Hi All,

I have 2 questions:

Q1: I have several critical data types (floats, uint16 & uint32) that I want to make sure cannot be corrupted during task switching. Since the Tiva data bus is 32 bits wide, is it reasonable to assume that any data that is 32 bits or less will be treated as an atomic value and hence there is no opportunity for the data to be corrupted during multi thread access?

Q2: If protection is required for 32 bit data, is a semaphore the best mechanism?

Discussion:

I believe that data transfer of variables in TIVA processors is always 32 bits wide, and that a single read or write of 32 bits or less in C code is an atomic operation. If the code is only intended to run on a TIVA processor, access by multiple TI-RTOS tasks to a single 32 bit or smaller variable does not require semaphores or other locking mechanisms to protect against corruption. The assembly code I have reviewed seems to confirm this.

In many discussions on the forums I did not recognize any examples stating otherwise for access to a single variable. I normally would use a semaphore as a protection mechanism but question the need. However, I am not at all certain that I understand all the issues, i.e., optimization, need for volatile type.

Summary of question: Is a semaphore necessary, assuming the code executes only on the TIVA, is built only with a TI compiler, and accesses only single variables?

CCS 6.1, TI RTOS 2.14, Tiva 129XNCZAD MCU

I am asking about basics again and appreciate all the help I can get,

Randy

over 10 years ago

0 Amit Ashara over 10 years ago

TI Guru 244440 points

Hello Randy,

A semaphore is required when multiple tasks modify the same variable. However, even when using different variables, care has to be taken that the variables are defined and used correctly. I recently made the careless mistake of defining a 32-bit return value as 16-bit and then reading it as 32-bit, corrupting a compiler-placed variable. Make no assumptions about the compiler.

Regards
Amit

0 Robert Adsett over 10 years ago

Guru 27665 points

Randy Bulloch said:
Since the Tiva data bus is 32 bits wide, is it reasonable to assume that any data that is 32 bits or less will be treated as an atomic value and hence there is no opportunity for the data to be corrupted during multi thread access?

No, consider

unaligned access may require several bus operations to read or write a value
sub-32 bit access will require a read/modify/write cycle which may be interruptible.

I used an RTK once that made similar assumptions. It took me months to find that particular Heisenbug.

Randy Bulloch said:
Q2: If protection is required for 32 bit data, is a semaphore the best mechanism?

Maybe. It may be simpler and faster to simply disable task switching. Latency might be lower, as well as overhead. The disadvantage is that you lock everything; however, semaphores generally need to do something like this as well, and the overhead is implementation/RTOS specific.

Randy Bulloch said:
volatile type.

Just a quick note. volatile is often misunderstood. All volatile does is tell the compiler that all accesses to this variable must be performed (no caching in registers for later use) and they must be accessed in the "order"¹ specified by the code.

Robert

1 - You need to keep your code very direct if you want to be sure of what this means.

0 Randy Bulloch over 10 years ago in reply to Robert Adsett

Intellectual 460 points

Thanks Amit and Robert for the nearly instant response.

I thought that all data in the TIVA was 32 bit aligned (based on my too quick checking), obviously I did not check carefully enough. I need to write a tiny snippet that gives me unaligned variables and remember that is the case.
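A tiny snippet along those lines might look like the sketch below. It assumes a compiler that accepts the GCC-style __attribute__((packed)) extension; the struct and function names are made up for illustration. It only reports member offsets, so it says nothing about how a particular core handles the resulting unaligned access.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler pads so that 'value' is 4-byte aligned. */
struct natural_layout {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout: padding is removed, so 'value' starts at offset 1.
 * __attribute__((packed)) is a GCC/Clang extension (assumed available). */
struct __attribute__((packed)) packed_layout {
    uint8_t  tag;
    uint32_t value;
};

/* Returns 1 if a member at the given offset sits on a 32-bit boundary. */
static int is_word_aligned(size_t offset)
{
    return (offset % 4u) == 0u;
}

static size_t natural_value_offset(void)
{
    return offsetof(struct natural_layout, value);
}

static size_t packed_value_offset(void)
{
    return offsetof(struct packed_layout, value);
}
```

If an instance of packed_layout happens to start on a word boundary, its value member then lives at an odd address, which is exactly the unaligned case Robert describes.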

Again, thanks for the correction and quick advice.

Randy

0 Robert Adsett over 10 years ago in reply to Randy Bulloch

Guru 27665 points

It has become common for compilers to have a switch to operate either with variables aligned (for speed) or without attention paid to alignment (for compactness). This normally only affects structures, with simple variables still aligned to an appropriate boundary.

Keyword based alignment is also common but probably best avoided. It's fairly easy to avoid it with proper code.

Robert

0 Amit Ashara over 10 years ago in reply to Randy Bulloch

TI Guru 244440 points

Hello Randy,

You need to be careful when it comes to the compiler assigning memory space.

Regards
Amit

0 Randy Bulloch over 10 years ago in reply to Amit Ashara

Intellectual 460 points

Hi Amit & Robert,
In checking I see that even without optimization, variables in structures are not always aligned on 32 bit boundaries. So it seems that staying with semaphores makes sense.

thanks,
Randy

0 Amit Ashara over 10 years ago in reply to Randy Bulloch

TI Guru 244440 points

Hello Randy

I think there is an option to prevent unaligned access during compile phase.

Regards
Amit

0 Randy Bulloch over 10 years ago in reply to Amit Ashara

Intellectual 460 points

Hi Amit,
Yes, I agree. We are looking into using a compiler command setting to align the public variables of interest.
Thanks,
Randy

0 Robert Adsett over 10 years ago in reply to Randy Bulloch

Guru 27665 points

Be careful Randy, this could become a maintenance nightmare very easily.

Robert

0 Petrei over 10 years ago

Guru 26105 points

Hi,

Useful for your case: LDREX/STREX asm instructions - the "exclusives" implementation in Cortex-Mx for memory protection. See Yiu for a more detailed explanation.
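As a hedged illustration: with a toolchain that provides C11 <stdatomic.h> (GCC and Clang for Arm do; whether the TI compiler shipped with CCS 6.1 does is not confirmed here), an atomic read-modify-write can be written in plain C, and on Cortex-M3/M4 such compilers typically lower it to an LDREX/STREX retry loop. The counter and function names below are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t event_count;   /* hypothetical shared counter */

/* Atomic read-modify-write: returns the value before the increment. */
static uint32_t count_event(void)
{
    return atomic_fetch_add(&event_count, 1u);
}

/* Atomic read of the current count. */
static uint32_t events_so_far(void)
{
    return atomic_load(&event_count);
}
```

Wrapping the access in small functions like these keeps the dependency on the atomic facility in one place, which helps if the code later has to move to a compiler without <stdatomic.h>.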

Also useful to know - ARM's Application Note 321 - Programming Guide to Memory Barrier Instructions.

0 Amit Ashara over 10 years ago in reply to Robert Adsett

TI Guru 244440 points

Hello Robert

Can you explain how? Maybe an example would aid Randy and others.

Regards
Amit

0 Robert Adsett over 10 years ago in reply to Amit Ashara

Guru 27665 points

Consider where you would use alignment switches

Global (either program or Module)

I don't think I've ever used a compiler that didn't allocate global memory according to alignment. This would in any case almost certainly be a global switch at the link and locate stage. If you know you depend on this you certainly need to document it in your build.

Stack

Similarly to global, I'm not sure I've ever used a compiler that had a switch to change this. One thing that would need special attention here is any called routines. I would expect mixing modes here would be unsafe.

Structure

This is where you would usually see alignment switches used. The problem here is that if a structure is used in more than one place, all modules that use it need to likewise have the switch set. Likewise any modules that use any structure used in those modules (even by reference). Maintaining that would be nightmarish.

That's why keyword alignment was invented. Hidden behind typedefs this can work, but there are things you would need to check, such as how an aligned structure contained in an unaligned structure is dealt with, and vice versa. These keywords are vendor specific and cause problems with tools that expect standard C, or if you need to move the code to a different compiler and/or micro.

This leads to the question, why would you need to control alignment?

Atomicity - Well, that guarantee is rather limited. It works only for exact sizes and is processor specific. As well, it only works for the simple assignment case and may be dependent on compiler optimization levels (see maintenance nightmare). The alternatives are simple locking (like but not limited to interrupt disabling), semaphores, or dropping to assembly and using the atomic instructions noted earlier via an access function. You can make your semaphores cover too narrow a region as well as too wide a region. Also take a look at your RTOS IPC primitives; there are usually more than semaphores available.
Speed - Look at alternative algorithms first. Then, if you still have a problem, look at things like caching or assembly language. Unless your compiler is unable to optimize (maybe restricted by volatile), alignment is unlikely to be a speed constraint easily dealt with by a compiler switch.
Space - The usual reason for forcing structures to be unaligned is space. But are you really sure that is a problem? There is a simple approach that reduces wasted space in structures. Simply ordering the members from the largest to the smallest reduces the gaps necessary to maintain alignment, so you get your speed advantage with minimal wasted space. It still won't be zero in some cases, but unless it's shown to be a problem why overreact?
I/O - The thought here is to eliminate padding bytes in I/O. This really breaks down into separate cases

Network - Any communication channel really, if you are sending binary and not text (there are real advantages to using text if you have the overhead to handle it). However, removing padding is only half the problem; you also have to be sure that the endianness is correct and the size is correct. Better handled IMO with cracker macros or functions. That has the advantage of being reusable, and in the ideal case the cracker macros will resolve to zero instructions.
Storage (Large) - Go ASCII, seriously why even consider binary? ASCII is more portable, easier to recover in case of corruption etc... There are cases for binary but if you don't need it why bother optimizing for it? If you do need to use binary then take a look at the network discussion. The arguments are the same.
Storage (Small) - This is the case where I think you are most likely to want it but I think you should measure before going that route. By using the structure order trick mentioned earlier you can reduce the wasted space considerably even when you keep mirror copies. Only when it is the difference between fitting and not fitting would I look at it and I'd first consider a cracking function. Hint: Put all your parameters in a single (or few) structures, it makes a lot of things simpler.
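Two of the ideas in the list above, ordering members from largest to smallest and using cracker functions, can be sketched roughly as follows. The struct and function names are invented for illustration, and the sizes mentioned assume a typical 32-bit Arm ABI where uint32_t is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Members in arbitrary order: padding is needed before 'b' and 'd'. */
struct params_unordered {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
    uint16_t d;
};

/* Same members, largest first: typically no internal padding remains. */
struct params_ordered {
    uint32_t b;
    uint16_t d;
    uint8_t  a;
    uint8_t  c;
};

/* "Cracker" functions: move a 32-bit value to and from a byte stream in
 * big-endian (network) order, independent of host endianness, alignment
 * and structure padding. */
static void put_u32_be(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

static uint32_t get_u32_be(const uint8_t *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

On such an ABI the unordered structure typically occupies 12 bytes and the ordered one 8, without any packing switch or keyword. The cracker functions touch the buffer one byte at a time, so the wire format never depends on how the compiler laid out the structure.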

Robert

"Premature Optimization is the Root of all evil" - Attributed to CAR Hoare by Knuth IIRC

0 Randy Bulloch over 10 years ago in reply to Robert Adsett

Intellectual 460 points

Hi Robert & Amit,
Great to have you both on this forum helping. I am (still) tempted to continue the use of semaphores, but the design decision will be made (probably by others) using the considerable info gained with your input.

Regards,
Randy

0 Robert Adsett72 over 10 years ago in reply to Robert Adsett

Guru 10570 points

One quick point of clarification on atomicity. I would expect aligned 32 bit reads and aligned 32 bit writes to be atomic on an ARM Cortex. Read Modify Write I would not expect to be guaranteed atomic.

Robert

0 Amit Ashara over 10 years ago in reply to Robert Adsett72

TI Guru 244440 points

Hello Robert

And neither is a bit-banded address access.

Regards
Amit

0 Fred Assadi over 10 years ago in reply to Robert Adsett

Intellectual 330 points

Hi Robert,
I also work with Randy, and we were discussing your ideas. I have a few questions of my own that I am hoping you can answer:

Q1. You state that 32 bit Read or Write operations are atomic and may be performed without a locking mechanism, and that structures and Read-Modify-Write instructions are the ones to look out for. (I suppose that this is also true of 8/16 bit RMW operations.) I would think that an operation like myUint32Var += 1 is a RMW type operation. Can you give some other examples of RMW operations that we should be careful about?

Q2. Is it safe to assume that as long as there is only one writer task, and the data is atomic (not a structure) then there is no RMW problem?

Q3. If there are multiple tasks that write the variable, then disabling all interrupts during the write will always take care of the problems. T/F?

Q4. Is R/W access to 16 bit data that falls at 16 bit boundaries, but not 32 bit boundary still thread safe? (Question also applies to any global 8 bit data members.)

Q5. I’m assuming that it does not matter if the data in Q4 is global or member of a structure, as long as the structure is either protected or data in it can be updated individually.

Thanks again for your previous thoughtful answers,
Fred

0 Robert Adsett72 over 10 years ago in reply to Fred Assadi

Guru 10570 points

It'll take me a few posts to answer because of the difficulties working from this machine. Also it's a rather large area to address.

Robert

0 Robert Adsett72 over 10 years ago in reply to Fred Assadi

Guru 10570 points

Q1
Besides the increment and decrement operators, using the bit operators for masking comes to mind immediately.

Basically, if you can rewrite it as x = x op ... or as x = func(x), it's probably a read-modify-write operation.

8 and 16 bit writes are always read-modify-writes, since the processor must read the word, mask and update part of it, and write it back. Some processor cores provide uninterruptible instructions to do this; I don't think the Cortex does, but I don't recall. Some processors have implemented byte lanes that allow reading/writing single bytes within a word.
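To make the hazard concrete, here is a small host-side sketch that simulates, deterministically and in a single thread, what happens when one task is preempted between the load and the store of an increment. The locals reg_a and reg_b stand in for each task's CPU register; no real task switching is involved.

```c
#include <stdint.h>

/* Simulates two tasks each doing 'shared += 1', with the second task
 * preempting the first between its load and its store.
 * Returns the final value of the shared variable. */
static uint32_t lost_update_demo(void)
{
    uint32_t shared = 0;

    uint32_t reg_a = shared;   /* task A: load                          */
    /* --- task switch: B runs its whole increment --- */
    uint32_t reg_b = shared;   /* task B: load                          */
    reg_b += 1;                /* task B: modify                        */
    shared = reg_b;            /* task B: store  -> shared == 1         */
    /* --- task switch back to A --- */
    reg_a += 1;                /* task A: modify its stale copy         */
    shared = reg_a;            /* task A: store  -> overwrites B's work */

    return shared;             /* 1, although two increments ran        */
}
```

The same pattern applies to flags |= MASK, flags &= ~MASK, or any x = func(x) update: each individual load and store may be atomic while the sequence as a whole is not.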

Robert

0 Robert Adsett72 over 10 years ago in reply to Fred Assadi

Guru 10570 points

Q2

Maybe, it does simplify the problem.

Consider the following

X = X/2+g

It's an rmw operation but the compiler is free to implement it as
X = X/2
X=X+g
If X is only valid upon completion then there is an opportunity for the reader to get the wrong result.

There is another case of concern. Consider something like the following

X=1
…
X=2
…
X=3

Unless the elided code contains a function call, the compiler is free to cache the result. It does not need to write the value until the routine is exited, either by a function call or by a return. Note that it's also free to rearrange and eliminate writes, as long as the global is appropriately changed when leaving the function.

There is a way to restrict the compiler's ability to perform any of these transformations. If the target is volatile the the writes and reads must be performed exactly as many times as they are written and in the order written. You can see how that would address the cases above.
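A minimal sketch of that idea for the X = X/2+g case, assuming a single writer task and assuming that an aligned 32-bit store completes as a single access on the target. The function names are hypothetical.

```c
#include <stdint.h>

static volatile uint32_t X;   /* shared: written by one task, read by others */

/* Compute the new value in a local first, then publish it with exactly one
 * volatile store, so no intermediate X/2 is ever written to the shared X. */
static void update_x(uint32_t g)
{
    uint32_t next = X / 2u + g;  /* one volatile read of X  */
    X = next;                    /* one volatile write of X */
}

static uint32_t read_x(void)
{
    return X;                    /* one volatile read of X  */
}
```

Note that volatile only pins down how many accesses happen and in what order; it does not make a read-modify-write safe against a second writer.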

Robert

0 Robert Adsett72 over 10 years ago in reply to Robert Adsett72

Guru 10570 points

The the --> then the

0 Robert Adsett72 over 10 years ago in reply to Robert Adsett72

Guru 10570 points

Note that this single writer technique is essentially what is used to create nonblocking FIFOs
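A rough sketch of such a FIFO follows, for exactly one producer and one consumer on a single core without data caches. All names are invented for illustration; on multi-core or cached systems, memory barriers would also be needed, which this sketch does not attempt.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 8u  /* must be a power of two; one slot is kept empty */

/* Everything shared is volatile so the compiler keeps the accesses and
 * their order exactly as written. */
static volatile uint32_t fifo_buf[FIFO_SIZE];
static volatile uint32_t fifo_head;  /* written only by the producer */
static volatile uint32_t fifo_tail;  /* written only by the consumer */

/* Producer side. Returns false if the FIFO is full. */
static bool fifo_put(uint32_t value)
{
    uint32_t head = fifo_head;
    uint32_t next = (head + 1u) & (FIFO_SIZE - 1u);

    if (next == fifo_tail) {
        return false;            /* full */
    }
    fifo_buf[head] = value;      /* fill the slot first ...            */
    fifo_head = next;            /* ... then publish it with one store */
    return true;
}

/* Consumer side. Returns false if the FIFO is empty. */
static bool fifo_get(uint32_t *value)
{
    uint32_t tail = fifo_tail;

    if (tail == fifo_head) {
        return false;            /* empty */
    }
    *value = fifo_buf[tail];     /* read the slot first ...            */
    fifo_tail = (tail + 1u) & (FIFO_SIZE - 1u);  /* ... then release it */
    return true;
}
```

Each index has exactly one writer, so neither side ever performs a read-modify-write on a variable the other side writes.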

Robert

0 Robert Adsett72 over 10 years ago in reply to Fred Assadi

Guru 10570 points

Q3,

I think so, yes. Although multiple writers may be an issue even if the update is well defined. I've used it successfully in some cases though.

Robert

0 Robert Adsett72 over 10 years ago in reply to Fred Assadi

Guru 10570 points

Q4

No. Earlier I explained why writing variable sizes smaller than 32 bits is a RMW operation. Now, the processor may provide a non-interruptible instruction to write smaller sizes, but the compiler is not compelled to use these instructions. I ran into that problem years ago with a commercial RTOS: they made the assumption that assignments to a particular size of variable were atomic, and the processor had instructions that ensured it was, but it turned out the compiler only used those instructions under high levels of optimization. Never assume that generated code must use a particular instruction sequence.

Robert

0 Robert Adsett72 over 10 years ago in reply to Robert Adsett72

Guru 10570 points

If you start using this technique (disabling interrupts around the access), watch out for nested disable sequences. There are ways to deal with that, but you need to be aware that the issue exists.
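One common way to deal with nesting is to save the previous interrupt state and restore it, rather than unconditionally re-enabling. The sketch below models the interrupt-enable state with a plain variable so it can run on a host; irq_save_disable(), irq_restore() and the other names are hypothetical stand-ins, not a real API. TI-RTOS's Hwi_disable()/Hwi_restore(key) pair has the same key-passing shape.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the CPU's interrupt-enable state so the sketch runs on a
 * host; on real hardware this would be the processor's mask register. */
static bool irq_enabled = true;

/* Disable interrupts and return the previous state as a key. */
static bool irq_save_disable(void)
{
    bool was_enabled = irq_enabled;
    irq_enabled = false;
    return was_enabled;
}

/* Restore the state captured by the matching irq_save_disable(). */
static void irq_restore(bool key)
{
    irq_enabled = key;
}

static uint32_t shared_total;

/* Inner routine: safe to call with interrupts enabled or already disabled. */
static void add_to_total(uint32_t n)
{
    bool key = irq_save_disable();
    shared_total += n;
    irq_restore(key);      /* re-enables only if they were enabled on entry */
}

/* Outer routine holds its own critical section across two inner calls.
 * Returns true if interrupts stayed masked after the first inner call. */
static bool add_pair(uint32_t a, uint32_t b)
{
    bool key = irq_save_disable();
    add_to_total(a);
    bool still_masked = !irq_enabled;  /* inner restore must not re-enable */
    add_to_total(b);
    irq_restore(key);
    return still_masked;
}
```

Had add_to_total() simply re-enabled interrupts on exit, the outer critical section in add_pair() would have been silently cut short after the first call.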

Robert

0 Robert Adsett72 over 10 years ago in reply to Fred Assadi

Guru 10570 points

Q5

Yes, protection methods generally do not depend on memory organization.

Robert

0 Robert Adsett72 over 10 years ago in reply to Fred Assadi

Guru 10570 points

If you are going through all of this the question is why?

You are basically talking about inter process communication. An RTOS/RTK will have structures already developed to do this.

There will usually be constructs like signaling semaphores, counting semaphores, queues, mailboxes, FIFOs and mutexes. There will be a nonblocking subset available for interrupts. From these you can build almost any IPC structure you would like. And if you cannot build it from the available IPC structures then the primitives used to make them can be used to build a new type.

There are quality free and commercial kernels available; it makes sense to use them and their facilities.

Robert

0 Robert Adsett72 over 10 years ago in reply to Robert Adsett72

Guru 10570 points

A couple of additional notes edging towards being off topic.

On architectures of 16 bits or larger, the types int and unsigned int are supposed to represent the natural word size of the processor, so using int and unsigned int should give atomic reads and writes and represent the best performing integer. This is where I think that the recent emphasis on sized types has gone astray. The reason that smaller architectures have an exception is that the int types have minimum ranges that require 16 bits to represent in a two's complement implementation.

I've neglected consideration of heavily cached systems which can impose further restrictions. See the comment on memory barriers earlier. I don't think that's an issue for any Cortex M but may be for Cortex A.

Robert
