CODECOMPOSER: Compiler sometimes splits a volatile 32-bit access into two 16-bit accesses

Adam Strelsky

Part Number: CODECOMPOSER
Other Parts Discussed in Thread: C2000-CGT, C2000WARE

Hello,

According to the manual, the C compiler from C2000-CGT is not supposed to change the size of volatile accesses (see spru514y.pdf, J.3.10 Qualifiers). However, there seems to be a corner case where it does: the compiler splits a 32-bit volatile read into two 16-bit reads.

Example of the issue, compiled with "cl2000.exe --c99 -Ooff -v28 -k":

volatile unsigned long value;

unsigned long foo_0(unsigned long arg) { return value & arg; }
unsigned long foo_1(unsigned long arg) { return value | arg; }
unsigned long foo_2(unsigned long arg) { return value ^ arg; }

With 22.6.0.LTS, 22.6.1.LTS, and 22.6.2.LTS under Windows, the example above compiles to the following assembly:

_foo_0:
        ADDB      SP,#2                 ; [CPU_ARAU] 
        MOVL      *-SP[2],ACC           ; [CPU_ALU] |3| 
        MOVW      DP,#_value            ; [CPU_ARAU] 
        AND       AL,@_value            ; [CPU_ALU] |3| 
        AND       AH,@$BLOCKED(_value)+1 ; [CPU_ALU] |3| 
        SUBB      SP,#2                 ; [CPU_ARAU] 
        LRETR     ; [CPU_ALU] 

_foo_1:
        ADDB      SP,#2                 ; [CPU_ARAU] 
        MOVL      *-SP[2],ACC           ; [CPU_ALU] |4| 
        MOVW      DP,#_value            ; [CPU_ARAU] 
        OR        AL,@_value            ; [CPU_ALU] |4| 
        OR        AH,@$BLOCKED(_value)+1 ; [CPU_ALU] |4| 
        SUBB      SP,#2                 ; [CPU_ARAU] 
        LRETR     ; [CPU_ALU] 

_foo_2:
        ADDB      SP,#2                 ; [CPU_ARAU] 
        MOVL      *-SP[2],ACC           ; [CPU_ALU] |5| 
        MOVW      DP,#_value            ; [CPU_ARAU] 
        XOR       AL,@_value            ; [CPU_ALU] |5| 
        XOR       AH,@$BLOCKED(_value)+1 ; [CPU_ALU] |5| 
        SUBB      SP,#2                 ; [CPU_ARAU] 
        LRETR     ; [CPU_ALU]

The issue is present regardless of the selected optimization level, even when optimizations are completely turned off with "-Ooff". This can cause problems when a program reads data from a peripheral device that only supports 32-bit accesses, or when it relies on a single volatile 32-bit read/write being atomic for thread safety in an environment with interrupts. As I understand it, both are intended approaches on C2000, since they are also used in C2000Ware examples and drivers.

A possible workaround is to read the 32-bit data with the __byte_peripheral_32 intrinsic, which seems to stop the compiler from splitting the access. The drawback is that the user has to find every place in the source code affected by this issue.

unsigned long foo_fixed(unsigned long arg) { return __byte_peripheral_32(&value) & arg; }

This then compiles to:

_foo_fixed:
        ADDB      SP,#2                 ; [CPU_ARAU] 
        MOVL      *-SP[2],ACC           ; [CPU_ALU] |6| 
        MOVL      XAR4,#_value          ; [CPU_ARAU] |6| 
        MOVL      ACC,*+XAR4[0]         ; [CPU_ALU] |6| 
        AND       AL,*-SP[2]            ; [CPU_ALU] |6| 
        AND       AH,*-SP[1]            ; [CPU_ALU] |6| 
        SUBB      SP,#2                 ; [CPU_ARAU] 
        LRETR     ; [CPU_ALU]

The above examples are the only ones with this issue that I am aware of. If I try other accesses to bitfields, or different operators, then the compiler correctly loads whole 32-bit words at once without splitting them.
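To limit how many call sites need to be touched, the intrinsic could be wrapped in one small helper. The following is only a sketch under stated assumptions: read32_once is a hypothetical name (it is not part of C2000Ware), the __TMS320C2000__ macro is assumed to be predefined by the C2000 compiler, and the intrinsic is called in the same form as in foo_fixed. The fallback branch exists only so the code can be built and tested on a host compiler.

```c
/*
 * read32_once() is a hypothetical helper, not a C2000Ware API.
 * On the TI C2000 compiler it routes the read through the
 * __byte_peripheral_32 intrinsic, called the same way as in foo_fixed.
 * On any other compiler it falls back to a plain volatile read so the
 * surrounding logic can still be compiled and unit-tested on a host.
 */
static inline unsigned long read32_once(volatile unsigned long *addr)
{
#if defined(__TMS320C2000__)   /* assumed to be predefined by cl2000 */
    return __byte_peripheral_32(addr);
#else
    return *addr;              /* host fallback: no width guarantee implied */
#endif
}

volatile unsigned long value;

/* Same operations as foo_0..foo_2, but the read goes through the helper. */
unsigned long and_with_value(unsigned long arg) { return read32_once(&value) & arg; }
unsigned long or_with_value(unsigned long arg)  { return read32_once(&value) | arg; }
unsigned long xor_with_value(unsigned long arg) { return read32_once(&value) ^ arg; }
```

Whether the helper is inlined depends on the optimization level, so the generated assembly should still be checked with -k for the toolchain version in use.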

5 months ago

+1 George Mock 5 months ago

TI Guru, 249980 points

It is incorrect to presume applying volatile with 32-bit wide accesses means the compiler is limited to 32-bit wide memory access instructions. Instead, the compiler is required to use instructions that access each bit of data the same number of times as the source. The sequences you show above meet that constraint, even though they take two 16-bit wide instructions to do it.

I realize this is not the result you expect. But that is how volatile works.

For further background, please see this similar thread. It resulted in the issue EXT_EP-9400. The Release Notes part of that issue, unfortunately, is truncated. Here is the full passage:

C2000 does not have 32-bit bitwise (AND, OR, XOR) instructions, so 32-bit bitwise operations must be done with two 16-bit bitwise instructions. This means that it is not possible to truly handle a 32-bit bitwise operation in an atomic manner, so it's not possible to satisfy the atomic requirement for bitwise operations on 32-bit volatile variables. In this case, the compiler is supposed to follow the rule that for a volatile read, the compiler will read every bit in the variable exactly once, and for every volatile write, the compiler will write every bit in the variable exactly once. This means there are only two valid instruction sequences: either two bitwise operations directly to memory, or a 32-bit load, followed by bitwise manipulation, followed by a 32-bit write. When generating the two 16-bit bitwise instructions, the compiler checked to see if any of the 16-bit bitwise instructions was a tautology, such as binary OR with an all-zero constant, in which case it dropped the tautological instruction. This is not legal if the instruction operated directly on volatile memory. The bug in this case is that the compiler did it anyway.
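To illustrate why the split matters for atomicity even though every bit is read exactly once, here is a small host-side simulation in plain C. It is only a sketch: the two-word array, the fake_isr function, and the function names are all hypothetical stand-ins, and the low-word-first order simply mirrors the AL-then-AH order in the listings above.

```c
#include <stdint.h>

/* Simulated 32-bit variable held as two 16-bit words, low word first. */
static uint16_t mem[2];

static void store32(uint32_t v)
{
    mem[0] = (uint16_t)(v & 0xFFFFu);
    mem[1] = (uint16_t)(v >> 16);
}

/* Hypothetical stand-in for an ISR that increments the 32-bit counter. */
static void fake_isr(void)
{
    uint32_t v = ((uint32_t)mem[1] << 16) | mem[0];
    store32(v + 1u);
}

/*
 * Reads the value as two 16-bit halves. Each bit is read exactly once,
 * but if interrupt_between is nonzero the fake ISR runs between the
 * two halves, as a real interrupt could between two 16-bit instructions.
 */
uint32_t split_read(int interrupt_between)
{
    uint16_t lo = mem[0];
    if (interrupt_between) {
        fake_isr();
    }
    uint16_t hi = mem[1];
    return ((uint32_t)hi << 16) | lo;
}

/* Counter is 0x0000FFFF before the ISR and 0x00010000 after it. */
uint32_t demo_torn_read(void)  { store32(0x0000FFFFu); return split_read(1); }
uint32_t demo_clean_read(void) { store32(0x0000FFFFu); return split_read(0); }
```

In this simulation demo_torn_read returns 0x0001FFFF, a value the counter never held: the low half comes from before the increment and the high half from after it.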

Thanks and regards,

-George

0 Adam Strelsky 5 months ago in reply to George Mock

Prodigy 10 points

Hi, thank you for getting back to me! There were two reasons why I thought the TI compiler would not be allowed to generate two 16-bit reads from a single 32-bit volatile read. The first is what the compiler manual says about volatile accesses (spru514y.pdf, J.3.10 Qualifiers), which I read as meaning that making a 32-bit read volatile is sufficient to ensure that the compiler uses a 32-bit access:

The TI compiler does not shrink or grow volatile accesses. It is the user's responsibility to make sure the access size is appropriate for devices that only tolerate accesses of certain widths.

But I see that this does not conflict with your explanation, so it was a misunderstanding on my part. The second reason is that the C2000Ware drivers and code examples for the C2000 architecture seem to share this misunderstanding. In C2000Ware_5_04_00_00, in the file driverlib\f2838x\examples\c28x\interrupt\interrupt_ex1_external.c, there are 32-bit volatile variables that count the number of executed interrupt service routines. The main function reads these counters without turning interrupts off first, which means an interrupt can occur after the first 16-bit read is executed but before the second one is. (The compiler does not actually split the 32-bit read in this case, but it would be allowed to.)
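One way to make such a counter read safe regardless of how the compiler splits it is to take the snapshot with interrupts disabled. The sketch below assumes the __disable_interrupts and __restore_interrupts intrinsics described in the C2000 compiler guide and the __TMS320C2000__ predefined macro. The names isr_count, irq_save_disable, irq_restore, and read_isr_count are hypothetical and do not come from the C2000Ware example. The host branch is a stub used only for building and testing off-target.

```c
#if defined(__TMS320C2000__)
/* Assumed TI intrinsics: save the interrupt state and disable, then restore. */
static unsigned int irq_save_disable(void)      { return __disable_interrupts(); }
static void irq_restore(unsigned int state)     { __restore_interrupts(state); }
#else
/* Host stub: models the global interrupt enable as a simple flag. */
static int host_irq_enabled = 1;
static unsigned int irq_save_disable(void)
{
    unsigned int prev = (unsigned int)host_irq_enabled;
    host_irq_enabled = 0;
    return prev;
}
static void irq_restore(unsigned int state)     { host_irq_enabled = (int)state; }
#endif

/* Hypothetical counter incremented from an interrupt service routine. */
volatile unsigned long isr_count;

/*
 * Returns a consistent snapshot of the counter. Because interrupts are
 * disabled around the read, it does not matter whether the compiler
 * emits one 32-bit load or two 16-bit accesses.
 */
unsigned long read_isr_count(void)
{
    unsigned int state = irq_save_disable();
    unsigned long snapshot = isr_count;
    irq_restore(state);
    return snapshot;
}
```

This only addresses the shared-variable case. It does not help with peripherals that require a true 32-bit bus access, since the access may still be split.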

Another example from C2000Ware_5_04_00_00 is the driver code for the MCAN peripheral, specifically how the register MCAN_TXBRP is accessed (file driverlib\f2838x\driverlib\mcan.c, line 1399), although the same applies to other registers too. This register specifically requires a 32-bit access in order to return correct data, but the driver code seems to rely only on the access being volatile to ensure this.

The takeaway, then: when targeting C2000, volatile alone is not enough for 32-bit data shared between asynchronous tasks, and accesses to peripheral devices that require 32-bit reads actually need to use __byte_peripheral_32 or __attribute__((byte_peripheral)).

Thank you again for the explanation! The exact meaning of volatile is always a bit different between embedded architectures, so I was doing my best to understand this from available resources.
