Compiler: bool and trap representations?

Tool/software: TI C/C++ Compiler

How does the TI compiler handle _Bool variables that happen to be neither 0 nor 1?

(Take your pick for how this happened: they weren't initialized, type punning or other side-effects, cosmic radiation, what-have-you.)

I'm generally interested in the answer for any of TI's compiler targets, but I'm most specifically interested in TI's ARM compiler.

The ARM Optimizing C/C++ Compiler v18.1.0.LTS User's Guide (spnu151r.pdf) doesn't have much to say about _Bool. It shows up in exactly 3 places, one of which is an entry in Table 5-1 that gives it a "Maximum Range" of 255.  I suspect that this is a bit misleading.

The only reference to "trap representations" in the user's guide is a statement in entry J.3.5 inside section 5.1.1 that states "Integer types are represented as two's complements, and there are no trap representations."  I assume this is only meant to be referring to _signed_ integer types, correct?

Does the TI compiler consider reads of _Bool variables that are neither 0 nor 1 to be undefined behavior?

--thx

  • Yes, it is undefined behavior, because the standard provides no definition for what should happen in that situation. In practice, most operations on _Bool treat it as a builtin type and compare it to zero when testing it, so it would probably work as though all non-zero values are true, which is what one expects. However, I can't guarantee that this always holds true. There may very well be optimizations that assume that a true _Bool is exactly 1. You're right that the user's guide should not list a maximum range of 255 for _Bool. And yes, the two's complement statement applies only to signed integers; unsigned numbers are plain binary values.
  • Ok, fair enough, thanks.

    Since the User's Guide does go to the trouble to specify that (signed) integers do not have trap representations, it would seem prudent for the UG to also call out that _Bool variables do have trap representations, as well as correcting the _Bool entry in Table 5-1.

    It's a bit of a shame; the introduction of _Bool was a nice improvement to design expressiveness in C, and we've used it in a number of places, but having yet another way to fall into UB is no one's friend. I guess we'll go back to typedefing BOOL as uint8_t (for example).

    For what it's worth, we have observed an occasion where the TI ARM compiler trusts that a _Bool variable can only be 0 or 1, generating code that created surprising results when the _Bool happened to be 0x81.

    --thx
  • From the compiler's point of view, declaring a variable with _Bool type is a promise that it does in fact contain either 0 or 1. If that's not true, such as the case you cite where the value was 0x81, it should not be declared _Bool. It's still OK to use _Bool in other contexts. When writing to a _Bool, the compiler will take steps to ensure the value is 0 or 1.
  • We declared our variable as a _Bool correctly (well, actually, we declared it as a "bool" after #including stdbool.h), in the sense that our intent was that it would only ever be 0 or 1, and declaring it as bool expressed that intent.

    We had also thought that we liked the feature where assignments to a bool (including providing a non-bool integer to a bool function argument) would automatically convert non-zero integers to 1.  We just hadn't looked deeply into the question of whether or not similar conversions happen when reading from a bool.  It's not surprising that they don't; it's just that we hadn't really appreciated that by using the C99 standard _Bool / bool type we've been exposing ourselves to a new category of vulnerability (trap representations) that didn't otherwise exist.

    The situation where its value managed to be 0x81 was a bug.  (Presumably our bug, but not necessarily; we haven't tracked it down yet.)  It will need to be fixed, regardless of what type of variable we use.

    But there's more to making programs safe than finding all of the bugs and fixing them.  It's very hard to reason about undefined behavior.  Take the following function:

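    #ifdef AVOID_UNDEFINED_BEHAVIOR
    typedef unsigned char BOOL;  /* every bit pattern is a valid value */
    #else
    typedef _Bool BOOL;          /* only 0 and 1 are valid values */
    #endif

    /* Handlers defined elsewhere in the program. */
    void handleError(void);
    void handleSuccess(void);

    void checkStatus(BOOL errorDetected)
    {
        if (errorDetected)
            handleError();
        else
            handleSuccess();
    }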
    

    It's easy to imagine a developer or a peer reviewer assuming that, whichever way BOOL is defined, every invocation of checkStatus() calls either handleError() or handleSuccess() (but not both!), regardless of what kind of bugs might have occurred earlier in the program's execution.

    That's naive, of course.  There are over a hundred kinds of undefined behavior in C, not to mention the possibility of corrupted stacks or other egregious "external" problems.  If neither handleError() nor handleSuccess() gets called, or if both get called, then, well, that's hardly the worst thing that can happen when undefined behavior occurs.

    But defining BOOL as _Bool adds a fundamental vulnerability that checkStatus() doesn't have when BOOL is defined as unsigned char.

    (For that matter, using unsigned char would allow us to add an assertion check to confirm that errorDetected was always either 0 or 1.)

    (I don't mean to imply that we detected the TI ARM compiler compiling a function like the above in a surprising-to-us fashion.  Our actual surprise had to do with bit-fields.  I'll simplify it a bit and add it to this thread, just as an FYI if anyone is curious; I accept that the compiler was conforming to the applicable C standards.)

    --thx

  • Yes, it is unfortunate that there are so many areas of undefined behavior, but they generally arise from the C standard not wanting to over-specify implementation, often so that assumptions can be made for optimization. One might argue that this is not the safest thing, but the language is what it is...
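
For readers who land on this thread, here is a minimal sketch of the "store the flag as a plain byte and check it" idea discussed above. It is only an illustration under stated assumptions: the names flag_byte, status_kind, flag_is_canonical, classify_status, flag_to_bool and assert_flag_canonical are hypothetical and do not come from the thread or from any TI header, and reporting a stray value such as 0x81 as "corrupt" is one possible policy, not something the compiler or the standard prescribes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this comes from TI headers. */
typedef uint8_t flag_byte;  /* plain byte: all 256 bit patterns are valid values */

typedef enum { STATUS_OK, STATUS_ERROR, STATUS_CORRUPT } status_kind;

/* True only for the two canonical encodings, 0 and 1. */
static bool flag_is_canonical(flag_byte raw)
{
    return raw == 0u || raw == 1u;
}

/* Debug-build check of the kind suggested in the thread. */
static void assert_flag_canonical(flag_byte raw)
{
    assert(flag_is_canonical(raw));
}

/* Classify a stored flag without ever reading it as a _Bool object.
 * A value such as 0x81 is reported as corrupt instead of being
 * silently treated as true or false. */
static status_kind classify_status(flag_byte raw)
{
    if (!flag_is_canonical(raw))
        return STATUS_CORRUPT;
    return raw ? STATUS_ERROR : STATUS_OK;
}

/* Converting an integer value to bool is well defined (C99 6.3.1.2):
 * any nonzero value becomes 1. This is the write-side conversion
 * mentioned in the thread, applied to a byte that was read safely. */
static bool flag_to_bool(flag_byte raw)
{
    return (bool)raw;
}
```

With this shape, classify_status(0x81) yields STATUS_CORRUPT, while flag_to_bool(0x81) yields true (stored as exactly 1). Which of the two behaviors is appropriate depends on whether the application would rather detect a corrupted flag or tolerate it; the point is only that both are well defined, because the stored object is an unsigned byte rather than a _Bool.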