Compiler/TMS320F28379D: Runtime issues using lvgl graphics library with C2000 compiler.

Kyle Dunn1

Part Number: TMS320F28379D
Other Parts Discussed in Thread: LAUNCHXL-F28379D

Tool/software: TI C/C++ Compiler

Hello,

I'm trying to use the LVGL graphics library on a Delfino LAUNCHXL-F28379D (TMS320F28379D). I've spent some time testing various functions of the library on the board and noticed some unexpected results (compared to known-working output). I also saw the note in the compiler manual about "keeping expressions simple", though it's not entirely clear how this is always "possible". I'm looking for help rewriting some macros / functions so they are compatible with the hardware/toolchain.

For example:

#define LV_MATH_UDIV255(x) ((uint32_t)((uint32_t) (x) * 0x8081) >> 0x17)

This results in incorrect results at runtime.

Similarly, the following fails for almost every case:

long long _lv_pow(long long base, int8_t exp)
{
    long long result = 1;
    while(exp) {
        if(exp & 1)
            result *= base;
        exp >>= 1;
        base *= base;
    }

    return result;
}

I noticed there are hardware intrinsics in the compiler manual that are sure to come in handy. Many of them appear to specify a "return type", though that doesn't seem to be how they behave at runtime. For example, I'd expect the following to be equivalent to the udiv255 macro shown above, but it results in a sign flip and wrong results.

#define LV_MATH_UDIV255(x) (uint32_t)( (uint32_t)__mpy((uint32_t)x, (uint32_t)0x8081) >> (uint32_t)23)

I apologize if much of this is 101 embedded stuff - I'm still learning embedded and (relearning) C. Thank you in advance for any help.

Cheers,

Kyle

over 5 years ago

0 George Mock over 5 years ago

TI Guru 252400 points

For one source file that uses this macro ...

Kyle Dunn1 said:

#define LV_MATH_UDIV255(x) ((uint32_t)((uint32_t) (x) * 0x8081) >> 0x17)

... please follow the directions in the article How to Submit a Compiler Test Case. In addition, please describe the value input to the macro, the result you expect for that value, and the result you get instead.

Thanks and regards,

-George

0 Kyle Dunn1 over 5 years ago in reply to George Mock

Prodigy 50 points

Please find the attached test case file, which includes the test input as part of the logic.

The expected (input,output), when the same code is run on a x86 machine is:

Testing udiv255
1,0
2,0
4,0
8,0
16,0
32,0
64,0
128,0
256,1
512,2
1024,4
2048,8
4096,16
8192,32
16384,64
32768,128
65536,257
131072,514
262144,1028
524288,2056
1048576,4112
2097152,8224
4194304,16448
8388608,32897
16777216,65794
33554432,131588
67108864,263176
134217728,526352
268435456,1052704
536870912,2105408

The (input,output) results I see when running the attached test case on the TI board are:

Testing udiv255
1,0
2,0
4,0
8,0
16,0
32,0
64,0
128,0
256,1
512,2
1024,4
2048,8
4096,16
8192,32
16384,64
32768,128
65536,257
131072,2
262144,4
524288,8
1048576,16
2097152,32
4194304,64
8388608,129
16777216,258
33554432,4
67108864,8
134217728,16
268435456,32
536870912,64

The compile flags I used:

-v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0
--vcu_support=vcu2 -Ooff --define=_FLASH --define=DEBUG --define=CPU1 
--c99 --diag_suppress=10063 --diag_warning=225 --diag_wrap=off 
--display_error_number --abi=eabi --aliased_variables

I'm having issues uploading the file using the editor (it errors when I hit reply) so I've hosted it using IPFS here: https://ipfs.io/ipfs/QmUwYtbtPEfX9pLHh5StdYNBqREmE35TVDLxiMszQaDzuG

0 Chester Gillon over 5 years ago in reply to Kyle Dunn1

Guru 92251 points

The result of the LV_MATH_UDIV255 macro first differs from x86 machine results when base is 131072.

The LV_MATH_UDIV255 calculation is (base * 0x8081) >> 0x17. For this calculation to produce the same results as on the x86 machine, the result of the multiply needs to be held in 64 bits.

The original LV_MATH_UDIV255 macro was:

#define LV_MATH_UDIV255(x) ((uint32_t)((uint32_t) (x) * 0x8081) >> 0x17)

This does a 32x32=>32 multiplication, which overflows the 32-bit result for a base of 131072 or higher.

Changing the cast so that the macro performs a 32x32=>64 multiplication (the compiler generates a pair of QMPYXUL and IMPYL instructions for the multiply, followed by an LSR64 for the shift):

#define LV_MATH_UDIV255(x) ((uint32_t)(((uint64_t) (x) * 0x8081) >> 0x17))

caused the program on a TMS320F28379D to produce the same result as on the x86 machine:

[C28xx_CPU1] 
Testing udiv255
1,0
2,0
4,0
8,0
16,0
32,0
64,0
128,0
256,1
512,2
1024,4
2048,8
4096,16
8192,32
16384,64
32768,128
65536,257
131072,514
262144,1028
524288,2056
1048576,4112
2097152,8224
4194304,16448
8388608,32897
16777216,65794
33554432,131588
67108864,263176
134217728,526352
268435456,1052704
536870912,2105408

I would say the TI compiler generated the expected code for the original macro.
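
As a minimal sketch of the difference between the two macros (the names UDIV255_32, UDIV255_64 and the two wrapper functions are made up for this illustration and are not part of lvgl), the following can be built with any C99 compiler that provides uint32_t and uint64_t:

```c
#include <stdint.h>

/* Original form: both operands are 32 bits wide, so the product wraps
   modulo 2^32 before the shift is applied. */
#define UDIV255_32(x) ((uint32_t)((uint32_t)(x) * 0x8081) >> 0x17)

/* Widened form: x is converted to 64 bits first, so the full product
   survives until the shift. */
#define UDIV255_64(x) ((uint32_t)(((uint64_t)(x) * 0x8081) >> 0x17))

/* Hypothetical wrappers, only here so the two forms can be called and compared. */
uint32_t udiv255_32(uint32_t x) { return UDIV255_32(x); }
uint32_t udiv255_64(uint32_t x) { return UDIV255_64(x); }
```

For an input of 131072 the first function returns 2 and the second returns 514, which are the two values that appear in the listings in this thread. Up to 65536 both forms agree.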

My test project with the modified macro is attached.

TMS320F28379D_udiv255.zip

0 Kyle Dunn1 over 5 years ago in reply to Chester Gillon

Prodigy 50 points

Thank you very much for clarifying what is happening and how to fix the issue in this instance. I worry the lvgl codebase has many other cases where compiler-specific, implicit promotions are assumed for the computations to work reliably.

Chester Gillon said:

Adding casts to the macro to perform a 32x32=>64 multiplication, with the compiler generating a pair of QMPYXUL and IMPYL instructions for the multiply followed by a LSR64 for the shift:

#define LV_MATH_UDIV255(x) ((uint32_t)(((uint64_t) (x) * 0x8081) >> 0x17))

I am reading this as "typecast the result of the (x * 0x8081) multiplication to 64 bits then shift". Is that correct?

Again, my C/C++ is a bit dated but at first glance (and absent your description), I read this as "typecast (x) to 64 bits, multiply by 0x8081, then shift". Apologies if this is obvious, but some references on this topic I've seen suggest that either 1) it's necessary to typecast both arguments or 2) it's necessary to typecast the expected result. Basically, I'm looking to understand how to (somewhat) easily identify cases in the code base where this silent loss of precision occurs and which casts to add where. Is there a compiler flag I can enable that will remark or warn about potential ambiguity or loss of precision?

Thanks again for the help!

-Kyle

0 Chester Gillon over 5 years ago in reply to Kyle Dunn1

Guru 92251 points

Kyle Dunn1 said:
Again, my C/C++ is a bit dated but at first glance (and absent your description), I read this as "typecast (x) to 64 bits, multiply by 0x8081, then shift".

Yes, it is the typecast of (x) to 64 bits which causes the multiply to be performed as 64-bits. While it is a bit dated, How to Write Multiplies Correctly in C Code has some background information.
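
To illustrate where the cast has to go, here is a small sketch (the function names are invented for this example). Casting one operand is enough, because the usual arithmetic conversions then widen the other operand as well. Casting the finished product is too late, since the multiply has already been carried out in 32 bits:

```c
#include <stdint.h>

/* Cast applied to the product: the multiply is done (and wraps) in
   32 bits, and only the already-truncated value is widened. */
uint64_t mul_cast_result(uint32_t a, uint32_t b)
{
    return (uint64_t)(a * b);
}

/* Cast applied to one operand: b is converted to 64 bits as well, so
   the multiply itself is done in 64 bits. */
uint64_t mul_cast_operand(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

With a = 131072 and b = 0x8081, the first function returns 0x01020000 and the second returns 0x101020000.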

Kyle Dunn1 said:
Is there a compiler flag I can enable that will remark or warn about potential ambiguity or loss of precision?

Not that I am aware of. It is perfectly valid to have a 32x32 multiply and only require a 32-bit result.

0 George Mock over 5 years ago in reply to Kyle Dunn1

TI Guru 252400 points

Please reconsider the first variant of the macro LV_MATH_UDIV255 which does not have a cast to uint64_t. I think this result ...

Kyle Dunn1 said:

The expected (input,output), when the same code is run on a x86 machine is:

Testing udiv255
1,0
2,0
4,0
8,0
16,0
32,0
64,0
128,0
256,1
512,2
1024,4
2048,8
4096,16
8192,32
16384,64
32768,128
65536,257
131072,514
262144,1028
524288,2056
1048576,4112
2097152,8224
4194304,16448
8388608,32897
16777216,65794
33554432,131588
67108864,263176
134217728,526352
268435456,1052704
536870912,2105408

... is incorrect. The C99 ANSI standard, in section 6.2.5, states the following ...

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

The multiply in this macro is ((uint32_t) (x) * 0x8081). Both operands are uint32_t, so the multiply is done in that type. Therefore, the largest result possible is 0xffffffff. If the multiply overflows that value, it is truncated. The bits above the 32nd one are dropped. In practice, this means a 32-bit wide multiply instruction is used. This multiply is followed by >> 0x17. So, the largest possible result of this macro is 0xffffffff >> 0x17. Written in decimal, that is 511.
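
The bound can be checked mechanically. The sketch below is only an illustration: UDIV255_BOUND and all_within_bound are names made up here, and the loop covers the same powers of two (2^0 through 2^29) that the test case uses:

```c
#include <stdint.h>

/* The macro as originally posted, with a 32-bit multiply. */
#define LV_MATH_UDIV255(x) ((uint32_t)((uint32_t)(x) * 0x8081) >> 0x17)

/* Largest value the macro can yield: a full 32-bit product shifted
   right by 23 bits. */
#define UDIV255_BOUND (UINT32_MAX >> 0x17)

/* Returns 1 if the macro stays within the bound for 2^0 .. 2^29. */
int all_within_bound(void)
{
    for (int i = 0; i < 30; i++) {
        uint32_t x = (uint32_t)1 << i;
        if (LV_MATH_UDIV255(x) > UDIV255_BOUND)
            return 0;
    }
    return 1;
}
```

UDIV255_BOUND evaluates to 511, so any listed result above that value cannot have come from a 32-bit multiply.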

To further back this up, I tried it on my laptop, with gcc version 4.9.3. (I realize that's a bit old, but that's what I happen to have right now.) I get the same result the C28x compiler does.

Thanks and regards,

-George

0 Kyle Dunn1 over 5 years ago in reply to George Mock

Prodigy 50 points

I very much appreciate all the input on this specific issue and the pointers to explaining how to better inform the compiler what the computation needs to be. Just to be clear, I'm not making any claims about the correctness of a given compiler's C/C++ implementation, rather showing the difference between doing a "thoughtless compile" of a library I need to use in a project.

After making the change suggested it actually looks like I misdiagnosed where the specific runtime issue is for the UI - as I still have rendering issues that seem to be related to computation. The problem I'm _really_ trying to solve is to get lvgl running "hello world" so I can proceed doing the higher level UI development needed for this project.

If the only way I can use this library on the TI device is for me to dig through the code base and find/fix places where macros and other functions are incorrect - according to ANSI standard - I won't be able to deliver the work on time.

I noticed that TI makes LVGL available for other (non TMS320) devices via a plugin - does anyone have experience using this plugin with the C2000 compiler? Should I expect that code base to account for these nuances in statement syntax?

0 Chester Gillon over 5 years ago in reply to Kyle Dunn1

Guru 92251 points

Kyle Dunn1 said:
The expected (input,output), when the same code is run on a x86 machine is:

What caused the creation of the test case which tested LV_MATH_UDIV255 with values of up to 536870912?

From an initial browse of https://github.com/lvgl/lvgl the LV_MATH_UDIV255 macro is used to mix individual RGB pixel colour values, and given each RGB pixel value can be a max of 8 bits, then based upon the lvgl source code I would expect the max input for LV_MATH_UDIV255 to be 255 * 255. The result of the 65025 * 0x8081 multiply performed by LV_MATH_UDIV255 would be 0x7f807e81 which doesn't overflow a 32-bit result.
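
A quick sketch of that range check, assuming the largest input really is 255 * 255 (MAX_MIX_INPUT, max_product and udiv255 are names invented for this example):

```c
#include <stdint.h>

/* The macro as originally posted, with a 32-bit multiply. */
#define LV_MATH_UDIV255(x) ((uint32_t)((uint32_t)(x) * 0x8081) >> 0x17)

/* Assumed largest input: the product of two 8-bit channel values. */
#define MAX_MIX_INPUT ((uint32_t)255 * 255)   /* 65025 */

/* Product for the largest input, computed in 64 bits so that any
   overflow past 32 bits would be visible in the return value. */
uint64_t max_product(void)
{
    return (uint64_t)MAX_MIX_INPUT * 0x8081;
}

/* Hypothetical wrapper so the macro can be called like a function. */
uint32_t udiv255(uint32_t x)
{
    return LV_MATH_UDIV255(x);
}
```

max_product() returns 0x7f807e81, which fits in 32 bits, and udiv255(65025) returns 255, so within this input range the 32-bit macro gives the intended quotient.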

0 Chester Gillon over 5 years ago in reply to Kyle Dunn1

Guru 92251 points

Kyle Dunn1 said:
I noticed that TI makes LVGL available for other (non TMS320) devices via a plugin - does anyone have experience using this plugin with the C2000 compiler? Should I expect that code base to account for these nuances in statement syntax?

I note that the https://github.com/lvgl/lvgl/blob/master/src/lv_misc/lv_color.h has the following union:

typedef union {
    struct {
        uint8_t blue;
        uint8_t green;
        uint8_t red;
        uint8_t alpha;
    } ch;
    uint32_t full;
} lv_color32_t;

Since the C2000 compiler's stdint.h doesn't define the uint8_t type, because a char on a C28xx device is 16 bits, the above union wouldn't compile unless the project takes steps to define the uint8_t type. Since the C28xx isn't byte addressable, a union that assumes four uint8_t's can be overlaid on a uint32_t wouldn't work, and I think it would require code changes.
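
One possible direction for such a change, sketched here with hypothetical helper names that are not part of lvgl, is to pack and unpack the channels with shifts and masks instead of a union overlay. This assumes the bit positions the union implies on a little-endian, byte-addressable target (blue in bits 0 to 7, alpha in bits 24 to 31), and it does not depend on the width of char:

```c
#include <stdint.h>

/* Build a packed 32-bit colour from four channel values.  Each channel
   is masked to 8 bits, so a wider storage type does no harm. */
uint32_t color32_make(uint32_t red, uint32_t green, uint32_t blue, uint32_t alpha)
{
    return ((alpha & 0xFFu) << 24) | ((red & 0xFFu) << 16) |
           ((green & 0xFFu) << 8)  |  (blue & 0xFFu);
}

/* Extract individual channels from the packed value. */
uint32_t color32_blue(uint32_t full)  { return full & 0xFFu; }
uint32_t color32_green(uint32_t full) { return (full >> 8) & 0xFFu; }
uint32_t color32_red(uint32_t full)   { return (full >> 16) & 0xFFu; }
uint32_t color32_alpha(uint32_t full) { return (full >> 24) & 0xFFu; }
```

For example, color32_make(0x11, 0x22, 0x33, 0x44) gives 0x44112233, and color32_red() of that value gives 0x11.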

Not sure if the use of the lv_color32_t structure is the cause of your observed rendering issues.

0 Kyle Dunn1 over 5 years ago in reply to Chester Gillon

Prodigy 50 points

Chester Gillon said:

What caused the creation of the test case which tested LV_MATH_UDIV255 with values of up to 536870912?

I noticed this macro being used in many places and (wrongly) thought it might be related to my issue. After making the suggested change, it doesn't have any noticeable effect.

The manifestation of the issue is being documented on the lvgl side here. The jagged edges in the most recent screenshot (as of 31 August) suggest (to me) there is a multiply precision / truncation issue somewhere in the drawing/rendering path, though this is also just a hunch.

Chester Gillon said:

Since the C2000 compiler stdint.h doesn't define the uint8_t type, due to on a C28xx device a char being is 16-bits, then the above union wouldn't compile unless the project takes steps to define the uint8_t type.

You are correct. I am using this project as my base. It defines [u]int8_t to be [u]int16_t. I wasn't explicitly aware of this until you pointed it out, though much of the lvgl core functionality seems to be intact despite the (byte == 8 bits) assumption being false for this MCU. I've since added some #ifdef logic in place of the code you linked to, making lv_color_t full a uint64_t, and I am trying to locate places where additional touchups would be required.

At this point I'm considering using some static analysis tools to see if they can identify places where signed overflow is likely. I'll then see if sprinkling casts around those places gets me closer. Unfortunately, at some point I may have to pull up stakes and change hardware, though I'd very much like to stay on this platform.

0 Lori Heustess over 5 years ago in reply to Kyle Dunn1

TI Guru 93410 points

Kyle,

Not sure if it is an option, but the F2838x family has an ARM M4 subsystem. It is basically a F2837x + ARM M4. The ARM could handle the SW where you have the 8-bit byte requirement and it would free up the C28x to do other things.

The dual core (2 C28x + 1 ARM) has been released and the single core (1 C28x + 1 ARM) is in preview status.

https://www.ti.com/product/TMS320F28384D

Best Regards

Lori
