PRU-CGT optimization bug when calling memcpy

Other Parts Discussed in Thread: OMAP-L138, PRU-CGT

Hi All!

I have attached a minimal reproduction of the bug as a Code Composer Studio project.

The environment is as follows:

  • OMAP-L138
  • pru-cgt v2.3.3
  • optimization setting: -O0 or higher

The source code is the following:

#include <stdint.h>
#include <string.h>   //declares memcpy

volatile uint32_t dumpvariable;
void GetTrashIntoRegister16(uint32_t r14, uint32_t r15, uint32_t r16)
{
    //make sure registers get loaded and used
    dumpvariable = r14;
    dumpvariable = r15;
    dumpvariable = r16;
}

void foo(void* from, void *to, uint16_t size)
{
    //Register r16 has the value 0xFFFF000A
    //Register r16 is the third argument: size
    //So memcpy is getting called with argument size == 0xFFFF000A, which will lead to an exception whenever a memory boundary is breached
    memcpy(to, from, size);

    //We will never reach here
}

uint32_t pfrom[10];
uint32_t pto[10];
volatile uint16_t size = 10;

int main(void)
{

    //Make sure that register r16 is filled with 0xFFFFFFFF
    GetTrashIntoRegister16(UINT32_MAX, UINT32_MAX, UINT32_MAX);

    foo(&pfrom, &pto, size);
    while(1)
    {

    }
}

The bug in short is this:

Function foo has three arguments, which are passed on to memcpy. The third argument is in r16. Although foo's third argument is a uint16_t and memcpy's third argument is a 32-bit size_t, r16 is passed through without clearing its upper word. As a result, the stale upper 16 bits of the register become part of the size that memcpy sees. In this example the copy size is 0xFFFF000A instead of 0xA. The copy then runs past a memory boundary, which raises an exception and halts execution.
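The arithmetic can be checked on any host with a small model. This is only a sketch of the register contents described above, not PRU code: the helper names are made up for illustration, and the assumption that a uint16_t store touches only the low halfword of the register is taken from the observed value 0xFFFF000A.

```c
#include <stdint.h>

/* Hypothetical model: a uint16_t argument is written into the low halfword
 * of a 32-bit register, so the stale upper halfword survives. */
static uint32_t reg_after_halfword_store(uint32_t old_reg, uint16_t arg)
{
    return (old_reg & 0xFFFF0000u) | (uint32_t)arg;
}

/* What the observed code does: the register is handed to memcpy unchanged. */
static uint32_t size_seen_without_clearing(uint32_t reg)
{
    return reg;
}

/* What a correct uint16_t to 32-bit conversion does: zero-extend. */
static uint32_t size_seen_with_zero_extension(uint32_t reg)
{
    return reg & 0x0000FFFFu;
}
```

With r16 previously holding 0xFFFFFFFF and size == 10, the first path yields 0xFFFF000A and the second yields 0xA, which matches the values seen in the debugger.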

The bug explained:

First we make sure that a known value is written into register r16 by calling a function that does this.

Then we observe that r16 holds the value 0xFFFF000A right before memcpy is called.

The disassembly shows no sign that r16 is modified in foo, the function from which memcpy is called.

memcpy is then called with a completely wrong value in r16.

Workaround:

If optimization is set to -Ooff, we can observe that r16 is clipped before memcpy is called, after which it holds the correct value of 0xA.

Further investigation shows the following:

If the function that calls memcpy passes its fourth argument on as memcpy's third argument, then the clipping from uint32_t to uint16_t is performed with the following assembly instruction:

AND R16.w0, R17.w0, R17.w0

This can be observed in Compiler Explorer.

It should also be noted that GCC for PRU does not have this defect.

Furthermore, if I call a C implementation of memcpy instead, the r16 clipping is performed.
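For reference, a C implementation of this kind could look like the sketch below. The names copy_bytes and foo_c_copy are made up for this example and are not part of the attached project. Whether pru-cgt emits the clipping for this exact code has to be checked in the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain C stand-in for memcpy: copies n bytes one at a time. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    while (n-- > 0) {
        *d++ = *s++;
    }
    return dst;
}

/* Same shape as foo above, but calling the C copy routine. The uint16_t
 * argument is converted to size_t at the call, so in C semantics the
 * callee must see a value no larger than 0xFFFF. */
static void foo_c_copy(void *from, void *to, uint16_t size)
{
    copy_bytes(to, from, size);
}
```

Because the callee is ordinary C here, the compiler has to perform the uint16_t to size_t conversion itself at the call site.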

As a footnote, it would be great if TI integrated the CGT compilers into Compiler Explorer properly. My own rudimentary on-premises integration is not really clean.

Posting compiler bugs would also be easier, as people could just send links to godbolt.org.

I hope you can reproduce this and maybe propose a solution or, even better, provide a fix.

If a compiler fix cannot be provided in the near future, it would be great if TI could analyze whether all the library functions written in assembler are affected.

Thanks!

Cheers!

Mate Rigo

hello-memcpy.zip