AM2434: GCC memcpy stalls when trying to write to the PRU

Alvaro Arizaga

Part Number: AM2434

Hello,

I have an EtherCAT example that works fine with the TI Clang compiler. With GCC, however, the EtherCAT demo stalls on an unaligned access to the PRU. I have set the compiler flag "-munaligned-access", but it does not help: unaligned memcpy works fine in RAM, but it can apparently cause problems when accessing peripheral memory.

Interestingly, odd lengths such as 45 work fine, but with lengths such as 47, memcpy stalls. As a workaround, I modified the memcpy wrapper function from the EtherCAT demo so that it copies the last bytes one at a time.

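void * tiesc_memcpy(uint8_t *dst, const uint8_t *src, uint32_t size_bytes)
{
#if 0
    /* Original implementation, disabled */
    memcpy(dst, src, size_bytes);
    return dst;
#endif

    /* Workaround: treat lengths with size_bytes % 4 == 3 (e.g. 47) specially */
    if(size_bytes % 4 != 3)
    {
        memcpy(dst, src, size_bytes);
    }
    else
    {
        /* memcpy everything except the last 3 bytes, then copy those bytewise */
        memcpy(dst, src, size_bytes - 3);
        for(uint32_t i = size_bytes - 3; i < size_bytes; i++)
        {
            dst[i] = src[i];
        }
    }
    return dst;
}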

The GCC version I'm using is GCC-10.3-2021.10. Do you have any better idea than using this workaround?

Thanks in advance,

Álvaro

1 month ago

0 Aaron Thomas 29 days ago

TI__Expert 4885 points

Hi Alvaro,

Please expect a delay in response. Will get back to your query by end of this week. Meanwhile you can go forward with your workaround.

Regards,
Aaron

0 Alvaro Arizaga 29 days ago in reply to Aaron Thomas

Prodigy 40 points

Hi Aaron,

Thanks for the reply. I'm in no rush.

Best,

Álvaro

0 Aaron Thomas 29 days ago in reply to Alvaro Arizaga

TI__Expert 4885 points

Thank you for your patience Alvaro.

Regards,

Aaron

0 Alvaro Arizaga 23 days ago in reply to Aaron Thomas

Prodigy 40 points

Hi Aaron,

I hope you are doing well. Do you have any update about this topic?

Thanks in advance,

Álvaro

0 Zach O'Brien 23 days ago

Prodigy 20 points

I am also seeing this issue, but slightly different, so I'll add my problem here.

I'm using the `ti-cgt-armllvm_3.2.2.LTS` compiler. The unaligned memory access issue occurs with optimizations turned on (-O2 and -Os) in tiescbsp.c at the line

tiesc_memset(pRegPermUpdateAddr, TIESC_PERM_READ_ONLY, SYNC_PERMISSION_UPDATE_PDI_SIZE);
// equivalent to
tiesc_memset(0x48000522, 0x02, 10);

It seems the memset is being optimized to

// r0 = 0x48000000
// r1 = 0x0202
// r2 = 0x02020202
strh.w  r1, [r0, #0x52a]
str.w   r2, [r0, #0x526]
str.w   r2, [r0, #0x522]

This causes an unaligned access fault, as it's trying to do a 32-bit store (str.w) at an address that is only 16-bit aligned. From what I could tell stepping through the assembly of the builtin memset with -O0, it ends up doing these stores like so

// r0 = 0x48000000
// r1 = 0x02020202
strh.w  r1, [r0, #0x522]
str.w   r1, [r0, #0x524]
str.w   r1, [r0, #0x528]

These stores are correctly aligned, but -O0 produces highly unoptimized code. I can fix this by finding all the places where memset ends up doing unaligned stores and manually replacing them, but I would much prefer a better workaround.
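To make the difference between the two store sequences concrete, here is a small host-side sketch that checks each store for natural alignment. The helper names (`is_naturally_aligned`, `count_misaligned`, `store_t`) are made up for illustration; the offsets and widths are transcribed from the two listings above, with `strh.w` taken as a 2-byte store and `str.w` as a 4-byte store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes (a power of two) at `addr` is naturally aligned. */
static bool is_naturally_aligned(uint32_t addr, uint32_t size)
{
    assert(size != 0u && (size & (size - 1u)) == 0u);
    return (addr & (size - 1u)) == 0u;
}

/* One store: offset from the base register and width in bytes. */
typedef struct {
    uint32_t offset;
    uint32_t size;
} store_t;

/* Count the stores in `seq` that are not naturally aligned. */
static unsigned count_misaligned(uint32_t base, const store_t *seq, unsigned n)
{
    unsigned bad = 0;
    for (unsigned i = 0; i < n; i++) {
        if (!is_naturally_aligned(base + seq[i].offset, seq[i].size)) {
            bad++;
        }
    }
    return bad;
}

/* Sequence emitted when memset is expanded inline at -O2/-Os. */
static const store_t inlined_seq[] = { {0x52a, 2}, {0x526, 4}, {0x522, 4} };

/* Sequence observed when stepping through the memset call at -O0. */
static const store_t libc_seq[] = { {0x522, 2}, {0x524, 4}, {0x528, 4} };
```

With base 0x48000000, both 32-bit stores of the inlined sequence (at 0x526 and 0x522) land on addresses that are only 2-byte aligned, while every store of the second sequence is naturally aligned.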

I'm left wondering why the compile time optimized memset does unaligned stores when the builtin memset does not. This does seem to be regular clang / llvm behavior though, as this simplified example shows.

0 PratheeshGangadhar 22 days ago

TI__Mastermind 44561 points

Alvaro Arizaga said:
I have the compiler flag "-munaligned-access"

In our AM335x implementation, which is GCC based, we use -mno-unaligned-access. -munaligned-access enables unaligned accesses!

-munaligned-access
-mno-unaligned-access

Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled, then words in packed data structures are accessed a byte at a time.

This ultimately depends on the libc memcpy implementation, which is not something TI can control. Anyway, check this and let us know.
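If the libc memcpy cannot be relied on here, one option is a copy routine that never issues an unaligned access. Below is a minimal sketch, assuming that byte accesses and aligned 32-bit accesses are both acceptable for the target memory (the thread only shows that byte-wise copies work). `aligned_memcpy_sketch` is a hypothetical name, not an SDK function, and the volatile accesses are there to discourage the compiler from merging them or turning the loops back into a memcpy call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical replacement for the body of tiesc_memcpy (a sketch, not TI code).
 * Uses 32-bit accesses only where both pointers are 4-byte aligned and falls
 * back to byte accesses elsewhere, so no access is ever unaligned.
 */
static void *aligned_memcpy_sketch(uint8_t *dst, const uint8_t *src, uint32_t size_bytes)
{
    assert(dst != NULL && src != NULL);
    uint32_t i = 0u;

    /* Word copies are only possible if dst and src share the same misalignment. */
    if ((((uintptr_t)dst ^ (uintptr_t)src) & 3u) == 0u) {
        /* Head: bytes until both pointers reach a 4-byte boundary. */
        while (i < size_bytes && ((uintptr_t)(dst + i) & 3u) != 0u) {
            ((volatile uint8_t *)dst)[i] = ((const volatile uint8_t *)src)[i];
            i++;
        }
        /* Body: aligned 32-bit loads and stores. */
        while (size_bytes - i >= 4u) {
            *(volatile uint32_t *)(dst + i) = *(const volatile uint32_t *)(src + i);
            i += 4u;
        }
    }

    /* Tail, or the whole buffer if the pointers are mutually misaligned. */
    while (i < size_bytes) {
        ((volatile uint8_t *)dst)[i] = ((const volatile uint8_t *)src)[i];
        i++;
    }
    return dst;
}
```

The trade-off is speed: when source and destination are misaligned relative to each other, the whole copy degrades to a byte loop.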

0 PratheeshGangadhar 22 days ago in reply to Zach O'Brien

TI__Mastermind 44561 points

Zach O'Brien said:
I'm using the `ti-cgt-armllvm_3.2.2.LTS` compiler. The unaligned memory access issue occurs

With -mno-unaligned-access, I see the output below for this simplified example. (BTW, this is a good tool.)

main:
        push    {r7, lr}
        ldr     r0, .LCPI0_0
        movs    r1, #2
        movs    r2, #10
.LPC0_0:
        add     r0, pc
        adds    r0, #2
        bl      memset
        movs    r0, #0
        pop     {r7, pc}
.LCPI0_0:
        .long   buffer-(.LPC0_0+4)

buffer:
        .zero   16

PS: tiarmclang is a TI supported toolchain, so better to create a new post for issues related to that.

0 Zach O'Brien 22 days ago in reply to PratheeshGangadhar

Prodigy 20 points

It looks like adding __attribute__((noinline)) to tiesc_memset ends up producing the same result. It forces a call to the libc memset, and for clang the libc memset only does aligned memory accesses.

/* noinline keeps the compiler from expanding memset into inline (possibly unaligned) stores at the call site */
__attribute__((noinline)) void * tiesc_memset(uint8_t *dst, int8_t val, uint32_t size_bytes)
{
    memset(dst, val, size_bytes);
    return dst;
}

I like this as a solution as it allows the rest of the program to make use of the hardware's capabilities and it's portable between clang and gcc*.

*I didn't test gcc, it's possible the libc memset function tries to perform unaligned memory writes when allowed.
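If depending on the libc memset's behavior is a concern, an alternative is a fill routine that only ever issues naturally aligned stores. This is a rough sketch, not SDK code: `aligned_memset_sketch` is a hypothetical name, and it assumes that single-byte stores and aligned 32-bit stores are both valid for the target memory. The volatile accesses are meant to keep the compiler from merging the stores or replacing the loops with a memset call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fill `size_bytes` bytes at `dst` with `val`
 * using only naturally aligned stores.
 */
static void *aligned_memset_sketch(void *dst, uint8_t val, uint32_t size_bytes)
{
    assert(dst != NULL);
    uintptr_t addr = (uintptr_t)dst;
    const uint32_t word = 0x01010101u * val; /* val replicated into all 4 bytes */

    /* Head: single bytes until the address is 4-byte aligned. */
    while (size_bytes > 0u && (addr & 3u) != 0u) {
        *(volatile uint8_t *)addr = val;
        addr++;
        size_bytes--;
    }
    /* Body: aligned 32-bit stores. */
    while (size_bytes >= 4u) {
        *(volatile uint32_t *)addr = word;
        addr += 4u;
        size_bytes -= 4u;
    }
    /* Tail: remaining 0 to 3 bytes. */
    while (size_bytes > 0u) {
        *(volatile uint8_t *)addr = val;
        addr++;
        size_bytes--;
    }
    return dst;
}
```

For the case discussed earlier in the thread (10 bytes starting at an address ending in 0x522), this would issue two byte stores followed by two aligned word stores.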

0 Alvaro Arizaga 22 days ago in reply to PratheeshGangadhar

Prodigy 40 points

Hi Pratheesh,

I already tried -mno-unaligned-access, but I have the same problem.

These are my compile flags:

set(COMPILER_FLAGS
    -mcpu=cortex-r5
    -mthumb
    -mlittle-endian
    -mfloat-abi=hard
    -mfpu=vfpv3-d16
    -fmessage-length=0
    -fsigned-char
    -ffunction-sections
    -fdata-sections
    -mno-unaligned-access
    -Wno-unused-function
    -Wno-address-of-packed-member)

Best regards,

Álvaro

0 PratheeshGangadhar 21 days ago in reply to Alvaro Arizaga

TI__Mastermind 44561 points

Alvaro

As Zach mentioned above, this could be a problem with the libc memset/memcpy used with GCC, which may try to perform unaligned memory writes when allowed. I do not think we can help here, so you can probably stick with the workaround you implemented.

Regards

Pratheesh
