AM2434: GCC memcpy stalls when trying to write to the PRU

Part Number: AM2434

Hello,

I have an EtherCAT example that works fine with the TI Clang compiler. With the GCC compiler, however, the EtherCAT demo stalls on an unaligned access to the PRU. I have the compiler flag "-munaligned-access" set, but it does not help: an unaligned memcpy works fine in RAM, but it can cause issues when accessing peripheral memory, as seems to be happening here.

Interestingly, odd lengths such as 45 work fine, but with lengths such as 47, memcpy stalls. As a workaround, I modified the memcpy wrapper function from the EtherCAT demo so that, when the length leaves a remainder of 3 modulo 4, the last three bytes are copied one byte at a time.

void * tiesc_memcpy(uint8_t *dst, const uint8_t *src, uint32_t size_bytes)
{
#if 0
    memcpy(dst, src, size_bytes);
    return dst;
#endif

    if(size_bytes % 4 != 3)
    {
        memcpy(dst, src, size_bytes);
    }
    else
    {
        memcpy(dst, src, size_bytes - 3);
        for(uint32_t i = size_bytes - 3; i < size_bytes; i++)
        {
            dst[i] = src[i];
        }
    }
    return dst;
}
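
One possible alternative to special-casing the remainder is a copy routine that only ever issues naturally aligned stores to the destination. The following is a minimal sketch, not code from the TI SDK: the name pru_copy_aligned is hypothetical, it assumes a little-endian target, and it has not been verified against PRU memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the TI SDK): copies size_bytes from src
 * to dst using only naturally aligned stores on the destination side. */
void *pru_copy_aligned(uint8_t *dst, const uint8_t *src, uint32_t size_bytes)
{
    assert(dst != NULL && src != NULL);

    volatile uint8_t *d = dst;
    uint32_t i = 0;

    /* Head: single bytes until the destination address is 4-byte aligned. */
    while (i < size_bytes && (((uintptr_t)(d + i)) & 3u) != 0u) {
        d[i] = src[i];
        i++;
    }

    /* Body: one aligned 32-bit store per iteration. The word is assembled
     * from bytes, so the source may have any alignment.
     * Assumption: little-endian byte order. */
    while (size_bytes - i >= 4u) {
        uint32_t w = (uint32_t)src[i]
                   | ((uint32_t)src[i + 1] << 8)
                   | ((uint32_t)src[i + 2] << 16)
                   | ((uint32_t)src[i + 3] << 24);
        *(volatile uint32_t *)(d + i) = w;
        i += 4u;
    }

    /* Tail: remaining 0 to 3 bytes, one byte at a time. */
    while (i < size_bytes) {
        d[i] = src[i];
        i++;
    }
    return dst;
}
```

The volatile qualifier is there so the compiler keeps the stores at the widths written rather than merging them into wider, possibly unaligned accesses.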

The GCC version I'm using is GCC-10.3-2021.10. Do you have any better idea than using this workaround?

Thanks in advance,

Álvaro

  • Hi Alvaro,

    Please expect a delay in our response. We will get back to you on your query by the end of this week. Meanwhile, you can go ahead with your workaround.

    Regards,
    Aaron

  • Hi Aaron,

    Thanks for the reply. I'm in no rush.

    Best,

    Álvaro

  • Thank you for your patience Alvaro.

    Regards,

    Aaron

  • Hi Aaron,

    I hope you are doing well. Do you have any update about this topic?

    Thanks in advance,

    Álvaro

  • I am also seeing this issue, but slightly different, so I'll add my problem here.

    I'm using the `ti-cgt-armllvm_3.2.2.LTS` compiler. The unaligned memory access issue occurs with optimizations turned on (-O2 and -Os) in tiescbsp.c at the line

    tiesc_memset(pRegPermUpdateAddr, TIESC_PERM_READ_ONLY, SYNC_PERMISSION_UPDATE_PDI_SIZE);
    // equivalent to
    tiesc_memset(0x48000522, 0x02, 10);

    It seems the memset is being optimized to 

    // r0 = 0x48000000
    // r1 = 0x0202
    // r2 = 0x02020202
    strh.w  r1, [r0, #0x52a]
    str.w   r2, [r0, #0x526]
    str.w   r2, [r0, #0x522]

    This causes an unaligned access fault, because it performs a 32-bit store (str.w) at an address that is only 16-bit aligned. From what I could tell stepping through the assembly of the builtin memset with -O0, it ends up doing these stores like so

    // r0 = 0x48000000
    // r1 = 0x02020202
    strh.w  r1, [r0, #0x522]
    str.w   r1, [r0, #0x524]
    str.w   r1, [r0, #0x528]

    These stores are correctly aligned, but -O0 produces highly unoptimized code. I can fix this by finding all the places where memset ends up doing unaligned stores and manually replacing them, but I would much prefer a better workaround.
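
The aligned store sequence described above (halfword at 0x522, then words at 0x524 and 0x528) can be reproduced by a fill routine that picks the widest store the current address alignment allows. This is only a sketch under assumptions: pru_fill_aligned is a hypothetical name, not a TI SDK function, and it has not been tested on the PRU-ICSS register space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills size_bytes at dst with val, always using the
 * widest store that is naturally aligned at the current address. */
void *pru_fill_aligned(uint8_t *dst, uint8_t val, uint32_t size_bytes)
{
    assert(dst != NULL);

    volatile uint8_t *p = dst;
    const uint16_t v16 = (uint16_t)(val * 0x0101u);      /* val in both bytes */
    const uint32_t v32 = (uint32_t)val * 0x01010101u;    /* val in all four bytes */

    while (size_bytes > 0u) {
        uintptr_t addr = (uintptr_t)p;

        if ((addr & 3u) == 0u && size_bytes >= 4u) {
            *(volatile uint32_t *)p = v32;   /* aligned 32-bit store */
            p += 4;
            size_bytes -= 4u;
        } else if ((addr & 1u) == 0u && size_bytes >= 2u) {
            *(volatile uint16_t *)p = v16;   /* aligned 16-bit store */
            p += 2;
            size_bytes -= 2u;
        } else {
            *p = val;                        /* single byte */
            p += 1;
            size_bytes -= 1u;
        }
    }
    return dst;
}
```

For a destination ending in 0x522 and a length of 10, this issues one 16-bit store followed by two 32-bit stores, all naturally aligned.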

    I'm left wondering why the compile time optimized memset does unaligned stores when the builtin memset does not. This does seem to be regular clang / llvm behavior though, as this simplified example shows.

  • I have the compiler flag "-munaligned-access"

    In our AM335x implementation, which is GCC-based, we use -mno-unaligned-access. Note that -munaligned-access enables unaligned accesses!

    -munaligned-access
    -mno-unaligned-access

    Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled, then words in packed data structures are accessed a byte at a time. 

    This ultimately depends on the libc memcpy implementation, which is not something TI can control. Anyway, please check this and let us know.

  • I'm using the `ti-cgt-armllvm_3.2.2.LTS` compiler. The unaligned memory access issue occurs

    With -mno-unaligned-access, I see the following output for the simplified example. (BTW, this is a good tool.)

    main:
    push {r7, lr}
    ldr r0, .LCPI0_0
    movs r1, #2
    movs r2, #10
    .LPC0_0:
    add r0, pc
    adds r0, #2
    bl memset
    movs r0, #0
    pop {r7, pc}
    .LCPI0_0:
    .long buffer-(.LPC0_0+4)

    buffer:
    .zero 16

    PS: tiarmclang is a TI supported toolchain, so better to create a new post for issues related to that.

  • It looks like adding __attribute((noinline)) to tiesc_memset ends up producing the same result. It forces a call to libc memset, and for clang the libc memset only does aligned memory accesses. 

    __attribute((noinline)) void * tiesc_memset(uint8_t *dst, int8_t val, uint32_t size_bytes)
    {
        memset(dst, val, size_bytes);
        return dst;
    }

    I like this as a solution as it allows the rest of the program to make use of the hardware's capabilities and it's portable between clang and gcc*.

    *I didn't test gcc, it's possible the libc memset function tries to perform unaligned memory writes when allowed.
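
Another way to keep the compiler from expanding the call inline, sketched here as an untested assumption rather than a verified fix, is to call memset through a volatile function pointer. The names memset_call and tiesc_memset_call are hypothetical. As with noinline, this only guarantees that the libc routine is called; whether that routine itself avoids unaligned stores depends on the libc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the pointer is volatile, so the compiler has to load it at
 * run time and cannot replace the call with inline store instructions. */
static void *(*const volatile memset_call)(void *, int, size_t) = memset;

void *tiesc_memset_call(uint8_t *dst, int8_t val, uint32_t size_bytes)
{
    assert(dst != NULL);
    return memset_call(dst, val, size_bytes);
}
```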

  • Hi Pratheesh,

    I already tried with -mno-unaligned-access but I have the same problem.

    These are my compile flags:

        set(COMPILER_FLAGS
                           -mcpu=cortex-r5
                           -mthumb
                           -mlittle-endian
                           -mfloat-abi=hard
                           -mfpu=vfpv3-d16
                           -fmessage-length=0
                           -fsigned-char
                           -ffunction-sections
                           -fdata-sections
                           -mno-unaligned-access
                           -Wno-unused-function
                           -Wno-address-of-packed-member)

    Best regards,

    Álvaro

  • Alvaro

    As Zach mentioned above, this could be a problem with the GCC libc memset, which may try to perform unaligned memory writes when allowed. I do not think we can help here, so you can probably stick with the workaround you implemented.

    Regards

    Pratheesh