TDA4VM: LLVM R5 C library memset/memcpy too slow?

Part Number: TDA4VM

Hi experts,
I am using the TDA4VM SDK 8.1. On mcu3_0, I try to zero a 64 MB non-cached memory region with the C library memset function,
but it takes about 14 seconds.

When I replace memset with the mem_set8_arm function below, it takes about 500 ms.

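#include <stddef.h>
#include <stdint.h>

/* Fill n bytes at dest with the low byte of c.
   dest is assumed to be 4-byte aligned. */
static void *mem_set8_arm (void *dest, int c, size_t n)
{
    uint32_t *d = dest;
    uint8_t *dc;
    uint8_t setflag8 = c & 0xff;
    /* Replicate the fill byte into all four bytes of a word. */
    uint32_t setflag32 = (uint32_t) setflag8 * 0x01010101u;

    /* Bulk fill: 4 x stmia of 4 registers = 64 bytes per iteration. */
    while (n >= 64) {
        __asm __volatile
        (
            "\n\t mov r4, %[flag]"
            "\n\t mov r5, r4"
            "\n\t mov r6, r4"
            "\n\t mov r7, r4"
            "\n\t stmia %[dst]!,{r4-r7}"
            "\n\t stmia %[dst]!,{r4-r7}"
            "\n\t stmia %[dst]!,{r4-r7}"
            "\n\t stmia %[dst]!,{r4-r7}"
        /* dst is read and written (writeback), so it is an in/out operand;
           the fill value, not its address, is passed in flag. */
        : [dst] "+r" (d) : [flag] "r" (setflag32) : "r4", "r5", "r6", "r7", "memory");

        n -= 64;
    }

    /* Remaining whole words. */
    while (n >= 4) {
        *d++ = setflag32;
        n -= 4;
    }

    /* Remaining bytes. */
    dc = (uint8_t *) d;

    while (n--)
        *dc++ = setflag8;

    return dest;
}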
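
For comparison, the same idea (wide stores for the bulk of the buffer, byte stores for the edges) can be sketched in portable C without inline assembly. This is only an illustrative sketch, not the TI library implementation: the name mem_set_words is hypothetical, and unlike the function above it also handles a destination that is not 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill n bytes at dest with the low byte of c,
   using 32-bit stores once the pointer is 4-byte aligned. */
static void *mem_set_words(void *dest, int c, size_t n)
{
    uint8_t *p = dest;
    const uint8_t byte = (uint8_t) c;
    /* Replicate the fill byte into all four bytes of a word. */
    const uint32_t word = (uint32_t) byte * 0x01010101u;

    /* Head: byte stores until p reaches a 4-byte boundary. */
    while (n > 0 && ((uintptr_t) p & 3u) != 0) {
        *p++ = byte;
        n--;
    }

    /* Body: aligned 32-bit stores. */
    uint32_t *w = (uint32_t *) p;
    while (n >= 4) {
        *w++ = word;
        n -= 4;
    }

    /* Tail: the last 0 to 3 bytes. */
    p = (uint8_t *) w;
    while (n--)
        *p++ = byte;

    return dest;
}
```

Whether the compiler turns the word loop into multi-register stores depends on the optimization level, so this version may well be slower than the stmia loop on non-cached memory.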


There are two questions:
1. Why are the LLVM C library memset/memcpy functions so slow?
2. Why is the R5 bandwidth to DDR only about 100 MB/s?

Regards,

Li quan

  • Hi experts,

    My disassembly is as follows:

    The disassembly of armv7r-ti-none-eabihf/libclang_rt.builtins.a (aeabi_memset.S.obj) is as follows:

    Why is the TI_memset_small function (memcpy32.S) called instead of memset (memcpy32.S)?

    Regards,

    Li quan

  • Thank you for reporting the problem.  I am able to produce a build that has a similar problem.  I used it to file the entry EXT_EP-10824.  This entry does not report a bug, but code that is slower than it should be.  You are welcome to follow it with that link.

    Thanks and regards,

    -George

  • Hi George,

    Following TI_Arm_Clang_Compiler_Tools_User_Guide.pdf, I added the --use_memset=fast linker option to select the faster memset function, but a memset of 64 MB still takes about 1 s.

    1. Why is R5 access to DDR so slow (about 64 MB/s with the faster memset)? Is this normal? Is there any relevant test data?

    2. Is there any way to increase the R5 bandwidth to DDR?

    The R5 has to go through the north bridge to access DDR. Is this the reason for the slow DDR access?
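
    As a rough sanity check of the figures quoted in this thread, the throughput can be worked out from the buffer size and the elapsed time. The sketch below assumes that "64M" means 64 MiB, and the helper name mib_per_second is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: throughput in MiB/s for `bytes` written
   in `millis` milliseconds. */
static double mib_per_second(uint64_t bytes, double millis)
{
    double mib = (double) bytes / (1024.0 * 1024.0);
    double seconds = millis / 1000.0;
    return mib / seconds;
}
```

    With a 64 MiB buffer, 14 s gives roughly 4.6 MiB/s (default memset), 1 s gives 64 MiB/s (--use_memset=fast), and 500 ms gives 128 MiB/s (mem_set8_arm).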

    Regards,

    Li quan