Compiler/F28M35H52C: Convert inline extended asm to standalone asm

Mathias Meier

Part Number: F28M35H52C

Tool/software: TI C/C++ Compiler

Hi,

I would like to ask for help converting inline assembler code that uses Extended Asm into a standalone assembler file, since Extended Asm isn't supported by the TI ARM compiler (see spnu151i, page 116).

The code is part of a big-number multiplication library, which provides this routine both as C and as Extended Asm. With the C code everything runs fine, but a bit slowly. So I want to replace this C code with the assembler code, which I need to convert because Extended Asm isn't supported.

The CPU used is the F28M35H52C.

This is the function from the library:

void mpi_mul_hlp( size_t i, mbedtls_mpi_uint *s, mbedtls_mpi_uint *d, mbedtls_mpi_uint b )
{
    mbedtls_mpi_uint c = 0, t = 0;


    for( ; i >= 16; i -= 16 )
    {
        MULADDC_INIT
        MULADDC_CORE   MULADDC_CORE
        MULADDC_CORE   MULADDC_CORE
        MULADDC_CORE   MULADDC_CORE
        MULADDC_CORE   MULADDC_CORE

        MULADDC_CORE   MULADDC_CORE
        MULADDC_CORE   MULADDC_CORE
        MULADDC_CORE   MULADDC_CORE
        MULADDC_CORE   MULADDC_CORE
        MULADDC_STOP
    }

    for( ; i >= 8; i -= 8 )
    {
        MULADDC_INIT
        MULADDC_CORE   MULADDC_CORE
        MULADDC_CORE   MULADDC_CORE

        MULADDC_CORE   MULADDC_CORE
        MULADDC_CORE   MULADDC_CORE
        MULADDC_STOP
    }

    for( ; i > 0; i-- )
    {
        MULADDC_INIT
        MULADDC_CORE
        MULADDC_STOP
    }

    t++;

    do {
        *d += c; c = ( *d < c ); d++;
    }
    while( c != 0 );
}

(mbedtls_mpi_uint is a uint32_t.)

This is the Extended Asm code:

#define MULADDC_INIT                                    \
        asm (                                           \
            "ldr    r0, %3                      \n\t"   \
            "ldr    r1, %4                      \n\t"   \
            "ldr    r2, %5                      \n\t"   \
            "ldr    r3, %6                      \n\t"


#define MULADDC_CORE                                    \
            "ldr    r4, [r0], #4                \n\t"   \
            "mov    r5, #0                      \n\t"   \
            "ldr    r6, [r1]                    \n\t"   \
            "umlal  r2, r5, r3, r4              \n\t"   \
            "adds   r7, r6, r2                  \n\t"   \
            "adc    r2, r5, #0                  \n\t"   \
            "str    r7, [r1], #4                \n\t"


#define MULADDC_STOP                                    \
            "str    r2, %0                      \n\t"   \
            "str    r1, %1                      \n\t"   \
            "str    r0, %2                      \n\t"   \
         : "=m" (c),  "=m" (d), "=m" (s)        \
         : "m" (s), "m" (d), "m" (c), "m" (b)   \
         : "r0", "r1", "r2", "r3", "r4", "r5",  \
           "r6", "r7", "cc"                     \
         );

This is the C code:

#define MULADDC_INIT                    \
{                                       \
    mbedtls_mpi_uint s0, s1, b0, b1;              \
    mbedtls_mpi_uint r0, r1, rx, ry;              \
    b0 = ( b << biH ) >> biH;           \
    b1 = ( b >> biH );

#define MULADDC_CORE                    \
    s0 = ( *s << biH ) >> biH;          \
    s1 = ( *s >> biH ); s++;            \
    rx = s0 * b1; r0 = s0 * b0;         \
    ry = s1 * b0; r1 = s1 * b1;         \
    r1 += ( rx >> biH );                \
    r1 += ( ry >> biH );                \
    rx <<= biH; ry <<= biH;             \
    r0 += rx; r1 += (r0 < rx);          \
    r0 += ry; r1 += (r0 < ry);          \
    r0 +=  c; r1 += (r0 <  c);          \
    r0 += *d; r1 += (r0 < *d);          \
    c = r1; *(d++) = r0;

#define MULADDC_STOP                    \
}
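
For reference, both macro sets appear to compute the same step: multiply one word of s by b, add the running carry c and the current word of d, store the low word back to d, and keep the high word as the new carry. The sketch below writes that step with a 64-bit intermediate, which is roughly what the `umlal` / `adds` / `adc` sequence does. The helper name `muladdc_step` and its signature are made up for illustration and are not part of the library.

```c
#include <stdint.h>

/* Hypothetical helper (not from the library): one multiply-accumulate step.
 * s_word is the current word of s, d_word points at the current word of d.
 * Returns the new carry word. */
static uint32_t muladdc_step(uint32_t s_word, uint32_t *d_word,
                             uint32_t b, uint32_t c)
{
    /* Corresponds to "mov r5, #0" + "umlal r2, r5, r3, r4":
     * 64-bit result (hi:lo) = b * s_word + c */
    uint64_t acc = (uint64_t)b * s_word + c;
    uint32_t lo  = (uint32_t)acc;
    uint32_t hi  = (uint32_t)(acc >> 32);

    /* Corresponds to "adds r7, r6, r2": add the low word to *d,
     * detecting the carry out of 32 bits. */
    uint32_t sum   = *d_word + lo;
    uint32_t carry = (sum < lo);

    *d_word = sum;          /* "str r7, [r1], #4" (pointer bump left to caller) */

    /* Corresponds to "adc r2, r5, #0": cannot overflow, because
     * b * s_word + c + *d fits in 64 bits. */
    return hi + carry;
}
```

A caller would advance s and d by one word after each step, as the post-indexed `ldr` and `str` do in the assembler version.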

As a first step, I am trying to replace only the last loop (the one that processes a single word per iteration). To do so, I defined the needed variables as globals so that the assembler code can access them.

Changed last part of the C code:

    s_glo = s;
    d_glo = d;
    b_glo = b;
    c_glo = c;
    for( ; i > 0; i-- )
    {
    	test_funktion();
    }
    b = b_glo;
    c = c_glo;
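
A plain-C stand-in for `test_funktion` can serve as a reference while debugging the assembler version. The sketch below is only an illustration: the global names follow the snippet above, their types are assumed, and `test_funktion_ref` and `mul_hlp_ref` are hypothetical names. It reads the values held in the globals, performs one step, and writes s, d and c back. The driver also copies d back after the loop, since the final carry-propagation loop of `mpi_mul_hlp` continues from the advanced d.

```c
#include <stddef.h>
#include <stdint.h>

/* Globals named as in the post; their exact types are assumed here. */
uint32_t *s_glo;
uint32_t *d_glo;
uint32_t  b_glo;
uint32_t  c_glo;

/* Hypothetical plain-C stand-in for one call of test_funktion. */
void test_funktion_ref(void)
{
    uint32_t *s = s_glo;                 /* the pointer values, not &s_glo */
    uint32_t *d = d_glo;

    uint64_t acc = (uint64_t)b_glo * *s + c_glo;   /* b * *s + c */
    uint32_t lo  = (uint32_t)acc;
    uint32_t sum = *d + lo;

    *d    = sum;
    c_glo = (uint32_t)(acc >> 32) + (sum < lo);    /* new carry word */
    s_glo = s + 1;
    d_glo = d + 1;
}

/* Hypothetical driver: the single-step loop plus the carry propagation
 * from mpi_mul_hlp, built on the stand-in above. */
void mul_hlp_ref(size_t i, uint32_t *s, uint32_t *d, uint32_t b)
{
    uint32_t c;

    s_glo = s; d_glo = d; b_glo = b; c_glo = 0;
    for (; i > 0; i--)
        test_funktion_ref();
    d = d_glo;      /* the carry loop continues from the advanced d */
    c = c_glo;

    do {
        *d += c; c = (*d < c); d++;
    } while (c != 0);
}
```

Running the assembler routine and this stand-in on the same inputs and comparing d word by word should show at which step the two diverge.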

Changed assembler code:

.text



				.global s_glo
s_glo_add			.uword  s_glo
				.global d_glo

				.global b_glo
b_glo_add			.uword  b_glo
				.global c_glo
c_glo_add			.uword  c_glo

				.global test_funktion


test_funktion:		PUSH	{R4-R11}
					ldr	 R0, s_glo_add
					ldr	 R1, d_glo_add
					ldr	 R2, c_glo_add
					ldr	 R3, b_glo_add
					; Start MULADDC_CORE
					ldr    r4, [r0], #4
					mov    r5, #0
					ldr    r6, [r1]
					umlal  r2, r5, r3, r4
					adds   r7, r6, r2
					adc    r2, r5, #0
					str    r7, [r1], #4
					; End MULADDC_CORE
					ldr	 R8, c_glo_add
					str R2, [R8]
					ldr	 R8, d_glo_add
					str R1, [R8]
					ldr	 R8, s_glo_add
					str R0, [R8]
					POP	{R4-R11}
					BX LR

d_glo_add			.uword  d_glo

Unfortunately, the code doesn't work as intended (the results from the C code and the assembler code are completely different), and as I don't have any experience with assembler, I have no idea what is wrong.

Hopefully someone can help me with this.

over 5 years ago

0 George Mock over 5 years ago

TI__Guru**** 232670 points

This goes beyond the scope of what we typically handle on the forum.

It is possible to combine code built with the GCC ARM toolchain with code built by the TI ARM toolchain. Have you considered going this route?

Thanks and regards,

-George

0 George Mock over 5 years ago in reply to George Mock

TI__Guru**** 232670 points

Did you attempt to use the GCC ARM toolchain? If so, how did that go?

Thanks and regards,

-George

0 Mathias Meier over 5 years ago in reply to George Mock

Prodigy 140 points

Dear George,

We tried to use the GCC ARM toolchain over the last day, but unfortunately without success.

We will try to find out how to change the ASM instructions so that the standalone ASM does the same as the Extended Asm. Any help is still very welcome.

Thank you for your suggestions,

0 George Mock over 5 years ago in reply to Mathias Meier

TI__Guru**** 232670 points

I presume the big number multiply library is built with the GCC ARM compiler. Everything else is built with the TI ARM compiler. Then it is all linked together with the TI ARM linker. Is that what you are doing? If not, please describe your build process in similar terms.

Thanks and regards,

-George

0 Archaeologist over 5 years ago in reply to Mathias Meier

TI__Guru* 84225 points

The easiest way to turn a C file (even one with embedded assembly code) into assembly code is with the compiler.
If possible, run the GCC ARM compiler with the "-S" option. This will produce an assembly output file.
This assembly file will have GNU directives, so it can't be fed directly to the TI assembler, but the instructions have exactly the same format.

0 George Mock over 5 years ago in reply to Archaeologist

TI__Guru**** 232670 points

Since it has been a while, I presume you have resolved the problem. I'd appreciate it if you would describe how you resolved it.

Thanks and regards,

-George
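
Regarding the standalone listing in the original question, one detail may be worth checking. This is an observation from reading the listing, not a confirmed fix. `s_glo_add .uword s_glo` stores the address of `s_glo`, so `ldr R0, s_glo_add` leaves `&s_glo` in R0. The inline version's `ldr r0, %3` with an "m" operand loads the value of `s` itself. The same pattern applies to d, c and b. The sketch below models the difference in C; the two function names are made up for illustration.

```c
#include <stdint.h>

uint32_t *s_glo;   /* global pointer, named as in the post */

/* Models "ldr R0, s_glo_add": the literal word holds the ADDRESS of s_glo,
 * so R0 ends up pointing at the variable, not at the data. */
uintptr_t r0_after_literal_load(void)
{
    return (uintptr_t)&s_glo;
}

/* Models an additional "ldr r0, [r0]": fetch the pointer stored in s_glo,
 * which is what the MULADDC_CORE instructions expect in r0. */
uintptr_t r0_after_second_load(uintptr_t r0)
{
    return (uintptr_t)*(uint32_t **)r0;
}
```

In assembler terms this would correspond to a second load such as `ldr r0, [r0]` after each literal load. Whether that alone makes the results match has not been verified here.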
