Beagle Tool Chain Doubt!!

Audio Dave

Hi,

I am using the toolchain below to compile source code for the Cortex-A8. The toolchain and the enabled switches are:

CPP=arm-none-linux-gnueabi-gcc
SW=-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -flax-vector-conversions

I have the following questions about this toolchain.

How can I disable NEON vectorization for C-only source? I still need an executable for the BeagleBoard, because that is my target.

If I do not use any NEON intrinsics or NEON assembly, will the compiler still generate NEON code by default for the given C source when building for the BeagleBoard with the above toolchain?

My aim is to compare the performance of C code without any NEON or vectorization against NEON-enabled code (C with NEON intrinsics plus NEON assembly) for an image viewer application.

Rgds

Dave

over 15 years ago

0 Jeff L over 15 years ago

TI__Expert 5960 points

Hi Dave,

You need some form of: -ftree-vectorize to autovectorize for NEON. Since you do not have -ftree-vectorize, you should not generate auto vectorized NEON code. Also you mentioned you do not have NEON intrinsics or NEON assembly included, so the only way to generate NEON code is by compiler autovectorization.

If you add -ftree-vectorizer-verbose=2, you should be able to see where the NEON vectorization occurs, if any. Then, when you remove the vectorization option, you should see the NEON code go away. You can verify this by dumping out the assembly: NEON uses different registers than ARM. NEON uses d0-d31 for 64-bit SIMD operations and q0-q15 for 128-bit SIMD operations.

Are you using any floating-point operations? If so, it might be a little more complicated to verify, because the VFP also uses d0-d15. However, the VFP is not SIMD, so you can still tell whether NEON is used or not.
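
As a rough illustration of the check described above, here is a minimal sketch of a fixed-point loop of the kind GCC's auto-vectorizer can often turn into NEON code. The function name average_rows_u8 and the row layout are hypothetical and not taken from Dave's source. One point worth keeping in mind: in GCC 4.3 and later, -O3 turns on -ftree-vectorize by itself, so for a scalar baseline it is safer to pass -fno-tree-vectorize explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical kernel: rounded average of two rows of 8-bit pixels.
 *
 * Scalar baseline (no auto-vectorization):
 *   arm-none-linux-gnueabi-gcc -O3 -march=armv7-a -mtune=cortex-a8 \
 *       -fno-tree-vectorize -S rows.c
 *
 * Candidate for NEON auto-vectorization:
 *   arm-none-linux-gnueabi-gcc -O3 -march=armv7-a -mtune=cortex-a8 \
 *       -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -S rows.c
 *
 * Compare the two .s files and look for q0-q15 / d0-d31 operands.
 */
void average_rows_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i;

    assert(dst != NULL && a != NULL && b != NULL);

    for (i = 0; i < n; i++) {
        /* Operands are promoted to int, so the sum cannot overflow. */
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
    }
}
```

Whether the loop is actually vectorized depends on the compiler version and on what it can prove about pointer aliasing, so the assembly dump remains the final check.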

Regards,

0 Audio Dave over 15 years ago in reply to Jeff L

Intellectual 280 points

Hi Jeff

Thanks for your reply, but my problem still exists.
Let me explain my issue once again.

I am trying to develop an image viewer application to view RGB and bitmap images.
My target is the BeagleBoard and the Linux distribution is Angstrom.

I have two sets of source code (all versions are fixed point).

Version 1 : Pure ANSI C

Only the C code is considered
The make file is given below

OBJFILES = # objfiles.o
INCLUDE = -I./../Header
ABC = arm-none-linux-gnueabi-gcc
PQR = -march=armv7-a -mtune=cortex-a8
CFLAGS = -O3 -Wall $(INCLUDE) $(PQR)
HOME = IMViewer.so
$(HOME) : $(OBJFILES)
	$(ABC) -o $@ $^ $(CFLAGS) -fPIC -L. -shared
	$(ABC) -o IMView $(OBJFILES) -ldl -L. -lIMViewer
	install $(HOME) ../lib/
	mv IMView ../lib/
	rm -rf IMViewer.so
	@echo "C Version completed..."
%.o : %.c
	$(ABC) -c $(CFLAGS) $< -o $@

Version 2 : C + Neon Intrinsics

In this version I use NEON intrinsics wherever applicable,
so the resulting source is a mix of C and NEON intrinsics.
The makefile used to compile it is given below

OBJFILES = # objfiles.o
INCLUDE = -I./../Header
ABC = arm-none-linux-gnueabi-gcc
PQR = -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -ftree-vectorizer-verbose=2 -flax-vector-conversions
CFLAGS = -O3 -Wall $(INCLUDE) $(PQR)
HOME = IMViewer.so
$(HOME) : $(OBJFILES)
	$(ABC) -o $@ $^ $(CFLAGS) -fPIC -L. -shared
	$(ABC) -o IMView $(OBJFILES) -ldl -L. -lIMViewer
	install $(HOME) ../lib/
	mv IMView ../lib/
	rm -rf IMViewer.so
	@echo "C Neon Version completed..."
%.o : %.c
	$(ABC) -c $(CFLAGS) $< -o $@
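
To give an idea of what "C mixed with NEON intrinsics" can look like in a Version 2 style source file, here is a minimal sketch. It is not taken from the IMViewer code: the function name add_rows_sat_u8 and the choice of a saturating add are assumptions made for illustration. The NEON path is only compiled when the compiler defines __ARM_NEON__ (GCC does this when -mfpu=neon is given), and a plain C loop handles the tail and serves as the fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical kernel: saturating add of two rows of 8-bit pixels. */
void add_rows_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && a != NULL && b != NULL);

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    /* NEON path: 16 pixels per iteration in 128-bit q registers. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vqaddq_u8(va, vb));
    }
#endif

    /* Scalar path: remaining pixels, or everything when NEON is not available. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Building the same file with and without -mfpu=neon gives two binaries with identical results, which keeps the timing comparison between the two versions like for like.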

I hope this makes my actual setup clear.

In my IMViewer application I then measure the performance of both versions.
The code fragment is given below

#include <stdio.h>
#include <sys/time.h>

long st = 0, et = 0;
struct timeval First, Last;

int main(int argc, char **argv)
{
    gettimeofday(&First, NULL);
    st = (First.tv_sec * 1000) + (First.tv_usec / 1000);   /* time in milliseconds */

    IMViewer();

    gettimeofday(&Last, NULL);
    et = (Last.tv_sec * 1000) + (Last.tv_usec / 1000);     /* time in milliseconds */
    printf("The effective time in milliseconds is %ld\n", et - st);

    return 0;
}

This code fragment is common to both versions and measures the time each one takes to complete.
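
A possible refinement of this measurement, sketched under some assumptions: keep the microsecond part of gettimeofday() instead of rounding to milliseconds, and time several runs so that one-off costs such as file loading or cold caches do not dominate. The helper names elapsed_us and best_of_us are hypothetical, and the function being timed is assumed to take no arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Elapsed time in microseconds between two gettimeofday() samples. */
long long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}

/* Call fn() `runs` times and return the shortest run in microseconds. */
long long best_of_us(void (*fn)(void), int runs)
{
    long long best = -1;
    int i;

    assert(fn != NULL && runs > 0);

    for (i = 0; i < runs; i++) {
        struct timeval t0, t1;
        long long us;

        gettimeofday(&t0, NULL);
        fn();
        gettimeofday(&t1, NULL);

        us = elapsed_us(&t0, &t1);
        if (best < 0 || us < best)
            best = us;
    }
    return best;
}
```

Timing only the pixel-processing routine rather than the whole viewer, for example with best_of_us(process_frame, 10) where process_frame is a placeholder for that routine, may make a difference between the two versions easier to see.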

Sadly, the performance of version 2 is not good: it is close to that of the C version. I cannot spot
what the problem is.

I checked the following, and both are OK:

1. The OS kernel is NEON enabled (OMAP 3530)
2. The assembly files generated for the NEON code contain NEON instructions for the intrinsics

The following questions remain:

1. Can I configure the L1 and L2 cache sizes from the OS kernel?
2. Is any hand-written assembly needed to enable the NEON unit on the BeagleBoard?

Kindly look into my issue and please help me!

Rgds
Dave

0 Jeff L over 15 years ago in reply to Audio Dave

TI__Expert 5960 points

Dave,

Now I understand your setup better. I don't have a BeagleBoard to test with, so you may also want to post this on beagleboard.org. Since you have NEON intrinsic code, you can ignore the previous post about autovectorization. If your code runs, then NEON is enabled; if NEON were not enabled, you would get an illegal instruction when executing NEON code. Can you check the L1NEON bit and make sure NEON is able to cache in L1? You can find it at bit 5 of the Auxiliary Control Register:

MRC p15, 0, <Rd>, c1, c0, 1 ; Read Auxiliary Control Register
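
Once the register value has been read, the bit test itself is ordinary C. The sketch below only decodes a value that was obtained elsewhere, and it does not read the register, since CP15 accesses are normally permitted only in privileged modes. The macro and function names are hypothetical; the bit positions are the ones discussed in this thread (bit 5 for L1NEON, 0x10 for ASA).

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as discussed in this thread for the Auxiliary Control Register. */
#define ACTLR_ASA    (1u << 4)   /* 0x10 */
#define ACTLR_L1NEON (1u << 5)   /* bit 5 */

/* Returns 1 if the L1NEON bit is set in a previously read register value. */
int actlr_l1neon_enabled(uint32_t actlr)
{
    return (actlr & ACTLR_L1NEON) != 0;
}

/* Returns 1 if the ASA bit is set in a previously read register value. */
int actlr_asa_enabled(uint32_t actlr)
{
    return (actlr & ACTLR_ASA) != 0;
}

/* Computes the value to write back; only the two bits above may be requested. */
uint32_t actlr_with_bits(uint32_t actlr, uint32_t mask)
{
    assert((mask & ~(ACTLR_ASA | ACTLR_L1NEON)) == 0);
    return actlr | mask;
}
```

For example, a value printed as 00000020 would have L1NEON set and ASA clear.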

Regards,

Jeff

0 Audio Dave over 15 years ago in reply to Jeff L

Intellectual 280 points

Hi Jeff,

Happy to see your reply.

Please look at the following code section

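#include <stdio.h>

int main(void)
{
    /* Declarations */
    unsigned long aux;

    /* Note: CP15 accesses are normally permitted only in privileged modes */
    __asm__ __volatile__("mrc p15, 0, %0, c1, c0, 1" : "=r" (aux));
    printf("Aux control Before: %08lX\n", aux);

    __asm__ __volatile__("mrc p15, 0, r0, c1, c0, 1");
    /* Enabling ASA */
    __asm__ __volatile__("orr r0, r0, #0x10");
    /* Enable L1NEON */
    __asm__ __volatile__("orr r0, r0, #1<<5");
    /* Note: r0 is modified above but never written back with an MCR,
       so the register itself is left unchanged by this sequence */

    __asm__ __volatile__("mrc p15, 0, %0, c1, c0, 1" : "=r" (aux));
    printf("Aux control After: %08lX\n", aux);

    /* IMViewer Code */

    return 0;
}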

Here bit 5 is set for the L1 cache, but I don't know what ASA means.

Is it necessary for enabling the L1 cache and getting an optimized result?

Similarly, does the L2 cache have any dependency on the NEON core?

Thanks,

Dave
