C6Run Benchmark DSP much Slower than ARM even for different memory maps

Steve Haviland

Other Parts Discussed in Thread: DM3730

Hello

I am new to TI DSPs and have been trying to figure out why I can't get DSP code to run faster. I have read through the forums as much as I could but have not figured out what my issue is. I have tried adjusting the DSP memory map, but that does not affect anything at all, so I think I am missing a key point.

I have attached a text file that contains the benchmark results for two different memory maps.

Here are two different memory maps I did for C6RUN:

First I had DSP_REGION_CMEM_SIZE at 16 MB and DSP_REGION_CODE_SIZE at 13 MB.

Then I tried DSP_REGION_CMEM_SIZE at 28 MB and DSP_REGION_CODE_SIZE at 28 MB.

Neither change made any difference. When I checked lsmod for both, the output was identical, even though I suspect it should differ.

I follow all the steps that the C6Run readme gives for setting up the platform, etc., and I do not run into any run-time errors.

I read that I am NOT supposed to adjust the DSPLINK memory map, as the platform config does that for me.

How do I get the DSP to run faster? What am I missing?

Any recommendations would be great.

over 13 years ago

0 Gagan Maur over 13 years ago

TI__Expert 8150 points

Performance depends on:

1. What code are you trying to run, and is it suitable for the DSP? Generally, the DSP is good at running code that does the same thing over and over again, so code that has loops will do better on the DSP.
2. What options are you using to compile the code? See http://processors.wiki.ti.com/index.php/C6RunLib_Documentation#Common_Command-line_Options and make sure you are using options that help improve performance.
3. How much work are you asking the DSP to do? Every time the ARM offloads processing to the DSP, there is some overhead involved. So make sure that each time you ask the DSP to do some processing, you give it enough work; otherwise the call overhead will dominate. I think the ballpark for the overhead is ~200 µs.
4. Are you helping the DSP codegen tools get the best performance by giving them the relevant information? See the document here: http://www.ti.com/lit/pdf/sprabf2. By helping the DSP codegen avoid worst-case assumptions about your code, you can gain significant performance.
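As a rough illustration of the last point, here is a minimal sketch of the kind of information the compiler can use. It assumes a C99-capable compiler; the function name `vec_mul` and the multiple-of-4 trip count are made up for the example and do not come from the C6Run samples. The `restrict` qualifiers say that the three buffers do not overlap, and the TI `MUST_ITERATE` pragma states a minimum trip count and a multiple, so the compiler does not have to assume the worst case for either. The pragma is guarded so the same file still builds with a non-TI compiler.

```c
#include <assert.h>

/* Hypothetical kernel for illustration only: out[i] = x[i] * y[i]. */
void vec_mul(const short *restrict x, const short *restrict y,
             int *restrict out, int n)
{
    int i;

    /* The pragma below is a promise to the compiler, so check that it holds. */
    assert(n >= 4 && n % 4 == 0);

#ifdef __TI_COMPILER_VERSION__
    /* TI-specific hint: the loop runs at least 4 times, in multiples of 4. */
    #pragma MUST_ITERATE(4, , 4)
#endif
    for (i = 0; i < n; i++) {
        out[i] = (int)x[i] * (int)y[i];
    }
}
```

Whether hints like these help in a given benchmark depends on the loop and on the compiler options in use, so it is worth measuring before and after.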

0 Steve Haviland over 13 years ago in reply to Gagan Maur

Prodigy 90 points

Thank you for your response, Gagan. I was running all the included C6Run example code (bench_dsp and cfft_dsp) and comparing it with bench_arm and cfft_arm.

Both ARM versions run much faster. I used the standard memory map and the proper u-boot environment variable setup. I searched the forum and found someone else:

http://e2e.ti.com/support/dsp/omap_applications_processors/f/447/p/70317/255208.aspx#255208

who wrote a matrix calculation sample, and I ran that too. The DSP version ran at close to the same speed as the ARM version.

I also ran the C6Accel sample code (c6accel_app), which goes through each function and logs the time it takes. Compared with the numbers in the PDF TI provided, mine are slower as well.

Running on DM3730.

I will look more into what you wrote (points 2, 3 and 4) and will report back if anything helps.

Thanks

Steve

0 Gagan Maur over 13 years ago in reply to Steve Haviland

TI__Expert 8150 points

Steve, one other thing that I didn't mention is the impact of running floating-point code. Note that on the DM3730 the DSP is fixed point, whereas the A8 supports floating point. So if the benchmark you are running is natively floating point, the performance of the DSP will not be great. There are fixed-point versions of the FFTs provided in the DSPLIB.
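For readers unfamiliar with fixed-point arithmetic, here is a minimal sketch of a Q15 multiply in plain C, the kind of integer-only operation a fixed-point DSP handles natively. This is not the DSPLIB API; the type and function names (`q15_t`, `q15_from_float`, `q15_mul`) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: a 16-bit signed integer that represents value / 32768. */
typedef int16_t q15_t;

/* Convert a float in [-1.0, 1.0] to Q15, truncating and clamping at the top. */
q15_t q15_from_float(float v)
{
    int32_t s;

    assert(v >= -1.0f && v <= 1.0f);
    s = (int32_t)(v * 32768.0f);
    if (s > INT16_MAX) {
        s = INT16_MAX;   /* +1.0 is not representable in Q15 */
    }
    return (q15_t)s;
}

/* Multiply two Q15 values with rounding and saturation.
 * Assumes >> on a negative int32_t is an arithmetic shift, which is
 * implementation-defined in C but is the usual compiler behavior. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product */

    p = (p + (1 << 14)) >> 15;             /* round, then scale back to Q15 */
    if (p > INT16_MAX) {
        p = INT16_MAX;                     /* only -1.0 * -1.0 overflows */
    }
    return (q15_t)p;
}
```

A floating-point benchmark would have to be rewritten along these lines, or use the fixed-point library routines, before the DSP numbers say much about the DSP itself.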

The other thing to note is the CPU frequency of the two cores. You should account for that when comparing performance.
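One way to account for this is to compare cycle counts rather than wall-clock times, and to keep the per-call offload overhead separate from the compute time. The sketch below shows the arithmetic only; the helper names are hypothetical, and any clock frequencies or timings passed in are values you would read from your own board and benchmark logs, not figures from this thread (apart from the ~200 µs overhead ballpark mentioned earlier).

```c
#include <assert.h>

/* Convert an elapsed time in microseconds to core cycles.
 * 1 MHz is one cycle per microsecond, so cycles = us * MHz. */
double elapsed_cycles(double elapsed_us, double clock_mhz)
{
    assert(elapsed_us >= 0.0 && clock_mhz > 0.0);
    return elapsed_us * clock_mhz;
}

/* Wall-clock speedup of offloading: ARM time divided by DSP compute time
 * plus the fixed ARM-to-DSP call overhead. A result below 1.0 means the
 * offload is slower than staying on the ARM. */
double offload_speedup(double arm_us, double dsp_compute_us, double overhead_us)
{
    assert(arm_us >= 0.0 && dsp_compute_us >= 0.0 && overhead_us >= 0.0);
    assert(dsp_compute_us + overhead_us > 0.0);
    return arm_us / (dsp_compute_us + overhead_us);
}
```

For example, with a 200 µs overhead, a job that takes 100 µs on the ARM can never win on the DSP however fast the DSP computes it, while a job that takes tens of milliseconds barely notices the overhead.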

Cheers,
Gagan
