Hello
I am new to TI DSPs and have been trying to figure out why I can't get DSP code to run faster. I have read through the forums as much as I could but have not figured out what my issue is. I have tried adjusting the DSP memory mapping, but that does not affect anything at all, so I think I am missing a key point.
I have attached a text file that contains two different memory map runs along with the benchmark results.
Here are two different memory maps I did for C6RUN:
First I had DSP_REGION_CMEM_SIZE at 16 MB and DSP_REGION_CODE_SIZE at 13 MB.
Then I tried DSP_REGION_CMEM_SIZE at 28 MB and DSP_REGION_CODE_SIZE at 28 MB.
Neither change made any difference. When I checked lsmod for both, there was no difference, even though I suspect there should be.
I follow all the steps that the C6Run readme gives for setting up the platform, and I do not run into any run-time errors.
I read that I am NOT supposed to adjust the DSPLINK memory map, as the platform config does that for me.
How do I get the DSP to run faster? What am I missing?
Any recommendations would be great.
Performance depends on:
Thank you for your response, Gagan. I was running all the included C6Run example code (bench_dsp and cfft_dsp) and comparing it with the ARM builds (bench_arm and cfft_arm).
Both ARM versions run much faster. I used the standard memory map and the proper u-boot environment variable setup. I searched the forum and found someone else:
http://e2e.ti.com/support/dsp/omap_applications_processors/f/447/p/70317/255208.aspx#255208
who wrote a matrix calculation sample, so I ran that too. The DSP version ran at close to the same speed as the ARM version.
I also ran the C6Accel sample code (c6accel_app), which goes through each function and logs the time it takes. Compared with the PDF TI provided, my results are slower as well.
Running on DM3730.
I will look further into what you wrote (items 2, 3 and 4) and will report back if anything helps.
Thanks
Steve
Steve, one other thing that I didn't mention is the impact of running floating-point code. Note that on the DM3730 the DSP is fixed point, whereas the A8 supports floating point. So if the benchmark you are running is natively floating point, the DSP's performance will not be great. There are fixed-point versions of the FFTs provided in DSPLIB.
The other thing to note is the CPU frequency of the two cores. You should account for that when comparing performance.
Cheers,
Gagan
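To give a rough idea of what "fixed point" means in practice, here is a minimal sketch of Q15 arithmetic in plain C. The function names float_to_q15 and q15_mul are made up for illustration and are not DSPLIB or C6Run APIs; the point is only that the multiply uses integer operations, which is the kind of work a fixed-point DSP handles natively.

```c
#include <stdint.h>

/* Hypothetical helper (not a TI API): convert a float in [-1, 1) to Q15,
 * saturating at the ends of the 16-bit range. */
int16_t float_to_q15(float x)
{
    float scaled = x * 32768.0f;
    if (scaled >= 32767.0f) {
        return INT16_MAX;
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;
    }
    return (int16_t)scaled;
}

/* Hypothetical helper (not a TI API): multiply two Q15 values.
 * The 32-bit product is in Q30, so it is rounded and shifted back to Q15.
 * Assumes an arithmetic right shift for negative values, which is what
 * common compilers provide. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b; /* Q30 intermediate */
    product += 1 << 14;                        /* round to nearest */
    product >>= 15;                            /* back to Q15 */
    if (product > INT16_MAX) {
        product = INT16_MAX;                   /* only -1.0 * -1.0 overflows */
    }
    return (int16_t)product;
}
```

For example, 0.5 becomes 16384 in Q15, and q15_mul(16384, 16384) gives 8192, which is 0.25. A floating-point benchmark does none of this, so on a fixed-point core every float operation has to be emulated in software.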
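One simple way to account for the clock difference is to compare cycle counts rather than wall-clock times. The sketch below assumes you already have the elapsed time of each run in seconds and know each core's clock in Hz; the function names are made up for illustration, and you should read the real frequencies from your own board rather than assume them.

```c
/* Hypothetical helper: convert an elapsed time into an approximate
 * number of core clock cycles (seconds * cycles per second). */
double elapsed_cycles(double elapsed_seconds, double clock_hz)
{
    return elapsed_seconds * clock_hz;
}

/* Hypothetical helper: ratio of DSP cycles to ARM cycles for the same
 * benchmark. A value above 1.0 means the DSP needed more cycles; a value
 * below 1.0 means it needed fewer, even if its wall-clock time was longer. */
double cycle_ratio(double dsp_seconds, double dsp_hz,
                   double arm_seconds, double arm_hz)
{
    return elapsed_cycles(dsp_seconds, dsp_hz) /
           elapsed_cycles(arm_seconds, arm_hz);
}
```

With made-up numbers, a DSP run of 4 ms at 500 MHz and an ARM run of 2 ms at 1000 MHz both come out to about 2 million cycles, so the ratio is roughly 1.0 even though the DSP took twice as long on the clock.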