Hello!
I have some code running on the BeagleBone, but performance seems poor. I am currently running a timer interrupt with some code in it, and that code does not run especially fast. One of the functions toggles a GPIO pin. The difference between writing the register directly and writing the register via a function is 1.5 µs. In other words, a single function call is taking up 1.5 µs of processing time. That seems very bad.
Is there anything I'm missing here? Is the default clock rate of the BeagleBone with StarterWare not set to 500 or 720 MHz? I've been troubleshooting this for several hours and I just cannot find what's wrong, because surely a function call should not take 1.5 µs?
Regards
Karl
I have done some more tests and have also enabled the caches as in the demo application. Even then, an application that takes 0.3 seconds to run on a Blackfin at 400 MHz takes about 5 seconds on the BeagleBone running StarterWare... There must be something wrong. Does anybody have any ideas?
Hmm... this problem sounds interesting, though I know it's paining you :-)
Just putting my understanding of the problem here.
You have a function which does some task in the context of the timer interrupt (which is unusual, but let's have it that way).
You set/reset a GPIO on entering the interrupt handler, and reset/set the GPIO before exiting. You see that the pulse duration is 5 secs.
Are you doing any intensive computation? Like a lot of divisions, or complex math etc?
Though I have never worked on Blackfin processors, my two cents:
While I continue to think, hope this helps..
Regards,
Madhvapathi Sriram
Hi Madhvapathi!
Sorry about the confusion, I will try to clarify. I currently have two programs.
The first program consists of basically just a timer interrupt (dmtimer2) running at 50 kHz. The code inside the interrupt is not especially complex: it toggles a GPIO on and off and does some other calculations in between. While looking at it with my analyzer I noticed that the interrupts did not occur at a steady 50 kHz, but rather at something like 30 kHz. So I began stripping down the code. While doing this I noticed that just removing one function call and replacing it with the function's body reduced the computation time of the interrupt by 1.5 µs, which seemed awfully long for a function call.
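The comparison described above (register write through a function versus the same write placed directly in the handler) can be sketched as follows. This is only an illustration: `fake_gpio_dataout`, `PIN_MASK`, and both function names are hypothetical, and a plain variable stands in for the real memory-mapped GPIO register so the snippet compiles on any host. Whether the TMS470 compiler actually inlines the second version depends on its optimization settings, which is an assumption here.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped GPIO data-out register. */
static volatile uint32_t fake_gpio_dataout;

/* Hypothetical pin: bit 7 of the register. */
#define PIN_MASK (1u << 7)

/* Out-of-line version: each use costs a branch, argument setup and a
 * return, and the instructions and stack accesses involved may be
 * fetched from uncached memory. */
void gpio_toggle_call(volatile uint32_t *reg, uint32_t mask)
{
    *reg ^= mask;
}

/* Inline candidate: with optimization enabled the compiler may place
 * the register write directly in the caller, as if written by hand. */
static inline void gpio_toggle_inline(volatile uint32_t *reg, uint32_t mask)
{
    *reg ^= mask;
}
```

Both versions perform the same register update; only the call overhead differs, which is the part being measured with the analyzer.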
So I went back to the other program I wrote earlier (it is just a bunch of integer computations with very few floats). I had never benchmarked it before; I had only made sure it ran properly. Now, while running the program, I notice that its execution time is very slow as well. The same program running on a Blackfin is around 40 times faster.
By now I'm basically thinking that it is not the interrupt in particular that is slow, but the processor's computation as a whole. So I looked around on the forums and saw that some people were able to improve performance by enabling the caches as done in the demo application. I took that code and applied it to my programs. The first program with the timer is now able to run at 50 kHz, but replacing a single function call with its body still shows that function calls are very slow (>1 µs), so it did not seem to offer that much of an improvement.
I then applied the cache code to my second program. This helps a bit and basically cuts the execution time in half (from 10 seconds down to 5), but the Blackfin is still about 20 times faster.
Now I'm starting to wonder whether the processor is running at full speed. Is the bootloader setting it to 500 MHz? Are there some other pipelining issues at hand? It just doesn't seem right that a single function call takes over 1 µs, or that the processor is at least 20 times slower than I would expect.
I have so far only used CCS and the TMS470 compiler. I have not dug into any optimization options (although there do not seem to be many). I have tried both running the code via the debugger and booting from the memory card, and the performance seems to be the same. To me it seems like some initialization is missing, because surely the performance must be greater...
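To put the 1.5 µs figure in perspective, here is a rough back-of-the-envelope calculation. It assumes the 500 MHz and 720 MHz clock rates mentioned in this thread and the 50 kHz interrupt rate; the helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Number of CPU cycles that elapse in `nanoseconds` at `cpu_hz`. */
static uint64_t cycles_in_ns(uint64_t cpu_hz, uint64_t nanoseconds)
{
    return cpu_hz * nanoseconds / 1000000000ULL;
}

/* Share of one interrupt period, in percent, used by `nanoseconds`. */
static double percent_of_period(double irq_hz, double nanoseconds)
{
    double period_ns = 1e9 / irq_hz;   /* 50 kHz gives 20000 ns */
    return 100.0 * nanoseconds / period_ns;
}
```

With these assumptions, `cycles_in_ns(500000000, 1500)` gives 750 cycles and `cycles_in_ns(720000000, 1080 * 1500 / 1080)` (that is, 1500 ns at 720 MHz) gives 1080 cycles, while `percent_of_period(50000.0, 1500.0)` gives 7.5% of each 20 µs interrupt period. Hundreds of cycles for one call is far more than a call and return should need, which points at memory access time rather than at the call itself.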
Thanks for the help!
Regards
Ok, I finally fixed it!
It seems that the demo application did not use the D-cache, only the I-cache. When I enabled the D-cache (I used the code from the uartEdma_Cache project), things sped up drastically. It is now 50% faster than the Blackfin :)