Performance issue with SDRAM on the OMAP3503 EVM (3530 chip)

We are facing a problem while profiling our sample code on the OMAP3503 EVM.

Problem: We wrote a sample application for the DSP that uses two input buffers and one output buffer. If we place all three buffers in external memory (SDRAM), the code takes 183019 cycles. When we place the same buffers in internal memory, it takes 3352 cycles, i.e. roughly 55 times fewer.

Is this normal behaviour for the EVM, or are we doing something wrong in the memory configuration?

If needed, we can send the sample code and the .cmd file.

Looking forward to your reply.

Thanks in advance.

  • This is probably normal behavior. It is not unusual to see a large performance difference when comparing code that relies on external memory accesses with code that runs entirely from internal memory. I have seen both higher and lower gains on various DSPs, though 60x is on the high end.

    It is hard to say whether there is a problem anywhere because the performance is so system specific. It depends not only on the external memory speed, but also on how the code is written (which determines the external memory access pattern), on how much system-level bandwidth is being used by peripherals and potentially the ARM, and on the alignment of the code and data. Chances are that your external memory is configured properly and you are just seeing the sum total of all the latencies that come from relying on external memory accesses.

    To improve external memory performance, the first thing I would check is that caching is enabled, and that the most commonly accessed code and data are moved to internal memory as far as possible.
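The point about accumulated latencies can be illustrated with the usual average-access-cost estimate for a single cache level. This is only a sketch: the function name `avg_access_cycles` and any cycle costs passed to it are hypothetical, not measurements from the OMAP3503 EVM.

```c
#include <assert.h>

/* Average cycles per memory access for a single cache level.
 * hit_rate    : fraction of accesses served from cache (0.0 to 1.0)
 * hit_cycles  : cost of a cache hit in CPU cycles (assumed value)
 * miss_cycles : cost of an access that goes to external memory (assumed value)
 */
static double avg_access_cycles(double hit_rate, double hit_cycles,
                                double miss_cycles)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles;
}
```

With a hit rate of 0 (cache disabled) every access pays the full external-memory cost, which is the kind of situation where a ratio in the tens can appear. For example, with an assumed 1-cycle hit and 100-cycle miss, `avg_access_cycles(0.5, 1.0, 100.0)` gives 50.5 cycles per access.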

  • To me this definitely sounds like the cache is not properly enabled.  Perhaps this wiki article I wrote will help explain:

    http://wiki.davincidsp.com/index.php?title=Enabling_64x%2B_Cache

    Of course, the benefit you receive from cache will be dependent on your application code, but generally speaking when you move your code to external memory I've typically seen around a 10-15% performance hit.  There are some "cache busting" algorithms out there that could be much much worse, though I wouldn't expect to see a 60x difference unless it was not using cache at all.

    At one point I noticed that dsplink was not turning on the right MAR bits on OMAP3530, so that could potentially be your issue.  I think that was fixed in dsplink 1.60.

    If you need more help please provide more info as to which TI software components you are using (e.g. BIOS, dsplink, etc.) and what versions.  Also, please provide any cache related details, either from BIOS tcf or from your own code.

    Brad
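As a rough aid for checking the MAR bits mentioned above, the following sketch computes which MAR register governs a given address. It assumes the usual C64x+ megamodule layout (MAR0 at 0x01848000, one 32-bit register per 16 MB region); the helper names are hypothetical, the address is assumed to be the one the DSP itself uses, and the values should be verified against the device documentation.

```c
#include <assert.h>
#include <stdint.h>

#define MAR_BASE_ADDR   0x01848000u  /* assumed address of MAR0 (C64x+ megamodule) */
#define MAR_REGION_BITS 24u          /* each MAR covers 2^24 bytes = 16 MB */

/* Index of the MAR register that controls cacheability of addr. */
static uint32_t mar_index(uint32_t addr)
{
    return addr >> MAR_REGION_BITS;
}

/* Memory-mapped address of that MAR register. */
static uint32_t mar_reg_addr(uint32_t addr)
{
    return MAR_BASE_ADDR + 4u * mar_index(addr);
}

/* Number of consecutive MAR registers needed to cover a buffer
 * of len_bytes starting at start (range must not wrap past 4 GB). */
static uint32_t mar_count(uint32_t start, uint32_t len_bytes)
{
    if (len_bytes == 0u) {
        return 0u;
    }
    assert(len_bytes - 1u <= UINT32_MAX - start);
    return mar_index(start + (len_bytes - 1u)) - mar_index(start) + 1u;
}
```

For instance, an external region starting at 0x80000000 would fall under MAR128 at 0x01848200. Comparing the registers this points to against what the BIOS configuration or dsplink actually programmed is one way to confirm that every external buffer lies in a region marked cacheable.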

  • Hi all,

    Thanks for your valuable inputs.

    Today I enabled the cache and made the complete external memory cacheable. Even so, I am still seeing an 8x difference in performance compared to the simulator. Is this because the cache is not properly enabled? For our video decoder, performance is 8 times worse than on the simulator. Most of the code and data sit in external memory, as we have only (80 KB + 96 KB + 32 KB) of internal memory.

    Can you please tell me what the other reasons could be if the problem is not the cache?

    With Regards

    Pratap

  • The simulator does not model everything in the device. For example, it does not model the DDR (open page, close page, refresh, CAS latency, etc.). Also, if you're just using a CPU simulator there would be other things to consider. For example, in OMAP3530 the DDR is shared amongst all the initiators on the OCP bus, so if data is being sent out to a display or being captured, this will load the DDR and reduce the bandwidth available to the CPU. Also, whatever simulator you were using would not have accounted for the additional cycle hits introduced by the MMU on the IVA subsystem.

    The other big item to check would be the clock speed. If the cores are not running at the speed you think they are, your performance will be greatly reduced. If you are using a different main clock frequency than the EVM, then you should double-check the u-boot code to verify that the PLLs are being set up correctly. It might be printing out one frequency even though it's running at something different.
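One way to cross-check the clock speed is to count CPU cycles over an interval whose length is known from an independent time source and convert the result to a frequency. The sketch below shows only the arithmetic; the helper name is hypothetical, and how the cycle count (for example from the C64x+ time stamp counter) and the reference interval are obtained is left as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Effective core clock in MHz, given the number of CPU cycles counted
 * during an interval of elapsed_us microseconds measured by an
 * independent reference. Cycles per microsecond equals MHz. */
static double effective_mhz(uint64_t cycles, uint64_t elapsed_us)
{
    assert(elapsed_us > 0u);
    return (double)cycles / (double)elapsed_us;
}
```

If the figure obtained this way differs noticeably from the frequency that u-boot reports, the PLL setup is worth inspecting before looking further at the memory system.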