Migrating from DSP to ARM

Other Parts Discussed in Thread: OMAPL138

Hello fellow developers!

I have a somewhat philosophical question.

We currently use OmapL138 (ARM9+C6748 DSP) as our processor.

This CPU is a bit slow and old for us and we need to move forward.

There are lots of CPUs on the market now, but what I want to know is: should I migrate to a different DSP (such as the DM8148), or should I migrate all my code to ARM (some kind of ARM Cortex-A8/A9, or maybe a Cortex-A53 such as a Snapdragon)?

We do image processing and FFT calculations; we don't need video encoding or decoding.

I know this is a bit general but I would love to hear what you think on this matter.

And if TI employees are here as well, I would really like to understand why TI is one of the only big companies that still makes DSPs.

Thanks!

Yoel

  • Hi Yoel,

    Yoel said:
    We currently use OmapL138 (ARM9+C6748 DSP) as our processor.

    This CPU is a bit slow and old for us and we need to move forward.

    There are lots of CPUs on the market now, but what I want to know is: should I migrate to a different DSP (such as the DM8148), or should I migrate all my code to ARM (some kind of ARM Cortex-A8/A9, or maybe a Cortex-A53 such as a Snapdragon)?

    If you would like to migrate, I would recommend TI's C66x DSP, which is billed as the world's fastest DSP, with the following features:

    •    Up to 1.4GHz of fixed and floating point performance per DSP core
    •    Single core to eight core scalability
    •    Keystone architecture for enhanced multicore performance
    •    Large embedded memory and high bandwidth DDR3/DDR3L interface
    •    Network Coprocessor (NetCP) option including security and packet acceleration
    •    High Speed IO including PCIe, Serial RapidIO, Gigabit Ethernet, Hyperlink.

    In terms of real-time behavior, high performance, power efficiency, and scalability, the DSP has its own space when compared to ARM.

  • Thanks Shankari,

    What about the power consumption of the C6678? I saw in one post that it consumes up to 12 W, and I need to stay under 6 W.

    Also, there are now quad-core A9s with a NEON coprocessor running at 1.2 GHz. Doesn't that compensate for it?

    Yoel

  • While the C66x's capabilities are without doubt impressive, one has to recognize that theoretical limits are rarely achievable in real life, and there is no performance guarantee for an arbitrary algorithm. I can't tell where the 32 GMACs figure comes from, but 16 GFLOPs can be achieved only when multiplying and accumulating complex single-precision values. Anything else gives a dramatically different outcome. For example, a single-precision FFT can't achieve even half of that. As another example, the ceiling for double precision is a much more modest 3 GFLOPs, provided that the algorithm "spends" two additions per multiplication. If it doesn't, you should consider yourself lucky to achieve 2 GFLOPs. And these numbers are already comparable with a Cortex-A15, at least when it comes to double precision.
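The point about the multiply/add mix can be illustrated with a tiny throughput model. This is only a sketch: the per-cycle issue capacities (1 multiply and 2 additions) and the 1 GHz clock are assumptions chosen so that the model reproduces the 3 GFLOPs and 2 GFLOPs figures quoted above, not values taken from a TI datasheet, and the function names are hypothetical.

```python
def sustained_flops_per_cycle(muls, adds, mul_per_cycle=1.0, add_per_cycle=2.0):
    """Flops per cycle for a kernel that needs `muls` multiplies and `adds` additions.

    mul_per_cycle / add_per_cycle are ASSUMED issue capacities, picked only to
    match the double-precision numbers quoted in the post.
    """
    if muls < 0 or adds < 0 or muls + adds == 0:
        raise ValueError("need a non-empty, non-negative operation mix")
    # The kernel is limited by whichever unit type is busiest.
    cycles = max(muls / mul_per_cycle, adds / add_per_cycle)
    return (muls + adds) / cycles


def sustained_gflops(muls, adds, clock_ghz=1.0):
    """Scale flops per cycle by an ASSUMED clock in GHz."""
    return clock_ghz * sustained_flops_per_cycle(muls, adds)


if __name__ == "__main__":
    # Two additions per multiplication: both unit types are fully busy.
    print(sustained_gflops(muls=1, adds=2))
    # One addition per multiplication: half of the adder capacity sits idle.
    print(sustained_gflops(muls=1, adds=1))
```

Under these assumptions the first mix lands on the 3 GFLOPs ceiling and the second on 2 GFLOPs, which is the effect described: the further an algorithm's operation mix is from what the hardware can issue per cycle, the further it falls below the headline number.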

    As for the Cortexes: the A8's floating-point performance is very poor (because scalar FP operations are not pipelined), so the A9 is probably the minimum candidate that can deliver something remotely usable in the FP domain. As for NEON, yes, it's an option for single precision that promises 4 multiply-and-adds per cycle on processors such as the A9. But then the question is whether that is achievable for a particular algorithm, and whether there are implementations that actually do it [and if not, whether you're willing to invest in one]. The bottom line is that the question doesn't really have a definitive answer...
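For a rough sense of scale, the NEON figure can be turned into a theoretical peak. This sketch takes the thread's numbers at face value (4 single-precision multiply-and-adds per cycle, a 1.2 GHz quad-core A9) and counts each multiply-and-add as 2 flops, which is the usual convention; whether a real A9 sustains that rate is exactly the open question raised above. The function names are hypothetical.

```python
def peak_gflops(clock_ghz, macs_per_cycle, cores=1):
    """Theoretical peak in GFLOPs, counting one multiply-and-add as 2 flops."""
    return clock_ghz * macs_per_cycle * 2 * cores


def efficiency(achieved_gflops, peak):
    """Fraction of the theoretical peak that a measured result reaches."""
    return achieved_gflops / peak


if __name__ == "__main__":
    # Figures quoted in the thread, not verified against ARM documentation.
    one_core = peak_gflops(clock_ghz=1.2, macs_per_cycle=4)
    four_cores = peak_gflops(clock_ghz=1.2, macs_per_cycle=4, cores=4)
    print(f"one core:   {one_core:.1f} GFLOPs peak")
    print(f"four cores: {four_cores:.1f} GFLOPs peak")
```

A measured result can then be passed to `efficiency()` to see how far a given implementation is from that ceiling, which is a more useful number for a migration decision than the peak itself.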

  • Thanks Andy!

    I know there isn't a definitive answer for this, that is why I marked this post as a discussion (:
    Single-precision floating point should be enough for me.
    I am definitely worried about the implementation of non-native algorithms, and that is my main concern.
    Have you had a chance to experience such a migration?

    Yoel
  • Yoel Motola said:
    Have you had a chance to experience such a migration?

    No, DSP is really a completely parallel track for me. As for "non-native algorithms", I don't really understand the expression. There are algorithms that a DSP is better suited for, and the C66x has some very specialized instructions, but I see algorithms per se as "cosmopolitan". In either case, FFT was mentioned, and googling for "fft arm neon" suggests that there are optimized FFT libraries; some reports suggest that you can reach somewhere between 1 and 2 GFLOPs on A8/A9. As for the C66x, on the other hand, there was a question on the multicore forum asking whether 3 GFLOPs is adequate for FFT...
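When comparing FFT numbers like these, it helps to know how they are usually derived. A common benchmarking convention (not necessarily the one used in the reports mentioned above) credits a complex FFT of size N with 5·N·log2(N) flops, regardless of how many operations the implementation really performs. A minimal sketch of that conversion, with hypothetical function names:

```python
import math


def fft_nominal_flops(n):
    """Nominal flop count 5 * N * log2(N) for a complex FFT of power-of-two size N."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    return 5 * n * math.log2(n)


def fft_gflops(n, seconds):
    """Convert one measured transform time into nominal GFLOPs."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return fft_nominal_flops(n) / seconds / 1e9


def fft_seconds_at(n, gflops):
    """Time one transform would take at a given nominal GFLOPs rate."""
    return fft_nominal_flops(n) / (gflops * 1e9)


if __name__ == "__main__":
    n = 1024
    for rate in (1.0, 2.0, 3.0):  # rates mentioned in the thread
        print(f"{n}-point FFT at {rate} GFLOPs: {fft_seconds_at(n, rate) * 1e6:.1f} us")
```

Timing your own FFT sizes on an evaluation board and running them through the same formula gives numbers that are directly comparable across the DSP and ARM candidates.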

  • By "non-native algorithms" I mean functions that are written in plain C, for example (and are not DSP-kernel specific).
    Well, thanks for the help. I have some more studying to do.

    Yoel
  • One can sense a contradiction between the earlier assertion that the Cortex-A8 can't deliver adequate FP performance and the mention that it delivers FFT results on par with the A9. It's a matter of the general-purpose case vs. the specialized case. The assertion is basically that FP code not specifically targeting NEON would suck on the A8. What that means for any particular real-life mixture of general-purpose and specialized code remains generally impossible to answer.