Gene Frantz, TI Principal Fellow and Business Development Manager, DSP
We recently announced a new floating-point core that lets our designers run fixed- and floating-point instructions on the same DSP, opening up many new applications that need this type of precision at fixed-point prices. If you are interested, you can see it here: www.ti.com/c674xgfblog. This announcement got me thinking about floating point again… to be honest, it had been a long time.
Floating point has been around for a very long time. I can remember the day we introduced the TMS320C30 to the industry. There had been a long debate on whether the DSP community would embrace floating point. What we found, to our surprise, was immediate acceptance. But this raised a question in my mind: was the TMS320C30 accepted because it was floating point, because it was 32 bits, or because it was C friendly? Unfortunately, when I asked our system design customers, their answer to the question was “yes.”
Several years later, I spent about half a year visiting many companies using floating-point processors. My hope was to determine why many of our customers were still using the TMS320C30 when we had told them not to. At the same time, the TMS320C67x (a VLIW-based floating-point device) was not being accepted even though we recommended it for new designs. And then there was a third question rolling around in my mind: “Why were we considering the design of yet a third floating-point device without knowing the answer to the first two questions?”
Well, I did find the answer to all three questions – none of which were very satisfying. I’ll leave the answers to another blog. What I want to cover in this blog is something I learned from the study, which had nothing to do with the three questions I had hoped to answer.
What I learned was that floating point, as we defined it, had only one significant market: professional audio. Professional audio first went digital when the Motorola 56K was introduced. That processor gave 24 bits of data accuracy and 56 bits of internal accuracy. This was enough to drive the market until ADI introduced its 21060, which was an extended-precision floating-point device (8-bit exponent and 32-bit mantissa). We, at TI, were beginning to enter this market with our TMS320C67x generation, which did not have extended-precision floating point but could do a pretty good job of double-precision floating point (11-bit exponent and 53-bit mantissa). What I realized was that every application had three accuracy requirements:
It turned out that the key to audio was the coefficient accuracy. And, as we move from 48 kHz sampling to 96 to 192 to 384 kHz, the accuracy requirement only increases.
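One way to see why coefficient accuracy gets harder as the sampling rate goes up is a rough sketch like the one below. It assumes a simple one-pole low-pass filter with a 20 Hz corner, which is my choice for illustration and not something taken from the study; the helper name is hypothetical. The pole of such a filter sits closer and closer to 1.0 as the sampling rate rises, so more fractional bits are needed just to tell the coefficient apart from 1.0.

```python
import math


def coefficient_bits_needed(cutoff_hz, sample_rate_hz):
    """Rough count of fractional bits needed to distinguish the pole of a
    one-pole low-pass filter from 1.0 (hypothetical helper for illustration)."""
    pole = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    return math.ceil(-math.log2(1.0 - pole))


CUTOFF_HZ = 20.0  # assumed low-frequency corner, not a figure from the post

for rate in (48_000, 96_000, 192_000, 384_000):
    # Each doubling of the sampling rate pushes the pole closer to 1.0.
    print(rate, coefficient_bits_needed(CUTOFF_HZ, rate))
```

In this toy case each doubling of the sampling rate costs roughly one more bit of coefficient resolution, and real audio filters of higher order are typically more demanding than this.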
But let’s get back to the point of this blog: when we speak of floating point, we are always speaking of a 32-bit version (or a 64-bit double-precision version). What this means is that floating point is always lower in raw performance compared to a fixed-point processor. But this is determined more by the fact that the fixed-point processor is 16 bits and the floating-point processor is 32 bits than by floating point itself.
So, what if we decided to do a floating-point processor of a different size? Let’s take a simple example of a 16-bit floating-point processor. For now let’s call it “half precision.” For simplicity let’s also assume its format is a 5-bit exponent and an 11-bit mantissa. We now have an interesting set of new questions to answer:
Let me give quick answers to the first and third questions, and then spend a bit of time on the second. There are several applications that could use this “half precision” floating-point format. And there are other formats that might also be interesting.
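To get a feel for what a 16-bit format can and cannot represent, here is a minimal sketch using the IEEE 754 binary16 layout that Python's struct module supports through the "e" format code. Treating the “half precision” format above as binary16 (1 sign bit, 5 exponent bits, 10 stored fraction bits, giving 11 significant bits with the implied leading 1) is my assumption; the format described in this post may divide its bits differently. The helper names are hypothetical.

```python
import struct


def decode_half(bits):
    """Split a 16-bit pattern into its sign, exponent, and fraction fields and
    return them with the value the IEEE 754 binary16 codec assigns to it."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F   # 5-bit exponent field
    fraction = bits & 0x3FF          # 10 stored bits, 11 with the implied leading 1
    value = struct.unpack("<e", struct.pack("<H", bits))[0]
    return sign, exponent, fraction, value


def round_trip(x):
    """Round a Python float to the nearest binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(decode_half(0x3C00))   # the bit pattern for 1.0
print(round_trip(0.1))       # 0.1 after rounding to 11 significant bits
print(round_trip(2049.0))    # above 2048, not every integer is representable
```

The last line is the practical limit to keep in mind: with 11 significant bits, the format carries a little over three decimal digits of precision.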
With those two answered, albeit not satisfactorily, let’s focus on performance. In simple terms, because the multiplier would be 11 x 11 rather than 24 x 24 (single precision), the raw performance could easily be four times that of a 32-bit floating-point device. That is, the 24 x 24 bit multiplier found in a single-precision device could be split into four 12 x 12 bit multipliers, giving four times the multiplies each instruction cycle. It could also be 30% faster than a 16-bit multiply in a fixed-point system. That means the 16-bit floating-point machine could be 4X a present-day floating-point machine and perhaps 30% faster than its 16-bit fixed-point equivalent.
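The four-times figure can be checked with back-of-the-envelope arithmetic. The sketch below counts one-bit partial products in a simple array multiplier, which I am using as a rough proxy for multiplier size; that proxy and the helper name are assumptions for illustration. It says nothing about clock speed, so it does not reproduce the 30% estimate.

```python
def partial_product_bits(width):
    """Number of one-bit partial products in an unsigned width x width array
    multiplier (a rough proxy for multiplier size, not for speed)."""
    return width * width


single = partial_product_bits(24)    # single-precision significand multiplier
quarter = partial_product_bits(12)   # one of the four multipliers after the split
fixed16 = partial_product_bits(16)   # 16-bit fixed-point multiplier
proposed = partial_product_bits(11)  # 11-bit significand of the 16-bit format

print(single // quarter)     # how many 12 x 12 arrays fit in one 24 x 24 array
print(proposed / fixed16)    # size of an 11 x 11 array relative to 16 x 16
```

By this count an 11 x 11 array is a bit under half the size of a 16 x 16 array, which is at least consistent with the idea that it could run faster than its fixed-point equivalent.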
With this in mind, I asked two professor friends of mine to help me push this a bit further. Dr. Armand Tanguay at USC is looking at a possible application area in which 16-bit floating point could be used, and Dr. David Anderson at Georgia Tech is thinking about the 16-bit floating-point math system.
To sum it up, I’m floating a new idea surrounding floating point. As usual, it asks more questions than it answers. But, then, that is the fun of it all.
What are your thoughts on this?
Look back at what Keith Larson did 10-15 years ago. He proposed something like this, except implemented on a fixed-point architecture. I don't remember the details anymore (I'm almost as old as you), but he had ideas for applications that could use this type of implementation.
Gene, Interesting thoughts. It's GREAT to see TI's commitment to floating point continue! However, I'm not 100% sure I see any advantage to 16 bit floating point unless there is a significant cost advantage due to the reduction in logic of the datapath. I suppose that on chip RAM size is a significant cost in a 32 bit processor with large on chip caches. In most devices we are talking these days about 32 bit "native" integer data and 32 bit wide instructions from the start. That's a good thing.
With regard to floating-point acceptance of the 67xx vs. the C3x, I think both were revolutionary, and in the case of the C3x there was a strong acceptance of C as a programming model. (First DSP that had a s/w stack!) It seemed like the VLIW architecture struggled with very long interrupt latency, which hindered analog in / analog out signal processing.
TI has an EXCELLENT Floating Point architecture in the C2000 Digital Signal Controller family.
The 'F28335 and now the 'F28035 have full IEEE754 floating point support and up to 600 MFLOPs! Great alternatives to the C3x legacy and a bargain at twice the price! They might be enhanced for audio with a slightly different peripheral mix, but they fill the bill with low interrupt latency and high-performance, efficient floating-point support.
Hi, I want the evaluation version of CCS for study purposes, as I don't want to buy it. Can anyone provide me the exact link so that I can download it?