Performance of an algorithm on a PC (Intel CPU, 2.6 GHz) versus a DSP (TMS320DM648, 891 MHz)

Other Parts Discussed in Thread: TMS320DM648

Hi, TI compiler experts:

I want to know how performance differs when an algorithm is implemented on a PC (Intel or AMD CPU) and on a DSP (TMS320DM648). I work as a DSP algorithm engineer, so I regularly port algorithms from the PC to the DSP (TMS320DM648) and then optimize them on the DSP. One question always troubles me: I don't know when I can stop optimizing, or what the performance limit on the DSP is.

For example, suppose an algorithm takes A ms on the PC (Intel G620, 2.6 GHz) and B ms on the DSP (DM648, 891 MHz). My question is how A and B relate: is B > A or B < A? Put another way, let C = B/A.

    1. What is the range of C? Does C lie in (0, 1) or in (1, +∞)?

    2. What is the limit of C? What value of C can be regarded as the best achievable performance on the DSP?

Recently, my algorithm ran in about 250 ms on the PC (Intel CPU, 2.6 GHz), and after I ported it to the DSP (DM648, 891 MHz) it took about 330 ms. I want to reduce that from 330 ms to 10 ms or less. Can I achieve that goal? What about 5 ms? What about 1 ms?

Steve

Best regards!
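The figures in this post can be turned into a ratio and a cycle budget. The following is a minimal sketch, assuming the quoted times are pure CPU time at the nominal clock rates; the function names `ratio` and `cycles` are illustrative only and not part of any TI tool.

```python
# Figures quoted in the post above.
PC_MS, PC_HZ = 250, 2_600_000_000      # Intel CPU at 2.6 GHz
DSP_MS, DSP_HZ = 330, 891_000_000      # DM648 at 891 MHz


def ratio(dsp_ms, pc_ms):
    """C = B / A: DSP run time divided by PC run time."""
    return dsp_ms / pc_ms


def cycles(time_ms, clock_hz):
    """Clock cycles elapsed in time_ms at clock_hz (integer arithmetic)."""
    return time_ms * clock_hz // 1000


c = ratio(DSP_MS, PC_MS)
pc_cycles = cycles(PC_MS, PC_HZ)
dsp_cycles = cycles(DSP_MS, DSP_HZ)

# Cycle budget and required speedup for each target run time (in ms).
targets = {t: (cycles(t, DSP_HZ), DSP_MS / t) for t in (10, 5, 1)}

print(f"C = {c:.2f}")
print(f"PC cycles = {pc_cycles:,}, DSP cycles = {dsp_cycles:,}")
for t, (budget, speedup) in targets.items():
    print(f"{t} ms -> {budget:,} cycles, {speedup:.0f}x faster than today")
```

Under that assumption, C is 1.32, yet the DSP run spends fewer clock cycles than the PC run (about 294 million against 650 million), so most of the gap comes from clock rate. Reaching 10 ms would leave a budget of 8.91 million cycles, a 33x reduction from the current figure.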

  • Several factors, combined together, comprise system performance.  Compiler developers (mostly) focus on only one factor: CPU cycle count.  This ignores cycles lost to system effects such as cache misses, memory latency, etc.

    I'm really saying we know less about performance than you think we do.  We'd be happy to help you with a situation where a specific function is too slow, or you think some instruction sequence could be better.  But commenting, in a general way, on the performance difference between a C64+ CPU and an Intel CPU is not something we are able to do.

    Thanks and regards,

    -George
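One way to read the distinction between CPU cycle count and system effects is to split a measured run time into core cycles and cycles lost elsewhere. The sketch below only does that arithmetic. The 330 ms and 891 MHz figures come from the first post; `core_cycles` is a purely hypothetical value standing in for whatever a count of the CPU work alone would report, and both function names are made up for illustration.

```python
def total_cycles(time_ms, clock_hz):
    """Clock cycles that elapse during time_ms at clock_hz."""
    return time_ms * clock_hz // 1000


def stall_share(time_ms, clock_hz, core_cycles):
    """Return (cycles not spent on CPU work, their share of the total).

    core_cycles is the cycle count of the instruction stream alone,
    without cache misses, memory latency, or other system effects.
    """
    total = total_cycles(time_ms, clock_hz)
    stalls = total - core_cycles
    return stalls, stalls / total


# Hypothetical: assume the CPU work alone accounts for 200 million cycles.
stalls, share = stall_share(330, 891_000_000, core_cycles=200_000_000)
print(f"{stalls:,} cycles ({share:.1%}) lost to system effects")
```

With these assumed numbers, roughly a third of the run time would fall outside what compiler-level tuning can address, which is why the two contributions are worth measuring separately.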

  • Hi, George:

    First, I appreciate your advice. But I can't fully agree with you, and I have some opinions I would like to discuss.

    1.

    I think the C64+ CPU is less efficient than an Intel CPU.

    Of course, you may say that ARM and DSP cores are RISC while x86 Intel CPUs are CISC. You may also say that x86 Intel CPUs consume more power than ARM and DSP cores.

    Even so, I still say an x86 Intel CPU is more powerful than a C64+ DSP. As far as I know, the highest clock frequency of a C64+ DSP is about 1 GHz (such as the DM648), or perhaps 1.5 GHz (C6455? I am not sure). I have not heard of a single-core C64+ DSP reaching 2 GHz. Meanwhile, x86 Intel CPUs come in dual-core and quad-core versions with clock frequencies above 3 GHz. Even at lower clocks, taking my PC's x86 CPU (2.6 GHz) as an example, my algorithm runs faster on it than on the C64+ TMS320DM648 (891 MHz).

    So, George, do you agree with me?

    2.

    Another question, about the limit of optimizing an algorithm on a TI C64+ DSP. As a DSP engineer, my basic task is to port algorithms from the PC to the DSP and then optimize them on the C64+ CPU so they run more and more efficiently. It has always puzzled me that I can't tell when an algorithm has no room left for optimization.

    For example, suppose an algorithm takes 100 ms on the DM648 before optimization, and a more skilled DSP optimization expert then optimizes it. How long would the algorithm take? 10 ms? 1 ms? 0 ms? I have no idea. To me the DSP compiler is a black box that I can't deal with.

    In my opinion, however advanced the TI compiler's optimization techniques are, they have a limit. These techniques are software optimization, and optimization that relies on software is less significant than optimization that relies on hardware.

    What's your opinion, George?

  • Do you agree that optimization cannot go on without end? What is the limit of optimization?

  • Hi, my dear TI experts!

    Can anyone help me? These problems have always puzzled me.

  • steve zhang said:
    what was the limit of optimization?

    That is an undecidable problem.  That means it's not just hard, it's effectively impossible to compute.  There is no way to know for sure what the limit of optimization is.  See Kolmogorov complexity, especially the part about Gödel's incompleteness theorem.  It is impossible to put a number on the ultimate limit of optimization and know for sure that the number is correct.  This applies to both hardware and software.

    Many benchmarks attempt to estimate the number, but the results are not necessarily generalizable to other programs you would want to run.  No benchmark is perfect; they cannot be.  It's not possible to create a single number representing the power of a processor/compiler combination.  A processor might be better at tasks of a certain type than another processor, and might be worse at other kinds of tasks.  Clearly, it's not possible to aggregate this into a single number representing relative power of processors.
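Although the ultimate limit cannot be computed, a rough floor for one fixed instruction mix can be estimated from how many instructions the CPU can issue per cycle. The sketch below assumes the C64x+ issue model of eight functional units per cycle, two of which (.D) perform loads and stores and two of which (.M) perform multiplies. The kernel operation counts and iteration count are hypothetical, and the function names are illustrative, not a TI API. The estimate ignores dependencies, cache misses, and any rewrite that changes the instruction mix (for example, packing more data into each instruction), so it bounds one formulation rather than the algorithm itself.

```python
import math

# Assumed C64x+ issue resources per cycle.
UNITS_TOTAL = 8   # .L1 .L2 .S1 .S2 .M1 .M2 .D1 .D2
UNITS_MEM = 2     # loads/stores issue on the .D units
UNITS_MUL = 2     # multiplies issue on the .M units


def resource_bound(mem_ops, mul_ops, other_ops):
    """Lower bound on cycles per loop iteration for a fixed instruction mix."""
    total_ops = mem_ops + mul_ops + other_ops
    return max(
        math.ceil(mem_ops / UNITS_MEM),
        math.ceil(mul_ops / UNITS_MUL),
        math.ceil(total_ops / UNITS_TOTAL),
    )


def min_time_ms(iterations, cycles_per_iter, clock_hz):
    """Run time in ms if every iteration hits the resource bound."""
    return iterations * cycles_per_iter / clock_hz * 1e3


# Hypothetical kernel: 4 loads/stores, 2 multiplies, 6 other operations.
bound = resource_bound(mem_ops=4, mul_ops=2, other_ops=6)
floor_ms = min_time_ms(1_000_000, bound, 891_000_000)
print(f"{bound} cycles/iteration, at least {floor_ms:.2f} ms for 1M iterations")
```

Comparing a measured loop against such a floor gives a practical stopping signal: once the loop runs close to the bound, further gains have to come from changing the instruction mix or the algorithm rather than from more scheduling work.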

  • Archaeologist:

    I really appreciate your advice. But if, as you say, there is no way to know what the limit of optimization is, maybe I should just stop and quit my job now.

    To be honest, my daily work is improving algorithm performance on the DSP (C64+). Optimizing some algorithm is what I have to do every day. If there is no way to know the limit, then I have no way to know when I can stop optimizing. I don't know what counts as bad performance on the DSP (C64+) and what counts as good. I also don't know what run time on the DSP (C64+) is the best performance we can achieve.

    I don't know anything. To me, optimizing an algorithm is like a black box, a bottomless pit. I don't know when I can stop or when I am finished, because there is no goal. It is like entering a long-distance race where no one tells you how many kilometers you must run or where the finish line is. The only thing you know is that you must keep running, and after running there is only more running.

    So I feel very helpless. Optimization is a bottomless pit: you can't see the goal, there is no definite target, no end point. I can't stand it any more. I will give up.

    Best wishes!

  • Why? Will nobody help me?