CCS 3.3 update problem: cross path stalls increased heavily

I was using the CCS 3.3 6482 simulator with cgt 6.0.16, bios 5.31, and SR4 (3.3.50), and everything went well.

A few days ago I updated CCS to cgt 6.0.25, bios 5.33.06, and SR11 (3.3.81.6), compiled the same project, and ran it. Surprisingly, the "Cross Path Stalls" figure in the profile window rose to 3.6%, whereas it was just 0.2% in the version before the update.

I did some tests and found that the SR11 update seems to cause the problem. If I keep SR4 (3.3.50), no increase in cross path stalls is seen, whatever the cgt or bios version is.

What is going on? Regarding cross path stalls with the C64x+ SPLOOP, it is said that cgt 6.0.25 or later could be a fix. What's my problem? I am so bored and confused...

 

 

  • Another question: since the SPLOOP has something to do with hardware, does it mean that the real performance cannot be obtained using just the simulator?

    I currently have no board on hand.

  • The TCI6482 is sold only within a limited market, and support is generally not available on the E2E forum. If this applies to you, then you may contact your TI sales team to determine the correct support channels; if this does not apply to you, you should switch to a similar broad-market DSP like the C6455.

    *** said:
    the SR11 update seems to cause the problem. If I keep SR4(3.3.50), whatever the cgt or bios is, no increasing of cross path stalls could be seen

    This is very good debugging that you have done. And this is also good testimony to why you should stay upgraded to the latest Service Release version. By the way, there is also an SR12 available.

    Since the exact same program code gets different analysis results on the simulator in SR4 vs. the simulator in SR11, things did not get worse from SR4 to SR11. The analysis is now more accurate. We take great pride in making the simulator accurate so we can call it a Cycle Accurate Simulator, and it is very difficult to make a simulator that runs quickly while simulating a wide range of hardware issues, like cross-path delays and other pipeline "features".

    The result of your debug is that by using the updated SR11 you have a more accurate representation of the cross-path stalls in your code. The program did not get worse, the simulator got more accurate. And please update to SR12 when you have a chance.

    *** said:
    it is said the cgt 6.0.25 or later could be a fix. What's my problem?? I am so bored and confused.

    CGT 6.0.25 - You can look at the release notes for information on what is fixed. There may be a newer release than this, and we also have the 6.1.x tool chain released.

    What's my problem?? - If you want the most accurate analysis of your program, you have no problem. It appears that SR11 is more accurate than SR4. Consider moving to SR12 and trying the newer releases of the CGT to find the best answer.

    "bored and confused" - Above are several suggestions that you can try out, so maybe this will help with your boredom, or you can move on to the next phase of your project; I hope my answers have helped with your confusion, although you seem to be very good at your job.

    *** said:
    since the SPLOOP has something to do with hardware, does it mean that using just simulator can not get the real performance?

    Every instruction and pipeline path, and the SPLOOP capability, has something to do with hardware. The CPU Cycle Accurate Simulator will give you very accurate analysis of the operation of the CPU on instructions, but it may not model the memory buses or cache operation or external access timing. The Device Cycle Accurate Simulator generally adds models for the memory and cache and some peripherals, but there are limitations to how accurate it can be. The only true measure of performance is to use hardware, but the simulator is much easier for getting detailed analysis of things like cross-path stalls so that you can optimize your performance.

  • Thanks a lot for the fast reply and your patience!

    An increase in cross path stalls means an increase in cost. I am working on a codec project, and the performance degraded by 0.3 MCPS, as calculated using the clock() function. That is just unacceptable to the client. One enc & dec process costs more cycles, so things seem to have become worse here.

    I tried SR12 earlier, and things got even worse: more cycles for one enc/dec.

    You've helped a lot, thanks again.

    Is there a forum or anything like that where I can get more info?
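For readers following along, the clock()-based MCPS measurement mentioned in this post could look roughly like the sketch below. It is not the poster's actual code: `process_frame_stub`, `ticks_per_call`, and `cycles_to_mcps` are hypothetical names, and the sketch assumes that one clock() tick corresponds to one CPU cycle, which depends on the target and on how the tools are set up (on a desktop host the tick unit is different).

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for one encode or decode call. */
static volatile unsigned long sink;

void process_frame_stub(void)
{
    unsigned long i;
    for (i = 0; i < 1000UL; i++) {
        sink += i;
    }
}

/* Average clock() ticks per call of fn, with the cost of calling
   clock() itself subtracted. Whether a tick equals a CPU cycle is
   an assumption that depends on the platform. */
double ticks_per_call(void (*fn)(void), int iterations)
{
    clock_t start, stop, overhead, elapsed;
    int i;

    assert(fn != 0 && iterations > 0);

    /* Back-to-back calls estimate the overhead of clock() itself. */
    start = clock();
    stop = clock();
    overhead = stop - start;

    start = clock();
    for (i = 0; i < iterations; i++) {
        fn();
    }
    stop = clock();

    elapsed = stop - start;
    if (elapsed < overhead) {
        return 0.0; /* measurement is below the timer resolution */
    }
    return (double)(elapsed - overhead) / iterations;
}

/* Convert a per-frame cycle count into MCPS
   (millions of cycles per second) for a given frame rate. */
double cycles_to_mcps(double cycles_per_frame, double frames_per_second)
{
    return cycles_per_frame * frames_per_second / 1.0e6;
}
```

As an illustration with made-up numbers, 20,000 cycles per frame at 50 frames per second works out to 1.0 MCPS, so a small rise in per-frame stall cycles shows up directly in the MCPS figure.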

  • If your questions are about the simulator and its accuracy, the experts are in the Code Composer Forum under Development Tools.
    If your questions are about the Compiler tools and how they can be invoked to improve performance, the experts are in the TI C/C++ Compiler - Forum under Development Tools.
    If your questions are about the C6455 and how its components work, this is the right place (C64x Single Core DSP Forum).

    SR12 is most likely more accurate. If at all possible, you should get your measurements from hardware if you are going to make system cost tradeoffs.

    Is your codec compiled C or assembly? The compiler is usually pretty good at avoiding stalls, so perhaps newer compiler tools will give you better results.

  • There are both C functions and assembly files, and we did some optimization.

    Just as you said in your reply, a new compiler should be good at avoiding stalls; that is why I take it as a problem when the result went in the opposite direction.

    Thanks, RandyP, I appreciate your help. You are such a kind guy.