Running code in Release mode via PC

Other Parts Discussed in Thread: TMS320C6657

I'm running CCSv6 on a TMS320C6657 with an XDS2xx. My code works in debug mode, but very slowly. I want to run it from a command line in Release mode without bothering with debugging.

How is that done?

  • Hi,

    You can either use loadti to simply load the code to the target board or, if you need more control, Debug Server Scripting (DSS) is the right tool for the job.

    Hope this helps,

    Rafael

  • LoadTI worked - thanks.

    My project deals with acoustic signals and is heavily into matrix algebra. I have the same code running in gcc on 3 different computers: Beagle Bone Black, a 2.8GHz Dell desktop and a TMS320C6657.

    The PC is the fastest, processing a data sample in 60 msec, the BBB about 1/3 this speed. The TMS320C6657 is very slow, taking minutes to process each data sample. Clearly I'm doing something wrong: any idea what?

    Thanks, Peter
  • Peter,

    Thanks for reporting your findings.

    Without knowing exactly what the code does, I can only say there are probably several places where it can be optimized to make use of the digital signal processing extensions of the C6657. I would start with the excellent wiki page below:

    processors.wiki.ti.com/.../Optimization_Techniques_for_the_TI_C6000_Compiler

    The first link is a good overview to quickly start obtaining more optimized results from your existing code - especially if you can adapt your code to use the DSPLIB functions that are specialized in matrix math.

    Deeper optimization techniques for the compiler can be seen further in the page (section 5).

    Finally, check the last chapters of the C6000 Optimization workshop for a system-level approach to the optimization process, which includes cache and memory aspects.
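As a rough illustration of the kind of loop this compiler-level advice applies to, here is a minimal sketch of a plain C matrix multiply. The function name `matmul` and its parameters are hypothetical and are not the DSPLIB API. The `restrict` qualifier is standard C99; it tells the compiler that the three arrays do not overlap, which generally gives an optimizing compiler more freedom to reorder loads and stores inside the inner loop. How much this helps on a given build is something to measure, not assume.

```c
#include <stddef.h>

/* Hypothetical helper, not a DSPLIB function.
   Computes c = a * b for row-major matrices:
   a is n x m, b is m x p, c is n x p.
   'restrict' promises that a, b and c do not alias each other. */
void matmul(const float *restrict a, const float *restrict b,
            float *restrict c, int n, int m, int p)
{
    int i, j, k;

    for (i = 0; i < n; i++) {
        for (k = 0; k < p; k++) {
            float sum = 0.0f;              /* accumulate in a local, not in c[] */
            for (j = 0; j < m; j++)
                sum += a[i * m + j] * b[j * p + k];
            c[i * p + k] = sum;            /* one store per output element */
        }
    }
}
```

Accumulating into a local variable rather than into `c[]` directly keeps the inner loop free of stores, which is usually easier for a compiler to schedule.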

    Hope this helps,
    Rafael
  • Thanks for the help. I'd like to be sure I'm starting with the code running as fast as possible. So far I have:
    i) built the Release version rather than debug
    ii) eliminated as many printfs() as possible
    iii) run with loadti

    I'm dubious because my code runs no faster like this than in the debug mode of CCSv6.
    The speedup I need is so huge that I feel I'm missing something important.
    Peter
  • Just curious. Not sure if I can help. Do those long times include loading the program, loading the input, and outputting the results? I am assuming you are loading everything through the JTAG. Bare metal? TI-RTOS?
  • No: the long time is what's needed to process a lump of data. On the BBB each lump takes about 1/3 second, on the PC 1 second, on the TI hardware more than 80 seconds. I don't really understand what the JTAG is doing or what you mean by 'bare metal'.
  • Sounds like you are running "bare metal", that is, with no operating system. Let's say that you have the following example code:

    #include <stdio.h>

    #define MAXN 65536
    float g_input[MAXN];
    float g_output[MAXN];

    /* Read MAXN samples from standard input. */
    void getdata(void)
    {
        int i;
        for (i = 0; i < MAXN; i++)
            scanf("%f", &g_input[i]);
    }

    /* Process the data: here, simply reverse the input. */
    void procdata(void)
    {
        int i, j;
        for (i = 0, j = MAXN - 1; i < MAXN; i++, j--)
            g_output[i] = g_input[j];
    }

    /* Write MAXN results to standard output. */
    void putdata(void)
    {
        int i;
        for (i = 0; i < MAXN; i++)
            printf("%f\n", g_output[i]);
    }

    int main(int argc, char *argv[])
    {
        getdata();
        procdata();
        putdata();
        return 0;
    }

    The getdata() and putdata() can be quite slow on development boards because data has to travel between the PC and board via the slow JTAG connector. On the PC, it's more or less direct from a virtual file. The procdata() part will depend on the clock speed and processor design. As you found out, the difference between debug and release will not be huge.
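One way to check where the time actually goes is to time the processing step on its own, with no I/O inside the timed region. The sketch below reuses `procdata()` from the example and adds two hypothetical helpers, `fill_input()` and `time_procdata()`, that are not part of the original code. It uses the standard C `clock()` function; whether `clock()` has useful resolution on the target board is an assumption, and a hardware cycle counter may be a better choice there.

```c
#include <stdio.h>
#include <time.h>

#define MAXN 65536
float g_input[MAXN];
float g_output[MAXN];

/* Same processing step as in the example: reverse the input. */
void procdata(void)
{
    int i, j;
    for (i = 0, j = MAXN - 1; i < MAXN; i++, j--)
        g_output[i] = g_input[j];
}

/* Hypothetical helper: fill the input with known values in memory,
   so that no scanf() traffic is needed before timing. */
void fill_input(void)
{
    int i;
    for (i = 0; i < MAXN; i++)
        g_input[i] = (float)i;
}

/* Hypothetical helper: run procdata() 'reps' times and return the
   average time per call in seconds, as measured by clock(). */
double time_procdata(int reps)
{
    clock_t start, stop;
    int r;

    start = clock();
    for (r = 0; r < reps; r++)
        procdata();
    stop = clock();

    return ((double)(stop - start) / CLOCKS_PER_SEC) / reps;
}
```

Calling `fill_input()` and then `time_procdata(10)` from `main()`, and printing the result with a single `printf()` after the measurement, keeps the slow host I/O out of the number being measured. If that figure is small while the whole run is still slow, the time is going into data transfer and not into the processing itself.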