This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Slow cio printf after update CCS 6.0.1 to 6.1.0

Dear TI support,

I'm using primarily ezdsp5502 board and Debian GNU/Linux (Jessie/testing). After update CCS 6.0.1 to 6.1.0 the output of printf became noticeably slower. I've tried to replace some emupack's shared libraries and found out, that it depends on:

ccsv6/ccs_base/emulation/drivers/libtixds55x.so

By just replacing this file, I've obtained following results:

ti_emupack_setup_5.1.507.0.bin <-- normal, the same as in CCS 6.0.1
ti_emupack_setup_5.1.600.0.bin <-- slow
ti_emupack_setup_5.1.636.0.bin <-- slow
ti_emupack_setup_5.1.641.0.bin <-- slow, the same as in CCS 6.1.0

Can this be fixed?

Regards,
Andrey

  • Hello Andrey,
    What kind of benchmarks did you run to determine the speed? Do you have any numbers? And what kind of printf activity are you doing? A small code snippet would be useful here.

    Thanks
    ki
  • Ki,

    I wanted to do some benchmarking, but I haven't managed how to do it. It seems to be impossible to acquire spent time from C code, because CPU is being halted for this and ticks don't get incremented. Is there any way to do this with some events within C code and listening on these from script?

    But anyway it can be seen just by looking at 10 printf like this:
    printf("0123456789012345678901234567890123456789\n");

    The difference is evident.

    Thanks,
    Andrey
  • > Is there any way to do this with some events within C code and listening on these from script?

    Meanwhile I got in the DSS scripting. I still see no events, except of using breakpoints. So I could post the code. But if you have some 55xx board with rev2 CPU, you could reproduce it as I described.
  • Hi,

    so if you didn't believe me, I did benchmarking as well. The program, linker and .out files are attached: test_printf.tar.gz.

    The relevant part of script:

    debugSession.memory.loadProgram("projects/test_printf/Debug/test_printf.out");
    endBP = debugSession.breakpoint.add("main.c", 38);
    print("time measurement...");
    time1 = Date.now();
    debugSession.target.run();
    time2 = Date.now();
    print("time measurement... done");
    timediff = new Date(time2-time1);
    print("elapsed: " + timediff.getSeconds() + "." + timediff.getMilliseconds() + "s");

    I get following results:

    CCS 6.0.1:

    $ bash /opt/ti601/ccsv6/ccs_base/scripting/bin/dss.sh test_printf.js 
    test printf performance started
    
    opening debug session...
    
    opening debug session... done
    
    C55xx: GEL Output: 
    OnTargetConnect()
    
    C55xx: GEL Output: 
    OnReset()
    
    C55xx: GEL Output: PLL configured successfully.
    
    C55xx: GEL Output: CE1 configured for 16-bit asynchronous memory.
    
    C55xx: GEL Output: CE3 configured for 16-bit Ethernet.
    
    C55xx: GEL Output: CE0 configured for 32-bit SDRAM memory.
    
    C55xx: GEL Output: GEL StartUp Complete.
    
    loading program...
    
    C55xx: GEL Output: 
    OnPreFileLoaded()
    
    C55xx: GEL Output: PLL configured successfully.
    
    C55xx: GEL Output: CE1 configured for 16-bit asynchronous memory.
    
    C55xx: GEL Output: CE3 configured for 16-bit Ethernet.
    
    C55xx: GEL Output: CE0 configured for 32-bit SDRAM memory.
    
    C55xx: GEL Output: 
    OnRestart()
    
    loading program... done
    
    time measurement...
    0123456789012345678901234567890123456789
    1234567890123456789012345678901234567890
    2345678901234567890123456789012345678901
    3456789012345678901234567890123456789012
    4567890123456789012345678901234567890123
    5678901234567890123456789012345678901234
    6789012345678901234567890123456789012345
    7890123456789012345678901234567890123456
    8901234567890123456789012345678901234567
    9012345678901234567890123456789012345678
    0123456789012345678901234567890123456789
    1234567890123456789012345678901234567890
    2345678901234567890123456789012345678901
    3456789012345678901234567890123456789012
    4567890123456789012345678901234567890123
    5678901234567890123456789012345678901234
    6789012345678901234567890123456789012345
    7890123456789012345678901234567890123456
    8901234567890123456789012345678901234567
    9012345678901234567890123456789012345678
    time measurement... done
    elapsed: 0.964s
    $

    CCS 6.1.0:

    $ bash /opt/ti/ccsv6/ccs_base/scripting/bin/dss.sh test_printf.js 
    test printf performance started
    
    opening debug session...
    
    opening debug session... done
    
    C55xx: GEL Output: 
    OnTargetConnect()
    
    C55xx: GEL Output: 
    OnReset()
    
    C55xx: GEL Output: PLL configured successfully.
    
    C55xx: GEL Output: CE1 configured for 16-bit asynchronous memory.
    
    C55xx: GEL Output: CE3 configured for 16-bit Ethernet.
    
    C55xx: GEL Output: CE0 configured for 32-bit SDRAM memory.
    
    C55xx: GEL Output: GEL StartUp Complete.
    
    loading program...
    
    C55xx: GEL Output: 
    OnPreFileLoaded()
    
    C55xx: GEL Output: PLL configured successfully.
    
    C55xx: GEL Output: CE1 configured for 16-bit asynchronous memory.
    
    C55xx: GEL Output: CE3 configured for 16-bit Ethernet.
    
    C55xx: GEL Output: CE0 configured for 32-bit SDRAM memory.
    
    C55xx: GEL Output: 
    OnRestart()
    
    loading program... done
    
    time measurement...
    0123456789012345678901234567890123456789
    1234567890123456789012345678901234567890
    2345678901234567890123456789012345678901
    3456789012345678901234567890123456789012
    4567890123456789012345678901234567890123
    5678901234567890123456789012345678901234
    6789012345678901234567890123456789012345
    7890123456789012345678901234567890123456
    8901234567890123456789012345678901234567
    9012345678901234567890123456789012345678
    0123456789012345678901234567890123456789
    1234567890123456789012345678901234567890
    2345678901234567890123456789012345678901
    3456789012345678901234567890123456789012
    4567890123456789012345678901234567890123
    5678901234567890123456789012345678901234
    6789012345678901234567890123456789012345
    7890123456789012345678901234567890123456
    8901234567890123456789012345678901234567
    9012345678901234567890123456789012345678
    time measurement... done
    elapsed: 5.71s
    $

    As you can see, it's almost 6 time slower, thus noticeable without benchmarking at all.

    Hope, you can fix the regression.

    Thanks,
    Andrey

  • I have been doing some benchmarks myself using a 5515ezdsp with CCSv6.0.1 (emu pack 5.1.507.2) and v6.1.0 (emu pack 5.1.641.0) on Ubuntu 14.04 and I have not seen any slowdown. I am benchmarking using the profile clock to count the cycles to execute each printf and they are identical. Can you try using the profile clock to see if you see differences there?

    Thanks
    ki
  • Ki,

    they are indeed equally. I used following to do measurements:

    endBP = debugSession.breakpoint.add("main.c", 38);
    print("time measurement...");
    debugSession.clock.enable();
    time1 = Date.now();
    debugSession.target.run();
    time2 = Date.now();
    clock_count = debugSession.clock.read();
    debugSession.clock.disable();

    clock_count is in both cases 209130. But it is also supposed to be, since the same code is loaded, thus the CPU processes the same instructions, resulted from printf (RTS library). So there is no direct dependency on emupacks at all. At least this is how I see the process. If something is more complicated, your clarifications are welcome.

    The difference in wall clock depends on what the driver code from emupacks does when the CPU is halted on a breakpoint in printf to copy the preformated data from the CPU to the host. And this seems to be broken.

    By the way, I measured also fprintf performance. It is also slower.

    Regards,
    Andrey

  • Hi,

    I did further investigation. I compared debugserver logs and it turned out, that GTI_STAT takes 100ms longer. And there is 2 GTI_STAT calls per cio breakpoint, thus almost exactly matching the performance penalty.

    Then I looked into GTI_STAT in libtixds55x.so with objdump and found new usleep call! After overwriting 5 bytes (call to usleep) with 0x90 (nop) the performance seems to be recovered.

    The question is, why libtixds55x.so programmers added a call to sleep for such a extremely long time of 100ms (!) into such frequently called function?

    Regards,
    Andrey

  • Hi Andrey,
    My apologies for the delayed response. I have notified the engineers of your analysis with a request for a response.

    ki
  • At the end of last year we had a problem with GTI_STAT() and the XDS200 debug probes where calling GTI_STAT() too often, too quickly caused a race condition in the XDS200 firmware that would hang the communications channel for 15 seconds.  The main problem this caused was that flash programming on certain DSPs failed nearly 100% of the time for this issue.  While the developer of the debug probe was working on the real fix, we put in the delay as a temporary workaround.  

    The root cause has been fixed, and the delay will be removed for the next official release of the emupack.