This thread has been locked.
If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.
Dear TI support,
I'm using primarily ezdsp5502 board and Debian GNU/Linux (Jessie/testing). After update CCS 6.0.1 to 6.1.0 the output of printf became noticeably slower. I've tried to replace some emupack's shared libraries and found out, that it depends on:
ccsv6/ccs_base/emulation/drivers/libtixds55x.so
By just replacing this file, I've obtained following results:
ti_emupack_setup_5.1.507.0.bin <-- normal, the same as in CCS 6.0.1
ti_emupack_setup_5.1.600.0.bin <-- slow
ti_emupack_setup_5.1.636.0.bin <-- slow
ti_emupack_setup_5.1.641.0.bin <-- slow, the same as in CCS 6.1.0
Can this be fixed?
Regards,
Andrey
Hi,
so if you didn't believe me, I did benchmarking as well. The program, linker and .out files are attached: test_printf.tar.gz.
The relevant part of script:
debugSession.memory.loadProgram("projects/test_printf/Debug/test_printf.out"); endBP = debugSession.breakpoint.add("main.c", 38); print("time measurement..."); time1 = Date.now(); debugSession.target.run(); time2 = Date.now(); print("time measurement... done"); timediff = new Date(time2-time1); print("elapsed: " + timediff.getSeconds() + "." + timediff.getMilliseconds() + "s");
I get following results:
CCS 6.0.1:
$ bash /opt/ti601/ccsv6/ccs_base/scripting/bin/dss.sh test_printf.js test printf performance started opening debug session... opening debug session... done C55xx: GEL Output: OnTargetConnect() C55xx: GEL Output: OnReset() C55xx: GEL Output: PLL configured successfully. C55xx: GEL Output: CE1 configured for 16-bit asynchronous memory. C55xx: GEL Output: CE3 configured for 16-bit Ethernet. C55xx: GEL Output: CE0 configured for 32-bit SDRAM memory. C55xx: GEL Output: GEL StartUp Complete. loading program... C55xx: GEL Output: OnPreFileLoaded() C55xx: GEL Output: PLL configured successfully. C55xx: GEL Output: CE1 configured for 16-bit asynchronous memory. C55xx: GEL Output: CE3 configured for 16-bit Ethernet. C55xx: GEL Output: CE0 configured for 32-bit SDRAM memory. C55xx: GEL Output: OnRestart() loading program... done time measurement... 0123456789012345678901234567890123456789 1234567890123456789012345678901234567890 2345678901234567890123456789012345678901 3456789012345678901234567890123456789012 4567890123456789012345678901234567890123 5678901234567890123456789012345678901234 6789012345678901234567890123456789012345 7890123456789012345678901234567890123456 8901234567890123456789012345678901234567 9012345678901234567890123456789012345678 0123456789012345678901234567890123456789 1234567890123456789012345678901234567890 2345678901234567890123456789012345678901 3456789012345678901234567890123456789012 4567890123456789012345678901234567890123 5678901234567890123456789012345678901234 6789012345678901234567890123456789012345 7890123456789012345678901234567890123456 8901234567890123456789012345678901234567 9012345678901234567890123456789012345678 time measurement... done elapsed: 0.964s $
CCS 6.1.0:
$ bash /opt/ti/ccsv6/ccs_base/scripting/bin/dss.sh test_printf.js test printf performance started opening debug session... opening debug session... done C55xx: GEL Output: OnTargetConnect() C55xx: GEL Output: OnReset() C55xx: GEL Output: PLL configured successfully. C55xx: GEL Output: CE1 configured for 16-bit asynchronous memory. C55xx: GEL Output: CE3 configured for 16-bit Ethernet. C55xx: GEL Output: CE0 configured for 32-bit SDRAM memory. C55xx: GEL Output: GEL StartUp Complete. loading program... C55xx: GEL Output: OnPreFileLoaded() C55xx: GEL Output: PLL configured successfully. C55xx: GEL Output: CE1 configured for 16-bit asynchronous memory. C55xx: GEL Output: CE3 configured for 16-bit Ethernet. C55xx: GEL Output: CE0 configured for 32-bit SDRAM memory. C55xx: GEL Output: OnRestart() loading program... done time measurement... 0123456789012345678901234567890123456789 1234567890123456789012345678901234567890 2345678901234567890123456789012345678901 3456789012345678901234567890123456789012 4567890123456789012345678901234567890123 5678901234567890123456789012345678901234 6789012345678901234567890123456789012345 7890123456789012345678901234567890123456 8901234567890123456789012345678901234567 9012345678901234567890123456789012345678 0123456789012345678901234567890123456789 1234567890123456789012345678901234567890 2345678901234567890123456789012345678901 3456789012345678901234567890123456789012 4567890123456789012345678901234567890123 5678901234567890123456789012345678901234 6789012345678901234567890123456789012345 7890123456789012345678901234567890123456 8901234567890123456789012345678901234567 9012345678901234567890123456789012345678 time measurement... done elapsed: 5.71s $
As you can see, it's almost 6 time slower, thus noticeable without benchmarking at all.
Hope, you can fix the regression.
Thanks,
Andrey
Ki,
they are indeed equally. I used following to do measurements:
endBP = debugSession.breakpoint.add("main.c", 38); print("time measurement..."); debugSession.clock.enable(); time1 = Date.now(); debugSession.target.run(); time2 = Date.now(); clock_count = debugSession.clock.read(); debugSession.clock.disable();
clock_count is in both cases 209130. But it is also supposed to be, since the same code is loaded, thus the CPU processes the same instructions, resulted from printf (RTS library). So there is no direct dependency on emupacks at all. At least this is how I see the process. If something is more complicated, your clarifications are welcome.
The difference in wall clock depends on what the driver code from emupacks does when the CPU is halted on a breakpoint in printf to copy the preformated data from the CPU to the host. And this seems to be broken.
By the way, I measured also fprintf performance. It is also slower.
Regards,
Andrey
Hi,
I did further investigation. I compared debugserver logs and it turned out, that GTI_STAT takes 100ms longer. And there is 2 GTI_STAT calls per cio breakpoint, thus almost exactly matching the performance penalty.
Then I looked into GTI_STAT in libtixds55x.so with objdump and found new usleep call! After overwriting 5 bytes (call to usleep) with 0x90 (nop) the performance seems to be recovered.
The question is, why libtixds55x.so programmers added a call to sleep for such a extremely long time of 100ms (!) into such frequently called function?
Regards,
Andrey
At the end of last year we had a problem with GTI_STAT() and the XDS200 debug probes where calling GTI_STAT() too often, too quickly caused a race condition in the XDS200 firmware that would hang the communications channel for 15 seconds. The main problem this caused was that flash programming on certain DSPs failed nearly 100% of the time for this issue. While the developer of the debug probe was working on the real fix, we put in the delay as a temporary workaround.
The root cause has been fixed, and the delay will be removed for the next official release of the emupack.