I am starting to get my feet wet developing code for the 674x DSP core. I am using the TI816x device and, as such, I am working with the xdc tools and cgt packages that are included with the ezsdk. I have been working through the RTSC tutorial and trying to convert the samples to build ELF binaries (as required by the syslink/sysbios system on the TI816x device). In the process I have found that the same "Hello World" sample app in lesson 5 takes a different number of cycles to execute in the simulator depending on whether it is compiled to COFF or to ELF, with the ELF build taking more cycles. This is without whole-program optimization. Here are the results (prog.xe674 is the ELF build, prog.x674 is the COFF build):
$ /home/bj/ti-ezsdk_dm816x-evm_5_01_00_77/xdctools_3_20_08_88/packages/ti/platforms/sim64Pxx/Linux/kelvin prog.xe674
simulating Joule FP ISA
Hello World
Simulation done:
Total cycles: 4391
Core cycles (excl. stalls): 4390 ( 99.98%)
Nop cycles: 1977 ( 45.02%)
Stall cycles and overlapped stall cycles
Total stall cycles: 1 ( 0.02%)
XP : 1 ( 0.02%)
[snip]

$ /home/bj/ti-ezsdk_dm816x-evm_5_01_00_77/xdctools_3_20_08_88/packages/ti/platforms/sim64Pxx/Linux/kelvin prog.x674
simulating Joule FP ISA
Hello World
Simulation done:
Total cycles: 3480
Core cycles (excl. stalls): 3455 ( 99.28%)
Nop cycles: 1471 ( 42.27%)
Stall cycles and overlapped stall cycles
Total stall cycles: 25 ( 0.72%)
XP : 25 ( 0.72%)
[snip]
That is a pretty significant overhead (4391 vs. 3480 total cycles, roughly 26% more) just for a different executable format. Does anyone have any idea why this would be? Would whole-program optimization make this go away? (I am still trying to figure out how to turn that on.)
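For reference, here is a minimal sketch of how the overhead figure falls out of the two simulator runs. The cycle counts are copied from the output above; the helper name overhead_percent is just something made up for this illustration and is not part of any TI tool.

```python
def overhead_percent(baseline_cycles, other_cycles):
    """Return how much larger other_cycles is than baseline_cycles, in percent."""
    return 100.0 * (other_cycles - baseline_cycles) / baseline_cycles


# Total cycle counts reported by the simulator for each build.
coff_total = 3480  # prog.x674 (COFF)
elf_total = 4391   # prog.xe674 (ELF)

# NOP cycle counts from the same two runs.
coff_nops = 1471
elf_nops = 1977

extra_cycles = elf_total - coff_total
extra_nops = elf_nops - coff_nops
overhead = overhead_percent(coff_total, elf_total)

print(f"extra total cycles in ELF build: {extra_cycles}")
print(f"extra NOP cycles in ELF build:   {extra_nops}")
print(f"ELF overhead relative to COFF:   {overhead:.1f}%")
```

More than half of the extra cycles are NOP cycles, which may or may not be a hint about where the difference comes from.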
Are there any suggestions as to how to make this difference go away?
TIA, B.J.