DEBUG is stable but RELEASE is unstable for my algorithm

My target is a DM648, developed with CCS 3.3 SR11 and code generation tools 7.0.4.

The debug version of my algorithm runs stably with the options ‘-g -mu’, while the release version built with ‘-o2’ halts the system at random. Originally I had other compiler options such as ‘-mt -mf5’; to narrow down the problem I removed them all except ‘-o2’, but the problem persists. I also tried ‘-o3’, and the system became even more unstable.

When the system halts, the emulator drops the JTAG link and I cannot debug the problem at all. I then have to power the target off and on again to reconnect. This kind of failure seems much lower level than a normal failure such as ‘UTL_Halt()’. The system does tend to halt at particular points or algorithm stages, but not always at the same place.

The release version with ‘-o2’ alone already meets my speed requirement. What should I do to investigate this issue?

  • In my experience, anytime my system 'crashed' like this when the JTAG was connected it was an unbounded pointer read/write.  It could also be a hardware contention, like trying to set the state of an input pin opposite of an external drive source.  As to why your system works without optimization but not with, that can be difficult to track down.  You might compare your memory map file (.map file in your debug folder) to see if there are any differences in the memory layouts.  If it fails at a certain code point, you might try to allocate an unused initialized array on either side of the memory used by that code section, then see if any of those values are being overwritten by the code.  Could be that you always have an unbounded pointer, but that it only affects critical sections when you are optimized?  Also, do you have any variables that should be but aren't declared volatile that would be treated differently under optimization?
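The guard-array idea from the reply above could be sketched roughly as follows. This is a minimal illustration and not code from the original project: the names `guarded_buf_t`, `guard_init`, `guard_check`, the pattern value, and the buffer sizes are all assumptions. The guards are wrapped in one struct so that they stay adjacent to the working memory no matter how the linker orders sections.

```c
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS   16u          /* assumed guard size, tune as needed   */
#define GUARD_PATTERN 0xDEADBEEFu  /* any value unlikely to occur by chance */

/* Hypothetical layout: initialized guard arrays on either side of the
 * memory that the suspect code writes to. */
typedef struct {
    uint32_t guard_lo[GUARD_WORDS];
    uint8_t  work[256];            /* memory used by the suspect code */
    uint32_t guard_hi[GUARD_WORDS];
} guarded_buf_t;

/* Fill both guards with the known pattern. */
void guard_init(guarded_buf_t *b)
{
    size_t i;
    for (i = 0; i < GUARD_WORDS; i++) {
        b->guard_lo[i] = GUARD_PATTERN;
        b->guard_hi[i] = GUARD_PATTERN;
    }
}

/* Returns 0 if both guards are intact, -1 if the lower guard was
 * overwritten, +1 if the upper guard was overwritten. The lower guard
 * is checked first. */
int guard_check(const guarded_buf_t *b)
{
    size_t i;
    for (i = 0; i < GUARD_WORDS; i++) {
        if (b->guard_lo[i] != GUARD_PATTERN) {
            return -1;
        }
    }
    for (i = 0; i < GUARD_WORDS; i++) {
        if (b->guard_hi[i] != GUARD_PATTERN) {
            return 1;
        }
    }
    return 0;
}
```

Calling `guard_check` after each algorithm stage (and logging a nonzero result) would show which stage, if any, writes outside `work`. It only catches overruns that land in the guards, so a clean result does not prove the pointers are well behaved.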

  • It is possible that a compiler bug is the cause of this problem.  But, in my experience, an error of the sort described by Mr. Lipsey is more often the cause.  Finding such an error is the challenge.

    One suggestion ... Build with -o -mu (no -g).  I suspect this will work.  If it does, then that proves the problem is related to some loop being software pipelined.  The option -mu disables software pipelining.

    A follow-on suggestion ... Remove the -mu from the build one file at a time, until it fails again.  This will tell you which file contains the problem loop.  Once you have found the file, you could repeat the process on a per function basis by moving each function into its own file, building that function/file without -mu.  Once that fails, you have found the function with the problem loop.  

    Once the problem is sufficiently isolated, you can start to consider some of the suggestions from Mr. Lipsey.

    I hope my suggestions are practical for you.

    Thanks and regards,

    -George

  • Hi Georgem, I tried -o -mu with the release version and it also failed, in the same way as before. CCS reports two errors:

     

    Trouble Halting Target CPU: Error 0x00000020/-1060 Error during: Execution,  An unknown error prevented the emulator from accessing the processor in a timely fashion. It is recommended to RESET EMULATOR.  This will disconnect each  target from the emulator.  The targets should then be power cycled or hard reset followed by an emureset and reconnect to each target.  

    Failed to remove the debug state from the target before disconnecting.  There may still be breakpoint opcodes embedded in program memory.  It is recommended that you reset the emulator before you connect and reload your program before you continue debugging.

    The DEBUG and RELEASE MAP file are in the attachment.

    0602.mapfile.zip

     

  • Are you writing past the edge of your "trace$buf"? It's at the end of memory on your debug build, but an overrun would write over several things in your release build.
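To make the concern about writing past the end of a buffer concrete, here is a small sketch of a bounds-checked append. It is not the DSP/BIOS LOG API and not the poster's code: `trace_buf_t`, `trace_init`, `trace_write`, and the 64-byte size are hypothetical names and values chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

#define TRACE_BUF_SIZE 64u   /* assumed size for the example */

/* Hypothetical application-level trace buffer. */
typedef struct {
    char          data[TRACE_BUF_SIZE];
    size_t        used;      /* bytes currently stored                 */
    unsigned long dropped;   /* bytes rejected because the buffer was full */
} trace_buf_t;

void trace_init(trace_buf_t *t)
{
    memset(t, 0, sizeof *t);
}

/* Appends at most len bytes and never writes past the end of data[].
 * Returns the number of bytes actually stored. */
size_t trace_write(trace_buf_t *t, const char *src, size_t len)
{
    size_t room = TRACE_BUF_SIZE - t->used;
    size_t n = (len < room) ? len : room;

    memcpy(&t->data[t->used], src, n);
    t->used += n;
    t->dropped += (unsigned long)(len - n);
    return n;
}
```

A nonzero `dropped` count is a cheap signal that the unchecked version of the same write would have overrun the buffer and corrupted whatever the linker placed after it.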

  • vcar said:
    I have tried with -o -mu with the release version and it also failed

    Well, that's progress.  That means software pipelining has nothing to do with the problem.  Optimization does.

    To summarize, it works when you build with just -mu, but fails if you build with -mu -o.  So, to isolate the problem file, build the system with just -mu, but add -o to one file at a time.  When the bug returns, you have found the problem file.  If needed, repeat the process on a per function basis, by isolating each function in that file in a temporary file of its own.  When the failure returns, you have found the problem function.  At that point, you should be able to zero in on the issue.

    Thanks and regards,

    -George

     

  • MattLipsey said:

    Are you writing past the edge of your "trace$buf"? It's at the end of memory on your debug build, but would write over several things in your release build.

    Hi MattLipsey,

    I have moved trace$buf to a dedicated memory range at the end of DDR2, and the problem is still there. So this is probably not the source of the problem.

    I also rechecked my use of the volatile keyword and added it to the variables shared between threads. The problem is still there.
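For readers unfamiliar with why volatile matters under optimization, a minimal sketch follows. The names `data_ready`, `on_data_ready_isr`, and `wait_for_data` are hypothetical and not taken from the poster's project. Without the volatile qualifier, an optimizing compiler is allowed to read the flag once and keep it in a register, so the polling loop might never observe a change made by an interrupt or another thread.

```c
#include <stdint.h>

/* Hypothetical flag written by an ISR or another thread. The volatile
 * qualifier forces the compiler to re-read it from memory on each access. */
static volatile uint32_t data_ready = 0;

/* Stand-in for the interrupt handler or producer thread. */
void on_data_ready_isr(void)
{
    data_ready = 1;
}

/* Polls the flag up to max_polls times.
 * Returns 1 and clears the flag if it was set, 0 on timeout. */
int wait_for_data(uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if (data_ready) {
            data_ready = 0;
            return 1;
        }
    }
    return 0;
}
```

Note that volatile only prevents the compiler from caching or eliminating the accesses. It does not make read-modify-write sequences atomic and does not address cache coherency with DMA, so it is a necessary check here rather than a complete fix.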

  • Hi George,

    Here are some updates.

    For the release version, even building with the -mu option alone (without -o2) still crashes the program (same symptoms), mainly during high-speed UDP transmission.

    For the debug version, I have now changed the build options to "-mt -o2 -on2" (no -g) and it still runs stably.

    The current debug version meets my speed requirement, but I would still prefer to make the release version work. Could something be wrong with one of the software libraries?

    I have used the following software components:

    Gigabit ethernet in NDK 1.94.1

    VPort drivers in PSP 1.10.03 

    DSP/BIOS 5.41.02.14

    CCS3.3 SR11

    Code Generation Tools 6.1.15

  • Well, now this is getting even stranger.  I think we need to step back and reassess.

    This whole time you have been referring to the "debug version" and the "release version".  I presume you are building with CCS and you are actually referring to the debug build configuration and the release build configuration.  Is that correct?

    Please show me the exact build options you use when everything works, and when it fails.  It is OK to leave out the -i (include directory) and -d (preprocessor name) options.  But I need to see everything else.  I suspect some other difference in the build options, which we have not discussed, is part of the problem.

    Thanks and regards,

    -George

  • Hi George,

    I have the complete pjt file and tcf file of my current project zipped in the attachment. Please check it.

     3108.upload.zip

  • First, I want to set some expectations.  I rather doubt we are going to get to the bottom of your problem with this forum thread alone.  There is a limit to how much information you can exchange in this medium, and I think your problem is bigger than that.

    Once I factor out the options that are the same, or have no material impact on the compiler generated code, I'm left comparing these.

    Debug configuration: -mt -o2
    Release configuration: -mu 

    Note that neither configuration uses -g.

    To summarize, the debug configuration works fine.  The release configuration is unstable.  That is quite counter-intuitive.  The debug configuration options are much more aggressive in their optimization.  In fact, I would expect the performance of the debug configuration to be some 10-20 times better than the release configuration.  This is such a difference in performance, I wonder whether the problem is that the release configuration is taking too long to finish its job.  At any rate, if the debug configuration is performing well, why not just use it?  

    Thanks and regards,

    -George 

  • Hi, George

    At first the release version used more aggressive options than the debug version. When I found that the release version was not stable, I scaled its build options back. So the instability is not related to CPU load.

    You can see that I use XDC in my project. When I choose the DEBUG version, XDC automatically selects the DEBUG PSP library, which I think is not well optimized. Although the performance is acceptable, I would still prefer a better result.

     

  • Hi, everyone.

        Here is an update. I have finally found the problem.

        I use the PSP libraries in my project through XDC. No matter which build options I choose, if I use the DEBUG version of the PSP libraries, the program runs well. If I use the RELEASE version of these libraries, the program is unstable. Furthermore, if I bypass the algorithm, the program always runs correctly with either the DEBUG or the RELEASE version.

        To summarize, my algorithm conflicts with the RELEASE version of the PSP libraries, which are:

    ti.sdo.pspdrivers.drivers.spi:lib/dm648/Release/spi_bios_drv.lib

    ti.sdo.pspdrivers.system.dm648.bios.evmDM648.video:lib/dm648/Release/vport_edc_bios_drv.lib

    ti.sdo.pspdrivers.drivers.vport:lib/dm648/Release/vport_bios_drv.lib

    ti.sdo.edma3.drv:lib/Release/edma3_drv_bios.lib

    ti.sdo.edma3.rm:lib/dm648/Release/edma3_rm_bios.lib

    ti.sdo.pspdrivers.drivers.i2c:lib/dm648/Release/i2c_bios_drv.lib

    ti.sdo.pspdrivers.pal_os.bios:lib/dm648/Release/palos_bios.lib

    ti.sdo.pspdrivers.pal_sys.dm648:lib/Release/pal_sys_bios.lib

        Now I can use more aggressive build options together with the DEBUG PSP libraries, and the program runs stably. This can be regarded as a resolution for now.

     

  • Hi,

    I'm seeing the same behavior described above.

    My target is a C6747, developed with CCS 3.3.82.13, code generation tools v6.1.11, BIOS 5.33.06, and the PSP drivers (pspdrivers_01_30_00_05).

    When I compile with the Release configuration the algorithm is unstable; when I compile with the Debug configuration it runs well.

    Following the description above, I ran a test. I swapped the Release and Debug directories of the PSP GPIO driver simply by renaming them (they are located in C:\Program Files\Texas Instruments\pspdrivers_01_30_00_05\packages\ti\pspiom\gpio\lib\C6747). This fixed the problem for the Release configuration and introduced it in the Debug configuration.

    Is it possible that my PSP version is not correct?

    Where can I download the PSP?

    Best regards,

    Yaniv

  • I hate to ask you to post again, but I think it is the best choice in this situation.

    If there were only the above post in this thread, then I would move it to the BIOS forum.  That's the best forum for PSP questions.  But this thread has several posts in it.  It has been in the compiler forum for some time.  I would like it to remain here so that other folks will continue to find it and learn from it.

    So, I'd appreciate if you would re-post your question in the BIOS forum.

    Thanks and regards,

    -George