Setting up simulator for optimizing code

After a great deal of "alpha" level development on the DM643x platform using the actual hardware, I'd like to do some function-level optimization using the simulator instead of a target board. Basically, I want to rip the function in question out and write a simple wrapper that runs some test cases, to try to get my cycles per loop down. I started looking at the optimizing C tutorial, only to see that it says it is not supported for the CCS 3.3 release. I'd love a sample project set up for the C64x+ cycle accurate simulator (or even better, set up just like the 643x) to get started on. So far, after picking this simulator during CCS setup and starting a project from scratch, I receive a "[ output name ] does not match the target type, not loaded" error any time I attempt to load my built output file into the simulator. Any suggestions for how to fix this, or where to find an existing starting project?
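For anyone looking for a starting point, here is a minimal sketch of the kind of wrapper described above. The kernel (`kernel_under_test`), the buffer size `N`, and the helper names are all hypothetical placeholders, not code from the actual project. On the TI compiler it reads the C64x+ time stamp counter (`TSCL` from `c6x.h`); on any other compiler it falls back to `clock()`, which counts clock ticks rather than CPU cycles and is only there so the harness can be sanity-checked off target.

```c
#include <assert.h>

#ifdef _TMS320C6X
#include <c6x.h>   /* TI header that declares the TSCL control register */
static void         timer_start(void) { TSCL = 0; }   /* any write starts the counter */
static unsigned int timer_now(void)   { return TSCL; }
#else
#include <time.h>  /* host fallback: clock ticks, not CPU cycles */
static void         timer_start(void) { }
static unsigned int timer_now(void)   { return (unsigned int)clock(); }
#endif

#define N 256      /* hypothetical test-buffer length */

/* Hypothetical stand-in for the function being optimized. */
int kernel_under_test(const short *x, const short *y, int n)
{
    int acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc += (int)x[i] * (int)y[i];
    return acc;
}

/* Runs the kernel once on fixed test data, stores the elapsed
   timer ticks in *elapsed, and returns the kernel's result. */
int run_test_case(unsigned int *elapsed)
{
    static short x[N], y[N];
    unsigned int t0, t1;
    int result;
    int i;

    assert(elapsed != 0);
    for (i = 0; i < N; i++) {
        x[i] = (short)i;
        y[i] = 2;
    }

    timer_start();
    t0 = timer_now();
    result = kernel_under_test(x, y, N);
    t1 = timer_now();

    *elapsed = t1 - t0;   /* unsigned subtraction tolerates one wraparound */
    return result;
}
```

Calling `run_test_case` from `main` and checking the returned value against a known-good result keeps the optimization honest: the cycle count only matters if the output still matches.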

  • OK, I found out that I wasn't entering an RTS library in the linker tab under project options. But this leads to a new question: what is the difference between all the C64x+ RTS libraries? In my C6000\cgtools\lib folder, I see rts64plus.lib, rts64plus_eh.lib, rts64pluse.lib, and rts64pluse_eh.lib.

  • MattLipsey said:
    What is the difference between all the C64x+ RTS libraries? In my C6000\cgtools\lib folder, I see rts64plus.lib, rts64plus_eh.lib, rts64pluse.lib, and rts64pluse_eh.lib.

    rts64plus.lib - little endian, no exception handling

    rts64plus_eh.lib - little endian, with exception handling

    rts64pluse.lib - big endian, no exception handling

    rts64pluse_eh.lib - big endian, with exception handling
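The naming scheme in that list can be restated as a small lookup. This is only an illustrative sketch: `rts_library_for` is a hypothetical helper, and the big-endian names are written as `rts64pluse*.lib` on the assumption that they follow the usual TI pattern of an `e` suffix for big endian, so check them against the files actually present in your lib folder.

```c
#include <assert.h>

/* Returns the RTS library name for a given endianness and
   exception-handling choice (names assumed, see note above). */
const char *rts_library_for(int big_endian, int exception_handling)
{
    assert(big_endian == 0 || big_endian == 1);
    assert(exception_handling == 0 || exception_handling == 1);

    if (big_endian)
        return exception_handling ? "rts64pluse_eh.lib" : "rts64pluse.lib";
    return exception_handling ? "rts64plus_eh.lib" : "rts64plus.lib";
}
```

For the little-endian simulator configuration without C++ exceptions discussed in this thread, `rts_library_for(0, 0)` gives `rts64plus.lib`.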

  • I am continuing to have difficulty using the simulator, and I assume I must not be setting things up correctly. My target processor is the 6435. Currently the code I am simulating does not use any peripherals, although I would eventually like to include use of the EDMA in my simulation. So in the setup tool, I believe I should be using the C64x+ cycle accurate simulator, little endian. As mentioned above, my project would not build until I included the rts64plus library in the linker options tab. After I fixed this step, I repeated the steps in the application tuning tool and it seemed like I was getting appropriate results. For example, I optimized some loops and put my data buffers in L1D SRAM, and the performance continued to improve until about 75% of my cycles were L1P misses. At this point, I tried to use the CacheTune tool to investigate this issue, with no success. The tutorial (with the 6416 simulator) worked fine, but when I switched to my project and the C64x+ simulator the CacheTune window was empty.

    So I posted this issue in the forum and continued on. Since my code size is only ~1500 bytes, I thought most of this L1P cache miss penalty must simply be compulsory. So I called the processing function back to back in a main loop, and measured the cycles for the first iteration versus both iterations. I still had a huge L1P cache stall penalty (4164 stall cycles for 1 pass, 7415 for 2 passes) and began to think that the simulator thought cache wasn't enabled. I noticed that the 6416 project had an associated GEL file, so I imported the GEL file I have been using with my hardware and stripped out the external memory config steps. For some reason, the cache setup function continues to print out that L1P and L1D are both configured for 0K cache, despite reporting 32K for both when loaded into hardware.
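    To make the reasoning behind those two measurements explicit: if the misses were purely compulsory, the second pass should add almost no L1P stalls. A small sketch of that arithmetic follows; the function names are made up for illustration.

```c
#include <assert.h>

/* Stall cycles attributable to the second pass alone. */
unsigned int second_pass_stalls(unsigned int stalls_one_pass,
                                unsigned int stalls_two_passes)
{
    assert(stalls_two_passes >= stalls_one_pass);
    return stalls_two_passes - stalls_one_pass;
}

/* Second-pass stalls as a fraction of first-pass stalls.
   Near 0.0 suggests a warm cache; near 1.0 suggests nothing was cached. */
double warm_to_cold_ratio(unsigned int stalls_one_pass,
                          unsigned int stalls_two_passes)
{
    assert(stalls_one_pass > 0);
    return (double)second_pass_stalls(stalls_one_pass, stalls_two_passes)
           / (double)stalls_one_pass;
}
```

    With the numbers above, the second pass accounts for 7415 - 4164 = 3251 stall cycles, or roughly 78% of the first pass, which is far from the near-zero figure a working 32K L1P cache should give for ~1500 bytes of code.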

    Here is the cache setup routine from my GEL file; it is the same function used for the EVM board. I guess my question boils down to this: I haven't found any examples or tutorials that use the C64x+ simulator, so I want to make sure that what I am trying to do is actually supported, and if so, what am I doing wrong?

    hotmenu
    Setup_Cache( )
    {
        int l1p, l1d, l2;

        GEL_TextOut( "Setup Cache " );
        #define CACHE_L2CFG         *( unsigned int* )( 0x01840000 )
        #define CACHE_L2INV         *( unsigned int* )( 0x01845008 )
        #define CACHE_L1PCFG        *( unsigned int* )( 0x01840020 )
        #define CACHE_L1PINV        *( unsigned int* )( 0x01845028 )
        #define CACHE_L1DCFG        *( unsigned int* )( 0x01840040 )
        #define CACHE_L1DINV        *( unsigned int* )( 0x01845048 )

        CACHE_L1PINV = 1;           // L1P invalidated
        CACHE_L1PCFG = 7;           // L1P on, MAX size
        CACHE_L1DINV = 1;           // L1D invalidated
        CACHE_L1DCFG = 7;           // L1D on, MAX size
        CACHE_L2INV  = 1;           // L2 invalidated
        CACHE_L2CFG  = 0;           // L2 off, use as RAM

        l1p = CACHE_L1PCFG;
        if ( l1p == 0 )
            GEL_TextOut( "(L1P = 0K) + " );
        if ( l1p == 1 )
            GEL_TextOut( "(L1P = 4K) + " );
        if ( l1p == 2 )
            GEL_TextOut( "(L1P = 8K) + " );
        if ( l1p == 3 )
            GEL_TextOut( "(L1P = 16K) + " );
        if ( l1p >= 4 )
            GEL_TextOut( "(L1P = 32K) + " );

        l1d = CACHE_L1DCFG;
        if ( l1d == 0 )
            GEL_TextOut( "(L1D = 0K) + " );
        if ( l1d == 1 )
            GEL_TextOut( "(L1D = 4K) + " );
        if ( l1d == 2 )
            GEL_TextOut( "(L1D = 8K) + " );
        if ( l1d == 3 )
            GEL_TextOut( "(L1D = 16K) + " );
        if ( l1d >= 4 )
            GEL_TextOut( "(L1D = 32K) + " );

        l2 = CACHE_L2CFG;
        if ( l2 == 0 )
            GEL_TextOut( "(L2 = ALL SRAM)... " );
        else if ( l2 == 1 )
            GEL_TextOut( "(L2 = 31/32 SRAM)... " );
        else if ( l2 == 2 )
            GEL_TextOut( "(L2 = 15/16 SRAM)... " );
        else if ( l2 == 3 )
            GEL_TextOut( "(L2 = 7/8 SRAM)... " );
        else if ( l2 == 7 )
            GEL_TextOut( "(L2 = 3/4 SRAM)... " );

        GEL_TextOut( "[Done]\n" );
    }
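    The readback check in that routine can also be expressed as a plain C helper, which may be handy for testing the decode logic off target. This is a sketch only: it mirrors the thresholds used in the GEL code above (0, 1, 2, 3, and 4 or more) and does not touch any hardware registers. The function names are hypothetical.

```c
#include <assert.h>

/* Maps an L1PCFG/L1DCFG readback value to a cache size in KB,
   using the same thresholds as the GEL routine above. */
unsigned int l1_cache_kbytes(unsigned int cfg_value)
{
    switch (cfg_value) {
    case 0:  return 0;
    case 1:  return 4;
    case 2:  return 8;
    case 3:  return 16;
    default: return 32;   /* 4 and above all mean the maximum size */
    }
}

/* Returns 1 when the value read back implies the same cache size
   as the value that was written, 0 otherwise. */
int cache_write_took_effect(unsigned int written, unsigned int read_back)
{
    return l1_cache_kbytes(written) == l1_cache_kbytes(read_back);
}
```

    In the situation described here, 7 was written and 0 was read back, so `cache_write_took_effect(7, 0)` returns 0, matching the "0K" message the simulator printed.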

  • As I mentioned in another post on this topic, I spoke to a TI FAE yesterday and he told me that the C64+ cycle accurate simulator is basically worthless for modeling cache.  I switched to the DM6437 simulator "board" in the setup menu and things appear to be working as expected now.

  • Hi,

    The C64x+ cycle accurate simulator has a 1 MB memory size for L1P, L1D, and L2, which won't match any of the device simulators. I believe that's the reason the DM6437 simulator is working. We have recently exposed the simulator configuration file, so you can change the cache and SRAM sizes to match any device configuration. Refer to the TI eXpressDSP wiki: http://tiexpressdsp.com/wiki/index.php?title=C64x%2B_Cycle_Accurate_Simulator

    regards,

    Mani