How to change compiler/assembler behavior

Code Composer Version: 4.2.3.00004 on a TMS320C6457

I have this annoying problem in that my compiler/assembler is smarter than me.

I need to perform a sequence of operations on a flash memory to read the manufacturer ID etc.

This requires 3 sequential writes (using the volatile keyword) followed by a read (also volatile) to get the manufacturer ID, device ID, etc.

Somehow the read access is getting executed before the third write executes, which gives a bogus value.

I have even put the three writes into a separate function and performed the read after the call to that function.

IT STILL MANAGES TO PERFORM THE READ AFTER ONLY 2 OF THE 3 WRITES HAVE EXECUTED!!!

Does anyone know how to force the compiler/assembler to execute in order, or stall??? what flags to try?? which antidepressants are best???

Thanks

  • It's smarter than me, too, so I doubt we can outsmart it.

    But seriously, using volatile should be enough to enforce a strict ordering, at least among other volatile accesses.  If you have 4 volatile references which do not come out in exactly the same order as in the source code, that is a bug.  Could we see a test case?

  • I would be glad to give you anything you need. What exactly do you want me to package up?

    This is the code snippet:

    #include <stdio.h>

    void main(void)
    {
        volatile unsigned char *rom = (volatile unsigned char *)0xB0000000; //ROMBASE;
        volatile unsigned char manuf_id;

        rom[0xAAA] = 0xAA;    // write unlock cycle 1
        rom[0x555] = 0x55;    // write unlock cycle 2
        rom[0xAAA] = 0x90;    // write autoselect command
        manuf_id = rom[0];

        printf("ROM: Manuf_id = %d\n", manuf_id);
    }

  • I need the compiler version (not the same as the CCS version) and the command-line options used

  • Please let me know if this is incorrect or incomplete. Thank you.

    **** Build of configuration Debug for project testcase ****

    C:\CCStudio_v4\ccsv4\utils\gmake\gmake -k all

    'Building file: ../main.c'

    'Invoking: Compiler'

    "C:/CCStudio_v4/ccsv4/tools/compiler/c6000/bin/cl6x" -mv64+ -g --include_path="C:/CCStudio_v4/ccsv4/tools/compiler/c6000/include" --diag_warning=225 --preproc_with_compile --preproc_dependency="main.pp" "../main.c"

    "../main.c", line 16: warning: last line of file ends without a newline

    'Finished building: ../main.c'

    ' '

    'Building target: testcase.out'

    'Invoking: Linker'

    "C:/CCStudio_v4/ccsv4/tools/compiler/c6000/bin/cl6x" -mv64+ -g --diag_warning=225 -z -m"testcase.map" --warn_sections -i"C:/CCStudio_v4/ccsv4/tools/compiler/c6000/lib" -i"C:/CCStudio_v4/ccsv4/tools/compiler/c6000/include" --reread_libs --rom_model -o "testcase.out" "./main.obj" -l"libc.a" "../lnk.cmd"

    <Linking>

    'Finished building target: testcase.out'

    ' '

    C:/CCStudio_v4/ccsv4/utils/gmake/gmake --no-print-directory post-build

    C:/CCStudio_v4/ccsv4/tools/compiler/c6000/bin/cl6x --compiler_revision

    6.1.9

    ' '

    Build complete for project testcase

  • I cannot reproduce this problem.  Could you please show me the generated assembly code which displays the error?  You'll need to use the -k (--keep_asm) option.

  • i'm not very capable - hopefully i've attached the file correctly.

    Thanks

     

    main.asm

  • I don't see any problem at all in the assembly file you posted (I get the same assembly code when I compile the test case).  There are plenty of cycles after each write before the next access through that pointer, so the four accesses should be occurring in the order desired, at least as far as the CPU is concerned.  How do you know that the accesses are happening out of order?  Perhaps there is a problem with the memory controller; perhaps it doesn't stall long enough when the read is issued?

  • The access to rom (0xB0000000) is set up for 8-bit async mode with very slow timings (because of the flash part, not for inspection purposes). I am able to observe the external access quite clearly on an oscilloscope. The rom[0] read is occurring after the second write (rom[0x555] = 0x55). The C6457 is the 1.2 GHz part and we are set up for a 120 MHz EMIFA. The CE3 cfg is 0x033D99EC and AWCC = 0x1FF. If you run the emulator from "main" until the breakpoint at the printf, you should hopefully be able to see it. Thanks for sticking with me so far - I appreciate it.

    Thanks

  • I'm sorry, this issue is beyond my knowledge.  I can verify that the assembly code is correct, so this is not a compiler bug.  For further assistance, you'll need to post a new thread in a C6000 hardware forum.

  • Thank you for your efforts - i will take your advice and post to another forum.

    Thanks

  • If the assembler does indeed see the operations in order (and not executing in parallel) as generated by the compiler, then it won't rearrange their order itself, so as already noted it wouldn't be a code generation problem.  There are only two things I am aware of which intervene in the read/write process: EMIF configuration and caching.  You should have somebody double-check the EMIF configuration (examine its registers right before executing the W/R operations) against the EPROM datasheet.  I'm not familiar with your specific processor model, but if it has something like MARs that control what addresses are eligible for caching, make sure that the ones covering your EPROM are configured to disallow caching of those addresses.

  • Wow, a dozen replies on a day before a USA holiday when lots of people are taking a couple of days off. You are getting some good attention on this.

    Did you choose to have numbers instead of a name?

    We tend to think of the C6000 pipeline as being 6 cycles in length. This is because of our normal view of the code and execution, and the longest apparent effect of the pipeline is in a Branch instruction where you have the instruction itself plus 5 delay slots to deal with. Other instructions like LDx are thought to have 4 delay slots because that is how long it takes to make sure the read result makes it to the destination register.

    But there are a few cycles before the "start" of an instruction, when the instruction is being fetched, and these do not normally play into our attention because we do not see them. And there are potentially several more cycles after the "end" of an instruction when some delay execution activity might occur.

    We have a C6000 Optimization Workshop on the Wiki that goes into a little more detail on this, and the C64x+ CPU & Instruction Set Reference Guide goes into even more detail.

    So, your writes actually occur a little later in the pipeline than the reads, so if a write is followed by a read too soon in the pipeline, the read will actually reach the EMIF peripheral first. I am a little surprised that this happens with all the NOPs in your unoptimized code, but since you say you are seeing this happening, I believe you. And this pipeline inversion (just to make up a good term) can occur. I am only 25% sure that this is the problem, though.

    It is also possible that the EMIF peripheral is giving priority to reads over writes. Since the physical addresses are pretty far apart, it may assume that it is okay to do the read to avoid stalling the DSP. The writes will go into a write buffer and get stored up, then when the read command comes in it may get prioritized over the writes. This is discussed in the C6457 EMIF User's Guide Section 7.1 "Command Ordering and Scheduling". It is a twisted bit of logic and sequencing, but it probably explains what you are seeing. I am 75% sure that this is the problem you are seeing.

    So what are your solutions:

    1. If you can read the ID from any address of the Flash, read it from the same address as the last write. I am pretty sure it will always make sure a write to a certain address will complete before doing a read from that same address.

    2. If you can do a dummy write to offset 0 before the read, this should also make sure everything is in the right order.

    3. Something benign that might work would be to insert a write-read pair to another dummy chip select space. You may need to configure another CEn space so it gets handled cleanly, but it will not interfere with your 0xb0000000+ accesses. This should put that write into the EMIF command FIFO and then the read from that same address would go in also. The EMIF should guarantee that the writes all happen in order, ending with the dummy-space one. And the EMIF should guarantee that the dummy-space read does not happen until after the dummy-space write. Then your Flash read will be forced to wait until the dummy-space read since the reads will always complete in order.

    4. Stick in a delay loop before the read so that all the writes will have finished. You will have to be careful with this when you finally turn on optimization, so you will want to use a counter variable that is volatile.
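
    Option 4 above could look roughly like the sketch below. The helper names (delay_loop, read_manuf_id_delayed) and the delay_count parameter are assumptions made for illustration; the thread does not give a delay value, so it would have to be worked out from the EMIF timings. The flash base is passed in as a pointer so the same code can be exercised against an ordinary RAM buffer.

```c
/* Busy-wait for `count` loop iterations. The counter is volatile, so the
   compiler has to keep every iteration even with optimization turned on. */
void delay_loop(unsigned long count)
{
    volatile unsigned long i;
    for (i = 0; i < count; i++) {
        /* intentionally empty */
    }
}

/* Issue the three command writes from the test case, wait, then read the
   ID byte. `rom` is the flash base (0xB0000000 in the thread's test case);
   `delay_count` is a hypothetical tuning value. */
unsigned char read_manuf_id_delayed(volatile unsigned char *rom,
                                    unsigned long delay_count)
{
    rom[0xAAA] = 0xAA;          /* write unlock cycle 1 */
    rom[0x555] = 0x55;          /* write unlock cycle 2 */
    rom[0xAAA] = 0x90;          /* write autoselect command */
    delay_loop(delay_count);    /* give the queued writes time to complete */
    return rom[0];              /* read the manufacturer ID */
}
```

    On the target this would be called as read_manuf_id_delayed((volatile unsigned char *)0xB0000000, n). A delay only makes the reordering unlikely rather than impossible, so options 1 to 3 are probably the more robust choices.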

    And you will eventually turn on optimization, right? It makes a big difference in performance, but of course it is much easier to debug things like this problem with optimization off or at least not at maximum strength.

    Regards,
    RandyP

  • No experience with this processor. On other processors I have worked with, there are usually special instructions to prevent reordering of instructions in the pipeline, usually named something like memory barrier or interrupt barrier. The "CPU and Instruction Set Guide", spru732j.pdf, doesn't appear to show such an instruction, but it does show usage of NOP 5 to fill up the pipeline. I am guessing mixing in some NOPs might help:

    void main(void)
    {
      volatile unsigned char *rom = ( volatile unsigned char *)0xB0000000; //ROMBASE;
      volatile unsigned char manuf_id ;
      rom[0xAAA] = 0xAA;    // write unlock cycle 1
      asm(" nop 5");
      rom[0x555] = 0x55;    // write unlock cycle 2
      asm(" nop 5");
      rom[0xAAA] = 0x90;   // write autoselect command
      asm(" nop 5");
      manuf_id = rom[0];
      printf("ROM: Manuf_id = %d\n",manuf_id);
    }

  • (Copy from other thread)

    Hello,

    It sounds as if the EMIF region in question is being cached (which it should not be for R/W access to a flash device).

    You can check that the MAR176 register (for the 0xB0000000 region) is 0, or simply add something like "((unsigned*)0x01848000)[0xB0] = 0" before using the EMIF.

    Jakez
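
    The arithmetic behind that one-liner can be sketched as follows. The function names are made up for illustration; the MAR base address and the MAR176 / 0xB0000000 pairing come from the post above, and the sketch assumes each MAR is a 32-bit register covering one 16 MB region, which is what that pairing implies.

```c
#include <stdint.h>

#define MAR_BASE 0x01848000u    /* address of MAR0, as used in the post above */

/* Each MAR is assumed to cover one 16 MB region, so the top 8 address bits
   select the register: 0xB0000000 >> 24 = 0xB0 = 176. */
unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> 24);
}

/* Address of the 32-bit MAR register that covers `addr`. */
uint32_t mar_reg_addr(uint32_t addr)
{
    return MAR_BASE + 4u * mar_index(addr);
}

/* Target-only: write 0 to the MAR so the region is not cacheable.
   Do not call this on a host machine. */
void disable_caching_for(uint32_t addr)
{
    *(volatile uint32_t *)(uintptr_t)mar_reg_addr(addr) = 0;
}
```

    Calling disable_caching_for(0xB0000000u) before the first EMIF access would be the equivalent of the one-line write suggested above.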

  • First of all, thank you to everyone who contributed their time and brainpower to this issue; it was truly frustrating for me. And thanks for contributing so close to the holiday, when all our minds are elsewhere. Thank you, RandyP, for the excellent suggestion regarding the EMIF controller, for that was apparently the case. The EMIF controller must have been promoting my "critical read" ahead of the last write, due either wholly or in part to the different block address (rom[0xAAA] is 2730 bytes away, where blocks are 2048 bytes long). I have yet to scour my entire code for these potential violations, but the fix for the test case is elegantly simple:

    void main(void)
    {
      volatile unsigned char *rom = ( volatile unsigned char *)0xB0000000; //ROMBASE;
      volatile unsigned char manuf_id ;
      rom[0xAAA] = 0xAA;    // write unlock cycle 1
      rom[0x555] = 0x55;    // write unlock cycle 2
      rom[0xAAA] = 0x90;   // write autoselect command
      // The fix: this next read forces the previous write to happen in order,
      // because the read and write addresses are the same.
      manuf_id = rom[0xAAA];

      manuf_id = rom[0];
      printf("ROM: Manuf_id = %d\n",manuf_id);
    }
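
    For the rest of the code base, the same read-back trick could be wrapped in a small helper along these lines. This is only a sketch: flash_write_flushed and read_manuf_id are hypothetical names, and it relies on the behavior observed in this thread, namely that a read from the same address is not promoted ahead of the preceding write.

```c
/* Write one byte, then read the same address back. The read-back is a
   volatile access, so the compiler keeps it; its value is discarded. */
void flash_write_flushed(volatile unsigned char *rom,
                         unsigned offset, unsigned char value)
{
    rom[offset] = value;
    (void)rom[offset];    /* same-address read, issued after the write */
}

/* The autoselect sequence from the test case, with the last command write
   followed by a read-back before the ID is read from offset 0. */
unsigned char read_manuf_id(volatile unsigned char *rom)
{
    rom[0xAAA] = 0xAA;                      /* write unlock cycle 1 */
    rom[0x555] = 0x55;                      /* write unlock cycle 2 */
    flash_write_flushed(rom, 0xAAA, 0x90);  /* write autoselect command */
    return rom[0];                          /* manufacturer ID */
}
```

    Only the last write before a read to a different address needs the read-back, which is why the first two unlock writes are left as plain stores here.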

    What a great community of developers we have - i hope i can contribute something in the future that will help someone else out.

    Thanks again