Correct data on simulator, wrong data on device !!

Other Parts Discussed in Thread: CCSTUDIO, TMS320C6678

Hello,

I'm coding an algorithm in Standard C66x ASM. I believe I have enough documentation and knowledge to deal directly with Standard ASM.

When I run my code on the simulator (C6678 Device Cycle Approximate Simulator, Little Endian), I get correct and expected results; however, when I run it on the device (TMX320C6678L EVM), half of the results are wrong.

First, is it normal that the simulator gives different results than the associated device ?

Also, when I step through the code on the device to debug the problem, I noticed that some instructions are not executed even though the core passes through them in program memory! I notice that phenomenon when almost all the core units are doing something. Is there a limit on the number of registers updated in 1 cycle?

Thanks

  • Another weird behaviour is that, in a certain situation, the simple instruction SUB .S1 A0,1,A0 doesn't work (A0 remains unchanged after 1 cycle); when I change the unit from .S1 to .D1, it works!

  • I am not sure if your questions are related to the simulator (which is covered in the Code Composer Forum) or the device. Mostly, your questions seem to be related to the device, and the operation of the device has been confirmed by many users and by TI test programs and applications.

    Nios Ensa said:
    is it normal that the simulator gives different results than the associated device ?

    This is definitely not normal. This has not been reported before, to my knowledge, but I am not on the simulator team nor the silicon design team.

    Nios Ensa said:
    Is there a limit on the number of updated registers in a 1 cycle ?

    There is no such limit. Each core executes independently from the others.

    Nios Ensa said:
    in a certain situation, the simple instruction : SUB .S1 A0,1,A0 doesn't work

    This should never occur. Either you have found significant bugs that should keep these popular devices from operating correctly for all the users who are using them, or you are using instructions that are never used by other users, or our documentation is not sufficient to deal directly with Standard ASM.

    Can you offer any other ideas for what could be different about your setup or observations compared to what successes the device has had in the market?

    Regards,
    RandyP

     

  • In the compressed file, there's an ASM function I wrote (p2p.asm) that you can call from C in this way:


    #define N 1024

    /* Assembly routine defined in p2p.asm */
    void _p2p(double *inp, double *out, unsigned int n_per_pas, int *svg, int pas,
              double W1S, double W2S, double W3S, double W4S, double W6S);

    int pas = N / 16;

    double inp[N];
    double outp[N];

    int svg[20];

    int main(void)
    {
        _p2p(inp, outp, N / pas, svg, pas, 1.0, 1.0, 1.0, 1.0, 1.0);
        return 0;
    }

    Without initializing the input buffer, you can run that code on the CCStudio simulator for C66x, where it executes in about 1300 cycles. That same code won't run on the device: it loops forever and gets disconnected from the host, because the instruction on line 212 of the ASM file, [A1] SUB .S1 A1,1,A1, is not decrementing the loop counter, for a reason I can't identify.

    The asm code structure is a nested loop implemented using the SPLOOP buffer, merging a setup iteration and reloading again ..

    So RandyP, can you please help and check whether the same thing happens for you when running on the device [TMS320C6678]?

    Thank you

    5621.p2p.rar

  • RandyP,

    One of my questions wasn't clear. I meant: is there a limit on the number of registers (in the register file) that can be written in 1 cycle by the units of 1 core (.S / .L / .D / .M)? I didn't find any such constraint in the user guide, so apparently there's no limit.

    Now I think there might be a bug in the device that appears rarely, when the pipeline is full [using all of the units .S/.L/.D/.M]. Is there a bug list, with possible workarounds? I spent a lot of time building the ASM solution, and now I can't even tell why it doesn't work :(

    Thanks

  • Nios,

    The CPU & Instruction Set Reference Guide is the place to look for details on specific assembly instructions and on instruction-level architecture. If this Reference Guide is not what you mean by "user guide", then it should be referenced.

    There is an errata document that describes issues that have been identified with devices, and workarounds for those when possible.

    We always recommend writing your application in C, using all of the optimization techniques available, and then see if the performance is adequate for your application. If you still decide to write in assembly, then the output from the compiler is the best place to start.

    It is very difficult to debug an assembly program, especially one written by someone else. The C6000 instruction set is very difficult to program, even for experienced programmers. The compiler is very good at it; we humans are not, although we can achieve great results at times. When we write code that does not work, it is harder to justify a mistake in the code than to see a failure in the DSP. But I have my doubts, as you surely understand and expect.

    Regards,
    RandyP

  • Hi RandyP,

    I finally found the source of my problems, so I wrote a very simple piece of ASM code that works on the simulator but not on the device:

            .global _asm_test
    _asm_test:
            MVK    .S1  10h,A19                   ; A19 = 0x10
            DADDSP .L1  A15:A14,A11:A10,A13:A12
            NOP    1
            SUB    .L1  A19,1,A19                 ; that instruction doesn't decrease A19 in device
            NOP    5
            B      .S2  B3                        ; return to caller
            NOP    5                              ; branch delay slots



    On the simulator, the instruction SUB .L1 A19,1,A19 decreases register A19 by 1 (as expected), but on the device it doesn't (I tried on 2 C6678 boards, with the same behaviour).

    If I change the unit of the SUB instruction to .S1, it works. Does this mean that DADDSP has a functional unit latency greater than 1?

    Thanks



  • Hello,

    I got an explanation: .L / .S units are having 64-bit write ports, so they can't update more than a register pair in 1 cycle.

    Also, this kind of conflict is not detectable by the compiler.

    Regards

  • Nios,

    You do truly have excellent knowledge of the C6000 assembly language for the difficult task of programming in assembly. This thread highlights your good skills, and I hope it also offers an example to others of how difficult it is to work with many small and subtle issues with the language.

    This thread appears to show an error in the simulator, assuming it did not throw an exception. Did you check the exception registers in either case (simulator or hardware) to see if the silicon was catching this as an error? One advantage of using SYS/BIOS is that it has an easy interface for using and interpreting the exception registers.

    I will get this thread moved to the Code Composer Forum where the simulator team can evaluate this to decide if it is a simulator bug or not.

    One clarification on your explanation, ".L / .S units are having 64-bit write ports, so they can't update more than a register pair in 1 cycle": The .Lx/.Sx units have a single write port, so they cannot do more than one update in 1 cycle. The text in the CPU & Instruction Set Reference Guide says

    "Even though the .L/.S units can also execute 4-cycle and 2-cycle instructions, two independent writes from the same .L or .S unit to the register file [in] the same instruction cycle is not supported and will result in an exception and erroneous values being written to the destination registers."

    Another clarification on your comment that this "kind of conflict is not detectable by the compiler": The compiler most likely DOES know about this conflict and would never write code that would try to write two values from the same .Lx/.Sx unit in the same cycle. The assembler did not flag this conflict. It would be nice if the assembler did flag it, but I am not sure if that is a bug or not.
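
As a rough illustration of the conflict quoted above, the following C sketch models a hand-written schedule and counts the cases where two results from the same .L or .S unit would reach the register file in the same cycle. It is only a sketch under stated assumptions: `SchedOp`, `writeback_cycle` and `count_write_conflicts` are hypothetical names and not part of any TI tool, and the delay-slot value for each instruction is supplied by the caller and should be checked against the CPU & Instruction Set Reference Guide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One scheduled instruction (hypothetical model, not a TI data structure). */
typedef struct {
    const char *mnemonic;    /* for readability only */
    const char *unit;        /* functional unit, e.g. ".L1" or ".S1" */
    int         issue_cycle; /* cycle in which the instruction enters E1 */
    int         delay_slots; /* 0 for single-cycle ops; assumed by the caller */
} SchedOp;

/* Cycle in which the result is written to the register file. */
int writeback_cycle(const SchedOp *op)
{
    return op->issue_cycle + op->delay_slots;
}

/* Returns 1 if the unit name denotes a .L or .S unit, 0 otherwise. */
int is_l_or_s_unit(const char *unit)
{
    return unit[0] == '.' && (unit[1] == 'L' || unit[1] == 'S');
}

/* Counts pairs of instructions that issue on the same .L/.S unit and
   write their results back in the same cycle. */
int count_write_conflicts(const SchedOp *ops, size_t n)
{
    int conflicts = 0;
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            if (is_l_or_s_unit(ops[i].unit) &&
                strcmp(ops[i].unit, ops[j].unit) == 0 &&
                writeback_cycle(&ops[i]) == writeback_cycle(&ops[j])) {
                ++conflicts;
            }
        }
    }
    return conflicts;
}
```

Applied to the short test routine from this thread, with MVK .S1 issued in cycle 0, DADDSP .L1 in cycle 1 and SUB .L1 in cycle 3, and assuming DADDSP has 2 delay slots (which is consistent with the behaviour reported here), the model finds one conflict in cycle 3. Moving the SUB to .S1 gives none, matching the workaround that was observed on the device.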

    We thank you for your hard work.

    Regards,
    RandyP