TMS320F280049: Assembly instruction execution timing does not follow the user guide

Part Number: TMS320F280049


Hi Team,

  While debugging the TMS320F280049, we found that assembly instruction execution timing does not follow the user guide. For example, when testing with RPT, NOP, or MOVL, the measured results differ.

  Could you kindly comment on the difference, and on how to avoid it through the way the assembly source is written or through the project configuration?

  Attachment: F280049 Code Execution Issue.docx

  • Benjamin,

    You have a register conflict here. Consider the instruction pair:
    MOVL XAR6, #_EPWM2_TBCTL
    OR *XAR6, #0x0040

    Keep in mind that the OR instruction is an R-M-W (Read-Modify-Write) instruction: it reads the data at address #_EPWM2_TBCTL, performs a logical OR with the data #0x40, then writes the result back to the same location. Because of the pipeline, if these instructions executed with no delays, the OR instruction would read from *XAR6 before the MOVL instruction had written to it. The OR would then happen with data from a different location (in this case #GPASET_H). To avoid this, the C28x inserts delay slots into the pipeline so that the conflict doesn't happen and the instructions execute in the order they should. If you'd like to test this, try inserting a couple of NOPs between these two lines and you'll find the timing doesn't change. You can't avoid these delays, but you can make use of them by putting other non-conflicting instructions in the delay slots. The background is in chapter 4.4 of the CPU instruction guide (SPRU430F).

    Note also that the EPWM registers are in peripheral frame 1 which (I believe) has 2 read wait states, so you'll be seeing those in the OR timing too.
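For readers more used to C, the read-modify-write nature of the OR can be sketched as below. This is a host-side illustration only: `fake_tbctl` is a hypothetical plain variable standing in for the EPWM2 TBCTL register, `rmw_or` is an assumed name, and the sketch does not model the pipeline or any peripheral wait states.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit peripheral register such as EPWM2 TBCTL.
   On the real device this would be a memory-mapped location. */
volatile uint16_t fake_tbctl = 0x0003u;

/* Read-modify-write OR, spelled out as three separate steps:
   the read must see the correct address before the write can happen. */
uint16_t rmw_or(volatile uint16_t *reg, uint16_t mask)
{
    uint16_t value = *reg;              /* 1. read the current contents */
    value = (uint16_t)(value | mask);   /* 2. modify: OR in the mask    */
    *reg = value;                       /* 3. write the result back     */
    return value;
}
```

Calling `rmw_or(&fake_tbctl, 0x0040u)` on the initial value 0x0003 leaves 0x0043 in the variable. The point of the sketch is only that the read in step 1 depends on the pointer already holding the right address, which is the dependency the pipeline protects.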

    Regards,

    Richard
  • Hi, Richard
    Your answer about assembly instruction execution timing was helpful, thanks!
    However, we recently used the assembly instruction "LOOPZ @_EPWM1_TBTST,#0x0001" and found that its execution time is not constant: it takes 7 to 10 SYSCLK cycles.
    How can this be explained?
  • Hi Born,

    Could you supply details of how you are measuring the execution time, please?

    Regards,

    Richard
  • Hi Richard

    I measured the execution time using an XDS100V3 emulator and CCS v8.1.
    I single-stepped over "LOOPZ @_EPWM1_TBTST,#0x0001" in the debugger and checked the CPU Timer counter before and after to get the execution time.

    Is this a correct way to measure it?

    Thanks for your reply.
  • Hi Born,

    You should be able to do it that way, however I'm getting very repeatable results when I try it.  I see 6 cycles for each iteration of the loop when the test is false, and 7 cycles when the test is true and the loop exits.

    Can you tell me what memory address _EPWM1_TBTST is please?

    Regards,

    Richard

  • Hi Richard,
    First, thanks for your timely reply.

    _EPWM1_TBSTS is the EPWM1 TBSTS register (Time Base Status Register on the F280049); its address is 0x00004005.

    Could you tell me why it takes 6 cycles when the test is false and 7 cycles when the test is true? How is this calculated?

    Will the result be the same on the 28032 if I do the same thing?

    Thanks!
  • Hi Born,

    Thanks. I thought it might be a variable in some wait-stated memory but this should be fine. The 6 & 7 cycle timings are what I'm measuring with the CPU timer method, as you are doing. The LOOPZ instruction is single cycle, but as you single step through an instruction with the debugger the pipeline will be flushed so I'm not surprised to see these numbers. Where the additional 1 cycle comes from I'm not sure, but for me the numbers are always repeatable. I'm not able to explain why you get variation. Are you able to say anything about the nature of the 7 - 10 cycle variation: is it random or just an occasional excursion to 10 cycles?

    Regards,

    Richard
  • Born,

    I missed the last part. The F28032 has an identical CPU (just no FPU) so your measurements will be the same.

    Regards,

    Richard
  • Hi Richard,

    After single-stepping through this instruction (LOOPZ) with the debugger again, I found that it takes 7 cycles when the result is 0 (as described in step 5) and 9 cycles when the result is not 0. This behavior is consistent. LOOPNZ behaves differently: it takes 7 to 8 cycles when the result is not 0 and 9 cycles when the result is 0.
    Note: the 8-cycle LOOPNZ reading happened once, on the last step where the result was not 0 (the next single step took 9 cycles).

    Do you have any explanation for this?

    Thanks.
  • Born,

    I cannot explain why you saw that rogue reading. It may have been something to do with the debugger. Let me describe what I'm doing to measure the cycles. Can you try this:

    - Embed the instructions of interest between two sequences of NOPs, like this:
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    ; LOOPZ @temp,#0x0001
    LOOPNZ @temp,#0x0001
    ; NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP

    - Set break points on the first and last instructions
    - Run to the first, clear the profile clock, run to the second
    - Note the reading, then swap two appropriate commented/un-commented lines, and repeat

    What you should be seeing is a difference of 5 cycles when the loop condition is true (i.e. N+5, so you're seeing the "5" for one loop).

    When the condition is not true, you can run to the LOOPZ/LOOPNZ instruction, then press Ctrl+Shift+F5 (Assembly Step Into) and check the profile clock. It should be incrementing by 5 each time.

    If this is what you see, it's in line with the manual. I think the issue is getting clouded when you step over instructions using the debugger.

    Regards,

    Richard
  • Hi Richard,
    I apologize for my poor English.

    I have done what you suggested, and my result was the same as yours.
    But when I replaced only "temp" with "_EPWM1_TBSTS", the behavior I described earlier came back. _EPWM1_TBSTS is the EPWM1 TBSTS register (Time Base Status Register on the F280049); its address is 0x00004005. EPWM1 is counting up-down at 100 kHz.

    Could you run the same test and see what happens?

    Thanks.
  • Hi Born,

    I am getting the same result as my previous post with this:

    TBSTS .set 4005h
    ...
    MOVW DP, #TBSTS
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    LOOPNZ @TBSTS,#0x0001
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP

    Can you try this sequence and let me know what you find?

    Regards,

    Richard
  • Hi Richard,

    I have tried the sequence. I found that it takes 7 cycles when the LOOPNZ condition is not true and 9 cycles when the LOOPNZ condition is true.

    Could the difference be due to the debug emulator? (Mine is an XDS100V3.)
    Which one are you using?

    Thanks.
  • Hi Born,

    I am using a v2 emulator, but it would be surprising if that was the cause. Can you tell me the physical memory address where your LOOPNZ instruction is located please?

    Regards,

    Richard
  • Hi Richard,
    My LOOPNZ instruction is located at physical memory address 0x000081ac.
    Could this be the reason?
  • I was wondering if your variable was located in slower memory, but at this address it's in zero wait-state RAM, so the access time is not a factor. I must confess I'm stumped on this one. I can't explain why you're getting different results.

    Just to re-iterate: you set BPs on the first and last NOP, you run and take the cycle count, then replace the LOOPNZ with a NOP, then repeat the measurement and the difference is either 7 or 9 cycles, right?

    Regards,

    Richard
  • Hi Richard,
    I ran the code below (which you recommended):

    TBSTS .set 4005h
    ...
    MOVW DP, #TBSTS
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    LOOPNZ @TBSTS,#0x0001 ; (BP1)
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP

    After the code runs to BP1, I single-step in the debugger and measure the cycle count by checking the CPU TIMER0 TIM register. With PWM1 in up-down count mode, it takes 7 cycles when the LOOPNZ condition is true (code executed: LOOPNZ --> LOOPNZ) and 9 cycles when the LOOPNZ condition is not true (code executed: LOOPNZ --> NOP).

    Thanks!
  • Hi Born,

    I don't want you to single step. What I'm asking is for you to run between two breakpoints on the first and last NOPs and check the cycle count difference between having the LOOPNZ instruction there and replacing it with a NOP. This will exclude the debugger from the issue.

    Regards,

    Richard
  • Hi Richard,
    I ran the test as you described. There was an 8-cycle difference between having the LOOPNZ instruction in place and replacing it with a NOP.

    Do you see the same behavior?
    Thanks
  • Hi Born,

    Could you tell me how many total cycles you measured from the first NOP to the last NOP, with and without the LOOPNZ instruction in place please?  I was seeing 26 and 21 respectively.
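The comparison being asked for is a differential measurement: count cycles between the same two breakpoints with the instruction in place and with a NOP in its place, then subtract. A minimal C sketch of that arithmetic follows; the function and parameter names are hypothetical and the code only illustrates the subtraction, not how the counts are obtained.

```c
#include <stdint.h>

/* Extra cycles attributable to the instruction under test, relative to the
   baseline run in which it is replaced by a NOP. Names are hypothetical. */
uint32_t extra_cycles(uint32_t cycles_with_instr, uint32_t cycles_baseline)
{
    return cycles_with_instr - cycles_baseline;
}
```

With the totals quoted above, `extra_cycles(26u, 21u)` gives 5. Because both runs cross the same breakpoints, any fixed overhead from starting and halting the CPU cancels out in the difference.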

    Regards,

    Richard

  • Hi Richard,
    My test results were 29 and 21 respectively.

    Looking forward to your reply, thanks!
  • It is best to use a CPU Timer to profile instead of the Clock feature in CCS. When setting breakpoints and single stepping, the pipeline is usually flushed completely.

    See if you get a more accurate answer using a CPU Timer.

    sal
  • Hi sal,
    I tried measuring the execution time from the width of a GPIO toggle (with the emulator not connected), and I found the same result.
    Looking forward to your reply, thanks!
  • Born,

    One last try: the assembly stub below is what I'm using to test this. 

    TIMER1TIM  .set    0C08h

    ...

    MOVW DP, #TIMER1TIM>>6
    MOV AL, @TIMER1TIM
    MOVW DP, #TBSTS
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    LOOPZ @TBSTS,#0x0001
    ; NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    NOP
    MOVW DP, #TIMER1TIM>>6
    MOV PL, @TIMER1TIM
    SUB AL, PL

    NOP

    First, configure CPU timer 1 to run at the CPU clock. 

    If you set a BP on the final instruction and run to it, you should see 29 (0x1D) in AL. Then comment out the LOOPZ instruction and un-comment the NOP below it. Re-build and repeat the test. The difference in AL should be 5.
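The subtraction at the end of the stub is start minus end (AL minus PL). That gives a positive count because the C28x CPU timer counter counts down. A minimal C sketch of the same calculation on two 16-bit snapshots follows. It assumes the low word of the timer wraps at most once between the two reads, and the function name is hypothetical.

```c
#include <stdint.h>

/* Elapsed ticks between two snapshots of a down-counting 16-bit timer word.
   The cast back to 16 bits keeps the result correct across one wrap. */
uint16_t elapsed_ticks(uint16_t start, uint16_t end)
{
    return (uint16_t)(start - end);
}
```

For hypothetical snapshots of 1000 and 971, `elapsed_ticks` returns 29. The figure includes the cycles spent by the timer-read instructions themselves, which is why the stub is run twice and only the difference between the two runs is compared.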

    Can you let me know what you find please?

    Regards,

    Richard