
What is the unit of the argument to _delay_cycles()?

Other Parts Discussed in Thread: TMS570LS1227, TMS570LS20216, HALCOGEN, RM46L852

Dear TI,

I am using a TMS570LS1227 with CCS 6.1 (compiler 5.2.2).

As far as I know, compiler version 5.2.2 supports _delay_cycles(x).

Could you explain what the unit of x is?

For example, when I call _delay_cycles(1000), what is the unit? 1000 s? 1000 CPU clocks?

I also measured the duration of _delay_cycles(x) using the RTI:

_delay_cycles(1000) takes about 1.97 * 10^-5 s

_delay_cycles(100000) takes about 1.94 * 10^-3 s

Thanks a lot

  • Yang ShunFan said:
    For example, when I call _delay_cycles(1000), what is the unit? 1000 s? 1000 CPU clocks?

    The units are CPU clock cycles. As noted in the compiler's README.txt file, the code generated by _delay_cycles assumes zero wait states:

    The __delay_cycles intrinsic inserts code to consume precisely the number of
    specified cycles with no side effects. The number of cycles delayed must be
    a compile-time constant.

    NOTE: Cycle timing data is based on 0 wait states. Results will vary with
    additional wait states. The implementation does not account for dynamic
    prediction. Lower delay cycle counts may be less accurate given pipeline
    flush behaviors.

    As an example I timed the following on a TM4C129NCPDT running with a 120MHz CPU clock, using TI ARM compiler v5.2.3:

    	_delay_cycles (1200000000);
    

    a) When the code was executing from flash, a timer reported that a delay of 1200000019 CPU cycles occurred, and the elapsed time was 10 seconds.

    b) When the code was executing from SRAM, a timer reported that a delay of 2400000019 CPU cycles occurred, and the elapsed time was 20 seconds.

    Therefore, the actual delay generated by _delay_cycles depends on the wait states of the memory the code executes from, and if interrupts occur the delay is affected as well.

    The example CCS project is attached: TM4C129_delay_cycles.zip
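
To relate the cycle counts above to elapsed time, divide by the CPU clock frequency. The following is only a minimal sketch: the helper name `cycles_to_seconds` is made up for illustration and is not part of any TI library. Only the 120MHz clock and the two cycle counts come from the measurements above.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): convert a measured CPU cycle
 * count into seconds for a given CPU clock frequency in Hz. */
double cycles_to_seconds(uint64_t cycles, double cpu_hz)
{
    return (double)cycles / cpu_hz;
}
```

With these inputs, `cycles_to_seconds(1200000019ULL, 120e6)` comes out at roughly 10 seconds and `cycles_to_seconds(2400000019ULL, 120e6)` at roughly 20 seconds, which agrees with the elapsed times reported in a) and b).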

  • Dear Chester,

    Thanks for your reply.

    1. But if the unit is CPU clocks, why do I measure 3539 CPU clocks for _delay_cycles(1000) rather than 1000?

    2. Where can I find the compiler README.txt file that you mention?

    3. Could you explain what a wait state is?

    Also, here is my project.

    MCU:TMS570LS1227

    CPU Clock:180MHz

  • Yang ShunFan said:
    1. But if the unit is CPU clocks, why do I measure 3539 CPU clocks for _delay_cycles(1000) rather than 1000?

    The intent is for the delay to be measured in CPU cycles. However, the compiler assumes zero wait states are used when generating the code.

    E.g. I had a look at the CPU cycle count reported by CCS 6.1 when a TMS570LS20216 was run with the default reset values for wait states and the CPU clock.

    A _delay_cycles (1000000) call resulted in the following numbers of elapsed CPU cycles:

    - 3500016 cycles when the _delay_cycles (1000000) code was running in flash (3.5 times the requested delay, the same ratio as in your test)

    - 2000024 cycles when the _delay_cycles (1000000) code was running in RAM (2 times the requested delay)

    My conclusion is that _delay_cycles is not an easy way to generate a deterministic delay on a Cortex-R4 device, because you have to understand the wait states of the memory and/or the pipelining used for the code execution.

    [E.g. it is not immediately obvious why code executing from flash takes 3.5 times the requested delay, since that is not an integer multiple of the requested number of CPU clocks.]
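
As a cross-check, the times measured with the RTI at the start of the thread can be converted to cycles at the stated 180MHz CPU clock and compared with the requested count. This is a minimal sketch; the helper names `seconds_to_cycles` and `slowdown` are illustrative only and do not come from any TI header.

```c
#include <stdint.h>

/* Hypothetical helper: convert elapsed seconds to CPU cycles,
 * rounded to the nearest whole cycle. */
uint64_t seconds_to_cycles(double seconds, double cpu_hz)
{
    return (uint64_t)(seconds * cpu_hz + 0.5);
}

/* Hypothetical helper: ratio of measured cycles to requested cycles. */
double slowdown(uint64_t measured_cycles, uint64_t requested_cycles)
{
    return (double)measured_cycles / (double)requested_cycles;
}
```

For example, `seconds_to_cycles(1.97e-5, 180e6)` is about 3546 cycles, so `_delay_cycles(1000)` ran roughly 3.5 times longer than requested, and `slowdown(3500016, 1000000)` gives the same factor of about 3.5 for the flash measurement, versus about 2.0 for `slowdown(2000024, 1000000)` from RAM.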

    Yang ShunFan said:
    2. Where can I find the compiler README.txt file that you mention?

    The compiler readme file is at:
    <CCS_install_root_directory>/ccsv6/tools/compiler/ti-cgt-arm_<compiler_version>/README.txt

    Yang ShunFan said:
    3. Could you explain what a wait state is?

    The device datasheet has a table which lists the required number of wait-states for a given CPU speed. E.g. for your TMS570LS1227 device:

    The Technical Reference Manual for the device contains details on the registers used to set Wait States.

  • Hello Chester,
    Thanks a lot for your reply.
    1. To confirm: the actual number of CPU clocks differs from 1000 because of "wait states"?
    2. In the datasheet table you provided, do you know why Flash (data memory) only lists "data waitstates" and no "address waitstates"?
  • Yang ShunFan said:
    1. To confirm: the actual number of CPU clocks differs from 1000 because of "wait states"?

    That is my understanding. However, enabling the flash "pipeline mode" may change the achieved delay (I haven't tested this).

    My conclusion is that it is difficult to get a deterministic delay from _delay_cycles() due to having to understand exactly how memory interfaces work. To get a known delay, I suggest using either the Cycle Count Register in the Performance Monitor Unit or a timer instead of _delay_cycles(). This thread http://e2e.ti.com/support/microcontrollers/hercules/f/312/p/307386/1074253?pi239031349=2#pi239031349=1 has more information.

    Yang ShunFan said:
    2. In the datasheet table you provided, do you know why Flash (data memory) only lists "data waitstates" and no "address waitstates"?

    I am not an expert on the device, but from reading the datasheet I think that the address wait states only apply to memory which can be used to execute code. I suggest you ask on the Hercules™ Safety Microcontrollers Forum for a better answer.

  • Chester Gillon said:
    My conclusion is that it is difficult to get a deterministic delay from _delay_cycles() due to having to understand exactly how memory interfaces work.

    The generated delay can also change according to the alignment of the generated code.

    E.g. _delay_cycles (10000000) generated the following code:

    00006524:   E304CB3F MOVW            R12, #19263
    00006528:   E340C04C MOVT            R12, #76
              $1_$6:
    0000652c:   E25CC001 SUBS            R12, R12, #1
    00006530:   1AFFFFFD BNE             $1_$6

    When tested on an RM46L852 using a HALCoGen project, where HALCoGen had configured the CPU clock to the maximum of 220MHz, put the flash in PIPELINE mode, and set the flash wait states accordingly (1 address wait state and 3 data wait states):

    - If the last two instructions in the code for the _delay_cycles (10000000) did not cross a 16-byte boundary, the delay took 37,500,000 cycles

    - If the last two instructions in the code for the _delay_cycles (10000000) did cross a 16-byte boundary, the delay took 70,000,000 cycles

    The significance of the 16-byte boundary is that it is the data width of the flash bank in the RM46L852. The variation in _delay_cycles with code alignment is another reason not to use _delay_cycles if a deterministic delay is required.
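
The numbers above can be checked with a little arithmetic. The sketch below (helper names are illustrative, not from any TI header) reconstructs the loop count from the MOVW/MOVT immediates in the listing, tests whether a byte range straddles a 16-byte boundary, and derives the average cycles per loop iteration from a measured total. That the compiler budgets 2 cycles per iteration plus 2 cycles for the two setup instructions is an inference from 2 * 4,999,999 + 2 = 10,000,000, not something the README states.

```c
#include <stdint.h>
#include <stdbool.h>

/* Combine the MOVT immediate (upper 16 bits) with the MOVW immediate
 * (lower 16 bits) to get the 32-bit value loaded into the register. */
uint32_t movw_movt_value(uint16_t movw_imm, uint16_t movt_imm)
{
    return ((uint32_t)movt_imm << 16) | movw_imm;
}

/* True if the byte range [addr, addr + size) spans more than one aligned
 * block of 'boundary' bytes. Assumes size > 0 and that 'boundary' is a
 * power of two. */
bool crosses_boundary(uint32_t addr, uint32_t size, uint32_t boundary)
{
    uint32_t first_block = addr & ~(boundary - 1u);
    uint32_t last_block = (addr + size - 1u) & ~(boundary - 1u);
    return first_block != last_block;
}

/* Average cycles per loop iteration, from a measured total. */
double cycles_per_iteration(uint64_t measured_cycles, uint32_t iterations)
{
    return (double)measured_cycles / (double)iterations;
}
```

Here `movw_movt_value(19263, 76)` is 4,999,999 loop iterations. At the addresses in the listing, the SUBS/BNE pair occupies 0x652c to 0x6533, so `crosses_boundary(0x652c, 8, 16)` is true. The two measurements then work out to about 7.5 cycles per iteration (37,500,000 cycles) and about 14 cycles per iteration (70,000,000 cycles), against the 2 cycles the generated code appears to assume.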

  • Dear Chester,
    Thanks a lot for your reply.
  • Chester Gillon said:
    To get a known delay, I suggest using either the Cycle Count Register in the Performance Monitor Unit or a timer instead of _delay_cycles().

    A HALCoGen project has support functions to enable and read the Cycle Count Register, which is part of the Cortex-R4F Performance Monitor Unit (PMU). A simple function to delay for a given number of cycles can be written as:

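    #include "sys_pmu.h"
    
    /* Busy-wait until at least 'delay' CPU cycles have elapsed on the PMU cycle counter. */
    void delay_cycles (const uint32 delay)
    {
    	const uint32 start = _pmuGetCycleCount_ ();
    	/* Unsigned subtraction keeps the comparison valid across counter wrap-around. */
    	while ((_pmuGetCycleCount_ () - start) < delay)
    	{
    	}
    }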
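
One property of this loop is worth spelling out: because the elapsed time is computed as an unsigned 32-bit difference, it stays correct when the counter wraps from 0xFFFFFFFF to 0, provided the requested delay is less than 2^32 cycles. The sketch below illustrates this with plain `uint32_t` values standing in for two counter reads; `elapsed_cycles` is an illustrative name and not part of HALCoGen.

```c
#include <stdint.h>

/* Elapsed cycles between two reads of a free-running 32-bit counter.
 * The result is reduced modulo 2^32 by the uint32_t return type, so a
 * single wrap between 'start' and 'now' is handled correctly. */
uint32_t elapsed_cycles(uint32_t start, uint32_t now)
{
    return now - start;
}
```

For instance, a read of 0xFFFFFFF0 followed by a read of 0x00000010 yields an elapsed count of 0x20 (32 cycles), even though the second value is numerically smaller than the first.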
    

    Before calling the delay_cycles function for the first time, the Cycle Counter needs to be enabled with the following:

        _pmuEnableCountersGlobal_ ();
        _pmuStartCounters_(pmuCYCLE_COUNTER);
    

    This simple delay function will delay for longer than the requested value, on the order of 100 cycles, as it doesn't account for the minimum time taken to call the function. This could be corrected by measuring the number of cycles taken by delay_cycles (0) and subtracting that from the requested delay.
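
The calibration described above might look like the following sketch. It takes the counter-read function as a pointer so that the logic can be exercised off-target; on the device this would presumably be `_pmuGetCycleCount_`, assuming its return type is compatible with `uint32_t`. The names `delay_cycles_calibrated`, `overhead`, and the simulated counter `fake_read_cycles` are all hypothetical.

```c
#include <stdint.h>

/* Type of a function that returns the current cycle count. */
typedef uint32_t (*cycle_reader_t)(void);

/* Delay for roughly 'delay' cycles, compensating for a fixed 'overhead'
 * measured beforehand (e.g. the cycles taken by a zero-length delay). */
void delay_cycles_calibrated(cycle_reader_t read_cycles,
                             uint32_t delay, uint32_t overhead)
{
    if (delay <= overhead) {
        return; /* the call overhead alone already covers the request */
    }
    const uint32_t target = delay - overhead;
    const uint32_t start = read_cycles();
    while ((uint32_t)(read_cycles() - start) < target) {
        /* busy-wait */
    }
}

/* Simulated counter for off-target experiments only: each read advances
 * the count by 10 "cycles". Not a model of the real PMU. */
uint32_t fake_now = 0;

uint32_t fake_read_cycles(void)
{
    fake_now += 10u;
    return fake_now;
}
```

The overhead value would have to be measured on the actual target with the final compiler options and code placement, since the earlier posts show that cycle counts depend on wait states and alignment.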