Compiler/EVMK2G: Profiling code. Are these reasonable results?

Part Number: EVMK2G

Tool/software: TI C/C++ Compiler

Hi,

I'm running some code on the DSP of the EVMK2G, bare metal (using the JTAG emulator on the board).  I'm not very experienced with DSP programming, and even less so on the theoretical side.  However, I wanted to profile a piece of code to make sure it meets its timing constraints.  I'm using the profile clock (the clock in the bottom right of CCS).

Before doing that, I wanted to test out profiling a trivial line of code.

	uint8_t triv = 0;
	triv++;

I'm consistently getting 17 clock ticks for the triv++ line.  Just the increment, nothing else.

Can someone tell me if that is reasonable for that line of code?

This might be a stupid question, but I've seen a bug with the profiler in the past, where I was getting wildly inaccurate clock results.  The bug seems to come and go, for some reason.

I would like to be 100% sure that the profiler clock is working before I go ahead profiling our code.

  • Hi,

    We're looking into this. Feedback will be posted here.

    Best Regards,
    Yordan
  • Hi,

    You should get 7 cycles with the data in L2 memory (it can vary if the CPU stalls while reading the data from slow external memory).

    The CCS clock tool is not appropriate for timing a single statement.

    When you measure a single statement with CCS, the clock tool also measures the time required to clear the processor pipeline (well, maybe "clear" is not the best word), so, for instance, a single NOP seems to cost 6 cycles instead of 1.

    Try, for instance, measuring 8 or more consecutive increments: you should see the cycle count per increment converge toward 7.
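
The suggestion above amounts to amortizing a fixed measurement overhead over many repetitions. A minimal sketch of that arithmetic follows. The helper name `cycles_per_statement` and its parameters are hypothetical and not part of CCS or any TI library; the overhead is assumed to be whatever the same tool reports for an empty interval.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate the cost of one statement from a timed
 * block that repeats the statement `count` times.
 *
 *   measured - cycles reported for the whole block
 *   overhead - cycles reported for an empty interval (fixed tool cost)
 *   count    - how many times the statement was repeated
 */
static double cycles_per_statement(uint32_t measured,
                                   uint32_t overhead,
                                   uint32_t count)
{
    assert(count > 0);

    /* A reading below the overhead is noise; report zero rather than wrap. */
    if (measured <= overhead) {
        return 0.0;
    }
    return (double)(measured - overhead) / (double)count;
}
```

With made-up readings of 66 cycles for eight increments and 10 cycles for the empty interval, the helper returns 7.0. The larger the repetition count, the less the fixed overhead distorts the per-statement figure.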
  • Justyn,

    For DSP level benchmarking, the DSP has a couple of counter registers that provide very accurate benchmarks. Please look at the TSCH and TSCL registers. There are plenty of app notes that describe benchmarking using these registers.

    www.ti.com/.../core-benchmarks.page
    www.ti.com/.../sprac13.pdf

    If you look at the DSPLIB in your Processor SDK installation, you should be able to locate example benchmarking code in the ${DSPLIB_INSTALL_PATH}/packages/ti/src/<algorithm name>/ C66/*_d.c file.

    The values obtained from these registers are in DSP cycle counts.

    The CCS clock does work in many cases for ballpark estimates; however, I believe it has some dependency on the host clock, so it is not very accurate. You may be able to enquire about the accuracy of the clock on the CCS forums.

    If you are using TI RTOS, you can use the SoC timers to measure time. There is a benchmarking template for TI RTOS in CCS, which you can find in the Resource Explorer using the steps here:
    processors.wiki.ti.com/.../Processor_SDK_RTOS_Examples

    Regards,
    Rahul
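
A minimal sketch of reading the TSCL/TSCH counter pair mentioned above, under stated assumptions. On the TI C6000 compiler the registers are declared in `c6x.h`; the `tsc_start`, `tsc_read`, and `tsc_elapsed` names are hypothetical, and the `#else` branch is only a stand-in counter so the sketch also builds on a host PC.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _TMS320C6X
#include <c6x.h>                 /* declares the TSCL and TSCH control registers */

/* Any write to TSCL starts the free-running counter. */
static void tsc_start(void) { TSCL = 0; }

/* Read TSCL first: that read latches the upper 32 bits into TSCH. */
static uint64_t tsc_read(void)
{
    uint32_t lo = TSCL;
    uint32_t hi = TSCH;
    return ((uint64_t)hi << 32) | lo;
}
#else
/* Host stand-in (assumption for illustration): advances by 1 per read. */
static uint64_t fake_tsc;
static void tsc_start(void) { fake_tsc = 0; }
static uint64_t tsc_read(void) { return ++fake_tsc; }
#endif

/* Elapsed cycles between two readings, minus the cost of the reads themselves. */
static uint64_t tsc_elapsed(uint64_t start, uint64_t end, uint64_t overhead)
{
    uint64_t raw = end - start;
    return (raw > overhead) ? (raw - overhead) : 0;
}
```

One way to use it: call `tsc_start()` once, take two back-to-back readings to estimate the read overhead, then pass that overhead to `tsc_elapsed()` when timing the code of interest.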
  • One more thing:

    Can you look at the assembly code and verify that the increment indeed takes exactly one instruction?

    You can keep the generated assembly by selecting Properties->Compiler->Advanced Options->Assembly Option and checking the "keep the assembly" option. The assembly code will be in the Debug (or Release) directory.

    Please report what you see.


    Ran
    I really don't know how to read the assembly, but here is where I think the triv++ statement occurs, along with some surrounding context.

    $C$RL28:   ; CALLP OCCURS {C6_TIMER_READ} {0}  ; [] |173| 
               DADD    .L2X    0,A5:A4,B5:B4     ; [B_L66] |173| 
               STDW    .D2T2   B5:B4,*SP(16)     ; [B_D64P] |173| 
        .dwpsn  file "/home/osp/dev/osp-keystone/AUDK2G_loopback/src/main.c",line 174,column 2,is_stmt,isa 0
               LDBU    .D2T2   *SP(32),B4        ; [B_D64P] |174| 
               NOP             4                 ; [A_L66] 
               ADD     .L2     1,B4,B4           ; [B_L66] |174| 
               STB     .D2T2   B4,*SP(32)        ; [B_D64P] |174| 
        .dwpsn  file "/home/osp/dev/osp-keystone/AUDK2G_loopback/src/main.c",line 175,column 2,is_stmt,isa 0
    $C$DW$91    .dwtag  DW_TAG_TI_branch
        .dwattr $C$DW$91, DW_AT_low_pc(0x00)
        .dwattr $C$DW$91, DW_AT_name("C6_TIMER_READ_AND_DIFF")
        .dwattr $C$DW$91, DW_AT_TI_call
    

    Here's the context for the C code, with the correct line numbers:

    	C6_TIMER_START();
    	time = C6_TIMER_READ();
    	triv++;
    	diff = C6_TIMER_READ_AND_DIFF(time);

    I did a bunch of increments in a row, and it seems that the number of clock ticks tends towards ~2, not 7.
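
For reference, the four instructions tagged |174| in the listing above can be tallied by hand. The sketch below assumes no memory stalls and counts one issue cycle per instruction, with `NOP 4` occupying four cycles to cover the delay slots of the load. The table and the `total_issue_cycles` helper are illustrative only.

```c
#include <assert.h>

/* One entry per instruction in the |174| sequence of the listing. */
struct issue {
    const char *op;       /* mnemonic as it appears in the listing */
    unsigned    cycles;   /* issue cycles, assuming no stalls */
};

static const struct issue increment_seq[] = {
    { "LDBU",  1 },   /* load triv from the stack */
    { "NOP 4", 4 },   /* wait out the load delay slots */
    { "ADD",   1 },   /* increment */
    { "STB",   1 },   /* store triv back to the stack */
};

/* Sum the issue cycles of a straight-line instruction sequence. */
static unsigned total_issue_cycles(const struct issue *seq, unsigned n)
{
    unsigned total = 0;
    unsigned i;

    assert(seq != 0);
    for (i = 0; i < n; i++) {
        total += seq[i].cycles;
    }
    return total;
}
```

Summing the four entries gives 7, which matches the figure quoted earlier in the thread for a single unoptimized increment.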

    Also, as you might be able to see, I found this guy's post about profiling with the TSCL and TSCH registers, and used his helper functions.  They work extremely well: