Weird TSCL/TSCH value

Hi all,

I want to measure the run time of my code. Since I am not using DSP/BIOS and clock() does not work on the EVMC6474 (it always returns 0), I used TSCL/TSCH to measure the time. But the values seem very weird and are not what I expected.

int main()  /* Core0 */
{
    TSCL = 0;
    start  = TSCL;   /* start  = 1 */
    starth = TSCH;   /* starth = 0 */

    <initialize ipc interrupt>
    stop  = TSCL;    /* stop  = 8954 */
    stoph = TSCH;    /* stoph = 0 */

    <ipc notify core1>
    stop  = TSCL;    /* stop  = 15954 */
    stoph = TSCH;    /* stoph = 0 */

    <ipc wait core1>
    stop  = TSCL;    /* stop  = 1496240583, very weird here */
    stoph = TSCH;    /* stoph = 0 */

    return 0;
}

 

 

int main()  /* Core1 */
{
    TSCL = 0;
    start  = TSCL;   /* start  = 1 */
    starth = TSCH;   /* starth = 0 */

    <initialize ipc interrupt>
    stop  = TSCL;    /* stop  = 9585 */
    stoph = TSCH;    /* stoph = 0 */

    <ipc wait core0>
    stop  = TSCL;    /* stop  = 16606 */
    stoph = TSCH;    /* stoph = 0 */

    <ipc notify core0>
    stop  = TSCL;    /* stop  = 23715 */
    stoph = TSCH;    /* stoph = 0 */

    return 0;
}

What's wrong here?  Is it because the counter reading was interrupted by IPC? 

 

Thanks,

lpeng

 

  • lpeng said:
    What's wrong here?  Is it because the counter reading was interrupted by IPC?

    Interrupting the read of TSCL and TSCH will not lead to incorrect values being read. The read of each individual register will not be interrupted since it will be read by a single MVC instruction with the value going to a register. Reading the value from TSCL latches the correct value of the high-half of the 64-bit counter into TSCH so it does not matter how long it is before you get around to reading TSCH.
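
The read order described above can be sketched in C. This is a minimal, host-testable sketch, not code from the attached example: the helper names `tsc_combine` and `tsc_elapsed` are hypothetical, and the two halves are passed in as plain integers. On the target, the halves would presumably come from reading `TSCL` first and `TSCH` second, so that the latched high half matches the low half.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: join the two 32-bit halves of the time-stamp
 * counter into one 64-bit cycle count. On the DSP, `lo` must be read
 * from TSCL before `hi` is read from TSCH, because the TSCL read is
 * what latches the high half. */
uint64_t tsc_combine(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Cycles elapsed between two combined 64-bit samples. The 64-bit
 * count is assumed not to wrap between the samples, so `stop` must
 * not be smaller than `start`. */
uint64_t tsc_elapsed(uint64_t start, uint64_t stop)
{
    assert(stop >= start);
    return stop - start;
}
```

Working with the combined 64-bit value means a rollover of the low half between two samples needs no special handling.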

    I recommend you start with the assumption that the value you see in TSCL is correct. If it is correct, this represents approximately 1.5 seconds of delay, and that can be checked by setting a breakpoint at the last read of TSCH and trying to judge the time by watching the second hand on a clock.
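
As a rough check of the 1.5 second estimate, the cycle count can be converted to time. The sketch below assumes the core, and therefore the time-stamp counter, runs at 1 GHz; that clock rate (`CPU_HZ_ASSUMED`) and the helper name `cycles_to_us` are assumptions for illustration, not values taken from the code in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CPU clock in Hz; the time-stamp counter advances once per CPU cycle. */
#define CPU_HZ_ASSUMED 1000000000u

/* Convert a cycle count to whole microseconds (truncating).
 * Splitting into whole seconds plus a remainder keeps the
 * intermediate product small enough for 64-bit arithmetic. */
uint64_t cycles_to_us(uint64_t cycles, uint32_t cpu_hz)
{
    assert(cpu_hz != 0);
    uint64_t whole_s = cycles / cpu_hz;
    uint64_t rem     = cycles % cpu_hz;
    return whole_s * 1000000u + (rem * 1000000u) / cpu_hz;
}
```

With the value from the first post, `cycles_to_us(1496240583, CPU_HZ_ASSUMED)` gives 1496240 microseconds, which is about 1.5 seconds.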

    I further recommend toggling a GPIO and watching it with a scope to confirm the long delay, or starting one of the general-purpose timers and checking its value against what you get with the TSCL.

    Explaining the long delay requires knowledge of the memory type, addresses, and cache settings, of other activity going on with the 3 cores and other bus masters, and of your method of sending and receiving IPC notifications. I do not recall having any such long delays when I used IPC between C6474 cores, but most of my testing was on a step-by-step basis so I could have missed it.

    There is a thread at http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/112/p/43611/166416.aspx#166416 which discusses some of the ideas of IPC and includes a link to another thread that discusses an example that uses BIOS. Since you do not want to use BIOS, I will attach here an example that does the same thing without BIOS, but for most applications, DSP/BIOS is the easiest way to get a software system running and it is very scalable to allow you to only use the parts that you need to use. And it has a reasonably small memory footprint even at its largest size.

    Attached is the C6474_Edma_IPC.zip example / demo code for running sequential processing on all three cores of the C6474. This version does not use DSP/BIOS but only uses CSL and the standard run-time support library. It includes an EDMA interrupt dispatcher and an IPC interrupt dispatcher. It uses EDMA to transfer data from one core's L2 SRAM to the next core's L2 SRAM and raises an interrupt to the initiating core after each transfer completes. And it uses the Inter-Processor Communication (IPC) registers to generate signalling interrupts on the next core.

    In the comments at the top of main.c, you will find an explanation of what is happening in the demo code and instructions on how to run it so you can see the behavior of the data movement and processing.

    In this code, I use the simplest method I know of to run demos on 3 cores. All of the cores run identical code, which is duplicated when loaded into their own local L2 SRAMs. Using the Chip-Level register to determine the CoreNumber, each core can then run with slightly different run-time variables. This is just one of the ways you can run code on multiple cores; I find it very easy to do with just a little worry about synchronizing and timing. For more good ideas on writing code for multiple cores, see the Multicore Programming Guide.
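
The pattern of identical code with per-core behavior can be sketched as follows. This is not taken from C6474_Edma_IPC.zip: `NUM_CORES`, `next_core`, and `core_slice_offset` are hypothetical names, the wraparound from the last core back to core 0 is an assumption, and the core number is passed in as a parameter so the sketch can run on a host. On the device it would come from the core-number register mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES 3u   /* the C6474 has three cores */

/* Core that comes after `core` in the processing chain.
 * Assumption: the chain wraps from the last core back to core 0. */
uint32_t next_core(uint32_t core)
{
    assert(core < NUM_CORES);
    return (core + 1u) % NUM_CORES;
}

/* Hypothetical per-core run-time variable: the byte offset of the
 * slice of a shared buffer that this core works on. */
uint32_t core_slice_offset(uint32_t core, uint32_t slice_bytes)
{
    assert(core < NUM_CORES);
    return core * slice_bytes;
}
```

Because every core runs the same image, all differences in behavior are derived from the single core-number value at run time.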

    The project file was built for CCSv3.3 and has not yet been updated to CCSv4. If you feel like doing that, please do and post it here. I will not have time to do that for a while. Please note that you may need to adjust some of the paths in the .pjt file to match your installation for the libraries that need to be included.

    C6474_Edma_IPC.zip
  • Hi RandyP,

    Sorry for the late reply; we had a vacation. I have tried setting up timer0, and the value was different from the value I got using TSCL. So is there some kind of delay involved? In any case, the value I got using timer0 is what I expected. :) I have read this EDMA_IPC example before, and I learned how to use IPC and EDMA from it.

    I have another question about profiling with a timer.

    I defined an array in DDR, and it is used by all three cores for communication. But when I change the length of the array, the execution time (measured with the timer) of the same code section on one core differs. That section is a for loop that uses data in L2 copied from this DDR array. I am sure I have already turned off the cache. Could it be that the core frequency is not stable? Or does a different array length lead to different contention for DDR reads and writes?

    Thanks,

    lpeng 

  • Peng Liang said:
    I have tried to setup timer0, and the value was different from the value I got using TSCL.

    That is very odd that you get a different answer, meaning a non-linear relationship between the delta values, right? TSCL should be as accurate and more precise than timer0, so something is strange.
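
One way to test for the linear relationship mentioned here is to compare deltas from both counters over the same code section. The sketch assumes the timer counts at a fixed fraction of the CPU clock; the divider, the tolerance, and the name `deltas_agree` are illustrative assumptions, so check the device data manual for the actual timer input clock.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when tsc_delta matches timer_delta * divider to within
 * `tol` CPU cycles, and 0 otherwise. `divider` is the assumed ratio
 * of the CPU clock to the timer clock. */
int deltas_agree(uint64_t tsc_delta, uint64_t timer_delta,
                 uint32_t divider, uint64_t tol)
{
    assert(divider != 0);
    uint64_t expected = timer_delta * divider;
    uint64_t diff = (tsc_delta > expected) ? tsc_delta - expected
                                           : expected - tsc_delta;
    return diff <= tol;
}
```

A tolerance of a few timer ticks (a few times `divider` cycles) allows for the coarser resolution of the slower counter and for the cost of the two reads themselves.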

    Peng Liang said:
    I defined an array in DDR . . .

    I do not think I understand what you are describing. It sounds like you are saying that when you make an array larger, it takes longer to process it. I would expect that to be true, so I need you to explain more about what your concern or question is.

  • Hi RandyP,

    RandyP said:

    That is very odd that you get a different answer, meaning a non-linear relationship between the delta values, right? TSCL should be as accurate and more precise than timer0, so something is strange.

    Yes, pretty strange. I will check that again later.

     

    RandyP said:

    I do not think I understand what you are describing. It sounds like you are saying that when you make an array larger, it takes longer to process it. I would expect that to be true, so I need you to explain more about what your concern or question is.

    I tried to show you the problem with a simple project, but I failed. I have attached another test project, and I have a question about the experiment results. Why are there two different results (Nocomm_11, 12 and Nocomm_21, 22)? (the main difference is the first count value) Have you ever encountered this problem?

    3125.expr_data.rar

    Thanks,

    lpeng

  • lpeng,

    I still do not understand what you are doing or asking about. I see that there are different numbers in the Nocomm_nn.txt files and that you have two timer test project folders. But I do not know what the differences are between the executions that generated the .txt files nor what your concerns are on the differences between the .txt files. More use-case detail and explanation, please.

    lpeng said:
    the main difference is the first count value

    Some do not have a "first count value" and some do. Is this the difference you mention?

    Or is the size of the delta between the counts in the first loop your concern?

    Or is your concern about the magnitude of the count value for the first time one is printed, or the first value in the first loop?

    You can see that I am very unsure how to proceed. My apologies if I have missed a simple part of the description that should have explained this to me.

  • RandyP said:

    I see that there are different numbers in the Nocomm_nn.txt files and that you have two timer test project folders. But I do not know what the differences are between the executions that generated the .txt files

    There is no difference between the two executions; each is just a reload and animate. My question is why the two executions generated different results.

    In Nocomm_AB.txt, A stands for the execution (1st or 2nd) and B stands for the project that generated the file (test1 or test2).

     

    RandyP said:

    Or is your concern about the magnitude of the count value for the first time one is printed, or the first value in the first loop?

    Yes, my concern is the first value in the first loop, which is the value right after "1 loop begin". I do not understand why the first value of a project2 execution is (0, 525) while the first value of  another execution is (0, 17295).  

    The deltas between the counts in the loops are nearly the same for both projects, right (3600 for test1, 3200 for test2)?

    Thank you very much. : )

     

    Best wishes,

    lpeng

     

     

     

  • If each execution begins with a reset, reload, and run, you should get the same answer each time. If you do not do a reset each time, then you are not starting from the same point. There can be cache effects that are applied differently and some initializations may behave differently when doing a simple reload and run.

    Why are you using animate? Are you actually going through breakpoints? Is the timer configured to keep running when an emulation halt occurs?

    Since the timer is giving you consistent values for what you are actually trying to measure, you may want to consider if it matters what the initial value is. Do you reset the timer so it always starts counting from 0?
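
If only the differences matter, each sample can be taken relative to the previous one, which makes the initial counter value irrelevant. A minimal sketch, with the hypothetical helpers `delta32` and `samples_to_deltas` operating on 32-bit counter samples:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts elapsed between two samples of a free-running 32-bit counter.
 * Unsigned arithmetic wraps modulo 2^32, so the result is still correct
 * if the counter rolled over once between the samples. */
uint32_t delta32(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}

/* Turn n raw counter samples into n-1 per-interval deltas. */
void samples_to_deltas(const uint32_t *samples, size_t n, uint32_t *deltas)
{
    assert(n >= 2);
    for (size_t i = 1; i < n; i++) {
        deltas[i - 1] = delta32(samples[i - 1], samples[i]);
    }
}
```

Two runs that start from different initial counts, such as 525 and 17295, yield the same deltas as long as the measured section takes the same number of counts.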

  • RandyP said:

    If each execution begins with a reset, reload, and run, you should get the same answer each time. If you do not do a reset each time, then you are not starting from the same point. There can be cache effects that are applied differently and some initializations may behave differently when doing a simple reload and run.

    No, I didn't manually reset before every execution. But the GEL file performs a reset when I reload the program, and it turns off all cache segments. Can the cache still affect the execution time?

    RandyP said:

    Why are you using animate? Are you actually going through breakpoints? Is the timer configured to keep running when an emulation halt occurs?

    No, I didn't set any breakpoints. And the timer is opened with mode "CSL_TMR_TIMMODE_GPT" and started with "CSL_TMR_ENAMODE_ENABLE".

    RandyP said:

    Since the timer is giving you consistent values for what you are actually trying to measure, you may want to consider if it matters what the initial value is. Do you reset the timer so it always starts counting from 0?

    Yes, I close the timer at the end of execution and open it at the beginning. Since I only set up one timer, is it possible that there is contention when both cores read the counter at the beginning of the loop?

    Thanks,

    lpeng

  • RandyP said:

    If each execution begins with a reset, reload, and run, you should get the same answer each time. If you do not do a reset each time, then you are not starting from the same point. There can be cache effects that are applied differently and some initializations may behave differently when doing a simple reload and run.

    lpeng said:

    No, I didn't manually reset before every execution. But the GEL file performs a reset when I reload the program, and it turns off all cache segments. Can the cache still affect the execution time?

    The only way to be absolutely certain that the problem is not residuals from one execution to another is to force a very clean start each time so each execution begins from the same state. If you can get consistent results that way, then you can start trying to figure out what is different between successive executions.

    The 100% clean start is to close CCS, power off the board and unplug JTAG, replug JTAG and power on the board, then restart CCS.

    The 99% clean start is to Disconnect the target in CCS, power off the board and unplug JTAG, replug JTAG and power on the board, then Connect the target in CCS.

    The 95% clean start is to Disconnect in CCS and do a hard reset on the target, not a warm start but a POR or RESETz, then Connect.

    The 80% clean start is to use the CCS Reset CPU manually. But even this, and anything less, counts on the automated steps in the GEL file to clean everything up.

    Please try these steps and then determine at which point the second execution starts to give different results. Then you will have a better idea what to look at to find the cause of the differences.