Hi Team,
We are working on worst-case execution time (WCET) calculation for the C6678.
The configuration used is:
CPU frequency | 1000 MHz
Cache configuration |
L1P | 32 KB
L1D | 32 KB
L2D | 0 KB
Memory configuration |
Code | L2SRAM
Data | L2SRAM
Stack | L2SRAM
The code used is attached for reference, and we are running it using CCS5.5.
We did not change any other configuration settings, such as bandwidth management, so everything else is at its default (including the compiler settings in CCS5.5).
We grouped all 8 cores into a single group and used each core's internal timer to measure the time.
We put a breakpoint at the "TSCL = 0" statement, which is the statement that starts the internal timer.
We then resumed the whole group and calculated the time.
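The attached code is the reference for what we actually ran. Purely as an illustration of the kind of measurement described above, here is a minimal sketch. The helper names `tsc_start` and `time_reads` are hypothetical and are not taken from the attachment. On the target it assumes the TI compiler's `c6x.h` header, which declares `TSCL`; the `#else` branch is only a host-side stand-in so the sketch compiles off-target.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _TMS320C6X
#include <c6x.h>                      /* TI header that declares TSCL */
#define TSC_START() (TSCL = 0)        /* a write to TSCL starts the counter */
#define TSC_READ()  ((uint32_t)TSCL)
#else
/* Host-side stand-in (assumption): each read advances a fake counter by 10. */
static uint32_t fake_tsc;
#define TSC_START() (fake_tsc = 0u)
#define TSC_READ()  (fake_tsc += 10u)
#endif

/* Hypothetical helper: start the free-running time-stamp counter. */
void tsc_start(void)
{
    TSC_START();
}

/* Hypothetical helper: cycles elapsed while reading `count` words from `addr`. */
uint32_t time_reads(const volatile uint32_t *addr, uint32_t count)
{
    uint32_t i, start, end;
    volatile uint32_t sink = 0u;      /* keeps the reads from being optimized away */

    assert(addr != 0 && count > 0u);

    start = TSC_READ();
    for (i = 0u; i < count; i++) {
        sink = addr[i];
    }
    end = TSC_READ();
    (void)sink;

    return end - start;               /* unsigned subtraction tolerates one 32-bit wrap */
}
```

The returned value includes the cost of the two counter reads and the loop overhead, so it is an upper bound on the access time rather than the pure memory latency.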
The Excel sheet is attached for reference.
Note: We ensured that all cores try to access the same bank in the MSMC SRAM region.
Could you please answer the following questions:
1. As per the document, each core appears to have a separate access port to the MSMC SRAM region. At what point, then, does serialization happen?
2. Columns E7, K7, Q7, W7, AC7, AI7, AO7 and AU7 show, for each core, the difference between that core's time (the time taken for the MSMC SRAM region access) and the minimum time taken for the MSMC SRAM region access among all cores.
Taking the first set of data (row 7), core 0 completes the operation in 64 clock cycles, whereas core 1 takes 91 clock cycles.
Looking at all the values, there appears to be a difference of 3 to 4 clock cycles between each core and the core that completed just before it
(e.g., between core 0 and core 7 the difference is only 3 clock cycles).
Why is the difference only 3 clock cycles? Could you explain in more detail why it takes this much time?
3. Since we run the grouped cores from CCS5.5 with the "Run" button, does this ensure that all cores start in parallel, or is the start serialized?
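To make the arithmetic behind question 2 explicit, here is a minimal sketch of how the per-core difference from the minimum, and the gap between cores in completion order, can be computed. The function names `delta_from_min` and `completion_gaps` are hypothetical; the spreadsheet itself uses ordinary cell formulas, and this is only an equivalent illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* delta[i] = cycles[i] - min(cycles); returns the minimum cycle count. */
uint32_t delta_from_min(const uint32_t *cycles, uint32_t *delta, size_t n)
{
    size_t i;
    uint32_t min;

    assert(cycles != NULL && delta != NULL && n > 0);

    min = cycles[0];
    for (i = 1; i < n; i++) {
        if (cycles[i] < min) {
            min = cycles[i];
        }
    }
    for (i = 0; i < n; i++) {
        delta[i] = cycles[i] - min;
    }
    return min;
}

/* Ascending comparison for qsort. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/*
 * Copies cycles[] into sorted[] in completion order (fastest first) and
 * stores the n-1 gaps between consecutive finishers in gaps[].
 */
void completion_gaps(const uint32_t *cycles, uint32_t *sorted,
                     uint32_t *gaps, size_t n)
{
    size_t k;

    assert(cycles != NULL && sorted != NULL && gaps != NULL && n > 1);

    memcpy(sorted, cycles, n * sizeof(cycles[0]));
    qsort(sorted, n, sizeof(sorted[0]), cmp_u32);
    for (k = 0; k + 1 < n; k++) {
        gaps[k] = sorted[k + 1] - sorted[k];
    }
}
```

With the two row-7 values quoted above (64 cycles for core 0 and 91 cycles for core 1), `delta_from_min` gives 0 for core 0 and 27 for core 1. The 3 to 4 cycle figure we are asking about corresponds to the entries of `gaps[]` when all 8 cores are included.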