[C6678] execution time differs among cores

I have a question about the C6678. Please help.

All 8 cores (core 0 ~ core 7) run the same algorithm on the same input data.

The problem is that the execution time differs among core 0 ~ core 7.

Core 0 ~ Core 4 ==> about 3.8 seconds each

Core 5 ~ Core 7 ==> about 10.8 seconds each

Why is the execution time inconsistent between core[0:4] and core[5:7]?

(I am using CCS 5.4, MCSDK 2.1.2.6, SYS/BIOS 6.35.1.29, and compiler 7.4.6.)

  • Hi Che-Cheng Hu,

    Do you see this same behavior (the time difference between cores) every time you load through CCS?
    Which example code are you running?
    Also, just to confirm, are you using the same .out for all 8 cores?

    Regards,
    Shankari

  • 1. Yes, I see the same behavior every time I load through CCS.

    2. The code is a median filter algorithm.

    3. I am using a Master (core 0) / Slave (core 0~7) model, so I have master.out and slave.out.

    IPC (MessageQ) is used to notify each core, as follows:

    File:ImageProcessingSWFramework.png

  • Hi Che-Cheng Hu,

    Che-Cheng Hu said:
    All 8 cores (core 0 ~ core 7) run the same algorithm on the same input data.

    The problem is that the execution time differs among core 0 ~ core 7.

    Core 0 ~ Core 4 ==> about 3.8 seconds each

    Core 5 ~ Core 7 ==> about 10.8 seconds each

    Why is the execution time inconsistent between core[0:4] and core[5:7]?

    I created a sample "hello world" program for the C6678 and loaded it on each core. The clock time taken to execute the code is the same on every core. At least from this we can conclude that, in general and irrespective of the code used, the 8 cores perform in the same time.

    Please refer to the screenshot below, in which the hello world program is executed on the 5th core of the C6678.

    Regards,

    Shankari

  • Hi Shankari,

    I have found the cause of the problem.

    After testing, it turned out that something was wrong with the dynamic memory (Memory_alloc).

    The test program is as follows:

    for (i = 0; i < (375 * 12288); ++i)
    {
        para[i] += 1;
    }

    When para is configured as 3MB (Case1), the execution time is normal.

    When para is configured as 48MB (Case2), the execution time on Core5 ~ Core7 is abnormal.

    *para = (U16*)Memory_alloc(0, size1, 16, NULL);  //3MB (Case1)

    *para = (U16*)Memory_alloc(0, size2, 16, NULL);  //48MB (Case2)
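For anyone who wants to try the shape of this test away from the DSP, here is a minimal host-side sketch. It is only an assumption of how the measurement might look: standard `calloc` and `clock()` stand in for `Memory_alloc` and the DSP timer, and the names `increment_all` and `time_increment_ms` are made up for illustration. The timings it reports on a PC say nothing about the C6678 numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef uint16_t U16;

/* Same loop as the test program: add 1 to each of `count` elements. */
static void increment_all(U16 *para, size_t count)
{
    assert(para != NULL);
    for (size_t i = 0; i < count; ++i) {
        para[i] += 1;
    }
}

/* Allocates `count` elements, runs the loop once and returns the elapsed
 * CPU time in milliseconds, or -1.0 if the allocation fails.
 * Hypothetical helper: calloc/clock replace Memory_alloc and the DSP timer. */
static double time_increment_ms(size_t count)
{
    if (count == 0) {
        return 0.0;
    }
    U16 *para = calloc(count, sizeof *para);
    if (para == NULL) {
        return -1.0;
    }
    clock_t t0 = clock();
    increment_all(para, count);
    clock_t t1 = clock();
    volatile U16 sink = para[count - 1];  /* keep the loop from being optimized away */
    (void)sink;
    free(para);
    return 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling `time_increment_ms((size_t)375 * 12288)` from a small `main` runs the loop with the element count used in the post.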

                 Core-0   Core-1   Core-2   Core-3   Core-4   Core-5   Core-6   Core-7
    Case1 (ms)    25.21    25.16    25.21    25.15    25.22    25.24    25.23    25.24
    Case2 (ms)    27.18    27.13    27.11    27.11     27.2   736.61   969.39   972.07

    If static memory (Case3) is used in place of Case2, the execution time on Core5 ~ Core7 is normal.

    *para = (U16*)0x95400000;

                 Core-0   Core-1   Core-2   Core-3   Core-4   Core-5   Core-6   Core-7
    Case3 (ms)    25.22    25.13    25.21    25.23    25.13    25.21    25.24    25.23

    When the algorithm uses Case3, its execution time is normal as well.

                        Core-0   Core-1   Core-2   Core-3   Core-4   Core-5   Core-6   Core-7
    Median filter (s)     3.27     3.29     3.29     3.29     3.29     3.29     3.28     3.28

    So, why does dynamic memory allocation cause this problem, as in Case2?

  • Hi Che-Cheng Hu,

    Thanks for the update.

    Good to hear that you found the cause for the problem.

    Che-Cheng Hu said:
    So, why does dynamic memory allocation cause this problem, as in Case2?

    I guess for this we would have to look at the *.cfg file (open it in a text editor) and see how much heap memory is assigned to physical memory (for example, L2SRAM), and how that physical memory is divided or shared between cores 5/6/7 and cores 0/1/2/3/4.

    Regards,

    Shankari


  • I think this is the real reason, as follows:

    The dynamic memory starts at address 0xBDF04100, and each core is assigned 6MB (0x600000):

    core 0: 0xBDF04100  ==>  start address

    core 1: 0xBE504100

    core 2: 0xBEB04100

    core 3: 0xBF104100

    core 4: 0xBF704100

    core 5: 0xBFD04100

    core 6: 0xC0304100

    core 7: 0xC0904100

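    However, in the initial cache settings for DDR3 (the CACHEABLE column in ROV), cache region (MAR) numbers 128 ~ 191 are true and numbers 192 ~ 255 are false.

    Each region number covers 16MB, so number 192 starts at 0xC0000000. The memory of cores 6 and 7 lies entirely in number 192, and the 6MB block of core 5 starts in number 191 and crosses into number 192. So cores 5/6/7 cannot cache some or all of their memory.

    When CACHEABLE for number 192 is set to true, the execution time on cores 5/6/7 is normal.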
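The address arithmetic above can be checked with a short host-side C sketch. It relies on the C66x convention that each MAR register covers one 16MB window, so the region number is the top 8 bits of the 32-bit address. The base address and the 6MB stride come from the post; the helper names (`mar_index`, `core_start`, `core_last`, `print_mar_map`) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define HEAP_BASE 0xBDF04100u  /* core 0 start address, from the post */
#define PER_CORE  0x00600000u  /* 6MB assigned to each core, from the post */
#define NUM_CORES 8u

/* Region (MAR) number of an address: one region per 16MB window. */
static unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> 24);
}

/* First byte of the block assigned to `core`. */
static uint32_t core_start(unsigned core)
{
    assert(core < NUM_CORES);
    return HEAP_BASE + core * PER_CORE;
}

/* Last byte of the block assigned to `core`. */
static uint32_t core_last(unsigned core)
{
    return core_start(core) + PER_CORE - 1u;
}

/* Prints, for every core, its address range and the region numbers
 * of the first and last byte of that range. */
static void print_mar_map(void)
{
    for (unsigned core = 0; core < NUM_CORES; ++core) {
        printf("core %u: 0x%08X..0x%08X  region %u..%u\n",
               core,
               (unsigned)core_start(core), (unsigned)core_last(core),
               mar_index(core_start(core)), mar_index(core_last(core)));
    }
}
```

Calling `print_mar_map()` from a small `main` lists the range and region numbers for each core, which makes it easy to see which blocks reach region 192.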

  • Hi Che-Cheng Hu,

    Thanks for the update.

    Glad to hear.

    Regards,

    Shankari