PDK is slower than MCSDK



We have brought up PDK 06_03_00_106 (kernel 4.10) on the K2HK_EVM. Our test application writes 4 bytes of data to random locations in DDR3A memory and reads them back. On the PDK build it completes 49872 write/read-back cycles per minute. The same application on the MCSDK-built kernel and U-Boot completes 1672845 cycles per minute, which is about 33 times faster than the PDK. I have checked the PLL initialization for the DDR3A, ARM and main PLLs: the multiplier, pre-divider, post-divider and all other PLL register settings, as well as the output and parent frequencies, are the same in both MCSDK and PDK. Please see the kernel debug prints from both PDK and MCSDK below.

Please provide some assistance with this.

MCSDK:-

[    0.000000] LWS PLL has_control
[    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7f000LWS clk_register_keystone_pll
[    0.000000] LWS clock NAME mainpllclk
[    0.000000] LWS mult == 30 val == 3800901f
[    0.000000] LWS PLL has no control
[    0.000000] LWS PLLMAIN fixed_postdiv not found
[    0.000000] LWS pllm 0 pllm_lower_mask 0 pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0LWS clk_register_keystone_pll
[    0.000000] LWS clock NAME armpllclk
[    0.000000] LWS mult == 0 val == 17000bc4
[    0.000000] LWS PLL has no control
[    0.000000] LWS PLLMAIN fixed_postdiv not found
[    0.000000] LWS pllm 0 pllm_lower_mask 0 pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0LWS clk_register_keystone_pll
[    0.000000] LWS clock NAME ddr3a_clk
[    0.000000] LWS mult == 0 val == 92804c0
[    0.000000] LWS PLL has no control
[    0.000000] LWS PLLMAIN fixed_postdiv not found
[    0.000000] LWS pllm 0 pllm_lower_mask 0 pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0LWS clk_register_keystone_pll
[    0.000000] LWS clock NAME ddr3b_clk
[    0.000000] LWS mult == 0 val == 98804c0
[    0.000000] LWS PLL has no control
[    0.000000] LWS PLLMAIN fixed_postdiv not found
[    0.000000] LWS pllm 0 pllm_lower_mask 0 pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0LWS clk_register_keystone_pll
[    0.000000] LWS clock NAME papllclk

[    0.000000] LWS clock NAME refclk-main
[    0.000000] LWS mult == 30 val == 3800901f
[    0.000000] Main PLL clk (1200000000 Hz), parent (122880000 Hz),postdiv = 2, mult = 624, prediv = 31
[    0.000000] LWS clock NAME refclk-arm
[    0.000000] LWS mult == 0 val == 17000bc4
[    0.000000] Generic PLL clk (1200000000 Hz), parent (125000000 Hz),postdiv = 1, mult = 47, prediv = 4
[    0.000000] LWS clock NAME refclk-pass
[    0.000000] LWS mult == 0 val == 70803c0
[    0.000000] Generic PLL clk (983040000 Hz), parent (122880000 Hz),postdiv = 2, mult = 15, prediv = 0
[    0.000000] LWS clock NAME refclk-ddr3a
[    0.000000] LWS mult == 0 val == 71803c0
[    0.000000] Generic PLL clk (400000000 Hz), parent (100000000 Hz),postdiv = 4, mult = 15, prediv = 0
[    0.000000] LWS clock NAME refclk-ddr3b
[    0.000000] LWS mult == 0 val == 98804c0
[    0.000000] Generic PLL clk (1000000000 Hz), parent (100000000 Hz),postdiv = 2, mult = 19, prediv = 0
[    0.000000] Architected local timer running at 200.00MHz (phys).

PDK:-

[    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0
[    0.000000] LWS clock name == armpllclk
[    0.000000] LWS read val 1b000dc4
[    0.000000] LWS Generic PLL clk (1400000000 Hz), parent (125000000 Hz) postdiv = 1, mult = 55, prediv = 4
[    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7f000
[    0.000000] LWS clock name == mainpllclk
[    0.000000] LWS read val 38009c1f
[    0.000000] LWS Main PLL clk (1200000000 Hz), parent (122880000 Hz) postdiv = 2, mult = 624, prediv = 31
[    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0
[    0.000000] LWS clock name == papllclk
[    0.000000] LWS read val 70803c0
[    0.000000] LWS Generic PLL clk (983040000 Hz), parent (122880000 Hz) postdiv = 2, mult = 15, prediv = 0
[    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0
[    0.000000] LWS clock name == ddr3apllclk
[    0.000000] LWS read val 71803c0
[    0.000000] LWS Generic PLL clk (400000000 Hz), parent (100000000 Hz) postdiv = 4, mult = 15, prediv = 0
[    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0
[    0.000000] LWS clock name == ddr3bpllclk
[    0.000000] LWS read val 98804c0
[    0.000000] LWS Generic PLL clk (1000000000 Hz), parent (100000000 Hz) postdiv = 2, mult = 19, prediv = 0
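
The clock rates in these prints can be cross-checked from the logged parent rate, multiplier and dividers. The sketch below assumes, based only on the numbers in the logs, that `mult` and `prediv` are raw register fields (so the effective values are one higher) and that `postdiv` is already the effective divider. The function name `pll_rate_hz` is made up for illustration and is not part of the kernel driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Output rate of a PLL from the values printed in the kernel log.
 * Assumption (inferred from the log lines, not from the driver source):
 * the effective multiplier is (mult + 1), the effective pre-divider is
 * (prediv + 1), and postdiv is used as printed.
 */
uint64_t pll_rate_hz(uint64_t parent_hz, uint32_t mult,
                     uint32_t prediv, uint32_t postdiv)
{
    assert(postdiv != 0);
    return parent_hz * (mult + 1u) / ((uint64_t)(prediv + 1u) * postdiv);
}
```

With the logged values, `pll_rate_hz(125000000, 47, 4, 1)` gives the 1.2 GHz ARM PLL rate seen under MCSDK, and `pll_rate_hz(125000000, 55, 4, 1)` gives the 1.4 GHz rate seen under PDK.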

Thanks & regards

Phaneesh A Kashyap

  • Phaneesh,

    Let me look closer and get back.

    Meanwhile, please check the Linux and U-Boot versions used in the PDK and in the MCSDK.

    For PDK: kernel 4.10.

    For MCSDK: what is the kernel version, and what is the U-Boot version?

    ---

    Also, in my experience, MCSDK is generally more robust than the PDK.

    --

    Regards

    Shankari G

  • Phaneesh,

    In the log you posted above, the ARM PLL clock is 1.2 GHz with a multiplier value of 47 in MCSDK,

    whereas in PDK the ARM PLL clock is 1.4 GHz with a multiplier value of 55.

    MCSDK : -

    [    0.000000] Generic PLL clk (1200000000 Hz), parent (125000000 Hz),postdiv = 1, mult = 47, prediv = 4
    [    0.000000] LWS clock NAME refclk-pass

     

    PDK 

    [    0.000000] LWS Generic PLL clk (1400000000 Hz), parent (125000000 Hz) postdiv = 1, mult = 55, prediv = 4

    Regards

    Shankari G
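
The multiplier values 47 and 55 quoted in the reply above can be recovered from the raw register values in the logs (`17000bc4` for MCSDK, `1b000dc4` for PDK). The sketch below is a rough illustration for the generic (non-main) PLLs only. The pre-divider and multiplier masks come from the debug prints (`plld_mask 3f`, `pllm_upper_mask 7ffc0`, shift 6). The post-divider position in bits 22:19 is an assumption that happens to match the logged `postdiv` values. The names `pll_fields` and `decode_pll_ctl` are hypothetical.

```c
#include <stdint.h>

/* Masks taken from the debug prints for the generic PLLs. */
#define PLLD_MASK    0x0000003Fu
#define PLLM_MASK    0x0007FFC0u
#define PLLM_SHIFT   6
/* Assumed post-divider field; consistent with the logged postdiv values. */
#define PLLOD_MASK   0x00780000u
#define PLLOD_SHIFT  19

struct pll_fields {
    uint32_t mult;     /* raw multiplier field, as printed in the log */
    uint32_t prediv;   /* raw pre-divider field, as printed in the log */
    uint32_t postdiv;  /* effective post-divider (field value + 1) */
};

/* Split a generic PLL control register value into its fields. */
struct pll_fields decode_pll_ctl(uint32_t val)
{
    struct pll_fields f;
    f.prediv  = val & PLLD_MASK;
    f.mult    = (val & PLLM_MASK) >> PLLM_SHIFT;
    f.postdiv = ((val & PLLOD_MASK) >> PLLOD_SHIFT) + 1u;
    return f;
}
```

Decoding `0x17000bc4` this way yields mult 47, prediv 4, postdiv 1, and `0x1b000dc4` yields mult 55 with the same dividers, which agrees with the two log lines compared above.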

  • Hi Shankari,

    MCSDK: kernel version 3.10, U-Boot 2013.01

    PDK: kernel version 4.10, U-Boot Version 1.0.0-2020

    Thanks & regards

    Phaneesh A Kashyap

    For the ARM PLL, which shows 1.4 GHz in the PDK, I changed the configuration to 1.2 GHz in U-Boot. The kernel print now shows the updated value, but there is still no change in performance.

    Apart from the PLL initialization, where else can we look to investigate the DDR latency?

    Thanks & regards

    Phaneesh A

    "I am writing 4 byte data and read back from random locations of DDR3A memory; we are getting 49872 cycles per minute. But if I run the same application on the MCSDK-built kernel and U-Boot, I am getting 1672845 cycles per minute."

    "The fewer the CPU cycles, the less the time."

    PDK: 49872 cycles

    Time per cycle = 1 / Freq (general formula)

              = 1 / 1000 MHz (DSP core frequency)

              = 0.001 us (microseconds)

    1000000000 cycles = 1 sec

    ------> 49872 cycles = (1 / 1000000000) * 49872 = 0.000049872 seconds = 49.872 microseconds.

    MCSDK: 1672845 cycles

    Time per cycle = 1 / Freq (general formula)

              = 1 / 1000 MHz (DSP core frequency)

              = 0.001 us (microseconds)

    1000000000 cycles = 1 sec

    ------> 1672845 cycles = (1 / 1000000000) * 1672845 = 0.001672845 seconds = 1672.845 microseconds.

    --

    0.000049872 seconds < 0.001672845 seconds.

    -----

    As per your data on cycles, "PDK is faster than MCSDK": it takes just 49.872 microseconds compared to 1672.845 microseconds for MCSDK.

    I hope this calculation helps !

    Regards

    Shankari G
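
The conversion used in the reply above is time = cycles / frequency. A minimal sketch of that arithmetic follows. It is only meaningful if the reported figures really are core clock cycles, which is the reading this reply takes. The helper name `cycles_to_ns` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Time in nanoseconds taken by `cycles` clock cycles at `freq_hz`. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
    assert(freq_hz != 0);
    return cycles * 1000000000ULL / freq_hz;
}
```

At 1 GHz one cycle lasts 1 ns, so 49872 cycles correspond to 49872 ns (49.872 microseconds) and 1672845 cycles to 1672845 ns (1672.845 microseconds).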

    No, I don't think so: a lower number of cycles means it is taking more time. I am counting one successful write and read-back of data at a random memory location (a malloc'd location) as one cycle.

    This is a per-minute test, i.e. 60000000 us.

    PDK:-

    60000000 / 49872 = 1203.079884504 us / cycle.

    MCSDK :-

    60000000 / 1672845 = 35.867040879 us / cycle.

    That is why I am saying that the PDK's DDR performance is slower than the MCSDK's.
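
The per-iteration figures above follow from dividing the 60 second window by the iteration count. A small sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>

/* Average time per iteration, in microseconds, over a fixed window. */
double us_per_iteration(double window_s, unsigned long iterations)
{
    assert(iterations > 0);
    return window_s * 1e6 / (double)iterations;
}

/* Throughput ratio between two runs measured over the same window. */
double speed_ratio(unsigned long fast_iters, unsigned long slow_iters)
{
    assert(slow_iters > 0);
    return (double)fast_iters / (double)slow_iters;
}
```

With the counts from this thread, `us_per_iteration(60.0, 49872)` is about 1203.08 us, `us_per_iteration(60.0, 1672845)` is about 35.87 us, and `speed_ratio(1672845, 49872)` is about 33.5.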

  • Phaneesh,

    Your use of the term "cycle" is not clear to me.

    I am referring to core clock cycles.

    ---

    When I say a core runs at 1 GHz, I mean it is clocked at 1000 MHz.

    That is, in one second the core runs through 1000000000 clock cycles.

    ---> 1 sec = 1000000000 clock cycles

    So, assuming a simple CPU instruction that takes 1 clock cycle to execute, the core is capable of executing 1000000000 instructions per second.

    --

    Other than the core frequency, which I mentioned in my previous post (1.2 GHz in MCSDK and 1.4 GHz in PDK),

    you can check whether there is any difference in the DDR3 driver code in both Linux and U-Boot, particularly the initialization and configuration of the DDR3 parameters and registers.

    --

    Regards

    Shankari G

  • Hi  Shankari,

    I am not talking about DSP clock cycles. Here is my test code snippet.

    Code:

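        while (brk) {
            /* allocate a fresh word on every iteration (never freed) */
            address = (long int *)malloc(sizeof(long int));
            *address = 0xAAAAAAAA;                       /* write */
            memcpy(readval, address, sizeof(long int));  /* read back */
            printBuffer((unsigned char *)readval, 4);
            count++;                     /* one write/read-back "cycle" */
            end = time(NULL) - start;    /* elapsed seconds */
            if (end > 60) {
                brk = 0;
            }
        }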

    This is a simple DDR3 test run from the ARM side; the DSP is not involved here.

    The program dynamically allocates memory for a pointer in DDR3, writes a value to it, and then reads it back into another pointer using memcpy(). When the read-back completes, one cycle is done; that is why I referred to DDR3 read and write cycles in my post above.

    The number of these cycles is higher in MCSDK, which means it is faster than PDK. Earlier I thought it might be a clock issue, but when I compared the clock frequencies from both SDKs they were the same, so now I am not sure why there is this latency in the DDR3 read and write operations.

    Thanks & regards

    Phaneesh A  Kashyap
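
The loop in the post above also calls malloc(), printBuffer() and time() on every iteration, so the per-minute count may reflect the cost of those calls as much as the memory access itself. One way to narrow this down is to time only the write and read-back inside a single preallocated buffer. The following is a rough sketch under stated assumptions, not the poster's test: the function name `avg_rw_ns`, the buffer size and the simple linear congruential index generator are all choices made here for illustration. With caches enabled, a buffer much larger than the last-level cache would be needed before the result says much about DDR3 itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/*
 * Average time in nanoseconds for one 4-byte write plus read-back at a
 * pseudo-random offset inside one preallocated buffer. Returns -1.0 on
 * allocation failure or read-back mismatch.
 */
double avg_rw_ns(size_t buf_bytes, unsigned long iterations)
{
    assert(buf_bytes >= sizeof(uint32_t) && iterations > 0);

    size_t words = buf_bytes / sizeof(uint32_t);
    volatile uint32_t *buf = malloc(words * sizeof(uint32_t));
    if (buf == NULL)
        return -1.0;

    uint32_t lcg = 12345u;   /* cheap pseudo-random index source */
    int ok = 1;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < iterations; i++) {
        lcg = lcg * 1664525u + 1013904223u;
        size_t idx = lcg % words;
        buf[idx] = 0xAAAAAAAAu;        /* write */
        if (buf[idx] != 0xAAAAAAAAu)   /* read back and verify */
            ok = 0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free((void *)buf);
    if (!ok)
        return -1.0;

    double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Running the same call, for example `avg_rw_ns(64u << 20, 10000000ul)`, on both the MCSDK and PDK images would show whether the gap remains once allocation and printing are taken out of the loop.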

  • Phaneesh,

    Whatever I said applies to the ARM core too. The ARM can also run at 1 GHz.

    And the clock cycle explanation I gave is common to any core, for that matter!

    --

    You can check whether there is any difference in the DDR3 driver code in both Linux and U-Boot, particularly the initialization and configuration of the DDR3 parameters and registers.

    Compare the DDR3 configurations and parameter settings between MCSDK and PDK.

    --

    Regards

    Shankari G
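
One way to carry out the comparison suggested above is to dump the same block of DDR3 controller registers under each SDK (for example with U-Boot's `md` command) and compare the two dumps word by word. The sketch below assumes the dumps have already been captured into two arrays of 32-bit words starting at the same base address. The function name `diff_reg_dumps` is hypothetical, and no register names or addresses are implied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Print every word that differs between two register dumps taken from
 * the same base address, and return the number of differing words.
 */
size_t diff_reg_dumps(const uint32_t *mcsdk, const uint32_t *pdk, size_t n)
{
    assert(mcsdk != NULL && pdk != NULL);

    size_t diffs = 0;
    for (size_t i = 0; i < n; i++) {
        if (mcsdk[i] != pdk[i]) {
            printf("offset 0x%03zx: MCSDK 0x%08x  PDK 0x%08x\n",
                   i * sizeof(uint32_t),
                   (unsigned)mcsdk[i], (unsigned)pdk[i]);
            diffs++;
        }
    }
    return diffs;
}
```

Any offset reported here would be a candidate for closer inspection against the DDR3 controller documentation, for instance timing or refresh settings that differ between the two U-Boot builds.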