DDR3A memory read and write cycles from DSP is slower in PDK than MCSDK

Hi,

We are assessing DDR performance on K2HK with both the PDK and MCSDK versions of the SDK, and we are seeing timing issues when executing programs on the DSP with the PDK kernel loaded. Our observations are below for your reference. Execution under PDK is roughly 3 times slower than under MCSDK, although all the DDR config registers are the same on both platforms. Any pointers will be of great help.

DDR read and write cycles in MCSDK and PDK:

Kernel            Memory  Size     Read & Write Cycles  Variables
MCSDK 3.0         DDR3A   1024*12  503349               ab000000  ramBuffer_Wr
                                                        ab00c000  ramBuffer_Rd
PDK 06_03_00_106  DDR3A   1024*12  1696787              ab000000  ramBuffer_Wr
                                                        ab00c000  ramBuffer_Rd

Thanks

Phaneesh A Kashyap

  • Phaneesh,

    Yes, it is true that the MCSDK is more robust and faster than the Processor SDK 6.3.

    From the software design point of view, the MCSDK is specific to the Keystone II devices such as K2E, K2H, etc.

    But the Processor SDK 6.3 is a common software design made to cover all the Keystone I (C6655/57, C6678) and Keystone II (K2E, K2H) devices, and many other device series too. Similarly, the Processor SDK documentation is maintained in a generic way to cover more devices.

    The reason might be ease of development, migration, maintenance, etc., if a single base SDK is common to all the processor platforms.

    -----

    Customers can still choose MCSDK 3.0 for their development, but the support from E2E will be limited compared to Processor SDK 6.3.

    Regards

    Shankari G

  • Hi Shankari,

    Actually, we want to access the Security Accelerator, so we are migrating to PDK 06-03; otherwise MCSDK is fine, as you said.

    Also, QMSS can be accessed simultaneously by both DSP and ARM in the PDK. So we are not sure how to improve the DDR performance; your inputs on this will be helpful.

    Regards,

    Phaneesh A

  • Phaneesh,

    For the security accelerator, MCSDK has all the software tests running successfully, with throughput details.

    But the latest Processor SDK 6.3 does not have working software sample programs for the security accelerator. Some of its tests fail with framework errors, etc. (I personally experienced this by running them on TI EVMs.)

    "Also QMSS can be simultaneously accessed by both DSP and ARM in the PDK, so we are not sure how to improve the DDR performance. Your inputs regarding this will be helpful."

    For this, we may have to compare the software architecture of both MCSDK and PDK and improve on it. Not an easy task, though, because both are complete packages.

    Having said that, one can focus on the software flow for DDR initialization, configuration, and the application programs, along with dependent software components (CSL version, SYS/BIOS version, versions of other library products), compare these between the two SDKs, and then try to improve.

    Regards

    Shankari G

  • Hi Shankari,

    We are not convinced by your reply. Can we just know the reason why memory access is slower in PDK compared to MCSDK? Also, when we earlier asked TI for support regarding the MCSDK, we were told to move to PDK for proper support from TI. Now we can't move back to MCSDK again.

    Regards,

    Phaneesh A

  • Phaneesh,

    I am presenting the performance, features and availability of both software packages, MCSDK vs Processor SDK, as-is.

    It is a fact that software development, maintenance and support are frozen and limited for older software packages like MCSDK.

    Processor SDK 6.3 is the latest for all the processor devices. This SDK is currently supported by the Forum.

    --

    I am giving some additional info here.

    Please note that the security accelerator, with its list of around 7 tests, was working perfectly in MCSDK 3.1:

    "SA_UnitTest_K2EBiosTestProject" - MCSDK 3.1 on K2E EVM. (I was supporting a customer from Thales in improving the throughput performance on a Keystone II device, i.e., on the K2E board.) It is the same base code for K2H too.

    "can we just know the reason why memory access is slower in PDK compared to MCSDK"

    Please check that the PLL settings, core frequency and DDR3 input clock are the same when testing with MCSDK and with Processor SDK.

    Please give details such as which sample programs you use to test the DDR3 on both the MCSDK and Processor SDK 6.3.

    Let me experiment from my end and check why it is slower.....

    Regards

    Shankari G

  • Hi,

    I have attached below the test code that we are running on the DSP core to check the DDR read and write cycles in both PDK and MCSDK.

    Please let us know once you have tested this code.

    /*
     *  ======== main.c ========
     */
    
    #include <xdc/std.h>
    
    #include <xdc/runtime/System.h>
    
    #include <ti/sysbios/BIOS.h>
    
    #include <ti/sysbios/knl/Task.h>
    
    #include <ti/sysbios/family/c64p/Hwi.h>
    #include <ti/sysbios/family/c64p/EventCombiner.h>
    
    #include <xdc/runtime/Error.h>
    #include <ti/ipc/Ipc.h>
    #include <c6x.h>
    
    #pragma DATA_SECTION(L2SramBuffer_Wr,".const");
    #pragma DATA_ALIGN(L2SramBuffer_Wr, 64)
    unsigned int  L2SramBuffer_Wr[1024*12];
    
    
    #pragma DATA_SECTION(L2SramBuffer_Rd,".const");
    #pragma DATA_ALIGN(L2SramBuffer_Rd, 64)
    unsigned int  L2SramBuffer_Rd[1024*12];
    
    
    #define UIO_ENABLE
    
    #ifdef UIO_ENABLE
    #define IPCGRARM_REG 0x02620264
    #else
    #define IPCGRARM_REG 0x02620260
    #endif
    
    
    
    #define L1IF_RSP_PING_INTERRUPT_NUM         5
    #define L1IF_RSP_PONG_INTERRUPT_NUM         8
    
    
    int ui32NuminterruptsSent = 0, ms =0;
    unsigned int t1 = 0;
    unsigned int  cur_time = 0,prev_time = 0;
    unsigned int max_int_time = 0;
    unsigned int min_int_time = 1000000;
    unsigned int timeDiff = 0;
    
    unsigned int start_time  = 0;
    unsigned int end_time = 0;
    
    
    
    
    
    
    /*
     *  ======== taskFxn ========
     */
    Void taskFxn(UArg a0, UArg a1)
    {
        System_printf("enter taskFxn()\n");
        Uint16 ui16test = 0;
    	
        Task_sleep(10);
    	
        System_printf("exit taskFxn()\n");
        while(1)
        {
            ui16test++;
        }
    }
    
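    /* Forward declaration: Interrupt_isr() calls this before its definition. */
    void ResponseInterrupt_ARM(int id);

    void Interrupt_isr(void)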
    {
    
        ResponseInterrupt_ARM(L1IF_RSP_PONG_INTERRUPT_NUM);
        ui32NuminterruptsSent++;
    
    }
    
    
    
    void ResponseInterrupt_ARM(int id)
    {
        volatile unsigned int* arm_pntr;
        arm_pntr = (volatile unsigned int*)IPCGRARM_REG;
        *arm_pntr = (0x01<<id)|(0x01);
    }
    
    
    #if 0
    Int main()
    {
       System_printf("hello world\n");
    
       /*
        *  normal BIOS programs, would call BIOS_start() to enable interrupts
        *  and start the scheduler and kick BIOS into gear.  But, this program
        *  is a simple sanity test and calls BIOS_exit() instead.
        */
       BIOS_exit(0);  /* terminates program and dumps SysMin output */
       return(0);
    }
    #endif
    
    
    
    
    /*
     *  ======== main ========
     */
    #if 1
    Int main()
    { 
        Int32 status;
        Hwi_Params hwiAttrs;
        int ii, jj = 0;
        int kk=0;
        /*
         * use ROV->SysMin to view the characters in the circular buffer
         */
        System_printf("enter main()\n");
    
    
        while ( 1 ) {
    
            start_time = TSCL;
    
            for (ii = 0; ii< (1024*3) ; ii++) {
                L2SramBuffer_Wr[ii] = (0x00000000+ jj+ ii);
    
            }
    
            jj++;
    
            for (ii = 0; ii< (1024*3) ; ii++) {
                L2SramBuffer_Rd[ii] = L2SramBuffer_Wr[ii];
    
            }
            end_time = TSCL;
    
            if( (end_time - start_time ) >= max_int_time  ) {
                max_int_time = (end_time - start_time );
            }
    
            kk++;
            if ((kk%10000) == 0) {
                kk = 0;
                /* max_int_time is unsigned, so print it with %u */
                System_printf("Cycles for the DDR Copy  %u\n",max_int_time );
            }
        }
    #if 0
        status = Ipc_start();
        if (status < 0)
        {
            System_abort("Ipc_start failed\n");
        }
    
        while (1) {
            ResponseInterrupt_ARM(L1IF_RSP_PING_INTERRUPT_NUM);
            cur_time = TSCL;
            System_printf ("Interrupt Frequency = %d\n", (cur_time - prev_time) );
            prev_time = cur_time;
    
    
    
            ms = 1;
            t1 = TSCL;
             while(1)
             {
    
    
    
                    timeDiff = (TSCL - t1) ;
                    if( timeDiff  >= (ms*1200000) )
                    {
                        if (max_int_time <  timeDiff ) {
    
                            max_int_time = timeDiff ;
    
                        }
                        if (min_int_time >  timeDiff ) {
    
                            min_int_time =  timeDiff;
                        }
                        break;
                    }
              }
    
        }
    #endif
    #if 0
        EventCombiner_dispatchPlug (105, Interrupt_isr, NULL, TRUE);
        EventCombiner_enableEvent(105);
    
        /* Map the event id to hardware interrupt. */
        Hwi_eventMap(5, 105);
    
        /* Enable interrupt. */
        Hwi_enableInterrupt(5);
    #endif
    #if 0
    
        Hwi_Handle hwi0;
        Hwi_Params hwiParams;
        Error_Block eb;
        Error_init(&eb);
        Hwi_Params_init(&hwiParams);
        hwiParams.arg = 1;
        hwiParams.eventId = 105;
        hwiParams.maskSetting = Hwi_MaskingOption_SELF;
        hwi0 = Hwi_create(5, Interrupt_isr, &hwiParams, &eb);
        if (hwi0 == NULL)
        {
            System_abort("Hwi create failed");
        }
    #endif
    #if 0
        BIOS_start();    /* does not return */
        return(0);
    #endif
    
    }
    #endif
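
The loop above prints only the largest cycle count seen so far, so a single interrupt or stall can dominate the reported figure. As a minimal, host-runnable sketch (not part of the attached project), the bookkeeping could also track the minimum and the mean. The names `cycle_stats`, `cycle_stats_init`, `cycle_stats_add` and `cycle_stats_mean` are hypothetical; on the DSP, the `start` and `end` arguments would presumably be the `TSCL` readings the code already takes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for per-iteration cycle counts. */
typedef struct {
    uint32_t min_cycles;   /* smallest delta seen so far */
    uint32_t max_cycles;   /* largest delta seen so far */
    uint64_t total_cycles; /* sum of all deltas; 64-bit so it does not overflow quickly */
    uint32_t samples;      /* number of deltas recorded */
} cycle_stats;

void cycle_stats_init(cycle_stats *s)
{
    assert(s != NULL);
    s->min_cycles = UINT32_MAX;
    s->max_cycles = 0;
    s->total_cycles = 0;
    s->samples = 0;
}

void cycle_stats_add(cycle_stats *s, uint32_t start, uint32_t end)
{
    /* Unsigned subtraction is modulo 2^32, so the delta stays correct
       across a single wrap of a 32-bit counter. */
    uint32_t delta = end - start;

    assert(s != NULL);
    if (delta < s->min_cycles) {
        s->min_cycles = delta;
    }
    if (delta > s->max_cycles) {
        s->max_cycles = delta;
    }
    s->total_cycles += delta;
    s->samples++;
}

uint32_t cycle_stats_mean(const cycle_stats *s)
{
    assert(s != NULL && s->samples > 0);
    return (uint32_t)(s->total_cycles / s->samples);
}
```

Reporting the minimum and mean next to the maximum may make it easier to tell a uniformly slower memory path from occasional long stalls.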
    
    

    Thanks,

    Phaneesh A

  • Please upload the complete CCS projects for MCSDK and Processor SDK.

    or

    Point me to the name of the example you use on MCSDK and Processor SDK.

    Regards

    Shankari G

  • Shankari,

    The attached file test.rar is the complete project for the DDR test. We have not taken this from a TI example program; we created it for testing, and you can try this project. The PLL frequencies and PLL register settings are the same in both PDK and MCSDK. The attached text file is a comparison of the MCSDK and PDK PLL configuration and frequencies. If any config is missing, please let me know.

    __________________________________________________________________________________________________________________________________________________________________________________________________
    PLL COMPARISON
    ____________________________________________________________________________________________________________________________________________________________________________________________________
    MCSDK kernel prints :- MAIN_PLL init
    
    [    0.000000] LWS PLL has_control
    [    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7f000LWS clk_register_keystone_pll
    [    0.000000] LWS clock NAME mainpllclk
    [    0.000000] LWS mult == 30 val == 3800901f
    [    0.000000] LWS PLL has no control
    [    0.000000] LWS PLLMAIN fixed_postdiv not found 
    [    0.000000] LWS pllm 0 pllm_lower_mask 0 pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0LWS clk_register_keystone_pll
    [    0.000000] LWS clock NAME armpllclk
    [    0.000000] LWS mult == 0 val == 17000bc4
    [    0.000000] LWS PLL has no control
    [    0.000000] LWS PLLMAIN fixed_postdiv not found 
    [    0.000000] LWS pllm 0 pllm_lower_mask 0 pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0LWS clk_register_keystone_pll
    [    0.000000] LWS clock NAME ddr3a_clk
    [    0.000000] LWS mult == 0 val == 92804c0
    [    0.000000] LWS PLL has no control
    [    0.000000] LWS PLLMAIN fixed_postdiv not found 
    [    0.000000] LWS pllm 0 pllm_lower_mask 0 pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0LWS clk_register_keystone_pll
    [    0.000000] LWS clock NAME ddr3b_clk
    [    0.000000] LWS mult == 0 val == 98804c0
    [    0.000000] LWS PLL has no control
    [    0.000000] LWS PLLMAIN fixed_postdiv not found 
    [    0.000000] LWS pllm 0 pllm_lower_mask 0 pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0LWS clk_register_keystone_pll
    [    0.000000] LWS clock NAME papllclk
    
    [    0.000000] LWS clock NAME refclk-main
    [    0.000000] LWS mult == 30 val == 3800901f
    [    0.000000] Main PLL clk (1200000000 Hz), parent (122880000 Hz),postdiv = 2, mult = 624, prediv = 31
    [    0.000000] LWS clock NAME refclk-arm
    [    0.000000] LWS mult == 0 val == 17000bc4
    [    0.000000] Generic PLL clk (1200000000 Hz), parent (125000000 Hz),postdiv = 1, mult = 47, prediv = 4
    [    0.000000] LWS clock NAME refclk-pass
    [    0.000000] LWS mult == 0 val == 70803c0
    [    0.000000] Generic PLL clk (983040000 Hz), parent (122880000 Hz),postdiv = 2, mult = 15, prediv = 0
    [    0.000000] LWS clock NAME refclk-ddr3a
    [    0.000000] LWS mult == 0 val == 71803c0
    [    0.000000] Generic PLL clk (400000000 Hz), parent (100000000 Hz),postdiv = 4, mult = 15, prediv = 0
    [    0.000000] LWS clock NAME refclk-ddr3b
    [    0.000000] LWS mult == 0 val == 98804c0
    [    0.000000] Generic PLL clk (1000000000 Hz), parent (100000000 Hz),postdiv = 2, mult = 19, prediv = 0
    [    0.000000] Architected local timer running at 200.00MHz (phys).
    
    
    REF CLOCK freq :        pllctl0_val:    (MCSDK)
    
    MAIN_PLL :-122880000    => 3800901f
    ARM_PLL :- 125000000    => 17000bc4
    DDR3A_PLL :- 100000000  => 71803c0
    DDR3B_PLL :- 100000000  => 98804c0
    PASS_PLL :- 122880000   => 70803c0
    ____________________________________________________________________________________________________________________________________________________________________________________________________
    
    REF CLOCK freq :        pllctl0_val:    (PDK)
    
    MAIN_PLL :-122880000    => 38009c1f
    ARM_PLL :- 125000000    => 1b000dc4
    DDR3A_PLL :- 100000000  => 71803c0
    DDR3B_PLL :- 100000000  => 98804c0
    PASS_PLL :- 122880000   => 70803c0
    
    
    PDK kernel Prints :- MAIN_PLL
    
    [    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0
    [    0.000000] LWS clock name == armpllclk
    [    0.000000] LWS read val 1b000dc4
    [    0.000000] LWS Generic PLL clk (1400000000 Hz), parent (125000000 Hz) postdiv = 1, mult = 55, prediv = 4
    [    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7f000
    [    0.000000] LWS clock name == mainpllclk
    [    0.000000] LWS read val 38009c1f
    [    0.000000] LWS Main PLL clk (1200000000 Hz), parent (122880000 Hz) postdiv = 2, mult = 624, prediv = 31
    [    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0
    [    0.000000] LWS clock name == papllclk
    [    0.000000] LWS read val 70803c0
    [    0.000000] LWS Generic PLL clk (983040000 Hz), parent (122880000 Hz) postdiv = 2, mult = 15, prediv = 0
    [    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0
    [    0.000000] LWS clock name == ddr3apllclk
    [    0.000000] LWS read val 71803c0
    [    0.000000] LWS Generic PLL clk (400000000 Hz), parent (100000000 Hz) postdiv = 4, mult = 15, prediv = 0
    [    0.000000] LWS pllm 0 pllm_lower_mask 3f pllm_upper_shifti 6 plld_mask 3f pllm_upper_mask 7ffc0
    [    0.000000] LWS clock name == ddr3bpllclk
    [    0.000000] LWS read val 98804c0
    [    0.000000] LWS Generic PLL clk (1000000000 Hz), parent (100000000 Hz) postdiv = 2, mult = 19, prediv = 0
    
    
    
    _____________________________________________________________________________________________________________________________________________________________________________________________________
    
    
    
    

                 4174.test1.rar

    I'll check the link which you have shared.

    Regards,

    Phaneesh A

  • Phaneesh,

    In your first post, you said it is the DSP, but the log says it is ARM.

    Which core are you using for this DDR3 test: ARM or DSP?

    When you say kernel, is it the SYSBIOS/RTOS kernel on DSP or the Linux kernel on ARM?

    Let me compare your clock configurations for MCSDK and Processor SDK and get back to you.

    Please run the DDR3 test code I have shared and let me know the read/write cycles.

    Regards

    Shankari G

  • Hi,

    We are using the RT Linux kernel on the ARM side, but the test project I shared runs on the DSP core. With the MCSDK Linux kernel, the DSP read and write cycles to DDR3A are fine, but with the PDK Linux kernel the same DSP example is 3 times slower for reads and writes to DDR3A.

    Thanks,

    Phaneesh

  • Phaneesh,

    There is no linux kernel for DSP core in MCSDK or PDK.

    The linux kernel available in MCSDK and PDK will run only on ARM core.

    --

    In MCSDK or PDK, we have SYSBIOS for DSP core and linux for ARM core.

    Regards

    Shankari G

  • Hi,

    I know there is no Linux kernel for the DSP. We are running RTOS on the DSP and RT Linux on ARM. When MCSDK Linux is running on ARM and the DSP RTOS example accesses a memory address, the read and write cycles seem fine; for the same DSP core application, memory access is slower when PDK Linux is running on ARM.

    Because the DDR config is done by U-Boot or the kernel source on the Linux side, we think we might have missed something in PDK Linux compared to MCSDK. So we have checked the DDR configuration (PHY register dump) and the PLL configuration, which are the same on both MCSDK and PDK. Still, we are not sure what we are missing.

    With the example project link you shared, we are seeing these prints.

    With the PDK Linux kernel, the DSP core output:

    [C66xx_0] Beginning DDR3 Memory Test (Rev 1)
    Turning off L1 Data Cache.
    Turning off L2 Cache.
    Starting DDR3A memory test...
    Test Pulse 1
    Test Pulse 2
    Test Pulse 3
    Test Pulse 4
    Test Pulse 5
    Test Pulse 6
    Test Pulse 7
    Test Pulse 8
    Test Pulse 9
    Test Pulse 10
    Test Pulse 11
    Test Pulse 12
    Test Pulse 13
    Test Pulse 14
    Test Pulse 15
    Test Pulse 16
    Test Pulse 17
    Test Pulse 18
    Test Pulse 19
    Test Pulse 20
    Memory Test Completed.

    In MCSDK linux kernel , the DSP core output

    [C66xx_0] Beginning DDR3 Memory Test (Rev 1)
    Turning off L1 Data Cache.
    Turning off L2 Cache.
    Starting DDR3A memory test...
    DMA1 Read Error @ 80000000, read cycle 0. Expected: 0000000000000000, Got: c021b064ffffffff
    DMA1 Read Error @ 80000008, read cycle 0. Expected: 0101010101010101, Got: c04b698c41000000
    DMA1 Read Error @ 80000010, read cycle 0. Expected: 0202020202020202, Got: de8fe000de8fe000

    Sometimes it works but crashes in the middle.

    [C66xx_0] Beginning DDR3 Memory Test (Rev 1)
    Turning off L1 Data Cache.
    Turning off L2 Cache.
    Starting DDR3A memory test...
    Test Pulse 1
    Test Pulse 2
    Test Pulse 3

    It crashed here.

    Thanks

    Phaneesh  A

  • Phaneesh,

    Let me experiment with the DDR3_EDMA test code on MCSDK and PDK and get back to you in a day or two.

    Regards

    Shankari G

  • Phaneesh,

    I have tested the DDR3 test code with both MCSDK and PDK on the K2H board. The total number of bytes transferred is 4 MB, repeated four times (i.e., 4 MB x 4).

    This testing includes read, write and verify. (Read was done in 2 iterations.)

    --- > Using MCSDK 3.01 , it takes 2.95 seconds.

    ----> Using processor SDK, it takes 2.93 seconds.

    The difference is 0.02 seconds ---> Almost the same time.

    ==============

    Test set up details:

    ==============

    1. DDR3 is driven only by DSP core.

    2. DSP core frequency is 983 MHz.

    3. DDR3 clock frequency is 666 MHz.

    4. The CSL package of MCSDK 3.01 is used for first test.

    5. The CSL package of processor SDK is used for the second test.

    6. Please note, No Operating System ( SYSBIOS/ RTOS)  is used. This is just a baremetal code with CSL header files.

    ===================

    Test 1 with MCSDK 3.01:-

    ===================

    MCSDK - Clock calculation with respect to the core frequency.

    ------------------------------------------------------------------

    Total no of CPU cycles = 2900268968 cycles

    DSP core frequency = 983 MHz

     

    Time = 1 / Freq ( General formula )

              = 1 / 983 MHz ( DSP core frequency )

              = 0.001017 us ( Micro seconds) 

     

    983000000 cycles = 1 sec

    =>2900268968 cycles =  2.95 seconds

    ===================

    Test 2 with processor SDK 6.3 :-

    ===================

    Processor SDK - Clock cycle calculation with respect to the core frequency.

    ------------------------------------------------------------------

    Total no of CPU cycles = 2887476077 cycles

    DSP core frequency = 983 MHz

     

    Time = 1 / Freq ( General formula )

              = 1 / 983 MHz ( DSP core frequency )

              = 0.001017 us ( Micro seconds) 

     

    983000000 cycles = 1 sec

    =>2887476077 cycles =  2.93 seconds
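
The conversion used in both tests is simply the cycle count divided by the core frequency. A minimal sketch of that arithmetic follows; `cycles_to_seconds` is a hypothetical helper name, and the 983 MHz figure is the DSP core frequency quoted in the test setup.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a CPU cycle count to seconds for a given core clock in Hz.
   The count is 64-bit because the totals in this test are close to 2^32. */
double cycles_to_seconds(uint64_t cycles, double core_hz)
{
    assert(core_hz > 0.0);
    return (double)cycles / core_hz;
}
```

With the two counts above and `core_hz = 983e6`, this gives roughly 2.950 s for the MCSDK run and 2.937 s for the Processor SDK run.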

    ----

    Attaching my source code here for your reference:

    =========================================

    DDR3_EDMA_TEST_K2H_ProcessorSDK.zip

    DDR3_EDMA_TEST_MCSDK.zip

    For test 1:

    =======

    Once the 'DDR3_EDMA_TEST' project has been imported into Code Composer Studio, you must go into the project's 'Properties' -> 'Resource' -> 'Linked Resources', and create/modify a Path Variable called "PDK_LOC". This PDK_LOC should point to the Keystone2 PDK root/packages directory (e.g. C:\ti\pdk_keystone2_3_xx_xx_xx\packages).

    For test 2:

    =======

    Once the 'DDR3_EDMA_TEST' project has been imported into Code Composer Studio, you must go into the project's 'Properties' -> 'Resource' -> 'Linked Resources', and create/modify a Path Variable called "PDK_LOC". This PDK_LOC should point to the pdk_k2hk_4_0_16\packages directory (e.g. C:\ti\pdk_k2hk_4_0_16\packages).

    Regards

    Shankari G

  • DDR_phy_emif_regdump.zip

    Hi ,

            We have tested here with your code and the results as follows.

    MCSDK DSP code along side MCSDK kernel on arm
    2951451400 cycle ticks

     

    PDK DSP code along side PDK kernel on arm
    79436445764 cycle ticks

     

    PDK DSP code along side MCSDK kernel on arm
    10456641560 cycle ticks
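
To put the three figures above on a common scale, each can be divided by the MCSDK baseline. The sketch below does that and also checks whether a count still fits in 32 bits, since two of the totals are larger than 2^32 - 1 and would not fit in a single 32-bit counter value. `slowdown` and `fits_in_32_bits` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Ratio of a measured cycle count to a baseline cycle count. */
double slowdown(uint64_t measured, uint64_t baseline)
{
    assert(baseline > 0);
    return (double)measured / (double)baseline;
}

/* Returns 1 if the count can be held in a 32-bit counter, 0 otherwise. */
int fits_in_32_bits(uint64_t cycles)
{
    return cycles <= UINT32_MAX;
}
```

With these numbers, the PDK DSP code with the PDK kernel comes out at about 26.9 times the MCSDK baseline, and the PDK DSP code with the MCSDK kernel at about 3.5 times.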

    We are not sure why it is taking this much time on our PDK Linux compared to yours. Can I get your device tree .dtb file, so that I can verify whether any config has gone wrong?

    I have also shared 2 shell scripts that produce the DDR PHY and EMIF register dumps. Can you run these scripts on your PDK kernel and share the logs with me?

    Thanks,

    Phaneesh A

  • Phaneesh,

    As I already said in my post:

    "Please note, No Operating System ( SYSBIOS/ RTOS)  is used. This is just a baremetal code with CSL header files."

    Just to narrow down the problem, check with the bare-metal code and its configuration.

    If your results match mine, you can then compare the DDR3 register dump between the bare-metal code and the [code + OS code].

    Regards

    Shankari G

  • Hi Shankari ,

    We need to test this with an OS running on ARM, since Linux configures all the clocks, the DDR, and the memory region reserved for the DSP cores; we are seeing the issue with PDK RT Linux running on ARM. If we run bare metal without any OS, how will it help? Is the idea that we would learn whether the DSP read and write cycles to DDR work fine on their own? With respect to Linux, we need some pointers from your side.

    Thanks

    Phaneesh

  • Phaneesh,

    When you are measuring the time, which core is used to transmit the data via DDR3 ? DSP core ? or ARM core or Both?

    ---

    Would you please provide the procedure and steps for your software package, so that I can run the same experiment on my setup?

    or

    Would you please upload your source code and programs for both MCSDK and PDK, so that I can experiment on my setup?

    If these experiments can be done with the examples provided in the PDK RT Linux and MCSDK, please point me to the name of the example and the package versions.

    ---

    From your results above,

    1. In your zip files, there is no register Dump. It just has the shell script to generate the register values.

    2. Post your register dump. 

    -----

    I have another suggestion for you:-

    First , do a read/write of DDR3 with just the ARM core without DSP core. - For both MCSDK linux and PSDK Linux - And measure the time.
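
As a rough illustration of such an ARM-only test, here is a minimal user-space sketch that times one write pass followed by one copy pass, mirroring the two loops in the DSP test. It is an assumption-laden outline, not TI example code: `time_write_then_copy` is a hypothetical name, the buffers are whatever the caller passes in (ordinary cached memory unless the DDR3A region is mapped explicitly, which is not shown), and it uses the C11 `timespec_get` wall clock where a monotonic clock would be preferable on a real target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Returns elapsed wall-clock nanoseconds for one write pass followed by one
   copy pass over `words` 32-bit words, or -1 if the clock could not be read. */
long long time_write_then_copy(volatile uint32_t *wr, volatile uint32_t *rd,
                               size_t words, uint32_t seed)
{
    struct timespec t0, t1;
    size_t i;

    assert(wr != NULL && rd != NULL);

    if (timespec_get(&t0, TIME_UTC) != TIME_UTC) {
        return -1;
    }
    for (i = 0; i < words; i++) {
        wr[i] = seed + (uint32_t)i;   /* write pass */
    }
    for (i = 0; i < words; i++) {
        rd[i] = wr[i];                /* read-back (copy) pass */
    }
    if (timespec_get(&t1, TIME_UTC) != TIME_UTC) {
        return -1;
    }

    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (long long)(t1.tv_nsec - t0.tv_nsec);
}
```

Running the same binary under both Linux images and averaging over many calls would give a first indication of whether the ARM side sees the same difference as the DSP.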

    This way we can narrow down little bit easily...

    Regards

    Shankari G

  • Hi ,

    We have Linux running on ARM and RTOS on DSP. U-Boot and the Linux kernel configure the DDR3 EMIF and PHY registers and the PLL clock registers. On the DSP side, we load the DSP code either over JTAG or from the Linux file system using the mpm service. This is our software setup.

    With the above setup we are testing a DSP load that writes data to DDR memory and reads it back; this is considered one cycle in our example. We are writing and reading data up to 4 MB in size.

    The DSP code we use is the same; the only change is the RT Linux. If MCSDK RT Linux is running, the DSP write and read to DDR takes 503349 cycles; if PDK RT Linux is running, the DSP takes 1696787 cycles to read and write the same 4 MB at the same DDR location.

    Note: The test code we are using is not available in any TI RTOS examples.

    The test project we created has already been shared in an earlier post; the file name is 4174.test1.rar.

    The steps are simple: on your K2HK EVM, first load an MCSDK Linux kernel; meanwhile build the project I shared and load it to the DSP via JTAG or mpmcl. Then you can see the number of cycles.

    Repeat the same process with PDK Linux.

    I have attached a tar file of the DDR register dump captured on my setup.

    We have also tried test code that writes and reads from ARM. The average time taken per 100 write and read cycles is the same in both MCSDK and PDK. I have attached the ARM test code tar file as well; you can test the binary on both MCSDK Linux and PDK Linux.

    DDR3_phy_emif_reg_dump.tar.gz

    DDR_write_read_test.tar.gz

    Thanks,

    Phaneesh A

  • Phaneesh,

    From your screenshot of PLL settings between MCSDK and PSDK

    IMPORTANT:

    As far as I know, it is usually not Linux itself that causes the problem. It will be a difference in register and clock settings.

    ------------- COMPARE THE DDR3 register settings and clock configurations between the MCSDK Linux and the PSDK Linux---------------

    I am attaching here my gel file - Initialization script - which configures the DDR3 registers, phy registers and PLL clock registers.

    For my DDR3 read/write test, this is the register settings used....

    Please compare it with your configuration done in u-boot and linux-kernel.
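
A comparison like this is easy to get wrong by eye, so a small diff helper may be useful. The sketch below is one hypothetical way to do it: `reg_pair` and `report_mismatches` are illustrative names, and the table holds the pllctl0 values from the two "REF CLOCK freq" summaries posted earlier in this thread. Treating the first summary as MCSDK and the second as PDK is an assumption based on the register values in the surrounding kernel prints.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One register value as seen under each SDK. */
typedef struct {
    const char *name;
    uint32_t mcsdk;
    uint32_t pdk;
} reg_pair;

/* pllctl0 values copied from the summaries in this thread (assumed mapping:
   first summary = MCSDK, second summary = PDK). */
const reg_pair pllctl0_from_thread[] = {
    { "MAIN_PLL",  0x3800901fu, 0x38009c1fu },
    { "ARM_PLL",   0x17000bc4u, 0x1b000dc4u },
    { "DDR3A_PLL", 0x071803c0u, 0x071803c0u },
    { "DDR3B_PLL", 0x098804c0u, 0x098804c0u },
    { "PASS_PLL",  0x070803c0u, 0x070803c0u },
};

/* Prints every entry whose two values differ and returns how many differ. */
size_t report_mismatches(const reg_pair *regs, size_t count, FILE *out)
{
    size_t i;
    size_t mismatches = 0;

    assert(regs != NULL && out != NULL);
    for (i = 0; i < count; i++) {
        if (regs[i].mcsdk != regs[i].pdk) {
            fprintf(out, "%-10s MCSDK=0x%08" PRIx32 " PDK=0x%08" PRIx32 "\n",
                    regs[i].name, regs[i].mcsdk, regs[i].pdk);
            mismatches++;
        }
    }
    return mismatches;
}
```

Applied to the values in the table, this reports MAIN_PLL and ARM_PLL as differing, while the DDR3A, DDR3B and PASS entries match. Whether those two differences matter for DDR3A throughput is not established here; the same helper could be fed the EMIF and PHY dumps.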

    DSP GEL: - 

    xtcievmk2x_DSP.gel

    ARM GEL:-

    ----6378.xtcievmk2x_arm.gel

    Let me also try with RT Linux.

    MCSDK RT Linux and Processor SDK RT Linux: package installation on my PC needs an IT request. It will take about 1 week for approval, installation, setup, etc.

    Meanwhile, please post a step-by-step procedure, document or video showing how to set up and run your program.

    ---

    Regards

    Shankari G  

  • Phaneesh,

    Few other suggestions for you.

    =========================

    1. DSP is the core which does the read and write to the DDR3. What is the DSP core frequency while using MCSDK-RTOS Vs PSDK-RTOS?

    Who sets the DSP core frequency? Are you using Gel file ?

    ----

    2. You have tried the following combinations and reported the results as cycle ticks. Would you please try one more combination: MCSDK DSP code alongside the PSDK kernel on ARM? This test will narrow down whether the PSDK kernel or the PSDK DSP code is the root cause of the delay.

    MCSDK DSP code along side MCSDK kernel on arm
    2951451400 cycle ticks

     

    PDK DSP code along side PDK kernel on arm
    79436445764 cycle ticks

     

    PDK DSP code along side MCSDK kernel on arm
    10456641560 cycle ticks

    ----

    3. To narrow down whether only the real-time Linux is causing the problem, it is worth running the same experiment with non-RT Linux.

    That is, try PSDK non-RT Linux vs MCSDK non-RT Linux.

    ---

    4. How about interchanging the u-boot and Linux between MCSDK RT Linux and PSDK RT Linux?

    I mean, use the u-boot of MCSDK RT Linux with the Linux image of PSDK RT Linux,

    and the u-boot of PSDK RT Linux with the Linux image of MCSDK RT Linux.

    This test will narrow down whether u-boot or Linux causes the delay.

    Regards

    Shankari G