EDMA Throughput Performance [PCIe]

Hi,

1. In the PCIe User Guide, section 2.9.3 "EDMA Transfer Examples", paramSetup_pcie.aCntbCnt is set as follows:

paramSetup_pcie.aCntbCnt = CSL_EDMA3_CNT_MAKE(pcie_max_payload, buff_size/pcie_max_payload);

This means ACNT = pcie_max_payload and BCNT = buff_size/pcie_max_payload. On the C6678 DSP, however, pcie_max_payload is limited to 128 bytes.

Is ACNT therefore also limited to 128 bytes, or can I change it (e.g. ACNT = 1024, BCNT = 1)?
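To make the two configurations in question 1 concrete, here is a minimal sketch of how the A/B counts relate to the buffer size. The helpers `make_cnt` and `split_counts` are hypothetical names, not part of the CSL; `make_cnt` only mirrors the packing that CSL_EDMA3_CNT_MAKE is assumed to perform (ACNT in bits 15:0, BCNT in bits 31:16 of the PaRAM count word). The sketch does not settle whether the hardware accepts an ACNT larger than the PCIe payload size.

```c
#include <stdint.h>

/* Hypothetical stand-in for CSL_EDMA3_CNT_MAKE. Assumption: ACNT sits in
 * bits 15:0 and BCNT in bits 31:16 of the PaRAM count word. */
uint32_t make_cnt(uint16_t acnt, uint16_t bcnt)
{
    return ((uint32_t)bcnt << 16) | (uint32_t)acnt;
}

/* Split buff_size bytes into *bcnt arrays of acnt bytes each.
 * *leftover receives the bytes the split does not cover (non-zero when
 * buff_size is not a multiple of acnt, because the division truncates).
 * Returns 0 on success, -1 if acnt is 0 or BCNT would not fit in 16 bits. */
int split_counts(uint32_t buff_size, uint16_t acnt,
                 uint16_t *bcnt, uint32_t *leftover)
{
    uint32_t n;

    if (acnt == 0) {
        return -1;
    }
    n = buff_size / acnt;
    if (n > 0xFFFFu) {
        return -1;              /* BCNT is a 16-bit field */
    }
    *bcnt = (uint16_t)n;
    *leftover = buff_size % acnt;
    return 0;
}
```

For a 1024-byte buffer, ACNT = 128 gives BCNT = 8, while ACNT = 1024 gives BCNT = 1; both describe the same number of bytes, only the array shape differs.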

2. I am trying to measure the throughput using _CSL_tscRead(), but I get a wrong result:

    /** EDMA **/
    _CSL_tscEnable();
    printf("=============EDMA RC==============\n");
    printf("Activate the Cache memory\n");
    cacheinit();
    printf("EDMA start\n");
    pcie_read_actStatus_reg((CSL_Pciess_appRegs*)handle, &ActStatus);

    Nbefore_RC = _CSL_tscRead();

    //EDMA1
    printf("EDMA1 start\n");

    EDMA_Transfer(0, 0, srcBuf, pcieBase, (PCIE_EXAMPLE_UINT32_SIZE*PCIE_BUFSIZE_APP));

    /* test that the OB buffer is empty */
    pcie_read_actStatus_reg((CSL_Pciess_appRegs*)handle, &ActStatus);
    while (ActStatus.obNotEmpty)
        pcie_read_actStatus_reg((CSL_Pciess_appRegs*)handle, &ActStatus);

    Nafter_RC = _CSL_tscRead();
    Ncycle_RC = Nafter_RC - Nbefore_RC;

    printf("EDMA End\n");

 

To work around this problem, I tried to measure the average over 100 transfers as follows:

    /** EDMA **/
    _CSL_tscEnable();
    printf("=============EDMA RC==============\n");
    printf("Activate the Cache memory\n");
    cacheinit();
    printf("EDMA start\n");
    pcie_read_actStatus_reg((CSL_Pciess_appRegs*)handle, &ActStatus);

    Nbefore_RC = _CSL_tscRead();
    printf("EDMA start\n");

    int j;
    for (j = 0; j < 100; j++) {
        EDMA_Transfer(0, 0, srcBuf, pcieBase, (PCIE_EXAMPLE_UINT32_SIZE*PCIE_BUFSIZE_APP));

        /* test that the OB buffer is empty */
        pcie_read_actStatus_reg((CSL_Pciess_appRegs*)handle, &ActStatus);
        while (ActStatus.obNotEmpty)
            pcie_read_actStatus_reg((CSL_Pciess_appRegs*)handle, &ActStatus);
    }

    Nafter_RC = _CSL_tscRead();
    Ncycle_RC = Nafter_RC - Nbefore_RC;

    printf("EDMA End\n");

 

The data is transferred correctly, but the measurement is still wrong (sometimes up to 12 Gbps!).

  • What values are you getting for your timer reads? How much data do you think is going through?

    Also, move any printf calls outside the measured region. Each one adds thousands of extra cycles, because printf is a slow exchange between the emulator and CCS that stalls the processor while the timer keeps ticking. I'd actually suggest reading the timer immediately before triggering the EDMA transfer; right now you are including the whole setup and kick-off of the transfer as well. You can put another timer read right before your first status check, which would be more appropriate.

    Best Regards,

    Chad
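A minimal sketch of the measurement layout suggested above, assuming three hypothetical hooks: `now` would wrap _CSL_tscRead(), `start_transfer` would trigger the already configured EDMA transfer, and `wait_done` would poll the status register. The `sim_*` functions are host-side stand-ins so the sketch can run anywhere; they are not part of any TI library.

```c
#include <stdint.h>

/* Hypothetical hooks so the sketch stays target-independent. */
typedef uint64_t (*read_cycles_fn)(void);
typedef void (*action_fn)(void);

typedef struct {
    uint64_t kickoff_cycles;  /* cycles spent triggering the transfer  */
    uint64_t total_cycles;    /* kick-off plus waiting for completion  */
} transfer_timing;

transfer_timing time_transfer(read_cycles_fn now,
                              action_fn start_transfer,
                              action_fn wait_done)
{
    transfer_timing t;
    uint64_t t0, t1, t2;

    t0 = now();           /* timer read immediately before the trigger   */
    start_transfer();
    t1 = now();           /* timer read right before the first status check */
    wait_done();
    t2 = now();           /* no printf anywhere between t0 and t2        */

    t.kickoff_cycles = t1 - t0;
    t.total_cycles   = t2 - t0;
    return t;             /* print the numbers only after returning      */
}

/* Host-side simulation stand-ins (assumptions for illustration only). */
static uint64_t sim_clock;
uint64_t sim_now(void)   { return sim_clock; }
void     sim_start(void) { sim_clock += 10; }  /* pretend trigger costs 10 cycles   */
void     sim_wait(void)  { sim_clock += 90; }  /* pretend the wait costs 90 cycles  */
```

Keeping the two intervals separate shows how much of the measured time is CPU-side kick-off and how much is the transfer itself.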

  • Delared,

    What data type are you using for your timer variables?

    About how much wall clock time passes from when you click Run/Resume until you get your final output value? (<1 second, >3 seconds?)

    How do you calculate and display your result that gives your 108Gbps value?

    Regards,
    RandyP
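The data type question matters because the time stamp counter is understood to be 64 bits wide, so a 32-bit variable would silently drop the upper half of a long measurement. The sketch below illustrates that effect with two hypothetical helpers, `elapsed64` and `elapsed32`; it is an illustration of the truncation, not a diagnosis of the code in this thread.

```c
#include <stdint.h>

/* Elapsed cycles kept at full 64-bit width. Unsigned subtraction also
 * stays correct if the counter wrapped once between the two reads. */
uint64_t elapsed64(uint64_t before, uint64_t after)
{
    return after - before;
}

/* The same difference stored in a 32-bit variable: any count of 2^32
 * cycles or more is silently reduced modulo 2^32. */
uint32_t elapsed32(uint64_t before, uint64_t after)
{
    return (uint32_t)(after - before);
}
```

With 5,000,000,000 elapsed cycles, `elapsed64` returns the full count while `elapsed32` returns 705,032,704. Accumulators that are updated with `+=` also need to start at zero; the snippets in this thread do not show how the cycle variables are declared or initialised.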

  • Sorry, I meant up to 12 Gbps (I have corrected it). I am now trying the following:

    //PCIe config...

    System_printf("Link is up #RC.\n");

    /**********************************************************************/
    /* RC sends data to EP                                                */
    /**********************************************************************/

    /* Write from RC to EP */
    if ((retVal = Pcie_getMemSpaceRange(handle, &pcieBase, NULL)) != pcie_RET_OK) {
        /* the original call passed retVal without a matching format specifier */
        printf("getMemSpaceRange failed (%d)\n", (int)retVal);
        exit(1);
    }

    /** EDMA **/
    _CSL_tscEnable();
    printf("=============EDMA RC==============\n");
    printf("Activate the Cache memory\n");
    cacheinit();
    printf("EDMA start\n");
    pcie_read_actStatus_reg((CSL_Pciess_appRegs*)handle, &ActStatus);

    //EDMA1
    int j;
    int Ntrans = 100;

    Nbefore_RC = _CSL_tscRead();

    for (j = 0; j < Ntrans; j++) {
        Nbefore_RC1 = _CSL_tscRead();
        EDMA_Transfer(0, 0, srcBuf, pcieBase, (PCIE_EXAMPLE_UINT32_SIZE*PCIE_BUFSIZE_APP)); // EDMA MODE
        Nafter_RC1 = _CSL_tscRead();

        /* test that the OB buffer is empty */
        pcie_read_actStatus_reg((CSL_Pciess_appRegs*)handle, &ActStatus);
        while (ActStatus.obNotEmpty)
            pcie_read_actStatus_reg((CSL_Pciess_appRegs*)handle, &ActStatus);

        Nafter_RC2 = _CSL_tscRead();

        Ncycle_RC1 += Nafter_RC1 - Nbefore_RC1;   /* kick-off only         */
        Ncycle_RC2 += Nafter_RC2 - Nbefore_RC1;   /* kick-off plus OB wait */
    }

    Nafter_RC = _CSL_tscRead();
    Ncycle_RC = Nafter_RC - Nbefore_RC;

    printf("EDMA End\n");

    printf(" RC send data to EP\n");
    printf("=============Result==============\n");
    printf("size of buffer=%d\n", PCIE_BUFSIZE_APP);
    printf("Ncyle_RC=%lld\nNcyle_RC1=%lld\nNcyle_RC2=%lld\n", Ncycle_RC, Ncycle_RC1, Ncycle_RC2);
    }

    Ncycle_RCx = Nafter_RCx - Nbefore_RCx

    Data_size (bytes) = PCIE_EXAMPLE_UINT32_SIZE * PCIE_BUFSIZE_APP

    Throughput_MWr (Gbps) = Data_size * 8 / Ncycle_RCx

    So with PCIE_BUFSIZE_APP = 8192 * Ntrans (Ntrans = 100), I got:

    Ncyle_RC  = 2006620       => Throughput (Gbps) = 13.6
    Ncyle_RC1 = 2690435585    => Throughput (Gbps) = 0.0097
    Ncyle_RC2 = 2149486543    => Throughput (Gbps) = 0.01
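Dividing bits by cycles only yields Gbps if one cycle lasts exactly one nanosecond, i.e. the core runs at 1 GHz. A minimal sketch that makes the clock an explicit parameter follows; `throughput_gbps` is a hypothetical helper, and the 1 GHz figure is an assumption about the board, not something stated in this thread.

```c
#include <stdint.h>

/* Throughput in Gbit/s for `bytes` moved in `cycles` CPU cycles.
 * cpu_hz is the core clock in Hz (assumption: supplied by the caller,
 * e.g. 1e9 for a 1 GHz core). Returns 0.0 for invalid inputs. */
double throughput_gbps(uint64_t bytes, uint64_t cycles, double cpu_hz)
{
    double seconds;

    if (cycles == 0 || cpu_hz <= 0.0) {
        return 0.0;
    }
    seconds = (double)cycles / cpu_hz;          /* cycles -> wall time   */
    return ((double)bytes * 8.0) / seconds / 1e9; /* bits per second -> Gbps */
}
```

Assuming PCIE_EXAMPLE_UINT32_SIZE is 4, the Data_size above is 4 * 8192 * 100 = 3,276,800 bytes, and `throughput_gbps(3276800, 2006620, 1e9)` comes out at roughly 13 Gbps, so the clock assumption alone does not explain the implausible result.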

    All of this seems wrong, because I should basically get something like this:

    Payload efficiency with overhead: 128/(128+24+8) = 80%
    Throughput = Gen2 (5 GT/s per lane) * 2 lanes * (8b/10b) * 80% = 6.4 Gbps
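The estimate above can be written out as a small calculation. This is a sketch of the poster's own formula; `pcie_theoretical_gbps` is a hypothetical helper, and the 24 + 8 bytes of per-packet overhead are taken from the post as given rather than verified.

```c
/* Theoretical PCIe write throughput in Gbit/s.
 * gt_per_lane    : raw line rate per lane in GT/s (5.0 for Gen2)
 * lanes          : link width (2 in this thread)
 * payload_bytes  : payload per packet (128 here)
 * overhead_bytes : per-packet overhead (24 + 8 in the post) */
double pcie_theoretical_gbps(double gt_per_lane, int lanes,
                             int payload_bytes, int overhead_bytes)
{
    double encoding = 8.0 / 10.0;   /* 8b/10b line coding */
    double efficiency;

    if (payload_bytes <= 0 || overhead_bytes < 0 || lanes <= 0) {
        return 0.0;
    }
    efficiency = (double)payload_bytes /
                 (double)(payload_bytes + overhead_bytes);
    return gt_per_lane * (double)lanes * encoding * efficiency;
}
```

With the numbers from the post, `pcie_theoretical_gbps(5.0, 2, 128, 24 + 8)` evaluates 5 * 2 * 0.8 * 0.8, which reproduces the 6.4 Gbps ceiling that any valid measurement should stay below.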