How fast can the TMS570 EMIF write to or read from SRAM?

Hi:

   I connected an SRAM (IS61WV20488BLL) to the TMS570 EMIF interface. With the CPU core running at 160MHz, I used the RTI to measure how fast the TMS570 writes to the SRAM.

 My result: writing 1K×16bit to the SRAM takes about 1200us. That means the EMIF write speed is no more than 2MB/s.

   I don't understand: if the EMIF can run at 100MHz, why does it write to the SRAM so slowly?

My questions are:      How many cycles does a write to the SRAM need?  How fast can the EMIF write to an SRAM (10MB/s, 50MB/s, 100MB/s, 200MB/s)?

  The SRAM's maximum write cycle is 16us.

  Thanks.

  • Mojiang,

    The EMIF always runs at PLL1 clock/6. If you run the CPU at 160MHz, the EMIF clock is the VCLK frequency, 80MHz. I am not aware of anyone having done such a speed test on TMS570 devices. Can you share your EMIF settings and the code you used to write to the EMIF? I can forward them to our EMIF expert to see if he has any comments.

    Regards,

    Haixiao 

  • Haixiao,

      Thanks for your reply.  I tested the EMIF in the following cases.

      Case 1:

      emifREG->A1CR = emifREG->A1CR | 0x01;//16-bit

      The other parameters (e.g. W_SETUP, W_STROBE, W_HOLD, R_SETUP, R_STROBE, R_HOLD, TA) keep their default values.

      I use the RTI for the measurement.

    The code :

     RCLKSRC_bit.RTI1SRC = 0;  // RTI clock 1 from OSC (16MHz)
     RCLKSRC_bit.RTI1DIV = 3;  // RTI clock 1 divider /8
     RTIGCTRL = 0;             // disable both counters
     RTIFRC0  = 0;             // clear free-running counter
     RTICPUC0 = 2-1;           // set the RTIFRC counting frequency to 1MHz for high-precision measurement
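As a cross-check of the 1MHz tick rate, here is a minimal sketch of the divider arithmetic. It assumes that an RTI1DIV field value of n divides the source by 2^n (the comment above gives 3 for /8) and that the counter prescaler divides by RTICPUC0 + 1; the function name `rti_tick_hz` is made up for illustration and should be checked against the TRM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: tick rate of the free-running counter.
 * Assumptions (not confirmed by the thread): RTI1DIV field n divides by 2^n,
 * and the up-counter prescaler divides by (cpuc + 1). */
static uint32_t rti_tick_hz(uint32_t src_hz, uint32_t rti1div_field, uint32_t cpuc)
{
    assert(cpuc != 0u);                            /* only the simple divide case is modeled */
    uint32_t rticlk_hz = src_hz >> rti1div_field;  /* field 3 -> divide by 8 */
    return rticlk_hz / (cpuc + 1u);                /* compare value 1 -> divide by 2 */
}
```

With the settings above, `rti_tick_hz(16000000u, 3u, 1u)` gives 1000000, so one RTIFRC0 count corresponds to 1us.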

    The EMIF is used to write to the SRAM, which is 16 bits wide.

      for( i = 0; i < 1024; i++)
      {
        *p++ = i;
      }

    p is a pointer to EMIF_BASE (0x60000000).

    I set breakpoints before and after the "for" loop and read RTIFRC0 at each one to calculate the time the loop takes.

    The difference between the two RTIFRC0 values is 0x4B9.

    With this RTI setting, the time is 1209 × 1us = 1.2ms, so the speed is 1K × 16 bit / 1.2ms ≈ 1.7MB/s.

    Case 2:

       I set W_SETUP=1, W_STROBE=1, W_HOLD=1; the other parameters are the same as in Case 1.

    The difference between the two RTIFRC0 values is 0x5B.

    With this RTI setting, the time is 91 × 1us = 91us, so the speed is 1K × 16 bit / 91us ≈ 22MB/s.
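The arithmetic for both cases can be reproduced with a small helper. This is only a sketch: `throughput_mb_per_s` is a made-up name, MB is taken as 10^6 bytes, and one RTIFRC0 count is assumed to be 1us as configured above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes per microsecond is numerically equal to MB/s
 * when MB means 10^6 bytes. */
static double throughput_mb_per_s(uint32_t bytes, uint32_t elapsed_us)
{
    assert(elapsed_us > 0u);
    return (double)bytes / (double)elapsed_us;
}
```

For 1024 writes of 2 bytes each, `throughput_mb_per_s(2048u, 0x4B9u)` is about 1.69 and `throughput_mb_per_s(2048u, 0x5Bu)` is about 22.5.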

     

  • Mojiang,

     

    With the default setting, ideally, it takes 16+64+8=88 EMIF cycles to complete one 16-bit write. With the second setting, ideally, it takes 2+2+2=6 EMIF cycles to complete one write. The ‘for loop’ also takes some time; assume it takes 1 EMIF cycle. Then (88+1)/(6+1)≈13. The ratio between 1.2ms and 91us is also about 13, so this matches.
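The cycle counts above follow a simple pattern: each phase lasts its field value plus one. A minimal sketch, assuming the default field values are 15, 63 and 7 (inferred from the 16+64+8 figure, not read from a register dump) and using a made-up function name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ideal EMIF cycles for one asynchronous write,
 * where each phase lasts (field value + 1) cycles. */
static uint32_t emif_write_cycles(uint32_t w_setup, uint32_t w_strobe, uint32_t w_hold)
{
    return (w_setup + 1u) + (w_strobe + 1u) + (w_hold + 1u);
}
```

`emif_write_cycles(15u, 63u, 7u)` gives 88 and `emif_write_cycles(1u, 1u, 1u)` gives 6, the two figures used in the ratio above.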

     

    Let’s trace back a little. 1.2ms/1k=1.2us, so the EMIF takes about 1.2us to write each 16-bit word. The clock period is roughly 1.2us/89=0.0135us, which corresponds to around 74MHz. That looks like the VCLK frequency (80MHz). It conflicts with the TRM: “The EMIF's internal clock is sourced from the CLKDIV6 clock domain of PLL controller 1 and cannot be sourced directly from an external input clock.”
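The same back-calculation can be written out as a sketch. The function name is hypothetical, and the 89 cycles per access (88 EMIF cycles plus 1 assumed for the loop) is the estimate from this thread, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clock frequency implied by a measured total time,
 * given the number of accesses and the assumed cycles per access. */
static double implied_clock_mhz(double total_us, uint32_t accesses, uint32_t cycles_per_access)
{
    assert(total_us > 0.0 && accesses > 0u && cycles_per_access > 0u);
    double period_us = total_us / accesses / cycles_per_access;  /* one clock period */
    return 1.0 / period_us;                                      /* 1/us == MHz */
}
```

Using the unrounded measurement, `implied_clock_mhz(1209.0, 1024u, 89u)` comes out at roughly 75MHz, in line with the 74MHz estimate from the rounded numbers and much closer to VCLK (80MHz) than to a divide-by-6 clock.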

     

    I checked the VHDL source code. The clock for the EMIF is VCLK_P. I believe it is 80MHz in your case. The TRM is not correct. I will correct it in the next release.

    Try setting W_SETUP=0, W_STROBE=0, W_HOLD=0 to see if it works in your case. If it does, you can achieve a higher speed: 3 VCLK cycles per 16-bit access.
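To put a number on that higher speed, here is a sketch of the ideal interface bandwidth. It is an upper bound only: it ignores CPU instruction overhead and any bus or turnaround cycles, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ideal bandwidth in MB/s (10^6 bytes/s) when every
 * access takes a fixed number of EMIF clock cycles. */
static double emif_peak_mb_per_s(double emif_clk_mhz, uint32_t cycles_per_access,
                                 uint32_t bytes_per_access)
{
    assert(cycles_per_access > 0u);
    return emif_clk_mhz * bytes_per_access / cycles_per_access;
}
```

At 80MHz with 2 bytes per access, 3 cycles per access gives about 53.3MB/s and 6 cycles gives about 26.7MB/s.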

     

    Thanks,

    Haixiao

  • Haixiao,

      Thanks for your advice.  I set the three parameters W_SETUP=0, W_STROBE=0, W_HOLD=0, but the time to write 1K×16bit is about the same: 0x5A counts, that is, 90us.

     It works correctly, but the time does not change. Why?

  • I am afraid that in this case the 'for loop' instruction execution consumes most of the time. Accessing one 16-bit word takes 3 VCLK cycles (maybe a little more); the time spent completing one ‘for loop’ iteration depends on the code the compiler generates. If the CPU has to load the index from the internal SRAM and write it back to the internal SRAM, it will take a lot of time. You can look at the assembly code to see what happens.
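One common way to reduce per-iteration overhead is to unroll the loop so that one compare and branch covers several writes. The following is a sketch under assumptions, not a measured result: `fill_unrolled` is a made-up name, the unroll factor of 4 is arbitrary, and whether it helps depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes 0, 1, 2, ... to count 16-bit locations,
 * four writes per loop iteration. dst is volatile so the compiler keeps
 * every store, as is needed for a memory-mapped region. */
static void fill_unrolled(volatile uint16_t *dst, size_t count)
{
    assert(count % 4u == 0u);           /* sketch handles multiples of 4 only */
    volatile uint16_t *end = dst + count;
    uint16_t value = 0u;
    while (dst != end) {
        dst[0] = value;
        dst[1] = (uint16_t)(value + 1u);
        dst[2] = (uint16_t)(value + 2u);
        dst[3] = (uint16_t)(value + 3u);
        dst += 4;
        value = (uint16_t)(value + 4u);
    }
}
```

On the target, the call would be something like `fill_unrolled((volatile uint16_t *)0x60000000, 1024)`, using the EMIF base address given earlier in the thread; on a host, an ordinary array can stand in for the SRAM.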

    Regards,

    Haixiao

  • Haixiao.

         I looked at the assembly code, and the result agrees with your answer. On every iteration it compares the index with the constant (1024) and then executes a jump instruction. These two instructions take some time (about 2 core cycles).

      I would like to know whether TI plans to improve the TMS570's performance, for example by increasing the storage capacity (flash, RAM) and core speed, or by integrating an Ethernet module.

      Thanks again.

     

      ChenMojiang

     

  • Our next silicon, TMS570LS3x, will integrate Ethernet and USB modules, support both async and sync EMIF, and provide 3Mbyte flash, 64Kbyte EEPROM emulation and 256Kbyte SRAM.

    The first silicon will check out in the coming spring.

    Regards,

    Haixiao