Asynchronous EMIFA access appears to be slow

I'm using the TMS320C6748B with an asynchronous EMIFA interface (CS3) to an FPGA. PLL0.SYSCLK1 is running at 200MHz, I have set the EMIFA clock (PLL0.SYSCLK3) to 73MHz, and the async configuration register (CE3CFG) is set to 0x00200105. We also tried a value of 0x00000001, but we had some issues reading the values from the FPGA correctly and the speed never increased.
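For reference, here is a minimal sketch of how CE3CFG values like these break down into timing fields. The bit positions are an assumption based on my reading of the C674x EMIFA documentation (each timing field holds the cycle count minus one) and should be checked against the user guide. The type and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed CEnCFG layout (verify against the EMIFA user guide):
 * 31 SS | 30 EW | 29:26 W_SETUP | 25:20 W_STROBE | 19:17 W_HOLD |
 * 16:13 R_SETUP | 12:7 R_STROBE | 6:4 R_HOLD | 3:2 TA | 1:0 ASIZE
 * Timing fields are stored as (EMA_CLK cycles - 1). */
typedef struct {
    unsigned ss, ew;                     /* strobe mode, extended wait  */
    unsigned w_setup, w_strobe, w_hold;  /* write phases, EMA_CLK cycles */
    unsigned r_setup, r_strobe, r_hold;  /* read phases, EMA_CLK cycles  */
    unsigned ta;                         /* turnaround, EMA_CLK cycles   */
    unsigned asize;                      /* raw field: 0 = 8-bit, 1 = 16-bit */
} emifa_async_cfg_t;

/* Extract an unsigned bit field of the given width starting at bit lsb. */
static unsigned field(uint32_t reg, unsigned lsb, unsigned width)
{
    return (unsigned)((reg >> lsb) & ((1u << width) - 1u));
}

/* Decode a raw CEnCFG value into cycle counts (hypothetical helper). */
emifa_async_cfg_t emifa_cecfg_decode(uint32_t reg)
{
    emifa_async_cfg_t c;
    c.ss       = field(reg, 31, 1);
    c.ew       = field(reg, 30, 1);
    c.w_setup  = field(reg, 26, 4) + 1u;
    c.w_strobe = field(reg, 20, 6) + 1u;
    c.w_hold   = field(reg, 17, 3) + 1u;
    c.r_setup  = field(reg, 13, 4) + 1u;
    c.r_strobe = field(reg, 7, 6) + 1u;
    c.r_hold   = field(reg, 4, 3) + 1u;
    c.ta       = field(reg, 2, 2) + 1u;
    c.asize    = field(reg, 0, 2);
    return c;
}

/* Bus time of one read (setup + strobe + hold) in ns for a given EMA_CLK. */
double emifa_read_cycle_ns(const emifa_async_cfg_t *c, double ema_clk_mhz)
{
    return (double)(c->r_setup + c->r_strobe + c->r_hold) * 1000.0 / ema_clk_mhz;
}
```

Under that assumed layout, 0x00200105 decodes to a 1/3/1-cycle read (about 68ns of bus time at 73MHz) and 0x00000001 to 1/1/1 (about 41ns). Either figure is far below the times measured in software.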

The issue is that it takes about 980ns to execute a read such as temp = *fpga_address; with all cache turned off. With cache (L1P, L1D, L2) enabled, the time is about 480ns-500ns (the CS3 address space is not cached). If we run at the preferred 50MHz, the time is 540ns.

This statement was timed by pulsing a GPIO pin low before the read and high after the read, then using a scope to measure the length of the pulse. If you execute this read in a loop while waiting for a signal to change, the time can become excessive. For example, sometimes this loop would execute 62 times before the signal changed, so we were waiting around 30us. We measured the chip select and output enable signals with a scope and got a CS duration of 36ns and an OE duration of 10ns, with about 10ns setup time and 12ns hold time. This would seem to imply that the read time should be much smaller.

Is it normal to have this much latency through the processor for an external memory read/write?  I added the CS3 address space to cache and cut it down to under 200ns but caching an external hardware address that can change quickly is usually not a good idea.

Is there any way to speed this up?

We originally looked at using a synchronous EMIFA interface, but it looked like an SDRAM-type interface is the only type available. We want a linear address space to the FPGA and don't want to deal with the RAS and CAS for column and row addressing. We also don't need any of the memory refresh capability.

Do you have any reference designs for a synchronous SRAM-type interface (clock, CS, OE, WE) to an FPGA or other device using the EMIFA? Can we set up this synchronous-type interface using one of the asynchronous chip selects (CS2-5), or would it require the synchronous chip select, CS0, to be used? Do you believe the synchronous interface would be faster?

Thanks,

Dave

  • Dave,

    Please search the Single Core DSP Forum for keywords like EMIFA, slow, reserved, and C6748. I am not certain of the part number, but I recall an issue with a reserved register that needed a bit to be flipped for better operation. This might be related to your device, but I am about to run to a meeting and I wanted to give you something to look for until I get back to this (which may be a while).

    Regards,
    RandyP

  • Hi Dave,

    Thanks for your post.

    The E2E threads below should help in understanding EMIFA read speed with an asynchronous device and in addressing EMIFA questions:

    http://e2e.ti.com/support/embedded/starterware/f/790/t/187844.aspx

    http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/258949.aspx

    http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/42869.aspx

    Kindly check the above E2E threads, which should give you a brief idea of the EMIFA interface with asynchronous devices.

    Thanks & regards,

    Sivaraj K

  • Thanks for the links, but I had already searched the forum and found those entries. They do not address my problem, though: as my original post showed, the CE3CFG register has been set to 0x00000001, which is not the default reset value. The CS, OE, and R/W timings show the read access should be much faster. There is apparently quite a bit of overhead in the processor that I can't find any way to get rid of. I think the data path is L1 cache/memory to L2 cache/memory to external memory to L2 cache/memory to L1 cache/memory. It appears that the L1 cache/memory to L2 cache/memory path is adding an excessive amount of time.

  • Hi Dave,

    Thanks for your update.

    Basically, an asynchronous read is divided into three phases (setup, strobe and hold), and you need to experiment with these parameters, which are configurable in terms of CLK OUT cycles. You need to measure the EMIFA clock source and check the appropriate EMIFA CEn configuration register bit fields (asynchronous/synchronous) as per the datasheet.

    So, in order to find the best possible speed on the EMIF interface with an asynchronous/synchronous setup, you need to adjust the widths of the setup, strobe and hold times in the code. Check how many EMIF clock cycles are consumed in each phase (setup, strobe, hold, etc.); you can also capture the EMIF signals on an oscilloscope. You can refer to the E2E threads below for ways to test the achievable speed on the EMIF.

    http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/112/p/117298/434941.aspx#434941

    http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/112/p/61585/222044.aspx#222044
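When experimenting with setup, strobe and hold widths, it can help to build the register value from cycle counts rather than hand-editing hex. The following is a minimal sketch only: the bit positions are assumed from the C674x EMIFA documentation (each field stores cycles minus one) and must be verified against the user guide, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Desired async timing, expressed in whole EMA_CLK cycles (each >= 1). */
typedef struct {
    unsigned w_setup, w_strobe, w_hold;  /* write phases */
    unsigned r_setup, r_strobe, r_hold;  /* read phases  */
    unsigned ta;                         /* turnaround   */
    unsigned asize;                      /* raw field: 0 = 8-bit, 1 = 16-bit */
} emifa_async_timing_t;

/* Store (cycles - 1) in a field of the given width at bit position lsb.
 * Values too large for the field are clamped to the field maximum. */
static uint32_t put_cycles(unsigned cycles, unsigned lsb, unsigned width)
{
    uint32_t max = (1u << width) - 1u;
    uint32_t raw = (cycles > 0u) ? (uint32_t)(cycles - 1u) : 0u;
    if (raw > max) {
        raw = max;
    }
    return raw << lsb;
}

/* Build a CEnCFG value with SS = 0 and EW = 0 (assumed field layout:
 * 29:26 W_SETUP | 25:20 W_STROBE | 19:17 W_HOLD | 16:13 R_SETUP |
 * 12:7 R_STROBE | 6:4 R_HOLD | 3:2 TA | 1:0 ASIZE). */
uint32_t emifa_cecfg_encode(const emifa_async_timing_t *t)
{
    return put_cycles(t->w_setup, 26, 4)
         | put_cycles(t->w_strobe, 20, 6)
         | put_cycles(t->w_hold, 17, 3)
         | put_cycles(t->r_setup, 13, 4)
         | put_cycles(t->r_strobe, 7, 6)
         | put_cycles(t->r_hold, 4, 3)
         | put_cycles(t->ta, 2, 2)
         | (uint32_t)(t->asize & 0x3u);
}
```

With this layout, a 16-bit configuration with 3-cycle strobes, 2-cycle turnaround and 1-cycle setup and hold encodes to 0x00200105, and the all-minimum 16-bit configuration encodes to 0x00000001, which are the two values Dave reported trying.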

    Just for your information, higher data rates can be achieved only by using EMIFA in synchronous mode.

    Hope it helps!!

    Thanks & regards,

    Sivaraj K

  • Dave,

    I have been absent for a while, and you and Sivaraj do not appear to have reached a consensus on your problem.

    Since you are going through a loop 62 times and taking a total of 30us, if you had the EMIFA operation running any faster, you would still take 30us before you detect the register value changing. A faster EMIFA operation would simply result in going through the loop 124 times (if twice as fast) or some other higher number of times.
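To put rough numbers on this, here is a small sketch of the polling arithmetic. It assumes back-to-back reads that each take a fixed time and sample the FPGA value at the end of the read; the function names and the 490ns figure (picked from Dave's 480ns-500ns range) are illustrative only.

```c
#include <stdint.h>

/* Number of back-to-back polls until a read first sees the new value.
 * event_ns: time from start of polling until the FPGA value changes.
 * read_ns:  duration of one read (must be nonzero). */
uint32_t polls_until_seen(uint32_t event_ns, uint32_t read_ns)
{
    return (event_ns + read_ns - 1u) / read_ns;  /* ceiling division */
}

/* Time at which the change is actually detected by the polling loop. */
uint32_t detect_latency_ns(uint32_t event_ns, uint32_t read_ns)
{
    return polls_until_seen(event_ns, read_ns) * read_ns;
}
```

For an event 30000ns away, 490ns reads give 62 polls and detection at 30380ns; halving the read time to 245ns gives 123 polls and detection at 30135ns. The wait is set by the FPGA, and a faster read only trims the final fraction of one read period.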

    If you do the GPIO write-low then write-high, how much time passes between them? Depending on the code doing the GPIO writes, this could be faster or slower than what you need for an accurate time measurement. If you are using only direct register writes for the GPIOs, then you may be putting both values into ConfigBus write buffers and you will see the shortest possible delay between the pulse edges. If you are using read-modify-write instructions, or CSL that does read-modify-write, then there may be significant delays added due to the reads intermingling. To get the most accurate measurement, do the following:

    1. Write the GPIO high
    2. Read the GPIO and discard the value
    3. Write the GPIO low
    4. Read the GPIO and discard the value
    5. Run your test code to be measured
    6. Write the GPIO high
    7. Read the GPIO and discard the value

    The delay from high-to-low will be your benchmark; subtract that from the delay from low-to-high to determine your execution time for the test code being measured. The GPIO reads make sure the value written has landed in the GPIO peripheral before the DSP moves on to the test code.
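The sequence above can be sketched in C roughly as follows. This is only an illustration: the register pointers are supplied by the caller (the real set, clear, and input register addresses come from the device documentation), and the struct and function names are hypothetical. It uses direct writes to set and clear registers rather than read-modify-write.

```c
#include <stdint.h>

/* Caller-supplied pointers to the GPIO bank's set, clear, and input
 * registers (addresses are device-specific and not assumed here). */
typedef struct {
    volatile uint32_t *set_data;  /* writing 1 bits drives those pins high */
    volatile uint32_t *clr_data;  /* writing 1 bits drives those pins low  */
    volatile uint32_t *in_data;   /* reads back the pin state              */
} gpio_regs_t;

/* Write a register, then read the GPIO back and discard the value so the
 * write has landed in the peripheral before execution continues. */
static void gpio_write_and_flush(const gpio_regs_t *g,
                                 volatile uint32_t *reg, uint32_t mask)
{
    *reg = mask;
    (void)*g->in_data;
}

/* Steps 1-4: pin high, read back, pin low, read back. */
void gpio_mark_start(const gpio_regs_t *g, uint32_t pin_mask)
{
    gpio_write_and_flush(g, g->set_data, pin_mask);
    gpio_write_and_flush(g, g->clr_data, pin_mask);
}

/* Steps 6-7: pin high, read back. Call right after the code under test. */
void gpio_mark_end(const gpio_regs_t *g, uint32_t pin_mask)
{
    gpio_write_and_flush(g, g->set_data, pin_mask);
}

/* Net time of the test code: the low pulse width (low-to-high) minus the
 * high pulse width (high-to-low), both read off the scope. */
uint32_t net_time_ns(uint32_t low_to_high_ns, uint32_t high_to_low_ns)
{
    return low_to_high_ns - high_to_low_ns;
}
```

In use, the code being measured (step 5, for example the single FPGA read) goes between gpio_mark_start() and gpio_mark_end(), and the two pulse widths from the scope are fed to net_time_ns().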

    This probably will not solve your overall problem, but it will help us understand what exactly the problem is.

    Do you have a way to observe the EMIFA signals such as using an oscilloscope?

    Regards,
    RandyP

  • Thanks for the information. I will look into the GPIO timing, as I am using the read-modify-write method. We have looked at the EMIFA signals on an oscilloscope and they look correct according to how it was set up. We have developed a workaround so we don't have to wait on the read of the EMIFA to get a value and see if it has changed.

    Regards,

    Dave