Asynchronous EMIFA access appears to be slow

Dave Saubers

I'm using the TMS320C6748B with an asynchronous EMIFA interface (CS3) to an FPGA. PLL0.SYSCLK1 is running at 200 MHz, and I have set the EMIFA clock (PLL0.SYSCLK3) to 73 MHz and the async configuration register (CE3CFG) to 0x00200105. We also tried a value of 0x00000001, but we had some issues reading the values from the FPGA correctly and the speed never increased.

The issue is that it takes about 980 ns to execute a read such as temp = *fpga_address with all caches turned off. With the caches (L1P, L1D, L2) enabled, the time is about 480-500 ns (the CS3 address space is not cached). If we run at the preferred 50 MHz, the time is 540 ns.

This statement was timed by pulsing a GPIO pin low before the read and pulsing it high after the read and then using a scope to measure the length of the pulse. If you execute this read in a loop while waiting for a signal to change then the time can become excessive. For example sometimes this loop would execute 62 times before the signal changed so we are waiting around 30us. We measured the chip select and output enable signals with a scope and got: CS duration of 36ns, OE duration of 10ns with about 10ns setup time and 12 ns hold time. This would seem to imply that the read time should be much smaller.

Is it normal to have this much latency through the processor for an external memory read/write? I added the CS3 address space to cache and cut it down to under 200ns but caching an external hardware address that can change quickly is usually not a good idea.

Is there any way to speed this up?

We originally looked at using a synchronous EMIFA interface, but it looked like an SDRAM-type interface is the only type available. We want a linear address space to the FPGA and don't want to deal with the RAS and CAS for column and row addressing. We also don't need any of the memory refresh capability.

Do you have any reference designs for a synchronous SRAM-type interface (clock, CS, OE, WE) to an FPGA or other device using the EMIFA? Can we set up this synchronous interface using one of the asynchronous chip selects (CS2-5), or would it require the synchronous chip select, CS0? Do you believe the synchronous interface would be faster?

Thanks,

Dave

over 12 years ago

0 RandyP over 12 years ago

TI__Guru* 84110 points

Dave,

Please search the Single Core DSP Forum for keywords like EMIFA, slow, reserved, and C6748. I am not certain of the part number, but I recall an issue with a reserved register that needed a bit to be flipped for better operation. This might be related to your device, but I am about to run to a meeting and I wanted to give you something to look for until I get back to this (which may be a while).

Regards,
RandyP

0 Sivaraj Kuppuraj over 12 years ago

TI__Mastermind 35645 points

Hi Dave,

Thanks for your post.

The E2E threads below should help in understanding EMIFA read speed with an asynchronous device, and they address related EMIFA questions:

http://e2e.ti.com/support/embedded/starterware/f/790/t/187844.aspx

http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/258949.aspx

http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/42869.aspx

Please check the E2E threads above; they should give you a brief idea of how the EMIFA interfaces with asynchronous devices.

Thanks & regards,

Sivaraj K


0 Dave Saubers over 12 years ago in reply to Sivaraj Kuppuraj

Intellectual 595 points

Thanks for the links, but I had already searched the forum and found those entries. They do not address my problem, though: as my original post showed, the CE3CFG register has been set to 0x00000001, which is not the default reset value. The CS, OE, and R/W timings show the read access should be much faster. There is apparently quite a bit of overhead in the processor that I can't find any way to get rid of. I think the data path is L1 cache/memory to L2 cache/memory to external memory to L2 cache/memory to L1 cache/memory. It appears that the L1 cache/memory to L2 cache/memory path is adding an excessive amount of time.

0 Sivaraj Kuppuraj over 12 years ago in reply to Dave Saubers

TI__Mastermind 35645 points

Hi Dave,

Thanks for your update.

Basically, an asynchronous read operation is divided into three phases (setup, strobe, and hold time), and you need to experiment with these parameters, which are configurable in terms of CLK OUT cycles. You need to measure the EMIFA clock source and check the appropriate bit fields of the EMIFA CEn configuration register (asynchronous/synchronous) against the datasheet.
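As a rough illustration of the setup/strobe/hold breakdown, the sketch below decodes a CEnCFG value into cycle counts and estimates the length of one asynchronous read. The bit positions are an assumption based on the CEnCFG layout as commonly documented for this EMIFA (each field holds cycles minus one); the struct and function names are made up for this example, so please confirm the fields against the device documentation before relying on it.

```c
#include <stdint.h>

/* Decoded asynchronous timing, in EMIFA clock cycles.
 * ASSUMPTION: field positions below follow the CEnCFG layout as commonly
 * documented for this EMIFA; verify them against the EMIFA user guide. */
typedef struct {
    unsigned w_setup, w_strobe, w_hold;   /* write phases            */
    unsigned r_setup, r_strobe, r_hold;   /* read phases             */
    unsigned turnaround;                  /* turnaround cycles       */
    unsigned bus_width_bits;              /* 8, 16, or 0 if reserved */
} emifa_async_timing;

/* Each timing field stores (cycles - 1), so add 1 when decoding. */
static emifa_async_timing decode_cecfg(uint32_t reg)
{
    emifa_async_timing t;
    unsigned asize = reg & 0x3u;

    t.w_setup    = ((reg >> 26) & 0xFu)  + 1u;
    t.w_strobe   = ((reg >> 20) & 0x3Fu) + 1u;
    t.w_hold     = ((reg >> 17) & 0x7u)  + 1u;
    t.r_setup    = ((reg >> 13) & 0xFu)  + 1u;
    t.r_strobe   = ((reg >> 7)  & 0x3Fu) + 1u;
    t.r_hold     = ((reg >> 4)  & 0x7u)  + 1u;
    t.turnaround = ((reg >> 2)  & 0x3u)  + 1u;
    t.bus_width_bits = (asize == 0u) ? 8u : (asize == 1u) ? 16u : 0u;
    return t;
}

/* Duration of one read (setup + strobe + hold) in nanoseconds. */
static double read_access_ns(uint32_t reg, double emifa_clk_mhz)
{
    emifa_async_timing t = decode_cecfg(reg);
    unsigned cycles = t.r_setup + t.r_strobe + t.r_hold;
    return (double)cycles * 1000.0 / emifa_clk_mhz;
}
```

Under these assumptions, 0x00200105 decodes to a 1/3/1-cycle read (about 68 ns at 73 MHz) and 0x00000001 to a 1/1/1-cycle read (about 41 ns at 73 MHz). The second figure is in the same range as the 36 ns chip select reported earlier in the thread, which suggests the bus timing itself accounts for only a small part of the measured 480-980 ns.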

So, to find the best possible speed on the EMIF interface with an asynchronous or synchronous setup, you need to adjust the widths of the setup, strobe, and hold times in the code. Check how many EMIF clock cycles each phase (setup, strobe, hold, etc.) of a read/write consumes; you can also capture the EMIF signal timing on an oscilloscope. The E2E threads below show ways to test the achievable speed on the EMIF.

http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/112/p/117298/434941.aspx#434941

http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/112/p/61585/222044.aspx#222044

Just for your information, higher data rates can be achieved only by using EMIFA in synchronous mode.

Hope it helps!!

Thanks & regards,

Sivaraj K

0 RandyP over 11 years ago in reply to Sivaraj Kuppuraj

TI__Guru* 84110 points

Dave,

I have been absent for a while, and you and Sivaraj do not appear to have reached a consensus on your problem.

Since you are going through a loop 62 times and taking a total of 30us, if you had the EMIFA operation running any faster, you would still take 30us before you detect the register value changing. A faster EMIFA operation would simply result in going through the loop 124 times (if twice as fast) or some other higher number of times.
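The arithmetic behind this point can be sketched as follows. This is a simplified model, not a measurement: it assumes the FPGA value changes a fixed time after polling starts and that every poll takes the same time. The 484 ns per read is only an illustrative figure taken from 30us divided by 62 iterations, and the function names are invented for the example.

```c
#include <stdint.h>

/* Number of polls executed until one of them observes the change.
 * wait_ns: time from the start of polling until the value changes.
 * read_ns: duration of one polling read (must be non-zero).
 * Rounds up, because the poll that sees the change also has to complete. */
static uint32_t polls_until_event(uint32_t wait_ns, uint32_t read_ns)
{
    return (wait_ns + read_ns - 1u) / read_ns;
}

/* Total time from the start of polling until the change is detected. */
static uint32_t detection_time_ns(uint32_t wait_ns, uint32_t read_ns)
{
    return polls_until_event(wait_ns, read_ns) * read_ns;
}
```

With a 30000 ns wait, a 484 ns read gives 62 polls and a 242 ns read gives 124 polls, yet the detection time is 30008 ns in both cases. In this model a faster read only tightens the detection granularity to at most one read period; it does not shorten the wait for the event itself.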

If you do the GPIO write-low then write-high, how much time passes between them? Depending on the code doing the GPIO writes, this could be faster or slower than what you need for an accurate time measurement. If you are using only direct register writes for the GPIOs, then you may be putting both values into ConfigBus write buffers and you will see the shortest possible delay between the pulse edges. If you are using read-modify-write instructions, or CSL that does read-modify-write, then there may be significant delays added due to the reads intermingling. To get the most accurate measurement, do the following:

Write the GPIO high
Read the GPIO and discard the value
Write the GPIO low
Read the GPIO and discard the value
Run your test code to be measured
Write the GPIO high
Read the GPIO and discard the value

The delay from high-to-low will be your benchmark; subtract that from the delay from low-to-high to determine your execution time for the test code being measured. The GPIO reads make sure the value written has landed in the GPIO peripheral before the DSP moves on to the test code.
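The measurement sequence above might look roughly like this in C. It is a minimal sketch that assumes the GPIO peripheral exposes separate set, clear, and input registers, so no read-modify-write is needed. The `timing_gpio` struct, its members, and the function names are placeholders for this example; the caller supplies the real register addresses and pin mask from the device documentation.

```c
#include <stdint.h>

/* HYPOTHETICAL handle for the pin used as a scope marker. */
typedef struct {
    volatile uint32_t *set_data;  /* writing a 1 bit drives that pin high */
    volatile uint32_t *clr_data;  /* writing a 1 bit drives that pin low  */
    volatile uint32_t *in_data;   /* reads back the pin state             */
    uint32_t pin_mask;            /* bit for the marker pin               */
} timing_gpio;

/* Drive the pin high, then read back so the write has reached the GPIO. */
static void gpio_high_sync(const timing_gpio *g)
{
    uint32_t discard;
    *g->set_data = g->pin_mask;
    discard = *g->in_data;        /* read and discard */
    (void)discard;
}

/* Drive the pin low, then read back so the write has reached the GPIO. */
static void gpio_low_sync(const timing_gpio *g)
{
    uint32_t discard;
    *g->clr_data = g->pin_mask;
    discard = *g->in_data;        /* read and discard */
    (void)discard;
}

/* One timed FPGA read, bracketed as described in the post. */
static uint16_t timed_fpga_read(const timing_gpio *g,
                                volatile const uint16_t *fpga)
{
    uint16_t value;
    gpio_high_sync(g);            /* rising edge: start of benchmark gap  */
    gpio_low_sync(g);             /* falling edge: high-to-low = overhead */
    value = *fpga;                /* test code being measured             */
    gpio_high_sync(g);            /* rising edge: low-to-high = total     */
    return value;
}

/* Execution time of the test code from the two scope measurements. */
static double net_time_ns(double low_to_high_ns, double high_to_low_ns)
{
    return low_to_high_ns - high_to_low_ns;
}
```

On the scope, the gap between the first rising edge and the falling edge is the benchmark, and the width of the low pulse is the benchmark plus the read; `net_time_ns` is just that subtraction.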

This probably will not solve your overall problem, but it will help us understand what exactly the problem is.

Do you have a way to observe the EMIFA signals such as using an oscilloscope?

Regards,
RandyP

0 Dave Saubers over 11 years ago in reply to RandyP

Intellectual 595 points

Thanks for the information. I will look into the GPIO timing, as I am using the read-modify-write method. We have looked at the EMIFA signals on an oscilloscope and they look correct according to how the interface was set up. We have developed a workaround so we don't have to wait on the EMIFA read to get a value and see if it has changed.

Regards,

Dave
