I'm using the TMS320C6748B with an asynchronous EMIFA interface (CS3) to an FPGA. PLL0.SYSCLK1 is running at 200 MHz, and I have set the EMIFA clock (PLL0.SYSCLK3) to 73 MHz and the async configuration register (CE3CFG) to 0x00200105. We also tried a value of 0x00000001; that caused some problems reading correct values from the FPGA, and the speed did not improve.
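For reference, here is a minimal sketch of how the two CE3CFG values above break down into per-phase cycle counts. It assumes the CEnCFG bit layout as I read it from the EMIFA chapter of the technical reference manual (SS, EW, W_SETUP, W_STROBE, W_HOLD, R_SETUP, R_STROBE, R_HOLD, TA, ASIZE, with each timing field meaning "value + 1" EMIFA clock cycles); please correct me if that layout is wrong. The type and function names are made up for this post.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded CEnCFG fields. ASSUMPTION: bit positions follow the CEnCFG
 * description in the C6748 TRM; verify against your copy of the manual. */
typedef struct {
    unsigned ss, ew;                     /* strobe select, extended wait   */
    unsigned w_setup, w_strobe, w_hold;  /* write phases, in EMIFA cycles  */
    unsigned r_setup, r_strobe, r_hold;  /* read phases, in EMIFA cycles   */
    unsigned ta;                         /* turnaround, in EMIFA cycles    */
    unsigned bus_bits;                   /* 8 or 16                        */
} cecfg_t;

static cecfg_t cecfg_decode(uint32_t v)
{
    cecfg_t c;
    c.ss       = (v >> 31) & 0x1;
    c.ew       = (v >> 30) & 0x1;
    c.w_setup  = ((v >> 26) & 0xF)  + 1;  /* field value + 1 cycles */
    c.w_strobe = ((v >> 20) & 0x3F) + 1;
    c.w_hold   = ((v >> 17) & 0x7)  + 1;
    c.r_setup  = ((v >> 13) & 0xF)  + 1;
    c.r_strobe = ((v >> 7)  & 0x3F) + 1;
    c.r_hold   = ((v >> 4)  & 0x7)  + 1;
    c.ta       = ((v >> 2)  & 0x3)  + 1;
    c.bus_bits = ((v & 0x3) == 1) ? 16 : 8;
    return c;
}

/* Total programmed read cycle length (setup + strobe + hold). */
static unsigned cecfg_read_cycles(const cecfg_t *c)
{
    return c->r_setup + c->r_strobe + c->r_hold;
}

/* Convert a cycle count to nanoseconds for a given EMIFA clock. */
static double cycles_to_ns(unsigned cycles, double emifa_hz)
{
    assert(emifa_hz > 0.0);
    return cycles * 1e9 / emifa_hz;
}

static void cecfg_print(uint32_t v, double emifa_hz)
{
    cecfg_t c = cecfg_decode(v);
    printf("0x%08lX: read %u/%u/%u cycles (%.1f ns), TA %u, %u-bit\n",
           (unsigned long)v, c.r_setup, c.r_strobe, c.r_hold,
           cycles_to_ns(cecfg_read_cycles(&c), emifa_hz),
           c.ta, c.bus_bits);
}
```

Under that assumed layout, 0x00200105 decodes to read setup/strobe/hold of 1/3/1 cycles on a 16-bit bus, and 0x00000001 to 1/1/1 cycles. Either way the programmed bus cycle is a handful of EMIFA clocks, far less than the read times reported below.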
The issue is that it takes about 980 ns to execute a read such as temp = *fpga_address with all caches turned off. With the caches (L1P, L1D, L2) enabled, the time is about 480-500 ns (the CS3 address space is not cached). If we run at the preferred 50 MHz, the time is 540 ns.
This statement was timed by driving a GPIO pin low before the read and high after the read, then using a scope to measure the length of the pulse. If you execute this read in a loop while waiting for a signal to change, the time can become excessive. For example, sometimes this loop would execute 62 times before the signal changed, so we were waiting around 30 us. We measured the chip select and output enable signals with a scope and got a CS duration of 36 ns and an OE duration of 10 ns, with about 10 ns setup time and 12 ns hold time. This would seem to imply that the read time should be much smaller.
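To make the polling case concrete, here is a rough sketch of the kind of loop involved, plus the wait-time arithmetic. The function names, the mask/expect parameters, and the read limit are illustrative only and are not our actual code; on the target, reg would point into the CS3 address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Poll a memory-mapped 16-bit register until (value & mask) == expect,
 * or until max_reads reads have been done. Returns the number of reads
 * performed; *matched is set to 1 on success, 0 on timeout.
 * The volatile qualifier forces a real bus read on every iteration. */
static uint32_t poll_reg(volatile const uint16_t *reg, uint16_t mask,
                         uint16_t expect, uint32_t max_reads, int *matched)
{
    uint32_t reads = 0;

    assert(reg != NULL && matched != NULL);
    *matched = 0;
    while (reads < max_reads) {
        uint16_t value = *reg;   /* one external read per iteration */
        reads++;
        if ((value & mask) == expect) {
            *matched = 1;
            break;
        }
    }
    return reads;
}

/* Estimated time spent polling, given a measured per-read cost in ns. */
static uint64_t poll_wait_ns(uint32_t reads, uint32_t ns_per_read)
{
    return (uint64_t)reads * ns_per_read;
}
```

With the numbers above, poll_wait_ns(62, 480) gives 29760 ns, which is the roughly 30 us wait we are seeing. Almost all of that is per-read latency rather than time the bus is actually active.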
Is it normal to have this much latency through the processor for an external memory read/write? I added the CS3 address space to cache and cut it down to under 200ns but caching an external hardware address that can change quickly is usually not a good idea.
Is there any way to speed this up?
We originally looked at using a synchronous EMIFA interface, but it looked like an SDRAM-type interface is the only type available. We want a linear address space to the FPGA and don't want to deal with RAS and CAS for row and column addressing. We also don't need any of the memory refresh capability.
Do you have any reference designs for a synchronous SRAM-type interface (clock, CS, OE, WE) to an FPGA or other device using the EMIFA? Can we set up this synchronous interface using one of the asynchronous chip selects (CS2-5), or would it require the synchronous chip select, CS0, to be used? Do you believe the synchronous interface would be faster?
Thanks,
Dave