I'm using the asynchronous EMIF and I have observed some behaviour of the Write Enable signal that I don't understand.
The problem shows up between consecutive writes. The time between assertions of the WE (Write Enable) signal is surprisingly long (around 450 ns at an EMIF clock of 50 MHz) and is also variable.
Normally I transfer the data with a for loop over 16-bit elements (16 bits is also the configured width of the EMIF interface). The processor runs at 100 MHz, so I wouldn't expect such a delay as a consequence of this loop.
for (i = 0; i < 12U; i++)
{
*(Destination + i) = *(Origin + i); //Destination and origin are memory pointers (Uint16_t*).
}
It is also worth mentioning that every four transfers the WE high time is quite short (marked in green) compared to the others (marked in blue).
We have also tried consecutive writes without the for loop, with no change.
The only thing that seems to have some effect is issuing bigger transactions (with a cast, for example), although the width of the EMIF bus is still 16 bits.
for (i = 0; i < 3U; i++)
{
*((uint64*) Destination + i) = *((uint64*) Origin + i); //Destination and Origin are memory pointers (Uint16_t*), cast here to 64-bit accesses.
}
In this case the time between the aforementioned consecutive WE assertions is reduced. The reduction normally appears in chunks of four transactions (16 bits each), followed by a longer gap (except at the beginning, where there are only two "fast" transactions before the "slow" one).
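For anyone who wants to try the same experiment, the widened-access loop above could be wrapped roughly as follows. This is only a minimal sketch: the name emif_copy16_wide is hypothetical (it is not part of any TI library), it assumes the destination is 8-byte aligned, and whether the 64-bit stores actually reach the EMIF as grouped 16-bit bus cycles depends on the compiler, the core and the memory region attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n 16-bit words to an EMIF-mapped destination,
 * using 64-bit stores for every full group of four words and 16-bit stores
 * for whatever is left over. */
static void emif_copy16_wide(volatile uint16_t *dst, const uint16_t *src, size_t n)
{
    /* Assumption: dst is 8-byte aligned, as the 64-bit cast requires. */
    assert(((uintptr_t)dst % 8U) == 0U);

    volatile uint64_t *dst64 = (volatile uint64_t *)dst;
    size_t wide = n / 4U;   /* number of full 64-bit groups */
    size_t i;

    for (i = 0; i < wide; i++) {
        uint64_t chunk;
        /* Load four source words without relying on pointer-cast aliasing. */
        memcpy(&chunk, &src[4U * i], sizeof chunk);
        dst64[i] = chunk;   /* one 64-bit store to the destination */
    }

    /* Remaining 0..3 words go out as plain 16-bit writes. */
    for (i = wide * 4U; i < n; i++) {
        dst[i] = src[i];
    }
}
```

With the 12-word buffer from the original loop this issues three 64-bit stores and no 16-bit tail, which matches the cast-based loop in the post.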
I am guessing that this behaviour may have something to do with an internal FIFO of the EMIF peripheral, but the delay seems quite high to me and it limits the achievable throughput of the interface. The "slow" time between transactions is almost the same as the transaction itself.
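To put those numbers in perspective, here is a small back-of-the-envelope sketch. The 450 ns gap and the 50 MHz / 100 MHz clocks come from the measurements above; the 900 ns per 16-bit write is an assumption derived from "the gap is almost the same as the transaction itself", not a measured value, and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Duration in ns expressed in tenths of a clock cycle (integer math only).
 * ns * MHz gives thousandths of a cycle, so dividing by 100 gives tenths. */
static uint32_t ns_to_decicycles(uint32_t ns, uint32_t clk_mhz)
{
    return (ns * clk_mhz) / 100U;
}

/* Sustained throughput in kB/s (1 kB = 1000 bytes) when `bytes` are
 * transferred once every `period_ns` nanoseconds. */
static uint32_t throughput_kbytes_per_s(uint32_t bytes, uint32_t period_ns)
{
    assert(period_ns != 0U);
    return (bytes * 1000000U) / period_ns;
}
```

With these helpers, ns_to_decicycles(450, 50) gives 225, i.e. 22.5 EMIF clock cycles per gap, and ns_to_decicycles(450, 100) gives 450, i.e. 45 CPU cycles. Under the assumed 900 ns per 16-bit write, throughput_kbytes_per_s(2, 900) gives 2222, so roughly 2.2 MB/s.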
If anyone could give me some explanation or guidance on how to improve this I would be grateful :)
Thanks in advance.
Hi,
For a 16-bit data write, there is only one WE pulse in one CS period, since there is only one bus transaction. For a 32-bit data write, there are two WE pulses in one CS period because there are two bus transactions.
EMIF asynchronous interface can operate in either normal mode or select strobe mode. In normal mode the nCS will be active for the entire EMIF transaction. In select strobe mode the nCS is only active during the strobe time.
In normal mode, the nCS is active for an entire EMIF bus transaction. For example, if you are writing to an EMIF address, the nCS is active for the entire bus transaction (16-bit write), while the nWE shows two or three pulses during that active nCS period. This is the bug in revA silicon. In select strobe mode, however, the nCS is only active during the strobe phase, which means that the nCS is actually active in phase with the nWE. So where we see extra nWE pulses in normal mode, we would also see extra nCS pulses in select strobe mode.
This bug has been fixed in revB and later silicon revisions. The extra nWE pulses have been removed during write transactions. However, there was some technical difficulty in shortening the extra duration of the active nCS in addition to removing the extra WE pulses. When switched to select strobe mode, this extra nCS duration remains as two extra nCS pulses. But the nWE and nOE are not active while these extra nCS pulses are active, so from a memory access standpoint this should be fine. All writes and reads to the memory should be qualified with a valid nWE or nOE.
For reads, we were able not only to remove the extra nOE pulses but also to shorten the nCS, which improves EMIF performance/bandwidth. Therefore, during a read, there are no extra nOE or nCS pulses in either normal or select strobe mode.
This extra CS duration will lower the throughput, but it will not cause data loss.
Thanks for such a quick answer.
I was kind of aware of the extra pulses, as they are described in some silicon errata documents I've checked. But I am not sure whether those problems are related to what I was trying to explain.
I think my problem could be related to:
Specifically the last comments, where you posted a link with the explanation (but the link seems to be broken now).
That thread was about read operations, though, and my question is about write operations (although I would probably observe something similar when reading; I could check).
The problem is the delays between writes, which on the one hand are very long and on the other hand are not constant. The images in the original post show the "short" delays in green and the "long" delays in blue.
I've tried normal mode, strobe mode, optimization levels O2 and O0, and wait extension cycles enabled and disabled, but none of those seem to have any effect on the delays. The only thing that seems to change their behaviour is the size of the transaction at the software level (the bus width is always 16 bits), as I tried to explain in the first post (I could elaborate on this again if it isn't clear).
I hope this sheds some light on what I was trying to explain.
When writing 16-bit data to EMIF async memory, one to three WE pulses are generated. Sometimes it generates only one WE pulse, and sometimes two or three. This is why you see different durations of WE high between transactions.
For a 32-bit or 64-bit access, there is no extra WE pulse between two 16-bit writes in the same CS period.
I think the linked thread is EMIF - Delay between cycles. That thread describes the effect of burst access in the EMIF.
[Used How to find the new location for broken e2e links to find the broken link]
Thanks a lot for the link. That seems to explain the behaviour I'm observing, although I haven't managed to change it by setting the external memory region to "device non-shareable" as suggested in that thread.
What I have observed is that the problem is greatly mitigated by using DMA with 64-bit elements, as is also suggested there. The problem is that in the project I am working on it may not be possible to use the DMA method.
Just in case it is useful for anyone, the timing capture of the DMA approach is:
Now WE no longer stays high for such long periods between pulses.