I am wrestling with a system that is supposed to use DMA to get data from an FPGA.
The FPGA raises a line when it has a packet to read and automatically drops it as the last byte of data is read out (about 3 us). If another packet is available, it raises the line again (with a minimum low period of 120 ns). The FPGA can queue up to 32 packets, which it can transfer in quick succession.
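A quick back-of-the-envelope calculation shows how little time a per-packet completion interrupt has during a full burst. This sketch treats the 3 us readout and the 120 ns minimum low period as exact figures; real timings will vary.

```c
#include <assert.h>

/* Figures taken from the description: each packet takes about 3 us to
 * read out, and the line stays low for at least 120 ns before the next
 * packet. Both are treated as exact here for simplicity. */
#define PACKET_READ_NS 3000u
#define MIN_LOW_NS      120u
#define MAX_PACKETS      32u

/* Shortest possible time between the starts of consecutive packets. */
static unsigned packet_period_ns(void)
{
    return PACKET_READ_NS + MIN_LOW_NS;
}

/* Shortest possible duration of a back-to-back burst of n packets.
 * The low gap after the last packet is not counted. */
static unsigned burst_ns(unsigned n)
{
    assert(n >= 1 && n <= MAX_PACKETS);
    return n * PACKET_READ_NS + (n - 1) * MIN_LOW_NS;
}
```

With 32 packets back to back the whole burst takes about 99.7 us, so a per-packet completion interrupt would have to be fully serviced within roughly 3.12 us to keep up.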
To deal with this I have essentially a ping-pong setup (e.g. http://www.ti.com/lit/an/spra636a/spra636a.pdf fig 28): the base channel is associated with a GPIO IRQ event, and this channel is linked to 32 channels in a ring buffer. EDMA is set up to send me intermediate and final transfer completion callbacks.
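To make the ring structure concrete, here is a plain-C model of 32 parameter entries linked in a circle. The `param_set` struct, its two fields and `PACKET_SIZE` are hypothetical simplifications for illustration, not the real EDMA parameter RAM layout or any TI driver API.

```c
#include <assert.h>
#include <stdint.h>

#define RING_LEN    32u
#define PACKET_SIZE 512u  /* hypothetical packet size in bytes */

/* Hypothetical, simplified stand-in for a DMA parameter set: only the
 * destination address and the index of the entry it links to when the
 * transfer finishes. Real EDMA parameter sets have many more fields. */
typedef struct {
    uint8_t *dst;
    unsigned link;
} param_set;

static uint8_t   buffers[RING_LEN][PACKET_SIZE];
static param_set ring[RING_LEN];

/* Entry i writes buffers[i] and links to entry i+1; the last entry
 * links back to entry 0, closing the ring. */
static void build_ring(void)
{
    for (unsigned i = 0; i < RING_LEN; ++i) {
        ring[i].dst  = buffers[i];
        ring[i].link = (i + 1) % RING_LEN;
    }
}

/* Follow the links from `start` for `transfers` hops and return the
 * index of the entry that would be active afterwards. */
static unsigned walk(unsigned start, unsigned transfers)
{
    assert(start < RING_LEN);
    unsigned cur = start;
    for (unsigned t = 0; t < transfers; ++t)
        cur = ring[cur].link;
    return cur;
}
```

After `build_ring()`, 32 transfers starting from entry 0 bring the ring back to entry 0, which is the "wave" of buffers a polling routine would follow.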
I can see with a logic analyzer that the DMA controller has no trouble responding to the GPIO event and reading from the FPGA.
In a perfect world I would like to get a callback for every transfer, with the callback's parameters confirming which buffer has just been written.
In reality I get fewer DMA completion events than there were transfers when a burst of DMA transfers occurs. I can understand this: a string of IRQs can cause missed IRQs.
To counter this I wrote a non-nested, re-entrant, non-self-masking IRQ handler to catch the end of each transfer; it simply increments a counting semaphore. By also toggling a GPIO pin I can see that these IRQs do not seem to miss any transfers, but of course I cannot guarantee that.
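The "ISR only increments a count, task consumes it" pattern can be sketched with C11 atomics as a portable stand-in. On the DSP this would be the RTOS's own semaphore post/pend (or an interrupt-disabled decrement); `completions`, `dma_complete_isr` and `take_completion` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the counting semaphore: the completion ISR
 * only increments it, and the task consumes one count per buffer. */
static atomic_uint completions;

/* Body of the transfer-complete ISR: do as little work as possible. */
static void dma_complete_isr(void)
{
    atomic_fetch_add_explicit(&completions, 1u, memory_order_release);
}

/* Called from the task: take one pending completion if there is one.
 * Returns 1 if a completion was consumed, 0 if none was pending. */
static int take_completion(void)
{
    unsigned n = atomic_load_explicit(&completions, memory_order_acquire);
    while (n > 0) {
        /* On failure, n is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&completions, &n, n - 1,
                memory_order_acquire, memory_order_relaxed))
            return 1;
    }
    return 0;
}
```

Note that a count only says how many transfers finished, not which buffer each one wrote; if an interrupt is genuinely lost at the hardware level, the count will still be short.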
I used the counting semaphore to control polled reading of the buffers, but still seemed to miss some packets.
So I have gone back to the pretty unsatisfactory mechanism of writing a known pattern to each buffer before handing it to the DMA controller, then watching for the pattern to be overwritten. That tells me the DMA has done its work, so I can read the buffer and restore the known pattern. (The buffers are in L1Data, so there are no cache coherency issues to mess this up.)
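A minimal sketch of the known-pattern check, assuming a 32-bit sentinel value the FPGA can never legitimately produce. As a swapped-in simplification, only the last word of the buffer is marked, on the assumption that the DMA writes each packet in ascending address order; `SENTINEL`, `mark_empty` and `is_filled` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SENTINEL 0xDEADBEEFu  /* hypothetical "not yet written" marker */

/* Mark a buffer as empty before handing it back to the DMA controller.
 * Only the last word is marked: this assumes the transfer is written in
 * order, so a changed last word means the whole packet has landed. */
static void mark_empty(volatile uint32_t *buf, size_t words)
{
    assert(words > 0);
    buf[words - 1] = SENTINEL;
}

/* `volatile` forces a real memory read on every call, so the compiler
 * cannot keep the value in a register across polls. */
static int is_filled(const volatile uint32_t *buf, size_t words)
{
    assert(words > 0);
    return buf[words - 1] != SENTINEL;
}
```

Declaring the buffers `volatile` addresses the compiler-optimization half of the worry about buffers changing underneath the CPU; it does nothing about write ordering inside the DMA controller itself.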
I expect a wave of written buffers to flow around the circular buffer, with my polling routine following behind it.
Even this does not work: the buffers seem to complete out of order. I can be at position A checking the buffer while the counting IRQs and DMA completion callbacks tell me there have been completions; when I then peek at the next buffer, I sometimes find it has been written even though the preceding buffer still has not.
So am I missing the point? Should this work, can this work, and is there a better way?
Or, on a simpler level, is there an easy way to know what has just completed and where data was just written? If nothing else, is there a way to read a count of completions from the DMA controller?
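One way to approximate "where data was just written" without counting interrupts is to read the destination address held in the channel's active parameter set and convert it back to a ring index. This sketch assumes that, once a transfer completes and the linked set is reloaded, the active set's destination points at the buffer the next transfer will write; the function name and `PACKET_SIZE` are hypothetical, and reading the hardware register itself is left out.

```c
#include <assert.h>
#include <stdint.h>

#define RING_LEN    32u
#define PACKET_SIZE 512u  /* hypothetical packet size in bytes */

/* ring_base: address of buffer 0. next_dst: destination address read
 * from the active parameter set, assumed to point at the NEXT buffer
 * to be written. Returns the index of the most recently written one. */
static unsigned last_written_index(uintptr_t ring_base, uintptr_t next_dst)
{
    assert(next_dst >= ring_base);
    uintptr_t offset = next_dst - ring_base;
    assert(offset % PACKET_SIZE == 0);   /* must land on a buffer boundary */
    unsigned next = (unsigned)(offset / PACKET_SIZE);
    assert(next < RING_LEN);
    return (next + RING_LEN - 1) % RING_LEN;
}
```

This yields a position, not a count: if the ring laps while the CPU is busy, the position alone cannot reveal it, and the address may be read mid-reload.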
I just hate looking at a buffer and waiting for it to change underneath me; there are too many ways for caching and optimization to confuse the issue.
Has anyone else implemented something like this successfully? Just to give me some hope.
It seems that the DMA always works and the transfers always complete. I suspect that the DMA interrupts at the end of every packet are running into one another and not meeting the real-time requirements. From my understanding of your system, it would be better to devise a scheme that interrupts the CPU once at the end of the (up to 32-packet) burst rather than at the end of every packet.
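If the CPU is only interrupted at the end of a burst, the handler (or the task it wakes) has to catch up on every buffer written since it last ran. A sketch, assuming a free-running count of completed transfers is available (for example from the counting-semaphore ISR described earlier in the thread); `process_buffer` is a hypothetical placeholder.

```c
#include <assert.h>

#define RING_LEN 32u

/* Hypothetical per-buffer handler; here it just counts calls. */
static unsigned processed;
static void process_buffer(unsigned idx) { (void)idx; ++processed; }

/* Process every buffer between the task's read count and the number of
 * transfers known to have completed. Both counters are free-running
 * (never reset to 0..31), so the difference stays correct across the
 * ring boundary and across 32-bit wraparound. */
static unsigned drain(unsigned *read_count, unsigned done_count)
{
    unsigned pending = done_count - *read_count;  /* unsigned wrap is defined */
    assert(pending <= RING_LEN);                  /* more means data was overwritten */
    for (unsigned i = 0; i < pending; ++i) {
        process_buffer(*read_count % RING_LEN);
        ++*read_count;
    }
    return pending;
}
```

Because 32 divides 2^32, `count % RING_LEN` keeps mapping to the right buffer even when the unsigned counters wrap.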
I took your advice and removed the IRQs I could. Now a task polls the buffers written by the DMA controller; it wakes up quite often (semaphore timeout) and also gets a poke from the DMA completion IRQ, which posts the semaphore.
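The polling task described here (wake on a semaphore timeout, or earlier when the completion IRQ posts the semaphore) can be sketched with POSIX semaphores standing in for the DSP's RTOS semaphore. `sem_timedwait` is a real POSIX call; on the DSP the equivalent would be the RTOS's own pend-with-timeout. The `poke` semaphore and `wait_for_poke` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* POSIX semaphore standing in for the RTOS semaphore on the DSP. */
static sem_t poke;

/* Wait up to timeout_ms for the DMA-completion ISR to post `poke`.
 * Returns 1 if poked, 0 on timeout. Either way the caller then polls
 * the buffers, so a missed or merged post only costs latency. */
static int wait_for_poke(long timeout_ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);   /* sem_timedwait uses an absolute time */
    ts.tv_sec  += timeout_ms / 1000;
    ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec  += 1;
        ts.tv_nsec -= 1000000000L;
    }
    while (sem_timedwait(&poke, &ts) != 0) {
        if (errno == ETIMEDOUT)
            return 0;
        assert(errno == EINTR);           /* retry if interrupted by a signal */
    }
    return 1;
}
```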
I took out the code that was polling the next DMA transfer address in the param block and went back to just watching for the buffer to be written.
I will test more overnight, but so far it behaves much as before: a little more latency when processing isolated packets, and it still all goes wrong after a couple of million packets.
Chris, can you confirm that the DMA always transfers the data correctly into the buffers? Is it just picking up the buffers that is still an issue?
I can confirm that it always runs and reads the bytes over EMIF; whether they reach L1Data is not something I can guarantee.
My new plan is to get rid of most of the IRQs and use handshaking with the FPGA, so it can spoon-feed data to the DSP with no possibility of an IRQ arriving that the DSP is not ready for.
This will add latency, I am sure, but with enough added complexity the raw speed should be similar to the original design. In the new design there will be a per-packet IRQ to the DMA controller, no completion IRQ from the DMA controller, and a single IRQ to the DSP from the FPGA (whose handler just increments a semaphore).
A task will pick up the semaphore, do the read(s) and handshake back to the FPGA when it is done.
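The handshake loop can be sketched as follows. The register layout (`packets_ready`, `ack`) is entirely hypothetical, since the FPGA interface is not specified; on hardware these would be volatile pointers into EMIF address space rather than a plain struct, and `count_read` is a placeholder for triggering and awaiting one DMA transfer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical FPGA-side registers. */
typedef struct {
    volatile uint32_t packets_ready;  /* packets the FPGA has queued  */
    volatile uint32_t ack;            /* DSP writes how many it read  */
} fpga_regs;

/* Placeholder packet reader; here it only counts calls. */
static uint32_t reads;
static void count_read(uint32_t i) { (void)i; ++reads; }

/* One pass of the task after it takes the semaphore posted by the
 * FPGA's IRQ: read every packet the FPGA reports as ready, then tell
 * the FPGA how many were consumed. Assumes the FPGA sends nothing new
 * until it sees the ack, so no data arrives unannounced. */
static uint32_t service_fpga(fpga_regs *fpga, void (*read_packet)(uint32_t))
{
    uint32_t n = fpga->packets_ready;
    assert(n <= 32u);                 /* the FPGA holds at most 32 packets */
    for (uint32_t i = 0; i < n; ++i)
        read_packet(i);
    fpga->ack = n;
    return n;
}
```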
Does that sound more achievable?
The main problem I can see would be a significant delay between the DMA controller reading the data and writing it to L1Data. Do you think I need to get the FPGA to insert some delay to allow for this? I was thinking that the IRQ latency, plus the subsequent delay until the task responds to the semaphore, would be far greater than any write delay in the DMA controller, but if I am being optimistic please tell me.