I'm doing a matrix application, part of which involves moving submatrices around in memory. I was initially excited about the EDMA 'transpose'/'data sorting' technique, but once I included it, the transpose operation started returning incorrect results for anything larger than trivially sized matrices. I reasoned out where I needed to write back and invalidate the caches, and while debugging I added more of those operations (safely), but that never fixed the problem. (In fact, one time I think it only delayed the problem until an even larger size, at which point the results were incorrect again.)
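For reference, this is roughly how I understand the transpose addressing, written as a host-side C model rather than real EDMA code. It is only a sketch: `ParamModel`, `transpose_params`, and `model_ab_sync` are names made up for illustration (they are not CSL types or functions), it assumes 4-byte elements, a row-major source, and an AB-synchronized transfer, and it ignores caches and triggering entirely.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side model of the PaRAM fields that matter here
 * (not a CSL type). All indices are in bytes. */
typedef struct {
    uint16_t acnt, bcnt, ccnt;   /* bytes per array, arrays per frame, frames */
    int16_t  srcbidx, dstbidx;   /* step between arrays inside a frame */
    int16_t  srccidx, dstcidx;   /* step between frames */
} ParamModel;

/* Parameters for transposing a rows x cols matrix of 4-byte elements.
 * Assumes rows * 4 and cols * 4 both fit in a signed 16-bit index. */
static ParamModel transpose_params(uint16_t rows, uint16_t cols)
{
    ParamModel p;
    p.acnt = 4;                        /* one element per array */
    p.bcnt = cols;                     /* one source row per frame */
    p.ccnt = rows;                     /* one frame per source row */
    p.srcbidx = 4;                     /* walk along the source row */
    p.dstbidx = (int16_t)(rows * 4);   /* walk down a destination column */
    p.srccidx = (int16_t)(cols * 4);   /* next source row */
    p.dstcidx = 4;                     /* next destination column */
    return p;
}

/* Walk the 3-D block the way an AB-synchronized transfer does: the C index
 * is measured from the start of one frame to the start of the next. */
static void model_ab_sync(const ParamModel *p, const uint8_t *src, uint8_t *dst)
{
    for (uint32_t c = 0; c < p->ccnt; ++c) {
        const uint8_t *s = src + (int32_t)c * p->srccidx;
        uint8_t *d = dst + (int32_t)c * p->dstcidx;
        for (uint32_t b = 0; b < p->bcnt; ++b) {
            memcpy(d + (int32_t)b * p->dstbidx,
                   s + (int32_t)b * p->srcbidx,
                   p->acnt);
        }
    }
}
```

Running the same parameters through a model like this and comparing against what the hardware produced might at least show whether the wrong elements follow a pattern (for example, everything past a certain row).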
The transfer sizes vary, but they all have an Acount (ACNT) below 4*256, and the failures start at a Bcount (BCNT) of around 186 (the failing transfer is A=142*4, B=184, to be precise).
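One thing that can be checked without the hardware is whether every field still fits its PaRAM slot. As far as I understand the EDMA3 PaRAM layout, the A, B, and C counts are unsigned 16-bit fields and the B and C indices are signed 16-bit fields. The helper names below are hypothetical, and the pitch values in the usage note are placeholders, not my real ones. This is only a minimal sketch of such a check:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a count fits an unsigned 16-bit PaRAM field (ACNT, BCNT, CCNT). */
static bool count_fits(long v)
{
    return v >= 0 && v <= UINT16_MAX;
}

/* True if a byte index fits a signed 16-bit PaRAM field
 * (SRCBIDX, DSTBIDX, SRCCIDX, DSTCIDX). */
static bool index_fits(long v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Hypothetical helper: check every field of a planned transfer at once. */
static bool transfer_fits(long acnt, long bcnt, long ccnt,
                          long srcbidx, long dstbidx,
                          long srccidx, long dstcidx)
{
    return count_fits(acnt) && count_fits(bcnt) && count_fits(ccnt) &&
           index_fits(srcbidx) && index_fits(dstbidx) &&
           index_fits(srccidx) && index_fits(dstcidx);
}
```

For example, `transfer_fits(142 * 4, 184, 1, 568, 568, 0, 0)` passes, so the counts of the failing transfer are fine on their own; a check like this only helps if one of the index fields (which depend on the row pitch of the full matrix) turns out to be too large.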
My guess was that I was overloading the event queue (although I wasn't explicitly putting anything in there myself), but I could not figure out how to remedy that. It might not even be the cause, because repeated, relatively small transfers in an inner loop work fine, while some of the slightly larger chunks of data in the outer loop fail. The fact that this transfer is to/from MSMC and DDR may be relevant, but I believe I tried moving the particular matrix to L2 instead, with no luck.
My method of transferring data is to use the CSL 'channel set' function, spin on QUERY_INTRPEND, and then clear the IPR bit. I think this uses the "Shadow Region", but that terminology thoroughly confuses me. This approach works for my other transfers and for small sizes. I have seen other ways of doing DMA transfers (QDMA, for example), so if I need to use one of those, I could try rewriting my code for it. Originally, I was using one channel for all of my transfers to/from DDR, but allocating two extra channels solely for the "failing" transfers didn't fix the problem (though, as far as I could tell, they were all on the same Event Queue).
It would be very easy for me to believe that the hardware just "gave up" trying to transfer the data, set an error bit somewhere, and then set the IPR bit. However, I am not sure how to check that hypothetical error bit, or how to relaunch the transfer from software once an error is detected.
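From what I can tell from the EDMA3 documentation, one candidate for such a bit is the channel controller's event-missed register pair, EMR (channels 0 to 31) and EMRH (channels 32 to 63); there are also QEMR for QDMA channels, CCERR for controller errors, and an ERRSTAT register in each transfer controller. I have not confirmed that any of these is actually set in my case. Assuming the raw EMR and EMRH values have already been read somehow, decoding them for one channel could look like this sketch (`event_missed` is a hypothetical helper, not a CSL call):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the event-missed flag is set for DMA channel ch,
 * given snapshots of EMR (channels 0..31) and EMRH (channels 32..63).
 * Channels outside 0..63 are reported as not missed. */
static bool event_missed(uint32_t emr, uint32_t emrh, unsigned ch)
{
    if (ch < 32u) {
        return ((emr >> ch) & 1u) != 0u;
    }
    if (ch < 64u) {
        return ((emrh >> (ch - 32u)) & 1u) != 0u;
    }
    return false;
}
```

Checking this right after the IPR bit shows up would at least tell me whether a trigger was dropped while an earlier one was still pending on the same channel.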
I also saw some mentions of an error/interrupt handler, which may be a step towards the solution, but I could not find any examples that deal with such a handler.
Any help, hints, or alternatives would be much appreciated.