I have a working custom codec, but it processes whole frames in external (slower) memory. I already converted it to slices for unrelated reasons, and am now considering upgrading it to use faster L1D memory.
I'm looking at the Canny edge detection example, specifically the MEMCPY() function (defined in terms of CANNY_TI_do1DDma). This function appears to me to be "blocking": you call it, and it does not return until the copy is complete. I understand this may just be a simplified example, but in my 30 years of on-and-off DMA experience, you always overlap a DMA transfer with other work. You don't "block" like this.
Am I understanding the situation correctly, or am I missing something? More specifically, I might first get my codec working with this MEMCPY() function and later restructure it so that it does NOT block. For example, I might handle two slices at a time, stepping forward one slice per iteration (my algorithm's logic would allow this). Each iteration would begin by starting the DMA for slice N+1 and then process slice N, with that processing preceded by a wait on the DMA started earlier for slice N. That way, at least the processing of slice N overlaps the DMA transfer of slice N+1. This scenario covers a simple input-and-process case; a more complicated input-process-output case could similarly overlap processing with DMA transfers.
Oh, speaking of which: the documentation is a bit too voluminous for me to find the answer to this next question easily. Can I run an input DMA and an output DMA simultaneously on the DM6467T? If I can, I think I'll need two channels. I believe the existing Canny example receives a single channel from UNIVERSAL_create(). I would need to create a second channel somehow, and it seems that doing so would violate the XDAIS standard. Any advice here is also greatly appreciated. (My codec's overall source structure is derived from the Canny example.)
Thanks very much,
Helmut