Hello,
I'm using the DM6467.
In my code, I run the algorithm and copy the input frame into the output frame with the memcpy function. This works fine.
When I use EDMA3 instead of memcpy to copy the input frame into the output frame, the code does not work: it gets stuck partway through. But if I use only EDMA3 to copy the input frame into the output frame without running the algorithm (a loopback with EDMA3 only), it works fine.
I don't know why this happens.
Can anyone help me? What is the problem, and how can I debug the code?
Regards,
Naresh
The algorithm's data accesses go through the cache, but EDMA3 transfers do not. There is likely a cache coherency problem, and if you are not currently handling it, this could be the cause of your problem.
The DSP cache, cache coherency, and the cache registers are described in the C64x+ Megamodule Reference Guide (SPRU871).
When the algorithm needs to read a new set of input data from a frame or line buffer that has been newly loaded by the EDMA3, you need to issue a cache invalidate command before reading from the buffer so the new data will be read.
When the algorithm writes new output data to a frame or line buffer, you need to issue a cache writeback command to flush all new data from the L1D and/or L2 cache into the buffer. This needs to be done before the EDMA3 tries to move the data.
In a CCS Memory window, you can tell whether data is held in cache by its highlight color; the highlight colors match the colors next to the cache check boxes in the Memory window. You can also compare the data in a buffer with what is in cache by toggling the check boxes for the different caches: when a box is unchecked, the display bypasses that cache and shows what is actually stored at the addressed memory location.
Thank you, RandyP.
But my input and output frame buffers reside in external memory, and I just use EDMA3 to transfer from one external memory buffer to the other.
Is a cache coherency problem also possible in this case?
Regards,
Naresh
Yes, cache coherency is very possible in this case. I do not know enough about your application to say for sure, but your description matches those cases when cache coherency can be an issue.
In the Training section of TI.com, there is a training video set for the C6474. It may be helpful for you to view several of the modules since the C64x+ core is common between the C6474 and the DM6467. But in particular, the Memory and Cache Module will apply to your current questions. You can find the complete video set at http://focus.ti.com/docs/training/catalog/events/event.jhtml?sku=OLT110002 .
Hello,
Thanks for your prompt response.
Could you please guide me on this issue?
How do I issue a cache writeback command to flush all new data from the L1D and/or L2 cache into the buffer? I'm using the Linux environment.
Regards,
Naresh
Naresh,
You are observing that the EDMA gets stuck when you have both the algorithm and the EDMA copy running. Which EDMA channel are you using for the external memory copy? Is the same channel also being used by the algorithm?
Regards,
Kapil
Naresh,
Please refer to the DSP/BIOS API Reference Guide. There may be different document revisions for different BIOS versions, but the BCACHE API functions are described in SPRU403, for example.
@Kapil
Yes, both use the same DMA channel. I configure only one DMA channel and use only that channel.
Regards,
Naresh
Naresh,
If you are using the same DMA channel for the algorithm and the frame copy, these two cannot run in parallel, and you will see the processor polling on DMA_wait endlessly. I would advise you to use separate channels and check whether you still see the hang.
Regards,
Kapil
Kapil,
I think the problem has been misunderstood.
We are using ACPY3 to do 2D-to-2D transfers from DDR to DDR. The transfers are set up and executed within the VIDANALYTICS_process() call on the DSP side. The transfers fail after a few cycles. This happens in particular when the analytics code is included as part of the process call.
If a simple loopback is done with the DMA transfers as the only operation, it has been observed to work longer. The analytics code also works when the DMA transfers are replaced by memcpy calls.
DMA is used to transfer from external memory to external memory (input frame buffer to output frame buffer).
In our video analytics application, we need to copy the input buffer to the output buffer. Initially we did this transfer with the memcpy function, and the application worked fine. Now we want to replace memcpy with EDMA for better performance when copying the input frame buffer to the output frame buffer, but the video analytics application does not work with EDMA.
For the sake of testing, we disabled all of the algorithms (by removing the analytics function) so that only the single EDMA transfer of the input buffer to the output buffer remained in the application. This loopback works fine.
So there is only one copy operation in the code, and it is done either by EDMA or by memcpy.
Regards,
Naresh
Naresh,
The BCACHE commands are a necessity for using EDMA3 with external cached memory.
But cache coherency would only explain why the frame of data contains old, stale data. Coherency problems would not explain why the EDMA would lock up, or why it would work for a few frames and then freeze.
Could you please explain more about the symptoms of the failure you are observing?
Regards,
RandyP
RandyP,
I used BCACHE as per your guidance, but it does not work. I am still stuck with the problem.
Here is how I use it in my code:
#ifdef USE_EDMA
elem.LineLength = obj->frameWidth;
elem.NumLines = obj->frameHeight;
elem.srcLineOffset = obj->frameWidth;
elem.dstLineOffset = obj->frameWidth;
elem_uv.LineLength = obj->frameWidth;
elem_uv.NumLines = obj->frameHeight/2;
elem_uv.srcLineOffset = obj->frameWidth;
elem_uv.dstLineOffset = obj->frameWidth;
BCACHE_wbInvAll();
BCACHE_wait();
BCACHE_wbAll();
BCACHE_wait();
//BCACHE_invL1pAll();
//BCACHE_wait();
VCA_ALGORITHMS_MT_doCopy2D2D(dmaHandle2D[0],(U8 *)inBufs->descs[0].buf,(U8 *) outBufs->descs[0].buf,&elem);
VCA_ALGORITHMS_MT_doCopy2D2D(dmaHandle2D[0],(U8 *)inBufs->descs[1].buf,(U8 *) outBufs->descs[1].buf,&elem_uv);
#else
memcpy(sOutputframeYUV.u8FrameData, img, u32FrameSize);
memcpy(sOutputframeUV.u8FrameData, imguv, (u32FrameSize)>>1);
#endif
Naresh
The BCACHE_wbAll is not needed after you have already done the wbInvAll. wbAll may be sufficient, but you need to understand how the buffers are being used and how the cache commands will affect your data. Also, you may want to consider the wbBlock command, but there are tradeoffs of execution time vs. intrusion on the rest of your application.
As I said before, cache issues can only cause data corruption; they do not affect program execution. Please examine the details and symptoms of the program behavior you are trying to fix.
Randy,
Processor: C6472.
We have a situation where we suspect we could be hitting a L1D to L2 cache coherency issue. We use all of L1D as cache. We have a (temporary) processing buffer (significant size) in LL2. The final output buffer is in DDR2. We use EDMA3 to transfer the processed output from LL2 to DDR2 and then clear the LL2 for next round of processing.
We are seeing an intermittent artifact, and we are not sure whether the L1D cache could somehow be overwriting the cleared LL2 buffer (that is, LL2 being written over with L1D cached contents when L1D evicts a line to allocate a new one).
However, from reading SPRU871, it doesn't look like that should happen.
This is the passage in SPRU871 I am referring to:
3.3.6 Cache Coherence Protocol
The C64x+ L1D cache remains coherent with respect to DMA activity in L2 RAM. To support this paradigm, the L1D cache accepts cache coherence commands arriving from L2.
3.3.6.1 L2 to L1D Cache Coherence Protocol
To support L1D cache coherence with respect to DMA/IDMA traffic in L2 RAM, the L1D controller supports two cache coherence commands arriving from L2: snoop-read (SNPR) and snoop-write (SNPW). The L2 only sends these snoop commands, when necessary, in response to DMA and IDMA activity in L2 RAM.
Snoop-read is sent to L1D when L2 detects that the L1D cache holds the requested line, and that the line is dirty. L1D responds by returning the requested data.
Snoop-write is sent to L1D when L2 detects that the L1D holds the requested line. It does not matter if the line is modified within L1D. The L1D updates its contents accordingly.
I would appreciate any help you can provide in clarifying the above.
Thanks,
Somnath Banik
Somnath,
Please post your question in a new thread in the C64x Multicore DSP Forum. This is the wrong forum and an unrelated old thread.
Since L1D-LL2 cache coherency is maintained, you need to describe in your new thread what you are observing, why you think a certain thing might be what is wrong, and what you are doing with the various buffers.
Regards,
RandyP