DM648 EDMA PaRAM synchronization? (polling of PaRAM.DST)

Thomas9070

Hi,

in my DM648 application (frame grabber PCI card) there is an EDMA transfer: Videoport -> DDR (grabbing an image). Each "Videoport-FIFO-Full" event triggers a copy of the data into DDR RAM. For example, if an image of 640x480 bytes is grabbed, 480 EDMA events are triggered, each copying 640 bytes of data.

A TSK runs in parallel and polls PaRAM.DST to see how far the data has already been copied. The valid data is forwarded over the PCI bus into the PC's memory.

Polling PaRAM.DST sometimes gives a strange result: the value jumps back to a smaller value, which cannot be right because the memory is filled linearly from lower to higher addresses. For example (polling once every ms with about 30MB/s of incoming net data):

0xE36D35A0
0xE36D8BB0
0xE36EFB10
0xE36E6F80
0xE36EDEE0
0xE36F4E40

It seems that the update of the PaRAM.DST register on the EDMA side is not synchronized to reading it on the DSP side - so sometimes you get some of the bits with wrong values.

Is it theoretically possible that reading PaRAM.DST results in wrong values because EDMA is updating the register "at the same time"? Is there any synchronization mechanism?

best regards,

Thomas

over 14 years ago

0 B.C. over 14 years ago

TI__Genius 16786 points

Thomas,

I'm not sure about what could be causing the out-of-order polling. An alternative scheme would be to break the transfer into two or more smaller transfers that are chained together. When the first transfer in the chain completes, it can both interrupt the DSP and automatically trigger the next transfer in the chain.

Regards,

Brad

0 Thomas9070 over 14 years ago in reply to B.C.

Expert 1575 points

Hi Brad,

Is it not allowed to read PaRAM.DST during an ongoing transfer? If it is allowed, there is a bug - don't you agree?

The application is time critical. We need to use the time of grabbing to forward the data to the PC. There are 2 reasons why linked transfers are not possible:

1.) At the moment of grabbing, the destination on the PC side may not yet be known to the DSP.
2.) Up to 5 cameras on videoports can each send up to 80MB/s of data (single images are sometimes triggered at the same time). This 400MB/s is too much for the PCI.

In most cases only 1 or 2 grabs overlap - so with the parallel forwarding the image is ready for the PC's app a very short time after the last byte has been received by the Videoport.

Is there a legal way to read the progress of an ongoing (multi-EDMA-event) Videoport->DDR RAM EDMA transfer?

bye,

Thomas

0 kcastille over 14 years ago in reply to Thomas9070

TI__Guru 51792 points

Thomas,

You mention an "example" image of 640x480, where 640 bytes is transferred per EDMA event. If I look at the address snapshot you provide, the addresses are not offset by a multiple of 640 bytes... Is the "example" different than the actual experimental results/testcase?

Thanks in advance

Kyle

0 kcastille over 14 years ago in reply to kcastille

TI__Guru 51792 points

Another question: You mention "video port fifo full" event ... If the video port waits for the fifo to become full before generating an event, you run the risk of FIFO overflow (and losing data) since the latency from event->dma read will be longer (probably) than the next incoming sample from the video port. You should make sure the event threshold (which is programmable in the video port) matches the DMA transfer size (PaRAM ACNT field).

Regards
Kyle

0 Thomas9070 over 14 years ago in reply to kcastille

Expert 1575 points

kcastille said:

You mention an "example" image of 640x480, where 640 bytes is transferred per EDMA event. If I look at the address snapshot you provide, the addresses are not offset by a multiple of 640 bytes... Is the "example" different than the actual experimental results/testcase?

The offsets in the example above are the result of polling PaRAM.DST once every ms. It reflects the incoming datarate of about 30MB/s - it has no connection to the size of the FIFO Threshold. EDMA copies in the background independently.

0 Thomas9070 over 14 years ago in reply to kcastille

Expert 1575 points

kcastille said:

Another question: You mention "video port fifo full" event ... If the video port waits for the fifo to become full before generating an event, you run the risk of FIFO overflow (and losing data) since the latency from event->dma read will be longer (probably) than the next incoming sample from the video port. You should make sure the event threshold (which is programmable in the video port) matches the DMA transfer size (PaRAM ACNT field).

With 'Videoport full event' I meant 'FIFO Threshold'. It is set to the size (width) of the camera image. If the width is more than 1296 (which is half of the maximum 2592 pixel width of our camera), the FIFO Threshold is set to half the width. I think that this should be ok: we use 8 bit raw data and the maximum threshold is always far below the size of the Videoport FIFO (2560 Bytes). The FIFO Threshold is identical to the PaRAM ACNT. BCNT is 1, CCNT is the number of FIFO Threshold events. It's an A-synchronized transfer.
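A minimal sketch of how those counts could be derived from the image size, assuming the rule described above (threshold equals the width, halved when the width exceeds 1296). The struct and function names are hypothetical and are not part of any TI driver; the sketch only does the arithmetic and does not touch hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three PaRAM count fields. */
struct a_sync_counts {
    uint32_t acnt;  /* bytes moved per event (FIFO threshold) */
    uint32_t bcnt;  /* fixed at 1 in this setup */
    uint32_t ccnt;  /* number of threshold events per image */
};

/* Derive the counts for an 8-bit raw image of width x height pixels. */
struct a_sync_counts plan_grab(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);

    struct a_sync_counts c;
    uint32_t events_per_line = 1;

    if (width > 1296u) {
        /* Wide lines are split in two; this needs an even width. */
        assert(width % 2u == 0);
        events_per_line = 2;
    }
    c.acnt = width / events_per_line;
    c.bcnt = 1;
    c.ccnt = height * events_per_line;
    return c;
}
```

For the 640x480 example this gives ACNT = 640 and CCNT = 480, matching the 480 events mentioned in the first post.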

I think that the basic EDMA transfer setup is correct. The problem seems to be that the PaRAM.DST field is asynchronous in hardware between the DSP and the EDMA controller, so that some of the bits are sometimes read wrong.

bye,

Thomas

0 Thomas9070 over 14 years ago in reply to Thomas9070

Expert 1575 points

Hi,

I created this thread in the 'DM64x DaVinci Video Processor Forum'. Somehow it got moved to the 'Linux Forum' which is the wrong place. Can one of the List Admins please move it back to the right place?

thanks,

Thomas

0 Thomas9070 over 14 years ago in reply to Thomas9070

Expert 1575 points

some more observations:

I read PaRAM.DST now 3 times with an asm(" nop ") in between. If my assumption is correct (EDMA and DSP not synchronized in access of PaRAM.DST) then I expect to read either

- 3 identical values (EDMA is not updating PaRAM.DST while reading)
- 2 identical values (EDMA is updating PaRAM.DST: 2x old or new correct, 1x old or new, wrong or correct)
- 3 different values (1x old correct, 1x wrong, 1x new correct)

And indeed, that is what I see. So a workaround for my specific case can be:

- read PaRAM.DST 3 times (+1 for an additional check if all 3 values are different)
- if DST1==DST2 -> DST1 is a correct value
- else if DST2==DST3 -> DST2 is a correct value
- else if all different: DST3 is correct if DST3==DST4
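
The voting steps above can be sketched roughly as follows. This is an illustration, not the code used in the application: the reader callback, the `replay` stand-in and all function names are hypothetical, and the sketch takes the third and fourth samples only when they are needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that returns one raw sample of PaRAM.DST. */
typedef uint32_t (*dst_reader_fn)(void *ctx);

/* Host-side stand-in for the register: replays a fixed sample list and
 * keeps returning the last sample once the list is exhausted. */
struct replay {
    const uint32_t *samples;
    size_t count;
    size_t next;
};

uint32_t replay_read(void *ctx)
{
    struct replay *r = (struct replay *)ctx;
    assert(r != NULL && r->count > 0);
    size_t i = (r->next < r->count) ? r->next++ : r->count - 1;
    return r->samples[i];
}

/* Accept the first pair of consecutive samples that agree.
 * Returns false if four samples yield no agreeing pair. */
bool read_dst_voted(dst_reader_fn read, void *ctx, uint32_t *out)
{
    assert(read != NULL && out != NULL);

    uint32_t dst1 = read(ctx);
    uint32_t dst2 = read(ctx);
    if (dst1 == dst2) { *out = dst1; return true; }

    uint32_t dst3 = read(ctx);
    if (dst2 == dst3) { *out = dst2; return true; }

    uint32_t dst4 = read(ctx);
    if (dst3 == dst4) { *out = dst3; return true; }

    return false;  /* caller should retry */
}
```

On the target, `replay_read` would be replaced by whatever function actually samples PaRAM.DST; the caller has to handle the `false` case, for example by polling again.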

so the problem is solved for me at the moment. But ... somehow it leaves a bad feeling if values like PaRAM.DST cannot be trusted.

bye,

Thomas

0 kcastille over 14 years ago in reply to Thomas9070

TI__Guru 51792 points

Thomas,

Very interesting results...

For background, the EDMA Channel Controller and EDMA Transfer Controller have different responsibilities in terms of address updates. The Channel Controller, upon receiving an event, sends a Transfer Request for ACNT bytes (assuming an A-sync'ed transfer), then updates the SRC/DST addresses according to (S|D)BIDX (if BCNT > 1) or (S|D)CIDX (if BCNT == 1). In your case, CIDX is used since BCNT is programmed to a value of 1. It would therefore make sense for the value read from the address register to be, at any point in time, a multiple of CIDX (or ACNT, since I assume CIDX == ACNT) relative to the start address.
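
As a rough model of that update rule, assuming BCNT == 1 and a positive destination C-index equal to ACNT, the destination address after a number of events and the inverse calculation could look like this. The function names and the `step` parameter are hypothetical; `step` stands in for the destination CIDX.

```c
#include <assert.h>
#include <stdint.h>

/* Destination address expected after `events` events, when the channel
 * controller adds `step` bytes to DST once per event. */
uint32_t expected_dst(uint32_t start, uint32_t step, uint32_t events)
{
    return start + step * events;
}

/* Inverse: how many events have completed, given a sampled DST value.
 * Only meaningful for a consistent sample at or above the start address. */
uint32_t events_completed(uint32_t start, uint32_t step, uint32_t dst)
{
    assert(step != 0 && dst >= start);
    return (dst - start) / step;
}
```

With a start address of 0xE36D35A0 and a step of 1296 bytes, 17 events give 0xE36D8BB0, which is the second value in the posted sequence.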

As you point out, at some point, an inconsistent value appears. Taking a look at your address sequence, based on your example of 640 bytes, the address is never a multiple (thus my previous confusion). But, if I compare based on multiples of 1296 bytes (as per your imager size), the results are as follows:

Address       Relative to previous address    Relative to first address
              (in multiples of 1296 bytes)    (in multiples of 1296 bytes)
0xE36D35A0    -                               -
0xE36D8BB0    17                              17
0xE36EFB10    72.57                           89.57
0xE36E6F80    -27.57                          62
0xE36EDEE0    22                              84
0xE36F4E40    22                              106

So, it appears that the "negative" value isn't the problem. The problem is the excessively large value just prior to that.
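
For reference, the table entries can be reproduced with a small helper like the one below. It is only a sketch of the arithmetic; the function name is hypothetical and the 1296-byte step is the assumption stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Signed distance from `from` to `to`, expressed in units of `step` bytes.
 * A fractional result means the two samples are not a whole number of
 * transfers apart. */
double delta_in_steps(uint32_t from, uint32_t to, uint32_t step)
{
    assert(step != 0);
    int64_t diff = (int64_t)to - (int64_t)from;
    return (double)diff / (double)step;
}
```

Only the rows involving 0xE36EFB10 come out fractional, which is what singles that sample out as the inconsistent one.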

In any case, can you double check the types of instructions that are being used to access the EDMA? I.e., is the code using Load word (LDW) instructions? Or maybe byte or halfword inadvertently?

If the CPU is using byte or halfword instructions, it's possible that you may get an event in between individual accesses that build up to the 4-byte address?

You can hopefully see the instruction sequence by looking at the disassembled code in the code composer source code view.
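
One way to make a word-sized access likely is to read the field through a pointer to a volatile 32-bit type, as sketched below. The helper name is hypothetical, the address of the PaRAM entry has to come from the device documentation, and the generated instruction should still be confirmed in the disassembly, since the C language itself does not promise a single load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit, 4-byte-aligned register with one word-sized access
 * at the C level. `volatile` stops the compiler from caching or
 * dropping the read. */
static inline uint32_t read_reg32(const volatile uint32_t *reg)
{
    assert(reg != NULL);
    assert(((uintptr_t)reg & 3u) == 0);  /* word alignment */
    return *reg;
}
```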

Thanks in advance

Kyle

0 Thomas9070 over 14 years ago in reply to kcastille

Expert 1575 points

Hello Kyle,

kcastille said:

As you point out, at some point, an inconsistent value appears. Taking a look at your address sequence, based on your example of 640 bytes, the address is never a multiple (thus my previous confusion). But, if I compare based on multiples of 1296 bytes (as per your imager size), the results are as follows:

The pixel numbers are just examples. The image frame size can change with every frame in my application. I'm not sure which size was used in the test, it can be a random number (multiple of 8 in x).

kcastille said:

So, it appears that the "negative" value isn't the problem. The problem is the excessively large value just prior to that.

yes - correct. If you look at 0xE36D8BB0 and the most likely wrong value 0xE36EFB10 it seems that bit 17 has already switched from 0 to 1 while bit 15 is still 1 (but should switch from 1 to 0).

kcastille said:

In any case, can you double check the types of instructions that are being used to access the EDMA? I.e., is the code using Load word (LDW) instructions? Or maybe byte or halfword inadvertently?

This is the explanation! I use the EDMA3_LL driver (currently version 1.05.00) function EDMA3_DRV_getPaRAM(). I had not looked at it before, but the source code is available.

EDMA3_DRV_getPaRAM() internally uses an auxiliary function edma3MemCpy() that copies PaRAM in a loop - one byte at a time... With that it's no surprise to get the result I have seen. I'll see how I can get the PaRAM entries in a different way to remove the problem.
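
To illustrate how a byte-wise copy can produce the bad sample, here is a small host-side model. It is a hypothetical reconstruction, not driver code: it assumes a little-endian copy from the lowest byte upward, a 1296-byte step, and that DST changed from 0xE36DFB10 to 0xE36E0020 after two bytes had been copied. Those two values are inferred from the posted sequence, not measured.

```c
#include <assert.h>
#include <stdint.h>

/* Value assembled by a byte-wise copy when the source changes from
 * old_val to new_val after `bytes_before` bytes (0..4) were copied,
 * lowest byte first. */
uint32_t torn_read(uint32_t old_val, uint32_t new_val, unsigned bytes_before)
{
    assert(bytes_before <= 4);

    uint32_t result = 0;
    for (unsigned i = 0; i < 4; ++i) {
        uint32_t src = (i < bytes_before) ? old_val : new_val;
        result |= src & (0xFFu << (8 * i));  /* take byte i from src */
    }
    return result;
}
```

Under these assumptions, `torn_read(0xE36DFB10, 0xE36E0020, 2)` yields 0xE36EFB10, the inconsistent value from the original post.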

btw: it's the same in the latest 5.3x BIOS based version of the EDMA3_LL driver (1.10) - I think that this edma3MemCpy() could (should) be implemented in a better way...

bye,

Thomas

0 kcastille over 14 years ago in reply to Thomas9070

TI__Guru 51792 points

Thanks for the follow-up. At least good to hear that it's not a silicon issue. We'll follow up with the BIOS team.

Regards

Kyle

0 Thomas9070 over 14 years ago in reply to Thomas9070

Expert 1575 points

Thomas said:

The application is time critical. We need to use the time of grabbing to forward the data to the PC. There are 2 reasons why linked transfers are not possible:

1.) At the moment of grabbing, the destination on the PC side may not yet be known to the DSP.
2.) Up to 5 cameras on videoports can each send up to 80MB/s of data (single images are sometimes triggered at the same time). This 400MB/s is too much for the PCI.

There is one more reason why VP is not directly linked to PCI: it's simply not possible. According to http://focus.ti.com/lit/an/spraaz9/spraaz9.pdf (Table 3, "System Connection Matrix"), there is no connection between Videoport and PCI.
