We're getting fairly far along in our development, and starting to see throughput problems. I've looked through tons of documentation, but wasn't able to resolve some questions. So here I am.
Our scenario:
- 1x serial ADC on McBSP0, serviced by DMA channel 0 at 500 kHz, 32-bit samples. Samples are DMA'd into SARAM2 (base address 0x18000, length 0x8000).
- 1x serial ADC on McBSP1, serviced by DMA channel 1 at 500 kHz, 32-bit samples. Samples are DMA'd into SARAM3 (base address 0x20000, length 0x7f80).
- 1x TMS320C6455 hanging off the HPI interface, pulling data (samples, demod results, etc.) out of the 5510. This data is placed in DARAM2, all by itself, at 0x2000, length 0x2000.
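For concreteness, the buffers are declared and placed roughly along these lines. This is only a sketch: the section names, symbol names, and element counts are illustrative placeholders, and the mapping of each section onto the addresses above is done in our linker command file.

    #include <stdint.h>

    /* Element counts are placeholders; size each buffer to fill its block
     * (keep the C55x word-addressed view in mind when converting lengths). */
    #define ADC_BUF_WORDS   0x2000      /* placeholder count of 32-bit samples   */
    #define HPI_BUF_WORDS   0x1000      /* placeholder count of 16-bit results   */

    #pragma DATA_SECTION(adc0_samples, ".adc0_buf")   /* .adc0_buf -> SARAM2 (0x18000) in the .cmd file */
    volatile int32_t adc0_samples[ADC_BUF_WORDS];     /* DMA channel 0 destination (McBSP0) */

    #pragma DATA_SECTION(adc1_samples, ".adc1_buf")   /* .adc1_buf -> SARAM3 (0x20000) in the .cmd file */
    volatile int32_t adc1_samples[ADC_BUF_WORDS];     /* DMA channel 1 destination (McBSP1) */

    #pragma DATA_SECTION(hpi_outbuf, ".hpi_buf")      /* .hpi_buf -> DARAM2 (0x2000) in the .cmd file   */
    volatile int16_t hpi_outbuf[HPI_BUF_WORDS];       /* soft decisions the 6455 reads over HPI         */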
The Problem:
Everything worked smoothly when only the 5510 was in the system, and when we hooked up the 6455 there weren't any serious problems at first. However, once we started streaming a solid amount of data from the 5510 to the 6455, we ran out of time. So I have some specific questions:
- What is the maximum theoretical throughput/bandwidth of the DMA engine itself (e.g. from DARAM to DARAM, and so on)?
- How many DMA events can be serviced simultaneously? The documentation isn't very clear on this, and I'm beginning to believe the answer is "1".
- The DMA documentation mentions memory blocks extensively. It says the CPU will hold off the DMA engine if the CPU is accessing a memory block. I want to pin down exactly what constitutes a "block". Is it, for example, an individual 8K DARAM block? Or is the whole DARAM one "block"? In other words, can I judiciously choose and allocate sections of DARAM so that CPU accesses to other blocks of DARAM won't conflict with DMA transfers into my area of interest? (That's how I read it, and how I have things linked and coded; see the sketch after this list for the kind of partitioning I mean.)
- Do DMA transactions on higher-priority DMA channels interrupt transactions on lower-priority channels, or do they wait for the lower-priority transfer to complete? For example: if my McBSP on DMA channel 0 is set to PRIO_HI and my HPI is lower priority, will my McBSP sample still come in if the HPI is in the middle of a "long" transaction?
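To make the memory-block question concrete, here is the kind of partitioning I mean, assuming a "block" really is an individual DARAM/SARAM block and not the whole memory. The idea is that the CPU's scratch/working data lives in its own block, so CPU accesses never land in the blocks the DMA is writing. Section names and sizes are again just placeholders.

    #include <stdint.h>

    /* DMA-owned blocks: only the DMA controller touches these while a
     * transfer is in flight (declared as in the placement sketch above). */
    extern volatile int32_t adc0_samples[];   /* lives in SARAM2 via .adc0_buf */
    extern volatile int32_t adc1_samples[];   /* lives in SARAM3 via .adc1_buf */

    /* CPU-owned block: demod scratch data is forced into a section the
     * linker maps to a different DARAM block, so CPU reads/writes here
     * should not stall the DMA (if a "block" is indeed a single DARAM
     * block rather than the whole DARAM). */
    #pragma DATA_SECTION(demod_work, ".cpu_work")     /* .cpu_work -> its own DARAM block in the .cmd file */
    int16_t demod_work[1024];                         /* placeholder size */

    #pragma DATA_SECTION(soft_decisions, ".cpu_work")
    int16_t soft_decisions[512];                      /* staged here, then copied to DARAM2 for the HPI */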
I have a theory about what's going on, and maybe someone can enlighten me. We have samples streaming in constantly over McBSP at 500 kHz. We buffer those up, do some computations, demods, etc., and then try to burst the result out over HPI: several hundred soft decisions as int16_t. So an HPI access comes along to read those out, but the transfer takes longer than the McBSP sample-to-sample time, so there is some contention. I'm wondering whether the HPI gets preempted and has to restart, or gets ARDY'd and waits. Does anyone know how this would behave?
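Rough numbers, just to show the shape of the problem. The only real figure here is the 500 kHz sample rate; the HPI per-access cost and the burst size are assumptions plugged in for illustration.

    /* Back-of-the-envelope timing check. Only the 500 kHz sample rate is a
     * real number from our setup; the HPI access cost and word count are
     * assumptions to illustrate the contention window. */
    #include <stdio.h>

    int main(void)
    {
        const double sample_rate_hz   = 500e3;                 /* McBSP sample rate (given) */
        const double sample_period_us = 1e6 / sample_rate_hz;  /* 2 us between samples      */

        const unsigned burst_words    = 400;     /* "several hundred" int16_t soft decisions (assumed) */
        const double   ns_per_access  = 200.0;   /* ASSUMED cost of one 16-bit HPI access              */
        const double   burst_us       = burst_words * ns_per_access / 1e3;

        printf("sample period %.1f us, HPI burst ~%.1f us (~%.0f sample periods)\n",
               sample_period_us, burst_us, burst_us / sample_period_us);
        return 0;
    }

Even with fairly generous HPI assumptions the burst spans tens of sample periods, so the McBSP DMA events and the HPI access are guaranteed to overlap; the question is just how the arbitration resolves that overlap.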
Thanks!