We're getting fairly far along in our development, and starting to see throughput problems. I've looked through tons of documentation, but wasn't able to resolve some questions. So here I am.
Our scenario:
- 1x serial ADC on McBSP0, serviced by DMA channel 0 at 500 kHz, 32-bit samples. Samples are DMA'd into SARAM2 (base address 0x18000, length 0x8000).
- 1x serial ADC on McBSP1, serviced by DMA channel 1 at 500 kHz, 32-bit samples. Samples are DMA'd into SARAM3 (base address 0x20000, length 0x7f80).
- 1x TMS320C6455 hanging off the HPI interface, pulling data (samples, demod results, etc.) out of the 5510. This data is placed in DARAM2, all by itself, at 0x2000, length 0x2000.
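For concreteness, the buffers are declared and placed roughly along these lines. This is only a sketch: the section names, symbol names, and element counts are illustrative placeholders, and the mapping of each section onto the addresses above is done in our linker command file.

    #include <stdint.h>

    /* Element counts are placeholders; size each buffer to fill its block
     * (keep the C55x word-addressed view in mind when converting lengths). */
    #define ADC_BUF_WORDS   0x2000      /* placeholder count of 32-bit samples   */
    #define HPI_BUF_WORDS   0x1000      /* placeholder count of 16-bit results   */

    #pragma DATA_SECTION(adc0_samples, ".adc0_buf")   /* .adc0_buf -> SARAM2 (0x18000) in the .cmd file */
    volatile int32_t adc0_samples[ADC_BUF_WORDS];     /* DMA channel 0 destination (McBSP0) */

    #pragma DATA_SECTION(adc1_samples, ".adc1_buf")   /* .adc1_buf -> SARAM3 (0x20000) in the .cmd file */
    volatile int32_t adc1_samples[ADC_BUF_WORDS];     /* DMA channel 1 destination (McBSP1) */

    #pragma DATA_SECTION(hpi_outbuf, ".hpi_buf")      /* .hpi_buf -> DARAM2 (0x2000) in the .cmd file   */
    volatile int16_t hpi_outbuf[HPI_BUF_WORDS];       /* soft decisions the 6455 reads over HPI         */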
The Problem:
Everything worked smoothly when only the 5510 was in the system, and when we hooked up the 6455 there weren't any serious problems at first. However, once we started streaming a solid amount of data from the 5510 to the 6455, we ran out of time. So I have some specific questions:
- What is the maximum theoretical throughput/bandwidth of the DMA engine itself (e.g. from DARAM to DARAM, and so on)?
- How many DMA events can be serviced simultaneously? The documentation isn't very clear on this, and I'm beginning to believe the answer is "1".
- The DMA documentation mentions memory blocks extensively. It says the CPU will hold off the DMA engine if the CPU is accessing a memory block. I want to pin down exactly what constitutes a "block". Is it, for example, an individual 8K DARAM block? Or is the whole DARAM one "block"? In other words, can I judiciously choose and allocate sections of DARAM so that CPU accesses to other blocks of DARAM won't conflict with DMA transfers into my area of interest? (That's how I read it, and how I have things linked and coded; see the sketch after this list for the kind of partitioning I mean.)
- Do DMA transactions on higher-priority DMA channels interrupt transactions on lower-priority channels, or do they wait for the lower-priority transfer to complete? For example: if my McBSP on DMA channel 0 is set to PRIO_HI and my HPI is lower priority, will my McBSP sample still come in if the HPI is in the middle of a "long" transaction?
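To make the memory-block question concrete, here is the kind of partitioning I mean, assuming a "block" really is an individual DARAM/SARAM block and not the whole memory. The idea is that the CPU's scratch/working data lives in its own block, so CPU accesses never land in the blocks the DMA is writing. Section names and sizes are again just placeholders.

    #include <stdint.h>

    /* DMA-owned blocks: only the DMA controller touches these while a
     * transfer is in flight (declared as in the placement sketch above). */
    extern volatile int32_t adc0_samples[];   /* lives in SARAM2 via .adc0_buf */
    extern volatile int32_t adc1_samples[];   /* lives in SARAM3 via .adc1_buf */

    /* CPU-owned block: demod scratch data is forced into a section the
     * linker maps to a different DARAM block, so CPU reads/writes here
     * should not stall the DMA (if a "block" is indeed a single DARAM
     * block rather than the whole DARAM). */
    #pragma DATA_SECTION(demod_work, ".cpu_work")     /* .cpu_work -> its own DARAM block in the .cmd file */
    int16_t demod_work[1024];                         /* placeholder size */

    #pragma DATA_SECTION(soft_decisions, ".cpu_work")
    int16_t soft_decisions[512];                      /* staged here, then copied to DARAM2 for the HPI */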
I have a theory about what's going on, and maybe someone can enlighten me. We have samples streaming in constantly over McBSP at 500 kHz. We buffer those up, do some computations, demods, etc., and then try to burst the result out over HPI: several hundred soft decisions as int16_t. So an HPI access comes along to read those out, but the transfer takes longer than the McBSP sample-to-sample time, so there is some contention. I'm wondering whether the HPI gets preempted and has to restart, or gets ARDY'd and waits. Does anyone know how this would behave?
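Rough numbers, just to show the shape of the problem. The only real figure here is the 500 kHz sample rate; the HPI per-access cost and the burst size are assumptions plugged in for illustration.

    /* Back-of-the-envelope timing check. Only the 500 kHz sample rate is a
     * real number from our setup; the HPI access cost and word count are
     * assumptions to illustrate the contention window. */
    #include <stdio.h>

    int main(void)
    {
        const double sample_rate_hz   = 500e3;                 /* McBSP sample rate (given) */
        const double sample_period_us = 1e6 / sample_rate_hz;  /* 2 us between samples      */

        const unsigned burst_words    = 400;     /* "several hundred" int16_t soft decisions (assumed) */
        const double   ns_per_access  = 200.0;   /* ASSUMED cost of one 16-bit HPI access              */
        const double   burst_us       = burst_words * ns_per_access / 1e3;

        printf("sample period %.1f us, HPI burst ~%.1f us (~%.0f sample periods)\n",
               sample_period_us, burst_us, burst_us / sample_period_us);
        return 0;
    }

Even with fairly generous HPI assumptions the burst spans tens of sample periods, so the McBSP DMA events and the HPI access are guaranteed to overlap; the question is just how the arbitration resolves that overlap.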
Thanks!