
DSP DMA on OMAP-L137

Other Parts Discussed in Thread: OMAP-L137, DA8XX

I'm using MV Linux on the ARM9 and DSP/BIOS on the DSP side of the OMAP-L137.

I need to DMA from the SPI1 to memory on the DSP side. (MV Linux will not be able to keep up with the real-time control requirements.)

The number of 16-bit transfers is variable from one transfer to the next, and the number of transfers will usually be greater than 65535 (up to 250000). The data is not really in frames, but I can lie about that just to get the number of transfers large enough to get all the data. The number could be a prime number (just to make things fun). Because of the prime number thing and the large number of transfers, there may be no values of ACNT and BCNT that, multiplied together, equal the total. I figure I'll need to chain PaRAM sets to get those remaining values transferred.

I think this means that I'll need 2 PaRAM sets: one for the bulk of the data and one for the remainder, if any.
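The arithmetic I have in mind for the two-set approach looks like this (a hypothetical helper, names are my own, nothing here has touched the EDMA yet):

```c
/* Split a transfer of n 16-bit elements into a "bulk" part that factors
 * cleanly into BCNT frames of CCNT arrays, plus a remainder that would go
 * into a second (chained/linked) PaRAM set.  "chunk" is an arbitrary frame
 * size I pick; for a prime n the remainder is almost always nonzero. */
typedef struct {
    unsigned bulk_bcnt;   /* elements per frame in the bulk set      */
    unsigned bulk_ccnt;   /* number of full frames in the bulk set   */
    unsigned remainder;   /* elements left over for the second set   */
} split_t;

split_t split_transfer(unsigned n, unsigned chunk)
{
    split_t s;
    s.bulk_bcnt = chunk;
    s.bulk_ccnt = n / chunk;   /* whole frames only                    */
    s.remainder = n % chunk;   /* 0 when chunk happens to divide n     */
    return s;
}
```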

I need to ensure that the ARM9 and the DSP do not try to use the same DMA channels and PaRAM sets. I noticed in the literature that I should be using the Shadow regions for multi-core processors. For the OMAP-L137, the ARM9 seems to be in control. Does this mean that the ARM9 must set up the Shadow regions? If the DSP does it, how does the ARM9 know to avoid the same PaRAM sets, etc.?

  • Flamingo said:

    I'm using MV Linux on the ARM9 and DSP/BIOS on the DSP side of the OMAP-L137.

    I need to DMA from the SPI1 to memory on the DSP side. (MV Linux will not be able to keep up with the real-time control requirements.)

    The number of 16-bit transfers is variable from one transfer to the next, and the number of transfers will usually be greater than 65535 (up to 250000). The data is not really in frames, but I can lie about that just to get the number of transfers large enough to get all the data. The number could be a prime number (just to make things fun). Because of the prime number thing and the large number of transfers, there may be no values of ACNT and BCNT that, multiplied together, equal the total. I figure I'll need to chain PaRAM sets to get those remaining values transferred.

    I think this means that I'll need 2 PaRAM sets: one for the bulk of the data and one for the remainder, if any.

    I've not tested this out, but I think you can do it using just one parameter set.  Normally one would configure BCNTRLD to be identical to BCNT.  As the transfers are occurring (using A-synchronized transfers) the BCNT field gets decremented after each transfer of ACNT bytes (i.e. each sync event).  Once BCNT=0 it gets reloaded with the value in BCNTRLD.  So let's say you want to transfer 200,002 bytes, i.e. 100,001 elements of 16 bits each.   We might use the following parameters:

    • ACNT=2 (16-bit data)
    • BCNT = 1
    • BCNTRLD = 25000
    • CCNT = 5

    So in the above scenario the first data transfer would move 1 element and BCNT would decrement to 0.  This would cause CCNT to be decremented and BCNT to be reloaded with BCNTRLD.  We then do 4 more "frames" of 25000 elements each, for a total of 100,001 elements.
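The count selection above generalizes to any element total. A sketch of the bookkeeping (untested; `chunk` is just a reload size of your choosing, and must fit the 16-bit BCNTRLD field):

```c
/* One-PaRAM-set plan for A-synchronized transfers: the first frame carries
 * the remainder in BCNT, and every later frame carries a full BCNTRLD worth
 * after the reload.  Helper name and struct are illustrative only. */
typedef struct {
    unsigned short acnt, bcnt, ccnt, bcntrld;
} edma_counts_t;

edma_counts_t plan_async(unsigned elements, unsigned short chunk)
{
    edma_counts_t p;
    unsigned rem = elements % chunk;

    p.acnt    = 2;                          /* one 16-bit element per event   */
    p.bcntrld = chunk;                      /* frames 2..CCNT are full size   */
    p.bcnt    = rem ? rem : chunk;          /* first frame mops up remainder  */
    p.ccnt    = (unsigned short)(elements / chunk + (rem ? 1 : 0));
    return p;
}
```

For 100,001 elements with a chunk of 25000 this reproduces the parameters listed above: ACNT=2, BCNT=1, CCNT=5, BCNTRLD=25000.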

    Flamingo said:

    I need to ensure that the ARM9 and the DSP do not try to use the same DMA channels and PaRAM sets. I noticed in the literature that I should be using the Shadow regions for multi-core processors. For the OMAP-L137, the ARM9 seems to be in control. Does this mean that the ARM9 must set up the Shadow regions? If the DSP does it, how does the ARM9 know to avoid the same PaRAM sets, etc.?

    The configuration file for the Framework Components allows you to specify which channels are owned by the DSP.  More details are here:

    http://wiki.davincidsp.com/index.php?title=DMA_Framework_Components#Configuring_DMAN3_Using_RTSC_Tooling

    Keep in mind this has no effect on the ARM.  You need to make sure when using EDMA from the ARM side that you don't use any of the DSP's channels.  Put another way, that configuration only affects the DSP, so it is the programmer's responsibility to keep things consistent between the ARM and the DSP.

    Brad

  • Thanks. I'll be using this info combined with additional info from my Linux guru (quoted below). For background, MV had told me that the DMA assignments are dynamic and that I would need to use "cat /proc/dma" to identify the assignments at runtime. The quote below suggests that isn't true, and it may be useful to other people using DMAs on both sides of the OMAP-L137. Together, these constitute the "answer".

    Linux guru says: "There is no "/proc/dma" on this system.  Exists on my Linux PC, though.  Seems that it's an ISA interface legacy thing.  (remember "ISA" from very old PCs?)

    The TI driver datasheet does look to be current re: DMA usage:

    2.13.3 Resource Ownership
    The following EDMA Channels and TCCs sets are owned by ARM:
    0-7, 10, 11, 14-23, 28, 29
    The following QDMA channels are owned by ARM:
    4-7
    The following PaRaM sets are owned by ARM:
    0-7, 10, 11, 14-23, 28, 29, 56-127

    I looked at the definitions at the top of this file: arch/arm/mach-da8xx/dma.c in the Linux source tree and DA8XX_EDMA_ARM_OWN defined here: include/asm/arch/edma.h

    The bit pattern in DA8XX_EDMA_ARM_OWN matches what the above documentation says that the ARM owns for EDMA channels.  Same for QDMA channels that are "owned" via the da8xx_qdma_channels_arm definition in dma.c.  Didn't check "param sets".

    I can also see DMA init code here:

    arch/arm/plat-davinci/dma.c

    that is using the values set up in arch/arm/mach-da8xx/dma.c.

    So, as near as I can tell, the ARM handles DMA init + owns a bunch of DMA channels and then presumably the DSP can use the rest."

     

  • Thanks for sharing that info with the community.  As you can tell I spend a lot more time on the DSP side of things so I appreciate the details from the ARM side!

  • Brad Griffis said:

    I've not tested this out, but I think you can do it using just one parameter set.  Normally one would configure BCNTRLD to be identical to BCNT.  As the transfers are occurring (using A-synchronized transfers) the BCNT field gets decremented after each transfer of ACNT bytes (i.e. each sync event).  Once BCNT=0 it gets reloaded with the value in BCNTRLD.  So let's say you want to transfer 200,002 bytes, i.e. 100,001 elements of 16 bits each.   We might use the following parameters:

    • ACNT=2 (16-bit data)
    • BCNT = 1
    • BCNTRLD = 25000
    • CCNT = 5

    So in the above scenario the first data transfer would move 1 element and BCNT would decrement to 0.  This would cause CCNT to be decremented and BCNT to be reloaded with BCNTRLD.  We then do 4 more "frames" of 25000 elements each, for a total of 100,001 elements.

    The time has come for me to test the DMA. My plan is to test the DMA setup using a memory-to-memory transfer, even though I need to convert it to an SPI-to-memory transfer eventually. I remembered this post and have started setting up a test buffer. Of course, I'd like to test the non-multiple example that you discussed above.

    I'm not sure how to set DSTCIDX. Normally I would set it to the reload value * the data width, but for the first increment, that would cause a huge hole in the destination data. If you know of a way to do it, please let me know. Otherwise, I think I'm back to either using 2 parameter sets or forcing all requests to a multiple of some value.

  • DSTCIDX is set differently depending on whether you are doing an A-synchronized or AB-synchronized transfer.  In this case we're doing A-synchronized.  The terms for the various dimensions are arrays, frames, and blocks.  In the example above our array size is 2 bytes.  Each frame (except the first) will be composed of 25000 arrays.  The DSTCIDX for A-synchronized transfers is the distance from the last array in frame "n-1" to the first array in frame "n".  So if the data is contiguous it would simply be ACNT, i.e. 2 in this case.
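For the contiguous memory-to-memory case, then, every index field is simply ACNT. A sketch with field names mirroring the PaRAM layout (illustrative helper, not tested on the device; a non-incrementing source like the SPI would need different source indexes):

```c
/* Index settings for an A-synchronized transfer of contiguous 16-bit data.
 * The B-indexes step between arrays inside a frame; the C-indexes step from
 * the last array of frame n-1 to the first array of frame n. */
typedef struct {
    short srcbidx, dstbidx, srccidx, dstcidx;   /* signed 16-bit in PaRAM */
} edma_idx_t;

edma_idx_t contiguous_async_indexes(short acnt)
{
    edma_idx_t ix;
    ix.srcbidx = acnt;   /* next array follows the previous one            */
    ix.dstbidx = acnt;
    ix.srccidx = acnt;   /* next frame also continues contiguously         */
    ix.dstcidx = acnt;   /* == ACNT, i.e. 2 here, when dest is contiguous  */
    return ix;
}
```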

  • I've been experimenting with various sizes of arrays. Once I got to the case where CCNT was greater than 1, my codec will not successfully load (unless I disable the DMA test that I'm doing). I've verified that the acnt, bcnt, ccnt, and bcntrld values match those in your example. I'm assuming that one of the indices is wrong. Below you see (in the first three lines) the calculated values. Below that, you see the PaRAM values (except for CCNT, which I've forced to 1 because I can't run when it is anything else).

    [DSP] @0x000021dd:[T:0x00000000] DSP_server - acnt = 2, bcnt = 1, ccnt = 5
    [DSP] @0x000021f6:[T:0x00000000] DSP_server - srcbidx = 2, dstbidx = 2, srccidx = 2, dstcidx = 2
    [DSP] @0x00002218:[T:0x00000000] DSP_server - bcntrld = 25000
    [DSP] @0x00002231:[T:0x00000000] DSP_server - Param 0 = 0x81108004
    [DSP] @0x0000224a:[T:0x00000000] DSP_server - Param 1 = 0xc23f6118
    [DSP] @0x00002262:[T:0x00000000] DSP_server - Param 2 = 0x10002
    [DSP] @0x00002279:[T:0x00000000] DSP_server - Param 3 = 0xc2470238
    [DSP] @0x00002292:[T:0x00000000] DSP_server - Param 4 = 0x20002
    [DSP] @0x000022a8:[T:0x00000000] DSP_server - Param 5 = 0x61a8ffff
    [DSP] @0x000022c1:[T:0x00000000] DSP_server - Param 6 = 0x20002
    [DSP] @0x000022d8:[T:0x00000000] DSP_server - Param 7 = 0x1
    [DSP] @0x000022ec:[T:0x00000000] DSP_server - SECR 0x0
    [DSP] @0x00002300:[T:0x00000000] DSP_server - IPR Before 0x0
    [DSP] @0x00002317:[T:0x00000000] DSP_server - EER 0x100
    [DSP] @0x0000232d:[T:0x00000000] DSP_server - IPR After 0x100

    Is SRCCIDX correct? Something else wrong? I've run 10, 100, and 1000 with ccnt = 1, and they all work. (Output similar to above for the 1000 case follows.)

    [DSP] @0x00002101:[T:0x00000000] DSP_server - spi_dma_setup> bcnt = 1000
    [DSP] @0x00002128:[T:0x00000000] DSP_server - acnt = 2, bcnt = 1000, ccnt = 1
    [DSP] @0x00002143:[T:0x00000000] DSP_server - srcbidx = 2, dstbidx = 2, srccidx = 2, dstcidx = 0
    [DSP] @0x00002165:[T:0x00000000] DSP_server - bcntrld = 0
    [DSP] @0x0000217c:[T:0x00000000] DSP_server - Param 0 = 0x81108004
    [DSP] @0x00002195:[T:0x00000000] DSP_server - Param 1 = 0xc23f6118
    [DSP] @0x000021ad:[T:0x00000000] DSP_server - Param 2 = 0x3e80002
    [DSP] @0x000021c5:[T:0x00000000] DSP_server - Param 3 = 0xc2470238
    [DSP] @0x000021de:[T:0x00000000] DSP_server - Param 4 = 0x20002
    [DSP] @0x000021f5:[T:0x00000000] DSP_server - Param 5 = 0xffff
    [DSP] @0x0000220b:[T:0x00000000] DSP_server - Param 6 = 0x2
    [DSP] @0x0000221f:[T:0x00000000] DSP_server - Param 7 = 0x1
    [DSP] @0x00002234:[T:0x00000000] DSP_server - SECR 0x0
    [DSP] @0x00002247:[T:0x00000000] DSP_server - IPR Before 0x0
    [DSP] @0x00002260:[T:0x00000000] DSP_server - EER 0x100
    [DSP] @0x00002277:[T:0x00000000] DSP_server - IPR After 0x100
    [DSP] @0x00002290:[T:0x00000000] DSP_server - DstBuf[0] = 44461
    [DSP] @0x000022a8:[T:0x00000000] DSP_server - DstBuf[100] = 44461
    [DSP] @0x000022c0:[T:0x00000000] DSP_server - DstBuf[200] = 44461
    [DSP] @0x000022d8:[T:0x00000000] DSP_server - DstBuf[300] = 44461
    [DSP] @0x000022f0:[T:0x00000000] DSP_server - DstBuf[400] = 44461
    [DSP] @0x00002309:[T:0x00000000] DSP_server - DstBuf[500] = 44461
    [DSP] @0x00002321:[T:0x00000000] DSP_server - DstBuf[600] = 44461
    [DSP] @0x00002339:[T:0x00000000] DSP_server - DstBuf[700] = 44461
    [DSP] @0x00002351:[T:0x00000000] DSP_server - DstBuf[800] = 44461
    [DSP] @0x00002369:[T:0x00000000] DSP_server - DstBuf[900] = 44461

    Any suggestions appreciated. The intent as I mentioned in other posts (or earlier in this thread) is to glue this to the SPI, which will be attached to an A2D. I will "start" the A2D and the DMA together and then wait for the transfer complete interrupt. This particular debug effort is just to assure that I set all the PaRAM fields correctly for the number of readings I'm expecting.

  • What codec are you talking about that doesn't load?  Is it something unrelated in the system and this test is interfering or is this test tied into the codec?

    Are you transferring to the SPI peripheral or are you just doing memory-to-memory copies to start?

  • Sorry, I should have said "server", not "codec". The "server" probably loads, but then becomes non-responsive. (The failure is 20 retries associated with some message queue or something. I'm hesitant to run it again to get the details because it takes quite a while to reboot and set up the test code again.)

    The server code includes my custom codec library, the server main, and two background tasks. One of the background tasks will be driving the SPI/DMA and then feeding the other background task. The codec is simply an interface to completed buffers of data.

    My testedma code is parked in main. It is not in the codec and not in the background tasks (yet). My codec test code runs fine up until I set CCNT to 2 or more.

    Initially, I'm running a straight memory to memory DMA. The goal now is to determine that I've got all the count and index fields correct before I glue on the non-incrementing source (the SPI), events driven by received data, the ISR, etc.

     

  • You have OPT.SYNCDIM=ABSYNC.  Your code is not going to work with AB-sync for a couple of reasons.  One is that the indexing will be wrong, as we discussed earlier.  The other is that the BCNTRLD field will never be utilized, because BCNT isn't decremented in the parameter RAM during AB-sync transfers.  So your transfer is going to do a whole lot less than you expected...
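You can read that straight out of your Param 0 value (0x81108004). A quick decode, assuming the standard EDMA3 OPT bit layout (SYNCDIM bit 2, TCC bits 17:12, TCINTEN bit 20, ITCCHEN bit 23):

```c
/* Decode a few OPT fields from an EDMA3 PaRAM set. */
#define OPT_SYNCDIM(opt)  (((opt) >> 2)  & 0x1u)   /* 0 = A-sync, 1 = AB-sync */
#define OPT_TCC(opt)      (((opt) >> 12) & 0x3Fu)  /* transfer complete code  */
#define OPT_TCINTEN(opt)  (((opt) >> 20) & 0x1u)   /* completion interrupt    */
#define OPT_ITCCHEN(opt)  (((opt) >> 23) & 0x1u)   /* intermediate chaining   */
```

For 0x81108004 this gives SYNCDIM=1 (AB-sync), TCC=8 (which matches your EER/IPR value of 0x100), TCINTEN=1.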

    Brad

  • I have two test cases:

      1. memory to memory DMA

      2. SPI to memory DMA

    Although my end goal is to get case 2 working, I'm trying to accomplish as much debug as possible using case 1. This is because I have no useful SPI data yet and because I'll lose my console connection (that I'm using for debug) when I change the PINMUX to SPI1.

    Eventually, I might want case 1 working for other reasons as well.

    The 100001 2-byte transfer that you described above is a good example of the types of transfers that I'll be doing, so let's limit our discussion to that case.

    Case 1 is the current test case. The event is manually triggered by a write to ESR. If I have only one trigger event and I want to transfer all 200002 bytes with that one event, would I have to do an AB-synchronized transfer? And for an AB-synchronized transfer, CCNT can be used, but BCNT is not updated. Therefore, the transfer must be a full multiple of ACNT*BCNT*CCNT. Do I understand that correctly?

    Then in Case 2, I will get an event for each 2-byte transfer, so the A-synchronized transfer will transfer 2 bytes for each event, downcount BCNT each time, reloading it CCNT-1 times from BCNTRLD each time BCNT decrements to 0?

  • My "trick" of using different values for BCNT and BCNTRLD to do transfers of any size is only applicable to A-sync transfers.  It cannot work with AB-sync because BCNT never decrements and hence BCNT is never loaded with BCNTRLD.  In other words, BCNTRLD is a "don't care" for AB-sync transfers.

    So if you want this to work you MUST switch to A-sync.  That means you will need BCNT*CCNT sync events to transfer all your data.  (When using AB-sync in a "normal" fashion one would need CCNT sync events to transfer all the data.)
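In numbers, the event counts work out like this (a small sketch of the bookkeeping, not device code; with the bcnt/bcntrld trick the A-sync total is BCNT plus CCNT-1 reloaded frames, which reduces to BCNT*CCNT in the normal case where BCNTRLD equals BCNT):

```c
/* Sync events needed to drain the whole transfer.  A-sync: one event per
 * array (ACNT bytes).  AB-sync: one event per frame (ACNT*BCNT bytes). */
unsigned events_async(unsigned bcnt, unsigned bcntrld, unsigned ccnt)
{
    return bcnt + (ccnt - 1) * bcntrld;   /* == bcnt*ccnt when bcntrld==bcnt */
}

unsigned events_absync(unsigned ccnt)
{
    return ccnt;
}
```

With the example parameters (BCNT=1, BCNTRLD=25000, CCNT=5) that is 100,001 A-sync events, one per 16-bit element received.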

  • Am I correct in surmising then that for case 1, if I want to transfer with a single event, I need to have the #bytes divided between ACNT and BCNT. That is, I should have CCNT set to 1 because I intend no additional manual events?

  • Yes.  There's a better way to do this with one event where you could have CCNT>1.  If you set OPT.ITCCHEN=1 and OPT.TCC=<itself> then that will cause the transfer to re-trigger itself after every "intermediate" transfer.  In this case one event would cause the entire thing to run to completion.
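In OPT terms that would look something like the sketch below. Only the bits discussed here are set (channel 8 is assumed from the EER value earlier in the thread; the live register will also carry fields like PRIV/PRIVID that the decoded values 0x81... include):

```c
/* Build OPT for a self-retriggering A-sync transfer: SYNCDIM stays 0,
 * ITCCHEN=1 so each intermediate completion re-fires the channel via its
 * own TCC, and TCINTEN=1 for the final completion interrupt. */
#define EDMA_OPT_TCC(ch)   ((unsigned)((ch) & 0x3F) << 12)
#define EDMA_OPT_TCINTEN   (1u << 20)
#define EDMA_OPT_ITCCHEN   (1u << 23)

unsigned build_opt_self_chain(unsigned channel)
{
    return EDMA_OPT_TCC(channel) | EDMA_OPT_TCINTEN | EDMA_OPT_ITCCHEN;
}
```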

  • (Sorry this is dragging on so long, but I'm learning lots. Does it help that I've just finished describing you as "a saint" to TI upper management?)

    It sounds like I could use the setup below, but with ITCCHEN = 1 and CCNT = 5 and get 100001 16-bit elements transferred, but when I try that, it gives me the message queue retry failure I had before (with CCNT > 1).

    PaRAM set =

    0x81908000, 0xc241f868, 0x00010002, 0xc2499988,

    0x00020002, 0x61a8ffff, 0x00020002, 0x00000005.

  • Flamingo said:

    (Sorry this is dragging on so long, but I'm learning lots. Does it help that I've just finished describing you as "a saint" to TI upper management?)

    Yes, as a matter of fact it does!  [:H]

    Flamingo said:

    It sounds like I could use the setup below, but with ITCCHEN = 1 and CCNT = 5 and get 100001 16-bit elements transferred, but when I try that, it gives me the message queue retry failure I had before (with CCNT > 1).

    PaRAM set =

    0x81908000, 0xc241f868, 0x00010002, 0xc2499988,

    0x00020002, 0x61a8ffff, 0x00020002, 0x00000005.

    I'm not sure what that error is.  Maybe you're stepping on an EDMA channel being used by message queue?  Can you try out this experiment in a stand-alone project so that we can decouple the EDMA stuff from other stuff in the system?  You should be able to just connect directly to the DSP with your emulator and load whatever program you want.

  • I can connect, but I can't load any project and run. (Let's not go there: I don't want to whine.)

    Nevertheless, I've been assured that Linux does not use DMA 8. Therefore, the only likely culprit is Codec Engine. When I examine the contents of PaRAM set 8 in one of my background tasks, it looks unused. What you said suggested another course of action: avoid using PaRAM set 8 for a while (in the hope that whatever is using it will stop). I tried that, and the DMA completed.

    This is good because it lets me know that the setup is good. It also leads me to believe that I'll have to petition Linux for DMA resources (which I had hoped to avoid: my Linux guru says that I'll not be able to use the existing interfaces and we'll have to write our own driver to do that).

    Thanks for the idea.

  • Do you know if Codec Engine has been configured to instantiate codecs, etc., in your setup? (If you have a server config file, I could look at it and try to determine that.)

  • I'm not sure if you want to see the .cfg or the .tcf, so I'll copy both of them below:

    .cfg (basically the same as the original, although I increased stacks):

    /*
     * Copyright (c) 2009, Texas Instruments Incorporated
     * All rights reserved.
     *
     * Redistribution and use in source and binary forms, with or without
     * modification, are permitted provided that the following conditions
     * are met:
     *
     * *  Redistributions of source code must retain the above copyright
     *    notice, this list of conditions and the following disclaimer.
     *
     * *  Redistributions in binary form must reproduce the above copyright
     *    notice, this list of conditions and the following disclaimer in the
     *    documentation and/or other materials provided with the distribution.
     *
     * *  Neither the name of Texas Instruments Incorporated nor the names of
     *    its contributors may be used to endorse or promote products derived
     *    from this software without specific prior written permission.
     *
     * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
     * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
     * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
     * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
     * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
     * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
     * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
     * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
     * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
     * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
     * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     *
     */
    /*
     *  ======== server.cfg ========
     *
     *  For details about the packages and configuration parameters used throughout
     *  this config script, see the Codec Engine Configuration Guide (link
     *  provided in the release notes).
     */

    /* add checking */
    xdc.useModule('ti.sdo.ce.Settings').checked = true;

    /*
     *  Configure CE's OSAL.  This codec server only builds for the BIOS-side of
     *  a heterogeneous system, so use the "DSPLINK_BIOS" configuration.
     */
    var osalGlobal = xdc.useModule('ti.sdo.ce.osal.Global');
    osalGlobal.runtimeEnv = osalGlobal.DSPLINK_BIOS;

    /* configure default memory seg id to BIOS-defined "DDR2" */
    osalGlobal.defaultMemSegId = "DDR2";

    /* activate BIOS logging module */
    var LogServer = xdc.useModule('ti.sdo.ce.bioslog.LogServer');

    /* configure power management */
    var cfgArgs = Program.build.cfgArgs;
    if ((cfgArgs != undefined) && (cfgArgs.usePowerManagement != undefined)) {
        var biosIpc = xdc.useModule('ti.sdo.ce.ipc.bios.Ipc');
        biosIpc.usePowerManagement = cfgArgs.usePowerManagement;
    }

    /*
     *  ======== Server Configuration ========
     */
    var Server = xdc.useModule('ti.sdo.ce.Server');

    /* The server's stackSize.  More than we need... but safe. */
    Server.threadAttrs.stackSize = 8196;

    /* The servers execution priority */
    Server.threadAttrs.priority = Server.MINPRI;

    /*
     * The optional stack pad to add to non-configured stacks.  This is well
     * beyond most codec needs, but follows the approach of "start big and
     * safe, then optimize when things are working."
     */
    Server.stackSizePad = 9000;

    utils.importFile("codec.cfg");


    /*
     * Note that we presume this server runs on a system with DSKT2 and DMAN3,
     * so we configure those modules here.
     */


    /*
     *  ======== DSKT2 (xDAIS Alg. memory allocation) configuration ========
     *
     *  DSKT2 is the memory manager for all algorithms running in the system,
     *  granting them persistent and temporary ("scratch") internal and external
     *  memory. We configure it here to define its memory allocation policy.
     *
     *  DSKT2 settings are critical for algorithm performance.
     *
     *  First we assign various types of algorithm internal memory (DARAM0..2,
     *  SARAM0..2,IPROG, which are all the same on a C64+ DSP) to "L1DHEAP"
     *  defined in the .tcf file as an internal memory heap. (For instance, if
     *  an algorithm asks for 5K of DARAM1 memory, DSKT2 will allocate 5K from
     *  L1DHEAP, if available, and give it to the algorithm; if the 5K is not
     *  available in the L1DHEAP, that algorithm's creation will fail.)
     *
     *  The remaining segments we point to the "DDRALGHEAP" external memory segment
     *  (also defined in the .tcf) except for DSKT2_HEAP which stores DSKT2's
     *  internal dynamically allocated objects, which must be preserved even if
     *  no codec instances are running, so we place them in "DDR2" memory segment
     *  with the rest of system code and static data.
     */
    var DSKT2 = xdc.useModule('ti.sdo.fc.dskt2.DSKT2');
    DSKT2.DARAM0     = "L1DHEAP";
    DSKT2.DARAM1     = "L1DHEAP";
    DSKT2.DARAM2     = "L1DHEAP";
    DSKT2.SARAM0     = "L1DHEAP";
    DSKT2.SARAM1     = "L1DHEAP";
    DSKT2.SARAM2     = "L1DHEAP";
    DSKT2.ESDATA     = "DDRALGHEAP";
    DSKT2.IPROG      = "L1DHEAP";
    DSKT2.EPROG      = "DDRALGHEAP";
    DSKT2.DSKT2_HEAP = "DDR2";

    /*
     *  Next we define how to fulfill algorithms' requests for fast ("scratch")
     *  internal memory allocation; "scratch" is an area an algorithm writes to
     *  while it processes a frame of data.
     *
     *  First we turn off the switch that allows the DSKT2 algorithm memory manager
     *  to give to an algorithm external memory for scratch if the system has run
     *  out of internal memory. In that case, if an algorithm fails to get its
     *  requested scratch memory, it will fail at creation rather than proceed to
     *  run at poor performance. (If your algorithms fail to create, you may try
     *  changing this value to "true" just to get it running and optimize other
     *  scratch settings later.)
     *
     *  Next we set "algorithm scratch sizes", a scheme we use to minimize internal
     *  memory resources for algorithms' scratch memory allocation. Algorithms that
     *  belong to the same "scratch group ID" -- field "groupId" in the algorithm's
     *  Server.algs entry above, reflecting the priority of the task running the
     *  algorithm -- don't run at the same time and thus can share the same
     *  scratch area. When creating the first algorithm in a given "scratch group"
     *  (between 0 and 19), a shared scratch area for that groupId is created with
     *  a size equal to SARAM_SCRATCH_SIZES[<alg's groupId>] below -- unless the
     *  algorithm requests more than that number, in which case the size will be
     *  what the algorithm asks for. So SARAM_SCRATCH_SIZES[<alg's groupId>] size is
     *  more of a groupId size guideline -- if the algorithm needs more it will get
     *  it, but getting these size guidelines right is important for optimal use of
     *  internal memory. The reason for this is that if an algorithm comes along
     *  that needs more scratch memory than its groupId scratch area's size, it
     *  will get that memory allocated separately, without sharing.
     *
     *  This DSKT2.SARAM_SCRATCH_SIZES[<groupId>] does not mean it is a scratch size
     *  that will be automatically allocated for the group <groupId> at system
     *  startup, but only that is a preferred minimum scratch size to use for the
     *  first algorithm that gets created in the <groupId> group, if any.
     *
     *  (An example: if algorithms A and B with the same groupId = 0 require 10K and
     *  20K of scratch, and if SARAM_SCRATCH_SIZES[0] is 0, if A gets created first
     *  DSKT2 allocates a shared scratch area for group 0 of size 10K, as A needs.
     *  If then B gets to be created, the 20K scratch area it gets will not be
     *  shared with A's -- or anyone else's; the total internal memory use will be
     *  30K. By contrast, if B gets created first, a 20K shared scratch will be
     *  allocated, and when A comes along, it will get its 10K from the existing
     *  group 0's 20K area. To eliminate such surprises, we set
     *  SARAM_SCRATCH_SIZES[0] to 20K and always spend exactly 20K on A and B's
     *  shared needs -- independent of their creation order. Not only do we save 10K
     *  of precious internal memory, but we avoid the possibility that B can't be
     *  created because less than 20K was available in the DSKT2 internal heaps.)
     *
     *  In our example below, we set the size of groupId 0 to 32K -- as an example,
     *  even though our codecs don't use it.
     *
     *  Finally, note that if the codecs correctly implement the
     *  ti.sdo.ce.ICodec.getDaramScratchSize() and .getSaramScratchSize() methods,
     *  this scratch size configuration can be autogenerated by
     *  configuring Server.autoGenScratchSizeArrays = true.
     */
    DSKT2.ALLOW_EXTERNAL_SCRATCH = false;
    DSKT2.SARAM_SCRATCH_SIZES[0] = 32*1024;

    /*
     *  ======== DMAN3 (DMA manager) configuration ========
     */
    var DMAN3 = xdc.useModule('ti.sdo.fc.dman3.DMAN3');

    /*  First we configure how DMAN3 handles memory allocations:
     *
     *  Essentially the configuration below should work for most codec combinations.
     *  If it doesn't work for yours -- meaning an algorithm fails to create due
     *  to insufficient internal memory -- try the alternative (commented out
     *  line that assigns "DDRALGHEAP" to DMAN3.heapInternal).
     *
     *  What follows is an FYI -- an explanation for what the alternative would do:
     *
     *  When we use an external memory segment (DDRALGHEAP) for DMAN3 internal
     *  segment, we force algorithms to use external memory for what they think is
     *  internal memory -- we do this in a memory-constrained environment
     *  where all internal memory is used by cache and/or algorithm scratch
     *  memory, pessimistically assuming that if DMAN3 uses any internal memory,
     *  other components (algorithms) will not get the internal memory they need.
     *
     *  This setting would affect performance very lightly.
     *
     *  By setting DMAN3.heapInternal = <external-heap>  DMAN3 *may not* supply
     *  ACPY3_PROTOCOL IDMA3 channels the protocol required internal memory for
     *  IDMA3 channel 'env' memory. To deal with this catch-22 situation we
     *  configure DMAN3 with hook-functions to obtain internal-scratch memory
     *  from the shared scratch pool for the associated algorithm's
     *  scratch-group (i.e. it first tries to get the internal scratch memory
     *  from DSKT2 shared allocation pool, hoping there is enough extra memory
     *  in the shared pool, if that doesn't work it will try persistent
     *  allocation from DMAN3.internalHeap).
     */
    DMAN3.heapInternal    = "L1DHEAP";       /* L1DHEAP is an internal segment */
    // DMAN3.heapInternal = "DDRALGHEAP";    /* DDRALGHEAP is an external segment */
    DMAN3.heapExternal    = "DDRALGHEAP";
    DMAN3.idma3Internal   = false;
    DMAN3.scratchAllocFxn = "DSKT2_allocScratch";
    DMAN3.scratchFreeFxn  = "DSKT2_freeScratch";

    /*  Next, we configure all the physical resources that DMAN3 is granted
     *  exclusively. These settings are optimized for the DSP on DM6446 (DaVinci).
     *
     *  We assume PaRams 0..79 are taken by the Arm drivers, so we reserve
     *  all the rest, up to 127 (there are 128 PaRam sets on DM6446).
     *  DMAN3 takes TCC's 32 through 63 (hence the High TCC mask is 0xFFFFFFFF
     *  and the Low TCC mask is 0). Of the 48 PaRams we reserved, we assign
     *  all of them to scratch group 0; similarly, of the 32 TCCs we reserved,
     *  we assign all of them to scratch group 0.
     *
     *  If we had more scratch groups with algorithms that require EDMA, we would
     *  split those 48 PaRams and 32 TCCs appropriately. For example, if we had
     *  a video encoder alg. in group 0 and video decoder alg. in group 1, and they
     *  both needed a number of EDMA channels, we could assign 24 PaRams and 16
     *  TCCs to Groups [0] and [1] each. (Assuming both algorithms needed no more
     *  than 24 channels to run properly.)
     */
    DMAN3.paRamBaseIndex     = 80;  // 1st EDMA3 PaRAM set available for DMAN3
    DMAN3.numQdmaChannels    = 8;   // number of device's QDMA channels to use
    DMAN3.qdmaChannels       = [0,1,2,3,4,5,6,7]; // choice of QDMA channels to use
    DMAN3.numPaRamEntries    = 48;  // number of PaRAM sets exclusively used by DMAN
    DMAN3.numPaRamGroup[0]   = 48;  // number of PaRAM sets for scratch group 0
    DMAN3.numTccGroup[0]     = 32;  // number of TCCs assigned to scratch group 0
    DMAN3.tccAllocationMaskL = 0;   // bit mask indicating which TCCs 0..31 to use
    DMAN3.tccAllocationMaskH = 0xffffffff; // assign all TCCs 32..63 for DMAN

    /*  The remaining DMAN3 configuration settings are as defined in ti.sdo.fc.DMAN3
     *  defaults. You may need to override them to add more QDMA channels and
     *  configure per-scratch-group resource sub-allocations.
     */

    /*
     *  ======== RMAN (IRES Resource manager) configuration ========
     */
    /* TODO: What do I need to set up here from RMAN's perspective?
             Should this be here or in the OSAL? */
    var RMAN = xdc.useModule('ti.sdo.fc.rman.RMAN');
    RMAN.useDSKT2 = true;
    RMAN.tableSize = 10;
    RMAN.semCreateFxn = "Sem_create";
    RMAN.semDeleteFxn = "Sem_delete";
    RMAN.semPendFxn = "Sem_pend";
    RMAN.semPostFxn = "Sem_post";

    /* Mem utils */
    var MEMU = xdc.useModule("ti.sdo.fc.memutils.MEMUTILS");

    /* The lock/unlock/set/getContext functions will default to DSKT2 */
    /*
     *  @(#) ti.sdo.ce.wizards.genserver; 1, 0, 0,4; 1-19-2009 08:33:05; /db/atree/library/trees/ce/ce-l08x/src/
     */
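The two-group split described in the DMAN3 comment above would look something like the following alternative .cfg fragment. This is only a sketch of the hypothetical encoder/decoder case from the comment (the 24/16 numbers come from that comment); the actual system keeps everything in scratch group 0:

```javascript
/* Hypothetical split of the 48 reserved PaRAM sets and 32 reserved TCCs
 * across two scratch groups (e.g. video encoder algs in group 0, video
 * decoder algs in group 1). Sketch only -- not part of this system. */
DMAN3.numPaRamEntries  = 48;  // still 48 PaRAM sets reserved in total
DMAN3.numPaRamGroup[0] = 24;  // half of them for scratch group 0
DMAN3.numPaRamGroup[1] = 24;  // the other half for scratch group 1
DMAN3.numTccGroup[0]   = 16;  // half of the 32 reserved TCCs
DMAN3.numTccGroup[1]   = 16;  // the other half for scratch group 1
```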


    .tcf (about 50% original, but significantly changed for my background tasks, memory map changes, etc.):

    var platform = environment["config.platform"];
    print("platform   = " + platform);

    /*
     * Setup platform-specific memory map:
     */
    var mem_ext = [
    {
        comment:    "DDRALGHEAP: off-chip memory for dynamic algmem allocation",
        name:       "DDRALGHEAP",
        base:       0xC6000000,
        len:        0x01000000,
        space:      "code/data"
    },
    {
        comment:    "DDR2: off-chip memory for application code and data",
        name:       "DDR2",
        base:       0xC2000000,
        len:        0x01000000,
        space:      "code/data"
    },
    {
        comment:    "DSPLINK: off-chip memory reserved for DSPLINK code and data",
        name:       "DSPLINKMEM",
        base:       0xC5001000,
        len:        0x00100000 - 0x1000,
        space:      "code/data"
    },
    {
        comment:    "RESET_VECTOR: off-chip memory for the reset vector table",
        name:       "RESET_VECTOR",
        base:       0xC5000000,
        len:        0x00001000,
        space:      "code/data"
    },
    ];


    var device_regs = {
        l1PMode: "32k",
        l1DMode: "16k",
        l2Mode: "128k",
    };

    var params = {
        clockRate: 300,
        catalogName: "ti.catalog.c6000",
        deviceName: "L137",
        regs: device_regs,
        mem: mem_ext
    };

    /*
     * Now customize the generic platform with parameters specified above.
     */
    utils.loadPlatform("ti.platforms.generic", params);

    /*  ===========================================================================
     *  Enable heaps and tasks
     *  ===========================================================================
     */
    bios.enableMemoryHeaps(prog);
    bios.enableTskManager(prog);
    bios.TSK.create("TSK_collect");
    bios.TSK.instance("TSK_collect").order = 1;
    bios.TSK.create("TSK_process_data");
    bios.TSK.instance("TSK_process_data").order = 2;
    bios.TSK.instance("TSK_process_data").fxn = prog.extern("FXN_process_data");
    bios.TSK.instance("TSK_collect").comment = "Run the DMA";
    bios.TSK.instance("TSK_collect").fxn = prog.extern("FXN_collect");
    bios.TSK.instance("TSK_collect").exitFlag = 0;
    bios.TSK.instance("TSK_collect").allocateTaskName = 1;
    bios.TSK.instance("TSK_process_data").comment = "Process DMA Buffer";
    bios.TSK.instance("TSK_process_data").allocateTaskName = 1;
    bios.SEM.create("SEM_collect");
    bios.SEM.instance("SEM_collect").comment = "DMA Completion interrupt notifies collect.";
    bios.SEM.create("SEM_MSIready");
    bios.SEM.instance("SEM_MSIready").comment = "MSI notifies collect of control";
    bios.MBX.create("MBX_collect");
    bios.MBX.instance("MBX_collect").comment = "Messages from codec to collect";
    bios.MBX.instance("MBX_collect").messageSize = 2504;
    bios.MBX.instance("MBX_collect").length = 10;
    bios.MBX.create("MBX_process_data");
    bios.MBX.instance("MBX_process_data").comment = "Messages from codec to process data";
    bios.MBX.instance("MBX_process_data").messageSize = 2504;
    bios.MBX.instance("MBX_process_data").length = 10;
    /*
    bios.HWI.instance("HWI_INT6").interruptSelectNumber = 8;
    bios.HWI.instance("HWI_INT6").fxn = prog.extern("SPI_DMA_tcc_intr");
    bios.HWI.instance("HWI_INT6").useDispatcher = 1;
    bios.HWI.instance("HWI_INT6").interruptMask = "all";
    */
    bios.HWI.instance("HWI_INT7").interruptSelectNumber = 65;
    bios.HWI.instance("HWI_INT7").fxn = prog.extern("MSI_ctl_intr");
    bios.HWI.instance("HWI_INT7").useDispatcher = 1;
    bios.HWI.instance("HWI_INT7").interruptMask = "all";

    /*  ===========================================================================
     *  Create heaps in memory segments that are to have heap
     *  ===========================================================================
     */
    bios.DDR2.createHeap = true;
    bios.DDR2.heapSize   = 0x40000;

    bios.DDRALGHEAP.createHeap = true;
    bios.DDRALGHEAP.heapSize   = bios.DDRALGHEAP.len;

    /*
     * The setting of L1DSRAM.len is to work around a problem in the BIOS 5
     * bios_6747.tci file.  Remove when that (SDOCM00051331) is fixed.
     */
    bios.L1DSRAM.len       = 0x4000;

    bios.L1DSRAM.createHeap       = true;
    bios.L1DSRAM.enableHeapLabel  = true;
    bios.L1DSRAM["heapLabel"]     = prog.extern("L1DHEAP");
    bios.L1DSRAM.heapSize     = bios.L1DSRAM.len;  // all of L1DSRAM for this heap

    bios.IRAM.createHeap       = true;
    bios.IRAM.enableHeapLabel  = true;
    bios.IRAM["heapLabel"]     = prog.extern("L2HEAP");
    bios.IRAM.heapSize     = 0x20000;  // half of IRAM's 256K for this heap

    bios.L3_CBA_RAM.createHeap       = true;
    bios.L3_CBA_RAM.enableHeapLabel  = true;
    bios.L3_CBA_RAM["heapLabel"]     = prog.extern("L3_CBA_RAM_HEAP");
    bios.L3_CBA_RAM.heapSize     = 0x10000;  // half of L3's 128K for this heap


    /*  ===========================================================================
     *  GBL
     *  ===========================================================================
     */
    /* set MAR register to cache shared internal memory 0x80000000-0x8001FFFF */
    bios.GBL.C64PLUSMAR128to159 = 0x00000001;

    /* set MAR register to cache external memory 0xC0000000-0xC3FFFFFF */
    bios.GBL.C64PLUSMAR192to223 = 0x0000000f;

    prog.module("GBL").ENABLEALLTRC    = false;
    prog.module("GBL").PROCID          = 0;

    /*  ===========================================================================
     *  MEM : startup and SWI stack size
     *  ===========================================================================
     */
    prog.module("MEM").STACKSIZE = 4000;

    /*  ===========================================================================
     *  Global Settings
     *  ===========================================================================
     */
    prog.module("MEM").ARGSSIZE = 256;

    /*  ===========================================================================
     *  Enable MSGQ and POOL Managers
     *  ===========================================================================
     */
    bios.MSGQ.ENABLEMSGQ = true;
    bios.POOL.ENABLEPOOL = true;

    /*  ===========================================================================
     *  Set all code and data sections to use DDR2
     *  ===========================================================================
     */
    bios.setMemCodeSections (prog, bios.DDR2);
    bios.setMemDataNoHeapSections (prog, bios.DDR2);
    bios.setMemDataHeapSections (prog, bios.DDR2);

    /*  ===========================================================================
     *  MEM : Global
     *  ===========================================================================
     */
    prog.module("MEM").BIOSOBJSEG = bios.DDR2;
    prog.module("MEM").MALLOCSEG  = bios.DDR2;

    /*  ===========================================================================
     *  TSK : Global
     *  ===========================================================================
     */
    prog.module("TSK").STACKSEG = bios.DDR2;
    prog.module("TSK").STACKSIZE = 0x10000;

    /*  ===========================================================================
     *  Generate configuration files...
     *  ===========================================================================
     */
    if (config.hasReportedError == false) {
        prog.gen();
    }
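The two MAR settings in the GBL section above follow from the C64x+ rule that each MAR bit controls cacheability of one 16 MB region, so the MAR index is simply the address divided by 0x1000000. A quick sanity check of the values used (plain JavaScript; marIndex is an illustrative helper, not part of tconf):

```javascript
// Each C64x+ MAR bit covers one 16 MB (0x1000000-byte) region, so the
// MAR index for an address is just address / 0x1000000.
function marIndex(addr) {
    return Math.floor(addr / 0x1000000);
}

// 0x80000000 (shared RAM) -> MAR 128, i.e. bit 0 of C64PLUSMAR128to159.
// 0xC0000000..0xC3FFFFFF -> MARs 192..195, i.e. mask 0x0000000f of
// C64PLUSMAR192to223.
```

Note that DDR2 at 0xC2000000 falls on MAR 194 and is therefore cached, while DDRALGHEAP at 0xC6000000 falls on MAR 198, which the 0x0000000f mask does not cover, so it is not marked cacheable by these settings.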

     

  • How are you acquiring resources (Parameter RAMs, channels, TCCs), e.g. DMAN3, LLD, direct memory writes?  Which specific resources are you using?

  • I'm stealing resources directly. I checked with MontaVista Linux and they claim that they own all DMA resources except a couple: notably, channel 8 is not used by them.

    I asked TI and they claimed that they weren't using any. (I didn't believe it.)

    Anyway, I stole channel 8, TCC 8, and PaRAM set 8. That is the only one I'm using.

    (Before I stole it, I looked at the PaRAM set, and it looked like nobody had used it.)

    It has been suggested that I get a DMA assigned from Linux and passed to the codec as a parameter. My Linux guru says "Unfortunately you don't just request DMA resources from a user-space application.  You'd need to write a "dummy" driver that requests the DMA resources, then returns them to the app via an ioctl() call or something like that."

  • As per the config file, Codec Engine isn't using EDMA channel #8 or PaRAM set #8.

    It is assigned PaRAM sets starting from #80 (DMAN3), and it is using QDMA channels 0-7 and TCCs 32-63.

    Unless there are some "weird" codecs in your system that aren't negotiating their resources through ti.sdo.fc modules, Codec Engine should not be touching resource # 8.

    :(
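For the variable (possibly prime) element counts from the original question, a single stolen PaRAM set can move the bulk of the data as an AB-synchronized transfer, with a second, linked/chained PaRAM set moving the remainder. A sketch of just that arithmetic (plain JavaScript; splitTransfer is an illustrative name, not an EDMA3 API):

```javascript
// Split 'totalElems' 16-bit elements into a bulk AB-synchronized block
// (ACNT = 2 bytes per element, BCNT = maxBcnt elements per frame, CCNT
// frames) plus a remainder for a second, linked PaRAM set.
// Illustrative helper only.
function splitTransfer(totalElems, maxBcnt) {
    var ccnt = Math.floor(totalElems / maxBcnt);
    var rem  = totalElems % maxBcnt;
    return {
        bulk: ccnt > 0 ? { acnt: 2, bcnt: maxBcnt, ccnt: ccnt } : null,
        link: rem  > 0 ? { acnt: 2, bcnt: rem,     ccnt: 1    } : null
    };
}

// 250000 elements: 3 full frames of 65535 plus a linked 53395-element frame.
var r = splitTransfer(250000, 65535);
```

Because any remainder is handled by its own PaRAM set with CCNT = 1, this works even when the total is prime and no exact ACNT x BCNT x CCNT factorization exists.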