H.264 BP encoder integration on C6678

Other Parts Discussed in Thread: SYSBIOS, DEMOVIDEO-MULTICORE

Hi,

I'm trying to use the H.264 BP encoder for C66x (version 1.24.00.01) on a C6678 EVM.

I used this codec on C64x+ in the past, and now I am working on migrating to the C66x platform.

I don't use SYS/BIOS or RTSC in the project.

As I understand it, this library uses the external ECPY library for its EDMA implementation.

When I tried to link it with my project I saw many undefined symbols from the ECPY library, so I added the following library:

framework_components_3_20_02_29/packages/ti/sdo/fc/ecpy/lib/release/ecpy.ae674

But I still get the following linker error:

 undefined                      first referenced
  symbol                            in file
 ---------                      ----------------
 ECPY_CFG_PARAMS                /opt/ti/framework_components_3_20_02_29/packages/ti/sdo/fc/ecpy/lib/release/ecpy.ae674<ecpy_impl.oe674>

First of all, I would like to ask for documentation about the connection and interface between the H.264 encoder library and the ECPY library, and what actions I need to take in order to make it work: defining and filling structures, initializing EDMA, etc.

Which ECPY library should I use? In the H.264 encoder sample application (which, by the way, requires many changes in order to compile on Linux) I saw ecpy.ae674 being used.

Is it compatible with C66x?

What are the differences in EDMA initialization that I need to handle?

Who is responsible for EDMA initialization in the framework, and how is it done?

As I understand it, there is another library called RMAN that can manage this. Can I use my own code to manage EDMA channels instead?

Thanks,

Oleg Fomenko

  • Hi Oleg,

    For C6678 video codecs, RMAN_freeResources() needs to be called with algActivate() and algDeactivate() around it. When algActivate() is done, the EDMA-related data structures for the codec are copied from DDR to LL2 for the codec to use. Conversely, algDeactivate() saves the EDMA-related data structures from LL2 to DDR so that they can be copied back to LL2 later when algActivate() is called again. Without algActivate() and algDeactivate() around RMAN_freeResources(), the EDMA-related data structures for the codecs can have incorrect values, causing the DSP crash you observed.

    As for RMAN_activateAllResources() and RMAN_deactivateAllResources(): internally in the codecs we are just setting/clearing the resActive flag. As you can see from the MCSDK Video code dsp\siu\osal\bios6\siuFcBios6.c, RMAN_activateAllResources() is called after RMAN_assignResources(). They are executed once after ALG_create(), with algActivate() and algDeactivate() around them. Then, encoding/decoding can be called multiple times, with algActivate() and algDeactivate() around the process call, to maintain the correct EDMA-related data structures for the codecs. So for video codecs there is no need to do RMAN_activateAllResources() and RMAN_deactivateAllResources() for every frame. After the encoding/decoding is completed and the codec instance needs to be deleted, RMAN_freeResources() can be called before ALG_delete(), again with algActivate() and algDeactivate() around it. The above calling sequence can be found in dsp\siu\vct\siuVctEncode_xdm0p9.c.
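
    To make the ordering concrete, here is a minimal C sketch of the calling sequence described above. The handle and IRES table come from codec creation; the algActivate()/algDeactivate() calls go through the codec's IALG function table, and the RMAN_activateAllResources() signature is assumed to match the other RMAN calls. This is a sketch of the sequence, not code taken from MCSDK Video.

    #include <xdc/std.h>
    #include <ti/xdais/ialg.h>
    #include <ti/xdais/ires.h>
    #include <ti/sdo/fc/rman/rman.h>

    /* One complete life cycle of a codec instance, following the sequence above.
     * 'alg' and 'resFxns' come from codec creation; scratch group 0 is assumed. */
    static Void runCodecInstance(IALG_Handle alg, IRES_Fxns *resFxns, Int nFrames)
    {
        Int scratchId = 0;
        Int i;

        /* Once, right after ALG_create(): assign and activate the EDMA resources */
        alg->fxns->algActivate(alg);
        RMAN_assignResources(alg, resFxns, scratchId);
        RMAN_activateAllResources(alg, resFxns, scratchId);
        alg->fxns->algDeactivate(alg);

        /* Per frame: algActivate()/algDeactivate() around the process call keep
         * the EDMA-related structures consistent between LL2 and DDR */
        for (i = 0; i < nFrames; i++) {
            alg->fxns->algActivate(alg);
            /* ... encoder/decoder process() call goes here ... */
            alg->fxns->algDeactivate(alg);
        }

        /* Once, before ALG_delete() */
        alg->fxns->algActivate(alg);
        RMAN_freeResources(alg, resFxns, scratchId);
        alg->fxns->algDeactivate(alg);
    }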

    Thanks,

    Hongmei

  • Hi Hongmei,

    Regarding the RMAN_freeResources() crash, your explanation solves it.

    Regarding the RMAN_deactivateAllResources() failure: as I mentioned earlier, these EDMA resources are shared between many encoders and decoders, so if I call RMAN_activateAllResources() only once per algorithm, it already fails during initialization of the second algorithm, since the resources are still in use.

    - Anyway, there should be no reason for this failure, so I'm afraid something is wrong in my system. Do you have any idea what the cause could be?

    - Do you think I should work without any RMAN_activateAllResources() and RMAN_deactivateAllResources() calls at all? (This currently works on my system.)

    Thanks,

    Oleg

  • Hi Oleg,

    We identified some gaps inside the H.264 BP encoder regarding how the EDMA resource information is stored in the codec's instance memory. We will fix that in the next drop of the C6678 codecs. For now, please use it without the RMAN_activateAllResources() and RMAN_deactivateAllResources() calls.

    Thanks,

    Hongmei

  • Hi,

    Thanks for the answer, it explains everything. By the way, you have the same problem in the H.264 Main Profile decoder (I mean the deactivation of resources).

    I started testing the encoder on different cores and I found a strange problem:

    The first time after a reset, allocation of EDMA resources works fine on any core.

    If I continue to work on the same core it also works fine (I can assign and release EDMA channels to the encoder and decoder).

    But when I then try to allocate resources on another core, my app crashes and I get the following log from the FC traces. (Note that the same core would work fine if I had not used other cores earlier.)

    [C66xx_1] [t=0x00000008:de8c28f6] ti.sdo.fc.rman: [+E] RMAN_assignResources> Enter (alg=0x80f23a80, resFxns=0x8497d0, scratchGroupId=1)
    [t=0x00000008:e90bdaf6] : [+E] DSKT2_allocPersistent> Enter (numRecs=1)
    [t=0x00000008:ea9eaa0c] ti.sdo.fc.dskt2: [+E] _DSKT2_init> Enter
    [t=0x00000008:ec263a45] ti.sdo.fc.dskt2: [+X] _DSKT2_init> Exit
    [t=0x00000008:edb6395c] ti.sdo.fc.dskt2: [+E] _DSKT2_assignInstanceMemory> Enter (scratchId=-1, numRecs=1, extHeap=0x8498f8)
    [t=0x00000008:ef72a855] ti.sdo.fc.dskt2: [+E] _DSKT2_allocateInDesignatedSpace> Enter (index=0, ialgSpace=IALG_EXTERNAL, extHeap=0x8498f8)
    [t=0x00000008:f13a37e7] ti.sdo.fc.dskt2: [+X] _DSKT2_allocateInDesignatedSpace> Exit (returnVal=1)
    [t=0x00000008:f2dba022] ti.sdo.fc.dskt2: [+2] _DSKT2_assignInstanceMemory> memTab[0] allocated in persistent memory in Memory space:IALG_EXTERNAL. Addr=0x849fa0
    [t=0x00000008:f4aad104] ti.sdo.fc.dskt2: [+X] _DSKT2_assignInstanceMemory> Exit (returnVal=1)
    [t=0x00000008:f64868fc] ti.sdo.fc.dskt2: [+X] DSKT2_allocPersistent> Exit (status=1)
    [t=0x00000008:f7e09bb1] ti.sdo.fc.ires.edma3chan: [+E] IRESMAN_EDMA3CHAN_getProtocolName> Enter
    [t=0x00000008:f980bdc9] ti.sdo.fc.ires.edma3chan: [+X] IRESMAN_EDMA3CHAN_getProtocolName> Exit (name=ti.sdo.fc.ires.edma3chan)
    [t=0x00000008:fb358b63] ti.sdo.fc.ires.edma3chan: [+E] IRESMAN_EDMA3CHAN_getProtocolRevision> Enter
    [t=0x00000008:fce02c8d] ti.sdo.fc.ires.edma3chan: [+X] IRESMAN_EDMA3CHAN_getProtocolRevision> Exit (version=(2.0.0))
    [t=0x00000008:fe917b52] ti.sdo.fc.rman: [+2] RMAN_assignResources> Call getHandle on the IRESMAN implementation 0x849ac8
    [t=0x00000009:004556b8] ti.sdo.fc.ires.edma3chan: [+E] IRESMAN_EDMA3CHAN_getHandles> Enter (protocolArgs=0x801b8c, scratchGroupId=1)
    [t=0x00000009:02026692] ti.sdo.fc.edma3: [+E] EDMA3_getResourceManager> Enter (alg=0x80f23a80, scratchGroupId =1)
    [t=0x00000009:03aeefca] ti.sdo.fc.edma3: [+2] EDMA3_getResourceManager> Look for matching entry for ALG 0x80f23a80
    [t=0x00000009:055a8544] ti.sdo.fc.edma3: [+2] EDMA3_getResourceManager> Match not found, create new entry, get resource handle
    [t=0x00000009:0724d11c] ti.sdo.fc.dskt2: [+E] DSKT2_allocPersistent> Enter (numRecs=1)
    [t=0x00000009:08e0be77] ti.sdo.fc.dskt2: [+E] _DSKT2_init> Enter
    [t=0x00000009:0a7387de] ti.sdo.fc.dskt2: [+X] _DSKT2_init> Exit
    [t=0x00000009:0c07458a] ti.sdo.fc.dskt2: [+E] _DSKT2_assignInstanceMemory> Enter (scratchId=-1, numRecs=1, extHeap=0x8498f8)
    [t=0x00000009:0dba2ad3] ti.sdo.fc.dskt2: [+E] _DSKT2_allocateInDesignatedSpace> Enter (index=0, ialgSpace=IALG_ESDATA, extHeap=0x8498f8)
    [t=0x00000009:0f74fcca] ti.sdo.fc.dskt2: [+X] _DSKT2_allocateInDesignatedSpace> Exit (returnVal=1)
    [t=0x00000009:1115c947] ti.sdo.fc.dskt2: [+2] _DSKT2_assignInstanceMemory> memTab[0] allocated in persistent memory in Memory space:IALG_ESDATA. Addr=0x849fe0
    [t=0x00000009:12de9da0] ti.sdo.fc.dskt2: [+X] _DSKT2_assignInstanceMemory> Exit (returnVal=1)
    [t=0x00000009:1479fcdb] ti.sdo.fc.dskt2: [+X] DSKT2_allocPersistent> Exit (status=1)
    [t=0x00000009:1610488b] ti.sdo.fc.edma3: [+2] EDMA3_getResourceManager> Opening new Resource Manager Handle
    [t=0x00000009:18996fed] ti.sdo.fc.edma3: [+E] openRMHandle> Enter (scratchId=1, sem=0x90000ad8)
    [t=0x00000009:1b318abd] ti.sdo.fc.edma3: [+2] openRMHandle> Populating Scratch Groups with Resources
    [t=0x00000009:1cd2a2e1] ti.sdo.fc.edma3: [+E] poulateScratchGroups> Enter (scratchGroupId=1)
    [t=0x00000009:1e6df7f1] ti.sdo.fc.edma3: [+2] populateScratchGroup> Allocating 8 tccs 8 edma 0 qdma channels and 42 params
    [t=0x00000009:201d676a] ti.sdo.fc.edma3: [+2] populateScratchGroup> Attempting symmetric allocation of EDMA channels and Tccs
    [t=0x00000009:21cfae5f] ti.sdo.fc.edma3: [+2] populateScratchGroup> Could not allocate a TCC

    [t=0x00000009:24c49c35] ti.sdo.fc.edma3: [+7] populateScratchGroup> Error allocating tcc 1010
    [t=0x00000009:265ff86e] ti.sdo.fc.edma3: [+X] populateScratchGroup> Exit (status=0)
    [t=0x00000009:27f69269] ti.sdo.fc.edma3: [+7] openRMHandle> Error populating RM scratch groups
    [t=0x00000009:299db4c5] ti.sdo.fc.edma3: [+E] closeRMHandle> Enter (handle=0x9000f01c)
    [t=0x00000009:2b3796da] ti.sdo.fc.edma3: [+X] closeRMHandle> Exit (status=0x0)
    [t=0x00000009:2cd2dbaa] ti.sdo.fc.edma3: [+2] openRMHandle> Deleting System semaphore
    [t=0x00000009:2e69263e] ti.sysbios.heaps.HeapMem: ERROR: line 337: assertion failure: A_invalidFree: Invalid free
    ti.sysbios.heaps.HeapMem: line 337: assertion failure: A_invalidFree: Invalid free
    xdc.runtime.Error.raise: terminating execution

    Do you know what could be the problem here?

    Thanks,

    Oleg

  • Hi Oleg,

    How many codec instances are you using for each core?

    Can you please share your current .cfg file, EDMA region configuration array, and the .c files which have the RMAN calls?

    Thanks,

    Hongmei

  • Hi Hongmei,

    In my test case I used one H.264 BP encoder and one H.264 MP decoder.
    My .cfg file is attached, as well as the EDMA configuration .c file.
    The .c files, which include all RMAN calls except RMAN_init() (that one is called during the initialization stage of my framework, after the BIOS_start() call), are also attached.

    4062.app.cfg

    5861.video_codec_TI_H264_BP_enc.c

    3146.video_codec_TI_H264_mp_dec.c

    0245.edma_config.c

    The two .c files that implement the H.264 decoder and encoder wrappers contain several functions that are called from the framework in the following order:

    Encoder: query->init->encode_new_frame->do_encoding->do_encoding...  ->encode_new_frame->do_encoding->...      ->free

    Decoder: query->init->decode->decode-> .... ->get_decoded_frame-> decode->decode ... ->get_decoded_frame->  ....   ->free

    Thanks,

    Oleg

  • Hi Oleg,

    Thanks for the files. From app.cfg and the .c files with the RMAN calls, it looks like the scratch groups/IDs are incorrectly defined and used. If I understand correctly, you are trying to use one scratch group for each of the eight cores, with the core ID used as the scratch ID. But this may not be the correct way.

    As in the MCSDK Video use case, each C6678 core owns 8 EDMA channels and 42 PaRAM sets, and there is no overlap between the cores, e.g., core 0 owns EDMA channels 0-7, core 1 owns EDMA channels 8-15, and so on. This is achieved by defining the corresponding EDMA3_PARAMS.regionConfig for each core before calling RMAN_init(). With this, the codec instances on a core will share the EDMA resources owned by that core; for example, all the codec instances on core 0 will share EDMA channels 0-7. The sharing among the codec instances is achieved via a scratch group with a scratch ID not equal to -1. Assuming no two codec instances will execute the codec process call (where the EDMA resources are used) at the same time, a single scratch group can be used, with a scratch group ID of 0:

    In .cfg:

    /* Use scratch group 0 and specify the maximum number of PaRAM sets,
    TCCs, EDMA channels, QDMA channels for the group */
    META.maxPaRams[0] = 42;
    META.maxTccs[0] = 8;
    META.maxEdmaChannels[0] = 8;
    META.maxQdmaChannels[0] = 0;

    RMAN calls in C file:

    scratchId = 0;

    ...

    ires_status = RMAN_assignResources((IALG_Handle)codecHandle, (IRES_Fxns*)resFxns, scratchId);

    ...

    ires_status = RMAN_freeResources((IALG_Handle)codecHandle, (IRES_Fxns*)resFxns, scratchId);

    For your use case, are the encoder and decoder instances running in parallel? Please provide us with the function where you call RMAN_init() if the problem is still not resolved after the above change to the scratch groups/IDs.
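
    For reference, here is a minimal sketch of how the per-core region configuration and the RMAN calls above might fit together in C. The array name surf_edma_C6678_config is taken from your edma_config.c; the element type, the header paths, and the use of DNUM to select the core's entry are assumptions to be checked against your FC/EDMA3 LLD versions.

    #include <xdc/std.h>
    #include <c6x.h>                               /* DNUM: this core's number */
    #include <ti/sdo/fc/rman/rman.h>
    #include <ti/sdo/fc/edma3/edma3_config.h>      /* EDMA3_PARAMS (assumed path) */

    /* Per-EDMA-instance, per-core region table defined in edma_config.c;
     * the element type used here is an assumption. */
    extern EDMA3_InstanceInitConfig surf_edma_C6678_config[][8];

    Void coreEdmaRmanInit(Void)
    {
        /* Each core points RMAN's EDMA3 resource manager at its own region of
         * EDMA instance #1, so cores never hand out overlapping resources. */
        EDMA3_PARAMS.regionConfig = &surf_edma_C6678_config[1][DNUM];

        RMAN_init();
    }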

    Thanks,

    Hongmei

  • Hi,

    Unfortunately, this change has not fixed my problem.

    In my case, the decoder and encoder do not run in parallel when they run on the same core, but when they run on different cores they can run in parallel.

    I am attaching the code that runs the encoder and decoder, the .cfg file, and main.

    In main.c the only task that is created is the main_surf() function, which is implemented in surf_main.c.

    Thanks,

    Oleg

    5810.video_codec_TI_H264_BP_enc.c

    2068.video_codec_TI_H264_mp_dec.c

    1033.app.cfg

    0361.main.c

    8130.surf_main.c

  • I forgot to mention that edma_config.c has not changed; it is attached to my previous post.

  • Hi Oleg,

    Thanks for the files. The code looks fine to me. 

    Does your application use any peripherals which generate EDMA events? If so, the corresponding EDMA channels cannot be used for the video codecs. For example, timer 8 is used in MCSDK Video, and therefore we do not use EDMA channels 22 and 23 for video codecs in sv04. Please find the details in siu\osal\bios6\siuFcBios6.c.
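
    If a peripheral event does tie up channels, one way to keep them away from the codecs is to drop them from the region's ownership masks before RMAN_init(), roughly as in the sketch below. The structure and field names follow the EDMA3 LLD resource-manager region configuration and are assumptions here; siuFcBios6.c in MCSDK Video remains the authoritative reference.

    #include <ti/sdo/fc/edma3/edma3_config.h>   /* region config type (assumed path) */

    /* Remove DMA channels 22 and 23 (tied to timer events in this example) and
     * the matching TCCs from this core's ownership masks, so RMAN never hands
     * them to a codec. Field names are assumptions. */
    static void excludeTimerChannels(EDMA3_InstanceInitConfig *cfg)
    {
        unsigned int mask = (1u << 22) | (1u << 23);   /* channels/TCCs 22 and 23 */

        cfg->ownDmaChannels[0] &= ~mask;   /* bits 0..31 live in word 0 */
        cfg->ownTccs[0]        &= ~mask;
    }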

    As long as the codec instances do not run in parallel on the same core, it is fine to use a single scratch group.

    What's the current failure after you changed the scratch group/ID? Is it still the same as before or something else?

    Thanks,

    Hongmei

  • Hi,

    I don't use timers directly, but I use the Timestamp module of SYS/BIOS, which uses one of the timers. But how are timers related to EDMA?

    Moreover, a timer issue does not explain why, in my scenario, the codecs work fine on a single core (any of the 8 cores). The problem exists only when RMAN_assignResources is called from different cores (even not simultaneously, but with a large delay between them).

    Previously this call simply crashed, but today I saw that the behavior changed: now it does not crash, but the codec gets stuck inside some ECPY wait function that waits for copy completion.

    Thanks,

    Oleg

  • Hi Oleg,

    I'm trying to piece together information from all the posts. Please help fill in the gaps for the failing scenario:

    - Creating and assigning resources to each codec from a different core causes one core to be stuck in an ECPY wait.

    ? What are the core numbers being used to create each codec?

    ? What scratch group is being used on each core?

    - You are using EDMA physical instance #1 for both cores.

    - Per your .cfg, you have 42 PaRAMs, 8 EDMA channels, and 8 TCCs assigned to each scratch group from 0 through 7. How many scratch groups are you actually using?

      EDMA3_PARAMS.regionConfig = &surf_edma_C6678_config[1][core_number_used]

     /* EDMA3 INSTANCE# 1 */
    { regionSample1, regionSample2, regionSample3, regionSample4,
    regionSample5, regionSample6, regionSample7, regionSample8
    },

    - Assuming you are using core numbers 0 and 1, you are using the first two entries above (regionSample1 and regionSample2) to get your resources.

    ? I see you are still calling algActivate, algDeactivate around your calls to RMAN_assignResources. Is that per Hongmei's instructions?

    - RMAN_assignResources happens without any errors.

    - You don't use any "C" code or configuration to change the queue/TC assignments, so the default hardware settings continue to be in effect.

    Can you please confirm the above and answer the questions? Also, could you share logs with FC trace enabled from both cores, so I can look at them together:

    http://processors.wiki.ti.com/index.php/Trace_in_Framework_Components#Framework_Components_3.20-Current

  • Hi Oleg,

    As shown in the C6678 data manual, timer interrupts trigger EDMA events, as listed in the data manual's EDMA event mapping table.

    Thanks,

    Hongmei

  • Hi,
    I'll answer all the questions below, but before that I want to share my latest findings on this problem; maybe it will save you some time.

    First of all, a general description of my final goal:
    - I want to run several transcoders on each core; all algorithms run one after another and there is no task switch in the middle of processing, so all algorithms can share the same EDMA resources. That's the reason I want to use scratch groups in RMAN. (Moreover, there are not enough EDMA resources to implement this scenario without sharing them between algorithms.)
    - On different cores the algorithms can run in parallel, so each core must assign different EDMA resources to its algorithms. This is what I configure in the edma_config.c file: each core gets its own separate resources, and no EDMA resources are shared between different cores.
    - I want to assign 42 PaRAMs, 8 EDMA channels and 8 TCCs to each core (a sketch of this partitioning follows).
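
    As a concrete illustration of that partitioning, here is a sketch of how one core's region could be populated. This is not my edma_config.c; the structure and field names (EDMA3_InstanceInitConfig, ownDmaChannels, ownTccs, ownPaRAMSets) are assumptions to be matched against the actual EDMA3 LLD/FC headers.

    #include <string.h>
    #include <ti/sdo/fc/edma3/edma3_config.h>   /* region config type (assumed path) */

    /* Give core 'core' exclusive ownership of DMA channels/TCCs 8*core..8*core+7
     * and of 42 PaRAM sets starting at 42*core, with no overlap between cores. */
    static void buildRegionForCore(EDMA3_InstanceInitConfig *cfg, unsigned int core)
    {
        unsigned int i;

        memset(cfg, 0, sizeof(*cfg));

        for (i = 8 * core; i < 8 * core + 8; i++) {
            cfg->ownDmaChannels[i / 32] |= 1u << (i % 32);
            cfg->ownTccs[i / 32]        |= 1u << (i % 32);
        }

        for (i = 42 * core; i < 42 * core + 42; i++) {
            cfg->ownPaRAMSets[i / 32]   |= 1u << (i % 32);
        }
    }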

    Here is the trace of my latest test scenario (one H.264 MP decoder on core 0 and one H.264 BP encoder on core 1; all cores use scratch group ID 0):

    4477.edma_traces.txt

    According to the traces, the algorithms on core 0 and core 1 allocate the same EDMA channels from RMAN! It seems that this is what causes both algorithms to get stuck in the ECPY_directWait function.

    First, the algorithm on core 0 (the H.264 MP decoder) tries to allocate EDMA resources:
    - EDMA3_getResourceManager is called, which calls openRMHandle, which calls populateScratchGroups
    - populateScratchGroups populates the new scratch group (number 0), which does not exist yet, and allocates for it EDMA channels 0-7 and TCCs 0-7, as configured in edma_config.c
    - The rest of the flow is as expected and the decoder is ready to run.
    After that, the algorithm on core 1 (the H.264 BP encoder) tries to allocate EDMA resources:
    - EDMA3_getResourceManager is called, which calls openRMHandle, which calls populateScratchGroups
    - populateScratchGroups checks the static EDMA3_Table _table[EDMA3_MAXGROUPS] data structure (which seems to be shared between the RMAN instances on all cores, since it is located in DDR memory), discovers that scratch group number 0 is already populated, and returns
    - As a result, the resources from this structure are also allocated to the core 1 algorithm.

    As I see it, there are two possible solutions:
    - force the EDMA LLD module to store all its data structures in internal memory, so that RMAN on each core works independently (if this is the right solution, I need assistance with doing it)
    - use a different scratch group for each core, which is not the correct way according to what Hongmei wrote. In any case, if I try using separate scratch groups, core 1 crashes with the following trace:

    0118.logs_for_ti_many_scratch_groups.txt

    I am attaching all the relevant latest files once more, so everything is in one place:
    4760.app.cfg

    6443.main.c

    8741.surf_main.c

    5775.edma_config.c

    7607.video_codec_TI_H264_BP_enc.c

    8154.video_codec_TI_H264_mp_dec.c

    Now, below are the answers to the previous post's questions:
    - Creating and assigning resources to each codec from a different core causes one core to be stuck in an ECPY wait.
    [Oleg] Both cores are stuck in the ECPY_directWait() call.

    ? What are the core numbers being used to create each codec?
    [Oleg] Core 0 - H.264 MP decoder; core 1 - H.264 BP encoder.

    ? What scratch group is being used on each core?
    [Oleg] Currently it is 0 for all the cores; in the past I tried to use a different scratch group ID for each core, and the result was a crash.

    - You are using EDMA physical instance # 1 for both cores
    [Oleg] Yes

    - Per your .cfg, you have 42 PaRAMs, 8 EDMA channels, and 8 TCCs assigned to each scratch group from 0 through 7. How many scratch groups are you actually using?

    EDMA3_PARAMS.regionConfig = &surf_edma_C6678_config[1][core_number_used]

    /* EDMA3 INSTANCE# 1 */
    { regionSample1, regionSample2, regionSample3, regionSample4,
    regionSample5, regionSample6, regionSample7, regionSample8
    },

    - Assuming you are using core numbers 0 and 1, you are using the first two entries above (regionSample1 and regionSample2) to get your resources.
    [Oleg] Correct

    ? I see you are still calling algActivate, algDeactivate around your calls to RMAN_assignResources. Is that per Hongmei's instructions?
    [Oleg] This is how it is implemented in TI's H.264 encoder sample application. Moreover, RMAN_assignResources always failed without the algActivate and algDeactivate calls; I got no instructions from Hongmei regarding this.

    - RMAN_assignResources happens without any errors.
    [Oleg] Currently yes (when using scratch group ID 0 for all the cores), but when I used a different scratch group ID for each core, this call crashed the DSP (that trace is attached earlier in this post).

    - You don't use any "C" code or configuration to change the queue/TC assignments so the default settings on the hardware continue to be in effect
    [Oleg] Correct

    Thanks a lot for your help.

    If you need any additional information, I'm now dedicated 100% to this task and it is my number 1 priority.
    Oleg

  • Hi Oleg,

    Can you please provide the map file for your application? Also, what version of FC are you using?

    Thanks,

    Hongmei

  • Hi Hongmei,

    I'll upload my map file tomorrow.

    I use FC version 3.22.03.09.

    But for now, can you share what you suspect? I can help and run some tests tomorrow.

  • Hi Oleg,

    I suspect there is a memory map issue, as you mentioned that "static EDMA3_Table _table[EDMA3_MAXGROUPS]" is located in DDR. In your .cfg file there is "Program.sectMap[".fardata"] = "IRAM";". What about the .far section?

    Below please find the placement of edma3_config.oe66 sections in MCSDK Video. Can you please check how the placement is done in your application?

    0081d2a0    000000ac     edma3.ae66 : edma3_config.oe66 (.far) [fill = 0]

    00828bd8    00000014     edma3.ae66 : edma3_config.oe66 (.neardata)

    0c249d80    00005e20     edma3.ae66 : edma3_config.oe66 (.text)

    0c32021c    0000151c     edma3.ae66 : edma3_config.oe66 (.const:.string)

    0c337b64    00000014     edma3.ae66 : edma3_config.oe66 (.switch:edma3OsProtectEntry)

    Thanks,

    Hongmei

  • I don't mean to talk over Hongmei, but the issue is as you suspect. EDMA3_Table _table[EDMA3_MAXGROUPS] is not supposed to be shared between the two cores. In fact, I don't think any memory should be shared between the two cores at all. Since we don't use any sort of synchronization mechanism between the two cores, they should be using mutually exclusive resources as well as mutually exclusive memory.

    Maybe Hongmei intends to confirm this by looking at your '.map' file.

    I expect you should partition your DDR memory among the different cores; then each core would have its own copy of "_table" and would access it independently of the others. Hongmei, do you have examples of how to do that? Or how do you handle the distribution of memory between the various cores?

    Thanks.

  • Hi,

    First of all, I attach my .map file:

    As you can see in the map file, you were right: the .far section is located in DDR memory.

    Now the question is how I fix this without moving the whole global .far section to internal memory (I can't do that because it is simply too big in my project).

    I tried to move only the relevant sections of this library to the appropriate memory with the following code in the linker command file:

    GROUP
    {
        edma30   { -ledma3.ae674 (.far) }
        edma31   { -ledma3.ae674 (.neardata) }
        edma32   { -ledma3.ae674 (.rodata) }
        edma33   { -ledma3.ae674 (.bss) }
    } > IRAM

    GROUP
    {
        edma1300 { -ledma3.ae674 (.text) }
        edma1301 { -ledma3.ae674 (.neardata) }
        edma1302 { -ledma3.ae674 (.const:.string) }
        edma1134 { -ledma3.ae674 (.switch:edma3OsProtectEntry) }
    } > MSMCSRAM

    But then I got plenty of errors like the following:

    "dskt2dact.c", line 157: warning #17003-D: relocation from function
    "DSKT2_deactivateAll" to symbol "_DSKT2_lastActiveAlg" overflowed; the
    30-bit relocated address 0x2462e421 is too large to encode in the 15-bit
    unsigned field (type = 'R_C6000_SBR_U15_W' (13), file =
    "/opt/ti/framework_components_3_22_03_09/packages/ti/sdo/fc/dskt2/lib/releas
    e/dskt2.ae674<dskt2dact.oe674>", offset = 0x000002b8, section = ".text")

    I also tried to move these sections for all of the following libraries, but then I still got tons of errors of this type.

    app_pe66.oe66
    ipc.lib
    edma3.ae674
    edma3Chan.ae674
    rman.ae674
    nullres.ae674
    dskt2.ae674
    osal_support.ae674
    ecpy_cacheMode.ae674
    rmm.ae674
    smgr.ae674
    rmmp.ae674
    memutils.ae674
    fcsettings.ae674
    edma3_lld_rm.ae66

    Example of error:

    "edma3_config.c", line 1655: warning #17003-D: relocation from function
    "EDMA3_getResourceManager" to symbol "_systemResourceManagerHandle"
    overflowed; the 17-bit relocated address 0x17b00 is too large to encode in
    the 15-bit unsigned field (type = 'R_C6000_SBR_U15_B' (11), file =
    "/opt/ti/framework_components_3_22_03_09/packages/ti/sdo/fc/edma3/lib/releas
    e/edma3.ae674<edma3_config.oe674>", offset = 0x00004448, section = ".text")

    What is your recommendation?

    Thanks,

    Oleg

  • I forgot to attach the map file in the previous post: 7127.oleg_test_map.txt

  • Hi Oleg,

    Thanks for the map file. For multi-core applications, it is incorrect to place .far in DDR that is shared by multiple cores. Generally, the following sections need to be placed either in local L2 or in DDR dedicated to individual cores (a sectMap sketch follows the list):

    .neardata
    .rodata
    .bss
    .fardata
    .cio
    .far
    .stack
    .args
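
    A sketch of how that placement could look in the .cfg, extending the existing Program.sectMap entry; the segment name "IRAM" is taken from your current .cfg and may differ on other platforms:

    /* Per-core data sections kept out of shared DDR (sketch; segment name assumed) */
    Program.sectMap[".neardata"] = "IRAM";
    Program.sectMap[".rodata"]   = "IRAM";
    Program.sectMap[".bss"]      = "IRAM";
    Program.sectMap[".fardata"]  = "IRAM";
    Program.sectMap[".cio"]      = "IRAM";
    Program.sectMap[".far"]      = "IRAM";
    Program.sectMap[".stack"]    = "IRAM";
    Program.sectMap[".args"]     = "IRAM";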

    From your map file, it looks like the .far section is huge:

    80000000 80000000 1001de1c 1001de1c rw- .far

    All globals will be placed in .far if they are not under a "#pragma DATA_SECTION". Please examine the globals defined in your application, add pragmas to those which can be shared by multiple cores, and place them in DDR or MSMC. Also, please check whether malloc is used. If so, please replace it with static memory allocation in DDR/MSMC where applicable.
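
    For example, a genuinely shared, read-mostly table could be moved out of .far along these lines (the section and symbol names here are only illustrative):

    /* Place the shared table in a named section that the .cfg/linker then maps
     * to DDR or MSMC, instead of letting it land in the per-core .far section. */
    #pragma DATA_SECTION(sharedLut, ".sharedTables")
    #pragma DATA_ALIGN(sharedLut, 128)
    const unsigned int sharedLut[4096] = { 0 };

    /* and in the .cfg: Program.sectMap[".sharedTables"] = "MSMCSRAM"; (segment name assumed) */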

    There are also globals defined in FC. The size of the .far section contributed by FC depends on the number of codec instances on individual cores. Please set "RMAN.maxAlgs" in your .cfg file to the number of codec instances. In your current .cfg there is "RMAN.maxAlgs = 32;". Is this over-allocated? What is the maximum number of codec instances in your application?

    After the above changes, I expect the .far section to be largely reduced.

    Then, we can check whether the current LL2 sections can be reduced, such as main_task_stack, .INTMEM_HEAP, and .stack. The actual peak usage can be found from CCS ROV (RTSC Object Viewer). Determine the size of these LL2 sections according to the peak usage, with some extra room, e.g., allocated_size = peak_usage / 0.9.

    With the above changes, we can check whether LL2 can accommodate the sections listed above. If not, we can use MPAX to map different physical DDR regions to the same logical address so that they can be used to place the .far sections. We can discuss the MPAX mapping further if your application needs it. A sample implementation and usage of MPAX is included in MCSDK Video.

    Thanks,

    Hongmei

  • Hi,

    Thanks for the tips, they really helped!

    I forced the linker to put the .rodata, .far, .bss and .neardata sections in each core's internal memory, and the issue was solved!

    Thank you very much for your help, guys. Now it seems that everything works as expected.

    Oleg