
OMAPL-138: Missing EDMA HWI

Other Parts Discussed in Thread: OMAP-L138, DA8XX, OMAPL138

Hi All,

Note: Using BIOS 5.33.05 / DSPLink 1.65.00.03 on the OMAP-L138 with Linux on the ARM Core.

We have designed and developed a fairly large system on the OMAP-L138 C674x core using the aforementioned BIOS/DSPLink. Our simplified overall system architecture is as follows: ADC -> McASP -> EDMA0_TC0 -> DSP. The EDMA0_TC0 HWI has a period of 10 msec. Within the DSP we have a series of SWIs and TSKs for computational purposes (using the TI FIR/biquad/FFT/math functions), and the data is periodically moved to the ARM core using DSPLink. Our program/data is kept in DDR/L3, with caching enabled on DDR. The L2 cache is configured as 128 kB.

We are having an issue where we randomly miss several EDMA0_TC0 HWIs (generally back to back). This happens anywhere from once a day to once every several days. Here are some of the things we have attempted to alleviate the problem:

Compiler Tuning:

  1. In our DSP project, the optimization level is off, SPLOOP is disabled, and interrupt_threshold=1 is set.
  2. DSPLink Recompiled with following flags: -g -d"_DEBUG" --no_compress -q -pdr -pdv -pden -ml3 -mv6740 --disable:sploop --interrupt_threshold=1
  3. RTS6470 Recompiled with the following flags: --extra_options="--interrupt_threshold=1 --opt_level=0"
  4. IQMath and DSPLib: instead of using the precompiled versions, we requested the source from TI and integrated it directly into the project, so we have control over the compiler options.
  5. Note: We are aware that with the optimization level off the compiler does not generate SPLOOPs or disable interrupts. Those options are included so we can hopefully enable -O2 once our problem is resolved.

System tuning:

  1. Restructured our HWIs such that the EDMA0_TC0 HWI is the highest priority.
  2. Set interruptMask = "all" for the EDMA0_TC0 HWI
  3. Moved DSPLink from HWI 4 and 5 to 14 and 15.
  4. Do very minimal work in our HWIs before posting the corresponding SWI.
  5. Set SYSCFG MSTPRIX Registers with the following values:
    1. DA8XX_MSTPRI0_REG = 0x44441144;
    2. DA8XX_MSTPRI1_REG = 0x44420033;
    3. DA8XX_MSTPRI2_REG = 0x54604404;
    4. EDMA0_TC0, EDMA0_TC1, DSP, DSP, EDMA1_TC2, etc.
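For reference, each MSTPRIx value above packs eight 4-bit per-master priority fields (lower values are higher priority). A minimal sketch of how such a value can be composed (the helper name is hypothetical, and the register address in the comment is an assumption to be verified against the SYSCFG chapter of the OMAP-L138 TRM):

```c
#include <stdint.h>

/* Each SYSCFG MSTPRIx register packs eight 4-bit master-priority
 * fields; slot 0 occupies bits 3:0, slot 1 bits 7:4, and so on.
 * Helper name is hypothetical. */
static uint32_t mstpri_set(uint32_t reg, unsigned slot, uint32_t prio)
{
    unsigned shift = slot * 4u;
    return (reg & ~(0xFu << shift)) | ((prio & 0x7u) << shift);
}

/* On target, the composed value would be written to the memory-mapped
 * register (address is an assumption -- check the TRM), e.g.:
 *   *(volatile uint32_t *) 0x01C14110u = v;   // MSTPRI0
 */
```

Composing DA8XX_MSTPRI0_REG = 0x44441144 this way makes the per-master field assignment explicit instead of a magic constant.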

 

If anyone has any ideas what in the system might be disabling the HWI we would appreciate the help.

(Is there any way to cross-post this to the compiler forum as well?)

 

Thanks,

Arya B.

  • If anyone could address our issue we would highly appreciate it.

    Thanks,

    Arya B.

  • If no one here has any ideas, can a moderator move this thread to another forum (or should I repost it there as well)? For example, the Compiler forum or the DSP Processor forum, to see if anyone there can help us.

  • Hi Arya,

    This sounds like a general system issue, where your EDMA interrupts are prevented from being serviced in time. I will try to help you from the DSPLINK perspective, given that is where my expertise is. 

    DSPLINK disables interrupts at various places (outside of initialization when it registers for interrupts): when it notifies the ARM core, e.g. in NOTIFY_notify(); when it calls MPCS_enter in an HWI context; and when it does a MSGQ_locate.  Have you tried disabling the part of your code where you use DSPLINK to send data to the ARM and see if the issue goes away? What are the DSPLINK modules that you use - do you use RINGIO, MSGQ, NOTIFY, MPLIST or MPCS? Maybe you want to simplify your code and comment out the DSPLINK function calls to find out if using DSPLINK is at the root of your issue.

    Best regards,

    Vincent

  • Hi Vincent,

    Thank you very much for the reply; it is definitely a general system issue. We are using PROC to load the DSP image from the ARM, along with three separate MSGQs allocated in the same pool. Each MSGQ is associated with a corresponding task which has a certain priority in our system. Please refer to my older post (http://e2e.ti.com/support/embedded/tirtos/f/355/p/221296/780055.aspx#780055) for a bit more detail and a diagram (ignore the stated issue in that post, as we fixed it). Moreover, these three tasks are the lowest-priority tasks in the DSP, each with a period of 500 msec (TSK_sleep(500)). Unfortunately, our system is heavily centered around DSPLink for data transport and initialization. However, I may be able to rework the DSP to disable our three DSPLink tasks during runtime operation and use the ARM to check the memory region for the missing HWI. Lastly, I only call MSGQ_locate() three times on startup, once for each MSGQ initialization.

     

    The functions we use during runtime are MSGQ_get(), MSGQ_alloc() and MSGQ_put(). There are also some LOG_printf() calls for logging errors.

    Please let me know if you have any more questions.

    Thanks,

    Arya B.

  • Hi Arya,

    While MSGQ_alloc and local MSGQ_put calls that send messages between BIOS TSKs might introduce small periods in which interrupts are disabled, I don't expect those to be problematic. In the case where MSGQ_put is called to send a message to the ARM, my main concern was the code in IPS_notify() from <DSPLINK_INSTALL_DIR>/dsp/src/base/ips/DspBios/ips.c, which seems to wait for the ARM side to clear a flag with interrupts disabled. However, this only happens when it is called with waitEvent=TRUE, and the MSGQ transport code in ZCPYMQT_send() always passes in waitEvent=FALSE.

    How frequently does the DSP get messages from the ARM? Did it fully eliminate the problem when you moved the DSPLINK interrupts to a lower priority relative to the EDMA's? If it is easy enough to rework your app to avoid interaction with the ARM you may still want to do it to simplify your test case. However, if you are simply using the MSGQ APIs you have mentioned in the execute phase then I think this is highly unlikely to be an issue where DSPLINK disabled interrupts for too long.

    Best regards,

    Vincent

  • Hi Vincent,

    Thank you very much for the reply. Interesting; if DSPLink is disabling interrupts, could the following happen:

     

    BIOS: Post LO priority dsplink task ( TSK_Sleep(500) is over )

    LO priority task: do msgq packing..disable hwi

    BIOS: Post MED priority dsplink task ( TSK_Sleep(500) is over )

    MED priority task: do msgq packing..disable hwi..enable hwi..

    BIOS: Post HI priority dsplink task ( TSK_Sleep(500) is over )

    HI priority task: do msgq packing..disable hwi..enable hwi ?..done.

    MED priority task: ..Done.

    LO priority task: ..enabled hwi..done.

    BG Task:

    So under this scenario, would our HWIs remain disabled from the moment the LO priority DSPLink task disabled them until the end of the highest priority task? How likely do you think that is?

    Thank you very much,

    Arya B.

  • Hi Arya,

    I don't see how there would be a context switch between the low priority task to the medium priority task (steps 2 and 3) while interrupts are disabled, unless the low priority task voluntarily relinquishes the processor, which I don't think is happening here. The code in IPS_notify() is not doing any yield.

    Best regards,

    Vincent

  • Hi Vincent,

    Thank you for the quick reply. In our system the HI/MED/LO DSPLink tasks are asynchronous to any ISR and SWI; they each wake up according to their own 500 msec sleeps. Therefore, according to this section from the DSP/BIOS user guide:

    "As a rule, no ready task has a priority level greater than that of the currently running task, since TSK preempts the running task in favor of the higher-priority ready task."

    Would that imply that the aforementioned scenario is possible? Or does HWI_disable() disable TSK/BIOS preemption as well?

    Thank you very much,

    Arya B.

  • Hi Arya,

    HWI_disable() disables interrupts, hence no ISRs would be able to run. This effectively disables TSK and SWI pre-emption as well, as no asynchronous thread (including the timer ISR) would get a chance to run and invoke the scheduler.

    This is why you don't want to disable interrupts for a long time, as the system is essentially shut off from external events.

    Best regards,

    Vincent

  • Hi Vincent,

    Thanks, that makes sense. I am assuming the scheduler is invoked when we call PRD_tick() from an ISR? We have the timer ISR disabled (we run PRD_tick from the same ISR as the EDMA). That would then mean the higher priority task would never go into a "ready" state unless ISRs are enabled.

    Okay, so it seems unlikely that anything in the DSPLink subsystem would introduce enough latency in our system to miss the 10 msec periodic EDMA HWI. How do the following compiler options look? Is there anything you think we should add or remove?

    1. DSPLink Recompiled with following flags: -g -d"_DEBUG" --no_compress -q -pdr -pdv -pden -ml3 -mv6740 --disable:sploop --interrupt_threshold=1

    Lastly, if you know of any other TI employees with specialties in BIOS, compilers, or the overall OMAP-L138 architecture, can you please refer them to this thread? We would appreciate any help from TI, as this is a complex system issue.

    Thank you very much,

    Arya B.

  • Hi Arya,

    When interrupts are disabled, the EDMA ISR would not run either, so it will not get a chance to call PRD_tick(), and no higher priority threads would be made "ready".

    Your DSPLINK compiler options look fine to me. They are options meant for debug, so no optimizations are done. However, I wouldn't expect turning on optimization in DSPLINK would solve your issue in this case. '--interrupt_threshold=1' is a good one to have in there to keep interrupt latency to a minimum according to this thread: http://e2e.ti.com/support/development_tools/compiler/f/343/t/49707.aspx

    One thought is maybe you can instrument your EDMA ISR code to read the free running counter on the DSP and ensure that the time delta stays more or less the same between the current counter read and the previous counter read. If a long time elapses between interrupts (say twice the expected period) it can halt the CPU with a software breakpoint. Then it would give you a chance to take a look at the BIOS execution graph which may give you a hint on what could have disabled interrupts. I am assuming here that the EDMA ISR occurs periodically, since you are using it to call PRD_Tick().
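    A minimal sketch of this instrumentation (names and the tick math are assumptions -- 10 ms at an assumed 150 MHz counter clock; unsigned subtraction keeps the delta correct across 32-bit counter wraparound):

```c
#include <stdint.h>

/* Hypothetical sketch of the ISR instrumentation described above:
 * compare successive reads of a free-running counter and flag when the
 * delta exceeds twice the expected period. Unsigned subtraction makes
 * the comparison wrap-safe for a 32-bit counter. */
#define EXPECTED_TICKS 1500000u     /* 10 ms at 150 MHz (assumed clock) */

static uint32_t prev_count;

/* Call once per EDMA interrupt with the current counter value.
 * Returns nonzero when the period looks like a missed interrupt;
 * that is where a software breakpoint could be planted. */
int period_exceeded(uint32_t now)
{
    uint32_t delta = now - prev_count;   /* wrap-safe for uint32_t */
    prev_count = now;
    return delta > 2u * EXPECTED_TICKS;
}
```

Keeping the check to one subtraction and one compare keeps the added ISR overhead negligible.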

    I will ask someone from the BIOS team to peek at this and see if he has other thoughts. If not we can move your thread to the device forum. Maybe it'd help the thread gain more visibility from folks working on other software components.

    Best regards,

    -Vincent

  • Hi Vincent,

    Thank you for all the help; we will keep the DSPLink compiler flags in their present configuration. In regards to the EDMA ISR, that is exactly what we have implemented to catch the missing ISR: if the free-running counter delta is greater than our allowable period of 10 msec, we flag it and report it to the ARM. I have observed latencies of 12 msec or so (which equates to a free-running counter delta of 22 msec); that seems to signify one missed ISR, with the second ISR arriving 2 msec late. Unfortunately, we have disabled all BIOS logging mechanisms (such as RTDX) to reduce overhead. However, we have been wanting to log the IRP register along with the period; since IRP is a core register and not memory mapped, there is limited access to it from C.

    Please let me know if they need any further information.

    Thank you very much, 

    Arya B.

  • Hi Arya,

    Not sure if you have already tried this, but as the second post in this thread indicates, you can access the IRP from C using the cregister keyword: http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/p/41855/146934.aspx
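    A sketch of what such logging might look like (the cregister declaration follows the linked thread; `_TMS320C6X` is the TI compiler's predefined macro, and the host stub exists only so this sketch can be exercised off-target):

```c
#include <stdint.h>

/* On the C674x with the TI compiler, IRP can be read from C via the
 * cregister keyword, per the linked thread. For a host build the read
 * is stubbed out instead. */
#ifdef _TMS320C6X
extern cregister volatile unsigned int IRP;
#define READ_IRP() ((uint32_t) IRP)
#else
static uint32_t fake_irp = 0xC7001234u;   /* stand-in value for testing */
#define READ_IRP() (fake_irp)
#endif

#define IRP_LOG_DEPTH 16u
static uint32_t irp_log[IRP_LOG_DEPTH];
static uint32_t irp_log_idx;

/* Call when the ISR detects an excessive period: record where the
 * interrupted code was executing, for post-mortem symbol lookup. */
void irp_log_capture(void)
{
    irp_log[irp_log_idx % IRP_LOG_DEPTH] = READ_IRP();
    irp_log_idx++;
}
```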

    Maybe this would help you in your logging.

    Best regards,

    Vincent

  • Hi Vincent,

     Thank you for the response; that is what we are now using for logging. The system is running with some basic logging, and we are waiting for it to miss the ISR; hopefully the IRP will give us some hints as to where in the code ISRs are being disabled. Meanwhile, please let us know if there is anything else you or your colleagues think we should look into.

     

    Thank you very much, 

    Arya B

  • Hi Vincent,

    Here is an update. As I mentioned previously, in our system we use EDMA0_CC0 to ping-pong buffer data coming in from the McASP. Moreover, we have several SWIs, two of which are posted from the EDMA0_CC0 HWI: one SWI runs on the ping buffer, the other on the pong. Both SWIs are configured with the same priority, since in a healthy and stable system they would never run at the same time.

    Now, we placed the PRD detection/logging into the EDMA0_CC0 HWI and the two corresponding SWIs. Interestingly, we have not missed any EDMA0_CC0 HWIs. However, we missed both of the SWIs once. Looking at the IRP for the first SWI, it was pointing to _BCACHE_inv; the second SWI's IRP was pointing to a location within the first SWI.

    The aforementioned data raises several questions:

    1. Does the BIOS function _BCACHE_inv disable HWIs or SWIs, or delay the real-time process significantly?
    2. Given that both SWIs have the same priority in BIOS, how is it that the second SWI's IRP was pointing to a region within the first SWI?

    Please let me know if any clarification or detail is needed.

    Thank you very much,

    Arya B.

  • Hi Arya,

    Arya Ba said:
    • Does the BIOS function _BCACHE_inv disable HWIs or SWIs, or delay the real-time process significantly?

    Yes, the BCACHE_inv API first waits for other cache operations to complete, and then it disables hardware interrupts to enter a critical section.

    Arya Ba said:
    Given that both SWIs have the same priority in BIOS, how is it that the second SWI's IRP was pointing to a region within the first SWI?

    It sounds like there could be some cache corruption happening, but that's just my initial guess.

    Are you able to run your application with cache disabled entirely?  Just for experimental purposes, to see if that eliminates the problem?

    Steve

  • Hi Steve,

    Thank you very much for your response. Interesting; so it seems BCACHE_inv is a dangerous function to call in a HWI or high-priority SWI, as it may lead to starvation and delay of the real-time process. Is there anywhere these details are documented? We referred to the DSP/BIOS API guide; however, there was no mention of calling-context constraints.

    At the moment we have removed all calls to BCACHE_inv in our system and have disabled caching on those regions. Unfortunately, due to the size and complexity of our system, we cannot meet our hard deadlines if we disable caching completely. Cache corruption sounds disturbing; what are some common causes? How can we debug it?

    Thank you very much,

    Arya B.

  • Arya Ba said:
    Thank you very much for your response, interesting so it seems BCACHE_inv is a very dangerous function to call in a HWI or high priority SWI as it may lead to starvation and delay in the real time process

    I'm not sure I would call the BCACHE APIs dangerous in HWI context (it is OK to call them from HWI context). But, I suppose this disabling of interrupts should be taken into consideration when using the APIs. In general, one needs to be wary of doing too much in a HWI. As little as possible should be done in HWI context, and any API that may run a long time should be pushed out to SWI or TSK context (if possible).

    Arya Ba said:
    Is there anywhere that these details are documented ? We referred to the DSP/BIOS API, however, there was no mention of calling context constraints

    You may have missed it.  There should be a chapter in the API Guide (spru403s.pdf) called "Function Callability and Error Tables".  This table shows you the acceptable context in which each API can be called.

    Arya Ba said:
    At the moment we removed all calls to BCACHE_inv in our system and have disabled caching on those regions. Unfortunately, due to the size and complexity of our system we cannot meet our hard deadlines if we disable caching completely. Cache corruption sounds disturbing, what are some common causes ? How can we debug for it ?

    I saw a post that you made a while back. I'm wondering, how much processing are you doing in your HWI threads?  Is it possible to move out any of that code/processing into a corresponding SWI thread?

    For cache, in general, you need to make sure all of your alignments are correct.  When you call invalidate, you must make sure the size you are invalidating is a multiple of the alignment size.  Otherwise, you could invalidate past a cache boundary (invalidating data that is indeed valid).  Or write back data that's incorrect.
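    To make the alignment rule concrete, here is a small sketch (names hypothetical; 128 bytes is the C64x+ L2 line size) that rounds a span outward to whole cache lines. Any bytes the rounded span covers beyond your buffer belong to neighbours sharing a line, and those are exactly the bytes an invalidate can corrupt:

```c
#include <stdint.h>

#define L2_LINE_SIZE 128u   /* C64x+ L2 cache line size */

/* Round an arbitrary (addr, size) span outward to whole cache lines,
 * which is what a cache invalidate/writeback actually operates on.
 * If the rounded span is larger than the buffer itself, neighbouring
 * data shares a line with it and is affected by the operation too. */
static void cache_span(uint32_t addr, uint32_t size,
                       uint32_t *out_addr, uint32_t *out_size)
{
    uint32_t start = addr & ~(L2_LINE_SIZE - 1u);
    uint32_t end   = (addr + size + L2_LINE_SIZE - 1u) & ~(L2_LINE_SIZE - 1u);

    *out_addr = start;
    *out_size = end - start;
}
```

A buffer is safe to invalidate only when cache_span() returns its own address and size unchanged, i.e. it is 128-byte aligned and a multiple of 128 bytes long.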

    I found a guide here that may be useful: TMS320C64x+ DSP Cache User's Guide

  • Hi Steve,

    Thank you very much for the reply. In our system we have a total of 5 HWIs: two eCAPs, two EDMAs, and one PRU. Moreover, the total computation in each HWI is kept to a minimum; we update/read registers and then post the corresponding SWI/TSK associated with the HWI. The most we do in a HWI is for the EDMA, where we loop over the IPR and post SWIs/TSKs accordingly. We were making the BCACHE_inv call in the EDMA1_TC2 HWI; however, given the latency involved, we have removed it and are no longer caching that region.

    Now, having removed the BCACHE_inv from our DSP source, we ran a new build across several systems over the weekend to catch any HWI/SWI latencies. Note, this build contains the PRD detection/logging in the EDMA0_CC0 HWI and the two corresponding SWIs. Once more, we did not miss any EDMA0_CC0 HWIs, but we missed the corresponding SWIs. I have included our test data: two systems exhibited a large latency once over the 3-day period, while the other system exhibited a similar problem twice in the same time period. Furthermore, I have included the IRP function name indicating the stall point.

    Runtime PRD for SWI: 2500000

    PRD values are in terms of the 150 MHz free-running counter.

    System A:

    • SWI_IRP: BCACHE_wait (bcache_wait.o674 (.bios))
    • SWI_PRD: 3821119

    System B:

    • SWI_IRP: MPCS_leave
    • SWI_PRD: 3825743

    System C:

    • SWI_IRP: KNL_glue
    • SWI_PRD: 3171189
    • SWI_IRP: MPCS_leave
    • SWI_PRD: 3805477

     As you can see, removing the BCACHE_inv from our source was helpful. However, it now seems some components of DSP/BIOS and DSPLink are causing latencies of up to 8.8 msec.

    1. Why would any of the aforementioned functions cause such huge latencies ? 8.8 milliseconds is a very long time in DSP time.
    2. In order to gain some insight, which unit in BIOS or DSPLink is calling those functions ? and why ?

    Lastly, in regards to the cache corruption: since we do not manually invalidate/writeback any regions, is there anything else we can look into? The only unit in our system that invalidates/writes back regions is within the DSPLink library, and we have ensured that our MSGQ packets are aligned to a 128-byte boundary (using a #pragma).

    Thank you very much,

    Arya B.

     

  • Hi Arya,

    MPCS_enter/MPCS_leave are called at various locations in DSPLINK. They ensure mutual exclusion and translate to different functions depending on the calling context. If called in a HWI, they correspond to HWI_disable/restore; otherwise they correspond to SWI_disable/enable -- unless you are in a TSK and DSPLINK was configured to use TSK mode (--DspTskMode=1) for MSGQ. See dsp/src/mpcs/mpcs.c for implementation details. So to minimize latency, you want to be calling these functions in a TSK, with DspTskMode=1 configured.

    Here are the relevant places in my opinion where MPCS_enter/leave are called:

    1. In MSGQ transport (dsp/src/msg/DspBios/zcpy_mqt.c): Both ZCPYMQT_send and ZCPYMQT_msgCtrl use MPCS. The first is called in the context of MSGQ_put, and the second is called in the callback when receiving a message. The callback is either a TSK if you have configured DSPLINK to use TSK mode for MSGQ, or a SWI otherwise. Questions: Where do you call MSGQ_put (in a HWI, SWI or TSK)? In which mode did you configure MSGQ? I think you should try to defer the MSGQ_put calls to TSKs and use TSK mode if possible.

    2. When allocating or freeing buffers from a POOL, SMAPOOL_alloc/free (from dsp/src/pools/DspBios/sma_pool.c) are called. These functions do use MPCS. Question: Are you calling MSGQ_alloc/free in HWIs or SWIs? Can they be deferred to a TSK?

    Many of these critical regions protected by MPCS involve cache invalidation and writeback. In most cases the memory being invalidated/written back consists of control data structures and should be small, but I see some calls on the msg buffers, and depending on their size they may be an issue in light of what you and Steve found. To minimize latency, you should try to perform these in TSKs whenever possible.

    Hopefully by following some of these suggestions you can reduce your latency.

    Best regards,

    Vincent

  • Hi Vincent,

    Thank you very much for the reply, to answer some of your questions, we make all DSPLink related calls from their corresponding tasks. Moreover, they are the lowest priority tasks in the system since they have a soft deadline. Therefore, all MSGQ_put or MSGQ_alloc calls are invoked from a low priority task in our system.

    I have included our DSPLink configuration; it seems we had DSPLink configured in DSP_SWI_MODE, so we made the change to DSP_TSK_MODE. I am wondering how the TSK or SWI priority is determined? I know for the DSPLink HWIs it is defined in the platform cfg file. If you see anything alarming in the following configuration, please let me know.

    Present Configuration:

    -dTSK_MODE -DDDSP_DEBUG -dMAX_DSPS=1 -dMAX_PROCESSORS=2 -dID_GPP=1 -dOMAPL138 -dPROC_COMPONENT -dPOOL_COMPONENT -dNOTIFY_COMPONENT -dMPCS_COMPONENT -dRINGIO_COMPONENT -dMPLIST_COMPONENT -dMSGQ_COMPONENT -dMSGQ_ZCPY_LINK -dCHNL_COMPONENT -dCHNL_ZCPY_LINK -dZCPY_LINK -dPROCID=0 -dDA8XXGEM -dDA8XXGEM_INTERFACE=SHMEM_INTERFACE -dPHYINTERFACE=SHMEM_INTERFACE -dDSP_SWI_MODE

    perl dsplinkcfg.pl --platform=OMAPL138 --nodsp=1 --dspcfg_0=OMAPL138GEMSHMEM --dspos_0=DSPBIOS5XX --gppos=ARM --comps=ponslrmc

    New Configuration:

    -dTSK_MODE -DDDSP_DEBUG -dMAX_DSPS=1 -dMAX_PROCESSORS=2 -dID_GPP=1 -dOMAPL138 -dPROC_COMPONENT -dPOOL_COMPONENT -dNOTIFY_COMPONENT -dMPCS_COMPONENT -dRINGIO_COMPONENT -dMPLIST_COMPONENT -dMSGQ_COMPONENT -dMSGQ_ZCPY_LINK -dCHNL_COMPONENT -dCHNL_ZCPY_LINK -dZCPY_LINK -dPROCID=0 -dDA8XXGEM -dDA8XXGEM_INTERFACE=SHMEM_INTERFACE -dPHYINTERFACE=SHMEM_INTERFACE -dDSP_TSK_MODE

    perl dsplinkcfg.pl --platform=OMAPL138 --nodsp=1 --dspcfg_0=OMAPL138GEMSHMEM --dspos_0=DSPBIOS5XX --gppos=ARM --comps=ponslrmc --DspTskMode=1

    There are two related compile options: TSK_MODE and DSP_TSK_MODE. We were using TSK_MODE for one and DSP_SWI_MODE for the other.

    Also, any ideas about what KNL_glue() is? It seems to be part of BIOS?

    On a separate note, it seems the DSPLink related latencies are caused by caching. We can move the MSGQ POOL into a region in the DDR with caching disabled. Can we setup DSPLink to bypass caching the payload if the POOL memory is a separate DDR region?

    Thank you very much,

    Arya B.

  • Hi Arya,

    Not sure if I understand fully what you meant by 'priority'. You can only use DSP_TSK_MODE if all your DSPLINK API calls are performed in TSK context: See http://processors.wiki.ti.com/index.php/DSPLink_FAQs#How_should_I_choose_between_TSK_mode_or_SWI_mode_in_the_--DspTskMode_option_when_I_am_trying_to_configure_DSPLink.3F

    If you are referring to the priority of the TSK in the ZCPYMQT message transport, you can set it as the first argument in the platform CFG file:

    STATIC LINKCFG_Mqt LINKCFG_mqtObjects [] =
    {
        {
            "ZCPYMQT",               /* NAME       : Name of the Message Queue Transport */
            (Uint32) SHAREDENTRYID1, /* MEMENTRY   : Memory entry ID (-1 if not needed) */
            (Uint32) -1,             /* MAXMSGSIZE : Maximum message size supported (-1 if no limit) */
            1,                       /* IPSID      : ID of the IPS used */
            0,                       /* IPSEVENTNO : IPS Event number associated with MQT */
            0x0,                     /* ARGUMENT1  : First MQT-specific argument */
            0x0                      /* ARGUMENT2  : Second MQT-specific argument */
        }
    } ;

    By default, the priority is 15.

    The SWI priority is not configurable, and I believe it is fixed to the default value of 1. If you really need to change it, you can set it in the SWI_create() call in ZCPYMQT_open() from the file dsp/src/msg/DspBios/zcpy_mqt.c, and rebuild DSPLINK.

    As for caching, you can disable it by setting the MAR bits accordingly to leave your MSGQ POOL area uncached. However, short of modifying the DSPLINK code itself, there is no configuration parameter to remove the BCACHE calls, so you may still incur the inherent delays. 

    Best regards,

    Vincent

     

     

  • Hi Vincent,

    Thank you very much for the reply. Since we were only missing SWIs and not HWIs, it seems that not setting DSP_TSK_MODE was definitely a major cause of latency. We have several systems running with DSP_TSK_MODE enabled; hopefully the results will reflect reduced latencies.

    As for the priority, you answered my question perfectly. Since DSPLink has the lowest priority in our system, the default priority value of 15 is too high. Here are our before and after LINKCFG structs; we changed ARGUMENT1 to 0xA, which is our ideal priority for the DSPLink TSK.

    Before:

    STATIC LINKCFG_Mqt  LINKCFG_mqtObjects [] =
    {
        {
            "ZCPYMQT",                /* NAME           : Name of the Message Queue Transport */
            (Uint32) SHAREDENTRYID1,  /* MEMENTRY       : Memory entry ID (-1 if not needed) */
            (Uint32) -1,              /* MAXMSGSIZE     : Maximum message size supported (-1 if no limit) */
            1,                        /* IPSID          : ID of the IPS used */
            0,                        /* IPSEVENTNO     : IPS Event number associated with MQT */
            0x0,                      /* ARGUMENT1      : First MQT-specific argument */
            0x0                       /* ARGUMENT2      : Second MQT-specific argument */
        }
    } ;

    After:

    STATIC LINKCFG_Mqt  LINKCFG_mqtObjects [] =
    {
        {
            "ZCPYMQT",                /* NAME           : Name of the Message Queue Transport */
            (Uint32) SHAREDENTRYID1,  /* MEMENTRY       : Memory entry ID (-1 if not needed) */
            (Uint32) -1,              /* MAXMSGSIZE     : Maximum message size supported (-1 if no limit) */
            1,                        /* IPSID          : ID of the IPS used */
            0,                        /* IPSEVENTNO     : IPS Event number associated with MQT */
            0xA,                      /* ARGUMENT1      : First MQT-specific argument */
            0x0                       /* ARGUMENT2      : Second MQT-specific argument */
        }
    } ;

     Lastly, regarding caching: you are correct, even if we disable caching on those regions the BCACHE calls will still cause latencies (we experienced this first hand earlier when calling BCACHE_inv in our HWI). Therefore, if we wanted to disable caching and remove all the BCACHE calls, which calls in DSPLink affect our present system? And is it safe to remove them?

     Thank you both very much for the support; we are definitely making great progress in resolving our mysterious system issues. We really appreciate your help.

     

     Thank you very much,

    Arya B.

  • Hi Arya,

    If you can make sure all shared memory areas used by DSPLINK (i.e. DSPLINKMEM, POOLMEM) are uncached, then it should be ok to remove the BCACHE calls.

    All the BCACHE calls are invoked via the HAL layer wrappers. I have not tried this, but if you take a look at dsp/src/base/hal/DspBios/DA8XXGEM/_hal_cache.h, you should able to just change the HAL macros to not call the BCACHE API.
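    For illustration, the change might look something like this (a hypothetical sketch -- the actual macro names and BCACHE call signatures in _hal_cache.h must be checked against your DSPLINK version before editing):

```c
/* Hypothetical sketch of no-op replacements for the HAL cache macros in
 * dsp/src/base/hal/DspBios/DA8XXGEM/_hal_cache.h. Macro names are
 * assumptions; verify against the actual header first. */
#define HAL_cacheInv(addr, size)    ((void) 0)  /* was e.g. a BCACHE_inv call   */
#define HAL_cacheWb(addr, size)     ((void) 0)  /* was e.g. a BCACHE_wb call    */
#define HAL_cacheWbInv(addr, size)  ((void) 0)  /* was e.g. a BCACHE_wbInv call */
```

Keeping the macro layer intact (rather than deleting call sites) means the change is a one-header edit that can be reverted easily.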

    In any case, I suggest you try DSP_TSK_MODE first along with the new priority level. Once you are in that mode, MPCS would no longer be disabling SWIs, and the BCACHE calls in critical regions would not be executed with SWI disabled. This may be sufficient to get to the latency you need.

    Best regards,

    Vincent

  • Hi Vincent,

    Thank you very much for the response. We went ahead and moved DSPLINKMEM and POOLMEM into uncached regions. Moreover, following your recommendation, we modified the HAL macros to not call the BCACHE API. The system booted, was stable, and we did not observe any notable performance degradation.

    At the moment we have several sets of units in testing, one set is running all of our discussed changes up to the BCACHE API removal, the other contains the BCACHE API removal as well. We will be monitoring the systems and will report back shortly.

     

    Thank you very much,

    Arya B.

  • Hi Vincent,

    Here is an update on our testing: our units have been running for 3 days. Both sets have not missed any HWIs or SWIs during runtime (which is great news!). However, every system in the set without the BCACHE API calls misses several hundred HWIs/SWIs on startup. The problem goes away if we place the BCACHE API calls back into DSPLink while keeping DSPLINKMEM and POOLMEM in uncached regions. Any ideas why removing the BCACHE API calls decreases system stability on startup? What are the repercussions of calling the BCACHE API on memory that is outside of our caching regions?

    Thank you very much,

    Arya B.

  • Hi Arya,

    Did the set of machines that has DSP_TSK_MODE *and* the BCACHE calls achieve the runtime performance you need? If so, I'd highly recommend you leave the BCACHE calls in, since that was the configuration that was originally system-tested by TI. Calling BCACHE on uncached memory is harmless apart from the few extra cycles it takes to run the functions.

    If you really need to remove the BCACHE calls themselves, the startup problems lead me to think that either you are not disabling the cache properly (double-check the MAR registers and alignment of your memory areas), or there are cached areas outside of DSPLINKMEM or POOLMEM that I wasn't aware of. So you may want to be more conservative and only comment out the HAL_cache* function calls in MPCS_enter() and MPCS_leave() in dsp/src/mpcs/mpcs.c, given those were the functions you saw as having a latency issue.

    Best regards,

    Vincent

  • Hi Vincent,

    Thank you very much for the quick response. So far the machines running DSP_TSK_MODE *and* the BCACHE calls have been meeting our interrupt latency requirements. However, having experienced the latency introduced by the BCACHE calls in the past, it would be ideal to remove them from DSPLink to reduce potential latencies.

    I have included our DSPLink memory map along with the MAR bits, we have verified using the debugger that our MSGQ packets are indeed allocated in our defined region.

    DSPLink Regions from DSP Map file:
    DSPLINKMEM0   c7100000   00005000   00000000   00005000   RWIX
    DSPLINKMEM1   c7105000   0002b000   00000000   0002b000   RWIX
    POOLMEM       c7130000   00600000   00000000   00600000   RWIX

    MAR bits from BIOS tcf:
    bios.GBL.C64PLUSMAR192to223 = 0x00000040; /* DDR 0xC6000000-0xC6FFFFFF = 16M is cacheable */
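    As a sanity check on that MAR setting: each C64x+ MAR bit covers one 16 MB region (MAR n covers addresses starting at n << 24). A small sketch (helper names hypothetical) confirms that bit 0x40 in C64PLUSMAR192to223 corresponds to 0xC6000000-0xC6FFFFFF, and that POOLMEM at 0xC7130000 falls in a region whose MAR bit is not set:

```c
#include <stdint.h>

/* Each C64x+ MAR bit controls cacheability of one 16 MB region:
 * MAR n covers [n << 24, (n << 24) + 0xFFFFFF]. The BIOS config word
 * C64PLUSMAR192to223 holds MAR192 (bit 0) through MAR223 (bit 31). */
static unsigned mar_index(uint32_t addr)
{
    return addr >> 24;                    /* e.g. 0xC6000000 -> MAR198 */
}

static uint32_t mar_word_bit(uint32_t addr)
{
    return 1u << (mar_index(addr) % 32u); /* bit within the 32-bit MAR word */
}
```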

    Doing a bit of experimenting with removing the BCACHE calls, it seems that removing any of them from MPCS_enter() and MPCS_leave() in dsp/src/mpcs/mpcs.c results in a higher chance of missing interrupts on startup. However, removing the BCACHE calls from ZCPYMQT_put() in dsp/src/msg/DspBios/zcpy_mqt.c seemed fine. This is all quite puzzling, as according to the aforementioned memory configuration all DSPLink memory is already in non-cached regions.

    Are you aware of any other DSPLink memory region that we need to move ?


    Thank you very much,
    Arya B.

  • Hi Arya,

    Based on what you are seeing, I am fairly certain that your SWI misses were due to MPCS being misconfigured to SWI mode. It basically caused all critical regions between MPCS_enter and MPCS_leave (which may contain multiple BCACHE calls plus other 'work'; see ZCPYMQT_msgCtrl() for an example) to be run entirely with SWIs disabled. That is a huge latency compared to the small code sections inside the BCACHE functions, where interrupts are disabled briefly to read or write a few EMIF registers. If your EDMA transfers are of any significant size, it is extremely unlikely the interrupts would occur fast enough for these small latencies to be of concern.

    That said, if you really want to look into removing the BCACHE calls (a reminder that this is not something TI has system-tested), one suggestion I have -- if you want to find out whether the BCACHE functions are called on regions outside of DSPLINKMEM or POOLMEM -- is to add a check in each of the HAL_cache* functions in dsp/src/base/hal/DspBios/DA8XXGEM/_hal_cache.h to see if the memory being invalidated/written back lands outside of those regions. If so, spin in an infinite loop. That would give you a chance to connect to the DSP with CCS and find the call stack that is the culprit. It will also tell you which other memory region(s) need to have cache turned off. Make sure you are building against the DSPLINK debug libraries so that you have access to the symbols.
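    As a sketch (region bounds taken from the map file you posted; function names are hypothetical):

```c
#include <stdint.h>

/* Range check to plant inside each HAL_cache* wrapper: spin if a cache
 * operation targets memory outside the regions believed to be uncached,
 * so CCS can be attached and the call stack inspected.
 * Bounds are from the posted map file:
 *   DSPLINKMEM0 c7100000 + 5000, DSPLINKMEM1 c7105000 + 2b000,
 *   POOLMEM     c7130000 + 600000. */
#define DSPLINKMEM_BASE 0xC7100000u
#define DSPLINKMEM_END  0xC7130000u
#define POOLMEM_BASE    0xC7130000u
#define POOLMEM_END     0xC7730000u

static int in_expected_region(uint32_t addr, uint32_t size)
{
    uint32_t end = addr + size;

    return (addr >= DSPLINKMEM_BASE && end <= DSPLINKMEM_END) ||
           (addr >= POOLMEM_BASE    && end <= POOLMEM_END);
}

void hal_cache_check(uint32_t addr, uint32_t size)
{
    if (!in_expected_region(addr, size)) {
        volatile int stuck = 1;
        while (stuck) { /* attach CCS here and inspect the call stack */ }
    }
}
```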

    Best regards,

    Vincent 

  • Hi Vincent,

    Thank you very much for the reply. Having moved all of the DSPLink-related memory into non-cacheable regions, it would be optimal to remove these potential sources of latency, as they no longer provide any performance benefit.

    Following your suggestion, we were able to find the region which was residing in cacheable space: it turns out we had forgotten about the "DSPLINK_shmBaseAddress" region used on startup during PROC. We remapped that into non-cacheable space as well. However, the problem of missing hundreds of HWIs/SWIs on startup persisted. Therefore, we decided to recompile our DSP project without debug symbols; similarly, we recompiled DSPLink without debug symbols and without optimization. That seemed to alleviate the problem. Upon further testing, it seems the problem arises if the two are not configured homogeneously, for example having the DSP project with debug symbols and DSPLink without.

    1. Any ideas why the debug symbols would have such an effect on latency on startup ?

     

    Thank you very much, 

    Arya B.

  • Hi Arya,


    By "debug symbols", I assume you are referring to the -g compiler option. So you are saying that by passing -g to either your application or to the DSPLINK library, you see a performance decrease.

    According to table 3.7 in the compiler manual (http://www.ti.com/lit/ug/spru187v/spru187v.pdf), there should not be any impact on optimization level. This may be a good question to run by the compiler forum. The only other thought I have is that it may have somehow changed the memory placement of the various sections. You can compare the map files before and after applying the -g option to see if there are any differences; if the data is placed differently, it could have an impact on the areas that are still being cached.

    Best regards,

    Vincent