Memory coherency, cache(?) causing random failures in MessageQ open on ARM on OMAPL138

Hi,

We are using OMAPL138 and ipc_1_25_03_15, bios_6_35_04_50, xdctools_3_25_03_72, syslink_2_21_02_10.

Our applications use HeapBufMP and MessageQ and follow example_02 for MessageQ that comes with the SysLink examples. The two cores handshake in lock step: once after creating the Rx MessageQs (RESRDY event), once after opening the MessageQs on the remote core (READY event), and finally with a COMPLETE event. We have prints that log each MessageQ name and whether the open succeeded or failed.

The code ran without problems until recently, when these failures started to surface.

With prints disabled, the ARM fails to open one of the Rx MessageQs created by the DSP. Which MessageQ fails is random, and the failure occurs in roughly 2 out of 3 starts.

If we enable a few prints that dump the result of each MessageQ open, the failure rate drops to about 1 in 10 starts.

If we enable all the prints, the MessageQ open never fails: the ARM is able to open all the MessageQs created by the DSP.

In all these failures the ARM is past the first RESRDY handshake, which implies that the DSP created all the Rx MessageQs and sent the RESRDY event to the ARM. The ARM still failed to open the MessageQ.
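Because the failure rate drops as more prints (and therefore more delay) are added, one way to separate a timing problem from a coherency problem is to retry the open a bounded number of times and record how many attempts it took. The following is only a sketch: `open_with_retry`, `open_fn`, the status codes and `stub_open` are hypothetical names for illustration, not SysLink APIs. In the real application the callback would wrap the MessageQ open call and the loop would sleep between attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch only. Real code would
 * compare against the status values returned by the MessageQ open call. */
enum { OPEN_OK = 0, OPEN_NOT_FOUND = -1 };

/* Hypothetical callback type standing in for the real open call. */
typedef int (*open_fn)(const char *name, unsigned *queue_id);

/* Retry the open until it stops reporting "not found" or the attempt
 * budget is used up. attempts_used tells the caller how long it took. */
static int open_with_retry(open_fn open_queue, const char *name,
                           unsigned *queue_id, int max_attempts,
                           int *attempts_used)
{
    int status = OPEN_NOT_FOUND;
    int attempt;

    assert(open_queue != NULL && max_attempts > 0);

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        status = open_queue(name, queue_id);
        if (status != OPEN_NOT_FOUND) {
            break;              /* success, or an error that retrying won't fix */
        }
        /* Real code would sleep here before the next attempt. */
    }
    *attempts_used = (attempt > max_attempts) ? max_attempts : attempt;
    return status;
}

/* Stand-in for the remote queue: it becomes visible on the third lookup. */
static int stub_calls;

static int stub_open(const char *name, unsigned *queue_id)
{
    (void)name;
    if (++stub_calls < 3) {
        return OPEN_NOT_FOUND;
    }
    *queue_id = 0x10001u;
    return OPEN_OK;
}
```

If the open succeeds after a few retries, the ARM is probably just looking before the DSP's writes to the shared region have become visible. If it never succeeds, the ARM is more likely reading stale or inconsistent data from the region.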

We have tried placing SR0 in L3_CBA_RAM and in DDR2_Shared, with cacheEnable both enabled and disabled and the corresponding MAR set or cleared to match.

We suspect that what the DSP writes to SR0 is not what the ARM sees. The failure happens even when SR0 is placed in L3_CBA_RAM with SR0.cacheEnable set to false and the MAR register cleared.
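When checking the MAR settings, it may help to work out which MAR actually covers each region. The sketch below assumes the usual C64x+/C674x layout, where each MAR controls cacheability for one 16 MB window and MAR0 sits at 0x01848000; please verify both against the megamodule documentation for the device. The helper names are made up for illustration. The DDR bases are the ones from the target.bld quoted in the follow-up post, and 0x80000000 is assumed to be the base of the on-chip shared RAM (L3_CBA_RAM).

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 16 MB per MAR, MAR0 at 0x01848000. */
#define MAR_WINDOW_SHIFT 24u
#define MAR0_ADDR        0x01848000u

/* Index of the MAR that covers a given byte address. */
static unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_WINDOW_SHIFT);
}

/* Memory-mapped address of MARn (one 32-bit register per window). */
static uint32_t mar_reg_addr(unsigned index)
{
    return MAR0_ADDR + 4u * (uint32_t)index;
}

/* Bit for MARn inside a 32-entry group mask, e.g. a group covering
 * MAR192..MAR223 would use bit (n - 192). */
static uint32_t mar_group_bit(unsigned index)
{
    return (uint32_t)1u << (index % 32u);
}

/* True when two addresses fall in the same 16 MB window and therefore
 * cannot be given different cacheability through the MARs alone. */
static int share_mar(uint32_t a, uint32_t b)
{
    return mar_index(a) == mar_index(b);
}
```

With these assumptions, 0xC7000000 (DDR_SHARED / SR0) and 0xC7800000 (DDR_DSP) both map to MAR199, so DSP-side cacheability of SR0 in DDR cannot be changed without also changing it for the DSP code and data region. The shared RAM at 0x80000000 maps to MAR128.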

What else could be missing? What configuration on the ARM side should we check or configure differently?

Thanks in advance,

Taran Tripathi

  • To add to the previous post.

    Our target.bld has:

            ["DDR_SHARED",
                {
                    name: "DDR_SHARED",
                    base: 0xC7000000,
                    len: 0x00200000, /* 2MB extra shared space */
                    space: "code/data",
                    access: "RWX",
                }
            ],

            ["DDR_DSP",
                {
                    name: "DDR_DSP",
                    base: 0xC7800000,
                    len: 0x00800000, /* 8MB DSP code space */
                    space: "code/data",
                    access: "RWX",
                }
            ]

    codeMemory: "DDR_DSP",
    dataMemory: "DDR_DSP",
    stackMemory:"DDR_DSP",
    l1DMode: "32k",
    l1PMode: "32k",
    l2Mode: "32k"

    The target.cfg defines SR0:

    SharedRegion.setEntryMeta(0,
        new SharedRegion.Entry({
            name: "SR0",
            base: 0xC7000000,
            len: 0x00200000, /* 2MB shared space */
            ownerProcId: MultiProc.getIdMeta("HOST"),
            cacheEnable: true,
            isValid: true
        })
    );

    The MAR was not set explicitly because the MARs are enabled by default.

    We changed our target.bld to make the DDR_SHARED space "data" with "RW" access only. We also explicitly enabled the MAR for the region in target.cfg.

    With these changes the MessageQ open succeeded in 10 out of 10 reboots, whereas earlier it succeeded in only about 1 in 3 reboots.

    These changes clearly affect the behavior, but we are not sure why. We are still waiting to see if and when the failure reappears.

    Any pointers or help would be appreciated.

    Thanks in advance,

    Taran Tripathi

  • We’re closing out this old thread. If you feel there is more discussion on the topic, please feel free to "Reply".