Bcache_inv crash when MPC turned on

Khor

Hi all,

BCACHE_inv() is used in several places in our code to refresh cached data. However, we see an exception when MPC is turned on. Tracing the crash shows that it happens in BCACHE_wait(), which is called inside BCACHE_inv(). The crash does not happen every time.

Any idea what could cause this?

BCACHE_inv(g_buffer, 4, TRUE); // we just pass in a global buffer.

Could it be because we have not set up the MPPA registers properly? We only enable the MPC by changing the TCF setting.

FYI we are using L138.

Thanks

Aun

over 14 years ago

0 judahvang over 14 years ago

TI__Mastermind 32475 points

Aun,

Some questions:

1. If you don't enable MPC, does the program still crash? If it doesn't, it could mean you aren't correctly setting up the MPPA registers.

2. Are you saying the exception happens within the BCACHE_inv() call? If so, do you know if there's an MMU being used (presumably programmed on the ARM) to remap the address space of the DSP? If this is the case, I think there's a config parameter that may need to be changed.

3. If neither of the cases above applies, it could just be that your program is crashing due to the results of the BCACHE_inv() call.

Judah

0 Robert Tivy over 14 years ago

TI__Mastermind 18260 points

If Judah's suggestions don't fix your issue, you can try a bottom-up approach. Do you have the exception information?

For any exception, BIOS prints information to the "Execution Graph Details" message LOG, which should be available for viewing by opening the RTA LOG menu item. For MPC exceptions, there is further information available in the MPC Fault registers, which are memory mapped:
    L1D MPFAR - 0x0184ac00
    L1P MPFAR - 0x0184a400
    L2 MPFAR - 0x0184a000
MPFAR is the Fault Address Register, and if any of the above ones are non-zero then it's likely that that address was the one faulting. Each MPC also has an MPFSR Fault Status Register (located 4 bytes after the MPFAR) which contains the lacking MPPA bits for the corresponding MPFAR address. There are BIOS header files for MPC and EXC that essentially "document" the information related to exception decoding (structures and bit masks, etc.)
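
As a rough sketch of how those registers could be read from C: the block below assumes only what is stated above (the three MPFAR addresses, and that MPFSR is the word 4 bytes after MPFAR). The `MpcFault` type and the `readMpcFault`/`printMpcFault` helpers are hypothetical names for illustration, not BIOS APIs, and the status word is printed raw rather than decoded.

```c
#include <stdint.h>
#include <stdio.h>

/* MPFAR addresses as quoted in this post. */
#define L1D_MPFAR_ADDR 0x0184AC00u
#define L1P_MPFAR_ADDR 0x0184A400u
#define L2_MPFAR_ADDR  0x0184A000u

/* Hypothetical container for one MPC's fault registers. */
typedef struct MpcFault {
    uint32_t faultAddr;   /* MPFAR: address that faulted (0 if none) */
    uint32_t faultStatus; /* MPFSR: raw status word, 4 bytes after MPFAR */
} MpcFault;

/* Read MPFAR and the MPFSR word that immediately follows it. */
MpcFault readMpcFault(const volatile uint32_t *mpfar)
{
    MpcFault fault;
    fault.faultAddr = mpfar[0];
    fault.faultStatus = mpfar[1];
    return fault;
}

/* Print the fault registers only if a fault address was recorded. */
void printMpcFault(const char *name, const volatile uint32_t *mpfar)
{
    MpcFault fault = readMpcFault(mpfar);
    if (fault.faultAddr != 0u) {
        printf("%s: MPFAR=0x%08lx MPFSR=0x%08lx\n", name,
               (unsigned long)fault.faultAddr,
               (unsigned long)fault.faultStatus);
    }
}
```

On the target this would be called from the exception hook or at a breakpoint, for example `printMpcFault("L1D", (const volatile uint32_t *)L1D_MPFAR_ADDR);`, and likewise for the L1P and L2 addresses.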

NRP is another valuable data point, indicating the point in the code execution stream at which the exception occurred. MPC faults can be delayed a few cycles, in which case NRP would be a few instructions *after* the faulting instruction.

Regards,

- Rob

0 Khor over 14 years ago

Intellectual 375 points

Hi Judah and Rob,

Thanks for your prompt reply.

The software does not crash if MPC is not turned on. We did not configure the MPPA registers at all; we only enabled the MPC in the TCF settings. When the exception happens, the crash point is in BCACHE_wait() inside BCACHE_inv().

We have some findings today. BCACHE_inv() is used in a few places in our code to invalidate memory in the cache. However, the crash does not happen every time BCACHE_inv() is executed. We always set the wait argument to TRUE to make sure the invalidate operation completes before returning. So we suspect that a second (or repeated) entry into BCACHE_inv() for the same g_buffer, while the previous operation is still waiting, causes the exception.

We tried this:

intval = HWI_disable();
BCACHE_inv(g_buffer, 4, TRUE);
HWI_restore(intval);

With this change, the software no longer crashes here. As I mentioned, there are a few more places that use BCACHE_inv(), and the software now crashes at a different place.

Why would the MPC trigger an exception for this?

Aun

0 judahvang over 14 years ago in reply to Khor

TI__Mastermind 32475 points

Aun,

When you say "now the software crash at different place", do you mean it crashes at a different BCACHE_inv() call?

From your description above, it doesn't really give me any additional clues as to why MPC triggers an exception. You might want to follow Rob's suggestion for setting up the MPPA registers.

Do you know if you are using the MMU?

Judah

0 Khor over 14 years ago

Intellectual 375 points

Judah,

Yes, it now crashes at a different BCACHE_inv() call.

We now want to understand why this exception is triggered. I will look at the MPC registers that Rob suggested when the exception occurs.

We are running Linux on the ARM side, and I believe Linux uses the MMU there.

Aun

0 Karl Wechsler over 14 years ago in reply to Khor

TI__Mastermind 20805 points

Aun --

One thing that concerns me is your use of '4' as the size of the invalidate. The cache invalidate only works on full cache lines. If you say "4", the size of the invalidate will be rounded up to 128 (or 64 if the data is in L2, since the L1 line size is 64). The base address will also be rounded down to the cache line alignment. So you will be invalidating more data than just the 4 bytes you specify in that call. Are you taking care to make sure that no other data is on the same cache line as the 4 bytes that you are trying to invalidate? If you only need 4 bytes, you still need to reserve a buffer of length 128 and align that buffer on a 128-byte boundary, then use 4 of those bytes for the variable you are working with. The other 124 bytes will be unused, but you isolate the variable of interest and do not erroneously invalidate adjacent data.

You can do it with something like this (not syntax exact but hopefully close):

typedef struct CacheVar {
    int var;
    char pad[124];
} CacheVar;

#pragma DATA_ALIGN(dataBuf, 128)
CacheVar dataBuf;
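
To make the rounding described above concrete, here is a small sketch of the arithmetic only. It assumes the line size is a power of two (64 or 128 as mentioned above). `LineSpan` and `cacheLineSpan` are hypothetical names for illustration, not BIOS functions, and the code models the description in this post rather than the hardware itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the region a line-based cache operation touches. */
typedef struct LineSpan {
    uintptr_t start;  /* base address rounded down to a line boundary */
    size_t    length; /* byte count rounded up to whole lines */
} LineSpan;

/* Compute the whole-line region covering [addr, addr + byteCnt). */
LineSpan cacheLineSpan(uintptr_t addr, size_t byteCnt, size_t lineSize)
{
    LineSpan span;
    uintptr_t mask = (uintptr_t)lineSize - 1u;
    uintptr_t end;

    assert(lineSize != 0u && (lineSize & (lineSize - 1u)) == 0u); /* power of two */

    span.start = addr & ~mask;               /* round base down */
    end = (addr + byteCnt + mask) & ~mask;   /* round end up */
    span.length = (size_t)(end - span.start);
    return span;
}
```

For example, a 4-byte request at address 0x1004 with 128-byte lines covers 0x1000 through 0x107F, and the same request at 0x107E straddles two lines and covers 256 bytes. Anything else that lives in those lines is invalidated along with the 4 bytes.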

-Karl-

0 Khor over 14 years ago in reply to Karl Wechsler

Intellectual 375 points

Karl,

Yup, you are right. We actually allocated 128 bytes for g_buffer, so it should not be a problem.

#pragma DATA_SECTION(g_buffer, ".dspBuffer")
char g_buffer[128];

Aun

0 Karl Wechsler over 14 years ago in reply to Khor

TI__Mastermind 20805 points

Can you double-check your .cmd file to make sure you are also aligning this section as necessary? And check the .map file for the base address of g_buffer to make sure it is 128-byte aligned?
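
Besides reading the .map file, the alignment could also be checked at run time. This is only a sketch: `isLineAligned` is a hypothetical helper, not a BIOS function, and it assumes the line size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if ptr sits on a lineSize boundary, 0 otherwise.
 * lineSize is assumed to be a power of two (for example 128). */
int isLineAligned(const void *ptr, size_t lineSize)
{
    return (((uintptr_t)ptr) & ((uintptr_t)lineSize - 1u)) == 0u;
}
```

At startup, a check such as `isLineAligned(g_buffer, 128)` would flag a buffer that the linker placed off a cache line boundary.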

The next step I'd take would be to put a breakpoint on hwi1 (the NMI vector where exceptions go). When you hit that breakpoint, look at NRP, which should show you which instruction (it might be a few before the NRP) triggered the exception. If you take a screenshot of the disassembly window (make sure it shows the instructions before and after, and include the full core register view), then we might see something. If you scroll back in the disassembly view, you should be able to find which function is triggering the exception.

-Karl-

0 Karl Wechsler over 14 years ago in reply to Karl Wechsler

TI__Mastermind 20805 points

Aun --

I saw in a separate message that you are using 5.41.02.14.

This is a known issue with the 5.41.03.17 and earlier versions.

Here's the bug id:

SDOCM00071000 BCACHE_wait() contains a hole which can cause an exception if MPC is enabled

This problem was fixed in 5.41.04.18.

You can get the different 5.41.xy point releases on this web page:

http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/bios/dspbios/index.html

We summarize the list of changes for each point release in the release notes. This issue is captured in the release notes for 5.41.04.18.

I'm sorry we didn't recognize this earlier, but updating BIOS should fix your problem.

Regards,
-Karl-

0 Khor over 14 years ago in reply to Karl Wechsler

Intellectual 375 points

Hi Karl,

For this bug, under what conditions will BCACHE_wait() cause the MPC to trigger an exception? What does it mean that it "contains a hole"?

Is there a workaround for this problem if we prefer not to upgrade our library version?

Thanks for your great support

Aun

0 judahvang over 14 years ago in reply to Khor

TI__Mastermind 32475 points

Aun,

There is no workaround that we know of at this point. The problem happens within the BCACHE_wait() call. This function is also called internally by other BCACHE functions like BCACHE_inv(). If you don't want to upgrade your library, one option you can try if you have the source code is to include it as part of your project and patch up the function. The problem is in the bcache_wait.c file. If you do this, your copy of the BCACHE_wait() will override the BIOS library copy.

This part of the code is where the problem is:

    if (_BCACHE_emifAddr != NULL) {
        mask = HWI_disableI();
        *_BCACHE_emifAddr = 0;
        *_BCACHE_emifAddr;
        _BCACHE_emifAddr = NULL;
        HWI_restoreI(mask);
    }

Should be replaced by:

    mask = HWI_disableI();
    if (_BCACHE_emifAddr != NULL) {
        *_BCACHE_emifAddr = 0;
        *_BCACHE_emifAddr;
        _BCACHE_emifAddr = NULL;
    }
    HWI_restoreI(mask);
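
One plausible reading of the difference between the two versions (an interpretation of the patch, not something stated in the bug report) is a check-then-act window: in the original code, another context can clear the pointer after the NULL test but before interrupts are disabled. The host-side model below simulates that. Every name in it is hypothetical and it does not use any BIOS code; `irqMasked` stands in for the interrupt mask and `otherContext()` for a preempting caller.

```c
#include <stddef.h>

static volatile unsigned int fakeEmifReg;  /* stands in for the EMIF word */
static volatile unsigned int *emifAddr;    /* models _BCACHE_emifAddr */
static int irqMasked;                      /* models HWI_disableI/HWI_restoreI */
static int nullAccesses;                   /* accesses that would hit address 0 */

/* A preempting caller that finishes the same sequence first.
 * It can only run while interrupts are not masked. */
void otherContext(void)
{
    if (!irqMasked && emifAddr != NULL) {
        *emifAddr = 0;
        emifAddr = NULL;
    }
}

/* Write then read back through p, counting NULL instead of faulting. */
void touch(volatile unsigned int *p)
{
    if (p == NULL) {
        nullAccesses++;
        return;
    }
    *p = 0;
    (void)*p;
}

/* Original ordering: test first, mask afterwards. */
void waitUnpatched(void)
{
    if (emifAddr != NULL) {
        otherContext();   /* window: test passed, interrupts still enabled */
        irqMasked = 1;
        touch(emifAddr);
        emifAddr = NULL;
        irqMasked = 0;
    }
}

/* Patched ordering: mask first, then test. */
void waitPatched(void)
{
    irqMasked = 1;
    otherContext();       /* same moment, but masked, so it has no effect */
    if (emifAddr != NULL) {
        touch(emifAddr);
        emifAddr = NULL;
    }
    irqMasked = 0;
}
```

In this model, `waitUnpatched()` ends up accessing a NULL pointer once when the preemption lands in the window, while `waitPatched()` never does. That would also fit the observation earlier in the thread that the crash is intermittent.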

Judah

0 Khor over 14 years ago in reply to judahvang

Intellectual 375 points

Judah,

Thanks. May I know in which file _BCACHE_emifAddr is defined?

Aun

0 judahvang over 14 years ago in reply to Khor

TI__Mastermind 32475 points

Aun,

If you haven't already found out, it's in bcache_wait.c at the top of the file.

Judah

0 Khor over 14 years ago in reply to judahvang

Intellectual 375 points

Judah,

Yes, I already found the declaration. Thanks

Aun
