AM2634: Multicore Program Fails CacheP_getEnabled() On First Execution

Kier

Part Number: AM2634
Other Parts Discussed in Thread: SYSCONFIG

Hello,

I have a four-core program. On first launch in DevBoot mode, Cores 1, 2 and 3 always fail an assert check in MpuP_enable:

The assert expression highlighted below is always false on first execution:

In other words, type is not zero but 15 instead. I tried to find out what type==15 actually means but I'm having difficulty finding the relevant information.
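A minimal sketch of how a value of 15 decodes, assuming the cache type bits are the ones the MCU+ SDK's CacheP.h is believed to define (L1P = 0x1, L1D = 0x2, L2P = 0x4, L2D = 0x8). The SKETCH_ names and the helper are hypothetical, and the bit values should be checked against your own SDK copy before relying on them:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit values, modelled on the SDK's CacheP_TYPE_* defines.
 * The SKETCH_ names are hypothetical; verify against your CacheP.h. */
#define SKETCH_CACHE_L1P  (0x1u)
#define SKETCH_CACHE_L1D  (0x2u)
#define SKETCH_CACHE_L2P  (0x4u)
#define SKETCH_CACHE_L2D  (0x8u)
#define SKETCH_CACHE_ALL  (SKETCH_CACHE_L1P | SKETCH_CACHE_L1D | \
                           SKETCH_CACHE_L2P | SKETCH_CACHE_L2D)

/* Write a readable form of the enabled-cache mask into buf,
 * e.g. "L1P|L1D", or "none" when no bit is set. */
static void sketch_describe_cache_mask(uint32_t mask, char *buf, size_t len)
{
    static const struct { uint32_t bit; const char *name; } kBits[] = {
        { SKETCH_CACHE_L1P, "L1P" },
        { SKETCH_CACHE_L1D, "L1D" },
        { SKETCH_CACHE_L2P, "L2P" },
        { SKETCH_CACHE_L2D, "L2D" },
    };
    size_t i;

    assert(buf != NULL && len > 0u);
    buf[0] = '\0';
    for (i = 0u; i < sizeof(kBits) / sizeof(kBits[0]); i++) {
        if ((mask & kBits[i].bit) != 0u) {
            if (buf[0] != '\0') {
                strncat(buf, "|", len - strlen(buf) - 1u);
            }
            strncat(buf, kBits[i].name, len - strlen(buf) - 1u);
        }
    }
    if (buf[0] == '\0') {
        strncat(buf, "none", len - 1u);
    }
}
```

Under that assumption, 15 is every bit set, which is the same mask that CacheP_TYPE_ALL would name.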

However, if I do a group Reset -> Restart -> Resume then this check passes (i.e. type must be zero) and all four cores run just fine. So it appears a CPU Reset fixes the issue but I need to understand how.

Questions:

1) What are the "current enabled bits" returned from CacheP_getEnabled()?

2) Why are they already set when DevBoot mode is launched?

3) How might I reset them programmatically?

Thank you.

0 Kier over 1 year ago

Mastermind 8855 points

If I add a call to CacheP_disable(CacheP_TYPE_ALL); before MpuP_init(); in ti_dpl_config.c the problem goes away. This further demonstrates that the cache "bits" are somehow enabled before MpuP_init(); has been called:

However, this is a very inconvenient fix since ti_dpl_config.c is generated by SysConfig.

I guess that answers Q3 but Q2 remains. How do I find out what is setting the cache enabled bits before MpuP_init() is called? I can't see any cache configuration registers so am feeling around in the dark. Also, there appear to be no MPU settings in the GEL files.

0 Kier over 1 year ago

Can TI answer Q2 please?

0 Aakash Kedia over 1 year ago in reply to Kier

TI__Mastermind 26145 points

Hi Kier,

Kier said:
The assert expression highlighted below is always false on first execution:

The check mentioned ensures that the MPU is configured first and the cache is enabled afterwards. We have observed issues if the sequence is reversed. This means that if the cache is enabled before the MPU is enabled, the software will be stuck in the assert.
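The ordering rule can be modelled on a host as a rough sketch. The sketch_ functions and the state struct are hypothetical stand-ins, not the SDK's implementation. The only behaviour taken from this thread is that MPU enable trips an assert when the enabled-cache mask is non-zero, and that disabling all caches first (Kier's workaround) lets it pass:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-core model: not an SDK type. */
typedef struct {
    bool     mpu_enabled;
    uint32_t cache_mask;   /* non-zero means some cache is enabled */
} sketch_core_state;

/* Returns true if the MPU could be enabled, false at the point where the
 * real code would hit its assert (cache already on while MPU is off). */
static bool sketch_mpu_enable(sketch_core_state *s)
{
    assert(s != NULL);
    if (!s->mpu_enabled) {
        if (s->cache_mask != 0u) {
            return false;          /* cache enabled before MPU */
        }
        s->mpu_enabled = true;
    }
    return true;
}

/* Models the workaround: turn every cache off before MPU init. */
static void sketch_cache_disable_all(sketch_core_state *s)
{
    assert(s != NULL);
    s->cache_mask = 0u;
}

/* Models enabling caches once the MPU is configured. */
static void sketch_cache_enable(sketch_core_state *s, uint32_t mask)
{
    assert(s != NULL);
    s->cache_mask |= mask;
}
```

In this model a core that starts with a mask of 15 fails sketch_mpu_enable() until sketch_cache_disable_all() has run, which matches the symptom and the workaround described earlier in the thread.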

Kier said:
2) Why are they already set when DevBoot mode is launched?

Let me check this. The cache of the device may be enabled because of your debugger settings.

Can you also try this method of initialization? Does it give you the same result?

https://software-dl.ti.com/mcu-plus-sdk/esd/AM263X/latest/exports/docs/api_guide_am263x/ADDITIONAL_DETAILS_PAGE.html#autotoc_md38

Best Regards,
Aakash

0 Kier over 1 year ago in reply to Aakash Kedia

Hi Aakash,

Thank you.

I tried the SBL init method:

But Core 1 would not connect at all and I could not see any debug symbols on Cores 0 and 3. Only Core 2 ran properly.

In any case, it seems like this method also executes GEL files.

After examining the contents of CacheP_armv7r_asm.S, I think what would help is to understand how to examine the System Control Register in CCS:

Cortex-A7 MPCore Technical Reference Manual r0p3 (arm.com)

I guess this isn't memory mapped so how can I view its status please?

0 Kier over 1 year ago in reply to Kier

Hi Aakash,

I found the System Control Register now:

Also, now I think there's an issue with our custom boot assembly code. It could be that mpu_init is being called twice. Let's put this on hold for the moment while I investigate.

0 Kier over 1 year ago in reply to Kier

OK, so the boot code is not calling MpuP_init() twice; it seems to be a quirk of the debugger. It stops once at the BP, and on pressing Resume it stops again at the same place, but I confirmed by other means that MpuP_init() is only called once.

I guess the next step is to confirm the contents of the CP15_SYSTEM_CONTROL register when this occurs.

By the way Aakash, why is it that there's no bitfield breakdown of the CP15 registers in the Register view like there is for other registers?

0 Aakash Kedia over 1 year ago in reply to Kier

Hi Kier,

For CP15_SYSTEM_CONTROL registers, I can take that feedback to our concerned team and plan to get that added.

Do keep us posted on any luck with your debugging.

Best Regards,
Aakash

0 Kier over 1 year ago in reply to Aakash Kedia

Hi Aakash,

I seem to have reached a dead end with this problem. The weird thing is that when I add a breakpoint to debug it seems to actually fix the problem! Let me explain.

As mentioned in the original post, the assert trap in MpuP_enable() fails because type == 15. This is because, for reasons unknown, instruction and data caches are already enabled as indicated by the I and C flags in the CP15_SYSTEM_CONTROL register:

However, if I repeat the test and just put a BP at line 140, just before the assert trap, then type is always 0. The CP15_SYSTEM_CONTROL register has its reset value (no cache enabled), which means MpuP_enable() continues successfully if allowed to run on:

It's like a quantum observation problem. When I try to examine the issue, the behaviour changes. I'm at a loss as to how to debug the issue. Do you have any suggestions please?
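One way to observe the I and C flags without halting at a breakpoint is to read the System Control Register from code and log or store the value. The sketch below decodes the ARMv7-R SCTLR bits M (bit 0), C (bit 2) and I (bit 12). The sketch_ names are hypothetical, and the inline-assembly reader is only compiled for ARM targets with a GCC-style compiler and must run in a privileged mode:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SCTLR_M  (1u << 0)   /* MPU enable */
#define SKETCH_SCTLR_C  (1u << 2)   /* data cache enable */
#define SKETCH_SCTLR_I  (1u << 12)  /* instruction cache enable */

typedef struct {
    bool mpu;
    bool dcache;
    bool icache;
} sketch_sctlr_flags;

/* Split a raw SCTLR value into the three flags of interest. */
static sketch_sctlr_flags sketch_decode_sctlr(uint32_t sctlr)
{
    sketch_sctlr_flags f;
    f.mpu    = (sctlr & SKETCH_SCTLR_M) != 0u;
    f.dcache = (sctlr & SKETCH_SCTLR_C) != 0u;
    f.icache = (sctlr & SKETCH_SCTLR_I) != 0u;
    return f;
}

/* Mirrors the intent of the SDK check at register level: while the MPU
 * is still off, neither cache enable bit should be set. */
static void sketch_assert_caches_off_before_mpu(uint32_t sctlr)
{
    sketch_sctlr_flags f = sketch_decode_sctlr(sctlr);
    assert(f.mpu || (!f.dcache && !f.icache));
}

#if defined(__arm__) && defined(__GNUC__)
/* Target only: read SCTLR via CP15 (c1, c0, 0). */
static inline uint32_t sketch_read_sctlr(void)
{
    uint32_t value;
    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r"(value));
    return value;
}
#endif
```

Storing the raw value early in start-up and inspecting it after the fact may avoid the situation where halting the core changes the behaviour being examined.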

0 Aakash Kedia over 1 year ago in reply to Kier

Hi Kier,

We will try to reproduce this by running the IPC notify application in dev boot mode on the TI EVM (control card).

Best Regards,
Aakash

0 Gunjan Kumari over 1 year ago in reply to Aakash Kedia

TI__Intellectual 2640 points

Hi Kier,

I have tried the IPC notify application on the AM263x control card, and for me CacheP_getEnabled() returns 0, so the issue is not reproducible.

Regards,
Gunjan

0 Adam Lancaster over 1 year ago

Prodigy 41 points

I'm also experiencing this issue.

The application behaves as normal when using SDK version 08.05.00.24, but the exact symptoms Kier describes are present when building with version 09.01.00.41.

I am able to consistently get past this point (specifically, line 140 of MpuP_armv7r.c), but only by removing an MPU entry in SysConfig which marks a small region of OCRAM as non-cached (for the .bss:ENET_CPPI_DESC section). Of course, this is no good, as the Enet library then throws an assert because that must be non-cached.

We have no custom boot code, i.e. are using boot_armv7r_asm.S from the SDK.

Resets, power cycling, and whether or not the debugger is attached before this assert is hit make no difference.

Kier said:
It could be that mpu_init is being called twice

Modifying the MpuP_enable function to use a static variable to track how many times the if(MpuP_isEnable()==0U) branch is entered shows that it is being entered twice.

0 Gunjan Kumari over 1 year ago in reply to Gunjan Kumari

Hello,

I am able to run the code successfully when I compile and debug the ipc_notify example for the first time. But if I do a CPU reset on all cores and then debug once again, my program gets 'type=15'. I have raised a JIRA ticket to fix this issue.

Regards,
Gunjan

0 Kier over 1 year ago in reply to Adam Lancaster

Adam Lancaster said:
I'm also experiencing this issue.

Thanks Adam. Good to know the problem isn't just local to me.

Adam Lancaster said:
Modifying the MpuP_enable function to use a static variable

I thought of doing that but auto init is called after MPU init so I assumed that any variable will be wiped. I will try that again.

So just to be sure, you can show by this method that MpuP_enable() is called two times?

0 Gunjan Kumari over 1 year ago in reply to Kier

Hi,

Do you get this issue even when you power cycle the board and run the IPC example for the very first time after the power cycle?

Thanks,
Gunjan

0 Kier over 1 year ago in reply to Gunjan Kumari

Hi Gunjan,

I tried the example ipc_notify_echo_am263x-cc_r5fss0-0_freertos_ti-arm-clang from the 9.1.0.41 SDK with CCS 12.5; however, it does not seem to work at all.

After power on then Resume, I get no output in the Console window. After a few seconds, I stop the cores and see the following:

After Reset, Resume and Restart, I still get no output. After a few seconds, I stop the cores and see the following:

0 Gunjan Kumari over 1 year ago in reply to Kier

Hi Kier,

From your first snapshot it looks like the program executed successfully. Can you try (after a power cycle) launching a serial terminal for output in CCS (using your respective UART com<> port)?
Steps: View -> Terminal -> Open a terminal ->

For second snapshot it seems they are not able to sync with Cortex_R5_0.

Thanks,
Gunjan

0 Kier over 1 year ago in reply to Gunjan Kumari

Thank you. Yes, the UART log works; I assumed incorrectly that the CCS log would be used.

The example runs first time and repeated resets don't produce the issue for me. I guess we just have to hope that the resolution to your Jira will also fix my issue. Can you post a link to the Jira please?

0 Gunjan Kumari over 1 year ago in reply to Kier

Hello,

Jira link for issue: https://jira.itg.ti.com/browse/MCUSDK-13200

Regards,
Gunjan

0 Kier over 1 year ago in reply to Gunjan Kumari

Thanks but that one doesn't work for me. I was expecting a JIRA link similar to this one:

[EXT_EP-11682] CLB Tile Design settings reset when changing tile Name - Software Issue Report (SIR)

0 Gunjan Kumari over 1 year ago in reply to Kier

Hello Kier,
You can track it with the help of your FAE. Our JIRA link is only accessible internally.

Also, can you do these steps on each core when you get the issue in MpuP_enable():

Step1: Pause program
Step2: CPU reset
Step3: Restart

For me it removes the issue.

Thanks,
Gunjan

0 Kier over 1 year ago in reply to Gunjan Kumari

Hello Gunjan,

Gunjan Kumari said:
You can track it with the help of FAE. Our jira link is accessible internally.

Why is your JIRA link internal and others are external? Just wondering why you seem to have two different systems.

Gunjan Kumari said:
Can you do these steps on each core when you get issue in MpuP_enable(for each core)

You can see in my original question text that I describe this action. The complaint is that this occurs on first (and therefore most important) execution.

The only progress made on this topic is that you have created a JIRA ticket I cannot see. Please let me know the cause and solution of this problem.

0 Gunjan Kumari over 1 year ago in reply to Kier

Hello Kier,

Accessibility of JIRA links differs from project to project. You can reach out to your FAE for tracking it.

Thanks,
Gunjan

0 Kier over 1 year ago in reply to Gunjan Kumari

I have made some progress.

There have been a lot of code changes in our application since I raised this issue, and now the original trigger (assert trap in MpuP_enable()) does not occur, for reasons unknown.

However, I did speculate previously that a possible cause could be that __mpu_init() / MpuP_init() is being called twice. I now have evidence that was/is probably the case all along.

I instrumented the code (purple box) such that a separate counter is updated by each core just before MpuP_init() is called. I am using a pointer to unused memory instead of a variable because this circumvents the cinit initialisation routines which would otherwise reset the counter. In other words, the pointer contents are not reset if the start-up code is called again for any reason.
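The instrumentation itself is not shown in the thread, so the following is only a sketch of the idea under stated assumptions: the counters live at a fixed RAM address that no linker section owns (so cinit never touches them), a stack is already usable at the call site, and the slots are cleared once from the debugger before the run because power-on RAM contents are not guaranteed to be zero. All names here are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_CORES  (4u)

/* Increment this core's slot and return the new count.
 * On target, `slots` would be a cast of a fixed address in otherwise
 * unused RAM; on a host it can be any array, which keeps this testable. */
static inline uint32_t sketch_count_mpu_init(volatile uint32_t *slots,
                                             uint32_t core_id)
{
    assert(core_id < SKETCH_NUM_CORES);
    slots[core_id] += 1u;
    return slots[core_id];
}
```

Called just before MpuP_init(), a returned value of 2 or more for a core would indicate that the start-up path has run again on that core since the slots were last cleared.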

The results show that both cores in SS0 are, somehow, calling __mpu_init() a second time.

- I do not think the program is starting again from the software vector. If I put a BP at 0x0, in all the cores, I only hit it once.

- I then added a SW breakpoint at the call to MpuP_init() with a Skip Count of 1 to understand the reason for the second execution of MpuP_init(). The code ran to main() successfully as expected but on Resuming, the SW breakpoint seems to cause a PABT:

- I then started a 'PC Trace' and I find that the end of the PC Trace does not record anything to do with the PABT handler:

However, notice that the PC Trace has stopped at line 444 in Port.c (WFI). This could be a coincidence but it reminded me of this issue:

AM2634: Multicore FreeRTOS Empty Project Doesn't Work Correctly - Arm-based microcontrollers forum - Arm-based microcontrollers - TI E2E support forums

The advice there is to remove the WFI instruction from line 444 of Port.c and rebuild. I did so here and now this particular problem of the double __mpu_init() call has gone, although the code does appear to fall over at other points. Let's ignore that for the moment.

On the one hand I'm relieved to have made some progress but on the other, the normal debug tools have let me down and I've learned very little.

So I hope to find answers to the following questions on Wednesday in our arranged debug session.

1) Why does the PC Trace stop at the WFI instruction?

2) Why does a SW BP with Skip Count > 0 cause a PABT?

3) Why would the WFI instruction apparently cause the double call to __mpu_init()?

0 Kier over 1 year ago in reply to Kier

Hi Gunjan,

As per your request, stepping out of the second __mpu_init() call simply returns the PC to the boot code.

There are no function calls before __mpu_init() so we can infer that the program is starting again from the entry point. However, as demonstrated, the reason cannot be deduced because:

- If I set a BP at 0x0, I just get a PABT.

- The PC Trace ends with WFI instruction in vApplicationIdleHook().

I had one more thought to check if a reset is occurring, so I checked the RST_STATUS_CAUSE register of both subsystems:

Bit 0: POR Reset

Bit 1: Warm Reset

Bit 7: Reset for CORE0 and MSS_CORE00_VIM caused because of reset request by debugger in CORE00

Bit 9: Reset for CR5SS0 by the RESET FSM using MSS_CTRL::R5SS0_CONTROL_RESET_FSM_TRIGGER.

Bit 10: MSS_RCM.MSS_CR5SS_POR_RST_CTRL0 Reset Source: mod_g_rst_n.

The registers for SS0 and SS1 are the same except for Bit 7 and moreover, the register values are identical to the values seen before the problem occurs.

The conclusion is that I cannot rule out a reset but if it is occurring, it cannot be distinguished from the normal condition.
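The bit meanings listed above can be put into a small decoder so that two snapshots of the register can be compared mechanically. This is a sketch: it uses only the five bits named in this post, the sketch_ names are hypothetical, and any other bits in the register are simply ignored:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t    bit;
    const char *meaning;
} sketch_rst_cause;

/* Only the bits quoted in this thread; not a complete register map. */
static const sketch_rst_cause kSketchRstCauses[] = {
    { 0u,  "POR reset" },
    { 1u,  "Warm reset" },
    { 7u,  "Reset requested by debugger in CORE00" },
    { 9u,  "Reset by the RESET FSM trigger" },
    { 10u, "MSS_CR5SS_POR_RST_CTRL0 reset source" },
};

/* Print each listed cause that is set in `value`; return how many matched. */
static unsigned sketch_print_rst_causes(uint32_t value)
{
    unsigned matched = 0u;
    size_t i;

    for (i = 0u; i < sizeof(kSketchRstCauses) / sizeof(kSketchRstCauses[0]); i++) {
        assert(kSketchRstCauses[i].bit < 32u);
        if ((value & (1u << kSketchRstCauses[i].bit)) != 0u) {
            printf("bit %u: %s\n", (unsigned)kSketchRstCauses[i].bit,
                   kSketchRstCauses[i].meaning);
            matched++;
        }
    }
    return matched;
}

/* Bits that differ between two snapshots, e.g. before and after the fault. */
static uint32_t sketch_rst_cause_diff(uint32_t before, uint32_t after)
{
    return before ^ after;
}
```

A diff of zero between the "before" and "after" snapshots corresponds to the situation described here, where a reset cannot be told apart from the normal condition by this register alone.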

0 Gunjan Kumari over 1 year ago in reply to Kier

Hello Kier,

Your program may have accessed a wrong or unexpected address, which caused a reset.
To debug this, you can corrupt location 0x0.
When you start your program, just before the first call to mpu_init, corrupt the value at 0x0 (instead of jumping to the Common_StartUp_Steps address), so that when execution comes back to 0x0 an exception occurs and you can trace back what led there.
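A rough sketch of what "corrupting" the word at 0x0 could look like, under several assumptions that are not confirmed in this thread: the core fetches the vector in ARM state, the memory at 0x0 is writable at that point, and caches are off (or the instruction side is synchronised) so the new word is what actually gets fetched. The helper name is hypothetical, and 0xE7F000F0 is the ARM-state permanently undefined instruction (UDF #0):

```c
#include <assert.h>
#include <stdint.h>

/* ARM-state UDF #0: executing it raises an Undefined Instruction exception. */
#define SKETCH_ARM_UDF  (0xE7F000F0u)

/* Overwrite the word at `vector` with UDF and return the previous value so
 * it can be restored later. On target, `vector` would point at 0x0; taking
 * it as a parameter keeps the sketch testable on a host. */
static uint32_t sketch_poison_vector(volatile uint32_t *vector)
{
    uint32_t previous = *vector;

    *vector = SKETCH_ARM_UDF;
    /* Confirm the write landed (fails if the region is read-only). */
    assert(*vector == SKETCH_ARM_UDF);
    return previous;
}
```

If execution later returns to that address, the undefined-instruction handler would be entered instead of the normal start-up path, and the banked registers at that point may help identify how the core got there.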

Best regards,
Gunjan

0 Kier over 1 year ago in reply to Gunjan Kumari

Hello Gunjan,

I'm not sure why this is different to inserting a breakpoint at 0x0. I shall try again with the latest code.

0 Gunjan Kumari over 1 year ago in reply to Kier

Hello Kier,

Were you able to debug the reset issue in your code?

Thanks,
Gunjan