AM5728: Jailhouse Hypervisor error

Part Number: AM5728
Other Parts Discussed in Thread: TPIC2810

Hi there,

I'm working with Linux processor-sdk 05.02.00.10 and RTOS processor-sdk 4.03.00.05.

I'm using the Jailhouse hypervisor on an AM5728-based custom board, running Linux on A15 core0 and TI-RTOS on A15 core1.

The Linux kernel is 4.14.79 with the RT-PREEMPT patch, supplied by TI.
After starting TI-RTOS on A15 core1, I get the following dumps from the kernel:

[ 268.237085] ------------[ cut here ]------------
[ 268.468851] WARNING: CPU: 0 PID: 28 at /home/stx-ti/Projects/tisdk/build/arago-tmp-external-linaro-toolchain/work-shared/am57xx-evm/kernel-source/drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x254/0x370
[ 268.487441] 44000000.ocp:L3 Custom Error: MASTER MPU TARGET GPMC (Read): Data Access in User mode during Functional access
[ 268.498529] Modules linked in: jailhouse(O) vnetdevice(O) can_raw can ecatmc r8152 mc_gp_timer ec_master xhci_plat_hcd xhci_hcd sha512_generic sha512_arm usbcore sha256_generic sha1_generic sha1_arm dwc3 udc_core usb_common md5 cbc ti_prueth pru_rproc pruss pruss_intc omap_aes_driver c_can_platform c_can can_dev pruss_soc_bus omap_sham omap_wdt phy_omap_usb2 ahci_platform libahci_platform libahci libata scsi_mod ti_vip ti_vpe ti_sc videobuf2_dma_contig ti_csc ti_vpdma v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_core dwc3_omap rtc_omap extcon_palmas rtc_palmas omap_des gpio_tpic2810 ov2659 v4l2_fwnode des_generic crypto_engine omap_crypto omap_remoteproc virtio_rpmsg_bus rpmsg_core remoteproc sch_fq_codel cryptodev(O)
[ 268.563351] CPU: 0 PID: 28 Comm: irq/23-l3-app-i Tainted: G O 4.14.79-rt47-g6b79697728 #2
[ 268.563353] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 268.563354] Backtrace:
[ 268.563366] [<c010b808>] (dump_backtrace) from [<c010baec>] (show_stack+0x18/0x1c)
[ 268.563370] r7:00000009 r6:60000013 r5:00000000 r4:c0c55160
[ 268.563377] [<c010bad4>] (show_stack) from [<c07728bc>] (dump_stack+0x90/0xa4)
[ 268.563386] [<c077282c>] (dump_stack) from [<c012aea0>] (__warn+0xec/0x104)
[ 268.563390] r7:00000009 r6:c099beb8 r5:00000000 r4:d4a31e40
[ 268.563396] [<c012adb4>] (__warn) from [<c012aef8>] (warn_slowpath_fmt+0x40/0x48)
[ 268.563400] r9:0000000b r8:d4a1c0d0 r7:c099bd24 r6:00000002 r5:c099bde4 r4:c099be88
[ 268.563406] [<c012aebc>] (warn_slowpath_fmt) from [<c041e108>] (l3_interrupt_handler+0x254/0x370)
[ 268.563408] r3:d4a14f00 r2:c099be88
[ 268.563410] r4:80080003
[ 268.563416] [<c041deb4>] (l3_interrupt_handler) from [<c0181dd8>] (irq_forced_thread_fn+0x28/0x7c)
[ 268.563422] r10:c0181db0 r9:d4a1c440 r8:d49e1100 r7:00000001 r6:00000000 r5:d49e1100
[ 268.563427] r4:d4a1c440
[ 268.563431] [<c0181db0>] (irq_forced_thread_fn) from [<c0182130>] (irq_thread+0x130/0x208)
[ 268.563434] r7:00000001 r6:00000000 r5:ffffe000 r4:d4a1c464
[ 268.563440] [<c0182000>] (irq_thread) from [<c0149368>] (kthread+0x164/0x16c)
[ 268.563443] r10:d4871b20 r9:c0182000 r8:d4a1c440 r7:d4a30000 r6:00000000 r5:d4a1c480
[ 268.563444] r4:d49e3080
[ 268.563450] [<c0149204>] (kthread) from [<c0107a90>] (ret_from_fork+0x14/0x24)
[ 268.563453] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0149204
[ 268.563454] r4:d4a1c480
[ 268.563456] ---[ end trace 0000000000000002 ]---
root@am57xx-evm:~#

These dumps appear with no apparent regularity. Sometimes a single dump appears after 1 minute. Sometimes, after an hour, the dumps show up every 130 seconds and then stop.

The RTOS inmate accesses the GPMC constantly and periodically from the moment it starts.

The GPMC is disabled in the dts:

root@am57xx-evm:~# dtc -I fs -O dts /sys/firmware/devicetree/base

...

...

gpmc@50000000 {
    compatible = "ti,am3352-gpmc";
    ti,hwmods = "gpmc";
    gpio-controller;
    gpmc,num-waitpins = <0x2>;
    status = "disabled";
    #interrupt-cells = <0x2>;
    #address-cells = <0x2>;
    interrupts = <0x0 0xf 0x4>;
    gpmc,num-cs = <0x8>;
    #size-cells = <0x1>;
    dma-names = "rxtx";
    reg = <0x50000000 0x37c>;
    #gpio-cells = <0x2>;
    dmas = <0xf8 0x4 0x0>;
    interrupt-controller;
};

If the GPMC is disabled in the dts, why am I getting these dumps and how can I stop them?

Thanks a lot,

Nir.

  • Hi Nir,

    The kernel dump indicates that the ARM is still accessing the GPMC module, which is not accessible. I will check whether the Linux kernel needs any change other than the DTS to disable GPMC. I will keep you posted.
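
    For reference, the initiator and target can be read straight from the "L3 Custom Error" line of the dump. The sketch below pulls those fields out of a dmesg line. The regular expression is an assumption based only on the single sample line in this thread, and `parse_l3_error` is a hypothetical helper name, so other kernel versions may need a different pattern.

```python
import re

# Pattern inferred from the one sample line in this thread (an assumption);
# it is not taken from the driver source and may not cover every variant.
L3_ERROR_RE = re.compile(
    r"L3 (?P<kind>\w+) Error: "
    r"MASTER (?P<master>.+?) TARGET (?P<target>.+?) "
    r"\((?P<op>\w+)\)"
    r"(?:: (?P<detail>.*))?"
)


def parse_l3_error(line):
    """Return the fields of an L3 NoC error line as a dict, or None if no match."""
    match = L3_ERROR_RE.search(line)
    return match.groupdict() if match else None


sample = (
    "[ 268.487441] 44000000.ocp:L3 Custom Error: MASTER MPU TARGET GPMC (Read): "
    "Data Access in User mode during Functional access"
)
info = parse_l3_error(sample)
print(info["master"], info["target"], info["op"])
```

    For the line in this thread, the master field is MPU and the target is GPMC, which is what the statement above is based on.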

  • Hi,

    Just wanted to add that I verified the pinctrl is identical when running the RTOS application natively and when running it as a Jailhouse inmate.

    Not sure if this is of any help.

    Waiting for your post.

    Thanks,

    Nir.

  • Nir,

    Please apply the following kernel patch and check whether the kernel dump still happens.

    diff --git a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
    index 24fcc874b2c9..ea3237f1397d 100644
    --- a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
    +++ b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
    @@ -4567,7 +4567,6 @@ static struct omap_hwmod_ocp_if *dra7xx_hwmod_ocp_ifs[] __initdata = {
            &dra7xx_l4_per1__gpio6,
            &dra7xx_l4_per1__gpio7,
            &dra7xx_l4_per1__gpio8,
    -       &dra7xx_l3_main_1__gpmc,
            &dra7xx_l3_main_1__gpu,
            &dra7xx_l4_per1__hdq1w,
            &dra7xx_l4_per1__i2c1,
    
  • Hi,

    The kernel dump still comes up.

    Thanks,

    Nir.

  • Hi Nir,

    Please undo the previous hwmod patch and try adding "ti,no-idle;" to the gpmc DT node that you disabled. If that doesn't solve the issue, what is the gpmc interrupt number used in the RTOS inmate application? Did you skip it in the DTS file?

  • Hi,

    I reverted the previous patch and added "ti,no-idle" to the GPMC node.

    The dump still occurs.

    My RTOS app doesn't use the GPMC interrupt; we use polling. Still, I skipped GPMC interrupt number 20 and passed it to the RTOS inmate cell.

    The dump still occurs.

    Thanks,

    Nir.

  • Hi Nir,

    In your use case, there are 3 possibilities which could cause this l3-noc dump.

    1) a bug in a driver (in any cell) causes a NULL pointer dereference;

    2) the gpmc is not clocked;

    3) the gpmc far-end device reports an access failure, which is translated into a gpmc access error and then leads to the l3-noc dump.

    Given that your RTOS application accesses gpmc periodically and the l3-noc dump only happens randomly, the issue seems to be case #3. I recommend examining the device attached to gpmc and debugging it in your RTOS application.

  • Hi Bin,

    According to the AM572x Technical Reference Manual, page 3558, GPMC Error Handling, the registers GPMC_ERR_TYPE and GPMC_ERR_ADDRESS hold useful information when an error occurs.

    After the system is powered up these two registers hold the value zero, and after the dump occurs for the first time I read these values:

    root@am57xx-evm:~#
    root@am57xx-evm:~# devmem2 0x50000044
    /dev/mem opened.
    Memory mapped at address 0xb6f80000.
    Read at address 0x50000044 (0xb6f80044): 0x40413430
    root@am57xx-evm:~#
    root@am57xx-evm:~#
    root@am57xx-evm:~# devmem2 0x50000048
    /dev/mem opened.
    Memory mapped at address 0xb6f55000.
    Read at address 0x50000048 (0xb6f55048): 0x00000211
    root@am57xx-evm:~#

    This means that a read was attempted at the unsupported address 0x50413430, in the GPMC register address space.
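
    The decoding behind that reading can be sketched as follows. The bit positions used here (ERRORVALID in bit 0, ERRORTIMEOUT in bit 2, ERRORNOTSUPPMCMD in bit 3, ERRORNOTSUPPADD in bit 4, ILLEGALMCMD in bits 10:8, and bit 30 of GPMC_ERR_ADDRESS selecting the register region) and the command codes are assumptions from my reading of the register descriptions, so please check them against the TRM. The function names are hypothetical.

```python
# Assumed layout; verify every field position against the TRM before relying on it.
GPMC_BASE = 0x50000000

# Assumed OCP command encoding for the ILLEGALMCMD field.
MCMD_NAMES = {1: "write", 2: "read"}


def decode_gpmc_err_type(value):
    """Split a GPMC_ERR_TYPE value (offset 0x48) into its assumed fields."""
    mcmd = (value >> 8) & 0x7
    return {
        "error_valid": bool(value & (1 << 0)),
        "timeout": bool(value & (1 << 2)),
        "unsupported_command": bool(value & (1 << 3)),
        "unsupported_address": bool(value & (1 << 4)),
        "command": MCMD_NAMES.get(mcmd, "mcmd=%d" % mcmd),
    }


def decode_gpmc_err_address(value):
    """Split a GPMC_ERR_ADDRESS value (offset 0x44) into (register_region, offset)."""
    register_region = bool(value & (1 << 30))
    offset = value & 0x3FFFFFFF
    return register_region, offset


err_type = decode_gpmc_err_type(0x00000211)
register_region, offset = decode_gpmc_err_address(0x40413430)
print(err_type)
print(register_region, hex(GPMC_BASE + offset))
```

    Under those assumptions, 0x00000211 decodes to a valid error for a read of an unsupported address, and 0x40413430 decodes to offset 0x413430 in the register region, which added to the GPMC base of 0x50000000 gives 0x50413430.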

    I'm going along with case 3 and some problems come up, maybe you can help me sort them out...

    I'm using CCS to debug the A15 core1 on which the RTOS jailhouse inmate is running, and I set a HW watchpoint on 0x50413430.

    When I intentionally read this address from the RTOS side, the watchpoint is hit, and when I let the core continue running I get the familiar dump from the kernel. The GPMC registers mentioned above hold exactly the same values as when the real error happens.

    When the error comes up spontaneously though, the watchpoint doesn't get caught, which brings me to the conclusion that the source is NOT the RTOS inmate running on A15 core1.

    Now I'm using CCS to debug the A15 core0 on which Linux is running as the root cell, and I set a HW watchpoint on 0x50413430.

    When I attempt to intentionally read this address with devmem2, I get the familiar dump from the kernel, but the watchpoint doesn't get caught.

    When the error comes up spontaneously, the watchpoint doesn't get caught either.

    In order to find out if the source of the dump is indeed the Linux, I first have to make sure that watchpoints work.

    Only after that I'll be able to determine if the origin of the real error is indeed the Linux side.

    Can you please advise?

    Thanks a lot,

    Nir.

  • Just wanted to add that when debugging A15 core0, I set a HW watchpoint in the RAM address space (0xB0000000) and read it with devmem2. The read was successful, and the watchpoint wasn't caught either.

    Thanks,

    Nir.

  • Hi,

    Thanks for your answer on 

    In this case, I really have no idea how to translate a physical address to a virtual address, and I don't know which process is causing the dump.

    But anyway, I took a different approach and decided to use the hypervisor root cell config to completely block Linux from accessing the GPMC registers address space.

    After doing so, I can't read or write the registers GPMC_ERR_TYPE and GPMC_ERR_ADDRESS because the hypervisor immediately parks cpu 0, obviously.

    However, the dump still happens.

    No other core or DSP is activated besides A15 core0 and core1.

    Any idea what else I can test?

    Thanks,

    Nir.

  • Hi Nir,

    Nir Geller said:

    But anyway, I took a different approach and decided to use the hypervisor root cell config to completely block Linux from accessing the GPMC registers address space.

    After doing so, I can't read or write the registers GPMC_ERR_TYPE and GPMC_ERR_ADDRESS because the hypervisor immediately parks cpu 0, obviously.

    However, the dump still happens.

    Does the dump happen only when you run RTOS on core1, or also without it? If it happens without RTOS, the access must be happening in Linux on core0; otherwise, it is in RTOS on core1. Does that make sense?

  • Hi,

    The dump happens when I run RTOS on core1.

    The thing is that I ran the RTOS inmate while debugging core1 with CCS, with a HW watchpoint placed on the address 0x50413430.

    When the dump came out the watchpoint didn't get caught, which leads to the conclusion that core1 doesn't perform the read attempt.

    What else might be causing this? 

  • Hi Nir,

    I am a Linux guy and not familiar with RTOS. But I believe that when you attach CCS/JTAG and set the HW watchpoint, it only monitors the activity between A15 core1 and L3. It won't catch any access from other initiators, for example DSP or DMA...

  • That's my understanding as well.

    Therefore, when I run the RTOS inmate on A15 core1, debug core1 with CCS with a HW watchpoint set on some address, and the watchpoint doesn't get caught, I conclude that RTOS is not responsible for the dump from the Linux kernel.

    DSP is not activated.

    Can you tell me more about DMA? Can I trap accesses made by DMA to the GPMC register address space?

    Thanks.

  • Hi Nir,

    Here is the response I got internally about DMA access to the GPMC address space: based on the ARM documentation, a hardware breakpoint is a comparator between a CPU bus access and a memory address, so it is almost certain that a DMA access will not trigger the breakpoint (watchpoint).