Linux/DRA718: Crash during Linux system boot up

Part Number: DRA718

Tool/software: Linux

Hi TI team,

While running Linux on the DRA718, two crashes happen. Please help check them:

1. Since we're not using IPU1, how do we disable this function?

------------[ cut here ]------------
[ 0.241593] WARNING: CPU: 0 PID: 1 at /mnt/yocto/yocto_repo/build/arago-tmp-external-linaro-toolchain/work-shared/delphi-jlr-isp-proto/kernel-source/drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x264/0x37c()
[ 0.241601] 44000000.ocp:L3 Custom Error: MASTER MPU TARGET IPU1 (Idle): Data Access in User mode during Functional access
[ 0.241606] Modules linked in:
[ 0.241617] CPU: 0 PID: 1 Comm: swapper Not tainted 4.4.45-g89944627d5 #1
[ 0.241622] Hardware name: Generic DRA72X (Flattened Device Tree)
[ 0.241646] [<c00162ec>] (unwind_backtrace) from [<c0013e48>] (show_stack+0x10/0x14)
[ 0.241661] [<c0013e48>] (show_stack) from [<c00331fc>] (warn_slowpath_common+0x80/0xac)
[ 0.241673] [<c00331fc>] (warn_slowpath_common) from [<c0033268>] (warn_slowpath_fmt+0x40/0x64)
[ 0.241684] [<c0033268>] (warn_slowpath_fmt) from [<c028ee78>] (l3_interrupt_handler+0x264/0x37c)
[ 0.241698] [<c028ee78>] (l3_interrupt_handler) from [<c0068e14>] (handle_irq_event_percpu+0x9c/0x158)
[ 0.241710] [<c0068e14>] (handle_irq_event_percpu) from [<c0068f08>] (handle_irq_event+0x38/0x5c)
[ 0.241722] [<c0068f08>] (handle_irq_event) from [<c006bd04>] (handle_fasteoi_irq+0xe0/0x1a8)
[ 0.241733] [<c006bd04>] (handle_fasteoi_irq) from [<c006856c>] (generic_handle_irq+0x24/0x34)
[ 0.241744] [<c006856c>] (generic_handle_irq) from [<c00687d0>] (__handle_domain_irq+0x70/0xdc)
[ 0.241754] [<c00687d0>] (__handle_domain_irq) from [<c0009564>] (gic_handle_irq+0x38/0x64)
[ 0.241762] [<c0009564>] (gic_handle_irq) from [<c00148c0>] (__irq_svc+0x40/0x74)

2. Please help check this one as well:

[ 4.654477] ------------[ cut here ]------------
[ 4.659128] WARNING: CPU: 0 PID: 212 at /mnt/yocto/yocto_repo/build/arago-tmp-external-linaro-toolchain/work-shared/delphi-jlr-isp-proto/kernel-source/lib/dma-debug.c:1169 check_for_stack+0xbc/0xf8()
[ 5.133014] omap-dma-engine 4a056000.dma-controller: DMA-API: device driver maps memory from stack [addr=d9565de0]
[ 5.242602] Modules linked in:
[ 5.245685] CPU: 0 PID: 212 Comm: jffs2_gcd_mtd24 Tainted: G W 4.4.45-g89944627d5 #1
[ 5.372601] Hardware name: Generic DRA72X (Flattened Device Tree)
[ 5.378741] [<c00162ec>] (unwind_backtrace) from [<c0013e48>] (show_stack+0x10/0x14)
[ 5.399409] [<c0013e48>] (show_stack) from [<c00331fc>] (warn_slowpath_common+0x80/0xac)
[ 5.409587] [<c00331fc>] (warn_slowpath_common) from [<c0033268>] (warn_slowpath_fmt+0x40/0x64)
[ 5.418528] [<c0033268>] (warn_slowpath_fmt) from [<c0287330>] (check_for_stack+0xbc/0xf8)
[ 5.426959] [<c0287330>] (check_for_stack) from [<c028876c>] (debug_dma_map_sg+0x10c/0x18c)
[ 5.435464] [<c028876c>] (debug_dma_map_sg) from [<c034215c>] (spi_map_buf+0x304/0x348)
[ 5.443618] [<c034215c>] (spi_map_buf) from [<c0343a40>] (spi_flash_read+0x160/0x218)
[ 5.451486] [<c0343a40>] (spi_flash_read) from [<c032b8e0>] (m25p80_read+0xa4/0x1d4)
[ 5.459445] [<c032b8e0>] (m25p80_read) from [<c032bdbc>] (spi_nor_read+0x74/0xc8)
[ 5.468864] [<c032bdbc>] (spi_nor_read) from [<c03262a8>] (part_read+0x4c/0x8

Thanks.

  • Hi,

    In the file "arch/arm/boot/dts/dra71-evm.dts" (or the .dts you actually use for your board), can you change this:

    &ipu1 {
        status = "okay";
        memory-region = <&ipu1_cma_pool>;
    };

    to:

    &ipu1 {
        status = "disabled";
        memory-region = <&ipu1_cma_pool>;
    };

    Regards,
    Yordan
  • Hi,

    Sorry, my earlier message was wrong. We load IPU1 in MLO and we are not using IPU2. How do we fix the first crash?

    BRs
    Jeremy
  • Hi Jeremy,

    Do you use Early Boot and Late Attach? Have you checked this document:
    processors.wiki.ti.com/.../Early_Boot_and_Late_Attach_in_Linux

    Regards,
    Yordan
  • Hi Yordan,

    We have already set up loading IPU1 in Early Boot.
    What I'm asking about is the two crashes in the kernel; please help check them.
    1.
    ------------[ cut here ]------------
    [ 0.241593] WARNING: CPU: 0 PID: 1 at /mnt/yocto/yocto_repo/build/arago-tmp-external-linaro-toolchain/work-shared/delphi-jlr-isp-proto/kernel-source/drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x264/0x37c()
    [ 0.241601] 44000000.ocp:L3 Custom Error: MASTER MPU TARGET IPU1 (Idle): Data Access in User mode during Functional access
    [ 0.241606] Modules linked in:
    [ 0.241617] CPU: 0 PID: 1 Comm: swapper Not tainted 4.4.45-g89944627d5 #1
    [ 0.241622] Hardware name: Generic DRA72X (Flattened Device Tree)
    [ 0.241646] [<c00162ec>] (unwind_backtrace) from [<c0013e48>] (show_stack+0x10/0x14)
    [ 0.241661] [<c0013e48>] (show_stack) from [<c00331fc>] (warn_slowpath_common+0x80/0xac)
    [ 0.241673] [<c00331fc>] (warn_slowpath_common) from [<c0033268>] (warn_slowpath_fmt+0x40/0x64)



    2.
    [ 4.654477] ------------[ cut here ]------------
    [ 4.659128] WARNING: CPU: 0 PID: 212 at /mnt/yocto/yocto_repo/build/arago-tmp-external-linaro-toolchain/work-shared/delphi-jlr-isp-proto/kernel-source/lib/dma-debug.c:1169 check_for_stack+0xbc/0xf8()
    [ 5.133014] omap-dma-engine 4a056000.dma-controller: DMA-API: device driver maps memory from stack [addr=d9565de0]
    [ 5.242602] Modules linked in:
    [ 5.245685] CPU: 0 PID: 212 Comm: jffs2_gcd_mtd24 Tainted: G W 4.4.45-g89944627d5 #1
    [ 5.372601] Hardware name: Generic DRA72X (Flattened Device Tree)

    Thanks

    BRs
    Jeremy
  • Hi Jeremy,

    Did you add the "ti,late-attach", "ti,no-idle-on-init" and "ti,no-reset-on-init" properties to the device tree as described here:
    processors.wiki.ti.com/.../Early_Boot_and_Late_Attach_in_Linux

    To late-attach IPU1, you should add them to the following nodes: ipu1, timer11, timer7, timer8, mmu_ipu1.

    Regards,
    Yordan
  • Hi Yordan,

    I updated the device tree file, but the two crashes still exist. Attached is our device tree; please help check it. Thanks.

    BRs

    Jeremy

    2570.deviceTree.zip

  • Any update??
  • Hi Jeremy,

    What firmware are you running on IPU-1? Is it compiled using Vision-SDK or are you running the IPC examples defined as part of the Linux-SDK?
    Can you apply the patch in the link and let us know if the issue is still observed?

    git.ti.com/.../a4abe03555eda8373de671b272ea04ed90d15185

    Regards
    Shravan
  • Hi Shravan,

    On IPU-1 we run AUTOSAR from Vector, and IPU-1 is started by MLO.
    After applying your patch, the issue still exists. Please help check it.

    Thanks
    Jeremy
  • Hi Jeremy,

    If you're running AUTOSAR, you can bootstrap the core from MLO/U-Boot, but you can't late-attach. You need to disable the core from the Linux perspective.
    Please refer to section 3.6 of the Vision-SDK Linux user guide for more information.

    Regards
    Shravan
  • Hi Shravan,


    Could you please provide this guide? It seems I don't have it.
    Thanks.


    BRs
    Jeremy
  • Hi Jeremy,

    You can get the docs from the below link.
    software-dl.ti.com/.../PROCESSOR_SDK_VISION_03_06_00_00_Docs_Only.zip

    Please refer to section 3.6 of the file VisionSDK_Linux_UserGuide.pdf present in the Linux folder.

    Regards
    Shravan
  • Hi Shravan,

    In my project, IPU1 runs AUTOSAR and is booted by MLO, and IPU2 is not used.

    What I need to set in Linux device tree is:

    DISABLE_COMPLETE(ipu2);

    is that right?

    Also, I cannot find the PROC_IPU1_0_INCLUDE define in my Linux code.

    Please advise, thanks.

    BRs

    Jeremy

  • Hi Jeremy,

    If IPU-2 is not used, please add the below entry to your kernel DTB:

    &ipu2 {
        status = "disabled";
    };

    If IPU-1 is being loaded from MLO, you need to add below entries to your DTB:
    DISABLE_COMPLETE(ipu1);
    DISABLE_COMPLETE(mmu_ipu1);
    DISABLE_COMPLETE(<timer_ipu1>); // replace timer_ipu1 with the timer used by IPU1. If there are multiple timers, add DISABLE_COMPLETE for each one.

    If after all these changes, it still fails, please attach the Linux DTB being used (not DTS file, but final DTB file).

    Regards
    Shravan
  • Hi Shravan,

    Can you provide the definition of DISABLE_COMPLETE()? It seems it is not supported in my code.

    BTW, attached is my device tree.

    BRs

    Jeremy

    DTB.zip

  • Hi Jeremy,

    Here are the definitions you can add to the beginning of your DTS file.

    #define DISABLE_PRCM(label) &label { ti,no-idle; ti,no-reset-on-init; }
    #define DISABLE_COMPLETE(label) &label { status = "disabled"; ti,no-idle; ti,no-reset-on-init; }
    #define LATE_ATTACH(label) &label { ti,late-attach; ti,no-idle; ti,no-reset-on-init; }

    Can you please try these changes and let us know if they work? If not, please send the modified DTB file.

    Regards
    Shravan
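
    A minimal sketch of how these macros might be combined with the entries suggested earlier in the thread. It assumes the DTS is run through the C preprocessor (as the kernel build does for .dts files) and that IPU1 uses timer10, which is what the poster reports later in this thread. Substitute the timer(s) your firmware actually uses; this is an illustration, not a verified configuration for any particular board.

    ```dts
    /* Helper macro quoted from the reply above; place it near the top of the DTS. */
    #define DISABLE_COMPLETE(label) &label { status = "disabled"; ti,no-idle; ti,no-reset-on-init; }

    /* IPU2 is not used at all, so a plain disable is enough. */
    &ipu2 {
        status = "disabled";
    };

    /*
     * IPU1 is loaded by MLO and owned by its own firmware, so Linux should
     * neither probe it nor reset or idle it during boot.
     */
    DISABLE_COMPLETE(ipu1);
    DISABLE_COMPLETE(mmu_ipu1);
    DISABLE_COMPLETE(timer10); /* assumption: the timer used by IPU1 */
    ```

    Each macro call expands to an ordinary node override such as `&ipu1 { status = "disabled"; ti,no-idle; ti,no-reset-on-init; };`, so the entries can also be written out by hand if the preprocessor is not available.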
  • Hi Shravan,

    The crashes still exist. Please help check the device tree we are using.

    Also, IPU1 uses timer10.

    Thanks

    Jeremy

    dms_inuse_dtb.zip

  • Hi Shravan,

    Could you please help check the device tree?
    Thanks.

    BRs
    Jeremy
  • Hi Jeremy,

    For timer 10, I notice you've disabled the timer. Instead, can you please add the late-attach attribute to it? That is, please add:

    &timer10 {
        ti,no-idle;
        ti,no-reset-on-init;
        ti,late-attach;
    };

    If the error still exists, you would need to connect a debugger to IPU-1 to see where it's hung.

    Regards
    Shravan

  • Hi Shravan,

    The errors still exist.
    Our situation now is: IPU1 is started by MLO and works fine, and the A15 also works fine despite these two crashes. What we want is to eliminate the two crashes.

    For the first one, if we don't run IPU1, the crash is gone.

    For the second one, please also help check it:
    [ 4.654477] ------------[ cut here ]------------
    [ 4.659128] WARNING: CPU: 0 PID: 212 at /mnt/yocto/yocto_repo/build/arago-tmp-external-linaro-toolchain/work-shared/delphi-jlr-isp-proto/kernel-source/lib/dma-debug.c:1169 check_for_stack+0xbc/0xf8()
    [ 5.133014] omap-dma-engine 4a056000.dma-controller: DMA-API: device driver maps memory from stack [addr=d9565de0]
    [ 5.242602] Modules linked in:
    [ 5.245685] CPU: 0 PID: 212 Comm: jffs2_gcd_mtd24 Tainted: G W 4.4.45-g89944627d5 #1
    [ 5.372601] Hardware name: Generic DRA72X (Flattened Device Tree)
    [ 5.378741] [<c00162ec>] (unwind_backtrace) from [<c0013e48>] (show_stack+0x10/0x14)
    [ 5.399409] [<c0013e48>] (show_stack) from [<c00331fc>] (warn_slowpath_common+0x80/0xac)
    [ 5.409587] [<c00331fc>] (warn_slowpath_common) from [<c0033268>] (warn_slowpath_fmt+0x40/0x64)
    [ 5.418528] [<c0033268>] (warn_slowpath_fmt) from [<c0287330>] (check_for_stack+0xbc/0xf8)
    [ 5.426959] [<c0287330>] (check_for_stack) from [<c028876c>] (debug_dma_map_sg+0x10c/0x18c)
    [ 5.435464] [<c028876c>] (debug_dma_map_sg) from [<c034215c>] (spi_map_buf+0x304/0x348)
    [ 5.443618] [<c034215c>] (spi_map_buf) from [<c0343a40>] (spi_flash_read+0x160/0x218)
    [ 5.451486] [<c0343a40>] (spi_flash_read) from [<c032b8e0>] (m25p80_read+0xa4/0x1d4)
    [ 5.459445] [<c032b8e0>] (m25p80_read) from [<c032bdbc>] (spi_nor_read+0x74/0xc8)
    [ 5.468864] [<c032bdbc>] (spi_nor_read) from [<c03262a8>] (part_read+0x4c/0x8

    BRs
    Jeremy
  • Hi Jeremy,

    What is the first crash observed? From your above post this isn't clear.

    Is the second crash observed when you don't load IPU-1 from SPL? That means there's nothing running on IPU-1, correct? Is there anything running on any core other than the A15?

    The second error is observed when you're trying to map an SG list. Can you add some debug logs to the function spi_map_buf in drivers/spi/spi.c, showing what parameters are being passed to dma_map_sg within that function?

    Regards

    Shravan

  • Hi Shravan,

    The first crash exists when IPU1 is running and is gone when IPU1 is not running. I think there may be some resource conflict when IPU1 runs.

    The second one always exists. I will add debug logs.

    BRs
    Jeremy
  • Hi Jeremy,

    Are you using peripherals such as SPI from IPU-1? If yes, you will need to add the necessary flags to the kernel device tree.
    Do you have access to a debugger? It would help to figure out where the IPU-1 firmware crash occurs during kernel execution.

    Regards
    Shravan
  • Hi Shravan,

    The A15 and IPU1 are working fine, and these two errors have no impact on the functioning of the A15 and M4. We want to remove the errors and warnings to reduce the system start-up time.

    BRs

    Jeremy 

  • Hi Jeremy,

    Understood. The warnings usually mean the IPU is accessing some module that isn't clocked because it's being reset by the kernel.
    To prevent these errors, you need to add the LATE_ATTACH / DISABLE_PRCM etc. flags to the modules being accessed by the IPU.
    To figure out which module is being reset by the kernel, you will need to connect a debugger and step through the kernel execution to find where the failure occurs.

    Regards
    Shravan
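
    The per-module flagging described above might look like the following sketch. The peripheral label `mcspi2` is purely hypothetical and stands in for whatever module the IPU firmware turns out to access; the thread does not identify it. The macros are the ones quoted earlier in this thread, and `timer10` is the timer the poster reports IPU1 using.

    ```dts
    /* Helper macros quoted from earlier in this thread. */
    #define DISABLE_PRCM(label) &label { ti,no-idle; ti,no-reset-on-init; }
    #define LATE_ATTACH(label) &label { ti,late-attach; ti,no-idle; ti,no-reset-on-init; }

    /* Timer used by the IPU1 firmware: flag it so Linux does not reset or idle it. */
    LATE_ATTACH(timer10);

    /*
     * Hypothetical peripheral accessed by the IPU1 firmware. Replace mcspi2
     * with the module a debugger session shows being reset by the kernel.
     */
    DISABLE_PRCM(mcspi2);
    ```

    One entry is needed per module the firmware touches; a module that Linux resets or idles while the IPU is still using it is the usual source of the L3 error described above.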