TDA2P-ACD: Linux Kernel panic during shutdown, regarding shared memory allocation

Part Number: TDA2P-ACD


Hi, everybody,

I am using the PROCESSOR_SDK_VISION_03_05_00_00 package. To run a use case under Linux and let the links that make up the use case communicate, shared memory is allocated with the OSA_memAllocSR routine. The memory is allocated within Shared Region 1 (SR1). If the use case uses up to 189MB of memory allocated this way, it runs and stops correctly, and the shutdown command "shutdown -h -P now" works as well. However, if the use case allocates more than 189MB within SR1 and writes to it (e.g. with memset), it still runs and stops correctly, but the same shutdown command then ends in a "Kernel panic" message.

Within the PROCESSOR_SDK_VISION_03_05_00_00/vision_sdk/apps/build/tda2px/mem_segment_definition_linux.xs file, the SR1 size is set as follows by default:

SR1_FRAME_BUFFER_SIZE       = 300*MB;

The configured 300MB is well above 189MB, yet using more than 189MB leads to the error.

The "Kernel panic" message is as follows:

[ 65.070824] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[ 65.078947] pgd = edf88000
[ 65.081671] [00000010] *pgd=00000000
[ 65.085262] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 65.090592] Modules linked in: memcache(O) xfs libcrc32c sd_mod cmba xhci_plat_hcd xhci_hcd rpmsg_proto usbcore dwc3 udc_core usb_common virtio_rpmsg_bus rpmsg_core xfrm_user xfrm4_tunnel ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo bluetooth ecdh_generic snd_soc_simple_card pps_gpio snd_soc_simple_card_utils extcon_usb_gpio ntb_hw_switchtec smartpqi scsi_transport_sas ntb ahci_platform switchtec libahci_platform phy_omap_usb2 libahci omap_aes_driver libata scsi_mod omap_sham omap_des des_generic ov490 crypto_engine snd_soc_tlv320aic3x v4l2_fwnode omap_crypto dwc3_omap ov1063x omap_remoteproc remoteproc sch_fq_codel
[ 65.145494] CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: G W O 4.14.103 #1
[ 65.153351] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 65.159465] task: eec68000 task.stack: eec66000
[ 65.164014] PC is at ti_pipe3_exit+0x5c/0x134
[ 65.168388] LR is at of_device_is_compatible+0x4c/0x54
[ 65.173542] pc : [<c041ef4c>] lr : [<c069a9a4>] psr: 600f0013
[ 65.179830] sp : eec67da8 ip : eec67d88 fp : eec67dcc
[ 65.185074] r10: c0af3d34 r9 : c0d58020 r8 : eee0cc44
[ 65.190315] r7 : c0d9476c r6 : ee4951a8 r5 : ee495000 r4 : ee490690
[ 65.196865] r3 : 00000000 r2 : 00000000 r1 : 600f0013 r0 : 00000000
[ 65.203416] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 65.210576] Control: 10c5387d Table: adf8806a DAC: 00000051
[ 65.216341] Process systemd-shutdow (pid: 1, stack limit = 0xeec66210)
[ 65.222893] Stack: (0xeec67da8 to 0xeec68000)
[ 65.227266] 7da0: eec67dcc eec67db8 c041d774 c0562388 c041eef0 ee495000
[ 65.235474] 7dc0: eec67dec eec67dd0 c041e4a4 c041eefc 00000000 00000002 ee4ae410 c0d9476c
[ 65.243682] 7de0: eec67e0c eec67df0 c045bdcc c041e418 eee0cc10 ee4ae410 eee0cc10 c0d9476c
[ 65.251892] 7e00: eec67e24 eec67e10 c045c290 c045bd98 eee0cc1c eee0ca10 eec67e34 eec67e28
[ 65.260100] 7e20: c0558b3c c045c260 eec67e6c eec67e38 c0555478 c0558b24 00000000 c0af3d24
[ 65.268310] 7e40: c0d15dfc 00000000 4321fedc c0d15dfc 8220d600 fee1dead eec66000 00000058
[ 65.276519] 7e60: eec67e7c eec67e70 c014c8ac c0555304 eec67fa4 eec67e80 c014ca78 c014c87c
[ 65.284727] 7e80: 00000027 00000000 eeca9550 00000002 eec66000 00000092 eec67ed4 eec67ea8
[ 65.292938] 7ea0: c02382d4 c0279d0c 00000000 edc94480 00000027 eec67ee8 eec67f68 00000005
[ 65.301145] 7ec0: eec66000 00000092 eec67f5c eec67ed8 c023843c c02263d8 eec67ee4 eec67ee8
[ 65.309354] 7ee0: bea1cd74 00000000 00000001 00000000 00000027 eec67f00 00000005 c0114378
[ 65.317561] 7f00: bea1d248 00000004 bea1df7d 00000010 bea1d25c 00000005 bea1d30c 0000000d
[ 65.325771] 7f20: 004fc1b4 00000001 eec67f54 eec67f38 c0255004 c0833e48 eec67f5c eec67f48
[ 65.333979] 7f40: edc94480 edc94480 00000000 bea1d28c eec67f94 eec67f60 c02384c0 c02383bc
[ 65.342189] 7f60: 00000000 bea1dbe0 00000000 00000000 eec66000 00000000 bea1d30c 0050e05c
[ 65.350398] 7f80: 00000092 ffffffff 0050f0a8 bea1dbe0 00000058 c0107f44 00000000 eec67fa8
[ 65.358606] 7fa0: c0107d40 c014c944 ffffffff 0050f0a8 fee1dead 28121969 4321fedc 8220d600
[ 65.366814] 7fc0: ffffffff 0050f0a8 bea1dbe0 00000058 0050f0f8 0050f0c8 4321fedc 00000000
[ 65.375024] 7fe0: 00000058 bea1db4c b6f06e3d b6e8e996 600f0030 fee1dead 00000000 00000000
[ 65.383230] Backtrace:
[ 65.385688] [<c041eef0>] (ti_pipe3_exit) from [<c041e4a4>] (phy_exit+0x98/0xbc)
[ 65.393023] r5:ee495000 r4:c041eef0
[ 65.396613] [<c041e40c>] (phy_exit) from [<c045bdcc>] (dra7xx_pcie_disable_phy+0x40/0x4c)
[ 65.404823] r7:c0d9476c r6:ee4ae410 r5:00000002 r4:00000000
[ 65.410503] [<c045bd8c>] (dra7xx_pcie_disable_phy) from [<c045c290>] (dra7xx_pcie_shutdown+0x3c/0x40)
[ 65.419758] r7:c0d9476c r6:eee0cc10 r5:ee4ae410 r4:eee0cc10
[ 65.425442] [<c045c254>] (dra7xx_pcie_shutdown) from [<c0558b3c>] (platform_drv_shutdown+0x24/0x28)
[ 65.434520] r5:eee0ca10 r4:eee0cc1c
[ 65.438109] [<c0558b18>] (platform_drv_shutdown) from [<c0555478>] (device_shutdown+0x180/0x220)
[ 65.446932] [<c05552f8>] (device_shutdown) from [<c014c8ac>] (kernel_power_off+0x3c/0x78)
[ 65.455140] r10:00000058 r9:eec66000 r8:fee1dead r7:8220d600 r6:c0d15dfc r5:4321fedc
[ 65.462997] r4:00000000
[ 65.465538] [<c014c870>] (kernel_power_off) from [<c014ca78>] (SyS_reboot+0x140/0x1f4)
[ 65.473487] [<c014c938>] (SyS_reboot) from [<c0107d40>] (ret_fast_syscall+0x0/0x54)
[ 65.481173] r8:c0107f44 r7:00000058 r6:bea1dbe0 r5:0050f0a8 r4:ffffffff
[ 65.487901] Code: eb09ee85 e3500000 1a000011 e5942000 (e5923010)
[ 65.494105] ---[ end trace 3fbb92008af1a542 ]---
[ 65.499362] systemd-shutdow: 8 output lines suppressed due to ratelimiting
[ 65.506319] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 65.506319]
[ 65.515495] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 65.515495]

The backtrace shows that the fault occurs in the ti_pipe3_exit function.
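
The register dump and the "Code:" line of the log are enough to see what the faulting instruction did. As a minimal sketch (the helper name decode_ldr_imm is made up for illustration, and it handles only the plain immediate-offset form of the A32 LDR/STR encoding), the last two instruction words can be decoded like this:

```python
def decode_ldr_imm(word: int) -> str:
    """Decode an A32 LDR/STR instruction with a plain immediate offset.

    Hypothetical helper for illustration; every other encoding is rejected.
    """
    cond = (word >> 28) & 0xF
    if cond != 0xE or (word >> 25) & 0b111 != 0b010:
        raise ValueError("not an unconditional LDR/STR immediate instruction")
    p = (word >> 24) & 1  # 1 = offset is applied before the access
    u = (word >> 23) & 1  # 1 = offset is added, 0 = subtracted
    b = (word >> 22) & 1  # 1 = byte access, 0 = word access
    w = (word >> 21) & 1  # 1 = base register writeback
    l = (word >> 20) & 1  # 1 = load, 0 = store
    if p != 1 or w != 0:
        raise ValueError("only plain offset addressing is handled here")
    rn = (word >> 16) & 0xF  # base register
    rd = (word >> 12) & 0xF  # destination/source register
    imm = word & 0xFFF
    offset = imm if u else -imm
    mnemonic = ("ldr" if l else "str") + ("b" if b else "")
    return f"{mnemonic} r{rd}, [r{rn}, #{offset}]"


# Last two words of the "Code:" line; the one in parentheses is the faulting one.
for word in (0xE5942000, 0xE5923010):
    print(decode_ldr_imm(word))
# prints:
# ldr r2, [r4, #0]
# ldr r3, [r2, #16]
```

With r2 = 00000000 in the register dump, the second load reads address 0 + 16 = 0x10, which matches the reported fault address 00000010. So the word that ti_pipe3_exit loaded from the address in r4 was NULL when it was dereferenced. This shows what faulted, not why the pointer was NULL.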

Is there any way to circumvent this issue if I need more than 189MB within the SR1 to run the use case (using the shared memory or some other memory regions that can be used for communication between links)?

Thanks in advance,
Marko Gostović

  • Hi,

    Can you please confirm if accessing a specific portion of memory causes problem with the shutdown command?

    I suspect that the memory is not really reserved from the Linux kernel, hence it seems to be using it.

    Therefore, it might be causing corruption in the kernel data structs.

    Please make sure that the .xs file and the reserved memory DT node in kernel are describing the same chunk of memory.

    Regards,

    Nikhil D

  • Hi Marko,

    Is this issue resolved? Can you let me know the fix?

    Thanks

    RamPrasad

  • Hi, Nikhil,

    Considering the aforementioned .xs file, I have calculated the following:

    SR1_FRAME_BUFFER_ADDR = 0x84203000

    Having SR1_FRAME_BUFFER_SIZE = 300MB = 0x12C00000,

    the ending address of the SR1_FRAME_BUFFER is equal to 0x96E03000

    Now, let us consider the .dts file at the following path:

    PROCESSOR_SDK_VISION_03_05_00_00/ti_components/os_tools/linux/kernel/omap/arch/arm/boot/dts/dra76-evm-infoadas.dts

    Within this file, there are the following settings regarding the SR1:

    vsdk_sr1_mem: vsdk_sr1_mem@84000000 {
        compatible = "shared-dma-pool";
        reg = <0x0 0x84000000 0x0 0x15000000>;
        status = "okay";
    };
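
The two sets of numbers above can be checked against each other with a few lines of arithmetic. The following is only a sketch: the constant names and the helper range_contains are made up for illustration, and the values are copied from the .xs and .dts excerpts in this post.

```python
MB = 1024 * 1024

# Values from mem_segment_definition_linux.xs (as quoted in this thread).
SR1_START = 0x84203000
SR1_SIZE = 300 * MB

# Values from the vsdk_sr1_mem node in dra76-evm-infoadas.dts.
DT_START = 0x84000000
DT_SIZE = 0x15000000


def range_contains(outer_start: int, outer_size: int,
                   inner_start: int, inner_size: int) -> bool:
    """Return True if [inner_start, inner_start + inner_size) lies inside the outer range."""
    return (outer_start <= inner_start
            and inner_start + inner_size <= outer_start + outer_size)


sr1_end = SR1_START + SR1_SIZE  # first address after SR1
dt_end = DT_START + DT_SIZE     # first address after the reserved region

print(hex(SR1_SIZE))  # 0x12c00000
print(hex(sr1_end))   # 0x96e03000
print(hex(dt_end))    # 0x99000000
print(range_contains(DT_START, DT_SIZE, SR1_START, SR1_SIZE))  # True
```

Under these numbers, the reserved-memory node spans 0x84000000 to 0x99000000 (336MB) and fully covers the SR1 frame buffer, which ends at 0x96E03000.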

    Best regards,
    Marko Gostović

  • Thanks Marko,

    But I did not understand what was modified to make it work. Did you change the vsdk_sr1_mem length?

    Thanks

    RamPrasad

  • Hi,

    From the DTS, it looks like you are reserving enough memory from Linux.

    Could you confirm which range of addresses causes the failures when overwritten using memset?
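
One way to narrow this down is to bisect on how far into the buffer the test writes. The sketch below is not part of the SDK: first_failing_mb and the panics callback are hypothetical names, and the callback stands in for a manual trial (write that many MB, stop the application, run the shutdown command, observe the result). It also assumes the failure is monotonic, meaning that if writing N MB panics, any larger extent does too.

```python
from typing import Callable


def first_failing_mb(known_good: int, known_bad: int,
                     panics: Callable[[int], bool]) -> int:
    """Return the smallest write extent in MB for which panics(extent) is True.

    Requires panics(known_good) to be False and panics(known_bad) to be True.
    """
    lo, hi = known_good, known_bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if panics(mid):
            hi = mid  # failure already happens at mid, look lower
        else:
            lo = mid  # mid is still fine, look higher
    return hi


# Stand-in oracle for demonstration only; a real run needs one boot per trial.
print(first_failing_mb(189, 256, lambda mb: mb >= 190))  # 190
```

Between a known-good 189MB and a known-bad 256MB this needs at most seven trials at 1MB resolution.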

    Regards,

    Nikhil D

  • Hi, RamPrasad,

    The issue has not yet been resolved.


    In my last post I gave excerpts from the .xs and .dts files that are relevant to memory allocation in SR1.

    The "Kernel panic" message is printed when write operations are performed beyond the first 189MB of the block, which starts at physical address 0x84203000.

    As a first step, I allocate 256MB of memory within SR1 using the OSA_memAllocSR routine.
    When I print the pointer returned by this routine, I get the address 0xA1D24000. This is the virtual address of the first byte of the allocated space.
    The physical address of this same byte is 0x84203000, which is exactly the value of SR1_FRAME_BUFFER_ADDR in the .xs file.
    The last byte of the allocated space has the virtual address 0xB1D23FFF and the corresponding physical address 0x94202FFF.

    If I write anywhere beyond 0xA1D24000 + 189MB (virtual address), then stop the application and run the shutdown command, I get the "Kernel panic" message.
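
The addresses quoted above are consistent with a single constant offset between the virtual and physical views of the buffer. The following sketch recomputes them; the constant names and the helper virt_to_phys are made up for illustration, and the mapping is assumed to be linear and contiguous over the whole allocation, which the first-byte and last-byte values suggest but do not prove.

```python
MB = 1024 * 1024

VIRT_BASE = 0xA1D24000   # pointer returned by OSA_memAllocSR (from this post)
PHYS_BASE = 0x84203000   # matching physical address (from this post)
ALLOC_SIZE = 256 * MB    # size of the allocation
SAFE_SIZE = 189 * MB     # largest extent reported to shut down cleanly


def virt_to_phys(virt: int) -> int:
    """Translate a virtual address inside the allocation, assuming a linear mapping."""
    offset = virt - VIRT_BASE
    if not 0 <= offset < ALLOC_SIZE:
        raise ValueError("address is outside the allocation")
    return PHYS_BASE + offset


last_virt = VIRT_BASE + ALLOC_SIZE - 1
threshold_virt = VIRT_BASE + SAFE_SIZE

print(hex(last_virt), hex(virt_to_phys(last_virt)))
# 0xb1d23fff 0x94202fff
print(hex(threshold_virt), hex(virt_to_phys(threshold_virt)))
# 0xada24000 0x8ff03000
```

Under that assumption, the 189MB threshold corresponds to physical address 0x8FF03000, which is 0xFD000 bytes (just under 1MB) below 0x90000000. Whether that boundary has any significance is not established in this thread.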

    To run the use case I need more than 189MB within SR1, or within some other region from which memory can be allocated and accessed by links that communicate with each other in an "in-place" manner.

    The question is whether and how this "Kernel panic" issue can be overcome if I need more than 189MB of memory for my use case.

    If anything in this or the previous posts is still unclear, I will gladly explain the issue again.

    Best regards,
    Marko Gostović

  • Hi Marko,

    May I ask what the use case for running the shutdown command is?

    Regards,

    Nikhil D

  • Hi, Nikhil,

    The use case is intended to capture video from four cameras and store it on four SSDs. The use case itself does not issue the shutdown command. It is started by running the "apps.out" executable and stopped by pressing the "0" key. The use case runs and stops correctly, and the "apps.out" process exits as well.

    After "apps.out" gets stopped, the shutdown command can be invoked from command line, as I have mentioned within my first post in this thread. Namely, I run the following command from the command line: "shutdown -h -P now".

    At the end of the log, after the aforementioned shutdown command gets run, I get the "Kernel panic" log as described within my first post.

    The dataflow diagram corresponding to the use case I run is given as an attachment.

    The links marked with red color run on Linux OS, on A15 core.

    I would like to know whether there is any way to allocate additional memory regions that can be used for "in-place" processing, so that those regions can be shared between two or more links, since I run into the "Kernel panic" issue when I attempt to use more than 189MB of SR1.

    Best regards,
    Marko Gostović

  • Hello,

    I understood the steps to reproduce the issue.

    What I was asking is which production use case requires a clean shutdown of the system.

    Regards,

    Nikhil D

  • Hi, Nikhil,

    Actually, the shutdown issue itself is not paramount for us. However, it may indicate a memory allocation problem that could have consequences at later stages of development. We would therefore like to resolve it as soon as possible, to be sure we will not see such behavior during normal operation of our use cases. Hence, we would appreciate any ideas on how to provide more memory regions for the use cases.

    Best regards,
    Marko Gostović

  • Hi Marko,

    I understand the intent.

    I am not sure whether this will affect general Linux behavior, since the memory is actually carved out.

    I am also not sure whether the reboot command implementation uses any specific address.

    Regards,

    Nikhil D