Linux/AM5728: DSP CMA pool issues

Part Number: AM5728
Other Parts Discussed in Thread: BQ40Z60, DRA752, ASH

Tool/software: Linux

With Processor SDK 3.x kernels, the following DSP CMA pool allocations worked fine, but they no longer work with Processor SDK 4.x kernels:

reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	/* 0xe000 0000 - 0xe7ff ffff */
	dsp1_cma_pool: dsp1_cma@e0000000 {
		compatible = "shared-dma-pool";
		reg = <0x0 0xe0000000 0x0 0x8000000>;
		reusable;
		status = "okay";
	};

	/* 0xe800 0000 - 0xefff ffff */
	dsp2_cma_pool: dsp2_cma@e8000000 {
		compatible = "shared-dma-pool";
		reg = <0x0 0xe8000000 0x0 0x8000000>;
		reusable;
		status = "okay";
	};

	ipu1_cma_pool: ipu1_cma@df000000 {
		compatible = "shared-dma-pool";
		reg = <0x0 0xdf000000 0x0 0x800000>;
		reusable;
		status = "okay";
	};

	ipu2_cma_pool: ipu2_cma@df800000 {
		compatible = "shared-dma-pool";
		reg = <0x0 0xdf800000 0x0 0x800000>;
		reusable;
		status = "okay";
	};
};
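
For reference, here is a minimal Python sketch of how the `reg` cells above decode into address ranges. With `#address-cells = <2>` and `#size-cells = <2>`, each `reg` value is a 64-bit base followed by a 64-bit size, each split into two 32-bit cells. The `POOLS` table and the helper names are illustrative only; the numbers simply restate the node definitions.

```python
# Each entry restates a reg = <addr_hi addr_lo size_hi size_lo> property
# from the reserved-memory node above.
POOLS = {
    "dsp1_cma": (0x0, 0xE0000000, 0x0, 0x8000000),
    "dsp2_cma": (0x0, 0xE8000000, 0x0, 0x8000000),
    "ipu1_cma": (0x0, 0xDF000000, 0x0, 0x800000),
    "ipu2_cma": (0x0, 0xDF800000, 0x0, 0x800000),
}


def decode_reg(cells):
    """Combine two 32-bit address cells and two 32-bit size cells."""
    addr_hi, addr_lo, size_hi, size_lo = cells
    base = (addr_hi << 32) | addr_lo
    size = (size_hi << 32) | size_lo
    return base, size


def pool_ranges(pools):
    """Return (name, first_byte, last_byte) tuples sorted by base address."""
    ranges = []
    for name, cells in pools.items():
        base, size = decode_reg(cells)
        ranges.append((name, base, base + size - 1))
    return sorted(ranges, key=lambda r: r[1])


def find_overlaps(ranges):
    """Return name pairs of neighbouring (sorted) pools whose ranges overlap."""
    return [
        (a[0], b[0])
        for a, b in zip(ranges, ranges[1:])
        if b[1] <= a[2]
    ]


RANGES = pool_ranges(POOLS)
for name, first, last in RANGES:
    size_mib = (last - first + 1) >> 20
    print(f"{name}: 0x{first:08x} - 0x{last:08x} ({size_mib} MiB)")
print("overlaps:", find_overlaps(RANGES))
```

The four pools come out contiguous from 0xdf000000 to 0xefffffff with no overlap, which agrees with the range comments in the device tree.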

The DSPs seem to load fine but when I go to disable them (via /sys/bus/platform/drivers/omap-rproc/unbind) I get the following:
-bash-4.3# echo 40800000.dsp > /sys/bus/platform/drivers/omap-rproc/unbind
[ 537.629639] Unable to handle kernel paging request at virtual address e9a6307c
[ 537.636925] pgd = dcccd200
[ 537.639643] [e9a6307c] *pgd=80000080007003, *pmd=9cc08003, *pte=00000000
[ 537.646528] Internal error: Oops: a07 [#1] PREEMPT SMP ARM
[ 537.652038] Modules linked in: virtio_rpmsg_bus rpmsg_core omap_remoteproc remoteproc virtio virtio_ring usb_f_ecm g_ether usb_f_rndis u_ether libcomposite extcon_palmas gcm ccm arc4 dwc3 udc_core usb_common uio_pdrv_genirq uio phy_omap_usb2 dwc3_omap bridge stp llc xt_tcpudp ipv6 iptable_filter ip_tables x_tables hw_info bq40z60_battery mpu9250(C) extcon_usb_gpio extcon_core rtc_isl1208 omap_wdt
[ 537.690367] CPU: 0 PID: 459 Comm: bash Tainted: G WC 4.9.69 #3
[ 537.697273] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 537.703392] task: dccd5280 task.stack: dd5dc000
[ 537.707969] PC is at rproc_free_vring+0x10c/0x15c [remoteproc]
[ 537.713828] LR is at 0x0
[ 537.716373] pc : [<bf195424>] lr : [<00000000>] psr: 60020013
[ 537.716373] sp : dd5ddce0 ip : dd59f240 fp : dd5ddd1c
[ 537.727902] r10: c0214e44 r9 : e0000000 r8 : dd4d9800
[ 537.733146] r7 : de2b4c10 r6 : dd6d71f0 r5 : e9a63064 r4 : 00003000
[ 537.739704] r3 : e9a63064 r2 : ffffffff r1 : 00000000 r0 : 00000064
[ 537.746261] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 537.753424] Control: 30c5387d Table: 9cccd200 DAC: fffffffd
[ 537.759193] Process bash (pid: 459, stack limit = 0xdd5dc218)
[ 537.764963] Stack: (0xdd5ddce0 to 0xdd5de000)
[ 537.769338] dce0: 00000000 dd5dc000 de001e40 e0000000 00014ad2 dd6d7000 dd4d9a04 de2b4c10
[ 537.777550] dd00: e1410000 00100000 c0214e44 dd4d9800 dd5ddd34 dd5ddd20 bf195770 bf195324
[ 537.785764] dd20: dd4d99ec dd4d9a04 dd5ddd94 dd5ddd38 bf195c68 bf195758 00000000 c065bb8c
[ 537.793978] dd40: de2b4ce8 bf198494 bf1980fc c1207bd4 dd4d99f0 dd4d9800 dd4d9820 00000000
[ 537.802191] dd60: dd5ddd94 dd5ddd70 bf1a3b8c dd4d9a0c dd4d9800 dd4d9a0c dd4d99b4 dd4d99bc
[ 537.810405] dd80: dd4d9820 00000000 dd5dddbc dd5ddd98 bf1963d4 bf1957d0 de2b4c00 dd4d9800
[ 537.818619] dda0: 00000000 bf1a5878 00000034 00000000 dd5ddddc dd5dddc0 bf196560 bf196344
[ 537.826833] ddc0: de2b4c00 dd4d9800 00000000 bf1a5878 dd5dddf4 dd5ddde0 bf1a39bc bf196498
[ 537.835048] dde0: de2b4c10 de2b4c10 dd5dde0c dd5dddf8 c06533f0 bf1a39ac de2b4c10 de2b4c44
[ 537.843260] de00: dd5dde34 dd5dde10 c0651afc c06533d0 c12336c8 de2b4c10 0000000d bf1a5878
[ 537.851476] de20: 00000000 00000000 dd5dde44 dd5dde38 c0651ba8 c06519b4 dd5dde64 dd5dde48
[ 537.859689] de40: c064fda0 c0651b9c c064fd20 0000000d dd5ddf78 dd563340 dd5dde7c dd5dde68
[ 537.867902] de60: c064f18c c064fd2c c064f164 0000000d dd5dde94 dd5dde80 c03b141c c064f170
[ 537.876117] de80: dcccb380 0000000d dd5dded4 dd5dde98 c03b0bc8 c03b13e0 00000000 00000000
[ 537.884329] dea0: c034d6b0 dcccb38c dda66410 dcce9480 c03b0aec dd5ddf78 0000000d 00000000
[ 537.892545] dec0: 0000000d 00000000 dd5ddf44 dd5dded8 c033c4d4 c03b0af8 00000000 0000000a
[ 537.900760] dee0: dcdf0200 dd490dc0 00000000 dccc9300 0000000d 00000000 dcce9480 00000001
[ 537.908971] df00: dd5ddf44 dd5ddf10 c033d0d0 c04d155c 00000001 dcdf0200 dccc9300 00000001
[ 537.917185] df20: 0000000d dcce9480 00115408 dd5ddf78 00000000 0000000d dd5ddf74 dd5ddf48
[ 537.925399] df40: c033d33c c033c4ac dd5ddf74 dd5ddf58 dcce9480 dcce9480 00000000 00000000
[ 537.933613] df60: 00115408 0000000d dd5ddfa4 dd5ddf78 c033e13c c033d29c 00000000 00000000
[ 537.941829] df80: 0000000d 00115408 b6f1ed58 00000004 c02084e4 dd5dc000 00000000 dd5ddfa8
[ 537.950044] dfa0: c0208320 c033e104 0000000d 00115408 00000001 00115408 0000000d 00000000
[ 537.958258] dfc0: 0000000d 00115408 b6f1ed58 00000004 00000000 00000000 b6f8b000 00000000
[ 537.966471] dfe0: 00000000 bea4eaa4 b6e826c9 b6ebe006 00020030 00000001 6d760044 6174735f
[ 537.974681] Backtrace:
[ 537.977179] [<bf195318>] (rproc_free_vring [remoteproc]) from [<bf195770>] (rproc_vdev_release+0x24/0x78 [remoteproc])
[ 537.987924] r10:dd4d9800 r9:c0214e44 r8:00100000 r7:e1410000 r6:de2b4c10 r5:dd4d9a04
[ 537.995786] r4:dd6d7000
[ 537.998359] [<bf19574c>] (rproc_vdev_release [remoteproc]) from [<bf195c68>] (rproc_resource_cleanup+0x4a4/0x554 [remoteproc])
[ 538.009796] r5:dd4d9a04 r4:dd4d99ec
[ 538.013416] [<bf1957c4>] (rproc_resource_cleanup [remoteproc]) from [<bf1963d4>] (rproc_shutdown+0x9c/0x154 [remoteproc])
[ 538.024423] r10:00000000 r9:dd4d9820 r8:dd4d99bc r7:dd4d99b4 r6:dd4d9a0c r5:dd4d9800
[ 538.032285] r4:dd4d9a0c
[ 538.034859] [<bf196338>] (rproc_shutdown [remoteproc]) from [<bf196560>] (rproc_del+0xd4/0xe0 [remoteproc])
[ 538.044644] r9:00000000 r8:00000034 r7:bf1a5878 r6:00000000 r5:dd4d9800 r4:de2b4c00
[ 538.052446] [<bf19648c>] (rproc_del [remoteproc]) from [<bf1a39bc>] (omap_rproc_remove+0x1c/0x34 [omap_remoteproc])
[ 538.062925] r7:bf1a5878 r6:00000000 r5:dd4d9800 r4:de2b4c00
[ 538.068621] [<bf1a39a0>] (omap_rproc_remove [omap_remoteproc]) from [<c06533f0>] (platform_drv_remove+0x2c/0x44)
[ 538.078840] r5:de2b4c10 r4:de2b4c10
[ 538.082436] [<c06533c4>] (platform_drv_remove) from [<c0651afc>] (device_release_driver_internal+0x154/0x1e8)
[ 538.092391] r5:de2b4c44 r4:de2b4c10
[ 538.095986] [<c06519a8>] (device_release_driver_internal) from [<c0651ba8>] (device_release_driver+0x18/0x1c)
[ 538.105944] r9:00000000 r8:00000000 r7:bf1a5878 r6:0000000d r5:de2b4c10 r4:c12336c8
[ 538.113725] [<c0651b90>] (device_release_driver) from [<c064fda0>] (unbind_store+0x80/0x100)
[ 538.122205] [<c064fd20>] (unbind_store) from [<c064f18c>] (drv_attr_store+0x28/0x34)
[ 538.129983] r7:dd563340 r6:dd5ddf78 r5:0000000d r4:c064fd20
[ 538.135675] [<c064f164>] (drv_attr_store) from [<c03b141c>] (sysfs_kf_write+0x48/0x4c)
[ 538.143624] r5:0000000d r4:c064f164
[ 538.147219] [<c03b13d4>] (sysfs_kf_write) from [<c03b0bc8>] (kernfs_fop_write+0xdc/0x1dc)
[ 538.155431] r5:0000000d r4:dcccb380
[ 538.159026] [<c03b0aec>] (kernfs_fop_write) from [<c033c4d4>] (__vfs_write+0x34/0x120)
[ 538.166977] r10:00000000 r9:0000000d r8:00000000 r7:0000000d r6:dd5ddf78 r5:c03b0aec
[ 538.174836] r4:dcce9480
[ 538.177381] [<c033c4a0>] (__vfs_write) from [<c033d33c>] (vfs_write+0xac/0x170)
[ 538.184724] r9:0000000d r8:00000000 r7:dd5ddf78 r6:00115408 r5:dcce9480 r4:0000000d
[ 538.192504] [<c033d290>] (vfs_write) from [<c033e13c>] (SyS_write+0x44/0x98)
[ 538.199585] r9:0000000d r8:00115408 r7:00000000 r6:00000000 r5:dcce9480 r4:dcce9480
[ 538.207368] [<c033e0f8>] (SyS_write) from [<c0208320>] (ret_fast_syscall+0x0/0x40)
[ 538.214973] r9:dd5dc000 r8:c02084e4 r7:00000004 r6:b6f1ed58 r5:00115408 r4:0000000d
[ 538.222752] Code: e3e02000 e5900230 e0833000 e0835105 (e5c51018)
[ 538.229805] ---[ end trace 6ee89626842b6058 ]---
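
One way to read the register dump above: the bracketed word in the `Code:` line is the instruction that faulted. The Python sketch below decodes it as an ARM (A32) immediate-offset load/store, using values copied from the oops. The helper name is illustrative, and the decoder deliberately handles only this one instruction class.

```python
FAULT_ADDR = 0xE9A6307C   # "Unable to handle kernel paging request at ..."
FAULT_INSN = 0xE5C51018   # bracketed word in the "Code:" line
REGS = {1: 0x00000000, 5: 0xE9A63064}   # r1 and r5 from the register dump


def decode_ldr_str_imm(word):
    """Decode an A32 LDR/STR/LDRB/STRB with a 12-bit immediate offset."""
    if (word >> 26) & 0b11 != 0b01 or (word >> 25) & 1:
        raise ValueError("not an immediate-offset load/store")
    return {
        "load": bool((word >> 20) & 1),   # L bit: 1 = load, 0 = store
        "byte": bool((word >> 22) & 1),   # B bit: 1 = byte-sized access
        "add": bool((word >> 23) & 1),    # U bit: 1 = offset is added
        "rn": (word >> 16) & 0xF,         # base register
        "rd": (word >> 12) & 0xF,         # source/destination register
        "imm": word & 0xFFF,              # 12-bit immediate offset
    }


INSN = decode_ldr_str_imm(FAULT_INSN)
OFFSET = INSN["imm"] if INSN["add"] else -INSN["imm"]
TARGET = REGS[INSN["rn"]] + OFFSET

mnemonic = ("ldr" if INSN["load"] else "str") + ("b" if INSN["byte"] else "")
print(f"{mnemonic} r{INSN['rd']}, [r{INSN['rn']}, #{OFFSET:#x}]")
print(f"target address: {TARGET:#010x}")
print("matches fault address:", TARGET == FAULT_ADDR)
```

The word decodes to a byte store of r1 (zero) to r5 + 0x18, and that sum equals the faulting address. Together with `PC is at rproc_free_vring+0x10c` and `*pte=00000000`, this suggests the oops is a write issued from rproc_free_vring to an address that has no page table entry.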

Any help as to what changed with the newer kernel and this memory address range would be appreciated.

  • Hi, Gerard,

    Let me take a look and get back to you.

    Rex

  • Thanks, Rex.

    Looking at diffs, it looks like the code that causes this error was added to the rproc_resource_cleanup() function in remoteproc_core.c. Specifically, the kernel error seems to occur when it's cleaning up the remote vdev entries. I noticed that the earlier kernel versions did not have the block of code in question within rproc_resource_cleanup().

    For clarity, the old version I'm referring to is Processor SDK 03.0.3.00.04 / Linux 4.4.41 and the new version is Processor SDK 04.03.00.05 / Linux 4.9.69.

    Gerard

  • Gerard,

    Yes. I briefly diffed remoteproc_core.c against 3.3.0.4 and also poked around the dtsi file to see if there is any conflict in those CMA areas. I am checking with the developer to get a more thorough review of the code.

    Rex
  • Hi, Gerard,

    Could you provide the full kernel log?

    Rex
  • Starting kernel ...

    [ 0.000000] Booting Linux on physical CPU 0x0
    [ 0.000000] Linux version 4.9.69 () (gcc version 6.2.1 20161016 (Linaro GCC 6.2-2016.11) ) #3 SMP PREEMPT Tue Jul 17 17:35:21 EDT 2018
    [ 0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=30c5387d
    [ 0.000000] CPU: div instructions available: patching division code
    [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
    [ 0.000000] OF: fdt:Machine model: TI AM572x Custom
    [ 0.000000] Reserved memory: created CMA memory pool at 0x00000000df000000, size 8 MiB
    [ 0.000000] OF: reserved mem: initialized node ipu1_cma@df000000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created CMA memory pool at 0x00000000df800000, size 8 MiB
    [ 0.000000] OF: reserved mem: initialized node ipu2_cma@df800000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created CMA memory pool at 0x00000000e0000000, size 128 MiB
    [ 0.000000] OF: reserved mem: initialized node dsp1_cma@e0000000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created CMA memory pool at 0x00000000e8000000, size 128 MiB
    [ 0.000000] OF: reserved mem: initialized node dsp2_cma@e8000000, compatible id shared-dma-pool
    [ 0.000000] cma: Reserved 24 MiB at 0x00000000fd800000
    [ 0.000000] Memory policy: Data cache writealloc
    [ 0.000000] OMAP4: Map 0x000000027fd00000 to fe600000 for dram barrier
    [ 0.000000] DRA752 ES2.0
    [ 0.000000] percpu: Embedded 14 pages/cpu @daf83000 s26252 r8192 d22900 u57344
    [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 1034386
    [ 0.000000] Kernel command line: console=ttyO2,115200n8 vmalloc=512M root=/dev/mmcblk2p2 rw bootfs=/current_rootfs rootfstype=ext4 rootwait
    [ 0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
    [ 0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
    [ 0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
    [ 0.000000] Memory: 3784496K/4142080K available (8192K kernel code, 376K rwdata, 2488K rodata, 4096K init, 274K bss, 54480K reserved, 303104K cma-reserved, 3322880K highmem)
    [ 0.000000] Virtual kernel memory layout:
    [ 0.000000] vector : 0xffff0000 - 0xffff1000 ( 4 kB)
    [ 0.000000] fixmap : 0xffc00000 - 0xfff00000 (3072 kB)
    [ 0.000000] vmalloc : 0xe0000000 - 0xff800000 ( 504 MB)
    [ 0.000000] lowmem : 0xc0000000 - 0xdf800000 ( 504 MB)
    [ 0.000000] pkmap : 0xbfe00000 - 0xc0000000 ( 2 MB)
    [ 0.000000] modules : 0xbf000000 - 0xbfe00000 ( 14 MB)
    [ 0.000000] .text : 0xc0008000 - 0xc0a00000 (10208 kB)
    [ 0.000000] .init : 0xc0e00000 - 0xc1200000 (4096 kB)
    [ 0.000000] .data : 0xc1200000 - 0xc125e290 ( 377 kB)
    [ 0.000000] .bss : 0xc1260000 - 0xc12a4814 ( 275 kB)
    [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
    [ 0.000000] Preemptible hierarchical RCU implementation.
    [ 0.000000] Build-time adjustment of leaf fanout to 32.
    [ 0.000000] NR_IRQS:16 nr_irqs:16 16
    [ 0.000000] OMAP clockevent source: timer1 at 32786 Hz
    [ 0.000000] arm_arch_timer: Architected cp15 timer(s) running at 6.14MHz (virt).
    [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x16af5adb9, max_idle_ns: 440795202250 ns
    [ 0.000005] sched_clock: 56 bits at 6MHz, resolution 162ns, wraps every 4398046511023ns
    [ 0.000016] Switching to timer-based delay loop, resolution 162ns
    [ 0.000337] clocksource: 32k_counter: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 58327039986419 ns
    [ 0.000345] OMAP clocksource: 32k_counter at 32768 Hz
    [ 0.000753] Console: colour dummy device 80x30
    [ 0.000770] WARNING: Your 'console=ttyO2' has been replaced by 'ttyS2'
    [ 0.000777] This ensures that you still see kernel messages. Please
    [ 0.000782] update your kernel commandline.
    [ 0.000797] Calibrating delay loop (skipped), value calculated using timer frequency.. 12.29 BogoMIPS (lpj=61475)
    [ 0.000810] pid_max: default: 32768 minimum: 301
    [ 0.000891] Security Framework initialized
    [ 0.000929] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
    [ 0.000938] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
    [ 0.001530] CPU: Testing write buffer coherency: ok
    [ 0.001739] /cpus/cpu@0 missing clock-frequency property
    [ 0.001755] /cpus/cpu@1 missing clock-frequency property
    [ 0.001765] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
    [ 0.001782] Setting up static identity map for 0x80200000 - 0x80200060
    [ 0.170321] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
    [ 0.170410] Brought up 2 CPUs
    [ 0.170424] SMP: Total of 2 processors activated (24.59 BogoMIPS).
    [ 0.170431] CPU: All CPU(s) started in SVC mode.
    [ 0.170900] devtmpfs: initialized
    [ 0.198913] VFP support v0.3: implementor 41 architecture 4 part 30 variant f rev 0
    [ 0.199160] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
    [ 0.199175] futex hash table entries: 512 (order: 3, 32768 bytes)
    [ 0.204531] pinctrl core: initialized pinctrl subsystem
    [ 0.205319] NET: Registered protocol family 16
    [ 0.206203] DMA: preallocated 256 KiB pool for atomic coherent allocations
    [ 0.207282] omap_hwmod: l3_main_2 using broken dt data from ocp
    [ 0.440407] cpuidle: using governor ladder
    [ 0.470435] cpuidle: using governor menu
    [ 0.479768] OMAP GPIO hardware version 0.1
    [ 0.484825] GPIO line 103 (blue_led) hogged as output/low
    [ 0.484845] GPIO line 104 (red_led) hogged as output/high
    [ 0.484861] GPIO line 105 (green_led) hogged as output/high
    [ 0.484876] GPIO line 106 (orange_led) hogged as output/high
    [ 0.489080] GPIO line 164 (FPGA1_RF_RX_OVERLOAD) hogged as input
    [ 0.489098] GPIO line 167 (FPGA2_RF_RX_OVERLOAD) hogged as input
    [ 0.489115] GPIO line 178 (RF1_FPGA_RESET) hogged as output/low
    [ 0.489131] GPIO line 179 (RF2_FPGA_RESET) hogged as output/low
    [ 0.493425] GPIO line 226 (FPGA1_STATUS_N) hogged as input
    [ 0.493443] GPIO line 238 (FPGA2_STATUS_N) hogged as input
    [ 0.493460] GPIO line 229 (FPGA1_PERSTL1_N) hogged as output/low
    [ 0.493475] GPIO line 241 (FPGA2_PERSTL1_N) hogged as output/low
    [ 0.493491] GPIO line 235 (FPGA1_RST1_N) hogged as output/low
    [ 0.493505] GPIO line 247 (FPGA1_RST2_N) hogged as output/low
    [ 0.493520] GPIO line 232 (FPGA1_DSP_IRQ) hogged as input
    [ 0.493533] GPIO line 244 (FPGA2_DSP_IRQ) hogged as input
    [ 0.495692] irq: no irq domain found for /ocp/l4@4a000000/scm@2000/pinmux@1400 !
    [ 0.513323] omap-gpmc 50000000.gpmc: GPMC revision 6.0
    [ 0.513334] gpmc_cs_set_reserved: setting cs 0 reserved flag to 1
    [ 0.513345] gpmc_mem_init: disabling cs 0 mapped at 0x0-0x1000000
    [ 0.513352] gpmc_cs_set_reserved: setting cs 1 reserved flag to 0
    [ 0.513359] gpmc_cs_set_reserved: setting cs 2 reserved flag to 0
    [ 0.513366] gpmc_cs_set_reserved: setting cs 3 reserved flag to 0
    [ 0.513372] gpmc_cs_set_reserved: setting cs 4 reserved flag to 0
    [ 0.513378] gpmc_cs_set_reserved: setting cs 5 reserved flag to 0
    [ 0.513383] gpmc_cs_set_reserved: setting cs 6 reserved flag to 0
    [ 0.513389] gpmc_cs_set_reserved: setting cs 7 reserved flag to 0
    [ 0.513727] gpmc_cs_reserved: cs 2 reserved = 0
    [ 0.513738] gpmc_cs_set_reserved: setting cs 2 reserved flag to 1
    [ 0.513798] gpmc_probe_generic_child: pre-gpmc_cs_disable_mem()
    [ 0.513805] gpmc_probe_generic_child: post-gpmc_cs_disable_mem()
    [ 0.513813] gpmc_cs_delete_mem: res: start=0x01000000, end=0x01ffffff
    [ 0.513820] gpmc_cs_delete_mem: release_resource() returned 0
    [ 0.513827] gpmc_probe_generic_child: gpmc_cs_remap() returned 0
    [ 0.513837] gpmc_probe_generic_child: of_property_read_u32() read of bank-width: width 2, function returned 0
    [ 0.513844] gpmc_probe_generic_child: gpmc_cs_program_settings() returned 0
    [ 0.513851] gpmc_ps_to_ticks: tick_ps=3759
    [ 0.513858] gpmc_calc_divider: sync_clk=11386, div=4
    [ 0.513864] gpmc_cs_set_timings: gpmc_calc_divider() returned 4
    [ 0.513881] gpmc_probe_generic_child: gpmc_cs_set_timings() returned 0
    [ 0.513887] gpmc_probe_generic_child: pre-gpmc_cs_enable_mem()
    [ 0.513893] gpmc_probe_generic_child: post-gpmc_cs_enable_mem()
    [ 0.514088] gpmc_cs_reserved: cs 4 reserved = 0
    [ 0.514097] gpmc_cs_set_reserved: setting cs 4 reserved flag to 1
    [ 0.514153] gpmc_probe_generic_child: pre-gpmc_cs_disable_mem()
    [ 0.514159] gpmc_probe_generic_child: post-gpmc_cs_disable_mem()
    [ 0.514166] gpmc_cs_delete_mem: res: start=0x01000000, end=0x01ffffff
    [ 0.514173] gpmc_cs_delete_mem: release_resource() returned 0
    [ 0.514180] gpmc_probe_generic_child: gpmc_cs_remap() returned 0
    [ 0.514189] gpmc_probe_generic_child: of_property_read_u32() read of bank-width: width 2, function returned 0
    [ 0.514195] gpmc_probe_generic_child: gpmc_cs_program_settings() returned 0
    [ 0.514202] gpmc_ps_to_ticks: tick_ps=3759
    [ 0.514208] gpmc_calc_divider: sync_clk=11386, div=4
    [ 0.514214] gpmc_cs_set_timings: gpmc_calc_divider() returned 4
    [ 0.514230] gpmc_probe_generic_child: gpmc_cs_set_timings() returned 0
    [ 0.514236] gpmc_probe_generic_child: pre-gpmc_cs_enable_mem()
    [ 0.514241] gpmc_probe_generic_child: post-gpmc_cs_enable_mem()
    [ 0.523841] No ATAGs?
    [ 0.523865] hw-breakpoint: found 5 (+1 reserved) breakpoint and 4 watchpoint registers.
    [ 0.523874] hw-breakpoint: maximum watchpoint size is 8 bytes.
    [ 0.524244] omap4_sram_init:Unable to allocate sram needed to handle errata I688
    [ 0.524254] omap4_sram_init:Unable to get sram pool needed to handle errata I688
    [ 0.524808] OMAP DMA hardware revision 0.0
    [ 0.571783] omap-dma-engine 4a056000.dma-controller: OMAP DMA engine driver (LinkedList1/2/3 supported)
    [ 0.572774] edma 43300000.edma: memcpy is disabled
    [ 0-iommu 41501000.mmu: 41501000.mmu registered
    [ 0.584172] omap-iommu 41502000.mmu: 41502000.mmu registered
    [ 0.585809] vgaarb: loaded
    [ 0.586122] SCSI subsystem initialized
    [ 0.586491] omap_i2c 48070000.i2c: could not find pctldev for node /e: LinuxPPS API ver. 1 registered
    [ 0.588036] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
    [ 0.588056] PTP clock support registered
    [ 0.588559] omap-mailbox 4883c000.mailbox: omap mailbox rev 0x400
    [ash table entries: 1024 (order 0, 4096 bytes)
    [ 0.600798] NET: Registered protocol family 2
    [ 0.601290] TCP established hash table entries: 4096 (order: 2, 16384 bytes)
    [ 0.601328] TCP bind hash table entries: 4096 (order: 3, 32768 bytes)
    [ 0.601394] TCP: Hash tables configured (established 4096 bind 4096)
    [ 0.601444] UDP hash table entries: 256 (order: 1, 8192 bytes)
    [ 0.601464] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
    [ 0.601645] NET: Registered protocol family 1ailable
    [ 0.693007] audit: initializing netlink subsys (disabled)
    [ 0.693047] audit: type=2000 audit(0.679:1): initialized
    [ 0.693703] workingset: timestamp_bits=14 max_order=20 bucket_order=6
    [ 0.701978] NFS: Registering the id_resolver kescheduler deadline registered
    [ 0.703930] io scheduler cfq registered (default)
    [ 0.706947] pinctrl-single 4a003400.pinmux: 291 pins at pa fc003400 size 1164
    [ 0.714016] Serial: 8250/16550 driver, 10 ports, IRQ sharing enabled
    [ 0.717012] 4rm] Initialized
    [ 1.953765] brd: module loaded
    [ 1.961900] loop: module loaded
    [ 1.965995] mtdoops: mtd device (mtddev=name/number) must be supplied
    [ 1.974365] libphy: Fixed MDIO Bus: probed
    [ 2.040542] davinci_mdio 48485000.mdio: davinci mdio revision 1.6
    [ 2.046667] davinci_mdio 48485000.mdio: detected phy mask fffffffc
    [ 2.056956] libphy: 48485000.mdio: probed
    [ 2.061006] davinci_mdio 48485000.mdio: phy[0]: device 48485000.mdio:00, driver Micrel KSZ8081 or KSZ8091
    [ 2.070632] davinci_mdio 48485000.mdio: phy[1]: device 48485000.mdio:01, driver Micrel KSZ9031 Gigabit PHY
    [ 2.081048] cpsw 48484000.ethernet: Detected MACID = fc:0f:4b:c2:fc:92
    [ 2.087664] cpsw 48484000.ethernet: device node lookup for pps timer failed
    [ 2.094713] cpsw 48484000.ethernet: cpts: overflow check period 500 (jiffies)
    [ 2.102869] cpsw 48484000.ethernet: cpsw: Detected MACID = fc:0f:4b:c2:fc:93
    [ 2.111313] mousedev: PS/2 mouse device common for all mice
    [ 2.117092] i2c /dev entries driver
    [ 2.121526] [gpio_poweroff]node name: poweroff-request
    [ 2.126689] [gpio_poweroff]pp name: button
    [ 2.130853] [gpio_poweroff]init gpio value: 94
    [ 2.135315] [gpio_poweroff]hold_time_ms: 2000
    [ 2.152629] cpu cpu0: dev_pm_opp_set_regulators: no regulator (vdd) found: -19
    [ 2.161216] omap_hsmmc 4809c000.mmc: Got CD GPIO
    [ 2.166083] omap_hsmmc 4809c000.mmc: no pinctrl state for hs mode
    [ 2.173836] omap_hsmmc 480b4000.mmc: no pinctrl state for ddr_1_8v mode
    [ 2.180483] omap_hsmmc 480b4000.mmc: no pinctrl state for hs mode
    [ 2.240852] ledtrig-cpu: registered to indicate activity on CPUs
    [ 2.246991] Alarm: Major: 501; Minor: 0
    [ 2.254764] omap_hsmmc 480ad000.mmc: card claims to support voltages below defined range
    [ 2.264326] oprofile: using timer interrupt.
    [ 2.268829] Initializing XFRM netlink socket
    [ 2.273180] NET: Registered protocol family 17
    [ 2.277688] NET: Registered protocol family 15
    [ 2.282246] Key type dns_resolver registered
    [ 2.286655] omap_voltage_late_init: Voltage driver support not added
    [ 2.293188] Power Management for TI OMAP4+ devices.
    [ 2 2.302691] Registering SWP/SWPB emulation handler
    [ 2.316470] mmc0: new high speed SDIO card at address 0001
    [ 2.327601] dmm 4e000000.dmm: workaround for errata i878 in use
    [ 2.335202] dmm 4e000000.dmm: initialized all PAT entrieskipping irq request
    [ 2.370762] palmas 0-0058: Muxing GPIO 7d, PWM 0, LED 0
    [ 2.380145] vtt_fixed: supplied by smps3
    [ 2.385569] random: fast init done
    [ 2.411509] vdd_3v3: supplied by regen1
    [ 2.415586] aic_dvdd_fixed: supplied by vdd_3v3
    [ 2.423485] omap_i2c 48070000.i2c: bus 0 rev0.12 at 400 kHz
    [ 2.429488] omap_hsmmc 4809c000.mmc: Got CD GPIO
    [ 2.434403] omap_hsmmc 4809c000.mmc: no pinctrl state for hs mode
    [ 2.502259] omap_hsmmc 480b4000.mmc: no pinctrl state for ddr_1_8v mode
    [ 2.508908] omap_hsmmc 480b4000.mmc: no pinctrl state for hs mode
    [ 2.570741] hctosys: unable to open rtc device (rtc0)
    [ 2.576777] aic_dvdd_fixed: disabling
    [ 2.580458] vmmcwl_fixed: disabling
    [ 2.584208] ldousb: disabling
    [ 2.587488] ALSA device list:
    [ 2.591491] No soundcards found.
    [ 2.599804] Freeing unused kernel memory: 4096K
    Mounting /proc and /sys
    [ 2.645864] mmc2: new MMC card at address 0001
    [ 2.661399] mmcblk2: mmc2:0001 Q1J55L 7.13 GiB
    [ 2.666227] mmcblk2boot0: mmc2:0001 Q1J55L partition 1 2.00 MiB
    [ 2.682489] mmcblk2boot1: mmc2:0001 Q1J55L partition 2 2.00 MiB
    [ 2.691289] mmcblk2: p1 p2 p3 p4 < p5 p6 p7 >
    /dev/mmcblk2p2: clean, 4666/98304 files, 273844/393216 blocks
    [ 2.849898] EXT4-fs (mmcblk2p2): mounted filesystem with ordered data mode. Opts: (null)
    starting pid 111, tty '': '/etc/init.d/startSystem.sh'
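
As a cross-check of the log above, the CMA pool sizes reported at boot can be summed and compared with the `cma-reserved` figure on the `Memory:` line. This is a minimal Python sketch: the `LOG` excerpt is copied from the boot log (with the `Memory:` line abridged to the fields used here), and the regular expressions and function name are illustrative.

```python
import re

# Excerpt of the boot log; the Memory: line is abridged.
LOG = """\
[ 0.000000] Reserved memory: created CMA memory pool at 0x00000000df000000, size 8 MiB
[ 0.000000] Reserved memory: created CMA memory pool at 0x00000000df800000, size 8 MiB
[ 0.000000] Reserved memory: created CMA memory pool at 0x00000000e0000000, size 128 MiB
[ 0.000000] Reserved memory: created CMA memory pool at 0x00000000e8000000, size 128 MiB
[ 0.000000] cma: Reserved 24 MiB at 0x00000000fd800000
[ 0.000000] Memory: 3784496K/4142080K available (54480K reserved, 303104K cma-reserved, 3322880K highmem)
"""

POOL_RE = re.compile(r"created CMA memory pool at (0x[0-9a-f]+), size (\d+) MiB")
DEFAULT_RE = re.compile(r"cma: Reserved (\d+) MiB at (0x[0-9a-f]+)")
TOTAL_RE = re.compile(r"(\d+)K cma-reserved")


def cma_summary(log):
    """Return (device tree pools by base address, default pool MiB, reported KiB)."""
    pools = {int(addr, 16): int(size) for addr, size in POOL_RE.findall(log)}
    default = DEFAULT_RE.search(log)
    default_mib = int(default.group(1)) if default else 0
    total = TOTAL_RE.search(log)
    reported_kib = int(total.group(1)) if total else None
    return pools, default_mib, reported_kib


POOLS_MIB, DEFAULT_MIB, REPORTED_KIB = cma_summary(LOG)
SUMMED_KIB = (sum(POOLS_MIB.values()) + DEFAULT_MIB) * 1024
print(f"summed: {SUMMED_KIB}K, reported: {REPORTED_KIB}K")
```

The four device tree pools plus the 24 MiB default CMA area add up to the reported `cma-reserved` value, so all four carveouts were created with the requested sizes.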
  • Hi, Gerard,

    Is your codebase from TI or upstream kernel? Upstream kernel for virtio-based stacks in general was broken between 4.8 and 4.10-rc4 kernels. Error recovery and cleanup features are only fixed up in the TI kernels. We do not recall any major CMA changes between 4.4 and 4.9 kernels.

    These CMA addresses in general fall within the HIGHMEM space. If this were true, one would expect to see the same behavior across all processors. It would be helpful to see if the crash is happening within the dma_free_coherent call in rproc_free_vring, and if the addresses being freed are the same used during rproc_alloc_vring.

    It would also be helpful if this issue could be reproduced on a TI EVM with minimal functionality on the DSP, to identify whether this is an issue with these CMA addresses or some other memory corruption introduced during porting.

    Rex
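
The HIGHMEM observation in the reply above can be checked against the memory layout in the posted boot log. The Python sketch below is a rough illustration only: `PAGE_OFFSET` and the lowmem end come from the `lowmem` row of the log, while `PHYS_OFFSET = 0x80000000` is an assumption (the thread does not state the DDR base; the "static identity map for 0x80200000" line is consistent with it). The function names are hypothetical.

```python
PAGE_OFFSET = 0xC0000000   # start of the "lowmem" row in the boot log
LOWMEM_VEND = 0xDF800000   # end of the "lowmem" row in the boot log
PHYS_OFFSET = 0x80000000   # ASSUMPTION: physical base of DDR, not stated in the thread

# Physical base addresses of the pools from the reserved-memory node.
POOL_BASES = {
    "ipu1_cma": 0xDF000000,
    "ipu2_cma": 0xDF800000,
    "dsp1_cma": 0xE0000000,
    "dsp2_cma": 0xE8000000,
}


def lowmem_phys_end():
    """First physical address beyond the linearly mapped lowmem region."""
    return PHYS_OFFSET + (LOWMEM_VEND - PAGE_OFFSET)


def is_highmem(phys_addr):
    """True if the physical address is outside the kernel's linear mapping."""
    return phys_addr >= lowmem_phys_end()


HIGHMEM_POOLS = {name: is_highmem(base) for name, base in POOL_BASES.items()}
print(f"lowmem covers physical 0x{PHYS_OFFSET:08x} - 0x{lowmem_phys_end() - 1:08x}")
for name, high in HIGHMEM_POOLS.items():
    print(f"{name}: {'HIGHMEM' if high else 'lowmem'}")
```

Under that assumption, lowmem covers only the first 504 MB of DDR, so every pool in this device tree lands in HIGHMEM.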
  • Hi, Gerard,

    We are able to reproduce the issue in a TI environment, even on the 4.14 kernel, using modified samples that use the HIGHMEM pools.

    The upstream remoteproc core went through a lot of changes between the 4.8 and 4.9 kernels that essentially broke the virtio-rpmsg stack, including error recovery. The HIGHMEM usage has exposed a bug in the resource cleanup fixes carried within our fixed-up kernel.

    None of our TI images use HIGHMEM for remoteproc pools, so this was not covered during our testing.

    I'll file a bug. If it is OK with you, could you provide your email address so that we can add a Reported-by: tag to the fix?

    Thanks!

    Rex

     

  • Rex Chang said:
    Hi, Gerard,

    Is your codebase from TI or upstream kernel? Upstream kernel for virtio-based stacks in general was broken between 4.8 and 4.10-rc4 kernels. Error recovery and cleanup features are only fixed up in the TI kernels. We do not recall any major CMA changes between 4.4 and 4.9 kernels. 

    I see your more recent response - just noting that we are using the TI kernel as it comes in the Processor SDK releases.

  • Rex Chang said:

    We are able to reproduce the issue on TI environment even on 4.14 kernel using modified samples using the HIGHMEM pools.

    The remoteproc core went through lot of changes between 4.8 and 4.9 kernel on upstream that essentially broke the virtio-rpmsg stack including error recovery. The HIGHMEM usage has exposed a bug in the resource cleanup fixes carried within our fixed-up kernel.

    None of our TI images uses HIGHMEM for remoteproc pools, so this was not covered during our testing.

    I'll file a bug, and if it is ok with you, do you mind providing your email address so that we can do a Reported-by: on the fix.

    Glad to hear you were able to reproduce the issue. You can use gdaubar@harris.com for the bug reported-by field.

    Is there a preferred work-around while waiting for the bug fix? Ideally we need 128MB for each DSP.

    Thanks

  • Hi Gerard,

    Add the following code snippet as a temporary workaround until I do a formal fix in the TI Linux kernel tree. You won't need to change anything with respect to your carveouts or DSP images.

    regards
    Suman

    ---
    diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
    index 4e8978dc68f1..7715fdcb81e6 100644
    --- a/drivers/remoteproc/remoteproc_core.c
    +++ b/drivers/remoteproc/remoteproc_core.c
    @@ -1074,6 +1074,8 @@ static void rproc_resource_cleanup(struct rproc *rproc)
     		kfree(entry);
     	}
     
    +	rproc->table_ptr = rproc->cached_table;
    +
     	/* clean up carveout allocations */
     	list_for_each_entry_safe(entry, tmp, &rproc->carveouts, node) {
     		dma_free_coherent(dev->parent, entry->len, entry->va,
  • Hi, Gerard,

    Are you still getting the same crash after applying the workaround?

    Rex
    The crash I was seeing before does seem to go away, but now that I can run more load/unload cycles, I regularly see errors like these when I load the remoteproc kernel modules (and therefore load the DSPs for the first time after boot-up):

    1) L3 custom error

    2) MMU faults for memory addresses that are covered in the resource table mapping and worked in the old Processor SDK kernels



    Console output after the remoteproc kernel module load:

    -bash-4.3# [10031.472178] omap-rproc 40800000.dsp: assigned reserved memory node dsp1_cma@e0000000
    [10031.472318] remoteproc remoteproc2: 40800000.dsp is available
    [10031.472774] omap-rproc 41000000.dsp: assigned reserved memory node dsp2_cma@e8000000
    [10031.472923] remoteproc remoteproc3: 41000000.dsp is available
    [10032.067049] remoteproc remoteproc2: powering up 40800000.dsp
    [10032.072773] remoteproc remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 7426480
    [10032.087904] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    [10032.093804] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
    [10032.099747] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
    [10032.147050] remoteproc remoteproc2: phdr: type 1 da 0x800000 memsz 0xce0 filesz 0x0
    [10032.154787] remoteproc remoteproc2: phdr: type 1 da 0x80000000 memsz 0x27e04c8 filesz 0x4c4
    [10032.199716] remoteproc remoteproc2: phdr: type 1 da 0x827e04c8 memsz 0x18 filesz 0x18
    [10032.207630] remoteproc remoteproc2: phdr: type 1 da 0x827e04e0 memsz 0x4bf40 filesz 0x4bf40
    [10032.216658] remoteproc remoteproc2: phdr: type 1 da 0x8282c420 memsz 0x10000 filesz 0x0
    [10032.224802] remoteproc remoteproc2: phdr: type 1 da 0x8283c420 memsz 0xd7a2 filesz 0xd7a2
    [10032.233448] remoteproc remoteproc2: phdr: type 1 da 0x82849bc8 memsz 0x3542 filesz 0x0
    [10032.241450] remoteproc remoteproc2: phdr: type 1 da 0x8284d10c memsz 0x948 filesz 0x948
    [10032.250082] remoteproc remoteproc2: phdr: type 1 da 0x8284da58 memsz 0x1120 filesz 0x0
    [10032.258076] remoteproc remoteproc2: phdr: type 1 da 0x8284eb78 memsz 0x8 filesz 0x8
    [10032.265793] remoteproc remoteproc2: phdr: type 1 da 0x8284ec00 memsz 0x200 filesz 0x200
    [10032.273863] remoteproc remoteproc2: phdr: type 1 da 0x8284ee00 memsz 0xec filesz 0x0
    [10032.281673] remoteproc remoteproc2: phdr: type 1 da 0x8284eeec memsz 0x8 filesz 0x0
    [10032.289959] remoteproc remoteproc2: phdr: type 1 da 0x8284eef8 memsz 0x2aa0 filesz 0x2aa0
    [10032.298223] remoteproc remoteproc2: phdr: type 1 da 0x8c920000 memsz 0x8004 filesz 0x0
    [10032.307122] remoteproc remoteproc2: registered virtio0 (type 7)
    [10032.313125] remoteproc remoteproc2: remote processor 40800000.dsp is now up
    [10032.371023] ------------[ cut here ]------------
    [10032.375679] WARNING: CPU: 0 PID: 0 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x25c/0x36c
    [10032.384853] 44000000.ocp:L3 Custom Error: MASTER DSP1_MDMA TARGET L4_PER1_P3 (Read): Data Access in User mode during Functional access
    [10032.396989] Modules linked in: rpmsg_core omap_remoteproc remoteproc virtio virtio_ring usb_f_ecm g_ether usb_f_rndis u_ether libcomposite extcon_palmas gcm ccm arc4 dwc3 udc_core usb_common uio_pdrv_genirq uio phy_omap_usb2 dwc3_omap bridge stp llc xt_tcpudp ipv6 iptable_filter ip_tables x_tables hw_info bq40z60_battery mpu9250(C) extcon_usb_gpio extcon_core rtc_isl1208 omap_wdt
    [10032.433808] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C 4.9.69 #3
    [10032.440972] Hardware name: Generic DRA74X (Flattened Device Tree)
    [10032.447090] Backtrace:
    [10032.449564] [<c020ba88>] (dump_backtrace) from [<c020bd44>] (show_stack+0x18/0x1c)
    [10032.457169] r7:00000009 r6:600e0193 r5:00000000 r4:c1223058
    [10032.462860] [<c020bd2c>] (show_stack) from [<c050c460>] (dump_stack+0x8c/0xa0)
    [10032.470118] [<c050c3d4>] (dump_stack) from [<c022f0b4>] (__warn+0xec/0x104)
    [10032.477112] r7:00000009 r6:c0bcc208 r5:00000000 r4:c1201d10
    [10032.482795] [<c022efc8>] (__warn) from [<c022f10c>] (warn_slowpath_fmt+0x40/0x48)
    [10032.490313] r9:00000006 r8:de20d950 r7:c0bcc4fc r6:00000002 r5:c0bcc134 r4:c0bcc1d8
    [10032.498092] [<c022f0d0>] (warn_slowpath_fmt) from [<c053f378>] (l3_interrupt_handler+0x25c/0x36c)
    [10032.507001] r3:de20d7c0 r2:c0bcc1d8
    [10032.510590] r4:80080003
    [10032.513139] [<c053f11c>] (l3_interrupt_handler) from [<c0279ecc>] (__handle_irq_event_percpu+0xb4/0x138)
    [10032.522662] r10:c1247250 r9:de210100 r8:00000017 r7:c1201e34 r6:00000000 r5:de210100
    [10032.530525] r4:de20dcc0
    [10032.533070] [<c0279e18>] (__handle_irq_event_percpu) from [<c0279f74>] (handle_irq_event_percpu+0x24/0x60)
    [10032.542768] r10:c1203198 r9:c1200000 r8:de008000 r7:00000000 r6:c1208894 r5:de210100
    [10032.550631] r4:de210100
    [10032.553177] [<c0279f50>] (handle_irq_event_percpu) from [<c0279ff0>] (handle_irq_event+0x40/0x64)
    [10032.562087] r5:de210160 r4:de210100
    [10032.565681] [<c0279fb0>] (handle_irq_event) from [<c027d6e0>] (handle_fasteoi_irq+0xc0/0x190)
    [10032.574244] r7:00000000 r6:c1208894 r5:de210160 r4:de210100
    [10032.579933] [<c027d620>] (handle_fasteoi_irq) from [<c0279108>] (generic_handle_irq+0x2c/0x3c)
    [10032.588581] r7:00000000 r6:00000000 r5:00000017 r4:c104fe28
    [10032.594269] [<c02790dc>] (generic_handle_irq) from [<c027967c>] (__handle_domain_irq+0x64/0xbc)
    [10032.603007] [<c0279618>] (__handle_domain_irq) from [<c02014a0>] (gic_handle_irq+0x40/0x7c)
    [10032.611396] r9:c1200000 r8:fa213000 r7:fa212000 r6:c1201ef0 r5:fa21200c r4:c1203500
    [10032.619178] [<c0201460>] (gic_handle_irq) from [<c08b76f8>] (__irq_svc+0x58/0x8c)
    [10032.626692] Exception stack(0xc1201ef0 to 0xc1201f38)
    [10032.631766] 1ee0: 00000001 00000000 fe600000 00000000
    [10032.639980] 1f00: c1200000 c120313c 00000001 c1203190 00000000 00000000 c1203198 c1201f4c
    [10032.648193] 1f20: c1201f2c c1201f40 c02212f0 c0208e70 600e0013 ffffffff
    [10032.654837] r9:c1200000 r8:00000000 r7:c1201f24 r6:ffffffff r5:600e0013 r4:c0208e70
    [10032.662620] [<c0208e48>] (arch_cpu_idle) from [<c08b6bbc>] (default_idle_call+0x28/0x34)
    [10032.670749] [<c08b6b94>] (default_idle_call) from [<c026f374>] (cpu_startup_entry+0x1b4/0x230)
    [10032.679402] [<c026f1c0>] (cpu_startup_entry) from [<c08b1e28>] (rest_init+0x8c/0x90)
    [10032.687174] r7:ffffffff
    [10032.689724] [<c08b1d9c>] (rest_init) from [<c0e00d88>] (start_kernel+0x3dc/0x3e8)
    [10032.697237] r5:00000000 r4:c126004c
    [10032.700828] [<c0e009ac>] (start_kernel) from [<80008090>] (0x80008090)
    [10032.707383] ---[ end trace 0fb35075e607bfc5 ]---
    [10032.715685] virtio_rpmsg_bus virtio0: rpmsg host is online
    [10032.739318] remoteproc remoteproc3: powering up 41000000.dsp
    [10032.745047] remoteproc remoteproc3: Booting fw image dra7-dsp2-fw.xe66, size 7426480
    [10032.760073] omap_hwmod: mmu0_dsp2: _wait_target_disable failed
    [10032.765969] omap-iommu 41501000.mmu: 41501000.mmu: version 3.0
    [10032.771910] omap-iommu 41502000.mmu: 41502000.mmu: version 3.0
    [10032.818115] remoteproc remoteproc3: phdr: type 1 da 0x800000 memsz 0xce0 filesz 0x0
    [10032.825847] remoteproc remoteproc3: phdr: type 1 da 0x80000000 memsz 0x27e04c8 filesz 0x4c4
    [10032.870317] remoteproc remoteproc3: phdr: type 1 da 0x827e04c8 memsz 0x18 filesz 0x18
    [10032.878334] remoteproc remoteproc3: phdr: type 1 da 0x827e04e0 memsz 0x4bf40 filesz 0x4bf40
    [10032.887233] remoteproc remoteproc3: phdr: type 1 da 0x8282c420 memsz 0x10000 filesz 0x0
    [10032.895368] remoteproc remoteproc3: phdr: type 1 da 0x8283c420 memsz 0xd7a2 filesz 0xd7a2
    [10032.903690] remoteproc remoteproc3: phdr: type 1 da 0x82849bc8 memsz 0x3542 filesz 0x0
    [10032.911674] remoteproc remoteproc3: phdr: type 1 da 0x8284d10c memsz 0x948 filesz 0x948
    [10032.919728] remoteproc remoteproc3: phdr: type 1 da 0x8284da58 memsz 0x1120 filesz 0x0
    [10032.927708] remoteproc remoteproc3: phdr: type 1 da 0x8284eb78 memsz 0x8 filesz 0x8
    [10032.935426] remoteproc remoteproc3: phdr: type 1 da 0x8284ec00 memsz 0x200 filesz 0x200
    [10032.943491] remoteproc remoteproc3: phdr: type 1 da 0x8284ee00 memsz 0xec filesz 0x0
    [10032.951291] remoteproc remoteproc3: phdr: type 1 da 0x8284eeec memsz 0x8 filesz 0x0
    [10032.958987] remoteproc remoteproc3: phdr: type 1 da 0x8284eef8 memsz 0x2aa0 filesz 0x2aa0
    [10032.967239] remoteproc remoteproc3: phdr: type 1 da 0x8c920000 memsz 0x8004 filesz 0x0
    [10032.975911] omap-iommu 41501000.mmu: iommu fault: da 0x8ca80000 flags 0x0
    [10032.982734] remoteproc remoteproc3: crash detected in 41000000.dsp: type mmufault
    [10032.990256] omap-iommu 41501000.mmu: 41501000.mmu: errs:0x00000002 da:0x8ca80000 pgd:0xdcc0a328 *pgd:0x9cc04801 pte:0xdcc04a00 *pte:0xf1000001
    [10033.004375] virtio_rpmsg_bus virtio1: rpmsg host is online
    [10033.009924] remoteproc remoteproc3: registered virtio1 (type 7)
    [10033.015910] remoteproc remoteproc3: remote processor 41000000.dsp is now up
    [10033.024508] remoteproc remoteproc3: handling crash #1 in 41000000.dsp
    [10033.030999] remoteproc remoteproc3: recovering 41000000.dsp
    [10033.049642] remoteproc remoteproc3: cleaning up carveout allocations
    [10033.063024] remoteproc remoteproc3: cleaning up vdev allocations
    [10033.069091] remoteproc remoteproc3: finished cleaning up vdev allocations
    [10033.083077] omap_hwmod: mmu1_dsp2: _wait_target_disable failed
    [10033.096006] omap_hwmod: mmu0_dsp2: _wait_target_disable failed
    [10033.101919] remoteproc remoteproc3: stopped remote processor 41000000.dsp
    [10033.108754] remoteproc remoteproc3: powering up 41000000.dsp
    [10033.124929] remoteproc remoteproc3: Booting fw image dra7-dsp2-fw.xe66, size 7426480
    [10033.139876] omap_hwmod: mmu0_dsp2: _wait_target_disable failed
    [10033.145772] omap-iommu 41501000.mmu: 41501000.mmu: version 3.0
    [10033.151702] omap-iommu 41502000.mmu: 41502000.mmu: version 3.0
    [10033.197856] remoteproc remoteproc3: phdr: type 1 da 0x800000 memsz 0xce0 filesz 0x0
    [10033.205589] remoteproc remoteproc3: phdr: type 1 da 0x80000000 memsz 0x27e04c8 filesz 0x4c4
    [10033.250036] remoteproc remoteproc3: phdr: type 1 da 0x827e04c8 memsz 0x18 filesz 0x18
    [10033.257924] remoteproc remoteproc3: phdr: type 1 da 0x827e04e0 memsz 0x4bf40 filesz 0x4bf40
    [10033.266691] remoteproc remoteproc3: phdr: type 1 da 0x8282c420 memsz 0x10000 filesz 0x0
    [10033.274818] remoteproc remoteproc3: phdr: type 1 da 0x8283c420 memsz 0xd7a2 filesz 0xd7a2
    [10033.283128] remoteproc remoteproc3: phdr: type 1 da 0x82849bc8 memsz 0x3542 filesz 0x0
    [10033.291114] remoteproc remoteproc3: phdr: type 1 da 0x8284d10c memsz 0x948 filesz 0x948
    [10033.299167] remoteproc remoteproc3: phdr: type 1 da 0x8284da58 memsz 0x1120 filesz 0x0
    [10033.307143] remoteproc remoteproc3: phdr: type 1 da 0x8284eb78 memsz 0x8 filesz 0x8
    [10033.314853] remoteproc remoteproc3: phdr: type 1 da 0x8284ec00 memsz 0x200 filesz 0x200
    [10033.322923] remoteproc remoteproc3: phdr: type 1 da 0x8284ee00 memsz 0xec filesz 0x0
    [10033.330729] remoteproc remoteproc3: phdr: type 1 da 0x8284eeec memsz 0x8 filesz 0x0
    [10033.338430] remoteproc remoteproc3: phdr: type 1 da 0x8284eef8 memsz 0x2aa0 filesz 0x2aa0
    [10033.346681] remoteproc remoteproc3: phdr: type 1 da 0x8c920000 memsz 0x8004 filesz 0x0
    [10033.355307] omap-iommu 41501000.mmu: iommu fault: da 0x8ca80000 flags 0x0
    [10033.362129] remoteproc remoteproc3: crash detected in 41000000.dsp: type mmufault
    [10033.369650] omap-iommu 41501000.mmu: 41501000.mmu: errs:0x00000002 da:0x8ca80000 pgd:0xdd7d2328 *pgd:0x9d770c01 pte:0xdd770e00 *pte:0xf1000001
    [10033.383499] virtio_rpmsg_bus virtio1: rpmsg host is online
    [10033.389043] remoteproc remoteproc3: registered virtio1 (type 7)
    [10033.395034] remoteproc remoteproc3: remote processor 41000000.dsp is now up
  • Hi Gerard,

    1. The L3_NoC error on DSP1 boot seems to suggest you are accessing a peripheral from the DSP side that is not clocked. If this is a timer, I suggest you make sure you have ported your board-specific dts changes to the 4.9 kernel as well.
    2. You can enable the traces in remoteproc core, and make sure the MMU mappings are indeed present. If you were mapping carveouts set aside in DTS files, make sure those changes are ported to the newer kernel DTS file as well.

    regards
    Suman
  • Suman Anna said:

    1. The L3_NoC error on DSP1 boot seems to suggest you are accessing a peripheral from the DSP side that is not clocked. If this is a timer, I suggest you make sure you have ported your board-specific dts changes to the 4.9 kernel as well.

    How can I tell conclusively what peripheral is being accessed and causing this problem? We did port our board-specific dts changes to 4.9.

    Suman Anna said:

    2. You can enable the traces in remoteproc core, and make sure the MMU mappings are indeed present. If you were mapping carveouts set aside in DTS files, make sure those changes are ported to the newer kernel DTS file as well.

    That was an oops on my end - it was due to the CMEM module not being loaded for my test.

    Thanks

  • More information:

    Problem 1 - L3 Error:

    If we change the DSP1 and DSP2 CMA pools from 0xE000 0000 (128M) and 0xE800 0000 (128M) to 0x9500 0000 (64M) and 0x9900 0000 (64M), respectively, the L3 error upon loading DSP1 goes away. Why does this matter?
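    As a sanity check on the memory map discussed here, the reserved regions posted in this thread can be laid out and tested for overlap. This is only a sketch: the names and sizes are copied from the device tree fragments above, the helper functions are hypothetical, and a clean result does not by itself explain the L3 error.

```python
MiB = 1 << 20

# (name, base, size) copied from the device tree fragments in this thread.
REGIONS = [
    ("ipu1_cma", 0xDF000000, 8 * MiB),
    ("ipu2_cma", 0xDF800000, 8 * MiB),
    ("dsp1_cma", 0xE0000000, 128 * MiB),
    ("dsp2_cma", 0xE8000000, 128 * MiB),
    ("cmem_block_0", 0xF0000000, 16 * MiB),
    ("cmem_block_1", 0xF1000000, 16 * MiB),
]

# The alternative DSP pool placement that made the L3 error go away.
ALT_DSP_POOLS = [
    ("dsp1_cma", 0x95000000, 64 * MiB),
    ("dsp2_cma", 0x99000000, 64 * MiB),
]


def last_address(base, size):
    """Return the last byte address covered by a region."""
    return base + size - 1


def find_overlaps(regions):
    """Return (name_a, name_b) for neighbouring regions that overlap.

    Regions are sorted by base address and each one is compared with
    the next, so any overlap shows up as at least one reported pair.
    """
    ordered = sorted(regions, key=lambda r: r[1])
    overlaps = []
    for (name_a, base_a, size_a), (name_b, base_b, _) in zip(ordered, ordered[1:]):
        if base_a + size_a > base_b:
            overlaps.append((name_a, name_b))
    return overlaps


if __name__ == "__main__":
    for name, base, size in REGIONS + ALT_DSP_POOLS:
        print(f"{name:13s} 0x{base:08x} - 0x{last_address(base, size):08x}")
    print("overlaps (original map):", find_overlaps(REGIONS))
    print("overlaps (alternative DSP pools):", find_overlaps(ALT_DSP_POOLS))
```

    Both layouts are internally consistent, so the difference in behaviour has to come from where the pools sit in the address space rather than from a collision between regions.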

    Problem 2 - MMU Fault:

    I spoke too quickly regarding the lack of having a loaded CMEM module as being the cause of the MMU error; the MMU fault still happens randomly. It's for an address that is in both the CMEM dtsi file in Linux and the DSP's resource table.

    Kernel Log:

    [ 3197.822547] remoteproc remoteproc3: mapped devmem pa 0xf1000000, da 0x8ca80000, len 0x800000

    ...

    [ 3197.943371] omap-iommu 41501000.mmu: iommu fault: da 0x8ca80000 flags 0x0
    [ 3197.943377] remoteproc remoteproc3: crash detected in 41000000.dsp: type mmufault
    [ 3197.943387] omap-iommu 41501000.mmu: 41501000.mmu: errs:0x00000002 da:0x8ca80000 pgd:0xde7d2328 *pgd:0x9eba2c01 pte:0xdeba2e00 *pte:0xf1000001
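    The claim that the faulting address belongs to a region that was actually mapped can be checked mechanically from the two log lines above. The following is a minimal sketch: the parsing helpers are hypothetical, and treating the low 12 bits of the `*pte` value as flag bits is an assumption made only for this comparison.

```python
import re

# Log lines copied from the kernel log above (timestamps removed).
MAPPED = "remoteproc remoteproc3: mapped devmem pa 0xf1000000, da 0x8ca80000, len 0x800000"
FAULT = (
    "omap-iommu 41501000.mmu: 41501000.mmu: errs:0x00000002 da:0x8ca80000 "
    "pgd:0xde7d2328 *pgd:0x9eba2c01 pte:0xdeba2e00 *pte:0xf1000001"
)


def parse_mapping(line):
    """Extract pa, da and len from a 'mapped devmem' remoteproc trace line."""
    m = re.search(r"pa (0x[0-9a-f]+), da (0x[0-9a-f]+), len (0x[0-9a-f]+)", line)
    if m is None:
        raise ValueError("not a 'mapped devmem' line")
    pa, da, length = (int(g, 16) for g in m.groups())
    return {"pa": pa, "da": da, "len": length}


def parse_fault(line):
    """Extract the key:0xvalue pairs from an omap-iommu fault dump line."""
    pairs = re.findall(r"(\*?\w+):0x([0-9a-fA-F]+)", line)
    return {key: int(value, 16) for key, value in pairs}


def fault_inside_mapping(mapping, fault):
    """True if the faulting device address lies inside the mapped window."""
    return mapping["da"] <= fault["da"] < mapping["da"] + mapping["len"]


def expected_pa(mapping, fault):
    """Physical address the faulting device address should translate to."""
    return mapping["pa"] + (fault["da"] - mapping["da"])


if __name__ == "__main__":
    mapping = parse_mapping(MAPPED)
    fault = parse_fault(FAULT)
    # Assumption: the low 12 bits of the entry value are flags, not address.
    pte_target = fault["*pte"] & ~0xFFF
    print("fault inside mapped window:", fault_inside_mapping(mapping, fault))
    print("expected pa: 0x%08x" % expected_pa(mapping, fault))
    print("pte target:  0x%08x" % pte_target)
```

    For these two lines the faulting address falls at the very start of the mapped window, and the page table entry reported in the fault dump points at the same physical address that remoteproc mapped.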

    CMEM Info:

    -bash-4.3# cat /proc/cmem

    Block 0: Pool 0: 1 bufs size 0x1000000 (0x1000000 requested)

    Pool 0 busy bufs:

    Pool 0 free bufs:
    id 0: phys addr 0xf0000000

    Block 1: Pool 0: 1 bufs size 0x1000000 (0x1000000 requested)

    Pool 0 busy bufs:

    Pool 0 free bufs:
    id 0: phys addr 0xf1000000

    Device Tree Entry:

    / {
    reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;

    /* 0xf000_0000 - 0xf0ffffff */
    cmem_block_mem_0: cmem_block_mem@f0000000 {
    reg = <0x0 0xf0000000 0x0 0x01000000>;
    no-map;
    status = "okay";
    };

    /* 0xf100_0000 - 0xf1ffffff */
    cmem_block_mem_1: cmem_block_mem@f1000000 {
    reg = <0x0 0xf1000000 0x0 0x01000000>;
    no-map;
    status = "okay";
    };

    };

    cmem {
    compatible = "ti,cmem";
    #address-cells = <1>;
    #size-cells = <0>;

    #pool-size-cells = <2>;

    status = "okay";

    /* Pool 0 - DSP1's FPGA Buffer */
    cmem_block_0: cmem_block@0 {
    reg = <0>;
    memory-region = <&cmem_block_mem_0>;
    cmem-buf-pools = <1 0x0 0x01000000>;
    };

    /* Pool 1 - DSP2's FPGA Buffer */
    cmem_block_1: cmem_block@1 {
    reg = <1>;
    memory-region = <&cmem_block_mem_1>;
    cmem-buf-pools = <1 0x0 0x01000000>;
    };

    };
    };

     

  • Hi Gerard,

    You are facing two different issues. I suggest you debug one processor at a time, disabling all the others (mark status="disabled" in the respective board dts file).

    1. The l3_noc traces can only print limited information based on the bus transactions. One cannot tell exactly which peripheral was accessed, only the initiator port and the destination port. If you see a Standard Error instead of a Custom Error, you can at least see an offset of the address you are accessing. Are you seeing the l3_noc error occasionally or on every boot? The addresses should not have any bearing on peripheral faults.

    2. The error traces you have for the MMU faults seem to reflect the proper addresses in the TLBs, so it is very strange that you are seeing the MMU faults.

     da:0x8ca80000 => device address

    pgd:0xde7d2328 => this is kernel address of the L1 page table entry (0xde7d0000 + 0x2328) corresponding to 0x8ca entry (0x8ca * 4 = 0x2328) 

    *pgd:0x9eba2c01 => value of the entry, does suggest the L2 page directory is at address 0x9eba2c00.

    pte:0xdeba2e00 => kernel address of the L2 page table entry associated with the da address (physical 0x9eba2c00 + 0x200 = 0x9eba2e00, where 0x200 = 0x80 * 4 for the 0x80000 offset)

    *pte:0xf1000001 => value of the L2 page table entry pointing to the address 0xf1000000.
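    The arithmetic in the walk-through above can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the driver's code: it assumes 4-byte entries, one L1 entry per 1 MiB, 256 L2 entries per table, and that the low 10 bits of the `*pgd` value are flags. The function names are made up for illustration.

```python
L1_SHIFT = 20      # assumption: each first-level entry covers 1 MiB
L2_SHIFT = 12      # assumption: each second-level entry covers 4 KiB
L2_ENTRIES = 256   # 1 MiB / 4 KiB
ENTRY_SIZE = 4     # bytes per page table entry


def table_offsets(da):
    """Return (L1 byte offset, L2 byte offset) for a device address."""
    l1_index = da >> L1_SHIFT
    l2_index = (da >> L2_SHIFT) & (L2_ENTRIES - 1)
    return l1_index * ENTRY_SIZE, l2_index * ENTRY_SIZE


def l2_table_base(pgd_value):
    """Strip the flag bits from an L1 entry to get the L2 table address.

    An L2 table of 256 four-byte entries is 1 KiB, so the low 10 bits
    are treated as flags here (an assumption for this sketch).
    """
    return pgd_value & ~0x3FF


if __name__ == "__main__":
    da = 0x8CA80000
    l1_off, l2_off = table_offsets(da)
    base = l2_table_base(0x9EBA2C01)
    print("L1 offset: 0x%x" % l1_off)
    print("L2 offset: 0x%x" % l2_off)
    print("L2 table (physical): 0x%08x" % base)
    print("L2 entry (physical): 0x%08x" % (base + l2_off))
```

    The physical L2 entry address differs from the kernel address printed in the log (0xdeba2e00) by 0x40000000, which is the same offset that separates the L2 table's physical address from the kernel addresses in the dump.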

    Normally, on any MMU fault, either *pgd or *pte is expected to be 0. How frequently is the MMU fault seen? Can you try reverting the following patches and see if there is a difference:

    a313fc4938b8 iommu/omap: Use DMA-API for performing cache flushes
    a3dbaf260b75 Revert "HACK: iommu/omap: flush page table entries from L2 cache"
    a4dd5962ba57 Revert "TEMP: iommu/omap: fix range for cache flush operations"

    regards

    Suman

  • Suman Anna said:

    1. The l3_noc traces can only print limited information based on the bus transactions. One cannot tell exactly which peripheral was accessed, only the initiator port and the destination port. If you see a Standard Error instead of a Custom Error, you can at least see an offset of the address you are accessing. Are you seeing the l3_noc error occasionally or on every boot? The addresses should not have any bearing on peripheral faults.

    Almost every boot results in the l3_noc error for DSP1. The times I have not seen the l3_noc error I have seen the same MMU fault error I see with DSP2.

    Suman Anna said:

    Can you try reverting the following patches and see if there is a difference:

    a313fc4938b8 iommu/omap: Use DMA-API for performing cache flushes
    a3dbaf260b75 Revert "HACK: iommu/omap: flush page table entries from L2 cache"
    a4dd5962ba57 Revert "TEMP: iommu/omap: fix range for cache flush operations"

    Schedule and budget for my program are currently very challenged - I will try to undo some of the listed commits to help determine which changes have caused these problems in the new kernels. In parallel, could you try to recreate the problem on the AM5728 EVM? I suspect that if you simply change the CMA pool addresses and include some CMEM reserved blocks, you will see the exact same problem.

    Thanks

  • And let me add that I sometimes see a successful start for both DSP1 and DSP2, but that is the exception, not the rule.
  • It is hard for us to reproduce the issues you are facing using our own images. Please work with Rex so that he can try to recreate the issues. It would be easier if you can share your firmware images, so that all it takes is to boot the cores. I take it that the DTS changes required are the same as above.
  • Suman Anna said:
    Please work with Rex so that he can try to recreate the issues. It would be easier if you can share your firmware images, so that all it takes is to boot the cores. I take it that the DTS changes required are the same as above.

    Yes, the DTS changes are included in the earlier post. All you need to do is rebuild the demo DSP1/DSP2 images that are shipped with the Processor SDK with the memory map included in the DTS I posted. To get closer to our use case you could carve out a large chunk (we have a 45M carveout for external memory, for example) and also reference a CMEM block. The DSP image itself is not important - just the memory mapping.

    Please let me know if you need any additional info.

    Thanks

  • Hi, Gerard,

    I'll change the CMA and CMEM to match your configuration. I'll get back to you after giving it a try.

    Rex
  • Hi Gerard,

    The commit a313fc4938b8 ("iommu/omap: Use DMA-API for performing cache flushes") has a bug in flushing some L2 entries (any entry beyond the first L2 PTE). The above reverts should help you get past your MMU faults for now, as they bring the PTE flush code back to parity with the 4.4 codebase. I will fix it formally, without reverting, on our trees for the next release.

    regards
    Suman
  • Suman Anna said:

    The commit a313fc4938b8 ("iommu/omap: Use DMA-API for performing cache flushes") has a bug in flushing some L2 entries (any entry beyond the first L2 PTE). The above reverts should help you get past your MMU faults for now, as they bring the PTE flush code back to parity with the 4.4 codebase. I will fix it formally, without reverting, on our trees for the next release.

    I'll be doing further testing to confirm, but it looks like the L3 errors and MMU faults go away with the revert of that commit. I see a Processor SDK 5.x release just happened - is the formal fix for that contained within?

    Thanks

  • Hi Gerard,

    The just-released Processor SDK 5.x is based on the 4.14 kernel and doesn't have the fix yet. I have pushed the fix for the IOMMU fault bug to my iommu feature branches. Please cherry-pick the following commit onto your 4.9 kernel:
    git.ti.com/.../

    AFAIK, there are no further releases on a 4.9-based kernel, so you would have to cherry-pick fixes onto your baseline if you stick with the 4.9 kernel. All the newer Processor SDK 5.x releases for the rest of the year will be based on the 4.14 kernel.

    regards
    Suman
  • Hi Gerard,

    Have you moved to the 4.14 kernel, or are you still looking for fixes on the 4.9 kernel? I have finalized my patches.

    regards
    Suman
  • Hello,

    We are still using the 4.9 kernel so fixes for that would be great.

    Thanks,
    Gerard

  • Hi Gerard,

    OK, I have pushed all the fixes to the relevant branches. You would have to cherry-pick the patches or merge my branch directly into your 4.9-based kernel, since we do not have any further formal Processor SDK releases based on the 4.9 kernel.

    git pull git://git.ti.com/rpmsg/rpmsg.git rpmsg-ti-linux-4.9.y

    The specific individual kernel patches you are looking for would be:
    28101ffb24bd remoteproc: fix cleanup on firmware version processing failures
    0594b4454b1b remoteproc: fix kernel crashes for rprocs using HighMem CMA pools
    eda0921e8737 iommu/omap: Fix cache flushes on L2 page table entries

    The top-most commit is a cleanup fix unrelated to the issues you reported on this thread (mostly moot if none of your firmware images have the .version section); it is only applicable to 4.9 kernels.

    regards
    Suman