Context = DSP/BIOS Link 1.61.03 for Linux on OMAP3530, using XDCtools 3.10.05.61, running a DMAI 1.20 application that loads a server from omap3530_dvsdk_combos_3_12
The Linux kernel crashed as the app loaded the server:
Starting application...
Unable to handle kernel paging request at virtual address 010a0fff
...
Internal error: Oops: 805 [#1]
Modules linked in: lpm_omap3530 dsplinkk cmemk
CPU: 0 Not tainted (2.6.28-rc8-omap1 #11)
PC is at memcpy+0xfc/0x330
...
Backtrace:
[<bf01a9dc>] (MEM_Copy+0x0/0x18 [dsplinkk]) from [<bf00b6f8>] (OMAP3530_write+0xbc/0x104 [dsplinkk])
[<bf00b63c>] (OMAP3530_write+0x0/0x104 [dsplinkk]) from [<bf00b274>] (DSP_write+0x48/0x54 [dsplinkk])
r7:00000003 r6:00000200 r5:00000000 r4:00000003
[<bf00b22c>] (DSP_write+0x0/0x54 [dsplinkk]) from [<bf00d068>] (LDRV_PROC_write+0x74/0xbc [dsplinkk])
r4:00000000
[<bf00cff4>] (LDRV_PROC_write+0x0/0xbc [dsplinkk]) from [<bf018a78>] (COFF_load+0x55c/0x668 [dsplinkk])
[<bf01851c>] (COFF_load+0x0/0x668 [dsplinkk]) from [<bf01df98>] (PMGR_PROC_load+0x124/0x224 [dsplinkk])
[<bf01de74>] (PMGR_PROC_load+0x0/0x224 [dsplinkk]) from [<bf01f400>] (DRV_Ioctl+0x4b4/0x9a8 [dsplinkk])
[<bf01ef4c>] (DRV_Ioctl+0x0/0x9a8 [dsplinkk]) from [<c00bc488>] (vfs_ioctl+0x64/0x74)
r7:00000036 r6:00006c05 r5:00006c05 r4:c4cbb540
[<c00bc424>] (vfs_ioctl+0x0/0x74) from [<c00bc8ec>] (do_vfs_ioctl+0x428/0x48c)
r5:00000007 r4:c4cbb540
[<c00bc4c4>] (do_vfs_ioctl+0x0/0x48c) from [<c00bc990>] (sys_ioctl+0x40/0x64)
[<c00bc950>] (sys_ioctl+0x0/0x64) from [<c0030e40>] (ret_fast_syscall+0x0/0x2c)
r6:00057218 r5:000571bc r4:000003b1
Code: e4d1e001 c4c03001 a4c04001 e052200c (e4c0e001)
---[ end trace a7c2981c1e9d8072 ]---
Digging into the root cause: the address 0x010a0fff is not valid (neither for the ARM nor the DSP; neither virtual nor physical). So where did it come from? COFF_load() (running on the ARM) parses the section headers of the DSP executable and copies each section to its destination in memory. To do the copy, LDRV_PROC_write() calls DSP_write(), which calls OMAP3530_write() with the intended DSP address for each section. OMAP3530_write() then calls OMAP3530_addrConvert() to work out the ARM-side address to which the section should be copied.
The problem I encountered was that one of the sections was destined for a memory region that was not marked as "shared" (i.e., accessible to both the ARM and the DSP), so earlier driver code had not bothered to map an ARM window to that destination memory region. In this particular case, the offending memory region was "DDRALGHEAP". OMAP3530_addrConvert() transforms DSP addresses to ARM addresses by starting with the DSP destination address, subtracting the base address of the DSP window (to get an offset into the window), and then adding the base address of the ARM window. (The ARM can then copy bytes to the resulting ARM address.) However, since the ARM window was unmapped, the memory entry field for the ARM window base address had been left at -1, so the math ended up generating an invalid ARM address. (Specifically, a DSP address of 0x878a1000 in a window based at 0x86800000 was correctly converted to an offset of +0x010A1000, but then the invalid ARM window base address of -1, i.e. 0xFFFFFFFF in 32-bit arithmetic, was added, resulting in the invalid/crash address of 0x010A0FFF.)
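The arithmetic in that last parenthetical can be checked on any host machine. The following is only a small sketch of the unchecked conversion as described above, not the driver code itself; the function name dsp_to_gpp_unchecked and the macro UNMAPPED_GPP_BASE are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel left in the ARM window base when no window was mapped.
 * The post describes the value as -1; the macro name is hypothetical. */
#define UNMAPPED_GPP_BASE ((uint32_t) -1)

/* Unchecked DSP-to-ARM translation: compute the offset into the DSP
 * window, then rebase it onto the ARM window. Unsigned 32-bit
 * arithmetic wraps, so adding 0xFFFFFFFF is the same as subtracting 1. */
static uint32_t dsp_to_gpp_unchecked(uint32_t dsp_addr,
                                     uint32_t dsp_base,
                                     uint32_t gpp_base)
{
    assert(dsp_addr >= dsp_base);   /* caller has already range-checked */
    uint32_t offset = dsp_addr - dsp_base;
    return offset + gpp_base;
}
```

Calling dsp_to_gpp_unchecked(0x878a1000, 0x86800000, UNMAPPED_GPP_BASE) yields 0x010a0fff, which is exactly the faulting address in the oops above.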
Roughly speaking, there are two options to fix this kernel crash at the driver level: 1. notice the issue and fail gracefully, or 2. notice the issue and fix it. The issue can be detected inside the scanning loop of OMAP3530_addrConvert(). Specifically, if the function has been asked to map from DSP to ARM (i.e., DspToGpp), then once "addr" has been confirmed to be IS_RANGE_VALID() for a particular memEntry, also check that gppVirtAddr is not -1. If it is -1, proceeding will soon result in a crash. If you want to fail gracefully (option 1), return with convAddr left at ADDRMAP_INVALID (by ignoring the match, perhaps by breaking out of the loop). If instead you want to fix the issue (option 2), call MEM_Map() to get a MEM_UNCACHED window onto the physAddr.
With either fix, the DSP/BIOS Link driver will no longer crash the kernel because of DSP code or data sections not being marked as "shared".
Here is some code that can be inserted into OMAP3530_addrConvert() (in dsplink/gpp/src/arch/OMAP3530/omap3530.c) to implement option 1:
    else if (type == DspToGpp) {
        byteAddr = MADU_TO_BYTE (addr, dspObj->maduSize) ;
        if (IS_RANGE_VALID (byteAddr,
                            memEntry->dspVirtAddr,
                            (  memEntry->dspVirtAddr
                             + memEntry->size))) {
            if (memEntry->gppVirtAddr != (Uint32) -1) {
                found = TRUE ;
                convAddr =   byteAddr
                           - memEntry->dspVirtAddr
                           + memEntry->gppVirtAddr ;
            }
            else {
                TRC_1PRINT (TRC_LEVEL4, "OMAP3530_addrConvert found entry %s unmapped\n", memEntry->name) ;
                break ; /* optional */
            }
        }
    }
Or the same code can be enhanced further to implement option 2:
    else if (type == DspToGpp) {
        byteAddr = MADU_TO_BYTE (addr, dspObj->maduSize) ;
        if (IS_RANGE_VALID (byteAddr,
                            memEntry->dspVirtAddr,
                            (  memEntry->dspVirtAddr
                             + memEntry->size))) {
            if (memEntry->gppVirtAddr != (Uint32) -1) {
                found = TRUE ;
            }
            else {
                MemMapInfo mapInfo ;
                DSP_STATUS status ;
                mapInfo.src      = memEntry->physAddr ;
                mapInfo.size     = memEntry->size ;
                mapInfo.memAttrs = MEM_UNCACHED ;
                status = MEM_Map (&mapInfo) ;
                if (DSP_SUCCEEDED (status)) {
                    TRC_1PRINT (TRC_LEVEL4, "OMAP3530_addrConvert mapped %s\n", memEntry->name) ;
                    memEntry->gppVirtAddr = mapInfo.dst ;
                    found = TRUE ;
                }
                else {
                    TRC_1PRINT (TRC_LEVEL7, "OMAP3530_addrConvert failed to map addr [0x%x]\n", addr) ;
                    /*TODO: SET_FAILURE_REASON ; */
                    /* !found */
                }
            }
            if (found) {
                convAddr =   byteAddr
                           - memEntry->dspVirtAddr
                           + memEntry->gppVirtAddr ;
            }
        }
    }
Either option keeps the kernel driver from crashing. Option 1 results in the DSP server image being rejected, because a needed section (DDRALGHEAP for me) was not copied. Option 2 allows the DSP server image to load and run successfully.
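For anyone who wants to experiment with the option 1 check outside the kernel, here is a minimal host-side model of the scanning loop. It is only a sketch under assumptions: mem_entry_t, convert_dsp_to_gpp, GPP_UNMAPPED, ADDR_INVALID and sample_table are hypothetical stand-ins rather than dsplink definitions, and the table values are invented except for the DDRALGHEAP window base of 0x86800000 taken from the crash above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GPP_UNMAPPED ((uint32_t) -1) /* hypothetical name for the -1 sentinel */
#define ADDR_INVALID ((uint32_t) 0)  /* hypothetical stand-in for ADDRMAP_INVALID */

/* Simplified stand-in for one driver memory table entry. */
typedef struct {
    const char *name;
    uint32_t    dsp_virt_addr;  /* base of the DSP window */
    uint32_t    gpp_virt_addr;  /* base of the ARM window, or GPP_UNMAPPED */
    uint32_t    size;           /* window length in bytes */
} mem_entry_t;

/* Illustrative table: DDR2 is mapped for the ARM, DDRALGHEAP is not. */
static const mem_entry_t sample_table[] = {
    { "DDR2",       0x87c00000u, 0xc8000000u,  0x00400000u },
    { "DDRALGHEAP", 0x86800000u, GPP_UNMAPPED, 0x01400000u },
};

/* Option 1: refuse to convert when the matching entry has no ARM window. */
static uint32_t convert_dsp_to_gpp(const mem_entry_t *table, size_t count,
                                   uint32_t addr)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const mem_entry_t *e = &table[i];
        /* Compare offsets rather than end addresses to avoid overflow. */
        if (addr < e->dsp_virt_addr || addr - e->dsp_virt_addr >= e->size) {
            continue;               /* address is not in this window */
        }
        if (e->gpp_virt_addr == GPP_UNMAPPED) {
            break;                  /* matched, but no ARM window: fail */
        }
        return addr - e->dsp_virt_addr + e->gpp_virt_addr;
    }
    return ADDR_INVALID;
}
```

With this table, the crash address 0x878a1000 converts to ADDR_INVALID instead of a wild pointer, while an address inside DDR2 still converts normally.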
I did experiment briefly with letting OMAP3530_addrConvert() map only at certain times (i.e., during COFF_load(), then unmap when COFF_load() was done, effectively resulting in DSP code/data that cannot be accidentally overwritten by the ARM), but found that there are other times that the ARM wants to modify the DSP code/data (for example, LDRV_DRV_handshake(DRV_HandshakeSetup) updates the value of DRV_SHMBASESYMBOL (.data:DSPLINK_shmBaseAddress)). I'm not sure how to know when the ARM will stop accessing DSP code and data (and mapping/unmapping slowly drains kernel resources), so I chose to just leave the window mapped to the ARM permanently.
Looking into the DSP server executable: the problem can also be avoided by making sure that all the memory table entries for code and data (such as DDRALGHEAP) are marked as shared. Unfortunately, when using Codec Engine's Engine.createFromServer() (in packages/ti/sdo/ce/Engine.xs), the generated armDspLinkConfigMemTable_<serverName> (in packages/ti/sdo/ce/ipc/dsplink/Ipc.xdt) only sets the shared flag if the "type" of the memory entry is "main", "link", "reset", "poolmem", or "code". In the case of the older code from omap3530_dvsdk_combos_3_12, the "type" in the .x64P.info.js file was "other" for DDRALGHEAP, so its shared flag was set to 0 and the dsplink driver did not map an ARM memory window for it. (In contrast, the "type" for DDR2 was "main", so its shared flag was set to 1 and dsplink did map an ARM memory window for it.)
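To make that rule concrete, here is a small C restatement of how the shared flag follows from the entry type. The real logic lives in the Ipc.xdt template and is not written this way; entry_is_shared is a hypothetical helper that only mirrors the list of types quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when a memory entry of the given "type" would get its
 * shared flag set, according to the type list quoted in this post. */
static bool entry_is_shared(const char *type)
{
    static const char *const shared_types[] = {
        "main", "link", "reset", "poolmem", "code"
    };

    assert(type != NULL);
    for (size_t i = 0; i < sizeof shared_types / sizeof shared_types[0]; i++) {
        if (strcmp(type, shared_types[i]) == 0) {
            return true;
        }
    }
    return false;   /* e.g. "other", as DDRALGHEAP was typed */
}
```

So entry_is_shared("main") is true (the DDR2 case), while entry_is_shared("other") is false (the DDRALGHEAP case), which is why no ARM window was mapped for it.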
Glancing ahead to DSP/BIOS Link 1.62, the change to OMAP3530_addrConvert() still seems to be applicable (although I haven't looked to see if the "shared" flag might be caught and corrected at some higher level).
Do let me know if some other fix is possible or preferred. I was happy that this change not only keeps the dsplink driver from crashing the kernel, it can even compensate for slightly imperfect DSP server executables and get them loaded and running. -Dirk