Hello -
I'm hoping someone can help me with this, or that others have already seen it and have a solution.
Running on a Spectrum Digital DVEVM with the MontaVista 2.6.10 kernel from DVSDK 1.30/MV Pro 4.0.1, I'm seeing crashes in the kernel scheduler (NULL pointer dereference in dequeue_task). It seems to happen when lots of execs are done (via bash) and/or when input/output is piped between these tasks.
I've tried on multiple boards - same problem.
For example, this very simple script will cause the crash. Sometimes it happens after running for (say) half an hour, sometimes after a few hours, but it will eventually crash:
#! /bin/sh
let counter=0
while true
do
    let counter=$counter+1
    ## echo "=== Loop count: $counter ==="
    /bin/echo $(/bin/echo 'hello') | /bin/cat > /dev/null
done
The 'echo' commands may look odd (why would you do that?), but this is a much more complicated script that I've reduced down to a simple test that replicates the problem.
If I change the echo line to this instead:
/bin/echo 'hello' | /bin/cat > /dev/null
I never seem to get a crash (although, since the failure is "random", it might still fail if I left it running longer).
When running this test, nothing else of significance is running. For example, there isn't any video encode/decode demo (or anything else like that) running. The board just boots up from a simple DVSDK 1.30 root filesystem without starting any 'apps' (just normal startup stuff). I log in via the serial port, then run this script.
I've seen it fail when the rootfs is on NFS, and when it's on a local NAND YAFFS partition.
It happens on a LOT of our custom boards. I've just today gone back to the original Spectrum Digital DM6446 DVEVM to confirm that the bug is there too. It is.
Has anyone seen this? If you have some spare time, would others mind trying to reproduce it? We're sort of 'stuck' with the 2.6.10 kernel (for compatibility reasons), but if others are running newer kernels, they could run the script and show whether or not it's been 'fixed' since then. As I said, it can take a long time (sometimes hours) to see the failure. But if I run it overnight, I always find that it has crashed by the time I get into work. (And often it fails sooner - I had several crashes during the day today.)
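For anyone willing to try, a bounded version of the loop may be easier to work with than the endless one. This is only a sketch: run_stress and its arguments (variant, iteration count, reporting interval) are names made up here for illustration, and the default interval of 1000 is arbitrary. The two pipelines are the ones shown earlier, unchanged.

```shell
# Bounded variant of the stress loop (illustrative sketch, hypothetical names).
# Usage: run_stress VARIANT COUNT [REPORT_EVERY]
#   VARIANT  "subst" = command substitution feeding a pipe (the form that crashes)
#            "plain" = plain pipe only (the form that has not crashed so far)
#   COUNT    number of iterations to run
#   REPORT_EVERY  print a progress line every this many iterations (default 1000)
run_stress() {
    variant=$1
    count=$2
    report_every=${3:-1000}
    i=0
    while [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        case "$variant" in
            subst) /bin/echo $(/bin/echo 'hello') | /bin/cat > /dev/null ;;
            plain) /bin/echo 'hello' | /bin/cat > /dev/null ;;
            *) echo "unknown variant: $variant" >&2; return 2 ;;
        esac
        # Periodic progress line, so the last count printed on the console
        # gives a rough idea of how far the loop got before a crash.
        if [ $((i % report_every)) -eq 0 ]; then
            echo "$(date '+%H:%M:%S') $variant: $i iterations"
        fi
    done
    echo "$variant: completed $i iterations"
}
```

Something like "run_stress subst 1000000 10000" on the serial console would then leave a timestamped trail of loop counts, and "run_stress plain ..." with the same numbers gives a like-for-like run of the variant that has not failed for me.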
I'd really appreciate any input others can offer! This really *seems* like a kernel bug to me - but I don't know where to start looking for a 'fix'.
Thank you!
- Paul