Kernel oops during H.264 video decode on DM365

Eric Riley

I’m seeing an intermittent kernel failure during video decode on the DM365. There is more than one type of failure, but at least 90% of them look like this:

Unable to handle kernel paging request at virtual address afd24814

pgd = c2370000

[afd24814] *pgd=00000000

Internal error: Oops: 5 [#1]

Modules linked in: dm365mmap edmak irqk cmemk linx_eth_cm linx regrw

CPU: 0

PC is at lnhcb_deliver+0xc7c/0xd80 [linx]

LR is at all_conns_connected+0x80/0x88 [linx]

pc : [<bf0250c4>] lr : [<bf021798>] Not tainted

sp : c2285e38 ip : c2285e20 fp : c2285e94

r10: 00000001 r9 : c389b2c0 r8 : 00010005

r7 : ef06c814 r6 : c0ba4560 r5 : 3bc1b205 r4 : 00000001

r3 : c0cb8000 r2 : 3bc1b205 r1 : 00000001 r0 : 00000001

Flags: nzCv IRQs on FIQs on Mode SVC_32 Segment user

Control: 5317F

Table: 82370000 DAC: 00000015

Process tsApp (pid: 786, stack limit = 0xc2284258)

Stack: (0xc2285e38 to 0xc2286000)

5e20: c0ada800 c0ada800

5e40: c2285e6c c2285e50 bf04721c 3bc1b205 00000002 c389b2c0 00000000 00000001

5e60: c2285e94 c2285e70 c01b3bc8 0000000c 3bc1b205 c389b2c0 00000001 c0ada800

5e80: 00000000 00000000 c2285edc c2285e98 bf046e1c bf024458 0000000c c389b2c0

5ea0: 00000000 00000002 00000000 f0007fff 00000001 c0ada800 c389b2c0 0000097b

5ec0: 00000000 c0ada8c4 00000001 00000000 c2285f04 c2285ee0 bf04750c bf046a6c

5ee0: c0ada8d0 00000000 00000004 00000000 00000009 c02d51c0 c2285f24 c2285f08

5f00: c0052d3c bf047324 00000001 c02d5210 00000102 c2284000 c2285f34 c2285f28

5f20: c0052dd4 c0052cd4 c2285f64 c2285f38 c00530d4 c0052dac c2285f74 00400140

5f40: c2284000 c2285fb0 00000001 00000000 c2284000 001cfa04 c2285f7c c2285f68

5f60: c00531a8 c0053088 00000035 c026ad40 c2285f8c c2285f80 c005352c c0053184

5f80: c2285fac c2285f90 c0038bc4 c00534f4 00000001 ffffffff fbc48000 001cebac

5fa0: 00000000 c2285fb0 c0037c2c c0038b90 42deb000 436fe3c8 00000168 00000001

5fc0: 436feb68 001d08f4 001cebac 001dc738 42de5000 00000000 001cfa04 436fe0e4

5fe0: 00000018 436fe0c0 0006e1bc 00087388 20000010 ffffffff 00000000 00000000

Backtrace:

[<bf024448>] (lnhcb_deliver+0x0/0xd80 [linx]) from [<bf046e1c>] (rx_tasklet_recv+0x3c0/0x3d8 [linx_eth_cm])

[<bf046a5c>] (rx_tasklet_recv+0x0/0x3d8 [linx_eth_cm]) from [<bf04750c>] (rx_tasklet+0x1f8/0x244 [linx_eth_cm])

[<bf047314>] (rx_tasklet+0x0/0x244 [linx_eth_cm]) from [<c0052d3c>] (__tasklet_action+0x78/0x94)

[<c0052cc4>] (__tasklet_action+0x0/0x94) from [<c0052dd4>] (tasklet_action+0x38/0x40)

r7 = C2284000 r6 = 00000102 r5 = C02D5210 r4 = 00000001

[<c0052d9c>] (tasklet_action+0x0/0x40) from [<c00530d4>] (___do_softirq+0x5c/0xfc)

[<c0053078>] (___do_softirq+0x0/0xfc) from [<c00531a8>] (__do_softirq+0x34/0x50)

[<c0053174>] (__do_softirq+0x0/0x50) from [<c005352c>] (irq_exit+0x48/0x64)

r5 = C026AD40 r4 = 00000035

[<c00534e4>] (irq_exit+0x0/0x64) from [<c0038bc4>] (asm_do_IRQ+0x44/0x50)

[<c0038b80>] (asm_do_IRQ+0x0/0x50) from [<c0037c2c>] (__irq_usr+0x4c/0xa0)

r6 = 001CEBAC r5 = FBC48000 r4 = FFFFFFFF

Code: e5893000 e51b2048 e5963010 e1a07102 (e7935007)

<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000

pgd = c2370000

[00000000] *pgd=82286031, *pte=00000000, *ppte=00000000

I’m running a test that decodes the same 5 video segments repeatedly. The 5 videos are encoded with the H.264 encoder at a resolution of 720 x 480, muxed with audio and subtitle data into an MPEG-2 transport stream, and range in length from 20 to 90 seconds. The full set often decodes successfully 10 or more times before the failure occurs, but sometimes the failure occurs the first time one of the videos is decoded. Any one of the 5 videos may be the one that fails, and the frame number at which it fails varies as well. I have also re-recorded the 5 videos, so the video data is completely new, but the failure still occurs.

You can see from the oops console output that the failure appears to be in ‘linx’, which is third-party IPC software used in our system. However, the failure occurs only when we are actually decoding video via the VIDDEC2_process() call. If I perform all processing of the video up to the VIDDEC2_process() call and stop at that point, the failure doesn’t occur even with overnight testing (note that in this test case we are still decoding audio and subtitle and handling much linx message traffic). If I perform the VIDDEC2_process() call but then bypass all remaining processing to DMA the decoded data to the display buffers and display the video, the problem still occurs. The problem has never been observed while encoding, only decoding. Thus it seems to be isolated to the VIDDEC2_process() call.

It is important to note that our application is configured to perform encode or decode, but never both at the same time. Therefore after each decode, our application software is killed and restarted with a configuration as an encoder. When the subsequent video decode is started approximately 10 seconds later, the application is again killed and restarted, but it is restarted with a configuration as a decoder. So all dvsdk resources are being cleaned up and re-allocated with each individual decode session. Also, after each decode session the cmemk, irqk, edmak, and dm365mmap kernel modules are removed and re-installed. This occurs after the application is killed as a decoder and before it is restarted as an encoder.

The video data is encoded using the same processor and dvsdk that is attempting to decode it. Also, we calculate a CRC on each video frame and add it to the transport stream header for that frame, and then check the CRC just before making the VIDDEC2_process() call. The CRC always checks out OK, so I think we can rule out corruption of the video data between encoding and decoding.
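For illustration only, a per-frame check of this kind might look like the sketch below. The post does not say which CRC variant is used, so the CRC-32 polynomial here is an assumption, and the names crc32_frame and frame_crc_ok are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, the zlib/Ethernet variant).
 * Assumption: the original post does not name the CRC it uses. */
uint32_t crc32_frame(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 1 when the frame matches the CRC carried in the stream header. */
int frame_crc_ok(const uint8_t *frame, size_t len, uint32_t expected)
{
    return crc32_frame(frame, len) == expected;
}
```

A caller would run frame_crc_ok() on the input buffer immediately before VIDDEC2_process() and skip or log the frame on a mismatch.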

The dvsdk I’m using is ‘udworks-v2.1-02_10_01_18’ and contains the following component versions:

codec_engine_2_24

dmai_1_21_00_10

dvtb_4_10_03

xdais_6_24

cg_xml_2_12_00

dm365_2_10_01_18_release_notes.html

edma3_lld_1_06_00_01

xdctools_3_15_01_59

dm365_codecs_01_00_06

linuxutils_2_24_03

dvsdk_demos_2_10_00_17

framework_components_2_25_00_04

…with the platinum codecs installed:

H.264 High Profile DM365 Encoder 02.00.00.08

H.264 High Profile DM365 Decoder 02.00.00.05

However, the problem was also observed with our software build that has these codec versions:

H.264 High Profile DM365 Encoder 01.20.00.05

H.264 High Profile DM365 Decoder 01.10.00.04

Our decoder configuration is:

IH264VDEC_Params tParams;

IH264VDEC_DynamicParams tDynamicParams;

tParams.viddecParams.maxHeight = 720;

tParams.viddecParams.maxWidth = 1280;

tParams.viddecParams.size = sizeof (IH264VDEC_Params);

tParams.viddecParams.maxFrameRate = 30000;

tParams.viddecParams.maxBitRate = 0;

tParams.viddecParams.dataEndianness = XDM_BYTE;

tParams.viddecParams.forceChromaFormat = XDM_YUV_420SP;

tParams.hdvicpHandle = NULL;

tParams.displayDelay = 16;

tParams.levelLimit = 0;

tParams.disableHDVICPeveryFrame = 0;

tParams.inputDataMode = 1;

tParams.sliceFormat = 1;

tParams.frame_closedloop_flag = 0;

// Set video decoder dynamic params

tDynamicParams.viddecDynamicParams.size = sizeof (IH264VDEC_DynamicParams);

tDynamicParams.viddecDynamicParams.decodeHeader = XDM_DECODE_AU;

tDynamicParams.viddecDynamicParams.displayWidth = 0;

tDynamicParams.viddecDynamicParams.frameSkipMode = IVIDEO_NO_SKIP;

tDynamicParams.viddecDynamicParams.frameOrder = IVIDDEC2_DISPLAY_ORDER;

tDynamicParams.viddecDynamicParams.newFrameFlag = XDAS_FALSE;

tDynamicParams.viddecDynamicParams.mbDataFlag = XDAS_FALSE;

tDynamicParams.getDataFxn = NULL;

tDynamicParams.dataSyncHandle = NULL;

tDynamicParams.resetHDVICPeveryFrame = 1;

And a typical set of decoder input arguments is:

VIDDEC2_InArgs tInArgs;

XDM1_BufDesc tInBufDesc;

tInBufDesc.descs[0].bufSize = 6417;

tInBufDesc.descs[0].buf = 0x47b10000;

tInBufDesc.descs[0].accessMask = 0;

tInBufDesc.numBufs = 1;

tInArgs.numBytes = 6417;

tInArgs.inputID = 3;

tInArgs.size = sizeof (VIDDEC2_InArgs);

My questions are:

1) Is there a known problem with corruption of the kernel while decoding video on the DM365?

2) Are there any suggested methods of further isolating this problem to find the root cause?

over 15 years ago

0 Vincent W. over 15 years ago

TI__Genius 12865 points

Hi Eric,

I have never seen this type of crash before while working with the platinum H264 decoder 2.00.00.10 in DVSDK 4.0 on the DM365 EVM (Rev E). So a few suggestions:

- Try the provided clips in the DVSDK to see if you can reproduce this issue to rule out the encoder.

- Try a different EVM if you have any to make sure it is not because you got a bad copy.

- Remove linx and see if the issue goes away.

- Try to install DVSDK 4.0 (if possible) and see if you still come across this issue with newer components.

Best regards,

Vincent

0 Sanjeev Premi over 15 years ago

TI__Expert 4590 points

Eric Riley said:

PC is at lnhcb_deliver+0xc7c/0xd80 [linx]
LR is at all_conns_connected+0x80/0x88 [linx]
pc : [<bf0250c4>] lr : [<bf021798>] Not tainted
sp : c2285e38 ip : c2285e20 fp : c2285e94
r10: 00000001 r9 : c389b2c0 r8 : 00010005
r7 : ef06c814 r6 : c0ba4560 r5 : 3bc1b205 r4 : 00000001
r3 : c0cb8000 r2 : 3bc1b205 r1 : 00000001 r0 : 00000001

Can you work backwards from the PC value above? There must be a simpler reason for the NULL pointer exception.

The trace seems to point at interrupt context:

Eric Riley said:

[<c0053174>] (__do_softirq+0x0/0x50) from [<c005352c>] (irq_exit+0x48/0x64)
r5 = C026AD40 r4 = 00000035
[<c00534e4>] (irq_exit+0x0/0x64) from [<c0038bc4>] (asm_do_IRQ+0x44/0x50)
[<c0038b80>] (asm_do_IRQ+0x0/0x50) from [<c0037c2c>] (__irq_usr+0x4c/0xa0)

Is the "linx" code hardened for asynchronous events (interrupts)? In particular, does it use critical sections and mutual exclusion for shared data structures?
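As one way of working backwards from the PC: the last two words on the "Code:" line of the first oops decode as `mov r7, r2, lsl #2` (e1a07102) and the faulting `ldr r5, [r3, r7]` (e7935007). The sketch below redoes that address arithmetic with the register values from the dump. The helper name arm_indexed_addr is hypothetical, and reading r3 as a table base and r2 as an index is an interpretation, not something the thread confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of "ldr rD, [base, index]" where the index register was
 * produced by "mov index, raw, lsl #shift". The sum wraps at 32 bits, as it
 * does in the ARM core's address calculation. */
uint32_t arm_indexed_addr(uint32_t base, uint32_t raw_index, unsigned shift)
{
    assert(shift < 32);
    return base + (raw_index << shift);
}
```

With r3 = 0xc0cb8000 and r2 = 0x3bc1b205 from the dump, arm_indexed_addr(0xc0cb8000, 0x3bc1b205, 2) gives 0xafd24814, which is the faulting virtual address, and r2 << 2 equals the r7 value 0xef06c814. That is consistent with a 4-byte-per-entry lookup being indexed by a corrupted value, so where r2 is loaded from in lnhcb_deliver would be worth checking.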

0 mayur nikumbh over 10 years ago in reply to Sanjeev Premi

Intellectual 540 points

Hi all,

I'm currently trying to use TI's encoder element, i.e. TIVidenc1, with the Linux filesystem from DVSDK 4.02.00.06 and an upgraded kernel from PSP 03.21.00.04 (that is, Linux 2.6.37).

However, the kernel oopses during H.264 video encode on the DM368, with the following debug output:

unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c2568000
[00000000] *pgd=82564031, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1] PREEMPT
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in: dm365mmap edmak irqk cmemk ipv6
CPU: 0 Not tainted (2.6.37 #2)
PC is at __down_interruptible+0x34/0xf0
LR is at down_interruptible+0x44/0x78
pc : [<c0350c3c>] lr : [<c0062298>] psr: 60000093
sp : c2567e70 ip : c2567ea0 fp : c2567e9c
r10: bf05a5d8 r9 : bf059b54 r8 : 00000190
r7 : 0000000a r6 : c3473200 r5 : c2566000 r4 : bf059d00
r3 : 00000000 r2 : c2567e70 r1 : 00000000 r0 : bf059d00
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 0005317f Table: 82568000 DAC: 00000015
Process ticapturesrc0:s (pid: 1455, stack limit = 0xc2566270)
Stack: (0xc2567e70 to 0xc2568000)
7e60: bf059d04 c003cc9c c2567e9c c2567e88
7e80: c003cc9c bf059d00 a0000013 412d96f0 c2567eb4 c2567ea0 c0062298 c0350c18
7ea0: 0000fc00 bf059ce4 c2567efc c2567eb8 bf059304 c0062264 c2567efc 4168b000
7ec0: 00000065 c35ca300 00000004 00000000 c01bf310 c2537e70 c35ca300 00000004
7ee0: c35ca300 412d96f0 c2566000 00000000 c2567f0c c2567f00 c00b4914 bf05921c
7f00: c2567f7c c2567f10 c00b50a4 c00b48fc c2cf3200 00000028 c2566000 c0464d0c
7f20: c2567f4c c2567f30 00000028 00000000 c2567f4c c2567f40 c35ca300 00000004
7f40: c2567f84 00000036 c002f144 c2566000 c2567f7c 00000004 412d96f0 0000fc00
7f60: c35ca300 c002f144 c2566000 00000000 c2567fa4 c2567f80 c00b5140 c00b4b5c
7f80: 408bd214 00000001 00000004 00000000 00000000 00000036 00000000 c2567fa8
7fa0: c002efc0 c00b5110 00000004 00000000 00000004 0000fc00 412d96f0 00000000
7fc0: 00000004 00000000 00000000 00000036 40092000 00000000 408bb460 412d9814
7fe0: 408bda08 412d96f0 40773340 405dcaec 60000010 00000004 f77acf99 eddadeff
Backtrace:
[<c0350c08>] (__down_interruptible+0x0/0xf0) from [<c0062298>] (down_interruptible+0x44/0x78)
r6:412d96f0 r5:a0000013 r4:bf059d00
[<c0062254>] (down_interruptible+0x0/0x78) from [<bf059304>] (ioctl+0xf8/0x3e4 [irqk])
r5:bf059ce4 r4:0000fc00
[<bf05920c>] (ioctl+0x0/0x3e4 [irqk]) from [<c00b4914>] (vfs_ioctl+0x28/0x44)
[<c00b48ec>] (vfs_ioctl+0x0/0x44) from [<c00b50a4>] (do_vfs_ioctl+0x558/0x5b4)
[<c00b4b4c>] (do_vfs_ioctl+0x0/0x5b4) from [<c00b5140>] (sys_ioctl+0x40/0x64)
[<c00b5100>] (sys_ioctl+0x0/0x64) from [<c002efc0>] (ret_fast_syscall+0x0/0x2c)
r7:00000036 r6:00000000 r5:00000000 r4:00000004
Code: e2803004 e24b202c e50b302c e3a03000 (e5812000)
0m begin init_vi---[ end trace 0e3b2b42c2741b9c ]---
deo

0:00:01.3note: ticapturesrc0:s[1455] exited with preempt_count 1
28492375 BUG: scheduling while atomic: ticapturesrc0:s/1455/0x40000002
1454 0xdModules linked in:ccb0 LOG dm365mmap edmak TIVidenc1 irqk gsttividenc1.c: cmemk1194:gst_tividen ipv6c1_codec_start:
opening codBacktrace: ec engine "codec
Server"

0:00:[<c0032584>] (dump_backtrace+0x0/0x114) from [<c034e8ec>] (dump_stack+0x18/0x1c)
01.329625917 7:c2560f7833m 1454 r6:00000000 0xdccb0 LO r5:c3473200G r4:00000000 TIVid
enc1 gsttividenc[<c034e8d4>] (dump_stack+0x0/0x1c) from [<c003ca20>] (__schedule_bug+0x54/0x60)
1.c:1302:gst_tiv[<c003c9cc>] (__schedule_bug+0x0/0x60) from [<c034ec50>] (schedule+0x78/0x3d4)
idenc1_codec_sta r5:c3473200rt: configu r4:c3473200ring video encod
e width=480, hei[<c034ebd8>] (schedule+0x0/0x3d4) from [<c003cd58>] (__cond_resched+0x18/0x24)
ght=272, bitrate[<c003cd40>] (__cond_resched+0x0/0x24) from [<c034f150>] (_cond_resched+0x34/0x44)
=2000000

0:00[<c034f11c>] (_cond_resched+0x0/0x44) from [<c00929ec>] (__get_user_pages+0x230/0x240)
:01.330431250 <c00927bc>] (__get_user_pages+0x0/0x240) from [<c0092b04>] (get_user_pages+0x58/0x60)
333m 1454 [<c0092aac>] (get_user_pages+0x0/0x60) from [<c008cee0>] (get_user_pages_fast+0x68/0x80)
0xdccb0 L r5:00000001OG 4:c2566000m TIVi
denc1 gsttividen[<c008ce78>] (get_user_pages_fast+0x0/0x80) from [<c006b5cc>] (get_futex_key+0x98/0x158)
c1.c:1304:gst_ti[<c006b534>] (get_futex_key+0x0/0x158) from [<c006bd60>] (futex_wake+0x4c/0x138)
videnc1_codec_st r7:00000001art: openin r6:00000001g video encoder r5:412da4d8"h264enc"

0:0 r4:c2567a5c0:01.498208292
1454 [<c006bd14>] (futex_wake+0x0/0x138) from [<c006d6f0>] (do_futex+0xe4/0xd40)
0xdccb0 r8:00000000LOG 7:000000010m TIV r6:00000000idenc1 gsttivide r5:412da4d8nc1.c:1350:gst_t r4:00000001ividenc1_codec_s
tart: creat[<c006d60c>] (do_futex+0x0/0xd40) from [<c006e49c>] (sys_futex+0x150/0x164)
ing output buffe[<c006e34c>] (sys_futex+0x0/0x164) from [<c0040e10>] (mm_release+0xb0/0xc0)
r table

0:00:[<c0040d60>] (mm_release+0x0/0xc0) from [<c004508c>] (exit_mm+0x20/0x164)
01.499453000 7:c2567e2833m 1454 r6:c3473200 0xdccb0 LO r5:c2fcdc00G r4:0000000b TIVid
enc1 gsttividenc[<c004506c>] (exit_mm+0x0/0x164) from [<c0046bf8>] (do_exit+0x1c8/0x6a4)
1.c:1042:gst_tiv r7:c2567e28idenc1_init_vide r6:c3473200o: end init r5:c3473200_video

0:00:0 r4:0000000b1.500448542
1454 [<c0046a30>] (do_exit+0x0/0x6a4) from [<c0032ae8>] (die+0x1d4/0x204)
0xdccb0 LOG[<c0032914>] (die+0x0/0x204) from [<c00343b8>] (__do_kernel_fault+0x6c/0x8c)
[<c003434c>] (__do_kernel_fault+0x0/0x8c) from [<c00345a8>] (do_page_fault+0x1d0/0x1e8)
TIVide r9:c2567e28nc1 gsttividenc1 r8:00000817.c:1451:gst_tivi r7:c2fcdc00denc1_encode:6:000000000m invoking the r5:c3473200video encoder

r4:c045e064

[<c00343d8>] (do_page_fault+0x0/0x1e8) from [<c002e324>] (do_DataAbort+0x3c/0x9c)
[<c002e2e8>] (do_DataAbort+0x0/0x9c) from [<c002eb6c>] (__dabt_svc+0x4c/0x60)
Exception stack(0xc2567e28 to 0xc2567e70)
7e20: bf059d00 00000000 c2567e70 00000000 bf059d00 c2566000
7e40: c3473200 0000000a 00000190 bf059b54 bf05a5d8 c2567e9c c2567ea0 c2567e70
7e60: c0062298 c0350c3c 60000093 ffffffff
r8:00000190 r7:0000000a r6:c3473200 r5:c2567e5c r4:ffffffff
[<c0350c08>] (__down_interruptible+0x0/0xf0) from [<c0062298>] (down_interruptible+0x44/0x78)
r6:412d96f0 r5:a0000013 r4:bf059d00
[<c0062254>] (down_interruptible+0x0/0x78) from [<bf059304>] (ioctl+0xf8/0x3e4 [irqk])
r5:bf059ce4 r4:0000fc00
[<bf05920c>] (ioctl+0x0/0x3e4 [irqk]) from [<c00b4914>] (vfs_ioctl+0x28/0x44)
[<c00b48ec>] (vfs_ioctl+0x0/0x44) from [<c00b50a4>] (do_vfs_ioctl+0x558/0x5b4)
[<c00b4b4c>] (do_vfs_ioctl+0x0/0x5b4) from [<c00b5140>] (sys_ioctl+0x40/0x64)
[<c00b5100>] (sys_ioctl+0x0/0x64) from [<c002efc0>] (ret_fast_syscall+0x0/0x2c)
r7:00000036 r6:00000000 r5:00000000 r4:00000004

Can anyone help me fix this issue?
How can I get the TI encoders working with the filesystem from DVSDK 4.02.00.06 and an upgraded kernel from PSP 03.21.00.04 (that is, Linux 2.6.37) on the DM368?

Thanks in advance
Mayur.

0 Ivan Frederiks over 9 years ago in reply to mayur nikumbh

Intellectual 370 points

Hello Mayur!

Most probably you got this error because you used patch 0001-Update-semaphore-to-avoid-MUTEX_LOCKED-error.patch for linuxutils_2_26_03_06.

Check linuxutils_2_26_03_06/packages/ti/sdo/linuxutils/irq/src/module/irqk.c, line 864

It probably looks like this:

sema_init(&channelp->completion_sem, 1);

replace it with

sema_init(&channelp->resource_sem, 1);

and try again.

Good luck!
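A possible reading of why the wrong sema_init() target produces that oops (an interpretation, not confirmed in the thread): the faulting instruction e5812000 decodes as `str r2, [r1]` with r1 = 00000000, which fits a waiter being linked onto a semaphore wait list whose pointers were never initialised. The user-space sketch below models that with simplified stand-ins for the kernel structures; the model_ names are hypothetical and the struct layout is reduced to what the example needs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's list_head and semaphore. */
struct list_head { struct list_head *next, *prev; };

struct semaphore {
    unsigned int count;
    struct list_head wait_list;
};

/* Mirrors what sema_init() does to the wait list: an empty circular list. */
void model_sema_init(struct semaphore *sem, unsigned int val)
{
    sem->count = val;
    sem->wait_list.next = &sem->wait_list;
    sem->wait_list.prev = &sem->wait_list;
}

/* Models list_add_tail() on the wait list. The kernel version writes through
 * head->prev unconditionally; this model returns 0 instead of faulting when
 * the list head was never initialised. */
int model_enqueue_waiter(struct semaphore *sem, struct list_head *waiter)
{
    struct list_head *head, *prev;

    assert(sem != NULL && waiter != NULL);
    head = &sem->wait_list;
    prev = head->prev;
    if (prev == NULL)
        return 0; /* the kernel would store to address 0 here */
    waiter->next = head;
    waiter->prev = prev;
    prev->next = waiter;
    head->prev = waiter;
    return 1;
}
```

On a zero-filled semaphore, model_enqueue_waiter() returns 0; after model_sema_init() it returns 1. In the real module the equivalent is making sure every semaphore that is later passed to down_interruptible() has had sema_init() called on it.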
