Hello,
We have gone into production with our custom AM335x board. Most of the boards produced work well, but some of them are unstable.
Everything runs well, and then after a random amount of time (less than one hour) Linux freezes and the board stops working. Once the CPU has frozen, the processor gets hot.
Sometimes we catch this kind of error:
[ 2580.675000] Unable to handle kernel paging request at virtual address 3f73143b
[ 2580.683189] Unable to handle kernel paging request at virtual address e1a04128
[ 2580.690787] pgd = c0004000
[ 2580.693630] [e1a04128] *pgd=00000000
[ 2580.697401] Internal error: Oops: 5 [#1] ARM
[ 2580.701898] Modules linked in: cdc_acm xt_tcpudp iptable_filter ip_tables x_tables usb_f_rndis u_ether libcomposite configfs fbtft musb_dsps musb_hdrc pwm_test(O) ti_am335x_adc kfifo_buf musb_am335x industrialio snd_soc_evm snd_soc_davinci_mcasp snd_soc_tlv320aic3x
[ 2580.726703] CPU: 0 PID: 0 Comm: Tainted: G O 3.14.49-ge9cd4cc819 #2
[ 2580.734311] task: e08c2002 ti: dc8dc000 task.ti: 53324000
[ 2580.740009] PC is at do_bad_area+0x34/0x8c
[ 2580.744322] LR is at 0x20080193
[ 2580.747632] pc : [<c00188fc>] lr : [<20080193>] psr: 20080193
[ 2580.747632] sp : dc8de050 ip : dc8de000 fp : dc8de074
[ 2580.759715] r10: 00000020 r9 : e1a04000 r8 : dc8de148
[ 2580.765210] r7 : e1a04128 r6 : e1a04128 r5 : 00000005 r4 : dc8de148
[ 2580.772076] r3 : dc8de050 r2 : dc8de148 r1 : 00000005 r0 : e1a04000
[ 2580.778944] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
[ 2580.786538] Control: 10c5387d Table: 9de94019 DAC: 00000015
[ 2580.792587] Process (pid: 0, stack limit = 0x53324238)
[ 2580.798086] Stack: (0xdc8de050 to 0x53326000)
[ 2580.802665] Backtrace:
[ 2580.805260] [<c00188c8>] (do_bad_area) from [<c0611370>] (do_translation_fault+0x74/0xa8)
[ 2580.813868] r7:e1a04128 r6:c0004000 r5:c0007868 r4:00000005
[ 2580.819856] [<c06112fc>] (do_translation_fault) from [<c00083bc>] (do_DataAbort+0x40/0xa0)
[ 2580.828553] r6:c06112fc r5:00000005 r4:c088ebac r3:20080193
[ 2580.834537] [<c000837c>] (do_DataAbort) from [<c060f718>] (__dabt_svc+0x38/0x60)
[ 2580.842321] Exception stack(0xdc8de148 to 0xdc8de190)
[ 2580.847640] e140: e1a04000 00000005 dc8de288 dc8de190 dc8de288 00000005
[ 2580.856256] e160: e1a04128 e1a04128 dc8de288 e1a04000 00000020 dc8de1b4 dc8de000 dc8de190
[ 2580.864866] e180: 20080193 c00188fc 20080193 ffffffff
[ 2580.870170] r8:dc8de288 r7:dc8de17c r6:ffffffff r5:20080193 r4:c00188fc
[ 2580.877255] [<c00188c8>] (do_bad_area) from [<c0611370>] (do_translation_fault+0x74/0xa8)
[ 2580.885851] r7:e1a04128 r6:c0004000 r5:c0007868 r4:00000005
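For anyone who wants to compare traces like this across several boards, here is a minimal sketch that pulls the timestamp and faulting virtual address out of a captured console log. The function name `extract_faults` and the regular expression are only illustrative assumptions, not part of any kernel tooling, and the pattern covers just the two "Unable to handle kernel ..." message forms named in it.

```python
import re

# Matches lines such as:
# [ 2580.675000] Unable to handle kernel paging request at virtual address 3f73143b
# (assumption: 32-bit addresses printed as 8 hex digits, as in the trace above)
FAULT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+Unable to handle kernel "
    r"(?:paging request|NULL pointer dereference) "
    r"at virtual address ([0-9a-fA-F]{8})"
)


def extract_faults(log_text):
    """Return a list of (timestamp_seconds, fault_address) pairs from log_text."""
    return [(float(ts), int(addr, 16)) for ts, addr in FAULT_RE.findall(log_text)]


if __name__ == "__main__":
    sample = (
        "[ 2580.675000] Unable to handle kernel paging request "
        "at virtual address 3f73143b\n"
        "[ 2580.683189] Unable to handle kernel paging request "
        "at virtual address e1a04128\n"
    )
    for ts, addr in extract_faults(sample):
        print(f"{ts:.6f}s fault at 0x{addr:08x}")
```

Running it over the logs from each failing board and comparing the addresses may show whether the faults cluster around the same values or are effectively random.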
All boards use the same configuration and the same image, so it is very difficult to see what the problem is. All boards have been electrically checked.
I ran an intensive RAM test on the defective boards, but every test passes. Very strange.
Does anybody have an idea or a good starting point?
BR
Steve