Linux Boot issue when cold only custom DM365 design

We are seeing several of our custom boards failing at cold temperature.  The failure is at very repeatable places in the boot sequence.  Here is the console output during the event:

DM36x initialization passed!
TI UBL Version: 1.56
Watchguard Video (c) 2009. All rights reserved.
BootMode = SPI
Starting u-boot SPI FLASH Copy...
Valid Kernel not found in SPI FLASH...
   DONE
Jumping to entry point at 0x81080000.


U-Boot 1.3.4-svn31366 build 18 (May 20 2010 - 15:26:09)

DRAM:  128 MB
Using default environment

In:    serial
Out:   serial
Err:   serial
Ethernet PHY: GENERIC @ 0x02
Hit any key to stop autoboot:  1  0
TFTP from server 192.168.1.254; our IP address is 192.168.1.5
Filename '/tftpboot/uImage'.
Load address: 0x80700000
Loading: *#################################################################
     #################################################################
     #################################################################
     ################################################################
done
Bytes transferred = 1324192 (1434a0 hex)
## Booting kernel from Legacy Image at 80700000 ...
   Image Name:   Linux-2.6.18_pro500-davinci_evm-
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1324128 Bytes =  1.3 MB
   Load Address: 80008000
   Entry Point:  80008000
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
OK

Starting kernel ...

Uncompressing Linux............................................................................................ done, booting the kernel.
Linux version 2.6.18_pro500-davinci_evm-arm_v5t_le (bldguy@athena.watchguardvideo.local) (gcc version 4.2.0 20070126 (prerelease) (MontaVista 4.2.0-3.0.0.0702771 2007-03-10)) #1 PREEMPT Mon Oct 17 02:11:37 CDT 2011
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
Machine: DaVinci DM365 EVM
Memory policy: ECC disabled, Data cache writeback
DaVinci DM0365 variant 0x8
PLL0: fixedrate: 27000000, commonrate: 121500000, vpssrate: 243000000
PLL0: vencrate_sd: 27000000, ddrrate: 243000000 mmcsdrate: 121500000
PLL1: armrate: 297000000, voicerate: 99000000, vencrate_hd: 74250000
CPU0: D VIVT write-back cache
CPU0: I cache: 16384 bytes, associativity 4, 32 byte lines, 128 sets
CPU0: D cache: 8192 bytes, associativity 4, 32 byte lines, 64 sets
Built 1 zonelists.  Total pages: 15360
Kernel command line: console=ttyS0,115200n8 mem=60M ip=192.168.1.5:192.168.1.254:192.168.1.254:255.255.255.0:hd:eth0:off root=/dev/nfs ro noinitrd nfsroot=192.168.1.254:/mnt/nfs,nolock,rsize=1024,wsize=1024 video=davincifb:vid0=OFF:vid1=OFF:osd0=720x576x16,4050K dm365_imp.oper_mode=1 davinci_capture.device_type=6
PID hash table entries: 256 (order: 8, 1024 bytes)
Clock event device timer0_0 configured with caps set: 07
Console: colour dummy device 80x30
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory: 60MB = 60MB total
Memory: 57856KB available (2237K code, 515K data, 188K init)
Security Framework v1.0.0 initialized
Capability LSM initialized
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
NET: Registered protocol family 16
DaVinci: 104 gpio irqs
MUX: initialized I2C_SCL
DM365 IPIPE initialized in Single Shot mode
Generic PHY: Registered new driver
ch0 default output "COMPOSITE", mode "NTSC"
VPBE Encoder Initialized
LogicPD encoder initialized
Avnetlcd encoder initialized
dm365_afew_hw_init
NET: Registered protocol family 2
IP route cache hash table entries: 512 (order: -1, 2048 bytes)
TCP established hash table entries: 2048 (order: 1, 8192 bytes)
TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
TCP: Hash tables configured (established 2048 bind 1024)
TCP reno registered
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered (default)
davincifb davincifb.0: dm_osd0_fb: Initial window configuration is invalid.
davincifb davincifb.0: dm_osd0_fb: 720x576x16@0,0 with framebuffer size 4050KB
davincifb davincifb.0: dm_vid0_fb: 0x0x16@0,0 with framebuffer size 1020KB
davincifb davincifb.0: dm_osd1_fb: 720x480x4@0,0 with framebuffer size 675KB
davincifb davincifb.0: dm_vid1_fb: 0x0x16@0,0 with framebuffer size 1020KB
DAVINCI-WDT: DaVinci Watchdog Timer: heartbeat 60 sec
facedetect major#: 253, minor# 0
facedetect driver registered
imp serializer initialized
davinci_previewer initialized
davinci_resizer initialized
Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing disabled
serial8250.0: ttyS0 at MMIO map 0x1c20000 mem 0xfbc20000 (irq = 40) is a 16550A
RAMDISK driver initialized: 1 RAM disks of 32768K size 1024 blocksize
Davinci EMAC MII Bus: probed
MAC address is 02:00:00:00:00:05
TI DaVinci EMAC Linux version updated 4.0
netconsole: not configured, aborting
Linux video capture interface: v2.00
vpfe_init
vpfe_probe
no vid2 buffer allocated
no vid3 buffer allocated
Trying to register davinci display video device.
layer=c3ac6400,layer->video_dev=c3ac6560
Trying to register davinci display video device.
layer=c3ac6200,layer->video_dev=c3ac6360
davinci_init:DaVinci V4L2 Display Driver V1.0 loaded
af major#: 250, minor# 0
AF Driver initialized
aew major#: 249, minor# 0
AEW Driver initialized
vpfe ccdc capture vpfe ccdc capture.1: vpif_register_decoder: decoder = ATLAS-SD
vpfe ccdc capture vpfe ccdc capture.1: vpif_register_decoder: decoder = ATLAS-HD
dm_spi.0: davinci SPI Controller driver at 0xc4004000 (irq = 42) use_dma=0
Advanced Linux Sound Architecture Driver Version 1.0.12rc1 (Thu Jun 22 13:55:50 2006 UTC).
ASoC version 0.13.1
CS4251 Audio Codec 0.1
asoc: CS4251 <-> davinci-i2s mapping ok
ALSA device list:
  #0: DaVinci DM365 ATLAS (cs4251)
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
Time: timer0_1 clocksource has been installed.
Clock event device timer0_0 configured with caps set: 08
Switched to high resolution mode on CPU 0
1:01 not found
IP-Config: Complete:
      device=eth0, addr=192.168.1.5, mask=255.255.255.0, gw=192.168.1.254,
     host=hd, domain=, nis-domain=(none),
     bootserver=192.168.1.254, rootserver=192.168.1.254, rootpath=
Looking up port of RPC 100003/2 on 192.168.1.254
Looking up port of RPC 100005/1 on 192.168.1.254
VFS: Mounted root (nfs filesystem) readonly.
Freeing init memory: 188K
INIT: version 2.86 booting
0
Starting the hotplug events dispatcher: udevd.
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...done.
0
Activating swap...done.
Remounting root filesystem...done.
Starting mounting local filesystems: mount none on /var/run type tmpfs (rw)
none on /tmp type tmpfs (rw)
Setting up networking ...ln: /etc/network/run: Read-only file system
Starting hotplug subsystem:
   pci     
   pci      [success]
   usb     
   usb      [success]
   isapnp  
   isapnp   [success]
   ide     
   ide      [success]
   input   
   input    [success]
   scsi    
   scsi     [success]
done.
Starting portmap daemon: portmap.
Watchguard Video - loading modulesCMEMK module: built on Oct 17 2011 at 02:21:11
  Reference Linux version 2.6.18
  File /home/bldguy/atlas/builds/dione/build1/sandBox/ltib/rpm/BUILD/wgv-1.4.0/vpApps/dvsdk/linuxutils_2_24_03/packages/ti/sdo/linuxutils/cmem/src/module/cmemk.c
allocated heap buffer 0xc5000000 of size 0x3a6000
CMEM Range Overlaps Kernel Physical - allowing overlap
CMEM phys_start (0x1000) overlaps kernel (0x80000000 -> 0x83c00000)
cmemk initialized
IRQK module: built on Oct 17 2011 at 02:21:13
  Reference Linux version 2.6.18
  File /home/bldguy/atlas/builds/dione/build1/sandBox/ltib/rpm/BUILD/wgv-1.4.0/vpApps/dvsdk/linuxutils_2_24_03/packages/ti/sdo/linuxutils/irq/src/module/irqk.c
irqk initialized
EDMAK module: built on Oct 17 2011 at 02:21:13
  Reference Linux version 2.6.18
  File /home/bldguy/atlas/builds/dione/build1/sandBox/ltib/rpm/BUILD/wgv-1.4.0/vpApps/dvsdk/linuxutils_2_24_03/packages/ti/sdo/linuxutils/edma/src/module/edmak.c
Initializing the module 0
Registered char device regrw with major number 244.
LINX: version 2.4.2
LINX: Compile-time configuration:
LINX: Max number of LINX sockets 512
LINX: Max number of attach references 1024
LINX: Max number of remote links 32
LINX: Max number of communicating sockets over a link 1024
LINX: Max number of timeout references 1024
NET: Registered protocol family 29
Watchguard Video - modules loaded
Watchguard Video: Configuring Linxmkethcon: created connection 'ethcm/conToMp'.
Now use 'mklink -c ethcm/conToMp ...' to create a link.
Watchguard Video: Linx configured
LINX: Hunt path "toMp/" available (RLNH version:2)
Found Video Device Type 6, using /tmp/hd_encoder_config.xml
Your configuration file uses an 13 hd kern.debug kernel[]: </vpbe_encoder_setmode>
INIT: Entering runlevel: 3
1970:00:00:13 hd kern.debug kernel[]: </vpbe_encoder_setoutput>
1970:00:00:13 hd kern.debug kernel[]: Start of vpbe_encoder_setmode..
1970:00:00:13 hd kern.debug kernel[]: </vpbe_encoder_setmode>
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getoutput>
1970:00:00:13 hd kern.debug kernel[]: </vpbe_encoder_getoutput>
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getmode>
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getmode/>
1970:00:00:13 hd kern.notice kernel[]: VPBE Encoder Initialized
1970:00:00:13 hd kern.notice kernel[]: LogicPD encoder initialized
1970:00:00:13 hd kern.notice kernel[]: Avnetlcd encoder initialized
1970:00:00:13 hd kern.notice kernel[]: dm365_afew_hw_init
1970:00:00:13 hd kern.info kernel[]: NET: Registered protocol family 2
1970:00:00:13 hd kern.warning kernel[]: IP route cache hash table entries: 512 (order: -1, 2048 bytes)
1970:00:00:13 hd kern.warning kernel[]: TCP established hash table entries: 2048 (order: 1, 8192 bytes)
1970:00:00:13 hd kern.warning kernel[]: TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
1970:00:00:13 hd kern.info kernel[]: TCP: Hash tables configured (established 2048 bind 1024)
1970:00:00:13 hd kern.info kernel[]: TCP reno registered
1970:00:00:13 hd kern.info kernel[]: Initializing Cryptographic API
1970:00:00:13 hd kern.info kernel[]: io scheduler noop registered
1970:00:00:13 hd kern.info kernel[]: io scheduler anticipatory registered (default)
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getmode>
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getmode/>
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getmode>
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getmode/>
1970:00:00:13 hd kern.warning kernel[]: davincifb davincifb.0: dm_osd0_fb: Initial window configuration is invalid.
1970:00:00:13 hd daemon.info init[]: Entering runlevel: 3
1970:00:00:13 hd kern.info kernel[]: davincifb davincifb.0: dm_osd0_fb: 720x576x16@0,0 with framebuffer size 4050KB
1970:00:00:13 hd kern.info kernel[]: davincifb davincifb.0: dm_vid0_fb: 0x0x16@0,0 with framebuffer size 1020KB
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getmode>
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getmode/>
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getmode>
1970:00:00:13 hd kern.debug kernel[]: <vpbe_encoder_getmode/>
1970:00:00:13 hd kern.info kernel[]: davincifb davincifb.0: dm_osd1_fb: 720x480x4@0,0 with framebuffer size 675KB
1970:00:00:13 hd kern.info kernel[]: davincifb davincifb.0: dm_vid1_fb: 0x0x16@0,0 with framebuffer size 1020KB
1970:00:00:13 hd kern.info kernel[]: DAVINCI-WDT: DaVinci Watchdog Timer: heartbeat 60 sec
1970:00:00:13 hd kern.info kernel[]: facedetect major#: 253, minor# 0
1970:00:00:13 hd kern.info kernel[]: facedetect driver registered
1970:00:00:13 hd kern.notice kernel[]: imp serializer initialized
1970:00:00:13 hd kern.notice kernel[]: davinci_previewer initialized
1970:00:00:13 hd kern.notice kernel[]: davinci_resizer initialized
1970:00:00:13 hd kern.info kernel[]: Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing disabled
1970:00:00:13 hd kern.info kernel[]: serial8250.0: ttyS0 at MMIO map 0x1c20000 mem 0xfbc20000 (irq = 40) is a 16550A
1970:00:00:13 hd kern.warning kernel[]: RAMDISK driver initialized: 1 RAM disks of 32768K size 1024 blocksize
1970:00:00:13 hd kern.info kernel[]: Davinci EMAC MII Bus: probed
1970:00:00:13 hd kern.info kernel[]: MAC address is 02:00:00:00:00:05
1970:00:00:13 hd kern.warning kernel[]: TI DaVinci EMAC Linux version updated 4.0
1970:00:00:14 hd kern.warning kernel[]: netconsole: not configured, aborting
1970:00:00:14 hd kern.info kernel[]: Linux video capture interface: v2.00
1970:00:00:14 hd kern.notice kernel[]: vpfe_init
1970:00:00:14 hd kern.notice kernel[]: vpfe_probe
1970:00:00:14 hd kern.debug kernel[]: <davinci_display_init>
1970:00:00:14 hd kern.err kernel[]: no vid2 buffer allocated
1970:00:00:14 hd kern.err kernel[]: no vid3 buffer allocated
1970:00:00:14 hd kern.notice kernel[]: Trying to register davinci display video device.
1970:00:00:14 hd kern.notice kernel[]: layer=c3ac6400,layer->video_dev=c3ac6560
1970:00:00:14 hd kern.notice kernel[]: Trying to register davinci display video device.
1970:00:00:14 hd kern.notice kernel[]: layer=c3ac6200,layer->video_dev=c3ac6360
1970:00:00:14 hd kern.notice kernel[]: davinci_init:DaVinci V4L2 Display Driver V1.0 loaded
1970:00:00:14 hd kern.debug kernel[]: </davinci_init>
1970:00:00:14 hd kern.info kernel[]: af major#: 250, minor# 0
1970:00:00:14 hd kern.err kernel[]: AF Driver initialized
1970:00:00:14 hd kern.info kernel[]: aew major#: 249, minor# 0
1970:00:00:14 hd kern.notice kernel[]: AEW Driver initialized
1970:00:00:14 hd kern.notice kernel[]: vpfe ccdc capture vpfe ccdc capture.1: vpif_register_decoder: decoder = ATLAS-SD
1970:00:00:14 hd kern.debug kernel[]: wgvatlashd: wgvatlashd_init() 527: number of channels = 1
<SUP EncodeCtrl> EncodeCtrl.cc:580 Waiting for Init cmd...
1970:00:00:14 hd kern.notice kernel[]: vpfe ccdc capture vpfe ccdc capture.1: vpif_register_decoder: decoder = ATLAS-HD
1970:00:00:14 hd kern.debug kernel[]: wgvatlashd: wgvatlashd_enumstd() Enter ...
1970:00:00:14 hd kern.info kernel[]: dm_spi.0: davinci SPI Controller driver at 0xc4004000 (irq = 42) use_dma=0
1970:00:00:14 hd kern.info kernel[]: Advanced Linux Sound Architecture Driver Version 1.0.12rc1 (Thu Jun 22 13:55:50 2006 UTC).
1970:00:00:14 hd kern.info kernel[]: ASoC version 0.13.1
1970:00:00:14 hd kern.info kernel[]: CS4251 Audio Codec 0.1
1970:00:00:14 hd kern.info kernel[]: asoc: CS4251 <-> davinci-i2s mapping ok
1970:00:00:14 hd kern.info kernel[]: ALSA device list:
1970:00:00:14 hd kern.info kernel[]: #0: DaVinci DM365 ATLAS (cs4251)
1970:00:00:14 hd kern.info kernel[]: TCP bic registered
1970:00:00:14 hd kern.info kernel[]: NET: Registered protocol family 1
1970:00:00:14 hd kern.info kernel[]: NET: Registered protocol family 17
1970:00:00:14 hd kern.warning kernel[]: drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
1970:00:00:14 hd kern.info kernel[]: Time: timer0_1 clocksource has been installed.
1970:00:00:14 hd kern.info kernel[]: Clock event device timer0_0 configured with caps set: 08
1970:00:00:14 hd kern.info kernel[]: Switched to high resolution mode on CPU 0
1970:00:00:14 hd kern.err kernel[]: 1:01 not found
1970:00:00:14 hd kern.warning kernel[]: IP-Config: Complete:
1970:00:00:14 hd kern.warning kernel[]: device=eth0, addr=192.168.1.5, mask=255.255.255.0, gw=192.168.1.254,
1970:00:00:14 hd kern.warning kernel[]: host=hd, domain=, nis-domain=(none),
1970:00:00:14 hd kern.warning kernel[]: bootserver=192.168.1.254, rootserver=192.168.1.254, rootpath=
1970:00:00:14 hd kern.notice kernel[]: Looking up port of RPC 100003/2 on 192.168.1.254
1970:00:00:14 hd kern.notice kernel[]: Looking up port of RPC 100005/1 on 192.168.1.254
1970:00:00:14 hd kern.warning kernel[]: VFS: Mounted root (nfs filesystem) readonly.
1970:00:00:14 hd kern.info kernel[]: Freeing init memory: 188K
1970:00:00:14 hd kern.info kernel[]: CMEMK module: built on Oct 17 2011 at 02:21:11
1970:00:00:14 hd kern.info kernel[]: Reference Linux version 2.6.18
1970:00:00:14 hd kern.info kernel[]: File /home/bldguy/atlas/builds/dione/build1/sandBox/ltib/rpm/BUILD/wgv-1.4.0/vpApps/dvsdk/linuxutils_2_24_03/packages/ti/sdo/linuxutils/cmem/src/module/cmemk.c
1970:00:00:14 hd kern.info kernel[]: allocated heap buffer 0xc5000000 of size 0x3a6000
1970:00:00:14 hd kern.warning kernel[]: CMEM Range Overlaps Kernel Physical - allowing overlap
1970:00:00:14 hd kern.warning kernel[]: CMEM phys_start (0x1000) overlaps kernel (0x80000000 -> 0x83c00000)
1970:00:00:14 hd kern.info kernel[]: cmemk initialized
1970:00:00:14 hd kern.info kernel[]: IRQK module: built on Oct 17 2011 at 02:21:13
1970:00:00:14 hd kern.info kernel[]: Reference Linux version 2.6.18
1970:00:00:14 hd kern.info kernel[]: File /home/bldguy/atlas/builds/dione/build1/sandBox/ltib/rpm/BUILD/wgv-1.4.0/vpApps/dvsdk/linuxutils_2_24_03/packages/ti/sdo/linuxutils/irq/src/module/irqk.c
1970:00:00:14 hd kern.info kernel[]: irqk initialized
1970:00:00:14 hd kern.info kernel[]: EDMAK module: built on Oct 17 2011 at 02:21:13
1970:00:00:14 hd kern.info kernel[]: Reference Linux version 2.6.18
1970:00:00:14 hd kern.info kernel[]: File /home/bldguy/atlas/builds/dione/build1/sandBox/ltib/rpm/BUILD/wgv-1.4.0/vpApps/dvsdk/linuxutils_2_24_03/packages/ti/sdo/linuxutils/edma/src/module/edmak.c
1970:00:00:14 hd kern.warning kernel[]: Initializing the module 0
1970:00:00:14 hd kern.warning kernel[]: Registered char device regrw with major number 244.
1970:00:00:14 hd kern.info kernel[]: LINX: version 2.4.2
1970:00:00:14 hd kern.info kernel[]: LINX: Compile-time configuration:
1970:00:00:14 hd kern.info kernel[]: LINX: Max number of LINX sockets 512
1970:00:00:14 hd kern.info kernel[]: LINX: Max number of attach references 1024
1970:00:00:14 hd kern.info kernel[]: LINX: Max number of remote links 32
1970:00:00:14 hd kern.info kernel[]: LINX: Max number of communicating sockets over a link 1024
1970:00:00:14 hd kern.info kernel[]: LINX: Max number of timeout references 1024
1970:00:00:14 hd kern.info kernel[]: NET: Registered protocol family 29
19Starting internet superserver: inetd

**** This is where it hangs

You will notice that we have an older version of Linux.  We have some third-party application SW packages that are incompatible with newer versions, and our hardware is already in production, so upgrading our Linux version is a non-trivial change to our system.  I mention this because I have noticed several replies to other posts that recommend upgrading as the first action.

On one occasion, our SW guy was able to capture this event while hooked to the debugger.  After the board warmed up, he was able to get a Linux core dump:

********

IP-Config: Complete:
      device=eth0, addr=192.168.1.5, mask=255.255.255.0, gw=192.168.1.254,
     host=hd, domain=, nis-domain=(none),
     bootserver=192.168.1.254, rootserver=192.168.1.254, rootpath=
Looking up port of RPC 100003/2 on 192.168.1.254
Looking up port of RPC 100005/1 on 192.168.1.254
VFS: Mounted root (nfs filesystem) readonly.
Freeing init memory: 188K
Bad mode in data abort handler detected: mode ABT_32
Internal error: Oops - bad mode: 0 [#1]
Modules linked in:
CPU: 0
[TRAP to debugger (not really shown on the screen)]
PC is at __clear_user+0x34/0x64
LR is at padzero+0x44/0x5c
pc : [<c010b18c>]    lr : [<c00c8b3c>]    Not tainted
sp : c0393dc0  ip : 00000000  fp : c0393e1c
r10: 00016aec  r9 : c3848300  r8 : c38603c0
r7 : c0392000  r6 : 000168dc  r5 : 00016aec  r4 : 000168dc
r3 : 00000000  r2 : 00000000  r1 : 0000071c  r0 : 000168dc
Flags: nzCv  IRQs off  FIQs on  Mode ABT_32  Segment user
Control: 5317F
Table: 80020000  DAC: 00000015
Process init (pid: 1, stack limit = 0xc0392258)
Stack: (0xc0393dc0 to 0xc0394000)
3dc0: 000168dc 0000071c 00000000 00000000 000168dc 00016aec 000168dc c0392000
3de0: c38603c0 c3848300 00016aec c0393e1c 00000000 c0393dc0 c00c8b3c c010b18c
3e00: 20000097 ffffffff 00000724 c00c8b3c c0393ec4 c0393e20 c00c969c c00c8b08
3e20: 00001812 00000000 c0273914 c0393f28 c3867200 c3ab1ea0 00000003 00000000
3e40: 00008000 00000000 00000001 c3878a60 00000002 00000000 c037ed60 00008000
3e60: 0000e444 00016444 000168dc 00000001 c384e2d8 c386727c 00000000 00000000
3e80: 00000007 00000003 0001fff1 c0029ff1 0000000b 00000fd4 c02ea520 c0392000
3ea0: c027475c 00000000 c3867200 c00c8ccc 00000000 fffffffe c0393efc c0393ec8
3ec0: c00a3b30 c00c8cdc c0392000 c0393f28 00000000 c3867200 c02d9794 00000000
3ee0: c026e490 c026e524 00000000 c0393f28 c0393f24 c0393f00 c00a5778 c00a3a24
3f00: c022b020 c026e524 c026e490 c0393f28 c0393fa0 c0032604 c0393f8c c0393f28
3f20: c003b690 c00a564c 00000000 00000000 00000000 00000000 00000000 00000000
3f40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3f60: 00000000 00000000 00000000 00000000 c02ce928 c003212c c0392000 00000000
3f80: c0393f9c c0393f90 c0037020 c003b660 c0393ff4 c0393fa0 c0037224 c0037010
3fa0: 00000000 c0393fb0 c0037e84 c0049504 00000000 00000000 c003702c c0050684
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 c0393ff8 c0050684 c003703c 45545359 73253d4d
Backtrace:
[<c00c8af8>] (padzero+0x0/0x5c) from [<c00c969c>] (load_elf_binary+0x9d0/0x15c4)
[<c00c8ccc>] (load_elf_binary+0x0/0x15c4) from [<c00a3b30>] (search_binary_handler+0x11c/0x378)
[<c00a3a14>] (search_binary_handler+0x0/0x378) from [<c00a5778>] (do_execve+0x13c/0x22c)
[<c00a563c>] (do_execve+0x0/0x22c) from [<c003b690>] (execve+0x40/0x88)
[<c003b650>] (execve+0x0/0x88) from [<c0037020>] (__init_end+0x20/0x2c)
 r7 = 00000000  r6 = C0392000  r5 = C003212C  r4 = C02CE928
[<c0037000>] (__init_end+0x0/0x2c) from [<c0037224>] (init+0x1f8/0x27c)
[<c003702c>] (init+0x0/0x27c) from [<c0050684>] (do_exit+0x0/0x9cc)
Code: b4e02001 e26cc004 e041100c e2511008 (54a02004)
<0>Kernel panic - not syncing: Attempted to kill init!

*********

For people who are more Linux literate, do these messages point to anything in particular?
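One way to sanity-check the dump: the backtrace shows padzero() calling __clear_user() while the kernel is loading the init binary. As far as I recall, padzero() in 2.6.18-era fs/binfmt_elf.c zero-fills from the end of .bss up to the next ELF_MIN_ALIGN boundary. The sketch below redoes that arithmetic for the address in r0, assuming ELF_MIN_ALIGN is 4 KiB (an assumption, not something the log states). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so ELF_MIN_ALIGN is taken to be 0x1000. */
#define ELF_MIN_ALIGN_ASSUMED 0x1000u

/* Bytes that padzero() would ask clear_user() to zero, starting at elf_bss. */
static uint32_t padzero_len(uint32_t elf_bss)
{
    uint32_t offset = elf_bss & (ELF_MIN_ALIGN_ASSUMED - 1u);
    return offset ? ELF_MIN_ALIGN_ASSUMED - offset : 0u;
}

/* Cross-check against the dump: r0 = 000168dc, and 00000724 appears in the 3e00 stack row. */
static void check_against_dump(void)
{
    assert(padzero_len(0x000168dcu) == 0x724u);
}
```

Under that assumption the length works out to 0x724, which matches the word saved on the stack, and r1 (0000071c) is that count less 8. So the register contents look like an ordinary zero-fill of a user page rather than corrupted arguments, which is at least consistent with the fault coming from the memory access itself.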

  • We have gotten beyond this stage of our issue; it turned out to be a configuration issue on the memory controller.  Once this was resolved, however, our application no longer encodes video at 720p.  We have a second device on our board that, with all the same changes, is able to encode two SD streams.  Our application also does audio encoding and stream generation on the ARM processor core.

    The change that fixed the boot issue and created the encoding issue was to set SDTIM2 from 0x04221C72 to 0x4221C722 (or 0x4421C722).  How does the timing in this register affect the encoding engine?
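A side observation on the SDTIM2 values quoted above, as a small sketch: the old value equals the first new value shifted right by one hex digit, and the two candidate new values differ only in bits 25 and 26. Which timing fields those bits belong to should be read from the DDR2/mDDR controller documentation; the code does not assume any field layout, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define SDTIM2_OLD   0x04221C72u  /* value in use when the cold boot failed */
#define SDTIM2_NEW_A 0x4221C722u  /* first replacement value quoted */
#define SDTIM2_NEW_B 0x4421C722u  /* alternative replacement value quoted */

/* Bits that differ between two register values. */
static uint32_t diff_bits(uint32_t a, uint32_t b)
{
    return a ^ b;
}

/* True if 'shifted' is 'full' with its lowest hex digit dropped. */
static int is_nibble_shifted(uint32_t full, uint32_t shifted)
{
    return (full >> 4) == shifted;
}

/* The relationships described in the text above. */
static void sdtim2_self_check(void)
{
    assert(is_nibble_shifted(SDTIM2_NEW_A, SDTIM2_OLD));
    assert(diff_bits(SDTIM2_NEW_A, SDTIM2_NEW_B) == 0x06000000u);
}
```

If the old constant really was the intended value with a digit dropped, every field in the register would have been programmed from the wrong bits, not just one timing parameter.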

  • Correction: at least some of the time, both DM365s fail to encode video, one configured for a single 720p stream and the second configured for two SD video streams.

  • Hi Michael,

    Make sure that all DDR registers in UBL device.c are set according to your DDR module specification.
    Then you can also try lowering the DDR frequency via the UBL PLL settings.


    Regards.

  • DDR registers have been verified.  What is the purpose of lowering the DDR frequency?  The ARM processor is fully booting to Linux.  It would be helpful if you would provide some rationale / background for your suggestions.  Given the slow response time of this Forum, a more comprehensive answer would be appreciated.

  • Hi Michael,

    I also had a board that continuously froze after some time,
    but only when the DDR was fully loaded (when encoding, not just running Linux).
    After I lowered the frequency by 5%, it worked well.

    Since you also have a frozen-board issue, you may want to check whether lowering the DDR frequency helps.
    If you simply halve it for a first test, you can see whether DDR is the cause.
    To do that, increase the DDR PLLx->PLLDIVx value by 1 (PLL1->PLLDIV7 or PLL2->PLLDIV3) in UBL device.c.

    You will not get the expected frame rate with the lower frequency, but the board should no longer freeze if DDR is the cause.

    Regards.
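To make the divider suggestion concrete, here is a minimal sketch of the arithmetic. It assumes the usual DaVinci PLL controller convention that a divider's output is its input divided by (RATIO + 1), where RATIO is the value programmed into PLLDIVx. The 486 MHz PLL output used in the note below is a hypothetical figure chosen so that a RATIO of 1 gives the 243 MHz DDR rate shown in the boot log; check the real values in your own device.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: divider output = input / (RATIO + 1), RATIO being the PLLDIVx field. */
static uint32_t pll_div_out_hz(uint32_t pll_out_hz, uint32_t ratio)
{
    assert(ratio < 32u);  /* assumed 5-bit RATIO field */
    return pll_out_hz / (ratio + 1u);
}

/* Frequency after lowering it by a whole-number percentage. */
static uint32_t lowered_by_percent(uint32_t hz, uint32_t percent)
{
    assert(percent <= 100u);
    return (uint32_t)(((uint64_t)hz * (100u - percent)) / 100u);
}
```

With a 486 MHz input, RATIO 0 gives 486 MHz, RATIO 1 gives 243 MHz and RATIO 2 gives 162 MHz. Under this convention, adding 1 to the divider only halves the clock when the current RATIO is 0; from RATIO 1 it drops the clock to two thirds. A 5% reduction from 243 MHz is 230.85 MHz, which a whole-number divider change cannot reach, so that kind of step needs a change to the PLL multiplier instead.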

  • I am not sure you understand the symptoms:

    1.  With the SDTIM2 register not configured correctly, our board booted and ran correctly MOST of the time, only failing at extremely low temperatures (colder than -20 C, typically).  With this configuration, we were encoding a 720p video stream, performing AAC audio encoding, and generating a transport stream.  When it fails, it hangs as indicated in my first post.

    2.  With the SDTIM2 register configured correctly, our board boots at all temperatures, but the encoder does not generate our transport stream.  The board is not "frozen": the ARM processor responds on our console and other applications run.

    It is possible that other functions inside the DM365 are "frozen" but instead of just shotgunning solutions, I am asking for registers to check to determine if functions are running.  Lowering our encode stream frame rate is not an option.

    I need solutions to #2 above, we feel we understand what the issue is with #1.

  • I noticed from your first post that your frequencies are ARM=297, DDR=243.
    When you changed SDTIM2, DDR performance may have dropped. You still have room to increase the DDR frequency to 270 MHz.
    So why don't you try the ARM297/DDR270 frequencies?
    There is an example of how to configure the PLLs for that speed in the UBL from the latest flash-utils in DaVinci-PSP-SDK,
    under "#define ARM297_DDR270_OSC24" in the device.c file.
    Keep in mind that you have to change the other DDR timing and configuration registers in the UBL if you change the DDR frequency.

    For #2, if you have the source of the capture-encode-streaming application, you can add a printf of every encoded buffer length (just after the capture buffer is encoded),
    so you will at least know whether video is being encoded.
    If it isn't, you can check whether it is properly captured by recording a few captured raw YUV files to the NFS filesystem.
    Or you can just run the "top" command from the console to check the application's processor utilisation.

    Regards.
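The per-frame logging suggested above could look roughly like this. It is only a sketch: EncodeStats and log_encoded_frame are made-up names, not part of any TI codec API, and the application has to supply the encoded length from whatever its encoder call returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-stream counters kept by the application. */
typedef struct {
    unsigned long frames;        /* frames seen so far */
    unsigned long empty_frames;  /* frames whose encoded length was 0 */
    size_t last_len;             /* encoded length of the most recent frame */
} EncodeStats;

/* Call right after the encoder returns; encoded_len is the output size in bytes. */
static void log_encoded_frame(EncodeStats *st, size_t encoded_len, FILE *out)
{
    assert(st != NULL && out != NULL);
    st->frames++;
    st->last_len = encoded_len;
    if (encoded_len == 0)
        st->empty_frames++;
    fprintf(out, "frame %lu: encoded %lu bytes\n",
            st->frames, (unsigned long)encoded_len);
}
```

If the log stops advancing while the console stays responsive, the stall is upstream of the encoder output (capture, resize or the encode call itself). A run of zero-length frames would instead point at the encoder producing nothing.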

  • Hi Marko,

    I'm working with Mike on this issue here at Watchguard, and I've been able to determine that it appears to be the resizer that is causing the video lockup problem.   All of our video frames go through the resizer to get converted from their input 422 pixel format to the ‘IPIPE_YUV420SP’ format.

    The cause of the lockup is that our application makes the following call, which never returns:

            ioctl (nResizerID_, RSZ_RESIZE, &tResizerParams)

    In the kernel, one of the first things that is done is to acquire a mutex lock.  With kernel logging I’ve confirmed that the mutex is successfully locked, and it is sometime after this point that the kernel apparently waits forever for some operation to complete.

    It should be noted that we are able to successfully process some number of frames before the lockup occurs.  I've seen as few as 9 frames, and as many as over 3000 frames before the lockup occurs.  A reboot clears the problem.

    I’ve included our resizer configuration parameters below.  I’ll continue trying to determine the cause of the failure, but if you have any suggestions on what might be going on or we should look at next, please let us know. 

    Regards,

    Eric

     

    The resizer is opened with:

    (*pResizerId_) = open ("/dev/davinci_resizer", O_RDWR);

     

    It is then configured in single shot mode with:

    nOperationMode = IMP_MODE_SINGLE_SHOT;

    ioctl ((*pResizerId_), RSZ_S_OPER_MODE, &nOperationMode);

     

    For each video frame, the resizer is configured with:

    ioctl (nResizerId_, RSZ_S_CONFIG, &tResizeChannelConfig);

     

    input.image_width = 1280

    input.image_height = 720

    input.vst = 0

    input.hst = 0

    input.ppln = 1288

    input.lpfr = 730

    input.pix_fmt = 5

     

    output1.width = 1280

    output1.height = 720

    output1.enable = 1

    output1.pix_fmt = 9 (IPIPE_YUV420SP)

    output1.h_flip = 0

    output1.v_flip = 0

    output1.vst_y = 0

    output1.vst_c = 0

    output2.enable = 0

     

     …and then resized with:

    ioctl (nResizerID_, RSZ_RESIZE, &tResizerParams);        NOTE:  This is the call that never returns.

     

    in_buff.offset = 0x8620c000

    out_buff1.offset = 0x84f14000

    in_buff.size = 1843200

    out_buff1.size = 1382400 
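As a cross-check of the parameters above, the sketch below recomputes the buffer sizes from the frame geometry and tests the buffers for overlap. It assumes the 4:2:2 input is packed at 2 bytes per pixel (the post only says "422", so that is an assumption) and that IPIPE_YUV420SP is a full-size luma plane followed by a half-size interleaved chroma plane. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes for a packed 4:2:2 frame: 2 bytes per pixel (assumed for the input format). */
static uint32_t yuv422_bytes(uint32_t w, uint32_t h)
{
    return w * h * 2u;
}

/* Bytes for a 4:2:0 semi-planar frame: luma plane plus half-size chroma plane. */
static uint32_t yuv420sp_bytes(uint32_t w, uint32_t h)
{
    return w * h + (w * h) / 2u;
}

/* Returns 1 if [a, a+alen) and [b, b+blen) share any byte. */
static int ranges_overlap(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
    assert(alen <= UINT32_MAX - a && blen <= UINT32_MAX - b);
    return a < b + blen && b < a + alen;
}
```

For 1280x720 these give 1843200 and 1382400 bytes, matching in_buff.size and out_buff1.size. The two buffers at 0x8620c000 and 0x84f14000 do not overlap each other, and neither overlaps the 0x80000000 to 0x83c00000 kernel range reported in the boot log. So the sizes and placement look self-consistent, and the hang is more likely in the hardware or driver path than in these parameters.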

     

  • Hi Eric,

    Maybe you could try examples from:
    dvsdk_4.02/dvsdk_dm365_4_02_00_06/psp/linux-driver-examples-psp03.01.01.38/imp-prev-rsz/dm365/
    There you can find resizer configurations. You can check whether any of those examples work on your board.

    Regards.

  • Hi Eric

    Were you able to fix the resizer freeze issue? If so, what changes did you make?


    thanks

    Ashok