Linux/AM3351: System time and interrupt error

user4773977

Part Number: AM3351

Tool/software: Linux

1. We use the AM3351 chip with SDK version 'ti-processor-sdk-linux-am335x-evm-03.00.00.04' (kernel version: Linux 4.4.12). The RTC clock source on our board is internal. During long-run testing, after 30 hours or more, the system clock goes wrong, the console can no longer connect to the board, external socket connections drop, and the reboot command stops working. However, we can still connect to the board over telnet.

2. Using the date command, we found that the system time advances about 180 s from a certain value and then jumps back to that value, cycling back and forth like this indefinitely. I wrote a module and an app to read xtime and found that xtime no longer updates and is stuck at a fixed value. The part of the log showing the time cycling within the 180 s window follows:

root@opera:~# date
Tue Feb 7 18:34:51 UTC 2017
root@opera:~# date
Tue Feb 7 18:34:55 UTC 2017
root@opera:~# date
Tue Feb 7 18:34:57 UTC 2017
root@opera:~# date
Tue Feb 7 18:35:00 UTC 2017
root@opera:~# date
Tue Feb 7 18:35:02 UTC 2017
root@opera:~# date
Tue Feb 7 18:35:04 UTC 2017
root@opera:~# date
Tue Feb 7 18:35:59 UTC 2017
root@opera:~# date
Tue Feb 7 18:36:18 UTC 2017
root@opera:~# date
Tue Feb 7 18:36:34 UTC 2017
root@opera:~# date
Tue Feb 7 18:36:55 UTC 2017
root@opera:~# date
Tue Feb 7 18:37:00 UTC 2017
root@opera:~# date
Tue Feb 7 18:37:03 UTC 2017
root@opera:~# date
Tue Feb 7 18:37:20 UTC 2017
root@opera:~# date
Tue Feb 7 18:34:23 UTC 2017
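
A log like the one above can be checked for backward jumps mechanically. The following is a minimal sketch, not part of the original report: it assumes the `date` output layout shown above, and the function name `find_backward_jumps` is made up for illustration.

```python
from datetime import datetime

# Layout printed by `date` in the log above (assumption: always UTC).
DATE_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def find_backward_jumps(lines):
    """Return (previous, current, delta_seconds) for each sample that is
    earlier than the sample before it."""
    samples = []
    for line in lines:
        try:
            samples.append(datetime.strptime(line.strip(), DATE_FORMAT))
        except ValueError:
            continue  # shell prompt or blank line, not a timestamp
    jumps = []
    for prev, cur in zip(samples, samples[1:]):
        delta = (cur - prev).total_seconds()
        if delta < 0:
            jumps.append((prev, cur, delta))
    return jumps


if __name__ == "__main__":
    # Last three samples from the log above.
    log = [
        "root@opera:~# date",
        "Tue Feb 7 18:37:03 UTC 2017",
        "root@opera:~# date",
        "Tue Feb 7 18:37:20 UTC 2017",
        "root@opera:~# date",
        "Tue Feb 7 18:34:23 UTC 2017",
    ]
    for prev, cur, delta in find_backward_jumps(log):
        print(f"{prev:%H:%M:%S} -> {cur:%H:%M:%S}: {delta:+.0f} s")
```

For the final pair of samples in the log (18:37:20 followed by 18:34:23) this reports a jump of -177 s, which matches the roughly 180 s window described in item 2.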

3. I ran 'cat /proc/interrupts' many times and found that the interrupt counts for 'gp_timer' and '44e09000.serial' did not change. Normally these two values increase constantly. The system interrupts are as follows:

root@opera:~# cat /proc/interrupts
           CPU0
 16:    5315316  INTC           68  Level  gp_timer
 19:          1  INTC           78  Level  wkup_m3_txev
 20:       1176  INTC           12  Level  49000000.edma_ccint
 22:          0  INTC           14  Level  49000000.edma_ccerrint
 26:          0  INTC           96  Level  44e07000.gpio
 33:          0  44e07000.gpio   6  Edge   48060000.mmc cd
 59:          0  INTC           98  Level  gpio1_9
 92:          0  INTC           32  Level  gpio2_25
125:          0  INTC           62  Level  481ae000.gpio
158:       1281  INTC           72  Level  44e09000.serial
159:          4  INTC           70  Level  44e0b000.i2c
160:          0  INTC           30  Level  4819c000.i2c
161:         13  INTC           64  Level  mmc0
162:         11  INTC           28  Level  mmc1
164:          0  INTC           77  Level  wkup_m3
170:          0  INTC           75  Level  rtc0
171:          0  INTC           76  Level  rtc0
174:     170642  INTC           41  Level  4a100000.ethernet
175:       4649  INTC           42  Level  4a100000.ethernet
178:        573  INTC            4  Level  48080000.elm
179:          0  INTC          100  Level  gpmc
180:          0  INTC          109  Level  53100000.sham
184:          0  INTC          111  Level  48310000.rng
186:      12093  INTC           18  Level  musb-hdrc.0.auto
187:          8  INTC           19  Level  musb-hdrc.1.auto
188:          0  INTC           17  Level  47400000.dma-controller
Err:          0

root@opera:~# cat /proc/interrupts
           CPU0
 16:    5315316  INTC           68  Level  gp_timer
 19:          1  INTC           78  Level  wkup_m3_txev
 20:       1176  INTC           12  Level  49000000.edma_ccint
 22:          0  INTC           14  Level  49000000.edma_ccerrint
 26:          0  INTC           96  Level  44e07000.gpio
 33:          0  44e07000.gpio   6  Edge   48060000.mmc cd
 59:          0  INTC           98  Level  gpio1_9
 92:          0  INTC           32  Level  gpio2_25
125:          0  INTC           62  Level  481ae000.gpio
158:       1281  INTC           72  Level  44e09000.serial
159:          4  INTC           70  Level  44e0b000.i2c
160:          0  INTC           30  Level  4819c000.i2c
161:         13  INTC           64  Level  mmc0
162:         11  INTC           28  Level  mmc1
164:          0  INTC           77  Level  wkup_m3
170:          0  INTC           75  Level  rtc0
171:          0  INTC           76  Level  rtc0
174:     170692  INTC           41  Level  4a100000.ethernet
175:       4653  INTC           42  Level  4a100000.ethernet
178:        573  INTC            4  Level  48080000.elm
179:          0  INTC          100  Level  gpmc
180:          0  INTC          109  Level  53100000.sham
184:          0  INTC          111  Level  48310000.rng
186:      12093  INTC           18  Level  musb-hdrc.0.auto
187:          8  INTC           19  Level  musb-hdrc.1.auto
188:          0  INTC           17  Level  47400000.dma-controller
Err:          0

So I suspect a problem in the kernel interrupt handling that prevents the system clock and the console from receiving their interrupts, which in turn produces the problem described in item 1.

4. The DTB-related files are attached. Please help check whether the DTB configuration is causing the system clock interrupt error.
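
The comparison of the two /proc/interrupts snapshots above can also be automated. This is a rough sketch under stated assumptions: the helper names `parse_interrupts` and `stalled_irqs` are hypothetical, and the parser assumes the usual layout of an IRQ label, one count column per CPU, then the controller, hardware IRQ number, trigger type and device name.

```python
def parse_interrupts(text):
    """Map IRQ label -> (total count, description) for /proc/interrupts text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue  # header line ("CPU0") or blank line
        label = fields[0][:-1]
        count = 0
        idx = 1
        # One count column per CPU: sum them up to the first non-numeric field.
        while idx < len(fields) and fields[idx].isdigit():
            count += int(fields[idx])
            idx += 1
        table[label] = (count, " ".join(fields[idx:]))
    return table


def stalled_irqs(before, after, watch):
    """Return (label, description, count) for watched IRQs whose count is
    identical in both snapshots."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    stalled = []
    for label, (count, desc) in new.items():
        watched = any(name in desc for name in watch)
        if watched and label in old and old[label][0] == count:
            stalled.append((label, desc, count))
    return stalled


if __name__ == "__main__":
    # Three rows taken from each of the two snapshots above.
    first = """CPU0
16: 5315316 INTC 68 Level gp_timer
158: 1281 INTC 72 Level 44e09000.serial
174: 170642 INTC 41 Level 4a100000.ethernet
"""
    second = """CPU0
16: 5315316 INTC 68 Level gp_timer
158: 1281 INTC 72 Level 44e09000.serial
174: 170692 INTC 41 Level 4a100000.ethernet
"""
    watch = ["gp_timer", "44e09000.serial", "4a100000.ethernet"]
    for label, desc, count in stalled_irqs(first, second, watch):
        print(f"IRQ {label} stalled at {count}: {desc}")
```

With these rows, gp_timer and 44e09000.serial are reported as stalled while the Ethernet interrupt is not, because its count moved from 170642 to 170692. The sketch takes the snapshots as text rather than sleeping between two reads, since a later reply in this thread reports that `sleep 1` never returns while the board is in this state.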

5001.am335x-dts.tar.gz

Please help solve the problem. Thanks very much!

over 7 years ago

0 Cvetolin Shulev-XID over 7 years ago

TI__Guru 65405 points

Hi user4773977,

The described issue is not a known one and needs deeper investigation. First, though, I would like to ask whether you have an NTP configuration. If so, could you post your NTP settings?

BR
Tsvetolin Shulev

0 user4773977 over 7 years ago in reply to Cvetolin Shulev-XID

Prodigy 30 points

Hi Tsvetolin Shulev,
We do not have an NTP configuration. We use an NTP client that we wrote ourselves. It connects to the NTP server over a socket, gets the time from it, and then uses the gettimeofday function to write that time to the system time. The client runs only once, after the system has started, so I don't think this issue has anything to do with it.
In addition, when the issue occurred we could not connect to the board over UART, but we could connect over telnet. After logging in to the board, we ran 'cat /proc/interrupts' and found that the gp_timer interrupts had stopped, which made the system time wrong. I wrote a module and an app to read xtime and found that xtime no longer updates. I do not know what caused the gp_timer interrupts to stop.

Things are urgent!
Please help me!
Thanks!

0 Lu Brand over 7 years ago in reply to Cvetolin Shulev-XID

Prodigy 10 points

It looks like the INT system is wrong now!

0 Wayne Kuo over 7 years ago in reply to Lu Brand

Intellectual 870 points

Hi Brand,

Is this issue solved?
What do you mean by "INT system is wrong"? Any details?

BR,
Wayne

0 user4773977 over 7 years ago in reply to Wayne Kuo

Prodigy 30 points

Hi Wayne,
This issue is not solved.
When we log in to the board and run 'cat /proc/interrupts', we find that the gp_timer interrupts have stopped. That is what makes the system time wrong.

0 Hamish Guthrie over 6 years ago in reply to user4773977

Prodigy 70 points

Hi All,

I am observing something similar on our custom AM335x board. It does not happen often and is really hard to reproduce; basically we have to wait until certain parts of the system fail to respond.

The main symptom we see when the device gets into this state is that it becomes unresponsive in certain time windows. During the 'unresponsive window', I cannot issue any commands at all. I have an ssh session open to the device, and during the 'responsive window' I am able to issue certain commands, for example the date command. I can issue the date command repeatedly until the system 'locks up'. After that I can still type in commands, but I get no response over my ssh session. When the system transitions back into the 'responsive window', it immediately responds to the last command I issued, in this case the date command, and I see the time jump back 62 seconds from the previous date output.

In addition, I see that the gp_timer interrupt count in /proc/interrupts NEVER increases, whereas on a system running normally it increases rapidly.

It also appears that the system timer has frozen: if I issue a sleep 1 command, it never exits. The serial console also appears to be dead in this state, and I am not able to log into the device with a separate ssh session.

I am not sure whether I have missed a thread describing this issue; this is the only one that describes roughly what I observe.

Thanks in advance for any further insight
