Linux/TMS320DM8168: Stuck at Starting Kernel after Multiple Reset Cycles

Part Number: TMS320DM8168

Tool/software: Linux

Hello,

The Problem:

When applying multiple hardware resets, the board comes up after the last reset, printing up to "Starting kernel ...", and nothing more.

Further resets do not bring the system out of this condition; it remains "stuck" after the "Starting kernel ..." message.

The resets are hardware pulses on the Netra's reset input pin, applied externally by another processor board that can be programmed to apply them sequentially.

A power cycle brings the system back to an operating condition.

The Platform:

It is a design based on TI’s DM8168 DaVinci "Netra" SoC.

The software is based on the DVRRDK-04.01.00.02, adapted to the board’s design and application requirements.

Linux kernel version 2.6.37, arago project: http://arago-project.org/git/projects/?p=linux-dvr-rdk-dm81xx.git;a=commit;h=607df36e37bae28aeba426f65782eb219dd3651e

Findings:

  • Stuck point:

The CPU wasn’t really stuck right after "Starting kernel ..."; it got stuck much later. There were simply no printouts, so there was no way to know where it was.

On further investigation, it was found that the CPU got all the way to the endless loop in "cpu_idle()", and that the loop executed 101 times before stopping.

  • Printouts:

Printouts are not output immediately upon startup. They are buffered and output through the serial port much later, when execution reaches "cpu_idle()".

The Test:

Reset pulses are applied to the board sequentially at a programmed rate of one pulse per ~22 seconds. This way each reset is applied at approximately the same point in Linux’s startup, just before the call to the "cpu_idle()" function.

With 20 reset cycles after power-up, the system reliably ends up "stuck". As the number of reset cycles decreases, the system fails less often and starts correctly more often.

When the failure occurs, there are printouts up to "Starting kernel ..." for each reset (including the last one), and nothing after that.

When there is no failure, there are normal printouts after the last reset.
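
The test sequence running on the external processor board can be sketched roughly as follows. The post does not describe that board, so the hardware access is hidden behind two callbacks. The names `reset_ops`, `run_reset_sequence`, and the dry-run helpers are hypothetical, and the pulse width is an assumption. Only the ~22 second period and the cycle count come from the description above.

```c
#include <stddef.h>

/* Hypothetical hardware hooks supplied by the controlling board's firmware. */
struct reset_ops {
    void (*set_reset)(int asserted, void *ctx); /* drive the DM8168 reset input */
    void (*sleep_ms)(unsigned ms, void *ctx);   /* wait for the given time */
    void *ctx;
};

/*
 * Apply `cycles` reset pulses, one every `period_ms` milliseconds.
 * `pulse_ms` is the assumed width of each pulse (not given in the post).
 * Returns the number of pulses applied, or -1 on invalid arguments.
 */
static int run_reset_sequence(const struct reset_ops *ops, int cycles,
                              unsigned period_ms, unsigned pulse_ms)
{
    int i;

    if (ops == NULL || ops->set_reset == NULL || ops->sleep_ms == NULL ||
        cycles < 0 || pulse_ms > period_ms)
        return -1;

    for (i = 0; i < cycles; i++) {
        ops->set_reset(1, ops->ctx);                   /* assert reset */
        ops->sleep_ms(pulse_ms, ops->ctx);
        ops->set_reset(0, ops->ctx);                   /* release: boot starts */
        ops->sleep_ms(period_ms - pulse_ms, ops->ctx); /* let boot run until the next pulse */
    }
    return i;
}

/* Dry-run backend for checking the schedule without hardware. */
struct dry_run {
    int pulses;               /* number of times reset was asserted */
    unsigned long elapsed_ms; /* simulated elapsed time */
};

static void dry_set_reset(int asserted, void *ctx)
{
    struct dry_run *d = ctx;
    if (asserted)
        d->pulses++;
}

static void dry_sleep_ms(unsigned ms, void *ctx)
{
    struct dry_run *d = ctx;
    d->elapsed_ms += ms;
}
```

With the dry-run backend, `run_reset_sequence(&ops, 20, 22000, 100)` models the failing case of 20 cycles at one pulse per 22 seconds; the 100 ms pulse width is an arbitrary placeholder.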

The Patch:

Inserting a delay into the cpu_idle() endless loop (a counting sub-loop as the first step of the main loop) improved the chance of a successful start (no failures at 200 reset cycles).

The delay was set to a count of 2500000 for the first 512 passes through the loop, and then disabled for the rest of Linux’s run time.

Setting the delay count to 0, 1, or 100000 brought the failures back (at 20 reset cycles).
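
The delay logic described above can be sketched in isolation as a small user-space C helper. This is an illustration only, not the kernel patch itself: the names `startup_delay_count`, `busy_wait`, and the two macros are hypothetical. One deliberate difference from the posted patch is that the pass counter stops at 512 instead of incrementing forever, so it cannot eventually wrap around and re-enable the delay.

```c
/* Hypothetical names; the values are the ones quoted in the post. */
#define STARTUP_DELAY_LOOPS  512      /* idle-loop passes that get the delay */
#define STARTUP_DELAY_COUNT  2500000  /* busy-wait iterations per delayed pass */

/*
 * Return the busy-wait count for the current idle-loop pass and advance
 * the pass counter. The counter saturates at STARTUP_DELAY_LOOPS, after
 * which the function always returns 0 (delay disabled).
 */
static int startup_delay_count(int *pass)
{
    if (*pass < STARTUP_DELAY_LOOPS) {
        (*pass)++;
        return STARTUP_DELAY_COUNT;
    }
    return 0;
}

/* Spin for `count` iterations; volatile keeps the compiler from removing the loop. */
static void busy_wait(int count)
{
    volatile int n = count;

    while (n-- > 0) {
    }
}
```

In an idle loop this would be used as `busy_wait(startup_delay_count(&pass));` at the top of each pass, with `pass` initialized to 0 before the loop.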

Files (from linux-dvr-rdk-dm81xx.git):

  • main.c:

This file contains the function "rest_init()", which calls "cpu_idle()" once startup completes.

The test reset pulses stop the startup in this function, just before cpu_idle() is called.

location: /init/main.c:

void rest_init(void) {
        ….
        preempt_enable_no_resched();
        //This is the last point arrived at, when applying a test reset.
        schedule();
        //This is where we do not arrive at, when applying a test reset.
        preempt_disable();
        /* Call into cpu_idle with preempt disabled */
        cpu_idle();
}

  • process.c:

This file contains the cpu_idle() function for the ARM architecture.

This function was changed to insert a delay into its endless loop.

location: /arch/arm/kernel/process.c

void cpu_idle(void) {
        int pcnt=0;

        local_fiq_enable();

        /* endless idle loop with no priority at all */
        while (1) {
                //Added delay’s start.
                if (pcnt++ < 512) {
                        volatile int n = 2500000;
                        while (n-- > 0) {
                        }
                }
                //Added delay’s end.
                tick_nohz_stop_sched_tick(1);
                …
        }
}