RTOS TM4C129XL Program jumping out with DFSR, HFSR errors after running for about 5 minutes

Other Parts Discussed in Thread: SYSBIOS

I am using UART5 on the TM4C129XL board to read and write, through the RTOS UART functions. It works fine until the program jumps out after a few minutes.

This also happens when I write hard-coded data on UART, e.g. value 3.

This happens faster if I also System_printf the data that I send out.

Even though the data is hard-coded, the jump-out happens after a random amount of time, e.g., sometimes after 30 printouts and sometimes after 130 printouts.

The error however is always the same:

FSR = 0x0000

HFSR = 0x40000000

DFSR = 0x0000000b

Terminating program...

When I looked up "definitive guide to arm cortex m3 and cortex m4 processors" I found this:

HFSR = 0x40000000 means "Indicates hard fault is taken because of bus fault, memory management fault, or usage fault."

DFSR = 0x0000000b means

- "Indicates the debug event is caused by a vector catch, a programmable feature that allows the processor to halt automatically when entering certain type of system exception including reset"

- "Indicates the debug event is caused by a breakpoint"

- "Indicates the processor is halted by debugger request (including single step)."
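
For reference, here is a minimal C sketch of how these two values break down into individual flags. The bit positions follow the ARMv7-M definitions of the HFSR and DFSR registers; the macro and function names are made up for this example and are not part of TI-RTOS.

```c
#include <stdint.h>
#include <stdio.h>

/* HFSR (HardFault Status Register) flags, ARMv7-M bit positions. */
#define HFSR_VECTTBL   (1u << 1)   /* fault on vector table read */
#define HFSR_FORCED    (1u << 30)  /* escalated bus/memmanage/usage fault */
#define HFSR_DEBUGEVT  (1u << 31)  /* debug event */

/* DFSR (Debug Fault Status Register) flags, ARMv7-M bit positions. */
#define DFSR_HALTED    (1u << 0)   /* halt request from debugger, or single step */
#define DFSR_BKPT      (1u << 1)   /* breakpoint */
#define DFSR_DWTTRAP   (1u << 2)   /* DWT watchpoint */
#define DFSR_VCATCH    (1u << 3)   /* vector catch */
#define DFSR_EXTERNAL  (1u << 4)   /* external debug request */

/* Hypothetical helper: print the names of the flags set in each register. */
void decode_fault(uint32_t hfsr, uint32_t dfsr)
{
    printf("HFSR = 0x%08lx:%s%s%s\n", (unsigned long)hfsr,
           (hfsr & HFSR_VECTTBL)  ? " VECTTBL"  : "",
           (hfsr & HFSR_FORCED)   ? " FORCED"   : "",
           (hfsr & HFSR_DEBUGEVT) ? " DEBUGEVT" : "");
    printf("DFSR = 0x%08lx:%s%s%s%s%s\n", (unsigned long)dfsr,
           (dfsr & DFSR_HALTED)   ? " HALTED"   : "",
           (dfsr & DFSR_BKPT)     ? " BKPT"     : "",
           (dfsr & DFSR_DWTTRAP)  ? " DWTTRAP"  : "",
           (dfsr & DFSR_VCATCH)   ? " VCATCH"   : "",
           (dfsr & DFSR_EXTERNAL) ? " EXTERNAL" : "");
}
```

Calling decode_fault(0x40000000, 0x0000000b) reports FORCED for the HFSR and HALTED, BKPT and VCATCH for the DFSR, which matches the three descriptions quoted above.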

Please feel free to share any thoughts that you may have to resolve this.

Thanks

Kaveh

  • Hello Kaveh,

    Did you check by increasing the heap size?

    Regards
    Amit
  • Thank you for your prompt response Amit!

    I am using the default stack and heap sizes from the RTOS uartecho example, set under Properties->Build->Linker->Basic Options (in CCS 6.0.1): 512 and 0. I changed both the stack and heap size to 1024, then to 2048, and finally to 0x8000. The average time to crash seems to increase, but the crash still happens, and in some trials it still happens very quickly, say after 30 printouts.

    Also, in file uartecho.cfg, there is another stack and heap that I have not touched yet. Here are the parameters:

    /* System stack size (used by ISRs and Swis) */
    Program.stack = 0x300;

    /* No memory allocation occurs, so no heap is needed */
    BIOS.heapSize = 0;

    /* No runtime stack checking is performed */
    Task.checkStackFlag = false;
    Hwi.checkStackFlag = false;

    /* Reduce the number of task priorities */
    Task.numPriorities = 4;

    /* ================ Task configuration ================ */
    var task0Params = new Task.Params();
    task0Params.instance.name = "echo";
    task0Params.stackSize = 0x300;
    Program.global.echo = Task.create("&echoFxn", task0Params);

    /* ================ Logging configuration ================ */
    var LoggingSetup = xdc.useModule('ti.uia.sysbios.LoggingSetup');
    LoggingSetup.loadLoggerSize = 256;
    LoggingSetup.mainLoggerSize = 512;
    LoggingSetup.sysbiosLoggerSize = 1024;



    Please let me know if you think anything should be modified.

    Thanks again!

  • Kaveh Shafiee said:
    Please let me know if you think anything should be modified.

    Try setting Task.checkStackFlag and Hwi.checkStackFlag to true. That will enable checking for stack overflow before a context switch. If the problem is caused by a task overflowing its stack, the check should highlight the problem sooner.

    Note that enabling stack checking will add some interrupt latency because the checks are made within the Task scheduler while interrupts are disabled.

  • Thanks Chester, your guess is right. When I set Task.checkStackFlag and Hwi.checkStackFlag to true, I get the stack overflow message. I am not sure if I should keep increasing the stack size or how I can avoid this. I also tried initializing the variables in which I store UART read/write data to zero.
  • Kaveh Shafiee said:
    I am not sure if I should keep increasing the stack size or how I can avoid this.

    It depends on what is causing the stack overflow to be detected. The documentation for checkStackFlag says that the check is performed by setting an initial value for "the top of stack" and then checking if the initial value has been changed.

    There are two ways the stack overflow failure could be triggered:

    a) The task uses more stack space than has been allocated, in which case increasing the allocated stack size would solve the problem (as long as no infinite recursion occurs).

    b) A task overwrites the stack by using an invalid pointer, in which case increasing the allocated stack size might make the program run for longer but won't fix the underlying problem.

    In the CCS debugger, try setting a hardware watchpoint on a write to the "top of the stack". When the watchpoint is hit, it will point to the offending code which causes the problem.
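
The check described above can be sketched in plain C. This is a simplified simulation, not the SYS/BIOS implementation: the fill value and the function names are assumptions chosen for the example. It relies on the stack growing downward (as it does on Cortex-M), so the marker sits at the lowest address of the stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed marker value for this sketch only. */
#define STACK_FILL 0xBEu

/* Fill the whole stack with the marker before the task first runs. */
void stack_init(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/*
 * The stack grows downward, so stack[0] (the lowest address) is the last
 * byte a legitimate push would ever reach. If it no longer holds the
 * marker, either the task used its whole stack (case a) or something
 * else wrote there through a bad pointer (case b).
 */
bool stack_overflowed(const uint8_t *stack)
{
    return stack[0] != STACK_FILL;
}
```

Note that a buffer placed just below a stack in memory and written past its end hits stack[0] first. That is case b), and it trips the same check even though the owning task never came close to using its full stack.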

  • Thank you for your suggestion Chester. In my case the stack overflow failure is triggered because of an invalid pointer. I set a hardware watchpoint on a write to the "top of the stack", and the offending code is this piece of code (it stops on the second line):

    *((unsigned char *)object->readBuf) = (uint8_t)readIn;
    object->readBuf = (unsigned char *)object->readBuf + 1;



    in <tirtos_install_dir>/packages/ti/drivers/uart/UARTTiva.c file.

    The error is "cannot load from non-primitive location".

    I also found this post:

    e2e.ti.com/.../3325

    But I have no idea how it can apply to my problem.

    Any thoughts?

  • Kaveh Shafiee said:
    So, I set a hardware watchpoint on a write to the "top of the stack", and the offending code is this piece of code (it stops on the second line):

    *((unsigned char *)object->readBuf) = (uint8_t)readIn;
    object->readBuf = (unsigned char *)object->readBuf + 1;

    That is the readData() function which reads and processes data from the UART.

    To determine the root cause of the problem more information is required.

    Which memory area is object->readBuf allocated from? 

    If object->readBuf allocated on the stack, the stack size for the task may be insufficient.

  • Kaveh Shafiee said:
    The error is "cannot load from non-primitive location".

    I also found this post:

    e2e.ti.com/.../3325

    But I have no idea how it can apply to my problem.

    That error means the debugger is unable to display a variable (e.g. if the debugger doesn't know the type of a pointer variable). It is not an error which causes the program to fail at run time.

  • Looks like the stackpeak is the same as the stacksize for task "ti.sysbios.knl.Task.IdleTask". The address of this task, i.e., 0x200018e8, is exactly where the stack overflow occurs.
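
A peak value equal to the full stack size does not by itself prove that the task used all of its stack. Here is a minimal sketch of how such a peak figure can be derived from a pre-filled stack. It is a simplified model with an assumed fill value and hypothetical names, not the actual SYS/BIOS code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed marker value the stack was pre-filled with (sketch only). */
#define PEAK_FILL 0xBEu

/*
 * Count untouched marker bytes from the lowest address upward and report
 * the rest as "used". The stack grows downward, so real usage starts at
 * the highest address and works its way toward index 0.
 */
size_t stack_peak(const uint8_t *stack, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == PEAK_FILL) {
        untouched++;
    }
    return size - untouched;
}
```

With this scheme, a single stray write to the lowest byte of the stack makes the reported peak jump to the full stack size, however little of the stack the task itself has used.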

  • Kaveh Shafiee said:
    Looks like the stackpeak is the same as the stacksize for task "ti.sysbios.knl.Task.IdleTask". The address of this task, i.e., 0x200018e8, is exactly where the stack overflow occurs.

    It looks like the "echo" task is overwriting the stack of the ti.sysbios.knl.Task.IdleTask.

    The idle task is in the TI-RTOS function readData() writing to object->readBuf when it overwrites the stack for ti.sysbios.knl.Task.IdleTask.

    I think the problem is that the buffer[0..size] array passed to the UART_read() or UART_readPolling() functions from the echo task overlaps the stack for the ti.sysbios.knl.Task.IdleTask. How is the buffer array allocated?

  • int SendOpenSocketFrame()
    {
      // Sends a socket open frame and waits for a socket response frame, blocks
      // (and discards) until a socket response frame is received.
    
      int rv;
      unsigned char buf[128];
      
      // create and send a start frame
      buf[0] = SOCKET_OPEN_FRAME;
      buf[1] = 8;
      buf[2] = g_socketControl.direction;
      buf[3] = g_socketControl.type;
      buf[4] = g_socketControl.port / 256; // msb
      buf[5] = g_socketControl.port % 256; // lsb
      buf[6] = g_socketControl.ipAddress / 16777216; // msb
      buf[7] = g_socketControl.ipAddress / 65536;
      buf[8] = g_socketControl.ipAddress / 256;
      buf[9] = g_socketControl.ipAddress % 256;      // lsb
    
      rv = SerialWrite(buf);
      if (rv < 0)
        return rv; // an error occurred
    
      // get response frame
      do {
        rv = SerialGetFrame(buf);
        if (rv < 0)
          return rv; // an error occurred
    
        // if frame is a response frame...
        if (buf[0] == SOCKET_RESPONSE_FRAME) {
          g_socketControl.socketHandle = buf[2]; // extract the socket handle
          break;
        }
      } while (1);
    
      // extract the status and error fields
      g_socketControl.status = (buf[3] << 8) + buf[4];
      g_socketControl.error  = (buf[5] << 8) + buf[6];
      if (g_socketControl.error == 0)
        rv = 0;  // success
      else
        rv = -1; // error
    
      return rv;
    }
    
    
    int SerialWrite(int n, unsigned char* bufp)
    {
    	int rv;
    	rv = UART_write(uart1, bufp, n);
    	return n;
    }
    
    
    int SerialGetFrame(unsigned char* frame)
    {
      int rv1 = 1, rv = -1;
      int slide = 0;
      int large = 0;
      int frameSize = 0;
      int CRCindex = 0;
      int remaining = 0;
    
      frame[0] = frame[1] = frame[2] = INVALID_FRAME_ID;
    
      while (1) {
        if (rv1 == 1) {
          slide++;
          frame[0] = frame[1];
          frame[1] = frame[2];
        }
    
        //Sleep(0);  // we want to be "nice" here
    
        rv1 = SerialRead(1, &frame[2]);
        if (rv1 == 0) continue;
        if (rv1 != 1)
          break; // break with error
    
        if (isValidFrameID(frame[0])) {
          if (isLargeFrame(frame[0])) {
            large = 1;
            frameSize = (frame[1] << 8) + frame[2];
            CRCindex = frameSize + 3;
            remaining = frameSize + 2;
          }
          else {
            frameSize = frame[1];
            CRCindex = frameSize + 2;
            remaining = frameSize + 1;
          }
    
          if (!isValidFrameSize(frame[0], frameSize)) {
            rv1 = SerialBufGet(frame, 3, remaining);
            if (rv1 != remaining)
              break; // break with error
          }
          else {
            System_printf("Invalid Frame size detected\n");
            System_flush();
          }
    
          if (ModbusCRC16(frame) == ((frame[CRCindex] << 8) + frame[CRCindex + 1])) {
            rv = frameSize + large + 4;
            break; // break no error
          }
          else {
            rv = -2;
            break; // break with error
          }
        }
        if (slide == 3) {
          slide = 0;
    
        }
      }
    
      return rv;
    }
    
    
    int SerialBufGet(unsigned char* bufp, int bufIndex, int count)
    {
      int i, rv;
    
      i = bufIndex;
      do {
        rv = SerialRead(1, &bufp[i]);
        if (rv < 0) {
          break; // an error occurred
        }
        if (rv == 1) {
          // we received a character
          i++; // advance the index
          if ((i-bufIndex) == count) {
            rv = count;
            break;
          }
        }
    	//Sleep(0); // try to be nice
      } while (1);
    
      return rv;
    }
    
    
    int SerialRead(int n, unsigned char* bufp)
    {
    	int rv;
    	rv = UART_read(uart1, bufp, n);
    	return rv;
    }

  • The SendOpenSocketFrame() function declares a 128 byte buf[] array on the stack. buf is passed to SerialGetFrame as the pointer to where to store a "frame" read from the serial port.

    The SerialGetFrame function can obtain a frameSize from 1 byte (up to 255 bytes) or 2 bytes (up to 65535 bytes). frameSize bytes are read into the buf[] array. As buf is only 128 bytes, it looks like some frame sizes could cause a write off the end of the buf[] array to occur, thus overwriting other variables.

    What is the maximum frame size allowed by the isValidFrameSize function?
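
One way to guard against this is to check the decoded size against the caller's buffer before reading the rest of the frame. The sketch below only does the arithmetic, using the byte counts implied by the posted SerialGetFrame() code (frame ID, one or two size bytes, payload, two CRC bytes). The function names are hypothetical, and the posted code would also need the buffer size passed in, which it currently does not do.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Total number of bytes SerialGetFrame() stores for one frame, as implied
 * by the posted code: 1 ID byte + 1 size byte (2 for a large frame)
 * + frameSize payload bytes + 2 CRC bytes = frameSize + large + 4.
 */
size_t frame_total_bytes(size_t frameSize, bool large)
{
    return frameSize + (large ? 1u : 0u) + 4u;
}

/* True if the whole frame, CRC included, fits in a buffer of bufSize bytes. */
bool frame_fits(size_t bufSize, size_t frameSize, bool large)
{
    return frame_total_bytes(frameSize, large) <= bufSize;
}
```

With the 128-byte buf[] in SendOpenSocketFrame(), a small frame fits only up to a frameSize of 124, and a large frame only up to 123. It may also be worth checking whether the !isValidFrameSize(...) test in SerialGetFrame() is inverted: as posted, the remaining bytes are read when the size is reported as not valid, and "Invalid Frame size detected" is printed in the else branch.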