Strange reset behaviour with SW-buffered UART

Hi all. I'm writing an experimental and custom driver layer for the Tiva-C platform (TM4C1294) in Code Composer using C++. I've been having great success writing GPIO, Clock and NVIC drivers but have run into a strange behavior in the UART that I can't seem to work out. I've done all the obvious checks and have run out of theories. I'm merely looking for ideas and possible failure paths to investigate.

Please note that this does not use CMSIS, and that's the goal of this project.

Here is the breakdown of what I'm experiencing:

  • I have a UART driver that uses the peripheral in character mode (no FIFO) and a giant circular/ring buffer for transmission and reception.
  • My setup follows a similar initialization scheme to what the CMSIS code performs. I just do it in registers directly.
  • Things are working beautifully here thus far. I can send and receive data without any problem except in the following circumstance.
  • When the transmit buffer is full, I want the code to block until space opens up in the buffer.
  • To do this, I have a while loop which breaks only when the number of buffered bytes is less than the size of the queue. Nothing magical here.
  • Whenever this condition occurs and it must wait, the code will hang and when I pause the debugger, the instruction pointer is at 0x00000000.
  • As long as this condition isn't met, the code will happily run without problems, but once it starts looping, horrible things happen.

Additionally:

  • The buffer is implemented fully in software, using the classic insertion/extraction offsets approach. I track the number of bytes buffered using a separate variable rather than checking insert==extract conditions. More data but cleaner.
  • I try to write as stable drivers as possible and have checks for overrun/underrun conditions, as well as performing modulo-division on every offset prior to use. There are no access violations going on here.
  • I've confirmed the UART driver setup against the register setup: there is no discrepancy between what the driver wishes to configure and what the registers are set up as.
  • All configuration changes are performed when the UART is disabled and remain unchanged whenever the UART is enabled. (UARTCTL enable)
  • The potential infinite loop isn't the problem. I've done a flat while( true) {} instruction and it'll happily loop there without any problems.
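The buffer scheme described above can be sketched roughly as follows. This is a minimal illustration rather than the poster's actual code: the `RingBuffer` class and all of its member names are hypothetical, and the sketch assumes a single-byte element type with a separate fill counter, as described.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical ring buffer with insertion/extraction offsets and a separate
// fill counter. All names are illustrative.
template <std::size_t N>
class RingBuffer {
public:
    // Queue one byte. Returns false when the buffer is full (no overwrite).
    bool push(std::uint8_t byte) {
        if (count_ >= N) {
            return false;               // overrun check
        }
        assert(insert_ < N);            // offset is always kept in range
        data_[insert_] = byte;
        insert_ = (insert_ + 1) % N;    // wrap the offset on every update
        count_ = count_ + 1;            // NOTE: not interrupt-safe by itself
        return true;
    }

    // Remove one byte. Returns false when the buffer is empty.
    bool pop(std::uint8_t &byte) {
        if (count_ == 0) {
            return false;               // underrun check
        }
        byte = data_[extract_];
        extract_ = (extract_ + 1) % N;
        count_ = count_ - 1;            // NOTE: not interrupt-safe by itself
        return true;
    }

    std::size_t size() const { return count_; }
    bool full() const { return count_ >= N; }

private:
    std::uint8_t data_[N] = {};
    std::size_t insert_ = 0;
    std::size_t extract_ = 0;
    // volatile so that a polling loop re-reads the value on every pass
    volatile std::size_t count_ = 0;
};
```

On real hardware, where `push` runs in thread context and `pop` runs in the ISR, the two updates of `count_` would additionally need protection (for example, briefly masking the UART interrupt), because each is a read-modify-write.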

There are only two potential problems I can foresee, both are esoteric and (I think) highly unlikely.

  • The UART driver uses OOP and simple inheritance, and checks two properties of the instance. By necessity therefore, the ISR must access these as well. Could there be something in how CCS handles OOP that cannot be performed properly in the ISR? But if that's the case, this would have shown up regardless of the full condition being met. The problematic code is here:

// If the transmit buffer is full, wait for space to open up
while( this->bTxBufferWaiting >= this->bTxBufferSize)
{}

  • Because the property "this->bTxBufferWaiting" is modified inside the ISR, could there be some sort of access violation between the loop and the ISR? I don't see how this would be a problem: it's not like these operations are multi-instruction, and this sort of thing must be mediated by the processor during a context switch anyway. Not to mention the fact that this would be the sort of thing the fault ISRs would pick up.
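One caveat on the point above: on a load/store architecture such as ARM, incrementing or decrementing a counter held in memory is normally a load, modify, store sequence, so an interrupt can land in the middle of it. Separately, if the counter is neither `volatile` nor atomic, the compiler is allowed to read it once and spin on the cached value. The sketch below shows one way to guard against both, assuming the toolchain provides `std::atomic`; the `TxState` type and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical shared state between the queueing code and the transmit ISR.
struct TxState {
    explicit TxState(std::size_t cap) : waiting(0), capacity(cap) {
        assert(cap > 0);
    }
    std::atomic<std::size_t> waiting;  // bytes queued but not yet sent
    const std::size_t capacity;        // size of the transmit queue
};

// Would be called from the ISR after a byte has been handed to the UART.
inline void onByteSent(TxState &s) {
    if (s.waiting.load(std::memory_order_acquire) > 0) {
        s.waiting.fetch_sub(1, std::memory_order_release);
    }
}

// Called from thread context. Only this side ever increments the counter,
// so a check followed by an atomic increment cannot overshoot capacity.
inline bool tryReserveSlot(TxState &s) {
    if (s.waiting.load(std::memory_order_acquire) >= s.capacity) {
        return false;                  // buffer full: caller should retry
    }
    s.waiting.fetch_add(1, std::memory_order_relaxed);
    return true;
}
```

A blocking send would then spin with `while (!tryReserveSlot(state)) {}` before writing the byte into the queue.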

I can release portions of code if requested, I'd prefer not to merely because this is part of a larger driver layer experiment and thus would be a ton of code for you all to sift through. I'm content to explore any angle presented here. I'm just flat out of ideas to investigate.

Thanks all.

  • Craig Stickel said:
    Whenever this condition occurs and it must wait, the code will hang and when I pause the debugger, the instruction pointer is at 0x00000000.

    Hello Craig

    Do you mean the Program Counter of the ARM CPU is 0x0?

    Regards

    Amit

  • Yes, sorry. I used the nomenclature from another architecture. The program counter is always at 0x0 after pausing the debugger after this problem appears.
  • Hello Craig,

    It seems to me that if the CPU PC is 0x0, then it can only happen because some form of stack corruption pushed out a wrong PC value.

    Regards
    Amit
  • Oh, that's an interesting idea, Amit. Thank you!

    If I could ask a quick question then: I did not notice the use of naked function calls or anything other than the standard cdecl calling convention in the example code - did I miss something and ISRs do require naked calls?

    Another thing that comes to mind, I'm not placing the ISRs in an extern "C" block. I'll try the following and get back to you:
    - Place the ISRs in an extern "C" block. Requires a bit of time to re-jig the circular buffer implementation to use a C struct rather than a C++ class. This'll take me a few man-hours to branch the existing codebase, but it's simple work.
    - Investigate explicitly forcing cdecl and naked function calls (assuming there's no problem there).
    - (Anything else that might pop up during this line of testing)

    In the meantime, if you or anyone else has other ideas, I'm open to more suggestions.
  • Craig Stickel said:
    The UART driver uses OOP and simple inheritance, and checks two properties of the instance. By necessity therefore, the ISR must access these as well. Could there be something in how CCS handles OOP that cannot be performed properly in the ISR.

    There is an issue, but it's not compiler specific. You cannot use a standard member function as an interrupt routine. This is because of the hidden this parameter which the interrupt cannot provide.

    Craig Stickel said:
    // If the transmit buffer is full, wait for space to open up
    while( this->bTxBufferWaiting >= this->bTxBufferSize)
    {}

    It's not clear what you are doing, but the above suggests you are using a standard member function.

    Robert

  • Hi Robert,

    Haha, sorry, I'm trying to strike the balance between giving enough information and providing an 80-page design specification and telling you my life story. :-)

    I don't believe that's an issue... I'll try to explain why.

    First off, you're completely correct, methods have a different calling convention than regular ol' functions. The tiny code snippet I showed there is in a member function/method that is being used to queue data in the buffers for transmit. The ISR pulls from that queue.

    In order to provide the ISR access to the queue, I save the class instance through the use of a static pointer local to the Uart driver. On the surface, this would typically break good coding conventions, but can provide a sensible architectural abstraction if you angle it right: an instance of the UART driver represents a pre-configured hardware profile which is then "mounted" to the hardware and the static pointer merely points to the mounted instance: NULL means nothing is mounted and the hardware is idle.
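    A minimal sketch of that "mounted instance" pattern, for readers following along. Only the class name `BufferedUart` comes from this thread; the member names, the slot count, and the handler name `Uart0IrqHandler` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

class BufferedUart {
public:
    static constexpr std::size_t kNumUarts = 8;  // illustrative slot count

    // Attach this pre-configured profile to a hardware UART slot.
    void mount(std::size_t index) {
        assert(index < kNumUarts);
        mounted_[index] = this;
    }

    // Detach; a null slot means the hardware is idle.
    void unmount(std::size_t index) {
        assert(index < kNumUarts);
        if (mounted_[index] == this) {
            mounted_[index] = nullptr;
        }
    }

    static BufferedUart *mounted(std::size_t index) {
        return (index < kNumUarts) ? mounted_[index] : nullptr;
    }

    // Stand-in for the real work (moving bytes between queue and UART).
    void handleInterrupt() { ++irqCount_; }
    unsigned irqCount() const { return irqCount_; }

private:
    static BufferedUart *mounted_[kNumUarts];
    unsigned irqCount_ = 0;
};

BufferedUart *BufferedUart::mounted_[BufferedUart::kNumUarts] = {};

// The vector table entry is a plain function with no hidden `this`;
// it looks up the mounted instance and forwards to it.
extern "C" void Uart0IrqHandler() {
    if (BufferedUart *uart = BufferedUart::mounted(0)) {
        uart->handleInterrupt();
    }
}
```

    The free function is what goes in the vector table, so the interrupt never needs to supply a `this` pointer itself.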

    As for the implementation, if this was the problem, the code wouldn't compile much less run properly in the case where the buffer isn't completely full. As far as I've been able to confirm, this approach has no issue as that loop isn't spinning (either that or there is a problem but it hasn't nuked the system yet).

  • It looks like you're on the right track. I rewrote my driver to have the interrupt handlers under an extern "C" block and the device behaved properly!

    Not being satisfied with just fixing the issue (problems that go away on their own come back on their own), I tried to narrow down what exactly was the root cause. Turns out the extern "C" isn't the crux of the issue.

    - I started messing around with calling conventions (only to discover the lack thereof in CCS). So it was the rewrite, not the calling convention, that fixed the issue. I pulled the ISR I had just written outside of the extern "C" block and it behaved perfectly normally as well.

    - This leads me to believe that it has something to do with accessing members of the instance... there were two possible candidates:

    1. In order for me to determine which instance is being used, I access a static property of the class which lists which driver instances are presently mounted to the hardware. 
    2. After getting a pointer to the instance, I access properties (the buffer)

    - I tested both cases and it turns out that the first one is the problem. In fact, merely reading a static property from the ISR seems to trigger this problem, no assignment necessary. I'm not certain if this is corrupting the stack and causing the problem. The problematic line is this:

    BufferedUart *pcUart = BufferedUart::pcMountedInstances[Uart::UART_0];

    Again, I want to stress, even if this is never used or written to, merely accessing this inside the ISR creates the problem.

    I cannot be certain, but this suggests there is a mismatch between the operating environment of the processor while servicing an ISR and what the compiler assumes that environment is like. At the very least, the problem is known and can be avoided.

    Thank you all for your help!

  • Nope, I was wrong. I continued development only to find it executing hardware faults now instead of resets.

    In keeping with Amit's theory about stack problems, I started watching for corruption in my structures and, sure enough, it's present. In doing some debug-break-on-change sniffing, I found the value changes at seemingly random points throughout the code: once in an internal assembly-code function used to perform division, once in the first few instructions of a function, etc. This suggests to me that stack operations are overwriting these variables.

    Digging deeper, the linker is placing these variables in an area that should be used for the stack, i.e., the problem is in the linker script or I have a stack overflow. Tracing it further.
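    One common way to check for a stack overflow like this is to "paint" the stack region with a known pattern at startup and later count how much of the pattern survives. The sketch below is a hedged illustration: the function names and the fill value are hypothetical, and on a real target the region bounds would come from linker-provided symbols whose names depend on the toolchain. It assumes a downward-growing stack, as on Cortex-M, so the lowest addresses are the last to be touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kStackFill = 0xDEADBEEFu;  // arbitrary marker value

// Fill the stack region with the marker. Run once, early at startup,
// on the part of the region that is not yet in use.
inline void paintStack(std::uint32_t *lowest, std::size_t words) {
    assert(lowest != nullptr);
    for (std::size_t i = 0; i < words; ++i) {
        lowest[i] = kStackFill;
    }
}

// Count marker words still intact, starting from the lowest address.
// A result of zero means the stack has reached (or passed) its limit.
inline std::size_t unusedStackWords(const std::uint32_t *lowest,
                                    std::size_t words) {
    assert(lowest != nullptr);
    std::size_t n = 0;
    while (n < words && lowest[n] == kStackFill) {
        ++n;
    }
    return n;
}
```

    Calling `unusedStackWords` periodically, or from the fault handler, gives a high-water mark for stack usage without needing a debugger watchpoint.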
  • Hello Craig

    Are you mixing dynamic and static allocation of memory buffers?

    Regards
    Amit
  • Any chance the original allocation of the object being referenced is on the stack rather than the heap?

    Robert
    Haha, yes, actually you beat me to the post. I fixed the issue last night. It turns out that a large object allocated on the stack had jumped in size to hold the transmit and receive buffers used by this driver. As a result, the stack overflowed. I moved the allocation to the heap, undid all my previous changes and investigations, and got it working perfectly.

    All my troubleshooting and investigations did was move around variables in the memory map in and out of the range of the stack overflow.
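    For anyone hitting the same thing, here is a hedged sketch of one way to keep large driver buffers off the stack and to catch oversized locals at compile time. The buffer sizes, the stack size, the one-quarter rule of thumb, and all names are assumptions for illustration; the real stack size comes from the project's linker settings. Static storage is shown here as an alternative to the heap allocation described above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical buffer block for one buffered UART.
struct UartBuffers {
    std::uint8_t tx[4096];
    std::uint8_t rx[4096];
};

constexpr std::size_t kStackBytes = 2048;                // assumed stack size
constexpr std::size_t kMaxLocalBytes = kStackBytes / 4;  // arbitrary budget

// True when a type is small enough to be a local variable under this budget.
template <typename T>
constexpr bool safeAsLocal() {
    return sizeof(T) <= kMaxLocalBytes;
}

// Static storage: the linker places this in .bss, outside the stack region,
// and its size shows up in the map file.
static UartBuffers g_uartBuffers;

// A function that wants a local of some type can guard itself like this:
struct SmallHeader {
    std::uint8_t bytes[16];
};
static_assert(safeAsLocal<SmallHeader>(),
              "SmallHeader is too large to be a local variable");
```

    A matching `static_assert(safeAsLocal<UartBuffers>(), ...)` would fail to compile with these numbers, which is the point: the mistake surfaces at build time instead of as a corrupted variable at run time.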

    Thank you to both of you!