Part Number: TMDXRM57LHDK
Hi,
I don't know how to approach this problem. Hopefully, you can give me some tips.
I am working with lwIP 1.4.1 on the Texas Instruments RM57 HDK, without an OS.
How my program works:
- sensor data comes in through a serial interface in asynchronous mode
- once all data is received, it is sent to a client on a PC via TCP (the server is the HDK)
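To make the setup concrete, here is a minimal sketch of the kind of double-buffer receive scheme described above. It is not the actual application code: the frame length and all names (`FRAME_LEN`, `sensor_rx_isr`, `sensor_take_frame`) are hypothetical, and the real interrupt handler would read the byte from the serial peripheral instead of taking it as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 8u                      /* assumed frame size (hypothetical) */

static uint8_t          frame_buf[2][FRAME_LEN];
static volatile uint8_t write_idx;        /* buffer the ISR is currently filling */
static volatile size_t  write_pos;        /* next free position in that buffer */
static volatile int     frame_ready;      /* set by the ISR, cleared by the main loop */

/* Called from the serial receive interrupt with one received byte. */
void sensor_rx_isr(uint8_t byte)
{
    frame_buf[write_idx][write_pos++] = byte;
    if (write_pos == FRAME_LEN) {
        write_pos = 0;
        write_idx ^= 1u;                  /* switch to the other buffer */
        frame_ready = 1;
    }
}

/* Called from the main loop. Returns the last completed frame, or NULL. */
const uint8_t *sensor_take_frame(void)
{
    if (!frame_ready) {
        return NULL;
    }
    frame_ready = 0;
    return frame_buf[write_idx ^ 1u];     /* the buffer the ISR is NOT filling */
}
```

In a scheme like this the main loop has one frame period to finish with the returned buffer before the ISR starts overwriting it, so anything that delays the main loop (for example a long TCP send) can show up as corrupted frames.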
Problem:
- after some random time (normally around an hour), the program crashes with a "data fetch" error
Things I noticed:
- The L4, L4_ABT, L4_USR registers point towards a problem with the serial interface (bad address). I know that the instruction the L4 register points to is not necessarily where the problem lies: the register is set when the debugger or board notices the problem, and the actual fault could have occurred several instructions earlier. However, I also noticed that when the problem occurs, the buffer pointers (of the double buffer) that I use in the receive interrupt routine of the serial interface point to an address outside the allowed memory region. These buffers are part of my application; they are not system-level buffers.
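One way to catch the bad pointer before it causes the abort is to validate it in the interrupt routine and to surround the buffer with guard words. The following is only a sketch under assumptions: the buffer size, the guard value, and all names (`guarded_buf`, `guarded_store`, `guards_intact`) are made up for illustration and are not part of my code or of any TI library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN   16u                     /* assumed buffer size (hypothetical) */
#define GUARD_VAL 0xDEADBEEFu             /* arbitrary marker value */

struct guarded_buf {
    uint32_t head_guard;                  /* overwritten if something underruns */
    uint8_t  data[BUF_LEN];
    uint32_t tail_guard;                  /* overwritten if something overruns */
};

static struct guarded_buf rx_buf = { GUARD_VAL, {0}, GUARD_VAL };
static uint8_t *volatile  rx_ptr = rx_buf.data;   /* write pointer used by the ISR */
static volatile uint32_t  rx_fault_count;         /* how often the pointer was bad */

/* Returns 1 if p points inside rx_buf.data, 0 otherwise. */
static int ptr_in_range(const uint8_t *p)
{
    uintptr_t a  = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)rx_buf.data;
    return a >= lo && a < lo + BUF_LEN;
}

/* Store one byte; refuses to write through a corrupted pointer. */
int guarded_store(uint8_t byte)
{
    uint8_t *p = rx_ptr;
    if (!ptr_in_range(p)) {
        rx_fault_count++;                 /* set a breakpoint here on the target */
        rx_ptr = rx_buf.data;             /* recover so the system keeps running */
        return -1;
    }
    *p = byte;
    rx_ptr = (p + 1 == rx_buf.data + BUF_LEN) ? rx_buf.data : p + 1;
    return 0;
}

/* Returns 1 while both guard words still hold their marker value. */
int guards_intact(void)
{
    return rx_buf.head_guard == GUARD_VAL && rx_buf.tail_guard == GUARD_VAL;
}
```

With a breakpoint on the fault branch, the call stack at that moment shows what was running when the pointer was first seen corrupted, which is earlier than the abort itself. If the debugger supports hardware watchpoints, a write watchpoint on the pointer variable or on a guard word should narrow it down further.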
Tests I did:
- I left the program running without the TCP server for a whole day. Sensor data was being received in asynchronous mode and processed by the main loop. The program did NOT crash.
- I left the program running with the TCP server for a whole day, sending a single static buffer that was initialized once and never changed. The serial interface to the sensor was not active. The program did NOT crash.
- I left the program running with the TCP server for 4 hours, sending a static double buffer that was updated every 60 ms with dummy values. The serial interface to the sensor was not active. The program did NOT crash. (I did this to test whether my copy function was somehow faulty.)
- I ran the TCP server and the sensor serial interface at the same time, but without copying the serial buffer to the TCP buffer. In one test the TCP server was sending the single static buffer that never changes, and in the other test the static double buffer (updated every 60 ms with dummy values). In both tests, the program crashed.
The problem only occurs when both the TCP and the asynchronous serial communication are active at the same time.
(Maybe related) The TCP client on the PC is actually a GUI that displays my sensor data. While it is connected, the "image" of the sensor data "jumps" about once every 2 seconds, meaning the displayed data is wrong. I first thought this could be a copy error from the serial buffer to the TCP buffer. However, before the board sends the data over TCP, it processes the data and checks whether it is wrong or corrupted. If it is, the board raises several error signals (LEDs and serial debug output). Whenever the image in the GUI jumps, I also get these error signals from the board, which means the data is already wrong before it is sent. When the server is not connected to the GUI, I do NOT get those error signals from the board.
This looks as if the TCP server somehow affects the interrupt routine of the serial interface (or the same memory area?) (and somehow corrupts its buffers?).
How can I check who is trashing memory? How can I check the EMAC driver and the serial interface handler? How can I check whether the "port" is re-entering non-reentrant functions? The "port" is what lwIP calls the interface layer between a device and the lwIP API. I got my port from a Texas Instruments tutorial.
I don't have much experience in embedded programming, and I don't know how to investigate this any deeper. What else can I check, and how? Maybe my port is not right? Does anyone have a port for the RM57 HDK that I can compare mine against?
If you understood my description of the problem, what do you think the cause could be? If you didn't understand it, please tell me so that I can explain it better.
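For the re-entrancy question, one idea is a small detector around every place where the application or the port calls into the stack. As far as I understand, the lwIP core is not reentrant, so without an OS it should not be entered from an interrupt while the main loop is already inside it. The sketch below is an assumption-laden illustration: `stack_enter`, `stack_leave`, and the two `fake_` functions are hypothetical stand-ins, not lwIP or TI driver functions.

```c
#include <assert.h>
#include <stdint.h>

static volatile uint32_t stack_depth;     /* > 0 while inside guarded code */
static volatile uint32_t reentry_count;   /* number of detected re-entries */

/* Call before entering the (non-reentrant) network stack. */
void stack_enter(void)
{
    if (stack_depth != 0u) {
        reentry_count++;                  /* set a breakpoint here on the target */
    }
    stack_depth++;
}

/* Call after leaving the network stack. */
void stack_leave(void)
{
    stack_depth--;
}

/* Hypothetical stand-in for an EMAC receive interrupt that calls the stack. */
void fake_emac_rx_isr(void)
{
    stack_enter();
    /* the real handler would pass the received frame to the stack here */
    stack_leave();
}

/* Hypothetical stand-in for the main loop sending data over TCP.
 * simulate_irq != 0 models the interrupt firing in the middle of the send. */
void fake_main_loop_send(int simulate_irq)
{
    stack_enter();
    if (simulate_irq) {
        fake_emac_rx_isr();
    }
    stack_leave();
}
```

If `reentry_count` ever becomes non-zero on the real board, the interrupt path and the main loop are inside the stack at the same time. The usual remedies would then be to have the interrupt only queue the received frame for the main loop, or to mask the interrupt around the main-loop calls.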
I really appreciate any help.
Best regards,
Julio