Part Number: TMDXRM57LHDK
Hi,
I don't know how to approach this problem. Hopefully, you can give me some tips.
I am working with lwIP 1.4.1 and the Texas Instruments RM57 HDK without an OS.
How my program works:
Problem:
Things I noticed:
Tests I did:
The problem only occurs when both the TCP and the asynchronous serial communication are active at the same time.
(Maybe related) The TCP client on the PC is actually a GUI that displays my sensor data. When connected, the "image" of the sensor data "jumps" once every ~2 seconds; the data is wrong. I thought this could be a copy error from the serial buffer to the TCP buffer. But before the board sends the data over TCP, it processes it and checks whether the data is wrong or corrupted. If it is, it raises several error signals (LEDs and serial debug output). When the image in the GUI jumps, I also get the error signals from the board (which means the data really is wrong). When the server is not connected to the GUI, I do NOT get those error signals from the board.
This looks as if the TCP server somehow affects the interrupt routine of the serial interface (or shares the same memory area?) and somehow corrupts its buffers.
How can I check who is trashing memory? How can I check the EMAC driver and the serial interface handler? How can I check if the "port" is re-entering non-reentrant functions? The "port" is what lwIP calls the glue layer between a device and the lwIP API; I got mine from a Texas Instruments tutorial.
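One low-tech way to find out whether something is trashing a particular buffer is to fence it with canary words and check them periodically from the main loop; if a canary changes, something wrote out of bounds. A minimal sketch (all names and sizes here are made up for illustration, not taken from the TI drivers):

```c
#include <stdint.h>

/* Canary pattern placed on both sides of the suspect buffer.  The
 * struct forces the guards to be adjacent to the buffer in memory. */
#define CANARY 0xDEADBEEFu

static struct {
    uint32_t guard_before;
    uint8_t  buf[2304];      /* hypothetical sensor/serial buffer */
    uint32_t guard_after;
} sb = { CANARY, {0}, CANARY };

/* Call this once per main-loop pass (or from a timer) and raise an
 * error signal the moment it returns 0. */
static int canaries_intact(void)
{
    return (sb.guard_before == CANARY) && (sb.guard_after == CANARY);
}
```

Wrapping each suspect buffer this way narrows down *which* buffer gets hit first, and checking at several points in the main loop narrows down *when*.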
I don't have much experience in embedded programming and don't know how to investigate this more deeply. What else can I check, and how? Maybe my port is not right? Does anyone have a port for the HDK RM57 that I could compare against?
If you understood my description of the problem, what do you think the cause could be? If you didn't, please tell me so that I can explain it better.
I really appreciate any help.
Best regards,
Julio
Hi Chuck,
Thanks for taking the time to help me.
I would just like to point out something new that happened yesterday.
As an intro:
I mentioned that I process the sensor data after it has been completely received. As a one-time initialization step of the processing, I set some global variables of type float, float[] and float[][] (and others) to certain values. These variables are used every time the data needs to be processed.
Problem:
I was trying to reproduce the error so I could check some registers, but the data fetch error didn't come up yesterday. The 'new' problem might be related, though.
The server stops sending data (tcp_write returns -11, which is NO_CONNECTION). The serial interface keeps working as it should. But the global variables I set during initialization contain corrupted data; somehow their values were changed.
How can this happen?
Maybe this can help you understand the issue if it is related.
Thanks and regards,
Julio
Hi Chuck,
Thank you for the help.
Is it possible to increase the size of the STACK section in the memory map? Right now it is about 5 KB, and one sensor data buffer is about 2.2 KB. If the serial interrupt is active and the TCP interface is sending at the same time, the stack must hold the context of both interfaces plus the normal context of the main loop, right? Could it be that the stack is not large enough and everything gets corrupted?
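Whether the stack is actually running out can be measured rather than guessed, using the classic stack-painting technique: fill the stack region with a known pattern at startup and later scan for the high-water mark. A generic sketch, using an ordinary array as a stand-in for the real stack region (on the RM57 the actual bounds would come from the linker command file and the HALCoGen-generated startup code, not from this array):

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_FILL 0xA5A5A5A5u

/* Hypothetical stand-in for the real stack region (~5 KiB). */
static uint32_t stack_region[1280];

/* Call once at startup, before the stack has grown. */
static void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_FILL;
}

/* The stack grows downward from the top of the region, so the
 * untouched (still-painted) words sit at the low end.  Counting them
 * gives the headroom that was never used. */
static size_t stack_headroom_words(const uint32_t *base, size_t words)
{
    size_t free_words = 0;
    while (free_words < words && base[free_words] == STACK_FILL)
        free_words++;
    return free_words;
}
```

Checking the headroom periodically while both TCP and serial traffic are active would show directly whether a ~5 KB stack ever comes close to overflowing.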
Thanks and regards,
Julio
Hi Chuck,
I think I found the cause of the data fetch error. I was writing more data than I should into the serial buffer; I wasn't checking the amount of data to be written. I ran the program for about 2 hours and the data fetch error didn't occur. I am going to run it overnight and see if the program is still running tomorrow.
What I think happens is the following:
If B happens while I'm reading the "size" of the next message, that value might be larger than the actual buffer size. Since I was doing A, I was generating the data fetch error. Now I check that this value is inside the buffer limits, and it seems to work.
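The bounds check described above can be as small as a helper that rejects an implausible size field before it is passed to the receive call; a minimal sketch with made-up names and limits:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical maximum payload the telegram buffer can hold. */
#define TELEGRAM_MAX 2200u

/* Validate the 16-bit size field read from the telegram header before
 * using it as a receive length.  A corrupted header can otherwise
 * request more bytes than the buffer holds. */
static bool telegram_size_valid(uint16_t size)
{
    return (size > 0u) && (size <= TELEGRAM_MAX);
}
```

When the check fails, the safe reaction is to discard the telegram and resynchronize on the next header rather than receive a truncated amount.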
I now have another problem: (should I open a new thread for this?)
The serial buffer still gets "corrupted" when the TCP server is active, because B can occur while reading any part of the serial message.
I tried increasing the priority of the serial interrupt by changing its interrupt handler from IRQ to FIQ in HALCoGen.
Unfortunately, that didn't work. The program crashes with another data fetch error seconds after startup.
What am I missing?
Best regards,
Julio
Hi Chuck,
thanks again for the help. I really appreciate it.
Chuck Davenport said:
If you are in the middle of your TCP interrupt when data is received and the SCI RX interrupt is suppressed/delayed as a result, the data will still transfer from the SCIRXSHF register to the SCIRD register (two-stage buffer). But if you are unable to get to it before another message is received into SCIRXSHF, the data in SCIRD will be overwritten and lost, leading to an overrun error (you can enable this error if it isn't already; you can at least check the error flag in the status register). However, I think you are referencing the multi-buffer mode of the SCI, where there are 8 buffers into which to move the data from the SCIRXSHF register when received. The trick here is to be able to set the number of bytes received/expected to set the length of the frame/buffer depth.
I was able to see the overrun error once I enabled it, so at least I now know for sure what is happening. You mentioned I am using the multi-buffer mode; I am not sure about that. In the Technical Reference Manual I only see the option to set the SCI/LIN module to multi-buffer mode. Can I also do that for SCI3? I am already setting the expected amount of data in the SCI receive call, and I get the sciNotification after that amount of data has been received. However, I think sci3HighLevelInterrupt is still being called after each single frame is received.
I did the following test:
I now get the overrun error A LOT less frequently. However, I expected to not see it anymore, which leads me to believe that I am not using the multi-buffer mode. The overrun error occurs a lot more frequently again if I add another TCP connection.
Chuck Davenport said:
The real solution here is to spend as little time as possible in the interrupts and only use them to dump the data into received-data structs, whether for the TCP data or the SCI data. I.e., get into the ISR, copy the data to a RAM buffer for later processing, re-enable interrupts, and get out. All the processing could then happen in your application layer. This still won't prevent interrupt latency, but hopefully it will allow you to capture the data prior to another message coming in. In fact, you may be able to get away with doing this only by enabling interrupts in the TCP interrupt so that it can allow nested interrupts from the SCI, which can capture the data and copy it to RAM. Once the TCP transfer is done, you can move the SCI data to where it belongs and adjust the buffer. (Ideally you would have a RAM-based ring buffer (FIFO) for SCI receive data, where the interrupt only places data into the ring buffer and the SCI data is processed at a non-interrupt level.)
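The ring-buffer (FIFO) approach suggested here can be sketched in a few lines of C: the ISR only stores the byte and returns, and the main loop drains the buffer at its leisure. With a single producer (the ISR) and a single consumer (the main loop), no locking is needed as long as each side updates only its own index. All names are illustrative, not from the TI port:

```c
#include <stdint.h>
#include <stdbool.h>

#define RB_SIZE 256u   /* power of two so wrap-around is a cheap mask */

static volatile uint8_t  rb_data[RB_SIZE];
static volatile uint32_t rb_head;   /* written only by the ISR       */
static volatile uint32_t rb_tail;   /* written only by the main loop */

/* Producer: called from the SCI receive ISR -- just store and leave. */
static bool rb_put(uint8_t byte)
{
    uint32_t next = (rb_head + 1u) & (RB_SIZE - 1u);
    if (next == rb_tail)
        return false;               /* full: count this as an overrun */
    rb_data[rb_head] = byte;
    rb_head = next;                 /* publish the new byte last */
    return true;
}

/* Consumer: called from the main loop, outside interrupt context. */
static bool rb_get(uint8_t *byte)
{
    if (rb_tail == rb_head)
        return false;               /* empty */
    *byte = rb_data[rb_tail];
    rb_tail = (rb_tail + 1u) & (RB_SIZE - 1u);
    return true;
}
```

Making the size a power of two keeps the wrap-around a single AND, and an `rb_put` that returns false inside the ISR is a convenient place to count software-level overruns.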
Julio Cesar Aguilar Zerpa said:
I did the following test:
- The sensor needs 67 ms to send data, then pauses for 10 ms, and starts to send the next serial telegram over the next 67 ms.
- I process the complete data and send it (raw) over TCP. This takes around 6-10 ms, which is when new serial data is arriving, and probably why the error occurs: I need to do some synchronization checks on the first 4 bytes of the telegram and check the size of the telegram in the next 2 bytes. That means I read the first 6 bytes one at a time (sciReceive(UART3, 1, buffer)), which increases the chance of the TCP blocking the receive stream.
- The test was to send the data over TCP only after I've done all those checks and called sciReceive(UART3, size, buffer).
Chuck Davenport --> SCI3 is a standard SCI and will not have the buffered mode; it is only available on the SCI/LIN instantiations. You might consider using the DMA if the device has one. This way the DMA can transfer received data into a buffer location that can be operated on independently of receive. You can then operate on the buffer as needed/when time permits, without fear of interruption from the TCP operations.
Chuck Davenport said:
The real solution here is to spend as little time as possible in the interrupts […]
I don't know exactly what the lwIP port (from the Texas Instruments tutorial) does in the TX interrupt. I would assume it tries to spend as little time as possible there, which is what I also do. I tried your idea of using a nested IRQ so that the serial interface can interrupt the TCP interface, but the board crashes. I also tried setting the SCI to work with an FIQ, but the result is the same. I guess the TCP interrupt of the port was not designed to be interrupted.
What about using SCI3 with DMA? Would that help send all incoming data to the DMA buffer while avoiding the blocking from the TCP?
CD --> Same idea I had earlier. I would suggest giving that a try. There is an SCI DMA example among the HALCoGen examples that works well. I think you have to kick off the first communication manually to get things started.
Best regards,
Julio