Hello all,
I'm having an odd little issue with lwIP on an RM46. The problem occurs about once every 24 hours, so this has been a long debugging process. The TCP stack suddenly stops receiving new packets and everything freezes. We're running a freemodbus implementation and a webserver, and when this bug crops up, both of them freeze. After a lot of digging I found that the EMAC core simply stops producing new interrupts. I don't think it can produce them, though, since the RX0FREEBUFFER register reads 0x00.
My guess is this: something causes the RX interrupt handler to miss a packet, and that packet stays in the buffer. Another packet arrives soon afterwards and generates an interrupt, the first packet is processed, and one extra packet is left in the buffer. Things carry on normally with that one extra packet until whatever mechanism is causing this makes it two. This continues until there are 256 packets (or however it's measured) in the buffer and there's no room left. Hitting the threshold makes the EMAC try to raise a threshold interrupt (which I don't have enabled), and RX filtering is then enabled. From that point only high-priority packets get through (there are none in this case), so no interrupts are generated, no further packets are processed, and both the webserver and the Modbus functions stop completely.
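The guess above can be checked with a toy model. This is a sketch under stated assumptions, not the TI driver: a ring of 256 descriptors (RING_SIZE), an occasional lost interrupt, and two handler styles, with every name hypothetical. If the receive handler processes only one frame per interrupt, every lost interrupt adds one frame to the backlog for good, which matches the slow creep toward RX0FREEBUFFER = 0. If it drains every completed frame, the next interrupt cleans up after a lost one. Whether the HDK receive handler already drains the whole queue is worth checking in its source.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMPTION: 256 receive descriptors, matching the guess above. */
#define RING_SIZE 256

/* Toy model of one RX channel; hypothetical, not the TI driver. */
typedef struct {
    int ready;       /* frames written by the EMAC but not yet processed */
    int free_count;  /* descriptors still available to the EMAC (like RX0FREEBUFFER) */
} RxRing;

/* EMAC side: store a frame if a descriptor is free; false means dropped. */
static bool emac_receive(RxRing *r)
{
    if (r->free_count == 0)
        return false;
    r->free_count--;
    r->ready++;
    return true;
}

/* Handler that processes at most one frame per interrupt. */
static void handle_one(RxRing *r)
{
    if (r->ready > 0) {
        r->ready--;
        r->free_count++;
    }
}

/* Handler that drains every completed frame per interrupt. */
static void handle_all(RxRing *r)
{
    while (r->ready > 0) {
        r->ready--;
        r->free_count++;
    }
}

/* Deliver `frames` frames; every `miss_every`-th interrupt is lost.
   A dropped frame raises no interrupt at all. Returns frames dropped. */
static int simulate(RxRing *r, void (*handler)(RxRing *), int frames, int miss_every)
{
    int dropped = 0;
    for (int i = 1; i <= frames; i++) {
        if (!emac_receive(r)) {
            dropped++;
        } else if (i % miss_every != 0) {
            handler(r); /* interrupt serviced */
        }
        /* Every descriptor is either free or holding an unprocessed frame. */
        assert(r->ready + r->free_count == RING_SIZE);
    }
    return dropped;
}
```

With 10000 frames and every 10th interrupt lost, handle_one exhausts all 256 descriptors at frame 2560 (the 256th lost interrupt) and drops every frame after that, while handle_all recovers from each lost interrupt on the next one.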
So, what's the appropriate way to handle this? We're using the TI HDK example code with the TI PHY, but instead of calling the interrupt handler directly from the interrupt, each interrupt just increments a counter, and the main loop makes sure rx_interrupt_handler() runs once per count. Pseudocode:
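volatile int rxCount = 0;   /* incremented by the EMAC RX interrupt */
volatile int txCount = 0;   /* incremented by the EMAC TX interrupt */

void rxIntHandler(void);    /* calls rx_interrupt_handler() from the HDK example */

void rxInt(void)            /* EMAC RX interrupt service routine */
{
    rxCount++;
}

void txInt(void)            /* EMAC TX interrupt service routine */
{
    txCount++;
}

int main(void)
{
    while (1)
    {
        if (rxCount)
        {
            rxIntHandler();   /* deferred RX processing, one call per interrupt */
            rxCount--;
            /* <check for negative rxCount, other stuff> */
        }
        /* <txCount handled the same way> */
    }
}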
The other notable thing I see in the EMAC registers is that RXINTSTATRAW reads 0x00000101, which tells me a threshold interrupt is being generated (though it does nothing, since I haven't set it up). If I write 0 to MACEOIVECTOR, that interrupt clears, everything works for a while, and then it 'crashes' again.
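For reading that RXINTSTATRAW value, here is a small decoder. It assumes (please check against the RM46 TRM) that bits 0-7 are the per-channel RXnPEND bits and bits 8-15 the RXnTHRESHPEND bits; the helper names are hypothetical. Under that layout, 0x00000101 has both RX0PEND and RX0THRESHPEND set, so channel 0 is also still reporting ordinary receive work pending, not just the threshold condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: RXINTSTATRAW layout is bits 0-7 = RXnPEND and
   bits 8-15 = RXnTHRESHPEND for receive channels n = 0..7.
   Verify against the RM46 TRM before relying on it. */
#define RX_CHANNELS 8u

/* Normal receive interrupt pending on channel ch. */
static bool rx_pend(uint32_t raw, unsigned ch)
{
    assert(ch < RX_CHANNELS);
    return ((raw >> ch) & 1u) != 0u;
}

/* Free-buffer threshold interrupt pending on channel ch. */
static bool rx_threshpend(uint32_t raw, unsigned ch)
{
    assert(ch < RX_CHANNELS);
    return ((raw >> (ch + 8u)) & 1u) != 0u;
}

/* Print every pending bit, e.g. for a value read out in the debugger. */
static void print_rxintstatraw(uint32_t raw)
{
    for (unsigned ch = 0; ch < RX_CHANNELS; ch++) {
        if (rx_pend(raw, ch))
            printf("RX%uPEND\n", ch);
        if (rx_threshpend(raw, ch))
            printf("RX%uTHRESHPEND\n", ch);
    }
}
```

With the value above, print_rxintstatraw(0x00000101) prints RX0PEND and RX0THRESHPEND on separate lines.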
We handle interrupts this way precisely because we don't want to miss any of them, timer or EMAC, yet something is still getting missed. My best guess is that the timer interrupt fires while the EMAC interrupt is running and rxCount somehow doesn't get incremented. But in that case the interrupt would never be acknowledged and the system would freeze at that point, since the EMAC core shouldn't generate a second interrupt (at least, the Technical Reference Manual seems to say that once the EMAC generates an interrupt, it won't generate another until the first one has been cleared).
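On the "maybe rxCount doesn't get incremented" idea, one mechanism worth ruling out: ARM is a load/store architecture, so rxCount-- in the main loop is a separate load, subtract and store. If the RX ISR fires between the load and the store, its increment is overwritten and that interrupt is never serviced, even though the EMAC never raised a second one. The sketch below uses hypothetical names and simulates the ISR with a plain function call so it runs on a PC. It replays that interleaving, then shows one common alternative where each counter has a single writer. Briefly masking IRQs around the decrement is another option.

```c
#include <assert.h>
#include <stdint.h>

/* Stub standing in for the real receive processing (rx_interrupt_handler()). */
static void rxIntHandler(void) { }

/* Pattern from the post: one counter written by both the ISR and the main loop. */
static volatile int rxCount = 0;

static void rx_isr_shared(void) { rxCount++; }

/* Replays the unlucky interleaving of "rxCount--" with the ISR running
   between the main loop's load and store. */
static void decrement_with_isr_in_between(void)
{
    int tmp = rxCount;  /* main loop loads rxCount */
    rx_isr_shared();    /* RX interrupt fires here and increments rxCount */
    rxCount = tmp - 1;  /* main loop stores its stale value: the increment is lost */
}

/* Alternative: every counter has exactly one writer. */
static volatile uint32_t rxIntsSeen = 0; /* written only by the ISR */
static uint32_t rxIntsHandled = 0;       /* written only by the main loop */

static void rx_isr_single_writer(void) { rxIntsSeen++; }

/* Call from the main loop; returns the number of interrupts serviced.
   Unsigned subtraction keeps the pending count correct across wraparound. */
static unsigned service_rx(void)
{
    unsigned serviced = 0;
    while ((uint32_t)(rxIntsSeen - rxIntsHandled) != 0u) {
        rxIntHandler();
        rxIntsHandled++;
        serviced++;
    }
    return serviced;
}
```

Starting from rxCount = 1, decrement_with_isr_in_between() leaves rxCount at 0 even though a second interrupt arrived, so that interrupt is silently dropped. In the single-writer version the ISR only ever increments rxIntsSeen and the main loop only ever increments rxIntsHandled, so neither can overwrite the other's update (assuming the RX ISR is not re-entered).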
Thanks,
Martin