Hi all,
This will be a longer post. Apologies for that.
For the past few days, I've been trying to determine why our CM hangs. The hang occurred about 3 hours after startup.
Our CM communicates over Ethernet with a Linux SOM via an Ethernet switch. We tested multiple times on multiple devices, always with the same result: after about 3 hours of runtime the CM is dead.
The first step in tracking down this problem was to enable entry hooks on the CM to get the address of the last function entered. We then wrote a piece of software to show this address on an LCD (driven by CPU1) at runtime.
It turned out the CM was freezing in the Ethernet_genericISRCustom() ISR (source taken from the TI ENET example).
On a side note: there are two different Ethernet_genericISR routines provided by TI. One comes with the driverlib (named Ethernet_genericISR) and the other with the lwIP example (named Ethernet_genericISRCustom).
We used the latter at first.
So, without going deep into what is going on in the routine, I switched to the driverlib version.
The effect was exactly the same: CM dead after about 3 hours of operation, and stuck in Ethernet_genericISR.
OK. So now let's try to determine what is causing the entry to the ISR. There is a multitude of possible interrupt sources, and they are logically OR'ed to generate the ISR flag.
There are two figures in the TRM that show the sources (Fig. 43-8 and 43-9).
Decoding those figures is not easy, as they give no register names to look up, only short signal names.
I tried to eliminate the possible sources one after another. I started a debug session and looked at the Registers window to find the relevant registers and enable flags.
Unfortunately, it turned out that some of the registers are missing from the Registers window. The only way to see them was to use the Memory Browser.
I also tried to catch the exact moment with the debugger connected, but as you may imagine, it was bothersome. I was able to determine that the MACIS bit in the DMA_Interrupt_Status Register was set, so there definitely was some source for the interrupt.
I checked and eliminated all but one source: the AND gate with the MMCIS and MMCIE signals.
I searched the TRM for the enable signal, MMCIE. There is a single occurrence, in Fig. 43-8. Well, this complicates things. We have an enable signal that is not documented in the manual.
OK, let's look at the MMCIS bit and the register description (MAC_Interrupt_Status Register).
There is also a corresponding MAC_Interrupt_Enable Register, but the 8th bit (where the enable should be) is "reserved". The contents of the MAC_Interrupt_Enable Register were 0x00000000: nothing is enabled. OK, I thought, an error in the TRM, but the bits are zeroed, so there are no enabled interrupts from this leg.
I WAS WRONG.
The undeniable fact was that for some reason the CM entered Ethernet_genericISR and never left it. Or to be exact: it enters, checks some flags, and exits, only to immediately enter again, blocking the whole core for an indefinite period of time.
Totally by accident, I noticed the following fragment in the Ethernet initialization routine:
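To make the enter/exit/enter-again behaviour concrete, here is a toy host-side model. It is only a sketch and not TI code: the ToyIrqController type, the TOY_SRC_* bits and the toy_* functions are all made up for illustration. It assumes a level-style interrupt line that is the OR of (pending AND enabled) sources, and a handler that clears only the sources it knows about.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical source bits, for illustration only. */
#define TOY_SRC_RX  (1u << 0)
#define TOY_SRC_TX  (1u << 1)
#define TOY_SRC_MMC (1u << 2) /* a source the handler was never written for */

/* Toy model: pending bits stay set until somebody clears them. */
typedef struct {
    uint32_t status;  /* pending interrupt sources */
    uint32_t enabled; /* sources allowed to drive the interrupt line */
} ToyIrqController;

/* The interrupt line is the logical OR of all pending, enabled sources. */
static bool toy_irq_line(const ToyIrqController *c)
{
    return (c->status & c->enabled) != 0u;
}

/* A handler that only services (clears) the RX and TX sources. */
static void toy_isr(ToyIrqController *c)
{
    c->status &= ~(TOY_SRC_RX | TOY_SRC_TX);
}

/*
 * Re-enter the handler for as long as the line stays asserted.
 * max_entries caps the loop so the model terminates; a real core
 * has no such cap and simply never gets back to the main loop.
 */
static unsigned toy_run_until_idle(ToyIrqController *c, unsigned max_entries)
{
    unsigned entries = 0u;
    while (toy_irq_line(c) && entries < max_entries) {
        toy_isr(c);
        entries++;
    }
    return entries;
}
```

With TOY_SRC_MMC pending and enabled, toy_run_until_idle() always hits the cap, because the handler never clears that bit. With the same bit pending but not enabled, a single handler entry is enough for the line to drop.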
//
// Disable the MAC Management counter interrupts as they are not used
// in this application.
//
HWREG(Ethernet_device_struct.baseAddresses.enet_base +
      ETHERNET_O_MMC_RX_INTERRUPT_MASK) = 0xFFFFFFFF;
HWREG(Ethernet_device_struct.baseAddresses.enet_base +
      ETHERNET_O_MMC_IPC_RX_INTERRUPT_MASK) = 0xFFFFFFFF;
Hmmm... These are disabled on purpose. But what about TX? Why is TX not masked? What if (a loose thought) there is an interrupt after sending a defined number of packets or bytes?
Bingo!
The cause of the interrupt was one of the TX data counters reaching half of its range, that is, 0x80000000 bytes sent. In our case that happened after about 3 hours of operation: 2 gigabytes of data.
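As a sanity check on the numbers: 0x80000000 bytes in about 3 hours (10800 s) corresponds to an average TX rate of roughly 199 kB/s, or about 1.6 Mbit/s. The small sketch below does that arithmetic. The sketch_* function names are made up for this post, and it assumes a steady byte rate, which real traffic will not have.

```c
#include <stdint.h>

/* Half of the 32-bit counter range, the threshold mentioned above. */
#define SKETCH_HALF_RANGE_BYTES 0x80000000ULL

/* Seconds until the counter reaches half range at a steady TX byte rate. */
static double sketch_seconds_to_half_range(double bytes_per_second)
{
    return (double)SKETCH_HALF_RANGE_BYTES / bytes_per_second;
}

/* Average TX byte rate that reaches half range in the given time. */
static double sketch_average_rate(double seconds)
{
    return (double)SKETCH_HALF_RANGE_BYTES / seconds;
}
```

Calling sketch_average_rate(10800.0) gives about 198841 bytes per second. A link that carries more traffic hits the threshold sooner, so the time to failure depends on the application's TX load.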
I added this magic line:
HWREG(Ethernet_device_struct.baseAddresses.enet_base + ETHERNET_O_MMC_TX_INTERRUPT_MASK) = 0xFFFFFFFF;
and the problem was GONE.
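For anyone who wants all three mask writes in one place, here is a minimal host-side sketch. It is not the driverlib code: SKETCH_HWREG mirrors the shape of the HWREG macro so the snippet builds off-target, and the SKETCH_O_* offsets are placeholders, not the real register offsets. On the target you would use HWREG and the ETHERNET_O_MMC_*_INTERRUPT_MASK definitions from the driverlib headers instead.

```c
#include <stdint.h>

/* Stand-in for the driverlib HWREG macro so this builds on a host. */
#define SKETCH_HWREG(addr) (*((volatile uint32_t *)(addr)))

/*
 * Placeholder byte offsets for illustration only (assumption).
 * They are NOT the offsets from the TRM or the driverlib headers.
 */
#define SKETCH_O_MMC_RX_INTERRUPT_MASK     0x0u
#define SKETCH_O_MMC_TX_INTERRUPT_MASK     0x4u
#define SKETCH_O_MMC_IPC_RX_INTERRUPT_MASK 0x8u

/* Mask every MMC counter interrupt: RX, TX and IPC RX. */
static void sketch_mask_all_mmc_interrupts(uintptr_t enet_base)
{
    SKETCH_HWREG(enet_base + SKETCH_O_MMC_RX_INTERRUPT_MASK)     = 0xFFFFFFFFu;
    SKETCH_HWREG(enet_base + SKETCH_O_MMC_TX_INTERRUPT_MASK)     = 0xFFFFFFFFu;
    SKETCH_HWREG(enet_base + SKETCH_O_MMC_IPC_RX_INTERRUPT_MASK) = 0xFFFFFFFFu;
}
```

On a host, a three-word uint32_t array can stand in for the peripheral: pass its address cast to uintptr_t and check that all three words read back as 0xFFFFFFFF.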
This cost me several days of debugging, a few stressful nights, and a read through over 500 pages of the Ethernet manual, just to pinpoint this single line of code.
OK, so we have a missing line of code in the driver example. It happens.
But unfortunately that is not all.
Why on earth can't Ethernet_genericISR deal with all the interrupt sources? Well, I would definitely like to ask the TI engineers this question.
Why do we have two different Ethernet_genericISR routines? There are major differences between them. Which one should we use?
We have an inconsistency in the manual: the MMCIE interrupt enable bit is missing.
Well, I think that a top-of-the-line processor deserves a better Ethernet driver.
Looking forward to a new, improved version of C2000Ware that addresses this issue. Waiting for an answer where the MMCIE bit has gone.
Best of luck to all the developers of C2000 MCUs.
Regards,
Andy