driverlib.lib 2.1.1.71, DK versus EK-TM4C1294: EMACDMAIM (REG59) and EMACDMARIS (REG57) are mirror MCU designs, yet they are ORed!

Genatco

Source: (emac.c)

EMACIntStatus() returns the unmasked interrupt status read from EMACDMARIS (REG57) when bMasked is false, and that status masked by EMACDMAIM (REG59) when bMasked is true.

In the masked case (true), the code ORs EMACDMAIM (REG59, offset 0xC1C) with EMAC_NON_MASKED_INTS, the set of bits it treats as unmaskable.

Yet bits 16:0 are maskable in the EK-TM4C1294 EMACDMAIM, and only the upper EMACDMARIS TS/RS status bits are being masked manually. So it looks like the lower maskable interrupt bits 16:0 are being ignored during the OR.

It seems better to OR the EMAC_MASKABLE_INTS values with EMACDMAIM, test the result against the interrupt status register EMACDMARIS, and perhaps mask (OR) the TS/RS bits manually.

EMACIntStatus(uint32_t ui32Base, bool bMasked)
{
    // ...

    if(bMasked)
    {
        ui32Val &= (EMAC_MASKABLE_INTS | HWREG(ui32Base + EMAC_O_DMAIM)); // was EMAC_NON_MASKED_INTS
    }

    // ...
}

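The masking argument can be made concrete by running the library's masked-status expression on a host with plain integers. The sketch below is a minimal model, not the TivaWare source. The macro and function names are hypothetical, and the bit positions (TT = 29, PMT = 28, MMC = 27, AE = 25:23, TS = 22:20, RS = 19:17) are assumed from the TM4C1294 datasheet's EMACDMARIS description. It shows that a bit survives the masked path only if two conditions hold: the bit is set in the raw status, and it is either in the unmaskable group or enabled in the DMAIM value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for EMACDMARIS fields (assumed bit positions). */
#define RIS_TT   (1u << 29)   /* timestamp trigger: not maskable through DMAIM */
#define RIS_PMT  (1u << 28)   /* power management: not maskable through DMAIM */
#define RIS_MMC  (1u << 27)   /* MMC counters: not maskable through DMAIM */
#define RIS_AE_M (7u << 23)   /* access error field (state, not an interrupt) */
#define RIS_TS_M (7u << 20)   /* transmit process state field */
#define RIS_RS_M (7u << 17)   /* receive process state field */

#define NON_MASKED_INTS (RIS_TT | RIS_PMT | RIS_MMC)

/* Models the masked-status computation: drop the state fields, then,
 * if bMasked is true, keep only bits that are unmaskable or enabled
 * in the supplied DMAIM value. */
static uint32_t int_status(uint32_t ris, uint32_t dmaim, bool bMasked)
{
    uint32_t val = ris & ~(RIS_AE_M | RIS_TS_M | RIS_RS_M);

    if (bMasked)
    {
        val &= (NON_MASKED_INTS | dmaim);
    }
    return val;
}
```

Suppose DMAIM enables only bits 16 and 6 (the NIS and RI positions). The AIS and RU bits then drop out of the masked result, even though they are set in the raw status. TT passes even with an empty mask, because it belongs to the unmaskable group.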
over 10 years ago

Amit Ashara over 10 years ago

TI__Guru**** 244400 points

Hello BP101,

The EMAC_NON_MASKED_INTS are the bits that are unmaskable, so they must be returned along with the status of the masked bits.

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

Guru 55913 points

It would seem so, but it appears the compiler is treating the mask bits as an AND. Perhaps the lower 17 bits of HWREG are not actually being ORed with the maskable bits of DMAIM, regardless of the ORed TT/PMT/MMC/TS/RS bits. The assumption seems to be that if a bit is 0 in the OR operation, the compiler should treat the lower 17 bits as 1 to mask that bit position.

Likewise, the EMACIntClear() logic seems reversed. When the AIS flag is raised in ui32IntFlags, the code seems to be missing a |= of the DMAIM or DMARIS source bits into ui32IntFlags. These bits are RW1C: they are cleared by writing 1, not 0, to the bit position. Without that |=, the AIS/RU bits take forever to clear, and by then the RU condition has locked the EMAC RX controller. It seems more prudent to |= all abnormal/normal interrupt sources whenever the AIS/NIS flag is raised, because we don't know which DMARIS bit was set and needs to be cleared. The only other way to know is to AND the interrupt status via ui32IntFlags, which really only sets one side of the OR gate equation. This was later changed below to &=, so both sides of the AND gate are tested, with DMARIS RW1C clearing all masked bits.

In this case the EMAC0 receiver asserts INT 0x8080 (32896), AIS/RU, which ends up crashing the EMAC. When EMACDMARIS is read, the RI bit is not set along with AIS/RU, presumably because the FIFO holds only the start of the frame during RU. Interrupt status 0x180C0 (98496), NIS/AIS/RU/RI, then starts posting, but only after some time has passed and EMACDMARIS has first reported numerous AIS/RU flags.
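The two status words quoted above can be decoded bit by bit. Below is a minimal sketch. It assumes the EMACDMARIS interrupt bit positions from the TM4C1294 datasheet (NIS = 16, AIS = 15, RU = 7, RI = 6, and so on). `dmaris_decode()` and its name table are hypothetical helpers, not TivaWare functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed EMACDMARIS interrupt bit positions (TM4C1294 datasheet). */
static const struct { unsigned bit; const char *name; } kDmarisBits[] = {
    {16, "NIS"}, {15, "AIS"}, {14, "ERI"}, {13, "FBI"}, {10, "ETI"},
    { 9, "RWT"}, { 8, "RPS"}, { 7, "RU"},  { 6, "RI"},  { 5, "UNF"},
    { 4, "OVF"}, { 3, "TJT"}, { 2, "TU"},  { 1, "TPS"}, { 0, "TI"},
};

/* Writes the names of the set bits, highest first, separated by commas.
 * 'len' must be at least 1; output is truncated if 'buf' is too small. */
static void dmaris_decode(uint32_t status, char *buf, size_t len)
{
    size_t used = 0;

    buf[0] = '\0';
    for (size_t i = 0; i < sizeof kDmarisBits / sizeof kDmarisBits[0]; i++)
    {
        if (status & (1u << kDmarisBits[i].bit))
        {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? "," : "", kDmarisBits[i].name);
            if (n < 0 || (size_t)n >= len - used)
                break;              /* buffer full: stop appending */
            used += (size_t)n;
        }
    }
}
```

For 0x8080 it yields `AIS,RU`, and for 0x180C0 it yields `NIS,AIS,RU,RI`. Later in this thread, code 99525 (0x184C5) is reported; the same helper decodes it as `NIS,AIS,ETI,RU,RI,TU,TI`.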

It seems the application was working from the upper masked status bits 31:17 and generating a SW INT56, instead of using the HW interrupt status to resolve the RU condition. That approach might be OK until the EMAC is heavily stressed or actually runs out of RX buffers. Then what?

The interrupt clear should seemingly test for the NIS/AIS flag bits. Instead, it appears to have been testing all status bits and ORing only the flag bits [15,16] AIS/NIS into ui32IntFlags. That has been changed below, and the code now clears all interrupt sources. So far it is working, but we won't know until RU occurs again; hopefully it never does. After making the change above, the EMAC receiver stayed connected to IOT for 50k seconds. Before the change, the EMAC receiver randomly dropped IOT at 10-16k seconds, when DMARIS reported AIS/RU (32896).

Changing the interrupt priority level of the LWIP timer call has helped negate the random exception error 11, INT5 (bus error). Debugging often reveals NIS/RI/RU after a POR.

EMACIntClear(uint32_t ui32Base, uint32_t ui32IntFlags)
{
    //
    // Mask in all normal interrupt sources if the NIS summary flag is
    // specified.
    //
    if(ui32IntFlags & EMAC_INT_NORMAL_INT) // Read NIS flag
    {
        ui32IntFlags |= EMAC_NORMAL_INTS;  // |= all NIS sources
    }

    //
    // Similarly, mask in all abnormal interrupt sources if the AIS summary
    // flag is specified.
    //
    if(ui32IntFlags & EMAC_INT_ABNORMAL_INT) // Read AIS flag
    {
        ui32IntFlags |= EMAC_ABNORMAL_INTS;  // |= all AIS sources
    }

    // ... (remainder of the function unchanged)
}
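EMACDMARIS is write-1-to-clear: each 1 written clears that bit, and each 0 leaves it alone. The flags handed to EMACIntClear() therefore decide exactly what gets cleared.

The host-side model below uses an ordinary variable in place of the register and compares two directions of summary-bit handling:

- The direction described by the library's own comment ("mask in the normal interrupt if one of the sources it relates to is specified"), where clearing a source also clears its summary bit.
- The direction in the modified code above, where a summary bit expands to every source in its group.

The source groupings are assumed from the datasheet's NIS/AIS summary descriptions, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NIS (1u << 16)  /* normal interrupt summary (assumed position) */
#define AIS (1u << 15)  /* abnormal interrupt summary (assumed position) */

/* Assumed sources behind each summary bit: TI, TU, RI, ERI for NIS;
 * TPS, TJT, OVF, UNF, RU, RPS, RWT, ETI, FBI for AIS. */
#define NORMAL_SRCS   ((1u << 0) | (1u << 2) | (1u << 6) | (1u << 14))
#define ABNORMAL_SRCS ((1u << 1) | (1u << 3) | (1u << 4) | (1u << 5) | \
                       (1u << 7) | (1u << 8) | (1u << 9) | (1u << 10) | \
                       (1u << 13))

/* Write-1-to-clear: every 1 in 'value' clears that bit, every 0 keeps it. */
static uint32_t rw1c_write(uint32_t reg, uint32_t value)
{
    return reg & ~value;
}

/* Library-comment direction: clearing a source also clears its summary. */
static uint32_t flags_src_to_summary(uint32_t flags)
{
    if (flags & NORMAL_SRCS)   flags |= NIS;
    if (flags & ABNORMAL_SRCS) flags |= AIS;
    return flags;
}

/* Modified-code direction: a summary bit expands to all of its sources. */
static uint32_t flags_summary_to_src(uint32_t flags)
{
    if (flags & NIS) flags |= NORMAL_SRCS;
    if (flags & AIS) flags |= ABNORMAL_SRCS;
    return flags;
}
```

In this model both variants clear RU and AIS from 0x180C0 and leave NIS and RI set. The second variant also writes 1s to abnormal source bits that were not read as set. Because the register is RW1C, that matters only if one of those bits became set after the status was read, in which case it is cleared unseen.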

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,

Even stranger, a hardware (HW) interrupt vector is assigned to EMAC0 INT 56 in startup_ccs.c, yet the LWIP timer call triggers SW INT 56 at the same time or some time later. The source comments note that, to avoid using mutexes, all lwIP calls must be kept in the Ethernet interrupt context so that lwIP is never re-entered.

It seems any masked DMARIS interrupt flag will vector directly to the HW INT handler, blowing through a programmed SW INT trigger and possibly violating the non-reentrant lwIP.

Does the HW interrupt vector entry (extern void lwIPEthernetIntHandler(void);) in startup_ccs.c negate the SW-triggered interrupt source 56? How do we stop a HW/SW tie in the NVIC, or stop the same INT56 repeating into this global vector?

For a level-sensitive interrupt, when the processor returns from the ISR, the NVIC samples the interrupt signal. If the signal is asserted, the state of the interrupt changes to pending, which might cause the processor to immediately re-enter the ISR. Otherwise, the state of the interrupt changes to inactive.

Genatco over 10 years ago in reply to Genatco

Hi Amit,

I wonder how it is possible to set the NVIC SW-triggered INT56 in non-privileged access mode, as all the EMAC/PHY program examples seemingly do without privileged access.

HWREG(NVIC_SW_TRIG) |= INT_EMAC0 - 16;
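The value written to NVIC_SW_TRIG is an IRQ number, not a vector number. That is why the code subtracts 16 from INT_EMAC0: on Cortex-M, vectors 0-15 are system exceptions. The small arithmetic sketch below assumes INT_EMAC0 is vector 56, as stated in this thread; the helper names are hypothetical. It also locates IRQ 40 in the NVIC's 32-bit enable and pending register banks.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-M: vectors 0-15 are system exceptions, so the external IRQ
 * number is the vector number minus 16. */
static uint32_t irq_from_vector(uint32_t vector) { return vector - 16u; }

/* Each NVIC enable/pending register (EN0, EN1, ..., PEND0, PEND1, ...)
 * covers 32 IRQs: pick the register, then the bit inside it. */
static uint32_t nvic_reg_index(uint32_t irq) { return irq / 32u; }
static uint32_t nvic_bit(uint32_t irq)       { return irq % 32u; }
```

For vector 56 this gives IRQ 40. IRQ 40 falls in the second register of each bank (EN1, PEND1, and so on) at bit 8, counting from zero.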

The datasheet states we must unlock REG76 MPUATTR (0xDA0). Its AP bit field defaults to 000, meaning no access for privileged or non-privileged software. Warning notes say REG76 can only be accessed from privileged mode. Writing 0b011 to the AP bit field of REG76 gives non-privileged software full RW access in the first place.

What confusion is going on here? Might the Stellaris MCU default to full RW access, with the AP bit field set to 0b011 by default?

Accordingly, the EMAC INT56 SW interrupt should not be functional, and should cause a non-privileged access error (datasheet page 139). The AP bit field must be set to RW = 011 to gain full access. Only then can the EMAC0 HW INT 56 source in the NVIC be redirected, once full privileged access to REG53 SWTRIG (0xF00, page 161) is gained.

Amit Ashara over 10 years ago in reply to Genatco

Hello BP101

No. The HW interrupt vector does not negate the SW vector. The SW vector can be considered a jump start for a process rather than polling. I can check the register value, but if you look at all the bit field settings, they are 0. Even SIZE is kept as 0, but that does not stop the CPU from executing from a larger flash size.

I can check the register read and data path to see whether the register does get updated, but it could be a function of the design itself.

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

CCS debug now shows HW interrupt 56 (bit 40, in the 32-63 range) as disabled at bit position 9 after enabling privileged mode, versus setting CFGCTRL MAINPEND = 1.

For some reason, the NVIC_SW_TRIG bits in CCS debug do not show any changes during refresh. However, the DMARIS mask bits are all being reset after changing the |= to &= on all NIS/AIS source interrupts. The strange part is that the UARTprintf() messages would print the AIS status bits multiple times in a loop, without clearing them between IOT connection attempts. That symptom would seem to imply that the DMARIS set and clear was failing, so we tried hard to locate the cause.

Masking all NIS/AIS bits had issues that appear to stem from the interrupt priority group 4:4 NIL rule, since the LWIP timer HW INT39 was below SW INT 56. When the LWIP timer priority level was set farther down the 3-bit grouping, say 0x10 versus 0xA0, the NVIC would raise exception 11 (bus error) shortly after POR. The bit positions in the grouping priority have something to do with the cause of the error. I find it odd that, with EMAC priority 0xC0 and LWIP timer priority 0xA0, the third bit positions line up with the binary 4:4 priority.
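The TM4C NVIC implements three priority bits, so only bits 7:5 of each 8-bit priority value take effect, and a lower value means higher urgency. The minimal sketch below works under that assumption, with hypothetical helper names. It maps the values discussed here to their effective levels.

```c
#include <assert.h>
#include <stdint.h>

#define PRIO_BITS 3u  /* implemented priority bits on TM4C (assumption stated above) */

/* Keep only the implemented upper bits of the 8-bit priority value. */
static uint8_t effective_priority(uint8_t prio)
{
    return (uint8_t)(prio & (uint8_t)(0xFFu << (8u - PRIO_BITS)));
}

/* Effective level 0..7, where 0 is the most urgent. */
static unsigned priority_level(uint8_t prio)
{
    return (unsigned)(prio >> (8u - PRIO_BITS));
}
```

Under this model 0xA0 is level 5 and 0xC0 is level 6. By contrast, 0x10 (like 0x0A, mentioned later in the thread) has its upper three bits clear. It therefore runs at level 0, the most urgent level, and outranks both.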

At least we are learning a few things about MPU protected memory mode (TEX, S, C, AP bits). Below is the code added to enable SW triggers in the NVIC per the datasheet.

/* Enable the Cortex-M memory protection unit. PRIV_DEFAULT:
 * Enables the default memory map when in privileged mode and
 * when no other regions are defined. HARDFLT_NMI: Enables
 * the MPU while in a hard fault or NMI exception handler.
 * If this option is not enabled, then the MPU is disabled while
 * in one of these exception handlers and the default
 * memory map is applied. */

ROM_MPUEnable(MPU_CONFIG_PRIV_DEFAULT | MPU_CONFIG_HARDFLT_NMI);

/* MPU attributes: privileged and unprivileged software, RW full access.
 * Note: this write sets only the AP field of the region currently
 * selected in NVIC_MPU_NUMBER; the SIZE and ENABLE bits of the same
 * register are written as 0. */
HWREG(NVIC_MPU_ATTR) = NVIC_MPU_ATTR_AP_RW_RW;

Amit Ashara over 10 years ago in reply to Genatco

Hello BP101,

Please clarify what you mean by the LWIP HW INT39 being below SW INT 56.

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,

The priority grouping rules state that the lower-numbered interrupt takes priority over the higher-numbered one, no matter the priority level of the higher interrupt source. I think that is how the NVIC determines NIL sequencing of chained interrupt sources. It would explain how the NVIC services chained interrupts more rapidly whenever their priority levels fall close together in the 3 bit positions.

Reviewing EMACIntStatus() masking again, bear in mind that we want the real-time masked interrupt source to always pass into the ui32Val variable.

So the masked value returned (true) is the static DMAIM interrupt mask bits ORed with EMAC_NON_MASKED_INTS, then &= with ui32Val. Seemingly, the &= combined with the | could negate the real-time DMARIS values previously read into ui32Val.

Note that the MPU is not flawless: at higher speeds, loosely written code can at times see a certain percentage of random logic errors. Otherwise, code that seemingly works for 13 hours without incident should never bus fault from random unexpected interrupt errors. Should we be forced to run our MPU peripherals at turtle access times when SYSCLK gives us the advantage to warp time and space? Without SysCtlDelay() added at several turning points, the peripheral train derails, mostly when accessing the EMAC, EEPROM, or SRAM. With high-speed access, interrupt handling has to be near flawless.

The application layer expects real-time interrupt status for the integer compares it performs on values returned from calls into emac.c. Adding static values to the mix could intermittently lead to a bad outcome, should the mask OR negate the real-time interrupt status.

Might it be safer never to let DMAIM into the mix for real-time interrupts? That removes the question "could this be the issue?" when things go wrong. The &= inside HWREG masks bits if the real-time bit positions of TT, PMT, MMC are set. In this case the logic DMARIS & (1 AND 1 = 1, 0 AND 1 = 0) is far better than DMAIM | (0 OR 1 = 1, 1 OR 1 = 1).

    if(bMasked)
    {
        ui32Val &= (EMAC_NON_MASKED_INTS & HWREG(ui32Base + EMAC_O_DMARIS));
        //ui32Val &= (EMAC_NON_MASKED_INTS | HWREG(ui32Base + EMAC_O_DMAIM));
    }
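Because `x & (A & x)` reduces to `x & A`, the proposed line keeps only the bits that are also in EMAC_NON_MASKED_INTS, whatever DMAIM holds. Below is a minimal host-side comparison of the two forms. The names are hypothetical, and it assumes TT, PMT and MMC sit at bits 29:27; the thread identifies these three as the EMAC_NON_MASKED_INTS group. DMAIM is assumed to use the same positions as DMARIS for bits 16:0.

```c
#include <assert.h>
#include <stdint.h>

/* TT, PMT, MMC at bits 29:27 (assumed positions). */
#define NON_MASKED_INTS ((1u << 29) | (1u << 28) | (1u << 27))

/* Library form: keep bits that are unmaskable OR enabled in DMAIM. */
static uint32_t mask_or(uint32_t val, uint32_t dmaim)
{
    return val & (NON_MASKED_INTS | dmaim);
}

/* Proposed form: AND with the unmaskable set AND the raw DMARIS value,
 * which is equivalent to val & NON_MASKED_INTS when dmaris == val. */
static uint32_t mask_and(uint32_t val, uint32_t dmaris)
{
    return val & (NON_MASKED_INTS & dmaris);
}
```

Take the status 0x180C0 (NIS, AIS, RU, RI) with the same four bits enabled in DMAIM. The OR form returns the status unchanged, while the AND form returns 0. A status with TT and RI set keeps only TT under the AND form.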

Genatco over 10 years ago in reply to Genatco

For anyone looking in on this post: we call this a multitasking issue, one where several areas of code can lead to the very same exception fault number. That is what makes it so difficult to pin an exception fault down to just one area of code as the cause. So keep an open mind and spread your focus across all key points of the source code and libraries.

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,
More strange SW INT56 happenings; CCS 5.4 debug may be causing them, but I'm not sure just yet.

EMAC INT56/40 bit 9 is disabled in CCS debug (INT 32-63). We assumed it was disabled after redirecting the NVIC over to SW INT56. Oddly, the disabled HW INT56 randomly asserts pending in CCS debug after many hours of run time. Note that we cannot see SW INT56 being triggered in any debug NVIC interrupt registers.

When the disabled HW INT56 randomly asserts pending, the CCS debug register refresh cycle stops and presumably halts the TM4C123 ICDI, yet it does not cause an exception fault in the TM4C1294. When CCS debug is stopped, the TM4C1294 resumes. Its IOT time-since-reset seconds count keeps increasing, and it remains connected to the cloud, reporting HTTP 204 status events. That is not always the case: sometimes an exception 11 follows that random interrupt event in the TM4C1294.

How is it possible that SW INT56 randomly switches into HW INT56? That is the question of the day.

Amit Ashara over 10 years ago in reply to Genatco

Hello BP101,

Looking at the code of EMACIntStatus, the idea is to read the RIS register, remove the AE, TS and RS bits from the status, and then use the IM register to get the rest of the status. Since the IM register does not cover the upper bits, the OR condition is required.

Now, regarding interrupt scheduling: the priority level can only be seen when the two interrupts occur together in time. That is difficult to monitor in a SW environment, because the occurrence of interrupts is not visible on a pin. The cause of the fault must be something else, not what we are suspecting. It may be code execution that, in the context of the application, is showing other symptoms. One simple but not so straightforward method is to move the entire code execution to SRAM or to external SDRAM. This eliminates flash, or any other core event, from the application.

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

"when the two interrupts occur together in time"

That is what the code appears to do: EMAC HW INT56 is enabled in tiva-tm4c129.c, and SW INT56 is triggered inside the LWIP timer call function. Long ago I added an unpend statement for SW INT56 to the interrupt handler. I later commented it out, not understanding that both HW and SW INT56 were firing. Today I added that NVIC unpend of INT56 back into the interrupt handler. Apparently we must handle the HW interrupt clearing when the NVIC sets pending, and not just clear DMARIS as the code was doing.

It seems that when both HW and SW INT56 tie in the NVIC, it sets a pending status on the (disabled) INT56/40 bit 9 in CCS debug. Sometimes that pending status quickly goes away, presumably the SW interrupt. Other times pending 56 stops the application, presumably because HW 56 is not cleared. Polled level interrupts do not typically auto-unpend; pulsed interrupts, like a SW interrupt, do.

It is amazing that HW and SW INT56 even work together, and it seems they do so only by throwing all caution to the wind. I believe HW INT56 is asserted on an obscure EMAC DMA condition: it does not raise the HW 56 flag very often, yet it does. It may be safer, with fewer issues, not to use a SW INT56 trigger in the LWIP timer and instead just poll the HW INT handler, to avert NVIC ties for the same interrupt.

Random exception 11 appears to lean more toward an interrupt and EMAC DMA timing issue in the interrupt vector handling. CCS debug reveals that the EEPROM is still incrementing its R/W address, and the LWIP timer is still counting, after the phantom pend suddenly stopped the application.

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,

"priority level can only be seen when the two interrupts occur together in time"

Priority grouping changes the way priority-ordered IRQs are serviced.

2.5.6: If multiple pending interrupts have the same group priority, the subpriority field determines the order in which they are processed. If multiple pending interrupts have the same group priority (4:x) and subpriority (x:4), the interrupt with the lowest IRQ number is processed first. In this case the LWIP timer had a higher priority, yet the lower IRQ.
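The ordering rule quoted from section 2.5.6 can be written as a comparison: group priority first, then subpriority, and the IRQ number only when both are equal. Below is a minimal sketch with a hypothetical `pending_irq` type. It models only the choice among simultaneously pending interrupts, not preemption of a handler that is already running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one pending interrupt. */
typedef struct {
    unsigned group;  /* group (preemption) priority, lower is more urgent */
    unsigned sub;    /* subpriority, lower is more urgent */
    unsigned irq;    /* IRQ number, used only as the final tie-break */
} pending_irq;

/* True if 'a' is processed before 'b' when both are pending. */
static bool served_first(pending_irq a, pending_irq b)
{
    if (a.group != b.group) return a.group < b.group;
    if (a.sub != b.sub)     return a.sub < b.sub;
    return a.irq < b.irq;
}
```

In this model a lower IRQ number decides the order only on a full tie. An interrupt with a more urgent group priority is taken first even when its IRQ number is higher.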

Major surprise:

The datasheet makes no direct connection between IRQ56 and EMAC0, other than the NVIC holding a place for IRQ56. No EMAC peripheral interrupt is raised when EMACDMARIS flags any number of previously masked interrupts. That appears to have eluded TI programmers in the way the NVIC was handling IRQ56. It led many to believe there were two distinct ways IRQ56 would assert in the EMAC, when there is no physical IRQ56 hardware assignment for EMAC0. The EMAC peripheral description makes no claim of a single IRQ56 raised for the EMACDMARIS register.

We now poll DMARIS via the LWIP timer call, with IRQ56 disabled and its priority removed. The result is the same: at some point AIS will not be cleared, starting a cascade of uncleared DMARIS bits 16:0.

That is one reason we at times witness repeating AIS and an EMAC0 RX/TX controller crash. The IOT code made this very wrong assumption, and we all became victims of it. It's not a matter of if (ui32Status) but rather when (ui32Status). In other words, get the IF out of the EMACIntClear() call and always clear DMARIS. Otherwise, the next time the LWIP timer SW-triggers (or even polls) the interrupt handler, the same interrupt will be read many times over.

    /* Read and clear the EMACDMARIS (REG57) interrupt. */
    ui32Status = MAP_EMACIntStatus(EMAC0_BASE, false);

    //
    // Originally cleared only if the interrupt really came from the
    // Ethernet and not our timer; now always cleared.
    //
    //if(ui32Status)
    //{
        MAP_EMACIntClear(EMAC0_BASE, ui32Status);
    //}

Amit Ashara over 10 years ago in reply to Genatco

Hello BP101,

But interrupt vector 56 is the EMAC Interrupt. The unclaimed AIS is not cleared because its cause is still present, and until the cause is rectified the interrupt status will keep being updated. However, with the IM bits cleared, the physical interrupt will not occur from these causes. The polling in the LWIP timer call is there to ensure that no other source is pending.

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,
Success: so far there are no more random AIS/NIS error loops, exception 11s, or IOT stops. I suspect one reason was the SW-triggered handler, where even adding an unpend of IRQ56 fared worse. The EMAC DMA engine does not forgive missing a single interrupt once bits are set in EMACDMARIS. The likelihood of that occurring escalates with faster access times around the IRQ handler and with actual load on the DMA engine. In the past, IOT reached 70k seconds only after we tweaked DMA engine settings, and it had not reached that many connected seconds in months.

Polling the DMARIS register from the LWIP timer call, after disabling IRQ 56 and the EMAC priority, is proving to be far more stable. Perhaps one answer lies in the way the NVIC services the SW trigger unpend cycle: seemingly the trigger randomly flat-lined, switching from pulse to level sensitive. The NVIC auto-unpend process might not be too far off from asserting a manual unpend of the IRQ.

//! Un-pends an interrupt.
//!
//! \param ui32Interrupt specifies the interrupt to be un-pended. The
//! \e ui32Interrupt parameter must be one of the valid \b INT_* values listed
//! in Peripheral Driver Library User's Guide and defined in the inc/hw_ints.h
//! header file.
//!
//! The specified interrupt is un-pended in the interrupt controller. This
//! causes any previously generated interrupts that have not been handled
//! yet (due to higher priority interrupts or the interrupt not having been
//! enabled yet) to be discarded.

Amit Ashara over 10 years ago in reply to Genatco

Hello BP101

The interrupt is edge sensitive and not level sensitive. So if multiple interrupts happen before an UNPEND is issued, the UNPEND may clear the HW interrupt's pending state as well. However, servicing the cause via the status poll should still return the correct status.
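A simplified model of a single NVIC pending bit may help picture this. A hardware pulse and a software trigger both set the same pending bit, repeated requests before the handler runs collapse into one, and an un-pend discards the request whatever its source. This is a hypothetical host-side model that ignores priority and active state; it is not the NVIC implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one NVIC interrupt line. */
typedef struct {
    bool pending;
    unsigned handler_runs;
} irq_line;

/* A hardware pulse or a software trigger: both just set the pending bit. */
static void irq_pend(irq_line *l)   { l->pending = true; }

/* Un-pend: discards the pending request, whatever raised it. */
static void irq_unpend(irq_line *l) { l->pending = false; }

/* Exception entry: the pending bit is cleared and the handler runs once. */
static void irq_service(irq_line *l)
{
    if (l->pending)
    {
        l->pending = false;
        l->handler_runs++;
    }
}
```

Requests coalesce in this way, so the handler has to read and service every bit set in EMACDMARIS on each entry. It cannot assume one event per entry.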

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

"But interrupt vector 56 is the EMAC Interrupt."

Everyone was convinced of that, but it seems not to be true at all: it is not the EMAC peripheral IRQ. If IRQ56 were the peripheral's interrupt, then not enabling it in tiva-tm4c129.c, or disabling the SW trigger, should each halt the DMA engine's operation, but neither does.

Amit Ashara over 10 years ago in reply to Genatco

Hello BP101,

The SW trigger as I mentioned earlier is used to Prime an action. I believe that was done here as well, to get the communication started. The simplest method would be to check the INT EN bits in the NVIC, which should be viewable through the CCS Register Browser.

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,
"The interrupt is edge sensitive and not level sensitive"

Agreed. I was referring to the SW trigger pulse, which seems to randomly switch to level mode when it incorrectly sets the disabled and cleared IRQ56/40 to a pending state. That halted the LP until debug was stopped, after which it came back from the dead. Something in that IRQ trigger function that reads EMACDMARIS is not liked by the DMA engine, since we see the same condition outside of CCS debug with the SW trigger. Bear in mind we have 2 clients on the TCP stack putting some load on the PHY/EMAC.

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,

"The SW trigger as I mentioned earlier is used to Prime an action."

Better check that theory twice, since lwIP must service the EMAC in the same interrupt context to keep from being re-entered. Perhaps that is exactly what was causing major issues at faster access times. Not always clearing the DMARIS bits early in the handler seems to be a mistake. Adding secondary tests in the IRQ handler to qualify the DMARIS interrupt condition does appear to randomly leave bits set for the next cycle to trip on when frame traffic picks up.

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,

"The SW trigger as I mentioned earlier is used to Prime an action"

Sorry for beating this horse to death, but the EMAC0/DMA engine is not behaving in accordance with the datasheet's description of the Ethernet peripheral circuit. We have disabled the EMAC0 IRQ56 and removed its priority, since the LWIP timer call now polls the original interrupt handler assigned to IRQ56.

That said, it appears the EMAC0 believes IRQ56/40 is enabled at first. As noted in an earlier post, it magically pends the disabled IRQ56/40 one time and un-pends it one time only.

Perhaps this is the undocumented EMAC0 interrupt trigger you are referring to, used to get the process started, but in that case it would be a HW trigger event.

Please advise: why do we see IRQ56/40, which is neither enabled nor explicitly disabled, set to pending?

Amit Ashara over 10 years ago in reply to Genatco

Hello BP101,

Note that the upper bits in the EMACDMARIS are not maskable. Could the status be originating from there?

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,

I don't believe that is even possible. I have reviewed the datasheet several times, and no text even implies that IRQ56 is raised with or for interrupts set inside EMACDMARIS. Oddly, IRQ56 stays in the set-and-clear pending state indefinitely, as shown in the post above.

At this point IRQ 56 is seemingly a hardware IRQ. Where does it originate, and why? That is the question.

The point is that IRQ56 interferes with the default SW trigger IRQ56 when SW IRQ56 has been configured.

As far as DMARIS goes, all upper bits 31:17 are now completely masked out.

EMACIntStatus(uint32_t ui32Base, bool bMasked)
{
    // ...

    //
    // Get the unmasked (false) interrupt status 16:0 and clear the 31:17 status fields.
    //
    ui32Val = HWREG(ui32Base + EMAC_O_DMARIS);
    ui32Val &= ~(EMAC_NON_MASKED_INTS | EMAC_DMARIS_AE_M |
                 EMAC_DMARIS_TS_M | EMAC_DMARIS_RS_M);

    // ...
}

Amit Ashara over 10 years ago in reply to Genatco

Hello BP101

OK. Only TT and PMT are sources of interrupts in EMACDMARIS outside EMACDMAIM. When you start the code, do you use a System Reset to ensure that the entire device is reset? Also, can you try to isolate the time frame or code location where IRQ56 gets pended?

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,

Actually, TT, PMT and MMC are all inside EMAC_NON_MASKED_INTS; those 3 were left out of the original EMACIntStatus(). Who knows if they were somehow self-setting. I really did not see a need to have all of the 31:17 status confusing the basic TX/RX frames, so I'm keeping it simple (KISS) at this point.

It is still unclear whether the EMAC peripheral logic asserts IRQ56, since Figure 20-1 and the EMAC/DMA text do not make that point clear.

All we know for certain is this: the IRQ assignment map suggests EMAC0 is IRQ56/40, and the NVIC confirms we can configure and use it for SW IRQ56 with the proper RW privilege AP bits (0b011) or by other means. I neither like nor trust the results of using SW IRQ56 to call the Ethernet interrupt handler. The SW trigger more frequently and randomly causes exception 11. The bus error seems to be related to faults that occur anywhere in the instruction decode cycle of peripheral RW access times. One thought is that the application code moves faster reading the EMACDMARIS status with a SW trigger than it does by polling.

The MAC/PHY peripherals are reset first. The MAC reset waits until the DMA is fully reset, then EMACInit() and lastly EMACConfigSet() are called.

// Wait for the EMAC0 DMA to signal that its soft reset is complete.
// EMACDMABUSMOD SWR self-clears to 0 before the DMA is initialized.
while(HWREG(EMAC0_BASE + EMAC_O_DMABUSMOD) & EMAC_DMABUSMOD_SWR)
{
}
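This wait loop has no upper bound, so a DMA that never completes its soft reset would hang initialization silently. Below is a bounded variant, sketched under stated assumptions:

- The register read goes through a hypothetical callback; on target it would wrap HWREG(EMAC0_BASE + EMAC_O_DMABUSMOD).
- SWR is assumed to be bit 0.
- The poll limit is arbitrary.

A fake register lets the logic run off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Polls read_reg() until 'bit' reads 0 or 'max_polls' reads have been made.
 * Returns true if the bit cleared in time. read_reg is a hypothetical
 * stand-in for reading EMACDMABUSMOD. */
static bool wait_bit_clear(uint32_t (*read_reg)(void), uint32_t bit,
                           uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++)
    {
        if ((read_reg() & bit) == 0)
            return true;
    }
    return false;
}

/* Host-side fake: the SWR bit (assumed bit 0) reads set for three polls. */
static uint32_t fake_polls;
static uint32_t fake_dmabusmod(void)
{
    return (fake_polls++ < 3u) ? 0x1u : 0x0u;
}
```

On a timeout, the caller can log the failure or retry the reset instead of spinning forever.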

BTW:

Thanks for the assistance. Why not try this out with qs_iot to confirm the IRQ56 behavior?

The NIS/AIS interrupts were not enabled in the vanilla code as they are in the post above.

Genatco over 10 years ago in reply to Amit Ashara

The CCS 5.4 debug simulator may be partly to blame for the strange DMARIS/IM register readings. This is not the first strange debug result ever captured.

I have to laugh at what we see here: TU and TI stopped updating in the simulator. That is complete nonsense, since the EMAC is transmitting just fine at this time.

Genatco over 10 years ago in reply to Genatco

Oops, our bad: we never enabled the TU bit interrupt in EMACDMAIM in the post above. The point is that we only use the NIS bit to flag any of the ORed individual enabled interrupts as they are set in IM. So we don't mask or flag all the ORed interrupts of NIS and AIS if any single interrupt is enabled, as was programmed in the TivaWare library.

From that viewpoint, the original EMACDMAIM set-bits logic may work for a while, but it seems backwards at the program level. Under heavy load at high speed it faults (11) like a madman with a hatchet.

Genatco over 10 years ago in reply to Genatco

TI and TU are happening so often that the debug DMARIS bits shown set to 1 in the post above changed to 0 only for a brief second. That lasted until the Telnet client connection was closed. After that, the TI and TU bits became more cyclic, at a near 50:50 duty cycle.

With the TCP Nagle algorithm disabled, the high-speed Telnet client causes a high TX rate, 25 mbytes/second. After many hours of sustained operation, the TI and TU bits appear to stay frozen at 1, as if all TX descriptors have been used up in a never-ending chain ring.

Likely TI and TU are changing so fast that the CCS debug simulator's refresh misses register updates that last under 1 second.

A preventive measure might be to increase the TX descriptors from 8 to 16, so as not to cause a TU fault (MEM_ERR) that questionably raises exception 11.

Amit Ashara over 10 years ago in reply to Genatco

Hello BP101

The debugger refresh is awfully slow, since all data has to be read, serialized, and then repacked on the PC.

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,

Agreed, that might be what was happening at that point in the debug session, since the TU interrupt was not enabled in DMAIM.

Early posts claimed to set the TU and TX_Stopped mask bits in DMAIM and to add the TU flag to the interrupt handler (tiva-tm4c129.c). Both were later removed during troubleshooting. Neither interrupt was originally configured. After seeing the debug behavior, I ORed them with the existing TI flag.

After re-adding both, the TU debug bit started to cycle more often. More importantly, tivaif_process_transmit(tivaif); now runs not only on TI but also during TU and TX_Stopped periods. TU might not seem so important, but it also helps to mitigate exception 11 during high-speed TX data controller access.

Unplugging the Ethernet cable is a good stress test of the TX_Stopped interrupt. The good thing is that the IOT time-since-reset count picks up where it left off once reconnected, or until the server drops the HTTP keep-alive connection. Watchdog timer 1 starts to complain, capturing the Ethernet link event and moderating the loop. WDT1 is not doing much at this point, yet I realize it could be used to read the DMARIS 31:17 status bits and invoke an AI-like action based on reported EMAC conditions.
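Servicing the transmit path on TI, TU or transmit-stopped status, rather than TI alone, amounts to testing a wider bit set. Below is a minimal sketch. It assumes the EMACDMARIS positions TI = bit 0, TPS (transmit process stopped) = bit 1 and TU = bit 2. The macro names are hypothetical local stand-ins for the corresponding TivaWare EMAC_INT_* flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ST_TI  (1u << 0)  /* transmit interrupt (assumed position) */
#define ST_TPS (1u << 1)  /* transmit process stopped (assumed position) */
#define ST_TU  (1u << 2)  /* transmit buffer unavailable (assumed position) */

/* True when the transmit path should be serviced for this status word. */
static bool tx_service_needed(uint32_t status)
{
    return (status & (ST_TI | ST_TPS | ST_TU)) != 0;
}
```

A status word carrying only receive and summary bits, such as 0x180C0, does not trigger transmit servicing in this sketch.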

It seems EMAC INT56 was being used merely as a SW trigger source, and issues in SW interrupt priority queuing appear to raise exception 11.

Polling the EMACDMARIS register has stopped the random exception 11 events at various points in the IOT application. That one random event was, against all odds, the most difficult to trace to its cause.

Amit Ashara over 10 years ago in reply to Genatco

Hello BP101

Again, the polling should not be the cause of an exception (if that is the bus fault status being referred to). If it indeed is, I would rate it a serious issue.

Regards
Amit

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,
"the polling should not be the cause of an exception"

Polling DMARIS helps to prevent exception 11. INT56, when used for a SW trigger into the same interrupt handler, may have an erratum at any priority level assignment, and some levels are worse than others. I confirmed that SW triggers at IRQ priority level 0x10, versus 0x0A, cause more random and frequent bus faults.

Polling the DMARIS register appears to be far more stable with a 2.5 us (5) or 12.5 us LWIP timer interval and only one priority level instead of two. The issue seems to be related to access timing on the MPU local bus. The tendency to raise an exception appears to be linked directly to speed. TI should definitely check this issue out in the lab.

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,

An interesting discovery that was there all along: the DMAIM ETE mask bit is not enabled, and RU seems to be reversed with the ETI bit in debug. I set the RX descriptors to 4 to see whether RU would toggle more often in CCS debug. It makes no difference, other than posting code 99525 more frequently. The AIS UARTprintf() reports code 99525, so the RU bit is actually asserting, but debug does not show it at the proper bit position. Looking at the posts above, the ETI set was assumed to be normal, but on review it is quite abnormal. :-)

Genatco over 10 years ago in reply to Amit Ashara

Hi Amit,

Further discovery finds [RI] bit is mostly being set/cleared in debug when the high speed Telnet client is running. Below: DMARIS [RI] should be set at the end of every received (complete) frame. Failure of (slower (IOT) client) [RI] being set in CCS debug to coincide with RX descriptors or near the same interval of http 204 (113 bytes) status messages returned by IOT server to indicate a (frame reception is complete). Should not [RI] be toggling when ever the IOT client receive FIFO has a complete frame?

Tried adding disable interrupt on complete [31] of RDES1 flag into the chain testing last RX descriptor seemingly has no effect on the RI bit in MARIS. That infers RI must not be disabled as the reason. Not sure can trust CCS debug since ETI is also set only when the high speed Telnet client runs but not for the low speed IOT client. Possibly the CCS debug or DAP is not responding to slower DMA transfer rates but does for faster rates ? Very confusing simulator results input from the DAP CCS 5.4.

Example: g_pRxDescriptors[ui32Loop].Desc.ui32CtrlStatus = DES0_RX_CTRL_OWN | DES1_RX_CTRL_DISABLE_INT;
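One possible explanation for the "no effect" result is a field mix-up. RDES0 OWN and the RDES1 disable-interrupt-on-completion flag are both bit 31, but they live in different descriptor words. OR'ing the RDES1 flag into `ui32CtrlStatus` (RDES0) therefore just sets OWN again and leaves RDES1 unchanged. The sketch below shows the difference using local stand-in names rather than the TivaWare definitions. On the real descriptor, the RDES1 word is assumed to be `Desc.ui32Count`, as in the TivaWare `tEMACDMADescriptor`.

```c
#include <stdint.h>

/* Local stand-ins (assumed to match the TivaWare values): both flags
 * are bit 31, but of different descriptor words. */
#define RX_DES0_OWN          0x80000000u  /* RDES0[31]: DMA owns descriptor */
#define RX_DES1_DISABLE_INT  0x80000000u  /* RDES1[31]: no RI on completion */

/* Minimal mirror of the first two words of an RX DMA descriptor. */
typedef struct {
    volatile uint32_t des0_ctrl_status;   /* RDES0 (ui32CtrlStatus) */
    volatile uint32_t des1_count;         /* RDES1 (ui32Count), assumed */
} rx_desc_t;

/* As posted: both flags land in RDES0, so the "disable interrupt"
 * request only re-sets OWN and RDES1 is untouched. */
void rx_desc_arm_as_posted(rx_desc_t *d)
{
    d->des0_ctrl_status = RX_DES0_OWN | RX_DES1_DISABLE_INT;
}

/* Flag placed in RDES1, as the data sheet describes. RDES1 is written
 * first and OWN last, so the DMA only sees a fully prepared descriptor. */
void rx_desc_arm_disable_int(rx_desc_t *d)
{
    d->des1_count |= RX_DES1_DISABLE_INT;
    d->des0_ctrl_status = RX_DES0_OWN;
}
```

On real hardware, a memory barrier before the OWN write is typically needed so the DMA cannot observe OWN ahead of the RDES1 update. The host sketch does not model that.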

EMACDMARIS bit 6: RI (Receive Interrupt), type RW1C, reset value 0x0

Value Description

0 = No frame reception complete event has occurred.
1 = A frame reception is complete. When reception is complete, bit 31 of RDES1 (disable interrupt on completion) is reset in the last descriptor, and the specific frame status information is updated in the descriptor. The reception remains in the Running state.

This bit is cleared by writing a 1 to it.
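Because RI is RW1C, how the clear is written matters. The minimal host model below uses a hypothetical `rw1c_reg_t`, not a real register. It contrasts writing only the RI mask with a read-modify-write. The read-modify-write writes back every bit that was set, so it clears all pending status at once, not just RI.

```c
#include <stdint.h>

#define DMARIS_RI  (1u << 6)   /* Receive Interrupt */
#define DMARIS_RU  (1u << 7)   /* Receive Buffer Unavailable */

/* Host model of a write-1-to-clear status register: writing a 1
 * clears that bit, writing a 0 leaves it unchanged. */
typedef struct { uint32_t value; } rw1c_reg_t;

static uint32_t rw1c_read(const rw1c_reg_t *r) { return r->value; }
static void rw1c_write(rw1c_reg_t *r, uint32_t data) { r->value &= ~data; }

/* Correct: write only the bit to be cleared. */
void clear_ri_only(rw1c_reg_t *r)
{
    rw1c_write(r, DMARIS_RI);
}

/* Pitfall: "reg |= RI" is a read-modify-write. It writes back every
 * bit that was set, so every pending status bit is cleared. */
void clear_ri_with_rmw(rw1c_reg_t *r)
{
    rw1c_write(r, rw1c_read(r) | DMARIS_RI);
}
```
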

0 Genatco over 10 years ago in reply to Genatco

Guru 55913 points

OK, RI does get set over time, but far too slowly for the number of 113-byte frames being received. With TCP_MSS = 600 bytes, every 5th or 6th HTTP 204 should raise RI, indicating the end of a received frame. If RI is not raised at the end of each frame, the RX descriptors will run out and RU[7] will be raised at some later point, often leading to exception 11 (data bus error). We are seeing exactly that scenario: by the time RU[7] is raised, the lack of pbufs cannot be corrected at the abstraction layer. Slowing down the entire process by adding delays etc. only buys more time before the bug inevitably drops the payload.
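The descriptor-exhaustion argument can be illustrated with a toy ring model. It assumes the driver hands a descriptor back to the DMA only after handling RI. The names are hypothetical and the model ignores timing. It only shows that once every descriptor is owned by the CPU, the next frame has nowhere to go, which is the RU condition.

```c
#include <stdbool.h>
#include <stddef.h>

#define RX_RING_SIZE 4u   /* e.g. the 4 RX descriptors tried earlier */

typedef struct {
    bool owned_by_dma[RX_RING_SIZE];  /* models each descriptor's OWN bit */
    size_t dma_next;                  /* next descriptor the DMA fills */
    size_t cpu_next;                  /* next descriptor the driver drains */
} rx_ring_t;

void rx_ring_init(rx_ring_t *r)
{
    for (size_t i = 0; i < RX_RING_SIZE; i++) {
        r->owned_by_dma[i] = true;    /* all buffers start with the DMA */
    }
    r->dma_next = 0u;
    r->cpu_next = 0u;
}

/* A frame arrives. Returns false (the RU condition) when the next
 * descriptor is still owned by the CPU. */
bool rx_ring_frame_arrives(rx_ring_t *r)
{
    if (!r->owned_by_dma[r->dma_next]) {
        return false;                           /* no buffer available */
    }
    r->owned_by_dma[r->dma_next] = false;       /* frame handed to CPU */
    r->dma_next = (r->dma_next + 1u) % RX_RING_SIZE;
    return true;
}

/* The driver handles one completed frame (after seeing RI) and returns
 * its descriptor to the DMA. Returns false if nothing was pending. */
bool rx_ring_drain_one(rx_ring_t *r)
{
    if (r->owned_by_dma[r->cpu_next]) {
        return false;
    }
    r->owned_by_dma[r->cpu_next] = true;
    r->cpu_next = (r->cpu_next + 1u) % RX_RING_SIZE;
    return true;
}
```

On the real part, a suspended receive DMA is resumed with a receive poll demand (TivaWare `EMACRxDMAPollDemand()`) after descriptors are handed back. The model omits that step.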

Far worse, ETI[10] and ERI[14] appear reversed in how they assert NIS[16], and since ETE/ERE are not enabled in DMAIM, neither should be set at all.

Possibly the data sheet incorrectly lists the OR summaries (AIS asserted by ETI[10], NIS asserted by ERI[14]), since debug shows the opposite firing.
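To check the data sheet's OR summaries against what debug shows, the expected NIS/AIS state can be computed from a raw DMARIS snapshot and the DMAIM enables. This sketch assumes the summary groupings as read from the data sheet: NIS covers TI, TU, RI and ERI, and AIS covers TPS, TJT, OVF, UNF, RU, RPS, RWT, ETI and FBI. It also assumes that only enabled sources feed a summary. Under this reading, a raw status bit such as ETI can still appear in DMARIS with ETE cleared, and the mask only keeps it out of the summary. The summary bits are also sticky until written with 1. A mismatch after clearing sources is therefore not, by itself, proof of swapped wiring. The helper names are hypothetical.

```c
#include <stdint.h>

#define DMA_BIT(n) (1u << (n))

/* Assumed groupings; DMARIS status and DMAIM enable bits share positions. */
#define NIS_SOURCES (DMA_BIT(0) | DMA_BIT(2) | DMA_BIT(6) | DMA_BIT(14))
                                        /* TI TU RI ERI */
#define AIS_SOURCES (DMA_BIT(1) | DMA_BIT(3) | DMA_BIT(4) | DMA_BIT(5) | \
                     DMA_BIT(7) | DMA_BIT(8) | DMA_BIT(9) | DMA_BIT(10) | \
                     DMA_BIT(13))      /* TPS TJT OVF UNF RU RPS RWT ETI FBI */
#define AIS_BIT DMA_BIT(15)
#define NIS_BIT DMA_BIT(16)

/* Summary bits expected from the raw status and the enabled sources. */
uint32_t expected_summary(uint32_t raw, uint32_t enables)
{
    uint32_t active = raw & enables;
    uint32_t s = 0u;

    if (active & NIS_SOURCES) { s |= NIS_BIT; }
    if (active & AIS_SOURCES) { s |= AIS_BIT; }
    return s;
}

/* Summary bits where the observed value disagrees with the expectation. */
uint32_t summary_mismatch(uint32_t raw, uint32_t enables)
{
    return (raw & (NIS_BIT | AIS_BIT)) ^ expected_summary(raw, enables);
}
```
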

0 Amit Ashara over 10 years ago in reply to Genatco

TI__Guru**** 244400 points

Hello BP101

Would this be a debugger artifact of reading too slow compared to the HW operation?

Regards
Amit

0 Genatco over 10 years ago in reply to Amit Ashara

Guru 55913 points

Hi Amit,

Wishful thinking may help the S&P 500, but not hardware artifacts like these. A PHY link that randomly drops and returns, even under watchdog interrupt constraints, is most unforgiving: 95% of link-disconnect tests simply end up locking the EMAC at some point during transmit.

A real-world scenario: power to the switch suddenly drops, and the PHY link interrupt then signals link-up to the lwIP low-level netif. The EMAC runs for a bit, then throws up chunks of salsa and stops processing any frames. Houston, we have issues in this big MAC.

Soft-resetting the DMA does nothing to recover the TX/RX controller once whatever locks up in the EMAC goes into oblivion.

This capture was taken after 70k seconds of IOT uptime, when the link to the switch dropped and returned a few seconds later. AIS seems to be the dream killer that causes RU.

void
WatchdogIntHandler(void)
{
    /* Check whether the interrupt belongs to Watchdog 0 */
    if(ROM_WatchdogIntStatus(WATCHDOG0_BASE, true))
    {
        /* Clear the Watchdog 0 interrupt */
        ROM_WatchdogIntClear(WATCHDOG0_BASE);

        /* If the motor is in the running state, fault when this occurs;
         * otherwise, do nothing. */
        if(g_ulState & STATE_FLAG_RUN)
        {
            /* Indicate a watchdog fault */
            MainSetFault(FAULT_WATCHDOG);
            MainEmergencyStop();
        }
    }
    /* Otherwise, check whether the interrupt belongs to Watchdog 1 */
    else if(ROM_WatchdogIntStatus(WATCHDOG1_BASE, true))
    {
        /* Clear the Watchdog 1 interrupt */
        ROM_WatchdogIntClear(WATCHDOG1_BASE);

        UARTprintf("<< Watchdog-1 Timeouts Cleared >> \r\n");

        /* See if there is an active link: query the PHY basic
         * mode status register REG73:MR1 (EPHYBMSR) */
        bHaveLink = (MAP_EMACPHYRead(EMAC0_BASE, 0, EPHY_BMSR) &
                     EPHY_BMSR_LINKSTAT) != 0;

        if(!bHaveLink)
        {
            UARTprintf("<< Watchdog-1 No Valid IP Address >>\n\n");

            bHaveLink = false;

            /* Set back the timestamp of the last call to sys_check_timeouts().
             * This is necessary if sys_check_timeouts() hasn't been called for
             * a long time (while saving energy or waiting for a valid link),
             * to prevent all timer functions of that period being called. */
            sys_restart_timeouts();

            /* Close the socket and reset connection values and buffers */
            exoHAL_SocketClose(0);

            /* Temporarily disable the IOT SW trigger interrupt
             * vector call to Print All Data */
            IntDisable(54);
        }
        else
        {
            UARTprintf("<< Watchdog-1 Valid IP Address >>\n\n");

            /* Inform the application we now have a link */
            bHaveLink = true;

            /* Feed ("punch") the watchdog once before this long delay */
            ExoHALPunchDog(1);

            /* Delay so we don't panic the NVIC or EMAC as the link becomes
             * stable. SysCtlDelay() spins about 3 CPU cycles per count, so
             * this blocks for roughly 3 s inside the ISR. */
            SysCtlDelay(g_ui32SysClock * 1);

            /* Re-enable the IOT SW trigger interrupt
             * vector call to Print All Data */
            IntEnable(54);

            /* Software-trigger IOT Print All Data. NVIC_SW_TRIG takes the
             * IRQ number: vector 54 minus the 16 system exception vectors. */
            HWREG(NVIC_SW_TRIG) = 54 - 16;
        }
    }
}
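The handler reads BMSR once. On IEEE 802.3 clause 22 PHYs, the BMSR Link Status bit (bit 2, `EPHY_BMSR_LINKSTAT` above) latches low. After a brief drop, the first read can therefore report "down" even if the link has already returned. A sketch of the usual read-twice pattern follows. The read hook and the fake are hypothetical; on this board the hook would wrap `MAP_EMACPHYRead(EMAC0_BASE, 0, EPHY_BMSR)`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BMSR_LINK_STATUS 0x0004u   /* clause 22 register 1 (BMSR), bit 2 */

/* Hypothetical hook that performs one BMSR read. */
typedef uint16_t (*phy_read_bmsr_fn)(void *ctx);

/* Returns the current link state. If latched_low is non-NULL, it reports
 * whether the first read still showed "down", i.e. the link dropped at
 * some point since the previous read (or is still down). */
bool phy_link_up(phy_read_bmsr_fn read_bmsr, void *ctx, bool *latched_low)
{
    uint16_t first = read_bmsr(ctx);   /* may hold a latched "down" */
    uint16_t now = read_bmsr(ctx);     /* reflects the current state */

    if (latched_low != NULL) {
        *latched_low = (first & BMSR_LINK_STATUS) == 0u;
    }
    return (now & BMSR_LINK_STATUS) != 0u;
}

/* Host-side fake for testing: returns successive values from an array. */
typedef struct { const uint16_t *vals; unsigned idx; } fake_phy_t;

static uint16_t fake_read_bmsr(void *ctx)
{
    fake_phy_t *f = ctx;
    return f->vals[f->idx++];
}
```
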

0 Amit Ashara over 10 years ago in reply to Genatco

TI__Guru**** 244400 points

Hello BP101,

Could it be that freed memory is not being returned to the main pool?

Regards
Amit

0 Genatco over 10 years ago in reply to Amit Ashara

Guru 55913 points

Hi Amit,

That is what the PHY, at the hardware level together with the EMAC and the abstraction layer, is supposed to take care of. When the physical link is lost, the MAC should drop the packets and flush the FIFO, but it seems the TX/RX controller keeps processing the last buffered frames regardless of the AIS status. Hence we added the watchdog timer to disable the Print All Data (tStats) function, so as not to flood the FIFO while lwIP is not processing pbufs with the link down.
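As an alternative to the blocking `SysCtlDelay()` inside the watchdog ISR, the Print All Data trigger could be gated by a tick-driven link state with a settle period. This is a minimal sketch. `link_gate_t`, its functions and the 100-tick settle time are hypothetical choices, not part of the thread's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINK_SETTLE_TICKS 100u   /* hypothetical settle time, in ticks */

typedef struct {
    bool link_up;
    uint32_t settle_left;        /* ticks remaining before TX is allowed */
} link_gate_t;

/* Call on every link change, e.g. from the PHY or watchdog interrupt.
 * It only records state, so the ISR stays short. */
void link_gate_on_change(link_gate_t *g, bool up)
{
    g->link_up = up;
    g->settle_left = up ? LINK_SETTLE_TICKS : 0u;
}

/* Call from a periodic tick. Returns true when the application may
 * queue frames (e.g. trigger Print All Data) again. */
bool link_gate_tick(link_gate_t *g)
{
    if (!g->link_up) {
        return false;
    }
    if (g->settle_left > 0u) {
        g->settle_left--;
        return false;
    }
    return true;
}
```

The design keeps blocking work out of interrupt context. The interrupt records the link change, and the periodic tick decides when traffic may resume.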

Today we added a power switch to the Ethernet switch and hard-wired the power pack to it and to the PCB, so we can now switch it off and see what happens when the link is randomly dropped and returned while the EMAC is working. A 95% failure rate is not a good result.

Runs reaching 70k-225k seconds since reset often end in a fault or simply stop processing. Pausing in CCS debug mostly lands in the lwIP TIME_WAIT while loop [if (PBUF != NULL)], which is the reason for adding the lwIP sys_restart_timeouts() call in the watchdog.
