RTOS/EK-TM4C1294XL: MCU's are burning out like XMAS lights

Part Number: EK-TM4C1294XL

Tool/software: TI-RTOS

A LaunchPad MCU may randomly jump to 85% CPU usage and hit 75*C or more, yet continue to run the application (unattended) with a temperature fault. Other LaunchPads allow a POR but then jump to 85% CPU usage and slowly climb to 75*C until unplugged from USB power, all while still running the application. LaunchPads like the first one described will, after cooling down, pull down the +3V3 LDO, and PWRLED won't light. Those MCUs show a very low resistance (42 ohms) with JP2 pulled. A good MCU with JP2 pulled reads 900 ohms to 1k.

Nobody touched the LaunchPad as it went into meltdown. Oddly, the two MCUs that burned out did so shortly after the RTOS kernel had been installed and the application was crashing randomly. Both had recently been flashed back to the basic application when, a few days later, the MCU did the XMAS light burnout thing.

  • During the past week (perhaps even longer) the (near) record (and sustained) cold temperatures have drastically reduced the Humidity (inside) our nicely heated (US Midwest) homes/offices.      As such - despite your "disclaimer" (didn't do nothing) - ESD "hogs the Police Line-UP" as, "Prime Suspect."     Your town's news headline, "Multiple inanimate objects (LPads) - "Throw themselves" - from tall cliff" - proves impossible to miss!

    At home - I've noted during (incidental/slight) "dog-cat" contact  - "shocks occur!"     (never past noted!)    At the office - if we don't "strap in" (our ESD wrist-bands) - "NUM Lock" (on the keyboard) goes out.

    I would weigh "ESD" as far more likely - than (any) RTOS as "cause agent!"        Do confirm date/time for MCU's wake.     Flowers (via FTD) are, "On the wing!"    (Weeping Willow appears (most) appropriate...)

    And - should a "dim" (even unlit) tree - next Xmas - prove too depressing ... might you switch to "candles" (long (past) employed) - and likely to survive (even) the "extreme" ESD  - you (aka: nobody) regularly generate yet disclaim.         For those interested in their (very own) "ESD Resistant" Candles:        https://e2e.ti.com/group/helpcentral/f/301/t/653117

  • A Honeywell bulk furnace humidifier placed on the airflow outlet keeps all rooms at a nice 36% relative humidity. Water drops forming on the inside of the window glass do tend to keep static zapping at bay. Yet the other LaunchPad failed similarly, as reported on this forum last summer, also close to an RTOS flashing and later removal of RTOS. It was no great mystery at that time, amid a few other forum LaunchPad failures that have gone unanswered in the meantime.

    The odd thing is that on BOTH afflicted LaunchPads (same lot) the MCU internal temperature sensor was reading 52-54*C at 10-11% CPU usage (NoRTOS). Earlier lots report a 45-48*C temperature range at 24% CPU usage (NoRTOS).

    Notice that CPU usage dropped on the 54*C MCUs: a reading of 10-11% suggests the PLL was perhaps running much faster than 480MHz, putting SYSCLK well above 120MHz. So the RTOS Clock tick was running well above its nominal 1000us rate and aborting tasks with an (if) directive.

    Quote of the day; "Fool me once shame on you, fool me twice shame on me."  

    Seemingly these MCU silicon (RA2) parts were marginal to begin with, and loading RTOS somehow pushed them over the edge.

    Anyone up for MCU extraction replacement party, BYORP "Bring your own reflow paste" LOL.

  • Long ago - famed "Tech Cookbook author" Don Lancaster advised - "Blame yourself FIRST - blame the IC LAST!"    Firm/I have LONG found this TRUE!

    Should you (really) seek to monitor MCU's temperature - via a (more correct, "MCU independent means") - a small, temperature sensor may be "thermally bonded" atop the MCU. (and monitored via separate means)

    As you know - it proves (always "unwise") to, "Employ the DUT* to make/provide its (own) - always suspect, measurements."

    * DUT (Device Under Test)

    Might your use of the "DUT" to "make & report" key findings, echo, "He who serves as his (own) attorney - has a "fool" for a client?"

  • Had expected to give green stamps to all PLL overclocking enthusiasts.

    Two in the hand is better than 2 in the burning bush. I never could get CPU% to match across all of them. Of 4 LaunchPads in total (RA2 silicon), 2 behaved exactly alike in CPU% after POR, and the other 2 were overclocking the PLL right out of the static bag/box. Those 2 had a 10*C higher MCU temperature and double the CPU%. In hindsight the SYSCLK behavior makes sense if the PLL register indicated MOSC/2 but the PLL was not actually dividing as the SYSCTL register indicated.

    The question is perhaps whether the PLL divisor (errata SYSCTL #22) behaves the same across all RA2 MCUs. A customer of TI's Lisa on this forum, with TM4C1294ENC RA2 silicon, may have been reporting the same issue, though more intermittently.
  • I'd "pray" - but (somehow) - have "forgotten the words."     Mr. Lancaster's - (very) correct "Tech" guideline - likely warrants further (i.e. some) consideration...

    BTW - my Xmas Tree lights (those not (yet) destroyed by shelter dog/kat) - continue to "glow brightly" - apparently "immune" to BP misfortune...

  • More seriously, does TI have any plans to deal with errata SYSCTL #22 at the silicon level, and if so how? Otherwise, after connecting the serious overclocking dots, I'm starting to wonder how an external 120MHz OSC with the PLL bypassed could produce 25MHz for EMAC0 use.

    The one LaunchPad MCU put back in its box last July still works. Seemingly the PLL will no longer divide (/2), as the MCU temperature quickly reaches 60*C, so it requires a heat sink to keep it from thermal runaway. If that don't describe an overclocking PLL, my XMAS tree lights (all 700) don't require a wall switch dimmer either.

  • Hi BP101,
    I am concerned with your findings. Use of the updated function SysCtlClockFreqSet() from TivaWare instead of ROM_SysCtlClockFreqSet() "should" have resolved erratum SYSCTL #22. I have not heard of reports of "over clocking" once the new function is used, but do have a valid case of "under clocking". Do you have any direct evidence of the actual system clock speed, such as UART, TIMER, PWM or CAN operating at the incorrect frequency? Or do you output a divided version of the system clock? Sorry, but it would be very helpful to know if the system is running at 240MHz instead of the expected 120MHz, or if it is running at some other frequency.
  • Hi Bob,

    I think it was Todd or another TI engineer who recently stated on this very forum that SYSCTL #22 is not consistent across all RA2 MCUs. The erratum affects the MOSC function calls in TivaWare 2.1.2 and earlier, yet later libraries still seem to affect some but not all RA2 MCUs. Check again: SYSCTL #22 reads as a library-version-specific erratum, not a matter of ROM loading affecting the PLL.

    In RTOS versions prior to 2.16.1.14 the PLL was not being divided by 2 even though RTOS was configured to do so. Only after adding a second SysCtlClockFreqSet() in (main.c) did the PLL divider actually get programmed prior to the BIOS_start directive. Besides, the application has MAP_SysCtlClockFreqSet() in both the RTOS and no-RTOS projects.

    These two RA2 LaunchPads seemingly were running a randomly throttled PLL out of the box; the PLL seems to lose lock and then relock synchronously. The MCU temperature was typically 10*C or more above the other two LaunchPads (purchased 8 months apart), all 4 LaunchPads being RA2 silicon. The (identifiable) trouble with both MCUs started after running the RTOS Analyzer CPU load monitor and seeing the CPU load bounce between 20% and 90%, building meandering load graphs and eventually bus faulting (precise, 0x3). I confirmed in the debug register that SYSCTL PLL/2 was indeed programmed. I suspect the PLL in the RA2 silicon of certain MCUs becomes more damaged after loading RTOS and reprogramming the MOSC before the BIOS start command is asserted.

  • Bob Crosby said:
    have not heard of reports of "over clocking" once the new function is used, but do have a valid case of "under clocking".

    If the VCO is set to 480MHz and the PLL is not being divided by 4 after invoking SysCtlClockFreqSet(), do we not end up with a 240MHz SYSCLK?

    The odd thing is that after flashing RTOS and calling SysCtlClockFreqSet() in (main.c), I checked that the debug register = PLL/2. In hindsight, would that not produce a 240MHz SYSCLK? I became aware of the MCU overclocking only after loading RTOS, when debug ROV indicated PLL was not being enabled. Yet debug register PLL was /2, which seemed ok since the application configured MOSC in (main.c), not RTOS. Only once I was aware the PLL was not being enabled in RTOS did I notice that PLL/2 did not add up and that it should be PLL/4.

  • BP101 said:
    ROV indicated PLL was not being enabled. Yet debug register PLL was /2 seemed ok

    Really?    Look again - is not your own, "Cut/Paste" indicating   "PWM/2" ... NOT "PLL/2?"     (Might the "dimmed lighting" - due to ESD impacted Xmas Lights - have dulled your visual acuity?)

    There has "long" been an eased & suitable means to employ an MCU Timer to output a clearly, "Known & reduced frequency output replica of system clock" - which (I'd bet the farm) would dissolve any, "Cry of PLL Over-Clocking" - which emanates from "so few."      (maybe just one ...)

  • Hi BP101,
    There is a divide by 2 coming out of the PLL. Unfortunately the documentation does not make this clear. It talks about a VCO frequency of 480MHz, but then f(vco) is defined as 240MHz. Then the PSYSDIV field of the RSCLKCFG register divides this down to make the system clock.
    f(sys) = f(vco)/(PSYSDIV+1)
    120MHz = 240MHz / (1+1)

    The issue that I am concerned about is that the PSYSDIV field is really just a copy of the register that does the divide. To avoid clock glitches with registers on different clock domains, a write to the PSYSDIV bits generates a load pulse to the real internal divider if, and only if, the value in PSYSDIV has changed. I have verified that in some cases a write of 1 to the PSYSDIV field (to create divide by 2) did not cause the real divider to be changed. In the case I have recreated, the effective divider was much larger (divide by 16 if I remember correctly). The issue happened roughly once out of every 3000 power cycles. The workaround is to write PSYSDIV to 2 (divide by 3) then to 1 (divide by 2). With that workaround a known failing device has been power cycling once per second for the last three weeks without coming up at the wrong frequency.

    I am wondering if you are seeing the same issue. Even though your code sets PSYSDIV to 1 (divide by 2), the internal divider is really still in divide by 1. That would leave the system clock running at 240MHz. At that speed, CPU operation will become erratic as instructions will be missed or fetched incorrectly from flash.

  • Bob Crosby said:
    f(sys) = f(vco)/(PSYSDIV+1)
    120MHz = 240MHz(1+1)

    One suspects that (bottom) line (intended):   120MHz = 240MHz / (1+1)

  • Yes, thank you. I have corrected the previous post.
  • Good that - until poster "BP" settles upon "PWM/2" (as he presented) or "PLL/2" (claimed - yet unlikely/little in evidence) - it proves wise to define System Clock properly...
  • Why does BP101 have so many odd problems?

  • Cb1 raises hand - waves rigorously, "Dave, Dave ... I know!"       (cb1's special:  "Forum decoder ring" reveals,  "Source Two"  to decode to, "Dave.")

    BP's failure to "stockpile" the banned - yet soothing  "incandescent glow"  -  likely "blurs and/or colors" his (uncertain) PLL vs. PWM findings...

    Note too - "Rudolph" recognized - and (properly) refused to land/unload - upon BP's "ESD Beacon!"    (i.e. BP's roof ... while highly sensitive, "Electronics on (Sleigh)-Board!")

  • Hi Bob,

    OK, I will review. We are using the TivaWare 2.1.1.71 driver library to configure the MOSC functions. That RTOS configuration page is a bit misleading.

    It appears you are correct in the post above: Table 5-7 for a 25MHz XTAL shows a 480MHz VCO (N=0x4), and (Fig 5-5) VCO/N, so 240/2 = 120MHz SYSCLK if PLL/2. Table 5-7 (N=0x4) is incorrect if the PLL is already divided by 2 in the main clock tree. I seem to recall N is a register HEX value for the actual numerical divisor.
     
    What it seems I'm also witnessing in both application realms is that the PLL randomly loses phase lock as the application is running: a cyclic, sudden 2X speed burst in CPU usage and PWMCLK ticks, then SYSCLK goes back to normal speed. The PLL seems to behave like that even in several RA2 MCUs that run at typical, expected temperatures relative to room ambient. The RTOS Analyzer CPU load monitor displays repeating cyclic line graphs of 20%-90%, which perhaps points to the unexplained PWM burst captures I've posted in this forum.
    The (main.c) code is not re-entrant; to the best of my knowledge MOSC is not being reconfigured in cycles. Otherwise the ARM memory address pointer is jumping to 0x0 regardless of what embedded code is written to stop reentrance from occurring.

  • I have a timeline recorded on the IoT server of the temperature rise over several hours after POR. That said, Bob's explanation seems to fit the (consistent) 2X CPU usage display of those two LPads.

    Look up one line in the copy/paste: PLL/2.5 is actually SYSCLK in the Advanced section of that page. Seeing PLL/2.5 confused me into thinking the divisor should be 4, not 2.5, yet the debug register was PLL/2 when all was seemingly running well.

  • Hi Bob,

    In addition to your great detective work on the PSYSDIV bit division, there may be other issues in a rapid-fire PLL configuration.

    Looking at SysCtlClockFreqSet(), a potential exists for the PLL to jump out of lock status. The code has no test that the PLL remains in lock for any length of time. It assumes the PLL has entered lock after a single-pass register read (SYSCTL_PLLSTAT_LOCK) = 1 inside a FOR timeout loop with a break. Talk about believing in miracles; perhaps this explains why the PLL might actually be phasing in and out of lock status.

    Another code gotcha is expecting (!=), a logical (NOT), to compare an HWREG binary value against a known value; it often ends up in an incorrect match, especially with (&) involved. The WA in that case is to use (&~), a bitwise (NOT), against the register binary read value. That might be a C++ issue, but it has proven to fix code that was perhaps working marginally or not at all.

    Other suspicious code:

            // If there were no changes to the PLL do not force the PLL to lock by
            // writing the PLL settings.     
            if((HWREG(SYSCTL_PLLFREQ1) !=
                g_pppui32XTALtoVCO[i32VCOIdx][i32XtalIdx][1]) ||
               (HWREG(SYSCTL_PLLFREQ0) !=
                (g_pppui32XTALtoVCO[i32VCOIdx][i32XtalIdx][0] |
                 SYSCTL_PLLFREQ0_PLLPWR)))
            {
                bNewPLL = true;
            }
            else
            {
                bNewPLL = false;
            }
  • The PLL test above compiles with (&~(xxxx)) in place of (!=xxx); the test now (ANDs) the HWREG value returned with the indirect array value laid out.