How to debug an NVIC_FAULT_ADDR at 0x2004.0000

Hello.

Recently we made some changes to our program, at the same time as we changed our development/build computer.

Since then, every 25k cycles of our program (one cycle is around 13 ms), we get a FaultISR. When we look at the registers we see:

  • NVIC_FAULT_STAT as 0x0000.8200
  • NVIC_FAULT_ADDR as 0x2004.0000, which the memory map lists as: Bit-banded on-chip SRAM.
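
As a starting point, the two register values above can be decoded by hand. The sketch below is only an illustration: the bit positions follow the Cortex-M4 fault status layout, but the macro and function names are local to this example, and the SRAM size (256 KB at 0x2000.0000) is an assumption that should be checked against the data sheet for the exact part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus-fault bits of the fault status register (Cortex-M4 layout).
 * The macro names are local to this sketch. */
#define FS_IBUS     0x00000100u  /* instruction bus error */
#define FS_PRECISE  0x00000200u  /* precise data bus error */
#define FS_IMPRE    0x00000400u  /* imprecise data bus error */
#define FS_BUSTKE   0x00000800u  /* bus fault while unstacking */
#define FS_BSTKE    0x00001000u  /* bus fault while stacking */
#define FS_BFARV    0x00008000u  /* fault address register is valid */

/* ASSUMPTION: 256 KB of on-chip SRAM starting at 0x20000000. */
#define SRAM_BASE   0x20000000u
#define SRAM_SIZE   (256u * 1024u)

/* True when the status word reports a precise data bus error and the
 * fault address register holds the address that was accessed. */
static bool precise_fault_with_valid_addr(uint32_t fault_stat)
{
    const uint32_t want = FS_PRECISE | FS_BFARV;
    return (fault_stat & want) == want;
}

/* True when addr lies inside the assumed SRAM range. */
static bool addr_in_sram(uint32_t addr)
{
    return addr >= SRAM_BASE && (addr - SRAM_BASE) < SRAM_SIZE;
}
```

With the values from this post, 0x0000.8200 is FS_BFARV | FS_PRECISE, so the address in NVIC_FAULT_ADDR is meaningful. Under the 256 KB assumption, 0x2004.0000 is the first byte past the end of SRAM, which would be consistent with a pointer or array index running off the top of RAM.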

How can we debug what is causing this issue? Is there a recommended procedure?

EDITED: It seems to be related to memory and is more likely to appear when using the EMAC Ethernet and lwIP, even with no connections. If we disable them, the program runs without issues.

Thank you and Regards.

  • PAk said:
    the program runs without issues.

    Careful here - could not the same be said at the 24,999 cycle mark - when those assumed, "issue causers" are present?

    Our tech firm has encountered not necessarily Fault_ISR but other program "deviations" - some not showing up for 8 days! (And "dead repeatable" at that same 8 day mark.)

    The fact that you've, "not yet noted/detected" a fault does not provide, "proof positive" that, "all is well." As a relatively famed (and recent) example - a giant automaker swore in court that their vehicle was not, "Guilty as charged." And presented, "acres of test data" in support of their assertion. Yet - with the courtroom "packed" - an expert witness convincingly demonstrated (for all to see) that a, "rare" (yet quite possible) sequence of events would indeed cause such a fault - each & every time that sequence occurred!

    The judge, jury - even automaker's attorneys, "gasped" - so powerful was this courtroom demonstration!   (case was lost - right then - right there!)

    Jury then (quickly) found in favor of the plaintiff (of course) - defendant automaker paid a huge fine.  (even though they had "tested and tested!")

    Just because your test is (relatively) long (by your small group standards) - and some aberrant cause-sequence has not (yet) been generated - may not "prove" you (or a famed - yet poorer - automaker) are, "fault-free."

    As you/I "talk" you know that I'm, "on your side" - my goal here is to alert you to the supreme difficulty in (positively) assuring that software runs, "fault & error free!"    Usually - but not always - the SW test design does not exercise every possible combination of events - at the critical rates - and multiple "bugs" may then arrive - perhaps "to spoil" your day...

  • Ok, let's just say then that it happens at the 25K mark with Ethernet+lwIP enabled, and maybe it is something in our program.

    How could we debug this FaultISR?

    Why does the EMAC make it more likely to appear?

    Regards

  • PAk said:
    How could we debug this FaultISR?

    Both this vendor - and ARM - have provided fairly extensive, "Fault Debug Guides."     And - as always - vendor's Amit is, "off the chart" skilled, motivated, helpful!
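
One step such fault guides commonly describe is reading the register frame the core pushes on exception entry. The sketch below is a host-side illustration only: `fault_frame_t`, `read_fault_frame` and `frame_on_psp` are hypothetical names, the basic 8-word frame (no floating-point state) is assumed, and obtaining the stack pointer inside the real handler needs a few lines of target-specific assembly that are not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Registers stacked by a Cortex-M core on exception entry, in memory
 * order from the stack pointer upward (basic frame, no FP state). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* EXC_RETURN bit 2 tells which stack holds the frame:
 * 0 = main stack (MSP), 1 = process stack (PSP). */
static bool frame_on_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0u;
}

/* Copy the eight stacked words starting at sp into a named structure.
 * For a precise bus fault, frame.pc is the address of the instruction
 * whose access faulted; look it up in the linker map or disassembly. */
static fault_frame_t read_fault_frame(const uint32_t *sp)
{
    fault_frame_t f;
    f.r0   = sp[0];
    f.r1   = sp[1];
    f.r2   = sp[2];
    f.r3   = sp[3];
    f.r12  = sp[4];
    f.lr   = sp[5];
    f.pc   = sp[6];
    f.xpsr = sp[7];
    return f;
}
```

Combining the stacked `pc` with the fault address register usually narrows a precise fault down to one load or store in the source.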

    I note that our firm has - on multiple occasions - noted that the "bypass" of certain functions has (appeared) to resolve an issue.   Yet - when the fault was finally (and conclusively) identified - it was not always confined to the "bypassed" function!    Functions - their sequencing - and their "draw" upon MCU resources - may combine to produce "fault opportunities" - difficult to detect - and a "bear" to finally find & resolve.

    Proper - thorough - intensive/methodical, "Program Design" surely trumps the labeling of programs as, "working" - when in fact the fault(s) have (yet) to be noticed... 

    All the, "testing in the world" cannot, "qualify & pronounce" an inadequately designed & implemented program as, "working!"    There is no substitute for strict, skilled, "best practice" program design.   (implemented & maintained from project's beginning...)

  • Hello PAk,

    Which TM4C12x device is this?

    Regards
    Amit
  • cb1- said:
    Both this vendor - and ARM - have provided fairly extensive, "Fault Debug Guides." 

    That guide is extremely helpful (it is the first thing I check in a case like this), but it doesn't cover every situation like this one. A little more detail would make it easier for a newbie... the learning curve depends a lot on guides like that one.

    Finally, we found the issue: the index of the Ethernet packets to send reached (and went beyond) its limit. Interesting how the FaultISR was pointing to the SRAM in our TM4C129X.

    Thank you cb1, I really appreciate your attitude and help!!
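
The thread does not show the actual code, so the following is only a generic sketch of how an index into a fixed pool of transmit slots can be kept in range. `NUM_TX_SLOTS`, `next_tx_index` and `tx_index_valid` are hypothetical names, not part of the EMAC driver or lwIP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical number of transmit slots in the pool. */
#define NUM_TX_SLOTS 8u

/* Advance the index and wrap back to 0 at the end of the pool, so it
 * can never point past the last slot. */
static uint32_t next_tx_index(uint32_t idx)
{
    return (idx + 1u) % NUM_TX_SLOTS;
}

/* Guard to call before using an index that came from elsewhere
 * (for example one shared between an ISR and the main loop). */
static bool tx_index_valid(uint32_t idx)
{
    return idx < NUM_TX_SLOTS;
}
```

An index that is only ever incremented will eventually walk off the end of its array; if that array sits near the top of RAM, the first out-of-range access can land just past the end of SRAM, which would match a fault address like the one reported here.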

  • Thank you PAk for your, "caring enough to close this loop." Perhaps your "sharing" (if deemed proper/appropriate) of that "index limit" may assist others in avoiding such issues...

    Yet - dare I say - I (continue) to believe you should (really) involve Amit. Unless that "index limit" involves some "trivial roll-over" - I'm fearful that the "fix" may be illusory... My firm's sense: (always) better "really sure/safe" than (later) sorry!