RM48L952: RM48L952 HET Timer pin registers are getting corrupted

Part Number: RM48L952
Other Parts Discussed in Thread: HALCOGEN

We are using the RM48L952DPGET in our product. We have incremental encoders (two channels each from the left and right wheels) connected to the HET pins of the micro. The raw differential inputs go to a MAX33076, which converts them to quadrature signals that are fed directly to the HET pins of the RM48L952. Specifically, the left encoder inputs are connected to Pin 38 (NHET1[06]) and Pin 40 (NHET1[19]), and the right encoder inputs are connected to Pin 41 (NHET1[15]) and Pin 54 (NHET1[11]).

The HET pins count the edges of the encoder signals, and we read the encoder ticks from the register where the HET stores the count.

The problem: We recently received a new batch of production boards. After running the robot for about 10 min, the register that stores the encoder ticks from the right encoder latches to a random value or to the maximum 25-bit value. When we turn the wheel forward, the count goes from the maximum value to 0, then to 1, and then does not change. When we spin the right wheel in the reverse direction, the right encoder ticks do decrease, but as soon as the wheel stops spinning, the count again latches to a random value. In most cases the left wheel encoder ticks are recorded correctly.

When we cycle the power (restart the robot), it again runs fine for 10 min, and after that the same behavior is observed again. In those first 10 min, both left and right encoder ticks are recorded correctly.

We have used the same board revision and the same firmware in the past without any issues, so this issue seems to be related to the new batch of boards. On a faulty board, I replaced the microcontroller; the issue was not seen again and we were able to run the robot normally. We have verified that the encoder signals are correct all the way from the encoder to the micro's input pins. The fact that replacing the micro, or in some cases swapping in a board from a previous batch, resolves the issue points directly at the micro.

We are using default HET code to initialize the HET module and do the pulse counting. We only read the data that is stored in the registers mentioned above. The outputs from the MAX33076 are fed directly to the micro pins listed above, and in our default setup the HET pins are pulled neither low nor high. We also tried enabling the internal pull-downs and pull-ups on the HET pins, and the results were the same: the failure occurs within 10 min.

What could be wrong with the micro here? Is this a hardware failure internal to the micro, or can we make a HET pin configuration change to resolve it? As I mentioned earlier, the same hardware and firmware have been working in the field for the last year without any issues, and we are only seeing this on the new batch of production boards. Looking forward to some help and insights here.

  • Hi Udit Raval,

    There have been no recent silicon releases for this device, so there should not be any changes in the controller hardware. However, this is strange behavior from the controller's perspective.

    Is it possible to set up a live demo of this issue?

    I work in IST (Indian Standard Time) between 10 AM and 8 PM. Can you please set up a meeting based on your availability?

    --
    Thanks & regards,
    Jagadish.

  • Hi Jagadish, 

    Can you please share your email address? I am not sure what the best way to set up a meeting here is.

    Regards,
    Udit Raval

  • Please accept my friendship request so that we can talk in a private chat window.

  • Hi Udit,

    Apologies for the delay. We are discussing this internally and will provide you an update soon.

    --
    Thanks & regards,
    Jagadish.

  • Hi Jagadish, thank you. Please let me know what you find as soon as possible.

    Regards,
    Udit Raval

  • Hi Udit,

    I checked with our internal team on this issue.

    They confirmed that we do not have any known production issues with the RM48L952 devices or the HET module.

    For fault isolation, we would recommend the following:

    • Flash investigation
      • Dump the flash of the failing units and confirm that it matches the expected flash image. We have seen issues where a customer's production flash programmer had a problem and a full batch got an incorrect image.
      • As a related debug step that needs no rework, you can attempt to re-flash a suspect RM48L952 to see if that resolves the error condition.
    • A-B-A swap
      • Move a suspect RM48L952 onto a known working system. If that system then begins to fail, the fault can be isolated to the RM48L952. If the error does not recur, it is possible the fault was in the RM48L952 solder assembly.
    • Systematic inquiry
      • Is the failure onset at exactly the same time for every failing unit? 
      • Is the failure mode identical for all units?

    If the fault follows the device in the A-B-A swap and flash is eliminated as a root cause, you can contact the TI Customer Support Center to initiate a return for analysis.
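    As an illustration of the flash comparison step, here is a minimal sketch in C. It assumes the flash dump from the failing unit and the expected image have already been loaded into two byte buffers of equal length; how they are read out (debugger, programmer, bootloader) is outside the sketch. The name `flash_first_mismatch` is hypothetical and is not part of any TI tool or HALCoGen API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a TI or HALCoGen API.
 * Returns the offset of the first byte where the dumped flash differs
 * from the reference image, or len if the two buffers are identical. */
static size_t flash_first_mismatch(const uint8_t *dump,
                                   const uint8_t *ref,
                                   size_t len)
{
    size_t i;

    assert(dump != NULL && ref != NULL);

    for (i = 0; i < len; i++) {
        if (dump[i] != ref[i]) {
            return i;   /* first differing offset */
        }
    }
    return len;         /* no difference found */
}
```

    Reporting the first differing offset, instead of only pass/fail, helps tell a systematic programmer fault (same offset on every unit) from random corruption.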

    --
    Thanks & regards,
    Jagadish.

  • Hi Jagadish,

    We have verified that when I placed a bad micro on a good working board, the fault followed the bad micro and the same fault was observed. We have also verified that there are no issues with the flash on our end.

    Can you please let me know the next step, since we have determined that the issue is with the micro?

    Regards,
    Udit Raval

  • Hi Udit,

    I want to suggest a few other things that could also cause this issue.

    When the pulse count value is "abruptly changing," it suggests a fundamental issue with either the input signal, the N2HET's operation, or how the count is being read by the CPU.

    1. External Signal Integrity Issues (Noise, Glitches):

      • Cause: The N2HET is highly sensitive to input signal quality. Noise, ringing, or glitches on the input pin can be interpreted as additional pulses, leading to inflated or erratic counts. If the signal momentarily drops out, it could cause a lower count or a reset if the N2HET program is designed to react to signal loss.
      • Documentation Relevance: While not explicitly in the provided snippets, this is a common issue for any edge-counting application. The "High Resolution Clock" and "Loop Resolution Clock" (Table 20-5) indicate the N2HET's sensitivity to timing, meaning even small glitches can be detected.
      • Troubleshooting:
        • Use an oscilloscope to examine the input signal directly at the microcontroller pin. Look for noise, bounces, or unexpected transitions.
        • Implement hardware filtering (RC filter) or use a Schmitt trigger input if the signal source is noisy.
        • Ensure proper grounding and shielding of the signal wire.
    2. N2HET Program Logic Errors (Counter Overflow/Reset):

      • Cause:
        • Counter Overflow: If the N2HET instruction used for counting (e.g., a CNT instruction or a custom counting logic) reaches its maximum value and wraps around, it could appear as an "abrupt change" (e.g., from max value to 0). If the CPU is reading a smaller portion of the counter than the N2HET is using, or if the CPU's variable type is too small, overflow can also occur on the CPU side.
        • Conditional Reset: The N2HET program might have a condition that inadvertently resets the counter. This could be due to an unexpected input, a timing issue, or a bug in the N2HET assembly code.
        • Race Conditions within N2HET: While less common, complex N2HET programs could have internal race conditions if not carefully designed, leading to incorrect updates.
      • Documentation Relevance: The N2HET is programmable (White Box configuration mentioned in Section 2), meaning custom logic can introduce bugs.
      • Troubleshooting:
        • Review the N2HET assembly code (if custom) or the HALCoGen configuration (if Black Box) for the edge counting logic.
        • Verify the size of the counter variable in the N2HET program and the CPU application. Ensure it can hold the expected maximum count.
        • Simulate the N2HET program or use a debugger to step through its execution (if possible) to observe the counter behavior.
    3. Clock Instability or Misconfiguration:

      • Cause: If VCLK2 or the N2HET's internal prescalers (HR clock, Loop Resolution Clock, as per Section 20.2.3) are unstable or misconfigured, the N2HET's timing could be erratic, leading to missed edges or incorrect counting intervals.
      • Troubleshooting:
        • Verify the system clock (VCLK2) stability and frequency.
        • Double-check the N2HET clock prescaler configurations in HALCoGen or your initialization code.
    4. CPU Software Bugs:

      • Cause:
        • Incorrect Variable Type: The variable used to store the count on the CPU side might be too small, leading to overflow when the N2HET count exceeds its capacity.
        • Uninitialized Variable: The variable might not be properly initialized, leading to random initial values.
        • Memory Corruption: Other parts of the CPU application might be inadvertently writing to the memory location where the N2HET count is stored.
        • Interrupt Handling: If the count is read within an Interrupt Service Routine (ISR), ensure the ISR is efficient and doesn't introduce delays or re-entrancy issues that could affect the read.
      • Troubleshooting:
        • Perform a thorough code review of the CPU application, especially the parts interacting with the N2HET.
        • Use a debugger to inspect the count variable's value in real-time. Set watchpoints to see if anything else is writing to its memory location.
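    As an illustration of the counter-width points in items 2 and 4, here is a minimal sketch in C of a wrap-safe tick delta. It assumes the count read back from the N2HET is a 25-bit value (consistent with the "max 25-bit value" in the original report) that has already been extracted from the raw register. The names `HET_COUNT_BITS` and `het_tick_delta` are hypothetical, not HALCoGen identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define HET_COUNT_BITS 25u                              /* assumed counter width */
#define HET_COUNT_MASK ((1u << HET_COUNT_BITS) - 1u)    /* 0x1FFFFFF */

/* Hypothetical helper, not a HALCoGen API.
 * Signed tick delta between two 25-bit counter samples. The result is
 * correct across wraparound in either direction as long as the wheel
 * moves fewer than 2^24 ticks between the two reads. */
static int32_t het_tick_delta(uint32_t prev, uint32_t now)
{
    uint32_t diff;

    assert(prev <= HET_COUNT_MASK && now <= HET_COUNT_MASK);

    /* Modulo-2^25 difference, then sign-extend from bit 24. */
    diff = (now - prev) & HET_COUNT_MASK;
    if (diff & (1u << (HET_COUNT_BITS - 1u))) {
        return (int32_t)diff - (int32_t)(1u << HET_COUNT_BITS);
    }
    return (int32_t)diff;
}
```

    Accumulating these deltas into a wide signed variable (for example `int64_t`) on the CPU side means a legitimate wrap from the maximum value to 0 shows up as a delta of +1 and not as a large jump. A count that really does latch to a fixed value would then show up as a run of zero deltas while the wheel is known to be moving.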

    Please make sure that none of the errors mentioned above apply to your setup.

    If these checks do not help either, you can initiate a return analysis of a unit through the sales channel where you purchased it (see "Customer returns | Additional information" on TI.com).

    --
    Thanks & regards,
    Jagadish.