DRV8889-Q1: Stall fault bit is not clearing immediately, is inconsistent

Part Number: DRV8889-Q1

We have custom hardware talking to the DRV8889 (latest silicon revision of the chip).

We have verified normal operation by stepping and stalling the motor. Communication packets look good and have been verified.

Open load detection is disabled in our case.

Observation:

  1. When a stall condition is detected, the DRV8889 pulls the fault line low, which triggers an ISR in our host software.
  2. The host software detects the fault line changing state (low or high) and sends a command to clear the latched faults.
  3. A readback of the fault status register 1 ms later shows the stall bit still set.
  4. However, a clear-fault command issued again ~250 ms later clears the fault.

Question:
Is there a minimum time for the DRV8889 to clear the stall? If yes, where is this time specified? If no, how do we calculate it?


Additional info:

The register values we use are below:
FAULT: 0xA5
DIAG1: 0x00
DIAG2: 0x00
CTRL1: 0x83
CTRL2: 0x0F
CTRL3: 0xA4
CTRL4: 0x31
CTRL5: 0x18
CTRL6: 0x0C
CTRL7: 0x00
CTRL8: 0x03
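
As a cross-check on the dump above, here is a minimal Python sketch that decodes the fields relevant to stall handling. The bit positions are assumptions based on one reading of the DRV8889-Q1 register map and should be verified against the datasheet; `decode_settings` is a hypothetical helper name, not part of any TI software.

```python
# Register values copied from the dump above.
REGS = {"CTRL3": 0xA4, "CTRL4": 0x31, "CTRL5": 0x18, "CTRL6": 0x0C}

# ASSUMED bit layout (verify against the DRV8889-Q1 datasheet register map):
#   CTRL3[3:0] = MICROSTEP_MODE
#   CTRL4[7]   = CLR_FLT, CTRL4[3] = EN_OL
#   CTRL5[4]   = EN_STL
#   CTRL6[7:0] = STALL_TH
def decode_settings(regs):
    """Return the stall-related settings encoded in a register dump."""
    return {
        "microstep_mode": regs["CTRL3"] & 0x0F,
        "clr_flt": bool(regs["CTRL4"] & 0x80),
        "open_load_enabled": bool(regs["CTRL4"] & 0x08),
        "stall_enabled": bool(regs["CTRL5"] & 0x10),
        "stall_threshold": regs["CTRL6"],
    }

settings = decode_settings(REGS)
for name, value in settings.items():
    print(f"{name}: {value}")
```

Under these assumptions the dump decodes to open-load detection disabled (consistent with the note above), stall detection enabled, and a stall threshold of 12.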

  • Hi Gerry,

    Thanks for your question.

    I am assigning this to my teammate who supports the device. He will review and get back to you.

    Thanks,

    Ibinu

  • Hi Gerry,

    Thank you for posting in this forum. 

    The provided registers look good. There is no minimum time needed to clear the fault.

    Can you please pulse nSLEEP low for longer than 18 µs and shorter than 35 µs to reset the fault, as elaborated below:

    "In addition to the CLR_FLT bit in the SPI register, a latched fault can be cleared through a quick nSLEEP pulse. This pulse width must be greater than 18 µs and shorter than 35 µs. If nSLEEP is low for longer than 35 µs but less than 75 µs, the faults are cleared and the device may or may not shutdown, as shown in the timing diagram."

    Can you please also check for any latency coming from the software?

    Best

    Mojtaba.
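
The nSLEEP pulse windows quoted in this reply can be sketched as a small check that host code could run on its intended pulse width. This is only an illustration in Python: `classify_nsleep_pulse` and `TARGET_PULSE_US` are hypothetical names, and widths outside the two quoted windows are deliberately left unclassified because the quoted text does not cover them.

```python
def classify_nsleep_pulse(width_us):
    """Classify an nSLEEP low-pulse width using only the quoted windows."""
    if 18 < width_us < 35:
        return "faults cleared"
    if 35 < width_us < 75:
        return "faults cleared, device may or may not shut down"
    # Anything else (including exactly 18, 35 or 75 us) is not covered by
    # the quoted text, so this sketch makes no claim about it.
    return "not covered by the quoted text"

# Aiming for the middle of the 18-35 us window leaves margin on both sides.
TARGET_PULSE_US = (18 + 35) / 2

print(TARGET_PULSE_US, classify_nsleep_pulse(TARGET_PULSE_US))
```

A host with coarse timer resolution may want to confirm that its worst-case pulse width still falls inside the first window before relying on this method.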

  • Mojtaba,

    What is the reason for this delay?
    We are also noticing a glitch on the fault line when a clear fault is sent out.
    It seems that, while in fault, the DRV lowers the line again after a clear stall command. Is this intended? It only seems to happen when a clear stall command is sent and the system is in stall. All other SPI commands show no glitches when the fault line is low.


    Here is an expanded version, showing the glitches when the clear stall is sent down. The time between the fault line going low, then high is ~12ms.

  • Hi Gerry, 

    Let me investigate this issue and will get back to you. 

    Best regards,

    Mojtaba.

  • Hello, could we get any updates on this?

  • Hi Gerry,

    Thank you for your patience. 

    I'm working on this issue and will get back to you tomorrow. 

    Best regards,

    Mojtaba.

  • Hi Gerry,

    To clear a latched stall fault, the torque count must be higher than the stall threshold. This means the mechanical stall condition must not be present while step pulses continue to be issued. What was the STEP input condition when CLR_FLT was written to 1?

    An nSLEEP reset pulse is not required if CLR_FLT is used to clear the latched faults, including the stall fault. This is an either/or option. Thank you.

    Regards, Murugavel  

  • Hi,

    No step pulses are issued when we try to clear the fault. In fact, the host software stops issuing steps the moment it detects a fault and then issues a clear fault.
    A rising edge on STEP causes a step. Are you saying that leaving STEP high could be an issue?

  • Here is a screenshot showing the STEP signal going low.
    The FLT line still goes high every time the clear fault SPI command is sent.

  • Hi Gerry,

    Are you saying that leaving STEP high could be an issue?

    While no STEP pulses are being input, the STEP input must be LOW. Thanks.

    Regards, Murugavel 

  • Hi Gerry,

    Here is a screenshot showing the STEP signal going low.
    The FLT line still goes high every time the clear fault SPI command is sent.

    What is the D2 FLT trace? I assume the A2 FLT is the nFAULT pin. For every CLR_FLT command, nFAULT is cleared (the high-going pulse), but a fault is immediately detected again. As soon as a stall is detected, can you set the stall threshold register to 0 before issuing a CLR_FLT and see what happens? Thanks.

    Regards, Murugavel 

  • D2 FLT is what the Saleae sees as the logic level of the fault line, whereas A2 is the analog version of the same signal. I turned A2 on because the chip was reporting tons of fault interrupts; after a few hours of debugging on the code side, I wanted to see the voltage level directly, and that's what caught the odd spikes on the fault line around clearing.

    Here is a capture where:

    - CTRL6 is written to 0x0 to set the stall threshold to 0x0 immediately before a stall clear (write to CTRL4 with high bit set) is issued.

    - Stall threshold is being restored before we begin stepping again.

    The markers in the image show (from left to right):

    2 - read of fault register 0x0

    3 - read of rev id (we do a lot of rev id reads to ensure we've got stable comms)

    0 - write of 0x0 to CTRL6 to set stall threshold to 0

    1 - write to CTRL4 with high bit masked into the present register value to clear the fault (do we need to read-modify-write this register? we didn't want to alter its previous value when clearing the fault value but it does result in an extra read)

    4 - write of 12 to CTRL6 to restore the stall threshold to 12 before we begin stepping again

    It appears that the fault line glitches are not present in this trace. The fault line also rises immediately after the clear is sent, without any fault clear retries being necessary.
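
The write sequence described in this post (zero the stall threshold, then set the high bit of CTRL4 with a read-modify-write, then restore the threshold before stepping) can be sketched as follows. This is a hedged Python illustration, not the poster's firmware: `FakeDrv` is a hypothetical in-memory stand-in for the SPI register interface, and registers are addressed by name to avoid assuming their SPI addresses.

```python
CLR_FLT = 0x80  # high bit of CTRL4, as described in the post above

class FakeDrv:
    """Hypothetical in-memory stand-in for the SPI register interface."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.log = []  # records every write so the order can be inspected

    def read(self, name):
        return self.regs[name]

    def write(self, name, value):
        self.log.append((name, value))
        self.regs[name] = value

def clear_stall_with_zero_threshold(dev):
    """Zero the stall threshold, then set CLR_FLT without disturbing CTRL4."""
    saved_threshold = dev.read("CTRL6")
    dev.write("CTRL6", 0x00)
    # Read-modify-write keeps the other CTRL4 bits at their current values.
    dev.write("CTRL4", dev.read("CTRL4") | CLR_FLT)
    return saved_threshold  # the caller restores this later

dev = FakeDrv({"CTRL4": 0x31, "CTRL6": 0x0C})
saved = clear_stall_with_zero_threshold(dev)
# Later, before stepping resumes (as in the capture described above):
dev.write("CTRL6", saved)
print(dev.log)
```

Returning the saved threshold rather than hard-coding 12 keeps the restore step correct if the threshold is tuned later.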

  • The second fault is unexpected and causing an issue with our system. We've changed direction and are trying to step to find the other limit of the system, we shouldn't get a fault for several seconds. The second fault appears to be occurring immediately upon restoring the threshold to 12. See how it aligns with marker 4 in the image here.

    At that point stepping has not started again.

  • Hi Chris, 

    Thank you for your update. 

    I will test it in the lab and get back to you. 

    Best regards,

    Mojtaba.

  • Hi Chris,

    Thanks for doing the tests and sharing the results. This is helpful to narrow down the issue.

    The second fault is unexpected and causing an issue with our system. We've changed direction and are trying to step to find the other limit of the system, we shouldn't get a fault for several seconds. The second fault appears to be occurring immediately upon restoring the threshold to 12. See how it aligns with marker 4 in the image here.

    At that point stepping has not started again.

    The torque count may be lower than the threshold at that point because stepping has not started yet. It takes two electrical cycles (one electrical cycle = one full waveform, a sine wave in the case of microstepping) for the torque count to be updated to the new value; once stable, the latest value is updated every electrical half-cycle.

    Setting the threshold after step pulses are input will help avoid this second fault. Thank you.

    Regards, Murugavel 
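
To put the "two electrical cycles" figure from this reply into STEP pulses and time, here is a minimal Python sketch. It assumes the usual relation for a two-phase bipolar stepper that one electrical cycle spans 4 full steps; the function names are hypothetical, and the 1/4-step mode and 300 pps used in the example are illustrative inputs.

```python
def pulses_for_electrical_cycles(microsteps_per_full_step, cycles=2):
    """STEP pulses needed to cover `cycles` electrical cycles.

    Assumes one electrical cycle of a two-phase bipolar stepper spans
    4 full steps, i.e. 4 * microsteps_per_full_step STEP pulses.
    """
    return 4 * microsteps_per_full_step * cycles

def settle_time_ms(microsteps_per_full_step, pulses_per_second, cycles=2):
    """Time in ms to issue that many pulses at a constant step rate."""
    pulses = pulses_for_electrical_cycles(microsteps_per_full_step, cycles)
    return 1000.0 * pulses / pulses_per_second

pulses = pulses_for_electrical_cycles(4)  # 1/4-step mode
time_ms = settle_time_ms(4, 300)          # illustrative 300 pps
print(pulses, round(time_ms, 1))
```

In 1/4-step mode this works out to 32 STEP pulses, or roughly 107 ms at a constant 300 pps, before the torque count would be expected to reflect the running motor.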

  • Hi Murugavel.

    Is the recommendation that after a stall we perform:

    - Set the threshold to zero

    - Clear fault

    - Step at least twice

    - Restore threshold

    - Continue stepping etc

    Is there a time based element, outside of stepping that is clearing the TRQ_CNT / TRQ_COUNT value? It appears this is the case as the stall does clear if we repeatedly clear faults and/or if we wait some 220ms. I don't see any mention of anything time based in the data sheet though.

    I'm asking because it is additional logic to perform the process above with steps and it may be simpler to use the time based approach.
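
The time-based approach mentioned above could look like the following retry loop. This is a sketch only: the stall bit position in the FAULT register is an assumption to check against the datasheet, `SimulatedDrv` is a hypothetical model that merely mimics the observation that a clear does not stick until some time after the fault, and nothing here reflects documented device timing.

```python
STL_BIT = 0x04  # ASSUMED position of the stall bit in the FAULT register
CLR_FLT = 0x80  # high bit of CTRL4, per the thread

def clear_stall_with_retry(read_reg, write_reg, now_ms, sleep_ms,
                           timeout_ms=500, retry_ms=10):
    """Retry CLR_FLT until the stall bit reads back clear or time runs out.

    Returns the elapsed time in ms on success, or None on timeout.
    """
    start = now_ms()
    while now_ms() - start <= timeout_ms:
        write_reg("CTRL4", read_reg("CTRL4") | CLR_FLT)
        sleep_ms(1)  # short gap before the readback
        if not read_reg("FAULT") & STL_BIT:
            return now_ms() - start
        sleep_ms(retry_ms)
    return None

class SimulatedDrv:
    """Hypothetical model: a clear only sticks `holdoff_ms` after the fault."""

    def __init__(self, holdoff_ms):
        self.t = 0  # simulated clock in ms; the fault occurs at t = 0
        self.holdoff_ms = holdoff_ms
        self.regs = {"FAULT": STL_BIT, "CTRL4": 0x31}

    def now_ms(self):
        return self.t

    def sleep_ms(self, ms):
        self.t += ms

    def read_reg(self, name):
        return self.regs[name]

    def write_reg(self, name, value):
        if name == "CTRL4" and value & CLR_FLT:
            if self.t >= self.holdoff_ms:
                self.regs["FAULT"] &= ~STL_BIT
            value &= ~CLR_FLT  # the model does not store the CLR_FLT bit
        self.regs[name] = value

sim = SimulatedDrv(holdoff_ms=250)
elapsed = clear_stall_with_retry(sim.read_reg, sim.write_reg,
                                 sim.now_ms, sim.sleep_ms)
print(elapsed)
```

Passing the clock and sleep functions in as parameters lets the same loop run against real hardware timers or, as here, a simulated clock; the timeout guards against a stall condition that never clears.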

  • Hi,

    Thank you for your question. Our expert Mojtaba is assigned to your question while Murugavel is OOO. Please wait for his feedback.

  • Hi Chris,

    Please let me set up the sequence on our bench and get back to you. Thank you.

    Regards, Murugavel 

  • Hi Chris,

    Sorry about the delay in getting back to you. 

    In our bench tests, a CLR_FLT clears the stall fault when the stepper is not running, even when the stall threshold is non-zero and set to the actual stall threshold. Every single time, I was able to clear the stall fault with no step pulses input and the stall threshold unchanged. This is how the device is used in several applications. Do you happen to have one of our EVMs to compare with?

    What is the maximum torque count in the CTRL7 register when the motor is running with target velocity and with no stall condition? Thank you.

    Regards, Murugavel 

  • Hi Murugavel.

    We use a torque threshold of 12.

    You've confirmed the clear occurs immediately? We see it clearing as well if we issue the clear command after some time has elapsed but not on the first try.

  • Hi Chris,

    You've confirmed the clear occurs immediately?

    Yes, correct, and with only one try. What is the time interval between nFAULT = 0 and issuing the first CLR_FLT? In the first post you mentioned, "However a clear fault issued again ~250ms later, clears the fault". Was a 250 ms delay specifically needed for the customer, or did they try much smaller delays as well?

    Regards, Murugavel 

  • How long was your delay before the fault clearing was attempted? It was a bit tricky to catch the fault glitches as they are very narrow.

    In our case, if the clear occurs less than ~250 ms after the fault, it doesn't stick.

    In the above traces we attempt to clear after some 200 µs or so, and we see glitches on the fault line after each of these clear attempts until the total elapsed time is ~250 ms from the fault occurring.

  • Hi Chris,

    In our case, if the clear occurs less than ~250 ms after the fault, it doesn't stick.

    In the above traces we attempt to clear after some 200 µs or so, and we see glitches on the fault line after each of these clear attempts until the total elapsed time is ~250 ms from the fault occurring.

    Thanks for this information. This does not look related to the CLR_FLT logic, which is designed and tested to clear all faults immediately when initiated. If the fault condition persists, the fault may remain set even though a CLR_FLT was executed.

    Regards, Murugavel 

  • Ok. I'm not sure what or when we'll get back to digging in deeper here. The motor is stationary when the fault is being cleared and we can reproduce pretty readily. Appreciate the replies and that it's not always possible to get to the bottom of an issue like this.

    If you do find something and you are reminded of this issue we'd be interested in how we can accommodate or resolve in terms of how we interact with the chip.

  • Hi Chris,

    We tried to reproduce the issue you described with a DRV8889-Q1EVM and a stepper motor driven with VM = 12 V, and have been unable to do so thus far. I understand your frustration. Sorry this has taken so long.

    Based on the register settings shared, you were using 1/4 step mode. What was the STEP pps? We'll try with the same frequency in our tests. I assume VM = 12 V matches your setup. We set IFS = 300 mA. I also assume you were able to reproduce this problem with similar behavior on multiple driver devices. 

    The EVM firmware runs with a scheduler and reads the nFAULT pin as a GPIO status. There is a latency of a few ms involved in this process. However, this latency is definitely not 250 ms; it is under 10 ms. We plan to modify the EVM firmware to minimize this latency and see if we can reproduce this issue. This may take a few days to implement. I'll keep you posted. Thank you.

    Regards, Murugavel 

  • Hi Murugavel.

    These things are complex, it was always a possibility that it wasn't some odd behavior in the chip, or that it only happens in particular cases that are difficult to reproduce.

    Our system is 24V, 50% current (750mA), with a PPS of 300 (although we vary this at times during run time).

    We are able to reproduce on multiple devices.

    There is a 4.7k pull-up to 3.3v on the DRV_FLT signal but otherwise no other electronics on that line. Microcontroller configured for input on that pin.

    Note that the earlier traces in this thread show the fault line glitches, so it does appear they are occurring.

  • Hi Chris,

    Thank you for this information. We'll get back to you on this as soon as we can.

    Regards, Murugavel 

  • Hi Chris,

    Thank you for your patience.

    I was able to speed up the EVM firmware scheduler and modify the code to perform a CLR_FLT as soon as the nFAULT pin was read LOW. I was still not able to use an interrupt, but could read the pin much faster than the regular EVM firmware does. As soon as the nFAULT pin was detected LOW, I stopped issuing STEP pulses and then did a CLR_FLT in the CTRL4 register. I could reproduce the issue that you observed. At this time I do not have an explanation for the behavior. We're looking at the digital logic design to understand it.

    Meanwhile, I tried a workaround which seems to address the issue. The sequence is as follows. After the nFAULT pin is detected LOW, stop issuing STEP pulses and hold the STEP input LOW. Then disable stall detection by writing EN_STL = 0 in the CTRL5 register. Do a CLR_FLT in CTRL4. Next, re-enable stall detection with EN_STL = 1. This enables stall detection for the next cycle. This seems to work consistently. See the captures below. The yellow trace is nFAULT, the green trace is the torque count as an analog voltage converted from the CTRL7 register reading by a DAC, and the blue trace is the coil B current. You can see the torque count reduce as the motor approaches the end point and eventually stalls. nFAULT goes LOW and is then cleared by the CLR_FLT. This took about 100 μs with the modified scheduler. Running multiple cycles yielded consistent results with the CLR_FLT.

    For context, see the capture below, which does not do EN_STL = 0 prior to doing a CLR_FLT. There are intermediate spikes as the loop makes a CLR_FLT attempt every scheduler cycle. To be specific, I was never able to reproduce 250 ms of delay; instead, the delay was consistently around 12 ms with my EVM HW.

    Let me know if you have any questions. I hope the workaround helps you move forward. I will follow up with you as soon as we find the reason for the unexpected behavior of CLR_FLT while EN_STL = 1. Thanks.

    Regards, Murugavel
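
The workaround sequence described in this reply can be sketched in host code as follows. This is a minimal Python illustration under stated assumptions: the EN_STL bit position in CTRL5 is an assumption to verify against the datasheet, and `read_reg`, `write_reg` and `stop_stepping` are hypothetical callbacks standing in for the host's SPI and STEP-pin code.

```python
CLR_FLT = 0x80  # high bit of CTRL4, per the thread
EN_STL = 0x10   # ASSUMED position of EN_STL in CTRL5; check the datasheet

def clear_stall_workaround(read_reg, write_reg, stop_stepping):
    """Stop stepping, disable stall detection, clear faults, re-enable."""
    stop_stepping()                      # no more STEP pulses, STEP held LOW
    ctrl5 = read_reg("CTRL5")
    write_reg("CTRL5", ctrl5 & ~EN_STL)  # EN_STL = 0
    write_reg("CTRL4", read_reg("CTRL4") | CLR_FLT)
    write_reg("CTRL5", ctrl5 | EN_STL)   # EN_STL = 1 for the next cycle

# Recording stand-ins so the order of operations can be inspected.
events = []
regs = {"CTRL4": 0x31, "CTRL5": 0x18}

def read_reg(name):
    return regs[name]

def write_reg(name, value):
    events.append((name, value))
    regs[name] = value

def stop_stepping():
    events.append(("STEP", "low"))

clear_stall_workaround(read_reg, write_reg, stop_stepping)
print(events)
```

The read-modify-write on CTRL5 and CTRL4 is a design choice to leave the other bits in those registers untouched; with the register values from the first post, the sketch issues STEP low, CTRL5 = 0x08, CTRL4 = 0xB1, then CTRL5 = 0x18.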