Safety is a Herculean Task!
C.P. Ravikumar, Texas Instruments
“Better safe than sorry,” goes the old saying. When Karl Greb, a veteran of functional safety at Texas Instruments, visited Bangalore, I had a chance to listen to him speak on this specialized topic.
You may have read in the newspapers about incidents where lives are lost, people are injured, or property is damaged due to the malfunctioning of a system. A lithium-ion battery catches fire due to overheating during a recharge; remember that lithium-ion batteries are used in many electronic gadgets, including mobile phones and laptops. A car accelerates or decelerates by itself, without the driver intending it to. An elevator crashes. An automatic door closes on a hand it failed to sense, crushing it in the process. A joy ride in a park turns into a nightmare. A user of electrical equipment receives a shock.
In each of these examples, considerable analysis is required to pinpoint the precise reason for the malfunctioning of the complete system. A system comprises mechanical, electrical, and electronic components. Apart from manufacturing defects, components also fail due to mechanical stress, thermal stress, electrical stress, aging, and environmental influences. As more functionality is integrated into the same chip, one can expect a higher failure rate if no changes are made to the design.
Tolerating Faults
One of my teachers had a permanent disability in his right hand due to an accident; he began to use his left hand to write on the blackboard. Fault tolerance is a design principle that makes a device more robust, so that it continues to work in the presence of faults. Fault-tolerant design is important in safety-critical systems. Given that more and more electronic content is entering safety-critical systems such as automobiles, medical systems, and industrial automation, system designers seek building blocks that are fault-tolerant. In a technique called spatial redundancy, designers duplicate hardware blocks. In a technique called code redundancy, designers use error correction codes (ECC) to detect and correct errors in the storage or communication of data.
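To make the idea of code redundancy concrete, here is a minimal Python sketch of the textbook Hamming(7,4) code, which adds three parity bits to four data bits so that any single flipped bit can be located and corrected. It is only an illustration of the principle; the function names are my own, and I am not suggesting that this is the code used in any particular memory or MCU.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit.

    Returns (data_bits, error_position); error_position is 0 when no
    error was detected, otherwise the 1-based position that was fixed.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # the syndrome points at the bad bit
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    stored = hamming74_encode([1, 0, 1, 1])
    stored[4] ^= 1  # simulate a single-bit fault at position 5
    data, position = hamming74_decode(stored)
    print(data, position)
```

A single-bit fault in any of the seven positions is repaired this way. Detecting (but not correcting) a second simultaneous error requires one more overall parity bit, which this sketch leaves out.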
Fault Tolerance!
Art by Ananya Ravikumar
The Hercules processor from Texas Instruments is intended for safety-critical applications and has been designed with safety features built in. The use of two CPUs in this MCU is an example of spatial redundancy. The use of ECC memory is an example of code redundancy. Many other design features are included in these MCUs to enable customers to implement safety standards for the end equipment. Finally, the software developer can also make use of temporal redundancy, where a computation is repeated to verify the result; remember how you verify your answers in an examination by repeating your calculation! Go through the online documentation on Hercules to learn more about the other safety features it includes; the quiz below may help you in the process.
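Temporal redundancy is easy to try out in software. The sketch below, in Python, runs the same computation more than once and accepts the result only if all runs agree. The names `run_redundantly`, `RedundancyMismatch`, and `make_glitchy_checksum` are hypothetical helpers of my own, and the "glitchy" function merely imitates a transient fault for demonstration.

```python
class RedundancyMismatch(Exception):
    """Raised when repeated runs of the same computation disagree."""


def run_redundantly(compute, *args, repeats=2):
    """Run compute(*args) `repeats` times; return the result if all runs agree."""
    if repeats < 2:
        raise ValueError("temporal redundancy needs at least two runs")
    results = [compute(*args) for _ in range(repeats)]
    if any(result != results[0] for result in results[1:]):
        raise RedundancyMismatch(f"runs disagree: {results}")
    return results[0]


def checksum(values):
    """Example computation: a simple 8-bit additive checksum."""
    return sum(values) % 256


def make_glitchy_checksum():
    """Build a checksum that is corrupted on its second call only,
    imitating a transient fault (purely for demonstration)."""
    calls = {"count": 0}

    def glitchy(values):
        calls["count"] += 1
        result = checksum(values)
        return result ^ 0x01 if calls["count"] == 2 else result

    return glitchy


if __name__ == "__main__":
    data = [10, 20, 30]
    print(run_redundantly(checksum, data))  # both runs agree
    try:
        run_redundantly(make_glitchy_checksum(), data)
    except RedundancyMismatch as err:
        print("fault detected:", err)
```

Repeating a computation catches transient faults, at the cost of extra execution time; a permanent fault that corrupts every run in the same way would go unnoticed, which is one reason it is combined with the other forms of redundancy.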
Testing and Quality Assurance
Testing ensures that chips with manufacturing defects are not sold, so that the vendor can assure quality to the customer. The number of parts found to be defective at the customer’s site for every million chips purchased from the vendor is a measure of quality known as DPPM (Defective Parts Per Million). Good manufacturing and testing will lower this figure. To enable testing of the device for the purposes of maintenance, designers implement a “self-testing” feature in both logic and memory blocks.
Semiconductor vendors specify the expected lifetime of their devices, since a chip may fail due to electrical and thermal stresses as well as environmental influences such as humidity. Sometimes, faults are caused by electrostatic discharge, electromagnetic interference, or radiation. Semiconductor manufacturers perform stress testing to reduce the probability of a chip failing before the end of its specified lifetime. It is expected that the lifetime of the system will be extended by regular maintenance and replacement of old parts.
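The DPPM arithmetic is simple enough to write down. Here is a small Python sketch; the function name `dppm` and the shipment figures are made up for illustration and do not describe any real product.

```python
def dppm(defective_parts, shipped_parts):
    """Defective Parts Per Million: defects scaled to one million shipped parts."""
    if shipped_parts <= 0:
        raise ValueError("shipped_parts must be positive")
    if not 0 <= defective_parts <= shipped_parts:
        raise ValueError("defective_parts must be between 0 and shipped_parts")
    return defective_parts * 1_000_000 / shipped_parts


if __name__ == "__main__":
    # Hypothetical example: 3 field returns out of 200,000 chips shipped.
    print(dppm(3, 200_000))
```

With these made-up numbers, 3 defective parts in 200,000 shipped works out to 15 DPPM.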
There is still a chance that a device may fail within its lifetime and result in malfunctioning of the overall system. This is where fault-tolerance becomes important.
Teaching Safety Concepts
While listening to Karl Greb's presentation, the thought going through my mind was: how can these important concepts be taught to engineering students? That was my motivation for writing this blog entry. I am not sure whether concepts such as testing, quality, safety, and fault tolerance are emphasized in the coursework. Perhaps it would be good to include at least one lecture on this topic in a course on microcontrollers or embedded system design. I would think that a full course on the topic is needed in a postgraduate curriculum specializing in electronics or allied areas. I will be glad to hear your opinions on the subject! Before I sign off, here is an invitation to take part in an adventure! Don't worry, it is perfectly safe!
Quiz - The 12 Labors of Hercules!
Hercules is the Roman name for the divine hero Heracles from Greek mythology. He was the son of Zeus and is known for his physical strength, with which he performed twelve great feats, also known as the “Twelve Labors of Hercules.” Here is your chance to perform a Herculean task. Fortunately, you will have the power of search engines to locate answers to these quiz questions on the Internet!
1. Many processors are used inside an automobile. These processors use a network communication protocol to exchange information. Which of these network communication protocols, supported by the TMS570 Hercules processor, was designed for fault-tolerant operation?
- CAN
- LIN
- FlexRay
- SafeTI
2. Match the following!
End Equipment | Safety Standard
Car | IEC 61508
Washing Machine | IEC 60730
Ventilator used in an ICU | ISO 26262
3. Two processor cores are used in a Hercules MCU. Which ones?
- Two MSP430 processors
- One ARM Cortex-M4 and an MSP430
- Two ARM Cortex-R4 cores
- One ARM9 core and one ARM Cortex-R4 core
4. Which of the following design precautions reduce common-mode failures in Hercules?
- Two separate clock trees are used to provide clock signals to the processors
- Use of ECC Flash Memory and RAM
- Use of Built-in Self-Test for memory and logic
- All of the above
5. TI provides SafeTI design packages to help customers achieve safety certification for the end products that are used in safety-critical applications. SafeTI design packages are available for which safety standards?
- IEC 61508
- ISO 26262
- IEC 60730
- All of the above
6. In an MCU, a watchdog timer is
- A timer intended for computing elapsed time between two events
- A safety device to prevent theft of CPU cycles by a virus
- A safety device to prevent system lockup
- A timer intended to turn on the burglar alarm
7. In a Hercules MCU, the watchdog timer is made more robust by
- Doubling the number of bits in the timer
- Doubling the clock speed of the timer
- Flagging a fault if the watchdog timer is reset outside a time window
- Ensuring that the watchdog timer cannot be reset
8. Hercules uses two CPU cores – let us call them A and B. Which of these statements is correct?
- A and B use different instruction sets to compute the same function, thereby catching an error if the outputs do not match
- B executes the same instruction as A and checks its output matches that of A
- Both A and B execute the same instruction at the same time and a checker is used to compare the results from A and B
- B executes the same instruction as A, but with a small delay, and a checker is used to compare the results from A and B
9. The ECC memory in Hercules is capable of
- Single Error Detection and Correction
- Double Error Detection and Single Error Correction
- Double Error Correction and Single Error Detection
- Double Error Detection and Double Error Correction
10. A Hercules MCU is used in a motor control application. The feedback signal from the motor is a critical signal and must be monitored in a fail-safe way. How does Hercules support this?
- By providing a special hardware accelerator for monitoring critical signals
- By allowing more than one on-chip ADC to receive the same signal for monitoring
- By providing parity check on critical signals
- By providing a special instruction in the CPU for monitoring critical signals
11. Use of a safety MCU from the Hercules family will
- Help improve the MTBF metric
- Help improve the fault coverage metric
- Help improve the safety factor of the mechanical load connected to motors
- All of the above
12. One FIT is defined as 1 failure in 1,000,000,000 hours. If a piece of equipment has a rating of 50 FIT, then
- It may fail a maximum of 50 times in a year
- It may fail a maximum of 0.0000000005 times in a year
- It may fail a maximum of 0.0000000005 times in an hour
- None of the above