Tips for designing a robust computer vision system for self-driving cars

The robustness and reliability of a self-driving car’s computer vision system have received a lot of news coverage. As a vision software engineer at TI helping customers implement advanced driver assistance systems (ADAS) on our TDAx platform, I know how hard it is to design a robust vision system that performs well in any environmental condition.

When you think about it, engineers have long been trying to mimic the human visual system. As Leonardo da Vinci said, “Human subtlety will never devise an invention more beautiful, more simple or more direct than does nature, because in her inventions nothing is lacking and nothing is superfluous.”

Indeed, I was recently painfully reminded of this fact. On March 2, 2018, I caught an eye infection which was eventually diagnosed as a severe adenovirus. After a month, my vision is almost back to normal. Throughout my health ordeal, I learned a few things about the human visual system that are applicable to our modern-day challenge of making self-driving cars.

The importance of sensor redundancy

The virus affected my right eye first, and the vision in that eye became very blurry. When I had both eyes open, however, my vision was still relatively good, as though my brain was selecting the image produced only by my left eye but using the blurry image produced by my right eye to assess distance. From this, I can infer that stereo vision doesn’t need to operate on full resolution images, although of course that is optimal. A downsampled image and even a reduced frame rate should be OK.

When my right eye became so painful I could no longer open it, I had to rely on my left eye only. Although my overall vision was still OK, I had difficulty assessing the distance of objects. During my recovery, my right eye started healing first, and my brain did the same thing once more: it relied on the eye that was improving.

From these observations, I can draw a conclusion about autonomous driving: for each position around the car where vision is used for object detection, there should be multiple cameras (at least two) pointing along the same line of sight. This setup should be in place even when the vision algorithms only need monocular data.

Sensor multiplicity allows for failure detection of the primary camera by comparing images with auxiliary cameras. The primary camera feeds its data to the vision algorithm. If the system detects a failure of the primary camera, it should be able to reroute one of the auxiliary cameras’ data to the vision algorithm.
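
The failover idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a TI implementation: the frames are assumed to be already aligned to a common view and flattened to lists of grayscale values, and the function names, the camera names and the `max_diff` threshold are hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized grayscale
    frames given as flat sequences of pixel values (0-255)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_feed(frames, primary="primary", max_diff=25.0):
    """Return the name of the camera that should feed the vision algorithm.

    frames maps a camera name to its flat pixel list. Each pair of cameras
    that agrees (difference below max_diff, an assumed threshold) earns
    both cameras one vote. The primary camera keeps the feed unless some
    auxiliary camera collects strictly more votes.
    """
    names = list(frames)
    votes = {name: 0 for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if mean_abs_diff(frames[a], frames[b]) <= max_diff:
                votes[a] += 1
                votes[b] += 1
    best = max(votes.values())
    if votes[primary] == best:
        return primary
    # Reroute to the first auxiliary camera with the most agreement.
    return next(name for name in names if votes[name] == best)
```

Note that with only two cameras, a disagreement shows that one of them is failing but not which one, so this sketch keeps the primary feed in that case. Identifying the faulty camera then requires a third camera or a per-camera sanity check.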

With multiple cameras available, the vision algorithm should also take advantage of stereo vision. Collecting depth data at a lower resolution and lower frame rate will conserve processing power. Even when processing is monocular by nature, depth information can speed up object classification by reducing the number of scales that need processing based on the minimum and maximum distances of objects in the scene.
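
To make the scale-reduction argument concrete, here is a minimal sketch assuming a simple pinhole camera model, in which an object of height H at distance Z spans roughly f × H / Z pixels for a focal length f expressed in pixels. The function names, the detector window size and all numeric values are illustrative assumptions.

```python
def pixel_height(focal_px, object_height_m, distance_m):
    """Apparent height in pixels of an object under a pinhole camera model."""
    return focal_px * object_height_m / distance_m


def scales_to_process(focal_px, object_height_m, z_min, z_max,
                      window_px, pyramid_scales):
    """Keep only the pyramid scales at which an object located between
    z_min and z_max (in meters) could match the detector window.

    A scale s means the image is shrunk by a factor s, so an object that
    is h pixels tall in the original image appears h / s pixels tall.
    """
    tallest = pixel_height(focal_px, object_height_m, z_min)   # nearest object
    shortest = pixel_height(focal_px, object_height_m, z_max)  # farthest object
    s_low = shortest / window_px
    s_high = tallest / window_px
    return [s for s in pyramid_scales if s_low <= s <= s_high]


# Illustrative numbers: 1000-pixel focal length, 1.7 m pedestrian,
# 80-pixel detector window, depth map reporting objects from 8 m to 40 m.
kept = scales_to_process(1000.0, 1.7, 8.0, 40.0, 80.0,
                         [0.5, 1.0, 2.0, 4.0, 8.0])
print(kept)
```

In this example, only two of the five pyramid scales remain, so the classifier runs on less than half of the pyramid levels.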

TI has planned for such requirements by equipping its TDAx line of automotive processors with the necessary technology to handle at least eight camera inputs, and to perform state-of-the-art stereo-vision processing through a vision accelerator pack.

The importance of low-light vision, reliance on an offline map and sensor fusion

After the virus affected both of my eyes, I became so sensitive to light that I had to close all the window blinds in my home and live in nearly total darkness. I managed to move around despite little light and poor vision because I could distinguish object shapes and remember their location.

From this experience, I believe low-light vision processing requires a mode of processing different from the one used in daylight, as images captured in low-light conditions have a low signal-to-noise ratio and structured elements, such as edges, are buried beneath the noise. In low-light conditions, I think vision algorithms should rely more on blobs or shapes than on edges. Since histogram of oriented gradients (HOG)-based object classifications rely mostly on edges, I expect they would perform poorly in low-light conditions.

If the system detects low-light conditions, the vision algorithm should switch to a low-light mode. This mode could be implemented as a deep learning network that is trained using low-light images only. Low-light mode should also rely on data from an offline map or an offline world view. A low-light vision algorithm can provide cues in order to find the correct location on a map and reconstruct a scene from an offline world view, which should be enough for navigation in static environments. In dynamic environments, however, with moving or new objects that were not previously recorded, fusion with other sensors (LIDAR, radar, thermal cameras, etc.) will be necessary in order to ensure optimum performance.
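
The mode switch described above could look like the following sketch. It assumes that mean frame brightness is an adequate indicator of low-light conditions and adds hysteresis (two different thresholds) so that the system does not flip back and forth around a single value. Both are my assumptions for illustration, as are the class name and the threshold values.

```python
class ModeSelector:
    """Chooses between a daylight and a low-light processing mode."""

    def __init__(self, enter_low=40.0, exit_low=60.0):
        # Assumed thresholds on mean brightness (0-255 grayscale).
        self.enter_low = enter_low
        self.exit_low = exit_low
        self.mode = "daylight"

    def update(self, pixels):
        """Update the mode from one frame given as a flat pixel list."""
        brightness = sum(pixels) / len(pixels)
        if self.mode == "daylight" and brightness < self.enter_low:
            self.mode = "low_light"
        elif self.mode == "low_light" and brightness > self.exit_low:
            self.mode = "daylight"
        return self.mode
```

Each mode would then dispatch to its own network, for instance one trained on low-light images only, with the low-light path also querying the offline map.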

TI’s TDA2P and TDA3x processors have a hardware image signal processor supporting wide-dynamic-range sensors for low-light image processing. The TI Deep Learning (TIDL) library implemented using the vision accelerator pack can take complex deep learning networks designed with the Caffe or TensorFlow frameworks and execute them in real time within a 2.5W power envelope. Semantic segmentation and single-shot detector are among the networks successfully demonstrated on TDA2x processors.

To complement our vision technology, TI has been ramping up efforts to develop radar technology tailored for ADAS and the autonomous driving market. The results include:

  • Automotive radar millimeter-wave (mmWave) sensors such as the AWR1xx for mid- and long-range radar.
  • A software development kit running on TDAx processors that implements the radar signal processing chain, enabling the processing of as many as four radar signals.

The importance of faulty sensor detection and a fail-safe mechanism

When my symptoms were at their worst, even closing my eyes wouldn’t relieve the pain. I was also seeing strobes of light and colored patterns that kept me from sleeping. My brain was acting as if my eyes were open, and was constantly trying to process the incoming noisy images. I wished it could detect that my eyes were not functioning properly and cut the signal! I guess I found a flaw in nature’s design.

In the world of autonomous driving, a faulty sensor or even dirt can have life-threatening consequences, since a noisy image can fool the vision algorithm and lead to incorrect classifications. I think there will be a greater focus on developing algorithms that can detect invalid scenes produced by a faulty sensor. When such a scene is detected, the system can trigger fail-safe mechanisms, such as activating the emergency lights or gradually bringing the car to a halt.
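
As a rough sketch of how invalid-scene detection could drive a fail-safe response, the code below flags frames that are frozen or nearly uniform (as a blocked or dead sensor might produce) and escalates after several consecutive bad frames. The validity checks, the names and the thresholds are illustrative assumptions; a production system would need far more thorough checks.

```python
from statistics import pvariance


def frame_is_valid(pixels, previous=None, min_variance=25.0):
    """Basic sanity checks on a flat list of grayscale pixel values."""
    if previous is not None and pixels == previous:
        return False  # frozen frame: identical to the previous one
    # A nearly uniform frame suggests a blocked lens or a dead sensor.
    return pvariance(pixels) >= min_variance


class FailSafe:
    """Escalates the response as consecutive invalid frames accumulate."""

    def __init__(self, warn_after=3, stop_after=10):
        self.warn_after = warn_after
        self.stop_after = stop_after
        self.bad_frames = 0

    def update(self, valid):
        """Record one frame's validity and return the action to take."""
        self.bad_frames = 0 if valid else self.bad_frames + 1
        if self.bad_frames >= self.stop_after:
            return "controlled_stop"  # gradually bring the car to a halt
        if self.bad_frames >= self.warn_after:
            return "hazard_lights"    # activate the emergency lights
        return "normal"
```

Counting consecutive invalid frames, rather than reacting to a single one, keeps a momentary glitch from triggering an emergency stop.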

If you are involved in developing self-driving technology, I hope my experience will inspire you to make your computer-vision systems more robust.
