Three elements for efficient processing in medical imaging: Part One

While Texas is still hot, we do have ways to keep medical imaging cool. Last month my colleague Mark Nadeski wrote a blog post called “Medical imaging is heating up,” in which he discussed some of the hot trends in medical imaging.

Low-power systems are key for portable imaging devices and help bring them closer to patients. They allow these devices to be available in emergency vehicles, field hospitals and remote health care centers.

To keep power low, we must implement the data processing efficiently so that we take advantage of the high compute density per unit of power offered by the embedded processors widely used in these systems. Processing the data from capture to display is often complicated and requires heavy computation. There are three major elements to pay attention to when designing with these processors:

  • Input/output or I/O bandwidth
  • Memory bandwidth
  • Compute need

I will divide these three elements into two blog posts. In this first post, I will focus on the I/O and memory bandwidth.

Let’s start with I/O bandwidth. TI’s multicore processors come with high-speed I/O interfaces such as Gigabit Ethernet, PCI Express (PCIe) and Serial RapidIO (SRIO), as well as proprietary interfaces such as HyperLink. Even then, it may be necessary to do some pre-processing on the data before it reaches the processor. For example, in medical ultrasound imaging, the conventional pre-processing step is beamforming, which combines the data from all the elements in the transducer into one set. An alternative partitioning of the system has also been demonstrated, in which demodulation is done in the analog front end (AFE) to reduce the I/O bandwidth between the AFE and the processing unit. The AFE5809 from TI provides this capability. In this case, as discussed in Mark’s blog, the beamforming takes place inside the processor.

Conventional beamforming reduces the I/O throughput by combining the output of all the channels into one
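
To make the data reduction concrete, here is a minimal sketch of delay-and-sum beamforming in Python. It is an illustration only, not TI's implementation: the function names, the integer sample delays and the toy impulse data are all assumptions, and a real beamformer would also apply apodization weights and fractional delays.

```python
def delay_and_sum(channels, delays):
    """Combine per-element sample streams into one beamformed line.

    channels: list of equal-length sample lists, one per transducer element.
    delays:   non-negative integer delay, in samples, applied to each channel
              (hypothetical values; real delays come from the probe geometry).
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    n = len(channels[0])
    line = [0.0] * n
    for samples, d in zip(channels, delays):
        # Shift this channel by d samples and accumulate it into the output.
        for i in range(d, n):
            line[i] += samples[i - d]
    return line


def io_reduction_factor(channels, line):
    """Ratio of samples entering the beamformer to samples leaving it."""
    samples_in = sum(len(c) for c in channels)
    return samples_in / len(line)


if __name__ == "__main__":
    # Toy data: the same echo reaches three elements at different times.
    channels = [
        [0, 0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
    ]
    line = delay_and_sum(channels, delays=[0, 1, 2])
    print(line)
    print(io_reduction_factor(channels, line))
```

With N channels going in and one line coming out, the downstream interface carries roughly 1/N of the sample count, which is the reduction the figure describes.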

The second element we will discuss is memory bandwidth. Moving data efficiently between on-chip and external memory, so as not to overwhelm the memory bandwidth, is a key aspect of embedded system implementation. The idea is simple: do as much processing as you can while the data stays in on-chip memory. This often requires repartitioning the processing tasks, the data, or both.

Let’s take a typical example: the processing tasks carried out on medical images before they are presented for display. The image first goes through some noise reduction technique, usually a data-dependent spatial filter; then the edges are enhanced; and finally the contrast is adjusted. One could perform each of these tasks on the whole image before moving on to the next task. However, a better way is to perform all of these tasks on a subset of the image that is kept in on-chip memory. This significantly reduces the number of times data is transferred between levels of the memory hierarchy.
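
The difference between the two orderings can be sketched with a toy model that counts how many pixels cross the boundary between external and on-chip memory. Everything here is an assumption made for illustration: the `ExternalMemory` class, the tile size, and the three pointwise stand-in stages. Real noise reduction and edge enhancement are spatial filters, so real tiles would also need overlapping borders, which this sketch ignores.

```python
class ExternalMemory:
    """Toy external memory that counts pixels moved to or from on-chip memory."""

    def __init__(self, pixels):
        self.pixels = list(pixels)
        self.transferred = 0

    def load(self, start, stop):
        self.transferred += stop - start
        return self.pixels[start:stop]

    def store(self, start, block):
        self.transferred += len(block)
        self.pixels[start:start + len(block)] = block


# Hypothetical pointwise stand-ins for the three display-processing tasks.
def denoise_stub(p):
    return p - p % 2

def edge_stub(p):
    return p * 2

def contrast_stub(p):
    return min(p, 255)

STAGES = [denoise_stub, edge_stub, contrast_stub]


def run_stage_by_stage(mem, stages, tile):
    """Finish each task on the whole image before starting the next one."""
    n = len(mem.pixels)
    for stage in stages:
        for start in range(0, n, tile):
            block = mem.load(start, min(start + tile, n))
            mem.store(start, [stage(p) for p in block])


def run_fused(mem, stages, tile):
    """Run every task on a tile while it stays on-chip, then write it back once."""
    n = len(mem.pixels)
    for start in range(0, n, tile):
        block = mem.load(start, min(start + tile, n))
        for stage in stages:
            block = [stage(p) for p in block]
        mem.store(start, block)


if __name__ == "__main__":
    image = list(range(1000))
    a, b = ExternalMemory(image), ExternalMemory(image)
    run_stage_by_stage(a, STAGES, tile=128)
    run_fused(b, STAGES, tile=128)
    print(a.pixels == b.pixels, a.transferred, b.transferred)
```

In this model the stage-by-stage version moves each pixel in and out once per task, while the fused version moves it in and out once in total, so external traffic drops by a factor equal to the number of tasks.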

The Direct Memory Access (DMA) capability of these processors allows data to move across the memory hierarchy and across I/O peripherals while the cores continue to perform computations. In the ideal case, there is no overhead associated with data movement and the cores can spend all their time processing. To get close to this, the I/O and memory bandwidth utilization can be designed so that the computation time for each piece of data is larger than the time required for the associated data movements, which lets the transfers hide behind the computation.

I’m interested in hearing from you. Did you face bandwidth problems in your system? What techniques did you use to solve your issue?
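
Returning to the DMA point above, the benefit of overlapping transfers with computation can be estimated with a rough timing model. This is a simplified sketch under stated assumptions: data is processed in equal blocks, two on-chip buffers are used in ping-pong fashion, a single DMA channel handles input and output one after the other, and the function names and time units are made up for illustration.

```python
def total_time_serial(num_blocks, t_in, t_compute, t_out):
    """Cores wait for every transfer: load, compute, store, one after another."""
    return num_blocks * (t_in + t_compute + t_out)


def total_time_overlapped(num_blocks, t_in, t_compute, t_out):
    """DMA moves the neighboring blocks while the cores compute on the current one.

    Only the first load and the last store are exposed; in steady state each
    step takes whichever is longer, the computation or the data movement.
    """
    step = max(t_compute, t_in + t_out)
    return t_in + (num_blocks - 1) * step + t_compute + t_out


def is_compute_bound(t_in, t_compute, t_out):
    """True when data movement can hide completely behind computation."""
    return t_compute >= t_in + t_out


if __name__ == "__main__":
    # Hypothetical numbers: 100 blocks, 2 time units in, 10 compute, 2 out.
    print(total_time_serial(100, 2, 10, 2))
    print(total_time_overlapped(100, 2, 10, 2))
    print(is_compute_bound(2, 10, 2))
```

When the compute time per block exceeds the transfer time, the overlapped total approaches the pure computation time, which is the ideal case described above.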

Be sure to check back for part two of this blog post where I will discuss the compute need.