Sampling and reconstruction are cornerstones of modern digital audio processing. We take a closer look at these processes and the limitations they impose.
Sampling is the act of converting a continuously varying signal to discrete samples suitable for digital manipulation. Reconstruction is the reverse, converting a sampled signal back to its continuous form, which can then drive a speaker allowing us to hear it. Intuitively, the idea may appear impossible. How can a finite set of samples possibly capture the infinite variations of a continuously changing signal? What about the signal segments between the sample points?
The sampling theorem
The questions posed above are resolved by the sampling theorem. Attributed variously to Claude Shannon, Harry Nyquist, E. T. Whittaker, and others, this theorem sets out conditions permitting a continuous signal to be sampled and subsequently reconstructed:
- The signal must be bandlimited.
- The sample rate must be higher than twice the bandwidth of the signal.
If these conditions are met, the sampled values uniquely and exactly describe the full continuous waveform. There is no approximation. Increasing the sample rate cannot improve anything since there is no error to begin with.
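The claim of exact recovery can be tried out numerically. The sketch below uses the Whittaker-Shannon interpolation formula, \(x(t) = \sum_n x[n]\,\mathrm{sinc}(f_s t - n)\), which is the standard constructive form of the theorem but is not spelled out in the text above. The function names and the `half_width` truncation are illustrative assumptions. Because the infinite sum is cut short, the result here is only approximately equal to the true value.

```python
import math

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(sample_at, fs, t, half_width=2000):
    """Approximate x(t) from samples x[n] = sample_at(n) taken at rate fs.

    The exact formula sums over all n. Here the sum is truncated to the
    2 * half_width + 1 samples nearest to t, so a small error remains.
    """
    centre = round(t * fs)
    total = 0.0
    for n in range(centre - half_width, centre + half_width + 1):
        total += sample_at(n) * sinc(t * fs - n)
    return total

fs = 48_000.0   # sample rate in Hz
f = 8_000.0     # tone frequency in Hz, below fs / 2

def sample_at(n):
    """Sample number n of a unit-amplitude cosine at frequency f."""
    return math.cos(2 * math.pi * f * n / fs)

t = 0.5 / fs    # a point halfway between two sample instants
estimate = reconstruct(sample_at, fs, t)
exact = math.cos(2 * math.pi * f * t)
print(estimate, exact)
```

The two printed values should agree closely, and widening `half_width` narrows the remaining gap, which comes from the truncation rather than from the sampling.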
For a signal with bandwidth \(B\), the minimum sample rate, \(2B\), required for an accurate capture is commonly referred to as the Nyquist rate. Conversely, if the sample rate is \(f_s\), the maximum allowed signal bandwidth, \(f_s/2\), is known as the Nyquist frequency.
Sampling a sine wave
Suppose we sample an 8 kHz sine wave at the commonly used rate of 48 kHz. The conditions of the sampling theorem are met since the sample rate is well above twice the maximum (and in this case only) frequency of the signal. Figure 1 shows one period of this sine wave with the sample points marked as red circles.
If the signal frequency is too high, in this case above 24 kHz, a phenomenon known as aliasing occurs. In figure 2 we see a 40 kHz sine wave (green) together with the same 8 kHz signal as above (dashed blue). Notice that the sample points end up in exactly the same places for both waveforms. Also notice that 40 kHz is precisely 8 kHz below the sample rate.
Given only the sample data, it would be impossible to tell which of these two waveforms was the source. A similar aliasing situation occurs again for a signal frequency 8 kHz above the sample rate, that is 56 kHz, as can be seen in figure 3.
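The coincidence of the sample points is easy to check numerically. The following is a minimal sketch with a hypothetical helper, `sample_cosine`. Cosines are used so that the three phases line up without adjustment. With plain sines, the 40 kHz samples match those of an inverted 8 kHz sine.

```python
import math

FS = 48_000  # sample rate in Hz

def sample_cosine(freq_hz, count, fs=FS):
    """Return `count` samples of a unit-amplitude cosine at freq_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(count)]

base = sample_cosine(8_000, 12)
below = sample_cosine(40_000, 12)   # 8 kHz below the sample rate
above = sample_cosine(56_000, 12)   # 8 kHz above the sample rate

# Each row shows one sample instant; the three columns agree.
for b, lo, hi in zip(base, below, above):
    print(f"{b:+.4f} {lo:+.4f} {hi:+.4f}")
```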
In general, for every valid signal frequency, there exist an infinite number of alias frequencies in symmetrical pairs around every multiple of the sample rate.
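The pattern of alias pairs can be written down directly. The helper names below are illustrative and not taken from any library. `alias_frequencies` lists the pairs around the first few multiples of the sample rate, and `baseband_alias` maps any frequency back to the one below the Nyquist frequency that produces the same samples.

```python
def baseband_alias(freq_hz, fs_hz):
    """Frequency in [0, fs/2] that shares its sample values with freq_hz."""
    remainder = freq_hz % fs_hz          # position within one sample-rate span
    return min(remainder, fs_hz - remainder)

def alias_frequencies(freq_hz, fs_hz, multiples):
    """Alias pairs (k*fs - f, k*fs + f) for k = 1..multiples."""
    return [(k * fs_hz - freq_hz, k * fs_hz + freq_hz)
            for k in range(1, multiples + 1)]

print(alias_frequencies(8_000, 48_000, 3))
print(baseband_alias(40_000, 48_000), baseband_alias(56_000, 48_000))
```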
Due to the aliasing effect illustrated above, it is important that the sampled signal is properly bandlimited. If it is not, any frequencies above the Nyquist frequency will alias into the lower range and distort the capture. If we can’t be certain about the signal bandwidth, we must precede the sampling stage with an analogue low-pass filter. Since the purpose of this filter is to remove alias frequencies, it is commonly called an anti-alias filter.
Ideally, the anti-alias filter would cut out everything from the Nyquist frequency and up, leaving the lower frequencies untouched. A perfect low-pass filter like this is, unfortunately, impossible to construct in practice. The solution is to set the sample rate, not to precisely twice the highest frequency of interest, but somewhat higher, providing some margin between the top of the target band and the point where aliasing sets in. This allows the anti-alias filter a transition band wherein its response gradually goes from passing frequencies below to blocking those above.
The generally accepted upper limit for human hearing is 20 kHz. A sampled audio system thus needs a sample rate above 40 kHz. With a little margin added for the anti-alias filter, we arrive at the common sample rates of 44.1 kHz and 48 kHz. Those exact frequencies were chosen for technical reasons unrelated to the sampling process.
If we accept some aliasing distortion above 20 kHz, the width of the transition band can be doubled. This is possible since the aliases are mirrored around the Nyquist frequency, so for a 48 kHz sample rate, a 28 kHz signal component is aliased to 20 kHz.
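The doubling follows from simple arithmetic, sketched here with a hypothetical helper. Without aliasing, the filter must reach full attenuation at the Nyquist frequency. If aliasing above the passband is tolerated, it only needs to do so at the sample rate minus the passband edge.

```python
def transition_width(fs_hz, passband_hz, allow_aliasing_in_transition):
    """Width in Hz available to the anti-alias filter's transition band."""
    if allow_aliasing_in_transition:
        # Components up to fs - passband fold back no lower than the passband edge.
        stopband_hz = fs_hz - passband_hz
    else:
        stopband_hz = fs_hz / 2
    return stopband_hz - passband_hz

for fs in (44_100, 48_000):
    strict = transition_width(fs, 20_000, False)
    relaxed = transition_width(fs, 20_000, True)
    print(f"{fs} Hz: {strict:.0f} Hz strict, {relaxed:.0f} Hz with aliasing allowed")
```

For 48 kHz this gives 4 kHz and 8 kHz respectively. For 44.1 kHz the figures are 2.05 kHz and 4.1 kHz.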
Even when permitting aliasing in the transition band, an anti-alias filter suitable for a 44.1 kHz or 48 kHz sample rate can be a challenge to design. This task is simplified by sampling at a much higher rate followed by a digital decimation stage, since a digital low-pass filter can readily be made very steep without adversely affecting the pass band or requiring high-precision components. Oversampling, as this technique is called, additionally permits the use of a less accurate A/D conversion stage while maintaining the same signal-to-noise ratio in the audio band. In its simplest form, each doubling of the sample rate lowers the in-band quantisation noise by about 3 dB, so each quadrupling gains one effective bit of resolution, and noise shaping can improve this further.
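A decimation stage of this kind can be sketched in a few lines. The example below is an illustration under assumed parameters and does not represent any particular converter: a 96 kHz oversampled rate, a 63-tap Hamming-windowed sinc filter with its cutoff at 24 kHz, and a test signal made of an 8 kHz tone plus a 40 kHz tone. All function names are hypothetical.

```python
import math

def lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz                      # cutoff in cycles per sample
    taps = []
    for k in range(num_taps):
        x = k - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(signal, taps):
    """Direct-form convolution, treating samples before the start as zero."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def decimate(signal, factor, taps):
    """Low-pass filter, then keep every `factor`-th sample."""
    return fir_filter(signal, taps)[::factor]

FS_HIGH = 96_000    # assumed oversampled rate, 2 x 48 kHz
NUM_TAPS = 63       # arbitrary illustrative filter length
DELAY = (NUM_TAPS - 1) // 2   # filter delay in samples at the high rate

signal = [math.cos(2 * math.pi * 8_000 * n / FS_HIGH)
          + math.cos(2 * math.pi * 40_000 * n / FS_HIGH)
          for n in range(480)]

taps = lowpass_taps(NUM_TAPS, 24_000, FS_HIGH)
decimated = decimate(signal, 2, taps)   # 48 kHz output, 40 kHz tone removed
naive = signal[::2]                     # no filter: the 40 kHz tone aliases to 8 kHz

print(max(abs(v) for v in decimated[40:]), max(abs(v) for v in naive))
```

Dropping samples without filtering lets the 40 kHz tone fold onto 8 kHz and double its amplitude, whereas the filtered output settles to a clean 8 kHz tone of unit amplitude, delayed by the filter.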
For audio purposes, sampling would be mostly useless without a means of converting the signal back to its analogue form. After all, our ears do not accept digital inputs.
Mathematically, a sampled signal can be viewed as a sequence of impulses, one for each sample, with heights corresponding to the sample values. This is illustrated in figure 4.
That doesn’t look much like a sine wave. However, computing the Fourier transform yields the spectrum in figure 5 below.
Below the Nyquist frequency, 24 kHz, everything looks good with a single 8 kHz tone, exactly as desired. Above 24 kHz, things are not looking so good. There are additional tones at 40 kHz, 56 kHz, and so on around every multiple of the sample rate, an effect called imaging. For every actual frequency in the signal, this crude reconstruction has generated a multitude of image frequencies. As the reader may have noticed, these additional frequencies coincide with the alias frequencies we encountered during the sampling process.
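The image tones can be reproduced by evaluating the Fourier sum of the impulse sequence at a few chosen frequencies. This is a minimal sketch. The helper name, the block of 48 samples (eight full periods of the tone), and the scaling by the sample count are choices made here for illustration.

```python
import cmath
import math

FS = 48_000
TONE = 8_000
N = 48  # eight full periods of the 8 kHz tone at 48 kHz

samples = [math.cos(2 * math.pi * TONE * n / FS) for n in range(N)]

def impulse_train_spectrum(samples, freq_hz, fs=FS):
    """Magnitude of sum x[n] * exp(-j 2 pi f n / fs), divided by the sample count."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / fs)
              for n, x in enumerate(samples))
    return abs(acc) / len(samples)

for f in (8_000, 20_000, 40_000, 56_000, 88_000, 104_000):
    print(f, round(impulse_train_spectrum(samples, f), 6))
```

The tone and every image come out at the same level, while a frequency in between, such as 20 kHz, is essentially empty.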
Frequency imaging aside, an impulse based D/A converter isn’t practical. Such fast switching while producing an accurate voltage level is not easily achieved. A more reasonable approach is to hold the output voltage constant for the duration of each sample. This gives us the waveform displayed in figure 6.
This method is called a zero-order hold. The curve it produces looks a little more like a sine wave, though it still has some way to go. Figure 7 shows the spectrum.
As we can see, this method also produces the same image frequencies. Their level drops a little as the frequency increases, though not by much. Clearly, something must be done.
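How much the images drop can be estimated from the frequency response of an ideal zero-order hold, which has the well-known magnitude \(|\mathrm{sinc}(f/f_s)|\). The sketch below assumes an ideal hold lasting exactly one sample period, and the helper names are hypothetical.

```python
import math

def zoh_gain(freq_hz, fs_hz):
    """Magnitude response of an ideal zero-order hold, normalised to 1 at DC."""
    x = freq_hz / fs_hz
    if x == 0:
        return 1.0
    return abs(math.sin(math.pi * x) / (math.pi * x))

def to_db(gain):
    """Convert a linear amplitude gain to decibels."""
    return 20 * math.log10(gain)

for f in (8_000, 40_000, 56_000, 88_000, 104_000):
    g = zoh_gain(f, 48_000)
    print(f"{f / 1000:6.0f} kHz  gain {g:.3f}  ({to_db(g):+.1f} dB)")
```

Under these assumptions the first two images sit roughly 14 dB and 17 dB below full level, which is far from enough to ignore them. The hold also attenuates the wanted 8 kHz tone slightly.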
A solution to the problem of image frequencies is to simply remove them using an analogue low-pass filter, unsurprisingly referred to as an anti-imaging filter. If we remove everything above the Nyquist frequency, 24 kHz, only the originally sampled signal remains.
As with the anti-aliasing filter earlier, a perfect low-pass filter is impossible to construct. We do, however, still have the margin between the limit of hearing, 20 kHz, and the Nyquist frequency within which to work. Of course, that rather small margin still presents the same challenge.
Once again, oversampling comes to the rescue. If we increase the sample rate by inserting one or more zeros after each sample, we obtain a digital version of the impulse sequence we looked at previously. The image frequencies in its spectrum can now be removed using a digital low-pass filter, which as already noted, is much easier to implement.
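The zero insertion and digital filtering can be sketched as follows. The filter here is an assumed 63-tap Hamming-windowed sinc with its cutoff at the original Nyquist frequency, and its gain is set equal to the oversampling factor to make up for the amplitude lost to the inserted zeros. All names are illustrative.

```python
import math

def zero_stuff(samples, factor):
    """Insert factor - 1 zeros after each sample."""
    out = []
    for x in samples:
        out.append(x)
        out.extend([0.0] * (factor - 1))
    return out

def lowpass_taps(num_taps, cutoff_hz, fs_hz, gain=1.0):
    """Hamming-windowed sinc low-pass FIR with the given gain at DC."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz                      # cutoff in cycles per sample
    taps = []
    for k in range(num_taps):
        x = k - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [gain * t / total for t in taps]

def fir_filter(signal, taps):
    """Direct-form convolution, treating samples before the start as zero."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

FS = 48_000
FACTOR = 2          # 2x oversampling
NUM_TAPS = 63       # arbitrary illustrative filter length
DELAY = (NUM_TAPS - 1) // 2

original = [math.cos(2 * math.pi * 8_000 * n / FS) for n in range(240)]
stuffed = zero_stuff(original, FACTOR)                       # now at 96 kHz
taps = lowpass_taps(NUM_TAPS, FS / 2, FS * FACTOR, gain=FACTOR)
oversampled = fir_filter(stuffed, taps)

print(stuffed[:6])
print([round(v, 3) for v in oversampled[62:68]])
```

After the initial filter transient, `oversampled` follows an 8 kHz cosine sampled at 96 kHz, with the new in-between samples filled in by the filter.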
Having oversampled the signal digitally, we can then pass it to the same zero-order hold D/A converter as before. The output from this process using 2x oversampling can be seen in figure 8.
While there are still steps, they are smaller, and the reconstruction follows the desired curve much more closely. In figure 9 we see that the spectrum has also been improved.
The first pair of images, around 48 kHz, is gone, as are those for all odd multiples of the sample rate. The digital oversampling took care of that. To get rid of the remainder, a much more reasonable analogue filter can be used. The higher the oversampled rate, the simpler the analogue anti-imaging filter can be. In practice, an oversampling factor of 8x is common, placing the first images around 384 kHz.
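The positions of the surviving images follow directly from the oversampled rate. This small sketch uses a hypothetical helper and assumes the digital filter removes everything it is meant to.

```python
def remaining_images(tone_hz, fs_hz, oversampling, count):
    """Image pairs left after ideal digital oversampling by `oversampling`.

    Images survive only around multiples of the oversampled rate.
    """
    rate = fs_hz * oversampling
    return [(k * rate - tone_hz, k * rate + tone_hz) for k in range(1, count + 1)]

for factor in (1, 2, 8):
    print(factor, remaining_images(8_000, 48_000, factor, 2))
```

For the 8 kHz tone at 48 kHz, 8x oversampling moves the first image pair out to 376 kHz and 392 kHz, leaving the analogue filter a very wide transition band.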
Sampling captures a continuous signal up to a maximum frequency, and the reconstruction process does the reverse, turning discrete samples back into a continuous waveform. There is a lot of symmetry between the two processes. Both rely on low-pass filters to function correctly, which presents some challenges. In both cases, digital filtering techniques operating at a higher sample rate greatly simplify this task.