Sampling and reconstruction are cornerstones of modern digital audio processing. We take a closer look at these processes and the limitations they impose.

Sampling is the act of converting a continuously varying signal to discrete samples suitable for digital manipulation. Reconstruction is the reverse, converting a sampled signal back to its continuous form, which can then drive a speaker, allowing us to hear it. Intuitively, the idea may appear impossible. How can a finite set of samples possibly capture the infinite variations of a continuously changing signal? What about the signal segments between the sample points?

## The sampling theorem

The questions posed above are resolved by the sampling theorem. Attributed variously to Claude Shannon, Harry Nyquist, E. T. Whittaker, and others, this theorem sets out conditions permitting a continuous signal to be sampled and subsequently reconstructed:

- The signal must be bandlimited.
- The sample rate must be higher than twice the bandwidth of the signal.

If these conditions are met, the sampled values uniquely and exactly describe the full continuous waveform. There is no approximation. Increasing the sample rate cannot improve anything since there is no error to begin with.

For a signal with bandwidth \(B\), the minimum sample rate, \(2B\), required for an accurate capture is commonly referred to as the Nyquist rate. Conversely, if the sample rate is \(f_s\), the maximum allowed signal bandwidth, \(f_s/2\), is known as the Nyquist frequency.
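The two definitions and the rate condition can be written down as a minimal sketch. The function names are made up for this illustration and all frequencies are in Hz.

```python
def nyquist_rate(bandwidth_hz):
    """Minimum sample rate, 2B, for a signal of bandwidth B."""
    return 2 * bandwidth_hz


def nyquist_frequency(sample_rate_hz):
    """Maximum signal bandwidth, fs/2, for a sample rate fs."""
    return sample_rate_hz / 2


def satisfies_sampling_theorem(bandwidth_hz, sample_rate_hz):
    """True if the sample rate is strictly higher than twice the bandwidth."""
    return sample_rate_hz > nyquist_rate(bandwidth_hz)


print(nyquist_rate(20_000))                         # 40000
print(nyquist_frequency(48_000))                    # 24000.0
print(satisfies_sampling_theorem(8_000, 48_000))    # True
print(satisfies_sampling_theorem(40_000, 48_000))   # False
```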

## Sampling a sine wave

Suppose we sample an 8 kHz sine wave at the commonly used rate of 48 kHz. The conditions of the sampling theorem are met since the sample rate is well above twice the maximum (and in this case only) frequency of the signal. Figure 1 shows one period of this sine wave with the sample points marked as red circles.

If the signal frequency is too high, in this case above 24 kHz, a phenomenon known as aliasing occurs. In figure 2 we see a 40 kHz sine wave (green) together with the same 8 kHz signal as above (dashed blue). Notice that the sample points end up in exactly the same locations for both waveforms. Also notice that 40 kHz is precisely 8 kHz below the sample rate.

Given only the sample data, it would be impossible to tell which of these two waveforms was the source. A similar aliasing situation occurs again for a signal frequency 8 kHz above the sample rate, that is, 56 kHz, as can be seen in figure 3.

In general, for every valid signal frequency \(f\), there are infinitely many alias frequencies, \(k f_s \pm f\) for \(k = 1, 2, 3, \ldots\), forming symmetrical pairs around every multiple of the sample rate.
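The aliasing relationship is easy to check numerically. The sketch below samples an 8 kHz tone and a few of its aliases at 48 kHz and compares the resulting values. The helper names are invented for this example. Cosines are used so that the samples match exactly; with sines, the aliases just below each multiple of the sample rate come out with inverted sign, which is only a phase difference.

```python
import math


def sample_cosine(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a unit cosine at freq_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]


def alias_frequencies(freq_hz, sample_rate_hz, multiples):
    """Alias pairs k*fs - f and k*fs + f for k = 1..multiples."""
    aliases = []
    for k in range(1, multiples + 1):
        aliases.append(k * sample_rate_hz - freq_hz)
        aliases.append(k * sample_rate_hz + freq_hz)
    return aliases


FS = 48_000
base = sample_cosine(8_000, FS, 12)

for f in alias_frequencies(8_000, FS, 2):
    samples = sample_cosine(f, FS, 12)
    worst = max(abs(a - b) for a, b in zip(base, samples))
    # Differences are at the level of floating-point rounding only.
    print(f"{f} Hz: largest sample difference {worst:.1e}")
```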

## Anti-aliasing filters

Due to the aliasing effect illustrated above, it is important that the sampled signal is properly bandlimited. If it is not, any frequencies above the Nyquist frequency will alias into the lower range and distort the capture. If we can’t be certain about the signal bandwidth, we must precede the sampling stage with an analogue low-pass filter. Since the purpose of this filter is to remove alias frequencies, it is commonly called an anti-alias filter.

Ideally, the anti-alias filter would cut out everything from the Nyquist frequency and up, leaving the lower frequencies untouched. A perfect low-pass filter like this is, unfortunately, impossible to construct in practice. The solution is to set the sample rate, not to precisely twice the highest frequency of interest, but somewhat higher, providing some margin between the top of the target band and the point where aliasing sets in. This allows the anti-alias filter a transition band wherein its response gradually goes from passing frequencies below to blocking those above.

The generally accepted upper limit for human hearing is 20 kHz. A sampled audio system thus needs a sample rate of at least 40 kHz. With a little margin added for the anti-alias filter, we arrive at the common sample rates of 44.1 kHz and 48 kHz. Those exact frequencies were chosen for technical reasons unrelated to the sampling process.

If we accept some aliasing distortion above 20 kHz, the width of the transition band can be doubled. This is possible since the aliases are mirrored around the Nyquist frequency, so for a 48 kHz sample rate, a 28 kHz signal component is aliased to 20 kHz.
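To make the mirroring concrete, here is a small sketch that folds any input frequency into the range from zero to the Nyquist frequency and compares the two transition band widths. The function name and constants are chosen for this illustration only.

```python
def folded_frequency(freq_hz, sample_rate_hz):
    """Frequency between 0 and fs/2 where a component appears after sampling."""
    remainder = freq_hz % sample_rate_hz
    return min(remainder, sample_rate_hz - remainder)


FS = 48_000
BAND_EDGE = 20_000                 # top of the audio band

strict_stop = FS // 2              # filter must block from 24 kHz: no aliasing at all
relaxed_stop = FS - BAND_EDGE      # filter must block from 28 kHz: aliases stay at or above 20 kHz

print(folded_frequency(28_000, FS))    # 20000
print(folded_frequency(26_000, FS))    # 22000, above the audio band
print(strict_stop - BAND_EDGE)         # 4000 Hz transition band
print(relaxed_stop - BAND_EDGE)        # 8000 Hz transition band
```

Anything between 24 kHz and 28 kHz that leaks through the filter lands between 20 kHz and 24 kHz after sampling, which is outside the band we care about.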

## Oversampling

Even when permitting aliasing in the transition band, an anti-alias filter suitable for a 44.1 kHz or 48 kHz sample rate can be a challenge to design. The task is simplified by sampling at a much higher rate and following the converter with a digital decimation stage, since a digital low-pass filter can readily be made very steep without adversely affecting the pass band or requiring high-precision components. Oversampling, as this technique is called, additionally permits the use of a less accurate A/D conversion stage while maintaining the same signal-to-noise ratio in the audio band. In its simplest form, each quadrupling of the sample rate gains one effective bit of resolution, since the quantisation noise is spread over a wider band and only the part falling within the audio band remains after filtering. Noise shaping can improve this further.
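The resolution gain from plain oversampling can be estimated with a short calculation. This is a sketch under the usual simplifying assumption that quantisation noise is white, that is, spread evenly from zero to half the sample rate, and that no noise shaping is used. Under that assumption the in-band noise power falls in proportion to the oversampling ratio, which works out to about 3 dB, or half a bit, per doubling of the sample rate. The function names are invented for this example.

```python
import math


def in_band_noise_reduction_db(oversampling_ratio):
    """Reduction of in-band quantisation noise power, assuming white noise."""
    return 10 * math.log10(oversampling_ratio)


def extra_bits(oversampling_ratio):
    """Effective bits gained; one bit corresponds to about 6.02 dB."""
    return 0.5 * math.log2(oversampling_ratio)


for ratio in (2, 4, 16, 64):
    print(f"{ratio:>3}x: {in_band_noise_reduction_db(ratio):5.2f} dB, "
          f"{extra_bits(ratio):.1f} bits")
```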

## Reconstruction

For audio purposes, sampling would be mostly useless without a means of converting the signal back to its analogue form. After all, our ears do not accept digital inputs.

Mathematically, a sampled signal can be viewed as a sequence of impulses, one for each sample, with heights corresponding to the sample values. This is illustrated in figure 4.

That doesn’t look much like a sine wave. However, computing the Fourier transform yields the spectrum in figure 5 below.

Below the Nyquist frequency, 24 kHz, everything looks good with a single 8 kHz tone, exactly as desired. Above 24 kHz, things are not looking so good. There are additional tones at 40 kHz, 56 kHz, and so on around every multiple of the sample rate, an effect called imaging. For every actual frequency in the signal, this crude reconstruction has generated a multitude of image frequencies. As the reader may have noticed, these additional frequencies coincide with the alias frequencies we encountered during the sampling process.
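The imaging effect can be reproduced with a direct evaluation of the Fourier sum over the impulse sequence. The sketch below is not the code behind the figures. It uses 1 ms of an 8 kHz sine sampled at 48 kHz, so that the block holds a whole number of periods, and `impulse_train_spectrum` is a name made up for this example.

```python
import cmath
import math

FS = 48_000
N = 48  # 1 ms of samples: exactly eight periods of the 8 kHz tone

samples = [math.sin(2 * math.pi * 8_000 * n / FS) for n in range(N)]


def impulse_train_spectrum(samples, freq_hz, sample_rate_hz):
    """Normalised magnitude of the impulse sequence's spectrum at freq_hz."""
    total = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate_hz)
                for n, x in enumerate(samples))
    return abs(total) / len(samples)


# The tone and its images all show the same level (about 0.5 with this
# normalisation), while a frequency in between, such as 16 kHz, is empty.
for f in (8_000, 16_000, 40_000, 56_000, 88_000, 104_000):
    print(f"{f:>7} Hz: {impulse_train_spectrum(samples, f, FS):.3f}")
```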

Frequency imaging aside, an impulse-based D/A converter isn’t practical. Such fast switching while producing an accurate voltage level is not easily achieved. A more reasonable approach is to hold the output voltage constant for the duration of each sample. This gives us the waveform displayed in figure 6.

This method is called a zero-order hold. The curve it produces looks a little more like a sine wave, though it still has some way to go. Figure 7 shows the spectrum.

As we can see, this method also produces the same image frequencies. Their level drops a little as the frequency increases, though not by much. Clearly, something must be done.
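The gentle roll-off has a simple explanation. Holding each value for one sample period is equivalent to filtering the impulse sequence with a rectangular pulse, whose frequency response has the magnitude \(|\sin(\pi f/f_s) / (\pi f/f_s)|\). The sketch below evaluates this for the tone and its first few images at a 48 kHz sample rate. The function name is chosen for this illustration.

```python
import math


def zoh_gain_db(freq_hz, sample_rate_hz):
    """Gain of a zero-order hold relative to DC, in dB."""
    x = math.pi * freq_hz / sample_rate_hz
    if x == 0:
        return 0.0
    magnitude = abs(math.sin(x) / x)
    if magnitude == 0:
        return -math.inf  # exact null at a multiple of the sample rate
    return 20 * math.log10(magnitude)


FS = 48_000
for f in (8_000, 20_000, 40_000, 56_000, 88_000, 104_000):
    print(f"{f:>7} Hz: {zoh_gain_db(f, FS):7.2f} dB")
```

The first image at 40 kHz is attenuated by only about 14 dB, which is far from enough on its own. The hold also costs a few dB at the top of the audio band.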

## Anti-imaging filters

A solution to the problem of image frequencies is to simply remove them using an analogue low-pass filter, unsurprisingly referred to as an anti-imaging filter. If we remove everything above the Nyquist frequency, 24 kHz, only the originally sampled signal remains.
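Mathematically, an ideal low-pass filter with its cutoff at the Nyquist frequency replaces each impulse with a scaled sinc pulse, and the output is the sum of those pulses (the Whittaker–Shannon interpolation formula). The sketch below evaluates that sum at points between the sample instants and compares it with the original sine. Since the true sum is infinite, it is truncated to a window of samples around the point of interest, so a small error remains. The names `reconstruct`, `sample_at`, and `span` are assumptions of this example.

```python
import math

FS = 48_000
F_SIGNAL = 8_000


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sample_at(n):
    """Value of the 8 kHz sine at sample index n."""
    return math.sin(2 * math.pi * F_SIGNAL * n / FS)


def reconstruct(t, sample_rate_hz, sample_at, span=500):
    """Truncated sinc interpolation using `span` samples on each side of t."""
    centre = round(t * sample_rate_hz)
    return sum(sample_at(n) * sinc(t * sample_rate_hz - n)
               for n in range(centre - span, centre + span + 1))


for t in (0.4 / FS, 2.5 / FS, 7.25 / FS):
    exact = math.sin(2 * math.pi * F_SIGNAL * t)
    print(f"t = {t * 1e6:7.2f} us: reconstructed {reconstruct(t, FS, sample_at):+.4f}, "
          f"exact {exact:+.4f}")
```

The values between the samples are recovered to within the truncation error, and widening `span` reduces that error further.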

As with the anti-aliasing filter earlier, a perfect low-pass filter is impossible to construct. We do, however, still have the margin between the limit of hearing, 20 kHz, and the Nyquist frequency within which to work. Of course, that rather small margin still presents the same challenge.

## Oversampling (again)

Once again, oversampling comes to the rescue. If we increase the sample rate by inserting one or more zeros after each sample, we obtain a digital version of the impulse sequence we looked at previously. The image frequencies in its spectrum can now be removed using a digital low-pass filter, which as already noted, is much easier to implement.
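A minimal sketch of this digital interpolation, for a factor of two, might look as follows. The Blackman-windowed sinc filter and its length are choices made for this example, not something a real DAC is required to use, and the plain loop-based convolution favours clarity over speed.

```python
import math


def zero_stuff(samples, factor):
    """Insert factor - 1 zeros after each sample."""
    out = []
    for x in samples:
        out.append(x)
        out.extend([0.0] * (factor - 1))
    return out


def lowpass_taps(factor, half_length):
    """Blackman-windowed sinc: cutoff at the original Nyquist frequency, gain = factor."""
    taps = []
    for k in range(2 * half_length + 1):
        x = (k - half_length) / factor
        ideal = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        phase = 2 * math.pi * k / (2 * half_length)
        window = 0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2 * phase)
        taps.append(ideal * window)
    return taps


def fir_filter(signal, taps):
    """Direct-form convolution; the output is delayed by half the filter length."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if 0 <= i - k < len(signal):
                acc += h * signal[i - k]
        out.append(acc)
    return out


FS, FACTOR, HALF_LENGTH = 48_000, 2, 32
original = [math.sin(2 * math.pi * 8_000 * n / FS) for n in range(96)]
oversampled = fir_filter(zero_stuff(original, FACTOR),
                         lowpass_taps(FACTOR, HALF_LENGTH))


def interpolation_error(oversampled):
    """Largest deviation from an 8 kHz sine sampled directly at the higher rate."""
    worst = 0.0
    for i in range(2 * HALF_LENGTH, len(oversampled)):  # skip the filter start-up
        m = i - HALF_LENGTH                             # compensate the filter delay
        wanted = math.sin(2 * math.pi * 8_000 * m / (FACTOR * FS))
        worst = max(worst, abs(oversampled[i] - wanted))
    return worst


print(f"largest error: {interpolation_error(oversampled):.1e}")
```

The filter fills in the zeros with values lying on the original sine, which is another way of saying that the images around 48 kHz have been removed.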

Having done a digital oversampling of the signal, we can then pass it to the same zero-order hold D/A converter as before. The output from this process using a 2x oversampling can be seen in figure 8.

While there are still steps, they are smaller, and the reconstruction follows the desired curve much more closely. In figure 9, we see that the spectrum has also improved.

The first pair of images, around 48 kHz, is gone, as are those for all odd multiples of the sample rate. The digital oversampling took care of that. To get rid of the remainder, a much more reasonable analogue filter can be used. The higher the oversampled rate, the simpler the analogue anti-imaging filter can be. In practice, an oversampling factor of 8x is common, placing the first images around 384 kHz.
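To put numbers on how much the analogue filter's job is eased, the sketch below computes where the first remaining image pair lands for a given oversampling factor. It also computes the width, in octaves, of the gap between the 20 kHz band edge and the lowest image of a 20 kHz component. It assumes the digital filter removes the lower images completely, and the function names are invented for this example.

```python
import math


def first_image_pair(signal_hz, base_rate_hz, factor):
    """Lowest image pair left after ideal digital interpolation by `factor`."""
    new_rate = base_rate_hz * factor
    return (new_rate - signal_hz, new_rate + signal_hz)


def transition_band_octaves(band_edge_hz, base_rate_hz, factor):
    """Octaves between the band edge and the lowest image of a band-edge tone."""
    lowest_image, _ = first_image_pair(band_edge_hz, base_rate_hz, factor)
    return math.log2(lowest_image / band_edge_hz)


print(first_image_pair(8_000, 48_000, 2))    # (88000, 104000)
print(first_image_pair(20_000, 48_000, 8))   # (364000, 404000)

for factor in (1, 2, 8):
    width = transition_band_octaves(20_000, 48_000, factor)
    print(f"{factor}x: {width:.2f} octaves for the analogue filter to roll off")
```

Without oversampling, the filter has less than half an octave in which to go from passing to blocking. At 8x it has more than four.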

## Final words

Sampling captures a continuous signal up to a maximum frequency, and the reconstruction process does the reverse, turning discrete samples back into a continuous waveform. There is a lot of symmetry between the two processes. Both rely on low-pass filters to function correctly, which presents some challenges. In both cases, digital filtering at a higher sample rate greatly simplifies the analogue filter design.

It’s awesome that this article talked about the solution to getting around setting up a low pass filter. I appreciate you helping me learn more about these filters and how to work with them. I will have to look into a way to work with anti alias filters in the future.

This makes sense in the studio… but once the digital media has been created, what good does oversampling in a DAC do? If the audio is encoded at 44.1 kHz, no additional information of a higher resolution can be obtained… you’d simply be sampling the same data point multiple times, no?

Even with the 8x oversampling example above, I still don’t understand how a smooth sine wave can be generated from a series of square steps – even if it’s 8 times as many. The example is 8 kHz, but consider sampling 20 kHz. Even with oversampling that’s not very many samples… is the analog output stage actually tracking the sampled/oversampled zero-hold output, i.e. attempting to adjust and hold its voltage in steps? Are high frequency waveforms rounded out simply by virtue of analog components not being able to swing their voltages instantaneously, such that the next step up or down in voltage is gradual and thus smoothed out? That still wouldn’t be accurate, at least not for all frequencies.

This is great information but I’m still missing something, maybe it’s the last step. How does the digital stepped representation become uniformly gradual and produce an ACTUAL sine wave such that if we were to zoom in closely the ramp up/down would be smooth and at the proper angle?

^^^ This question I still have btw! I get how the sampled data accurately represents the analog waveform… but how do we turn that sampled data INTO an analog waveform from a digitized zero-hold series of steps?

The stepped output is made smooth using an analogue low-pass filter. The benefit of digital interpolation is that the smaller steps it yields allow a simpler analogue filter to be used. Since the analogue filter isn’t perfect, some residual of the unwanted higher frequencies will remain. Again, the digital interpolation helps by moving these artefacts to higher frequencies where they are less harmful and the analogue filter is also more effective.

Since storage capacity is steadily increasing, why isn’t all audio recorded and delivered (either through streaming or download) at the highest possible bit-depth/resolution and sampling rate?

If I understood this article, that would allow for cheaper DACs since you wouldn’t need additional oversampling required by the analogue filters?

A side effect of this is that you can remove/not use dithering, reducing the noise floor and increasing dynamic range. I have never tried a really high-end audio setup, so I don’t know if it’s appreciable enough to warrant the increased storage space and probably more expensive recording equipment?

I’m also thinking from a preservation standpoint. Perhaps in the future all DACs regardless of price can easily reconstruct 24/192 for instance, then it would be a shame if the recording wasn’t available in high resolution (given that it is actually appreciable, as stated above) just because we wanted to save a bit of storage space or money today.

I would like to see some graphs in the “Anti-aliasing filters” section:

1) Ideal vs. practical low-pass filter

2) Something to visualize the two last sentences in that section. What do you mean by “so for a 48 kHz sample rate, a 28 kHz signal component is aliased to 20 kHz.”? I assume it is the same as Fig.1-3 are showing, but it’s hard to mentally visualize.

Anyway, I read all your articles. Not sure I understood everything, but they were still insightful and interesting. Great work!

Hello, I have an IFI Zen v2 (DSD1793) currently running with the non-MQA firmware. I am curious to know which upsampling/downsampling rate (using SOX in JRiver) is optimal to feed the DAC? Would it be better to use 192kHz or 384kHz? Please help.

The answer depends on what you mean by optimal. Regardless, however, I would stick to 192 kHz or below as then the DAC chip will do a further 8x upsampling. With 384 kHz input, this is bypassed. Looking at the audible range only (up to 20 kHz), the DAC chip tends to perform slightly better at lower sample rates, though the difference is too small to be audible.