Quick Guide To Digital Audio

How is sound digitally recorded? How important are sample rates and resolutions? We look at the crystal clear world of digital audio.

Digitising audio is the process of converting sound into a series of numbers. The hardware that does the job is known as an analogue-to-digital converter, usually abbreviated to ADC. A digital-to-analogue converter (DAC) reverses the process, converting digital audio stored in a sampler or on a hard drive, for example, back into a signal we can hear as sound.

The two most important settings during conversion are the sample rate and the sample resolution. Together they determine the overall quality of the recording.

Sample rate

To convert sound to a digital format, the ADC measures, or samples, it a fixed number of times per second. The more samples taken in a given time, the more accurate the digital representation of the sound. This can be clearly seen in the following diagrams, which use a sine wave as the sound source.

 This is clearly a sine wave but you can see the steps in the waveform showing at which points it has been sampled.

 This sine wave has been recorded at a higher sample rate and is therefore more accurate and closer to the original sine wave.

Although the second example is closer to a perfect sine wave, the steps are still evident and on playback it might sound a little ‘rough’.

The next two illustrations show what happens with extremely high and low sample rates.

This wave has been sampled at a very high sample rate and although the steps are there we can see that it is virtually a pure sine wave.

This has been sampled at a very low sample rate. In fact the samples are so far apart the sine wave shape is only barely distinguishable. On playback, it wouldn’t sound anything like a sine wave.
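To make the idea concrete, here is a minimal Python sketch that samples one cycle of a sine wave at a low and a high rate. The function name `sample_sine` and the 1kHz test tone are assumptions chosen for illustration, not anything a real ADC exposes.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Measure a sine wave sample_rate_hz times per second (hypothetical helper)."""
    count = int(round(sample_rate_hz * duration_s))
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

# One cycle of a 1 kHz tone lasts 1 ms.
coarse = sample_sine(1000, 8000, 0.001)    # low rate: few steps per cycle
fine = sample_sine(1000, 96000, 0.001)     # high rate: many steps per cycle

print(len(coarse), "samples per cycle at 8 kHz")
print(len(fine), "samples per cycle at 96 kHz")
```

The more values there are per cycle, the smaller each step in the stored waveform, which is what the diagrams above illustrate.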

The Nyquist limit

So, we can clearly see that the higher the sample rate, the more accurate the digital representation of the sound. Engineer Harry Nyquist showed that to digitise a sound accurately we need only sample it at a rate of at least twice the highest frequency it contains. Assuming the limit of human hearing is around 20kHz, we should be able to capture the full audible spectrum by sampling at 40kHz. So the 44.1kHz sample rate of audio CDs ought to give us a little headroom.

From Harry’s calculations we get the Nyquist Limit, which is half the sample rate. If you sample a frequency beyond the Nyquist Limit, that is, more than half the sampling rate, it is ‘folded over’ and stored as a lower frequency than it actually is. This creates an effect called aliasing, which produces frequencies that were not in the original sound and results in distortion. Not something you want in a recording.
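The fold-over can be worked out with simple arithmetic. The sketch below assumes an ideal sampler with no anti-aliasing filter in front of it; `alias_frequency` is a hypothetical helper name.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling (ideal sampler)."""
    nyquist = sample_rate_hz / 2
    # Sampling cannot tell a frequency f from f plus a multiple of the rate.
    folded = freq_hz % sample_rate_hz
    if folded > nyquist:
        # Anything above the Nyquist Limit is reflected back below it.
        folded = sample_rate_hz - folded
    return folded

print(alias_frequency(10000, 44100))  # below the limit: stored unchanged
print(alias_frequency(30000, 44100))  # above the limit: folds down to 14100
```

In this example a 30kHz tone sampled at 44.1kHz would be stored as a 14.1kHz tone, a frequency that was never in the original sound.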

The going rate

It will not have escaped your notice that many systems offer sample rates of 48kHz (although this is mainly for compatibility with DAT machines as 48kHz isn’t so much of an improvement over 44.1kHz) with additional rates up to 96kHz and some even as high as 192kHz. Are there advantages to using them? There is still some controversy over the benefits of these higher rates.

On the one hand, with the threshold of human hearing at 20kHz, why try to record anything higher? And from a practical viewpoint, the majority of modern listeners use playback systems such as Walkmans and iPods, with headphones that are unable to reproduce even the quality of standard audio CDs.

The purists, on the other hand, argue that a 96kHz sample rate, for example, allows the accurate recording of frequencies at the upper threshold of human hearing and beyond. While one benefit is the prevention of aliasing, such frequencies may be difficult to notice on a purely objective level, although subjectively the psychoacoustic benefits can make the sound seem more open, airy, and give it more depth and space. The difference may be subtle and you will need both good ears and a good playback system to perceive a difference – but upon such subtleties hang the questions surrounding the benefits of modern digital audio recording.

Sample resolution

The sample resolution is the scale used to measure the recorded samples. Audio CDs use a sample resolution of 16 bits. Bits are binary digits, and the number of different values a group of bits can represent is found by raising 2 to the power of the number of bits. So 16 bits gives 2^16 = 65536 values in which to store the sampled data, while 8 bits gives just 2^8 = 256.

So, going back to our sine wave, when we sample it, each value must be in the range 0 to 65535 for a 16-bit recording and in the range 0 to 255 for an 8-bit recording. 16 bits, therefore, affords a much finer and more accurate representation of a sample than 8 bits.
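As a rough illustration, the following Python sketch maps an amplitude onto the available values for a given resolution. It follows the unsigned numbering used in this article (counting up from 0); many real audio formats store signed values instead, and the helper names here are assumptions.

```python
def levels(bits):
    """Number of distinct values available at a given resolution."""
    return 2 ** bits

def quantise(amplitude, bits):
    """Map an amplitude in the range -1.0..1.0 to an integer 0..2**bits - 1."""
    top = levels(bits) - 1
    clamped = max(-1.0, min(1.0, amplitude))
    return round((clamped + 1.0) / 2.0 * top)

print(levels(8), levels(16))     # 256 and 65536 values
print(quantise(1.0, 8))          # highest 8-bit value: 255
print(quantise(1.0, 16))         # highest 16-bit value: 65535
print(quantise(0.3, 8), quantise(0.3, 16))
```

The same amplitude lands on a much finer scale at 16 bits, so the stored number is closer to the true value.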

Tech Terms
Signal-to-noise ratio
Often abbreviated to SNR, it’s the ratio between the level of the signal you’re recording and the background noise inherent in the system. It’s closely related to dynamic range because the greater the ratio, the wider the difference there can be between the quietest and loudest parts of a signal without noise getting in the way.

Decibel
Abbreviated to dB, this is technically a relative measure of the difference between two signal levels, which can be confusing. The important thing to remember is that a 6dB increase represents a doubling of the signal’s amplitude.

Dynamic range

Dynamic range is the difference between the quietest and loudest parts of a signal and is closely related to the number of bits in the sample resolution. The more bits, the greater the difference between the lowest and highest sample values.

Dynamic Range Tip
A quick way to estimate the maximum dynamic range of a recording, in dB, is to multiply the number of bits by 6.

The ultimate goal in digital audio recording is to capture the full range of human hearing. The difference between the quietest and the loudest sounds we can hear is around 140dB. A 16-bit sample resolution gives us a maximum dynamic range of 96dB. That’s pretty good and a massive improvement over 8-bit recording with a range of (8×6) 48dB.

However, crank the resolution up to 24 bits and you get an ear-topping dynamic range of 144dB. Who could ask for more? Well, recording engineers for one. The problem is to do with clipping and headroom.
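The multiply-by-6 rule is a rounding of the decibel formula for an amplitude ratio, 20 × log10(ratio), applied to the number of available values. A small sketch, with `dynamic_range_db` as a hypothetical helper name, compares the two:

```python
import math

def dynamic_range_db(bits):
    """Ratio between the largest and smallest step, expressed in decibels."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    exact = round(dynamic_range_db(bits), 1)
    rule_of_thumb = bits * 6
    print(f"{bits}-bit: {exact} dB (rule of thumb: {rule_of_thumb} dB)")
```

Each extra bit doubles the number of values and so adds just over 6dB, which is why the quick calculation comes out so close.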

Clipping and headroom

Let’s say we’re using 16 bits and along comes a sound that requires a value of 65600. Working with a maximum dynamic range of 96dB, this is quite likely. The resolution only allows values up to 65535, so what happens? The sample is truncated, or clipped, and stuffed into the 65535 box, resulting in a distortion of the signal.

When recording, you try to get as much sound into the system as possible, but to avoid unwanted clipping you may want to lower the overall sound level to give you some headroom. In practice that may mean working with an effective resolution of just 14 or 15 bits. That means you’re reducing the dynamic range to 84dB or 90dB and, while that’s much better than analogue tape, it doesn’t fully realise the potential of 16 bits.
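Both ideas can be sketched in a few lines of Python. The unsigned 16-bit ceiling follows the numbering used in this article, the 6dB-per-bit figure is the rule of thumb from the tip above, and the function names are assumptions for illustration.

```python
MAX_16_BIT = 2 ** 16 - 1   # highest value a 16-bit sample can hold: 65535

def clip(value, low=0, high=MAX_16_BIT):
    """Force a value into the storable range, as a converter does on overload."""
    return max(low, min(high, value))

def range_with_headroom_db(bits, headroom_bits, db_per_bit=6):
    """Approximate dynamic range left after reserving some bits as headroom."""
    return (bits - headroom_bits) * db_per_bit

print(clip(65600))                     # too large, stored as 65535
print(clip(40000))                     # fits, stored unchanged
print(range_with_headroom_db(16, 2))   # 14 effective bits: 84 dB
print(range_with_headroom_db(16, 1))   # 15 effective bits: 90 dB
print(range_with_headroom_db(24, 2))   # 22 effective bits: 132 dB
```

Every sample that hits the ceiling is stored as the same maximum value, which is what flattens the peaks of a clipped waveform.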

If a waveform is clipped, it will have flat areas at its highest and lowest peaks.

However, if you drop a few bits from a 24-bit recording you still get a stupendous 132-138dB dynamic range so you don’t have to ride the input levels so carefully. Even many consumer sound cards now support 24-bit recording. But up that to 32-bit and you can afford a whole 8 bits headroom while still achieving the full dynamic range of human hearing. Just as 24-bit recording is now very affordable, one day all recording systems will support 32-bit. Oh happy day.

One thing ‘old’ recording engineers working with tape used to do was to push the recording levels into the red. This caused a saturation effect on the tape, a little like compression, creating a warm sound. You cannot do this with digital recording – all you’ll do is clip – so don’t try. There are lots of effects you can apply later to ‘warm up’ the sound if you feel it necessary.

Size matters

So after all that, it should be clear that higher sample rates and higher sample resolutions equal higher-quality audio. The major drawback of high-rated systems is that they require increased storage space and processing power. A 96kHz/24-bit recording, for example, requires a little over three times the storage space of a 44.1kHz/16-bit recording.
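The storage cost is easy to estimate for uncompressed audio: sample rate × bytes per sample × number of channels. The sketch below assumes stereo and no compression, and `bytes_per_second` is a hypothetical helper name.

```python
def bytes_per_second(sample_rate_hz, bits, channels=2):
    """Data rate of uncompressed audio (stereo by default)."""
    return sample_rate_hz * (bits // 8) * channels

cd_quality = bytes_per_second(44100, 16)
high_res = bytes_per_second(96000, 24)

print(cd_quality, "bytes per second at 44.1kHz/16-bit")
print(high_res, "bytes per second at 96kHz/24-bit")
print(round(high_res / cd_quality, 2), "times the storage")
print(round(cd_quality * 60 / 1_000_000, 1), "MB per minute at CD quality")
```

The ratio does not depend on the number of channels, since both recordings are multiplied by the same figure.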

In time, as computers become increasingly powerful and cheaper, and hard disks grow larger, this will not be a consideration but currently, if you want the very highest quality, it is. You must, therefore, consider the trade-off between digital audio quality on the one hand and storage space and processing power on the other. If your system won’t run to higher rates, remember that ‘humble’ audio CD quality is still far superior to the analogue reel-to-reel tape systems that produced state-of-the-art recordings not so many years ago. But if you’re planning or upgrading a digital audio system, think about incorporating the higher rates and resolutions into it.

For more info…

Perhaps we could modestly direct you towards The Quick Guide To Digital Audio Recording.