Screaming Stone Design

About MP3 Encoding

All commercially available MP3s are encoded incorrectly and in this post I will explain why.

If you purchase an MP3 online from the likes of Amazon, for example, it will typically use constant bitrate encoding (CBE, more commonly abbreviated CBR) at 128 kilobits per second (kbps) with joint stereo.

This is considered an acceptable standard but because of the low bitrate it is possible to hear many “artefacts” and weird warbles in every recording.

To me this is not acceptable: MP3s cost nearly as much as compact discs yet are far inferior, so you are not getting what you pay for.

In order to explain this better I will be using spectrograms.

A spectrogram is a visual representation of the spectrum of frequencies of a sound (or piece of music) as it varies with time - you can consider each to be a map of a piece of audio.
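To make the idea concrete, here is a minimal sketch of how a spectrogram can be computed: the audio is cut into short overlapping slices and the strength of each frequency is measured in every slice. The function name, the slice size and the 8 kHz test tone are assumptions chosen for illustration, and this is not the tool used to produce the images in this post.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return one magnitude spectrum (a list of floats) per time slice."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window tapers the slice edges to reduce spectral leakage.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size))
            for n, s in enumerate(frame)
        ]
        spectrum = []
        # Plain discrete Fourier transform, keeping non-negative frequencies.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            spectrum.append(abs(acc))
        rows.append(spectrum)
    return rows

SAMPLE_RATE = 8000  # assumed sample rate for the toy signal
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]

rows = spectrogram(tone)
# The strongest bin in the first slice should sit at the tone's frequency.
peak_bin = max(range(len(rows[0])), key=rows[0].__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / 64
```

Each row is one vertical strip of the picture. Plotting the rows side by side, with brightness standing for magnitude, gives the kind of map shown in the images here.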

The following spectrogram is of a song from a CD which I own.

original sound spectrogram

When an audio recording is compressed it should be done in such a way that when it is uncompressed it is as close to the original as possible.

The following is the spectrogram of the same recording but after it has been compressed with 128 kbps constant bitrate encoding (CBE):

128 kbps spectrogram

At first glance these look identical, and for most people they even sound identical, but if you look carefully you will see that nearly all of the frequencies above 16 kHz have been cut off.

Far more than just the higher frequencies have been cut off - a huge amount of the original data is missing from the 128 kbps version.

The following shows much more clearly the difference between the original recording and the 128 kbps CBE version - basically it is a spectrogram of the sounds which are missing:

spectrogram of difference between original and 128 kbps version
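A difference like this can be produced by subtracting the decoded audio from the original, sample by sample. The sketch below uses a toy signal rather than a real MP3: the "decoded" version simply drops an 18 kHz component, which is an assumption standing in for what an encoder discards. With real files the two signals would also need to be aligned first, since decoders usually add a short delay.

```python
import math

SAMPLE_RATE = 44100  # assumed CD sample rate

def tone(freq_hz, count, amplitude=1.0):
    """Generate `count` samples of a sine wave."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n in range(count)
    ]

def difference(original, decoded):
    """Sample-by-sample residual: what the decoded version is missing."""
    if len(original) != len(decoded):
        raise ValueError("signals must be the same length and aligned")
    return [a - b for a, b in zip(original, decoded)]

low = tone(440, 1000)
high = tone(18000, 1000, amplitude=0.1)
original = [a + b for a, b in zip(low, high)]

# Toy stand-in for a lossy codec that throws away the 18 kHz component.
decoded = low

missing = difference(original, decoded)
# The residual should match the discarded high tone almost exactly.
max_error = max(abs(m - h) for m, h in zip(missing, high))
```

Taking the spectrogram of `missing`, or playing it back, shows exactly what was lost.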

An even better way to understand how much audio is missing from a 128 kbps CBE encoded file is to listen to the missing audio.

For many people the solution to this is simply to encode MP3s at the highest constant bitrate available - 320 kilobits per second.

This next spectrogram shows the difference between the original recording and the 320 kbps CBE version:

spectrogram of difference between original and 320 kbps version

It is clearly far superior to the 128 kbps CBE version as far less of the sound is missing but this creates an entirely new problem.

Because the compressed data needs much more space it means the MP3 file is bigger - MUCH bigger.

The 128 kbps CBE version of the MP3 file is 4.8 MB in size whereas the 320 kbps CBE version is a whopping 12.0 MB, 2.5 times the size.
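The sizes follow directly from the bitrate: a constant bitrate file needs bitrate × duration bits. Here is a rough sketch of that arithmetic. The 300 second duration is an assumed round figure, picked because it reproduces the sizes quoted above, and the calculation ignores tags and other overhead and takes 1 MB to mean 1,000,000 bytes.

```python
def cbr_size_mb(bitrate_kbps, duration_s):
    """Approximate size in MB of a constant bitrate file."""
    bits = bitrate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000

DURATION_S = 300  # hypothetical duration, see the note above

small = cbr_size_mb(128, DURATION_S)  # about 4.8 MB
large = cbr_size_mb(320, DURATION_S)  # about 12.0 MB
ratio = large / small                 # same as 320 / 128

print(f"{small:.1f} MB vs {large:.1f} MB, ratio {ratio:.2f}")
```

Because the duration cancels out, the size ratio is always the bitrate ratio, 320/128 = 2.5, whatever the length of the song.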

The solution to both problems is to use variable bitrate encoding (VBE, more commonly abbreviated VBR).

Frames

In order to understand how VBE works it is first necessary to understand how MP3s are subdivided for encoding.

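An MP3 file is stored as a sequence of short segments called frames, each of which holds 1152 audio samples.

So, at the 44.1 kHz sample rate used by CDs, a frame is 1152/44100 of a second long, roughly 26 milliseconds, which works out to about 38 frames per second. (CD audio itself is divided into 75 sectors per second, but that is a different unit.)

The song which I have used as an example above is 11450 frames long, and this equates to 11450 × 1152/44100 seconds, which is about 299 seconds (roughly 4 minutes and 59 seconds). That agrees with the file sizes quoted earlier, since 4.8 MB at 128 kbps is also about 300 seconds of audio.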
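The conversion from frame count to playing time can be sketched as follows. It relies on the MPEG-1 Layer III frame size of 1152 samples and assumes a 44.1 kHz sample rate, as used on CDs. The function and constant names are made up for this example.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III
SAMPLE_RATE_HZ = 44100    # assumed: CD-sourced audio

def frames_to_seconds(frame_count, sample_rate_hz=SAMPLE_RATE_HZ):
    """Playing time covered by a given number of MP3 frames."""
    return frame_count * SAMPLES_PER_FRAME / sample_rate_hz

frame_ms = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ    # roughly 26 ms
frames_per_second = SAMPLE_RATE_HZ / SAMPLES_PER_FRAME  # roughly 38

duration_s = frames_to_seconds(11450)  # roughly 299 s
minutes, seconds = divmod(duration_s, 60)
print(f"{int(minutes)} min {seconds:.1f} s")
```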

When a piece of audio is compressed using variable bitrate encoding (VBE) each frame uses only as many bits of data as is necessary to store the compressed data so that it can be uncompressed to accurately reproduce the original audio.

To understand this better look at the following example of the above song as compressed by the LAME encoder:

audio encoded by LAME using variable bitrate encoding

You can see that 5662 of the 11450 frames were encoded at 224 kbps because that was the minimum amount of data required by each of those frames in order to reproduce them accurately.

You can see also that only 17 of the entire 11450 frames were encoded at 320 kbps as only those 17 frames needed that much data to encode them!

Notice too that 139 of those frames needed as little as 32 kbps to encode them.
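To put those counts in proportion, here is a small sketch that turns them into percentages of the whole song. Only the three bitrates mentioned above are included. The remaining frames fall in other bitrate bins that are not listed here, so the shares do not add up to 100.

```python
TOTAL_FRAMES = 11450

# Frame counts quoted in the text, keyed by bitrate in kbps.
frames_at_kbps = {224: 5662, 320: 17, 32: 139}

def share_percent(count, total=TOTAL_FRAMES):
    """Percentage of all frames that `count` represents."""
    return 100 * count / total

shares = {
    kbps: round(share_percent(count), 2)
    for kbps, count in frames_at_kbps.items()
}

for kbps, pct in sorted(shares.items()):
    print(f"{kbps:>3} kbps: {pct}% of frames")
```

Roughly half of the frames sit at 224 kbps, while the frames that genuinely needed 320 kbps are a small fraction of one percent.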

If you encoded the same song using 320 kbps constant bitrate encoding it would look like the following:

audio encoded by LAME using constant bitrate encoding at 320 kbps

It should be fairly obvious that encoding every single frame, especially ones that really don't need it, at 320 kbps is incredibly wasteful.
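The waste is easy to quantify per frame. For MPEG-1 Layer III the size of a frame in bytes is 144 × bitrate / sample rate, rounded down, plus an optional padding byte. The sketch below applies that formula, assuming a 44.1 kHz sample rate. The function name is made up for this example.

```python
def frame_bytes(bitrate_kbps, sample_rate_hz=44100, padding=0):
    """Size in bytes of one MPEG-1 Layer III frame at the given bitrate."""
    return 144 * bitrate_kbps * 1000 // sample_rate_hz + padding

for kbps in (32, 128, 224, 320):
    print(f"{kbps:>3} kbps -> {frame_bytes(kbps)} bytes per frame")

# Bytes spent on a frame that only needed 32 kbps but was stored at 320 kbps.
wasted = frame_bytes(320) - frame_bytes(32)
```

A frame that could have been stored at 32 kbps takes about ten times the space when it is forced to 320 kbps.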

This song, when encoded using VBE, is only 8.9 MB - it is larger than the 128 kbps CBE version but smaller than the 320 kbps CBE version.
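The three file sizes also give a rough idea of the average bitrate of the variable bitrate file. All three files cover the same song, so size scales with average bitrate. This is only a sketch: it ignores tags and headers, and the function name is invented for the example.

```python
def average_kbps(vbr_size_mb, cbr_size_mb, cbr_kbps):
    """Estimate a VBR file's average bitrate from a CBR file of the same song."""
    return cbr_kbps * vbr_size_mb / cbr_size_mb

from_128 = average_kbps(8.9, 4.8, 128)   # using the 128 kbps file as reference
from_320 = average_kbps(8.9, 12.0, 320)  # using the 320 kbps file as reference

print(f"about {from_128:.0f} kbps on average")
```

Both references give an average of roughly 237 kbps, which fits a file in which about half of the frames are at 224 kbps.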

The following is the spectrogram of the VBE version of the song:

variable bitrate encoding spectrogram

It isn't as close to the original audio as the 320 kbps CBE version, as you can see by looking below at the difference between it and the original, but it is far superior in quality to the 128 kbps CBE version and it is “perceptually” almost identical to the original:

spectrogram of difference between original and variable bitrate encoded version

While it is possible to hear the imperfections of a file encoded at 128 kbps CBE (although many people can't), it is far less likely that anyone will hear such imperfections when that file is compressed using variable bitrate encoding (although a few people will).

Hopefully you will realize, after reading this, that any of the MP3s you can purchase online are just pure junk and although they are very handy for portable listening, as so many devices can now play them, they are never going to be as good as compact discs.

Instead, do what I do: purchase your music in compact disc form and encode your own MP3s.