Tips and tricks for working with PCM data

Our models take raw PCM samples as input, and output raw PCM samples. This matters when we process streams of audio, because we don’t always work on files. Audio can stream in from a network source, a physical hardware source, or from somewhere else. For this reason we have developed a version of our inference engine that is an audio stream processor. On one of our Linux workstations we can simply pipe the PCM samples through the model and get the output.
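To make the piping idea concrete, here is a minimal sketch of a stream processor for raw PCM. It is not our inference engine: process_stream and halve are hypothetical names, the stream is assumed to be signed 16-bit little-endian, and the "model" is just a gain change standing in for real processing.

```python
import struct


def halve(samples):
    """Stand-in for a model: attenuate every sample by half."""
    return [s // 2 for s in samples]


def process_stream(inp, out, transform, chunk_frames=1024, channels=1):
    """Read raw s16le PCM from inp, transform each chunk, write s16le to out.

    transform must return as many samples as it receives.
    """
    bytes_per_chunk = chunk_frames * channels * 2  # 2 bytes per 16-bit sample
    while True:
        data = inp.read(bytes_per_chunk)
        if not data:
            break  # end of stream
        n = len(data) // 2  # whole samples only; a trailing odd byte is dropped
        samples = struct.unpack(f"<{n}h", data[: n * 2])
        out.write(struct.pack(f"<{n}h", *transform(samples)))
```

In a script, calling process_stream(sys.stdin.buffer, sys.stdout.buffer, halve) would let it sit in the middle of a shell pipeline, reading PCM from one process and handing the result to the next.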

When we train models, the requirement is the same. Our training data sets can come in many types of media containers, so extracting the raw binary PCM samples from all source data simplifies the organization of training data.

Before our models can train on or process real-world audio, there are many issues to deal with, such as different compression algorithms, bit depths, byte ordering (endianness), and channel interleaving.
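Two of these issues, byte order and interleaving, are easy to show in a few lines. The sketch below assumes signed 16-bit samples; decode_s16, deinterleave and interleave are illustrative helper names, not part of any library.

```python
import struct


def decode_s16(raw, big_endian=False):
    """Decode raw bytes into signed 16-bit samples with an explicit byte order."""
    n = len(raw) // 2  # whole samples only
    fmt = (">" if big_endian else "<") + f"{n}h"
    return list(struct.unpack(fmt, raw[: n * 2]))


def deinterleave(samples, channels):
    """Split [L0, R0, L1, R1, ...] into one list per channel."""
    return [samples[c::channels] for c in range(channels)]


def interleave(per_channel):
    """Merge per-channel lists back into frame order."""
    return [s for frame in zip(*per_channel) for s in frame]
```

The same eight bytes decode to different sample values depending on the byte order chosen, which is why raw PCM always needs that detail supplied from outside the data.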

Fortunately, there are some great tricks we rely on regularly for extracting PCM samples from different kinds of encoding. Here are a few of them.

soxi output

Let’s say I have the following WAV file: mono, 44.1 kHz, 32-bit. We can use the awesome soxi command as depicted above to view some important characteristics of the file, such as:

the number of channels
the bit-depth
the encoding of the samples
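For integer-encoded WAV files, roughly the same characteristics can be read with Python's standard wave module. This is a sketch, not a soxi replacement: describe_wav is a hypothetical helper, and the wave module rejects floating-point WAV files, so it only covers the integer PCM case.

```python
import wave


def describe_wav(path):
    """Return basic properties of an integer PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "frames": wav.getnframes(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
```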

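We’ll stick to signed integers for the 16-bit examples in this short tutorial. Note that the 32-bit commands below write 32-bit floating-point samples (f32le); for signed 32-bit integers, substitute s32le and pcm_s32le.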

Now I want to extract just the PCM data. I can do so using ffmpeg:

ffmpeg -i mono-32bit.wav -f f32le -acodec pcm_f32le mono-32bit.pcm

To break down this command:

-i is for the input file
-f is for the output format (it applies to the file that follows it), in our case raw f32le
-acodec is for the output audio codec: pcm_f32le
the output filename

I can then play the PCM audio by invoking the ffplay command. It just has to be told exactly what it’s playing, because this information normally lives in the fmt chunk of the WAV header, which is not included when only the PCM data is saved.

ffplay -f f32le -ar 44100 -ac 1 mono-32bit.pcm

To break down this command:

-f is for the format: f32le
-ar is for the sample rate
-ac is the number of channels
the filename of the source data
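Because a raw PCM file carries no header, even its duration has to be worked out from the byte count and the parameters we pass on the command line. A small sketch of that arithmetic, with pcm_duration as a hypothetical helper name:

```python
def pcm_duration(num_bytes, sample_rate, channels, bytes_per_sample):
    """Duration in seconds of headerless PCM data of the given size."""
    frame_size = channels * bytes_per_sample  # one sample per channel
    if num_bytes % frame_size:
        raise ValueError("byte count is not a whole number of frames")
    return (num_bytes // frame_size) / sample_rate
```

For the file above, pcm_duration(os.path.getsize("mono-32bit.pcm"), 44100, 1, 4) would give its length in seconds; a wrong channel count or sample width gives a wrong answer, or a ValueError if the size does not divide evenly.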

There’s a screenshot below of the ffplay command running. Check out the cool spectrogram :)

ffplay output

If I want to convert the original WAV file to 16-bit, I can do so using the following command:

ffmpeg -i mono-32bit.wav -c:a pcm_s16le -ar 44100 mono-16bit.wav

To break down this command:

-i is for the input file
-c:a pcm_s16le sets the codec for the output audio stream(s) to signed 16-bit little-endian PCM
-ar is the output sampling rate
the filename of the output data
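To give a feel for what a reduction to 16 bits involves, here is a simplified sketch of the two common cases. It is an illustration only: ffmpeg's own conversion may round or dither differently, and s32_to_s16 and f32_to_s16 are hypothetical helper names.

```python
def s32_to_s16(samples):
    """Keep the 16 most significant bits of each signed 32-bit sample."""
    return [s >> 16 for s in samples]


def f32_to_s16(samples):
    """Scale floats in [-1.0, 1.0] to signed 16-bit, clipping anything outside."""
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip before scaling to avoid overflow
        out.append(int(round(x * 32767)))
    return out
```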

I can then extract the 16-bit PCM samples like so:

ffmpeg -i mono-16bit.wav -f s16le -acodec pcm_s16le mono-16bit.pcm

To play it, run the following similar command:

ffplay -f s16le -ar 44100 -ac 1 mono-16bit.pcm
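Going the other way, raw 16-bit PCM can be wrapped back into a WAV container once the missing parameters are supplied. A minimal sketch using Python's standard wave module follows; pcm_to_wav is a hypothetical helper, and the defaults mirror the mono, 44.1 kHz, 16-bit file used here.

```python
import wave


def pcm_to_wav(pcm_bytes, wav_path, sample_rate=44100, channels=1, sample_width=2):
    """Write raw integer PCM bytes into a WAV file with the given parameters.

    Assumes little-endian sample data and a little-endian host, in which
    case the bytes are stored unchanged.
    """
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # in bytes: 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
```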
