Tips and tricks for working with PCM data
Our models take raw PCM samples as input and produce raw PCM samples as output. This matters because we process streams of audio; we don’t always work on files. Audio can stream in from a network source, a physical hardware source, or somewhere else. For this reason we have developed a version of our inference engine that works as an audio stream processor. On one of our Linux workstations we can simply pipe the PCM samples through the model and get the output.
When we train models, the requirement is the same. Our training datasets come in many types of media containers, so extracting the raw binary PCM samples from all source data simplifies how we organize training data.
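A minimal sketch of the piping idea in Python, assuming headerless PCM arrives on stdin and leaves on stdout. The names stream_pcm, process and frame_bytes are hypothetical and not part of our inference engine. The point it illustrates is that a stream processor should hand the model whole frames, because a chunk boundary can fall in the middle of a sample when the chunk size is not a multiple of the frame size or the source delivers short reads.

```python
from typing import BinaryIO, Callable


def stream_pcm(src: BinaryIO, dst: BinaryIO,
               process: Callable[[bytes], bytes],
               frame_bytes: int = 2,
               chunk_bytes: int = 4096) -> None:
    """Pipe raw PCM from src to dst in chunks, passing only whole frames to process().

    frame_bytes is sample width x channels (2 for mono s16le). This is a
    hypothetical helper for illustration, not a real library API.
    """
    pending = b""
    while True:
        data = src.read(chunk_bytes)
        if not data:  # EOF: a trailing partial frame, if any, is dropped
            break
        pending += data
        # Hand over only complete frames; keep any split frame for the next read.
        usable = len(pending) - len(pending) % frame_bytes
        if usable:
            dst.write(process(pending[:usable]))
            pending = pending[usable:]
    dst.flush()
```

It could be wired into a shell pipeline along these lines, where stream.py is a hypothetical script that calls stream_pcm(sys.stdin.buffer, sys.stdout.buffer, model_fn) and model_fn stands in for the model:
ffmpeg -i input.wav -f s16le -acodec pcm_s16le pipe:1 | python stream.py | ffplay -f s16le -ar 44100 -ac 1 pipe:0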
Before our models can train on or process real-world audio, there are many issues to deal with, such as different compression algorithms, bit depths, byte ordering (endianness), and channel interleaving that has to be undone.
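To make byte ordering and interleaving concrete, here is a small sketch using only Python’s struct module. It assumes signed 16-bit samples, and decode_interleaved is a hypothetical helper name. Multichannel PCM is stored frame by frame (left, right, left, right, ...), so undoing the interleaving amounts to taking every n-th sample, while the endianness flag decides how each pair of bytes is read.

```python
import struct


def decode_interleaved(raw: bytes, channels: int, big_endian: bool = False) -> list[list[int]]:
    """Split interleaved signed 16-bit PCM into one list of samples per channel."""
    frame_bytes = 2 * channels
    if len(raw) % frame_bytes:
        raise ValueError("byte count is not a whole number of frames")
    order = ">" if big_endian else "<"  # struct byte-order prefix
    count = len(raw) // 2
    samples = struct.unpack(f"{order}{count}h", raw)
    # Interleaved layout is ch0 ch1 ... ch0 ch1 ...; every `channels`-th sample
    # starting at offset ch belongs to channel ch.
    return [list(samples[ch::channels]) for ch in range(channels)]
```

For example, decode_interleaved(struct.pack("<4h", 1, -1, 2, -2), channels=2) returns [[1, 2], [-1, -2]]. Reading bytes with the wrong byte order does not raise an error; it silently yields wrong values (a big-endian 1 read as little-endian becomes 256), which is why the format has to be known before training.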
Fortunately, there are some great tricks we rely on regularly for extracting PCM samples from different kinds of encoding. Here are a few of them.
Let’s say I have the following WAV file: mono, 44.1 kHz, 32-bit. We can use the awesome soxi command as depicted above to view some important characteristics of the file, such as:
- the number of channels
- the bit-depth
- the encoding of the samples
We’ll stick to signed integers as our encoding throughout this short tutorial. The one exception is the first raw dump below, which ffmpeg writes out as 32-bit floats (f32le).
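If you want similar information from inside a Python script, the standard wave module reads it from the WAV header. This is a sketch with a hypothetical helper name, describe_wav. It assumes uncompressed integer PCM: the wave module does not read 32-bit float WAV files, so it fits the signed-integer files used in this tutorial.

```python
import wave


def describe_wav(path_or_file) -> dict:
    """Report roughly what soxi shows: channels, sample rate, bit depth, length."""
    with wave.open(path_or_file, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "bit_depth": 8 * wf.getsampwidth(),  # getsampwidth() is in bytes
            "frames": frames,
            "duration_s": frames / rate,
        }
```

Calling describe_wav("mono-16bit.wav") on a mono 16-bit file at 44.1 kHz would report 1 channel, a 44100 Hz sample rate and a bit depth of 16.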
Now I want to extract just the PCM data. I can do so using ffmpeg:
ffmpeg -i mono-32bit.wav -f f32le -acodec pcm_f32le mono-32bit.pcm
To break down this command:
- -i is for the input file
- -f sets the output format, in our case raw f32le (32-bit float, little-endian, no header)
- -acodec sets the output audio codec: pcm_f32le
- the output filename
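When many source files have to be converted, as with our training data, the same extraction can be scripted. A sketch that only builds the ffmpeg argument list for the command above; the helper name extract_pcm_cmd and the small format table are assumptions for illustration. Returning a list lets you pass it to subprocess.run(cmd, check=True) without going through a shell.

```python
# Raw output format -> matching PCM codec name (a small, hypothetical subset).
RAW_FORMATS = {
    "s16le": "pcm_s16le",
    "s32le": "pcm_s32le",
    "f32le": "pcm_f32le",
}


def extract_pcm_cmd(src: str, dst: str, fmt: str = "f32le") -> list[str]:
    """Build the ffmpeg argv that dumps src's audio to headerless PCM in dst."""
    if fmt not in RAW_FORMATS:
        raise ValueError(f"unsupported raw format: {fmt}")
    return ["ffmpeg", "-i", src, "-f", fmt, "-acodec", RAW_FORMATS[fmt], dst]
```

extract_pcm_cmd("mono-32bit.wav", "mono-32bit.pcm") reproduces the command above as a list of arguments.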
I can then play the PCM audio with the ffplay command. It just has to be told exactly what it’s playing, because this information normally lives in the WAV file’s fmt chunk, which is not included when only the PCM data is saved.
ffplay -f f32le -ar 44100 -ac 1 mono-32bit.pcm
To break down this command:
- -f is for the input format: f32le
- -ar is for the sample rate: 44100 Hz
- -ac is the number of channels
- the filename of the source data
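Because the raw .pcm file carries no header, the -f, -ar and -ac values are the only description of the data. The duration of a raw file follows directly from them: bytes / (sample rate × channels × bytes per sample), so one second of mono 16-bit audio at 44.1 kHz is 88,200 bytes. A sketch of that arithmetic, plus the reverse of extraction: wrapping headerless PCM back into a WAV container with Python’s wave module. pcm_to_wav_bytes and pcm_duration_seconds are hypothetical names; wave writes integer PCM only and treats the frames as host byte order, so this assumes s16le-style integer data on a little-endian machine.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int, channels: int, sample_width: int) -> bytes:
    """Wrap headerless integer PCM in a WAV header (what -ar/-ac/-f tell ffplay)."""
    if len(pcm) % (channels * sample_width):
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample, e.g. 2 for s16le
        wf.setframerate(sample_rate)
        # wave assumes host byte order; correct for little-endian hosts only.
        wf.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(num_bytes: int, sample_rate: int, channels: int, sample_width: int) -> float:
    """Length of headerless PCM in seconds."""
    return num_bytes / (sample_rate * channels * sample_width)
```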
There’s a screenshot below of the ffplay command running. Check out the cool spectrogram :)
If I want to convert the original WAV file to 16-bit, I can do so using the following command:
ffmpeg -i mono-32bit.wav -c:a pcm_s16le -ar 44100 mono-16bit.wav
To break down this command:
- -i is for the input file
- -c:a pcm_s16le sets the codec for the audio stream(s) to pcm_s16le, i.e. signed 16-bit little-endian integers
- -ar sets the output sample rate (44100 Hz, the same as the source)
- the filename of the output data
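Reducing bit depth means mapping each sample onto a smaller integer range. A sketch assuming the source holds signed 32-bit little-endian integers (s32le), in keeping with the tutorial’s signed-integer convention: keep the top 16 bits of each sample with an arithmetic right shift. s32le_to_s16le is a hypothetical name, and this is plain truncation without dithering; ffmpeg’s own converter may handle the details differently.

```python
import struct


def s32le_to_s16le(raw: bytes) -> bytes:
    """Reduce signed 32-bit little-endian PCM to signed 16-bit by keeping the top 16 bits."""
    if len(raw) % 4:
        raise ValueError("input is not a whole number of 32-bit samples")
    count = len(raw) // 4
    samples32 = struct.unpack(f"<{count}i", raw)
    # Arithmetic shift: 2147483647 -> 32767, -2147483648 -> -32768 (truncation, no dither).
    samples16 = [s >> 16 for s in samples32]
    return struct.pack(f"<{count}h", *samples16)
```

The output is half the size of the input, which matches the drop from 4 to 2 bytes per sample.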
I can then extract the 16-bit PCM samples like so:
ffmpeg -i mono-16bit.wav -f s16le -acodec pcm_s16le mono-16bit.pcm
To play it, run a similar command, this time pointing ffplay at the raw .pcm file:
ffplay -f s16le -ar 44100 -ac 1 mono-16bit.pcm
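To get the extracted samples into Python, for example to feed a model, the standard array module can load raw s16le bytes in one call. A sketch with a hypothetical helper name, read_s16le: array stores items in host byte order, so the bytes are swapped only on big-endian machines.

```python
import sys
from array import array


def read_s16le(data: bytes) -> array:
    """Decode raw signed 16-bit little-endian PCM bytes into an array of ints."""
    samples = array("h")
    if samples.itemsize != 2:
        raise RuntimeError("array('h') is not 16-bit on this platform")
    samples.frombytes(data[: len(data) - len(data) % 2])  # ignore a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # the file is little-endian; convert to host order
    return samples
```

For example, reading mono-16bit.pcm with open(..., "rb") and passing its contents to read_s16le gives one integer per sample, and len(samples) / 44100 gives the length in seconds for mono 44.1 kHz audio.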