Reading and Writing Audio Files with wave

back to overview page

The wave module is part of the Python standard library.

Documentation:

Audio data is handled with Python strings/buffers.

Advantages:

  • part of the standard library, no further dependencies
  • 24-bit files can be used (with manual conversion)
  • partial reading is possible
  • works with both Python 2 and 3

Disadvantages:

  • 32-bit float not supported
  • WAVEX doesn't work
  • manual de-interleaving and conversion is necessary

Reading

Reading a 16-bit WAV file into a NumPy array is not hard, but it requires a few lines of code:

In [1]:
import wave
import numpy as np
from contextlib import closing
from utility import pcm2float

with closing(wave.open('data/test_wav_pcm16.wav')) as w:
    channels = w.getnchannels()
    assert w.getsampwidth() == 2
    data = w.readframes(w.getnframes())

sig = np.frombuffer(data, dtype='<i2').reshape(-1, channels)

normalized = pcm2float(sig, np.float32)

But not so fast! Let's do that step-by-step and put a few explanations in between.

First, let's enable inline plotting and load all the NumPy goodies into the local namespace:

In [2]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib

The only module we really need to load is wave. However, let's also load contextlib.closing in order to be able to automagically close all our opened files:

In [3]:
import wave
from contextlib import closing

# Note: starting from Python 3.4, wave.open() returns a context manager, closing() won't be necessary anymore

Now we open a 16-bit WAV file, show a few informations and read its contents.

In [4]:
with closing(wave.open('data/test_wav_pcm16.wav')) as w:
    framerate = w.getframerate()
    frames = w.getnframes()
    channels = w.getnchannels()
    width = w.getsampwidth()
    print("sampling rate = {framerate} Hz, length = {frames} samples, channels = {channels}, sample width = {width} bytes".format(**locals()))
    
    data = w.readframes(frames)
sampling rate = 44100 Hz, length = 15 samples, channels = 7, sample width = 2 bytes

We see that the sample width is 2 bytes, which we expected for a 16-bit file.

If the audio file has 7 channels, is 15 samples long and uses 2 bytes (16 bit) per sample, the buffer has a size of $7 \times 15 \times 2 = 210$ bytes.

In [5]:
len(data)
Out[5]:
210

Audio data is stored in a Python buffer object (basically a contiguous memory area with bytes of data). In Python 2 this looks somehow like a string object, in Python 3 it's an object of type bytes.

In [6]:
type(data)
Out[6]:
bytes

The buffer data looks like this (which isn't really helpful):

In [7]:
data
Out[7]:
b'\xff\x7f\xb6mm[$I\xdb6\x92$I\x12RsgDX\x14\xba\xef\xcc\xdd\r\xdf\xb7\xed\xceO\x96\xe7\xa1\xad\x1a\xbe\xcb\xf3\xcd\x16I\x12{\x1c\'\x9d\xff\xc6\x9a-l1\xdd\xf7\xb7\xed\x85\xe3\'\x9d\x019\x9a-\x94\xce\xdd\xf7I\x122\xb0\x96\xe7_R\x1a\xbe5\x0c\xcd\x16\xb7\xed\xae\x8cgD\xa8\xeb\xba\xef4"\r\xdfI\x12\x01\x80\xb6m\x93\xa4$I%\xc9\x92$\xb7\xed\xae\x8cgD\xa8\xeb\xba\xef4"\r\xdfI\x122\xb0\x96\xe7_R\x1a\xbe5\x0c\xcd\x16\xb7\xed\x85\xe3\'\x9d\x019\x9a-\x94\xce\xdd\xf7I\x12{\x1c\'\x9d\xff\xc6\x9a-l1\xdd\xf7\xb7\xed\xceO\x96\xe7\xa1\xad\x1a\xbe\xcb\xf3\xcd\x16I\x12RsgDX\x14\xba\xef\xcc\xdd\r\xdf\xb7\xed\xff\x7f\xb6mm[$I\xdb6\x92$I\x12'

We could convert the bytes by means of the struct module, but it's easier with the frombuffer() function from NumPy. We have to specify (similar to how it's done in the struct module) how the bytes should be interpreted to yield the desired numbers.

If we would interpret them as single (unsigned) bytes, it would look like this (still not very useful):

In [8]:
frombuffer(data, dtype='B')
Out[8]:
array([255, 127, 182, 109, 109,  91,  36,  73, 219,  54, 146,  36,  73,
        18,  82, 115, 103,  68,  88,  20, 186, 239, 204, 221,  13, 223,
       183, 237, 206,  79, 150, 231, 161, 173,  26, 190, 203, 243, 205,
        22,  73,  18, 123,  28,  39, 157, 255, 198, 154,  45, 108,  49,
       221, 247, 183, 237, 133, 227,  39, 157,   1,  57, 154,  45, 148,
       206, 221, 247,  73,  18,  50, 176, 150, 231,  95,  82,  26, 190,
        53,  12, 205,  22, 183, 237, 174, 140, 103,  68, 168, 235, 186,
       239,  52,  34,  13, 223,  73,  18,   1, 128, 182, 109, 147, 164,
        36,  73,  37, 201, 146,  36, 183, 237, 174, 140, 103,  68, 168,
       235, 186, 239,  52,  34,  13, 223,  73,  18,  50, 176, 150, 231,
        95,  82,  26, 190,  53,  12, 205,  22, 183, 237, 133, 227,  39,
       157,   1,  57, 154,  45, 148, 206, 221, 247,  73,  18, 123,  28,
        39, 157, 255, 198, 154,  45, 108,  49, 221, 247, 183, 237, 206,
        79, 150, 231, 161, 173,  26, 190, 203, 243, 205,  22,  73,  18,
        82, 115, 103,  68,  88,  20, 186, 239, 204, 221,  13, 223, 183,
       237, 255, 127, 182, 109, 109,  91,  36,  73, 219,  54, 146,  36,
        73,  18], dtype=uint8)

To be able to do something useful with the data, we have to take pairs of 2 bytes and convert them to 16-bit integers. Thereby we have to consider that data in WAV files is stored in little endian format (the least significan byte comes first and then the most significant byte), which is specified with '<' in the format string.

Let's also reshape the array to get a column for each channel:

In [9]:
sig = frombuffer(data, dtype='<i2').reshape(-1, channels)
sig
Out[9]:
array([[ 32767,  28086,  23405,  18724,  14043,   9362,   4681],
       [ 29522,  17511,   5208,  -4166,  -8756,  -8435,  -4681],
       [ 20430,  -6250, -21087, -16870,  -3125,   5837,   4681],
       [  7291, -25305, -14593,  11674,  12652,  -2083,  -4681],
       [ -7291, -25305,  14593,  11674, -12652,  -2083,   4681],
       [-20430,  -6250,  21087, -16870,   3125,   5837,  -4681],
       [-29522,  17511,  -5208,  -4166,   8756,  -8435,   4681],
       [-32767,  28086, -23405,  18724, -14043,   9362,  -4681],
       [-29522,  17511,  -5208,  -4166,   8756,  -8435,   4681],
       [-20430,  -6250,  21087, -16870,   3125,   5837,  -4681],
       [ -7291, -25305,  14593,  11674, -12652,  -2083,   4681],
       [  7291, -25305, -14593,  11674,  12652,  -2083,  -4681],
       [ 20430,  -6250, -21087, -16870,  -3125,   5837,   4681],
       [ 29522,  17511,   5208,  -4166,  -8756,  -8435,  -4681],
       [ 32767,  28086,  23405,  18724,  14043,   9362,   4681]], dtype=int16)

Note that neither frombuffer() nor reshape() made a copy of the data. We are still using the buffer of bytes we got from readframes(). By using the .base attribute of the array we get the result of frombuffer() (before reshape()) and by using .base a second time, we get a reference to the original buffer object:

In [10]:
sig.base.base is data
Out[10]:
True

With the flags attribute we get a few details about the buffer. C_CONTIGUOUS means that the rows are contiguous (in row-major format, like it's used in C). We also see that sig doesn't "own" the data and that it's not writable:

In [11]:
sig.flags
Out[11]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False

We've already got the correct values but if we want to do some signal processing with the data, it's normally easier to convert the numers to floating point format and to normalize them to a range from -1 to 1.

To do that, I wrote a little helper function called pcm2float(), located in the file utility.py, let's load it and apply it to our signal:

In [12]:
from utility import pcm2float

normalized = pcm2float(sig, 'float32')

plot(normalized);

Because we change the data type from int16 to float32, a copy of the array is created, which now is writable and "owns" its data:

In [13]:
normalized.flags
Out[13]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

24-bit files can be opened with wave.open() as well, but the conversion is a little more complicated. (See also http://stackoverflow.com/a/17443437/500098)

In [14]:
with closing(wave.open('data/test_wav_pcm24.wav')) as w:
    framerate = w.getframerate()
    frames = w.getnframes()
    channels = w.getnchannels()
    width = w.getsampwidth()
    print("sampling rate = {framerate} Hz, length = {frames} samples, channels = {channels}, sample width = {width} bytes".format(**locals()))
    
    data = w.readframes(frames)
sampling rate = 44100 Hz, length = 15 samples, channels = 7, sample width = 3 bytes

Note that the sample width is 3 bytes (because we loaded a 24-bit file). Sadly, there is no 3-byte integer type in NumPy. Therefore, we have to fill each 3-byte number with a further byte and then convert it to a 4-byte integer. We can add this byte (filled with zeros) either as most significant or a least significant byte. This doesn't change the precision of the data, we just have to remember which one it was when we do calculations with the stored values. If we add the zero-byte as LSB, the resulting values will have the full range of a 4-byte integer, therefore we can use the pcm2float() function from above. If we would add the zero-byte as MSB, the range would be limited to a 3-byte integer and we would have to write a new function for normalization.

In [15]:
assert width == 3

temp = bytearray()

for i in range(0, len(data), 3):
    temp.append(0)
    temp.extend(data[i:i+3])

# Starting with an empty bytearray and extending it on each iteration might be slow.
# See further below for how to reserve all necessary memory in the beginning.

four_bytes = frombuffer(temp, dtype=uint8).reshape(-1, 4)
four_bytes
Out[15]:
array([[  0, 255, 255, 127],
       [  0, 219, 182, 109],
       [  0, 182, 109,  91],
       [  0, 146,  36,  73],
       [  0, 109, 219,  54],
       [  0,  73, 146,  36],
       [  0,  36,  73,  18],
       [  0, 242,  82, 115],
       [  0, 222, 103,  68],
       [  0,  67,  88,  20],
       [  0, 100, 185, 239],
       [  0,  17, 204, 221],
       [  0, 223,  12, 223],
       [  0, 220, 182, 237],
       [  0, 131, 206,  79],
       [  0,  22, 150, 231],
       [  0,  47, 160, 173],
       [  0, 191,  25, 190],
       [  0,  11, 203, 243],
       [  0,  74, 205,  22],
       [  0,  36,  73,  18],
       [  0, 145, 123,  28],
       [  0, 158,  38, 157],
       [  0, 199, 254, 198],
       [  0, 148, 154,  45],
       [  0, 177, 108,  49],
       [  0, 178, 220, 247],
       [  0, 220, 182, 237],
       [  0, 111, 132, 227],
       [  0, 158,  38, 157],
       [  0,  57,   1,  57],
       [  0, 148, 154,  45],
       [  0,  79, 147, 206],
       [  0, 178, 220, 247],
       [  0,  36,  73,  18],
       [  0, 125,  49, 176],
       [  0,  22, 150, 231],
       [  0, 209,  95,  82],
       [  0, 191,  25, 190],
       [  0, 245,  52,  12],
       [  0,  74, 205,  22],
       [  0, 220, 182, 237],
       [  0,  14, 173, 140],
       [  0, 222, 103,  68],
       [  0, 189, 167, 235],
       [  0, 100, 185, 239],
       [  0, 239,  51,  34],
       [  0, 223,  12, 223],
       [  0,  36,  73,  18],
       [  0,   1,   0, 128],
       [  0, 219, 182, 109],
       [  0,  74, 146, 164],
       [  0, 146,  36,  73],
       [  0, 147,  36, 201],
       [  0,  73, 146,  36],
       [  0, 220, 182, 237],
       [  0,  14, 173, 140],
       [  0, 222, 103,  68],
       [  0, 189, 167, 235],
       [  0, 100, 185, 239],
       [  0, 239,  51,  34],
       [  0, 223,  12, 223],
       [  0,  36,  73,  18],
       [  0, 125,  49, 176],
       [  0,  22, 150, 231],
       [  0, 209,  95,  82],
       [  0, 191,  25, 190],
       [  0, 245,  52,  12],
       [  0,  74, 205,  22],
       [  0, 220, 182, 237],
       [  0, 111, 132, 227],
       [  0, 158,  38, 157],
       [  0,  57,   1,  57],
       [  0, 148, 154,  45],
       [  0,  79, 147, 206],
       [  0, 178, 220, 247],
       [  0,  36,  73,  18],
       [  0, 145, 123,  28],
       [  0, 158,  38, 157],
       [  0, 199, 254, 198],
       [  0, 148, 154,  45],
       [  0, 177, 108,  49],
       [  0, 178, 220, 247],
       [  0, 220, 182, 237],
       [  0, 131, 206,  79],
       [  0,  22, 150, 231],
       [  0,  47, 160, 173],
       [  0, 191,  25, 190],
       [  0,  11, 203, 243],
       [  0,  74, 205,  22],
       [  0,  36,  73,  18],
       [  0, 242,  82, 115],
       [  0, 222, 103,  68],
       [  0,  67,  88,  20],
       [  0, 100, 185, 239],
       [  0,  17, 204, 221],
       [  0, 223,  12, 223],
       [  0, 220, 182, 237],
       [  0, 255, 255, 127],
       [  0, 219, 182, 109],
       [  0, 182, 109,  91],
       [  0, 146,  36,  73],
       [  0, 109, 219,  54],
       [  0,  73, 146,  36],
       [  0,  36,  73,  18]], dtype=uint8)

Now we have a bytearray where each group of 4 bytes represents an integer (in little-endian order).

Next, let's convert it to actual integers and reshape the channels into columns.

In [16]:
sig = frombuffer(temp, dtype='<i4').reshape(-1, channels)
sig
Out[16]:
array([[ 2147483392,  1840700160,  1533916672,  1227133440,   920349952,
          613566720,   306783232],
       [ 1934815744,  1147657728,   341328640,  -273062912,  -573828864,
         -552804608,  -306783232],
       [ 1338934016,  -409594368, -1382011136, -1105608960,  -204797184,
          382552576,   306783232],
       [  477860096, -1658413568,  -956381440,   765105152,   829206784,
         -136531456,  -306783232],
       [ -477860096, -1658413568,   956381440,   765105152,  -829206784,
         -136531456,   306783232],
       [-1338934016,  -409594368,  1382011136, -1105608960,   204797184,
          382552576,  -306783232],
       [-1934815744,  1147657728,  -341328640,  -273062912,   573828864,
         -552804608,   306783232],
       [-2147483392,  1840700160, -1533916672,  1227133440,  -920349952,
          613566720,  -306783232],
       [-1934815744,  1147657728,  -341328640,  -273062912,   573828864,
         -552804608,   306783232],
       [-1338934016,  -409594368,  1382011136, -1105608960,   204797184,
          382552576,  -306783232],
       [ -477860096, -1658413568,   956381440,   765105152,  -829206784,
         -136531456,   306783232],
       [  477860096, -1658413568,  -956381440,   765105152,   829206784,
         -136531456,  -306783232],
       [ 1338934016,  -409594368, -1382011136, -1105608960,  -204797184,
          382552576,   306783232],
       [ 1934815744,  1147657728,   341328640,  -273062912,  -573828864,
         -552804608,  -306783232],
       [ 2147483392,  1840700160,  1533916672,  1227133440,   920349952,
          613566720,   306783232]], dtype=int32)

Let's put this into a function, just in case we find ourselves needing to load 24-bit WAV files some time in the future.

In [17]:
def pcm24to32_bytearray(data, channels=1):
    if len(data) % 3 != 0:
        raise ValueError("Size of data must be a multiple of 3 bytes")

    size = len(data) // 3
    
    # reserve memory (initialized with null bytes):
    temp = bytearray(size * 4)

    for i in range(size):
        newidx = i * 4 + 1
        oldidx = i * 3
        temp[newidx:newidx+3] = data[oldidx:oldidx+3]

    return np.frombuffer(temp, dtype='<i4').reshape(-1, channels)

This looks a bit clumsy (TODO: timing measurements!), let's try it another way:

In [18]:
def pcm24to32_view(data, channels=1):
    if len(data) % 3 != 0:
        raise ValueError("Size of data must be a multiple of 3 bytes")

    temp = np.zeros((len(data) // 3, 4), dtype='B')
    temp[:, 1:] = np.frombuffer(data, dtype='B').reshape(-1, 3)
    return temp.view('<i4').reshape(-1, channels)

This is what we're doing here:

  • create a zero-initialized array of bytes with 4 columns
  • write the data to the last 3 columns (leaving the first column filled with zeros)
  • interpret each row (4 bytes) as one integer

I think this is better than pcm24to32_bytearray(), but it still has a little flaw: the returned array doesn't own it's data.

In [19]:
sig = pcm24to32_view(data, channels)
sig.flags.owndata
Out[19]:
False

This is because a view is returned. The original array (with its 4 columns) has exactly the same values as the array four_bytes which we were using before.

In [20]:
all(sig.base == four_bytes)
Out[20]:
True

So instead of returning a view, let's turn this around and create the correct array first and use a view internally to copy the data. Additionally, instead of using the reshape() method (which would also return a view), we directly set the shape.

In [21]:
def pcm24to32(data, channels=1):
    if len(data) % 3 != 0:
        raise ValueError("Size of data must be a multiple of 3 bytes")
    
    out = np.zeros(len(data) // 3, dtype='<i4')
    out.shape = -1, channels
    temp = out.view('B').reshape(-1, 4)
    temp[:, 1:] = np.frombuffer(data, dtype='B').reshape(-1, 3)
    return out

I think this one is better. Now the returned array owns its data:

In [22]:
sig = pcm24to32(data, channels)
sig.flags.owndata
Out[22]:
True

BTW, the result of all three functions is of course the same:

In [23]:
all(sig == pcm24to32_view(data, channels)), all(sig == pcm24to32_bytearray(data, channels))
Out[23]:
(True, True)

Finally, we convert the 32-bit integer values to 32-bit floating point values and normalize them to a range from -1 to 1 (see above).

In [24]:
normalized = pcm2float(sig, float32)
plot(normalized);

WAVEX doesn't work:

In [25]:
import traceback
try:
    w = wave.open('data/test_wavex_pcm16.wav')
except:
    traceback.print_exc()
else:
    print("It works (unexpectedly)!")
Traceback (most recent call last):
  File "<ipython-input-25-b3a5b32b9f2d>", line 3, in <module>
    w = wave.open('data/test_wavex_pcm16.wav')
  File "/usr/lib/python3.4/wave.py", line 497, in open
    return Wave_read(f)
  File "/usr/lib/python3.4/wave.py", line 163, in __init__
    self.initfp(f)
  File "/usr/lib/python3.4/wave.py", line 143, in initfp
    self._read_fmt_chunk(chunk)
  File "/usr/lib/python3.4/wave.py", line 259, in _read_fmt_chunk
    raise Error('unknown format: %r' % (wFormatTag,))
wave.Error: unknown format: 65534

Let's try 32-bit float:

In [26]:
try:
    w = wave.open('data/test_wav_float32.wav')
except:
    traceback.print_exc()
else:
    print("It works (unexpectedly)!")
Traceback (most recent call last):
  File "<ipython-input-26-1b1692fc5dc7>", line 2, in <module>
    w = wave.open('data/test_wav_float32.wav')
  File "/usr/lib/python3.4/wave.py", line 497, in open
    return Wave_read(f)
  File "/usr/lib/python3.4/wave.py", line 163, in __init__
    self.initfp(f)
  File "/usr/lib/python3.4/wave.py", line 143, in initfp
    self._read_fmt_chunk(chunk)
  File "/usr/lib/python3.4/wave.py", line 259, in _read_fmt_chunk
    raise Error('unknown format: %r' % (wFormatTag,))
wave.Error: unknown format: 3

Writing

TODO

In [26]:
 

Version Info

In [27]:
import sys, IPython
print("Versions: NumPy = {}; IPython = {}".format(numpy.__version__, IPython.__version__))

print("Python interpreter:")
print(sys.version)
Versions: NumPy = 1.8.1; IPython = 2.1.0
Python interpreter:
3.4.1+ (default, Aug 10 2014, 13:20:12) 
[GCC 4.9.1]