PyOracle: Automatic analysis of musical structure using Python

PyOracle is an ongoing project which uses Python to automatically analyze aspects of musical structure.

Audio Oracle, an algorithm based on the Factor Oracle string matching algorithm, is used to detect introductions and repetitions of musical materials. Through this analysis, aspects of musical structure can be understood, and new versions of the analyzed work can be created. PyOracle aims to be a toolbox for exploring this type of automated musical analysis.

PyOracle has applications for:

  • automated music analysis
  • algorithmic composition
  • machine improvisation

PyOracle is currently implemented as a Python module, and can be used as a stand-alone CL application or embedded in Max or PD through the py/pyext externals.

Factor Oracle

PyOracle uses the Audio Oracle (AO) algorithm, which is based on the Factor Oracle (FO) string matching structure proposed by Allauzen et al. FO is an acyclic automaton which recognizes all substrings ("factors") in a given string. FO will also sometimes recognize factors which are not part of the original string, but this is not a problem for our application. The structure can be built in an online fashion, i.e. character-by-character, and allows for efficient substring searches.

Forward transitions (upper arcs in figures) from state 0 indicate the appearance of a new character, while forward transitions from other states indicate the appearance of sub-factors.

Suffix links (lower arcs) point backward to the occurence of repeated patterns. The connected states share some common context. The length of the repeated suffix (LRS) is not indicated in this visualization.

So, navigating this oracle structure can allow generation of new patterns which inherit some of the structure of the original sequence. Traveling along forward transitions gives us subfactors (sometimes false) of the original sequence, which jumping back along suffixes allows for creation of new sequences which share a common context.

Factor Oracle

FO Algorithm

This code is called every time a new state $i$ is added:

def add_state(states, new_data):
    # create a new state numbered i
    new_state = State.State(len(states))
    states.append(new_state) 
    # assign a new transition from state i - 1 to state i
    states[-2].add_transition(new_data, states[-1]) 
    # iteratively backtrack suffixes from state i - 1
    k = states[-2].suffix
    while k and (new_data not in [s.data for s in k.transition]):
        # while we haven't reached state zero and the suffix doesn't have a forward link to a state == the new state
        k.add_transition(new_data, states[-1]) # add forward transition from k to the new state
        k = k.suffix # continue to follow suffix links
    if k:
        # if backtrack ended before state 0, look at all transitions from where backtrack ended
        # if one of those transitions points to the same letter as the new input, the new state
        # will have a suffix to the state pointed to by that transition
        for t in k.transition:
            if t.data == new_data:
                S_i = t
        states[-1].add_suffix(S_i.pointer)
    else:
        # backtracked to state 0
        states[-1].add_suffix(states[0])

Audio Oracle

AO is a sequential learning automaton which can incrementally learn repeating structure of an audio signal.

AO is used to index audio data, and is based on string-matching (FO) and information-theoretic approaches.

An audio signal is analyzed using one of many feature-extraction algorithms, and subdivided into states. These states are analogous to the characters of a string used in FO. The AO structure is constructed similarly to FO, but uses a distance function to determine similarity between states.

def add_state(states, new_data, threshold = 0, weights = None):
    # create a new state numbered i
    new_state = State.State(len(states))
    states.append(new_state) 
    # assign a new transition from state i - 1 to state i
    states[-2].add_transition(new_data, states[-1]) 
    k = states[-2].suffix
    pi_1 = states[-2]
    # iteratively backtrack suffixes from state i - 1
    # adding shlomo's suggested improvement - gs 06.04.2012
    while k != 0:
        dvec = [get_distance(new_data, s.data, weights) < threshold for s in k.transition]
        # if there is not a transition from the suffix
        if True not in dvec:
            k.add_transition(new_data, states[-1])
            pi_1 = k # from shlomo's code
            k = k.suffix
        else:
            break
    # if backtrack ended before state 0:
    if k == None or k == 0:
        states[-1].add_suffix(states[0])
    else:
        # filter out all above distance thresh - 06.04.2012
        filtered_transitions = filter(lambda x: get_distance(x.data, new_data, weights) <= threshold, k.transition)
        # sort the possible suffixes by lrs
        sorted_list = sorted(filtered_transitions, key = lambda x: x.pointer.lrs)
        for t in sorted_list:
            if get_distance(t.data, new_data, weights) <= threshold: 
                S_i = t
                states[-1].add_suffix(S_i.pointer)
                # add reverse suffix
                S_i.pointer.add_reverse_suffix(states[-1])
                break
    # LRS algorithm from shlomo's AOS.m
    ss = states[-1].suffix
    if ss == 0 or ss.number == 0 or ss.number == 1:
        states[-1].lrs = 0
    else:
        pi_2 = states[ss.number - 1]
        if pi_2 == pi_1.suffix:
            states[-1].lrs = pi_1.lrs + 1
        else:
            while pi_2.suffix != pi_1.suffix:
                pi_2 = pi_2.suffix
            states[-1].lrs = min(pi_1.lrs, pi_2.lrs) + 1

Building an Audio Oracle using PyOracle

PyOracle makes it easy to experiment with Oracle structures

In [1]:
%load_ext autoreload
%autoreload 2
In [103]:
import pyoracle
In [105]:
filename = 'audio/prokofiev.wav'
fft_size = 8192
hop_size = 8192 

Extract features from audio file - PyOracle currently supports MFCCs (1st 10), Chroma, RMS, # Zerocrossings, Spectral Centroid.

In [15]:
features = pyoracle.make_features(filename, fft_size, hop_size)
sample_rate=44100
num_channels=2
sample_width=2
num_frames=2006361, num_secs=45.000000
bytes_per_frame=4
bytes_per_second=176400
bytes_per_buffer=65536

As an example, plot feature data (a bit of a distorted view, because of the hop size). Matplotlib (part of Numpy/Scipy) makes it easy to do Matlab-style plotting.

In [109]:
cur_feature = 'rms'

# features is a dictionary, so we can access a particular feature vector with the corresponding key
plot(features[cur_feature])

xlim((0, len(features[cur_feature])))
title(cur_feature + ' per Analysis Frame', fontsize=18)
xlabel('Analysis Frame', fontsize=14)
ylabel(cur_feature, fontsize=14)

fig = gcf()
fig.set_size_inches(20, 4)

Building an audio oracle from our features:

pyoracle.make_oracle() takes arguments for:

  • a threshold value for the distance function
  • the complete feature vector from the feature extractor
  • a string indicating the feature to build the oracle on
  • an optional number of feature frames to average
In [120]:
frames_per = 1
threshold = 0.1
oracle = pyoracle.make_oracle(threshold, features, 'chroma', frames_per)

Oracle Visualization -

pyoracle.draw_oracle() writes a visualization of the oracle to disk.

IPython Notebook allows the image to be loaded from disk and displayed inline.

Suffix links to state 0 are omitted for clarity.

In [121]:
from IPython.core.display import Image
pyoracle.draw_oracle(oracle, 'output.png', size=(1000, 400))
Image(filename='output.png')
Out[121]:

Changing the number of frames averaged together has an impact on the temporal resolution of the oracle.

Information Rate

Information Rate (IR) is a measure of the reduction in uncertainty about the present given what we know about the past, and is defined as the difference between the unconditional complexity of $x$ and the conditional complexity of $x_n$ given $x_{past}$:

$$ IR(x_{past}, x_n) = C(x_n) - C(x_n | x_{past}) $$

where

$$ C(\cdot) $$

is a compressed version of the oracle and is used to estimate the complexity. It is possible to have multiple suffixes which are varying lengths of the same pattern. We can compress the oracle by keeping only the longest repeated suffixes. The longest repeated suffix (LRS) of a state is the longest suffix of the prefix to a state $i$ that appears at least twice in the prefix. I.e. the longest sequence of states leading up to $i$ that appears at least twice before $i$.

Compression in AO is as follows:

  1. We first need the size of the 'alphabet' - this is straight-forward in the symbolic FO, but in AO we use the number of transitions from state 0. Each transition from 0 corresponds to a new state. The number of unique states = size of the alphabet.
  2. The number of bits needed to encode that set is $log2(# of transitions from 0)$. This is used as a measure of unconditional complexity, giving us: $C(x_n)$
  3. An array $K = [1]$ of encoding events is generated:
    1. If no repetition of $i$ is found, then $sfx(i) = 0$ . This is a new frame, and will need to be individually encoded. It is added to $K$
    2. Otherwise, if a suffix link to an earlier state is found, and the LRS < the number of states since the last encoding event, the entire segment is encoded as $(length, position)$.
  4. Using $K$, we can determine $C(x_n|x_{past})$. For every state i, # bits required to represent the state = # bits required to represent the pair (length, position) / length of encoded block. $$ C(x_n|x_{past}) = \frac{log_2(N) + log_2(M)}{K(i+1) - K(i)} $$ where $M = max(LRS){ }$ and $N$ is sequence length.

Code is an indication of the compressed version of the oracle. Each tuple indicates the number and source of repetitions.

In [122]:
ir, code, compror = pyoracle.calculate_ir(oracle, alpha='None', old=True)
plot(ir)
ylim((0, max(ir) + 0.25))
xlim((0, len(ir)))
title('IR of Oracle with threshold' + ' ' + str(threshold), fontsize=18)
fig = gcf()
fig.set_size_inches(20, 4)
print 'Code:', code
print 'Compror:', compror
Code: [(0, 1), (0, 2), (0, 3), (1, 3), (0, 5), (1, 3), (0, 7), (0, 8), (1, 9), (0, 10), (0, 11), (0, 12), (0, 13), (0, 14), (0, 15), (0, 16), (0, 17), (0, 18), (0, 19), (1, 6), (0, 21), (0, 22), (1, 23), (1, 11), (0, 25), (0, 26), (0, 27), (2, 28), (0, 30), (0, 31), (1, 2), (3, 33), (0, 36), (3, 37), (0, 40), (0, 41), (4, 42), (0, 46), (0, 47), (2, 48), (0, 50), (0, 51), (1, 51), (0, 53), (1, 48), (1, 48), (1, 42), (0, 57), (2, 42), (0, 60), (1, 61), (4, 47), (3, 52), (4, 54), (1, 3), (1, 3), (1, 20), (1, 3), (0, 77), (0, 78), (0, 79), (2, 80), (1, 15), (0, 83), (1, 16), (0, 85), (1, 41), (0, 87), (0, 88), (1, 2), (1, 8), (0, 91), (0, 92), (1, 6), (0, 94), (4, 79), (0, 99), (1, 100), (0, 101), (1, 78), (0, 103), (0, 104), (2, 105), (1, 3), (6, 93), (1, 82), (0, 115), (0, 116), (0, 117), (3, 118), (1, 18), (1, 6), (1, 37), (0, 124), (0, 125), (0, 126), (1, 102), (2, 118), (1, 119), (0, 131), (1, 100), (0, 133), (3, 103), (0, 137), (0, 138), (1, 139), (0, 140), (0, 141), (3, 142), (2, 56), (1, 132), (1, 102), (0, 149), (0, 150), (0, 151), (0, 152), (0, 153), (2, 154), (0, 156), (0, 157), (1, 2), (0, 159), (1, 11), (0, 161), (1, 162), (1, 47), (1, 78), (3, 165), (0, 168), (7, 169), (4, 170), (4, 180), (1, 172), (0, 185), (0, 186), (1, 2), (1, 59), (0, 189), (0, 190), (0, 191), (2, 33), (2, 194), (0, 196), (0, 197), (0, 198), (3, 33), (3, 193), (3, 205), (1, 189), (0, 209), (1, 27), (0, 211), (0, 212), (3, 205), (3, 208), (1, 27), (1, 198), (0, 221), (0, 222), (1, 2), (8, 221), (5, 229), (2, 165), (0, 239), (3, 204), (2, 59), (1, 190), (4, 33), (2, 243), (28, 246), (12, 246), (2, 74), (2, 75), (1, 21), (3, 297), (1, 22), (1, 3), (2, 7), (1, 37), (0, 305), (2, 306), (0, 308), (2, 90), (1, 305), (2, 239), (5, 247), (3, 204), (6, 245), (7, 252), (25, 320), (3, 329), (1, 91), (2, 7), (2, 5), (8, 298), (3, 297), (0, 379), (7, 326), (3, 364), (2, 77), (3, 166), (2, 364), (2, 364), (1, 162), (16, 306), (8, 321), (1, 251), (1, 86), (2, 426), (2, 87), (0, 430), (7, 359), (0, 438), (23, 432), (0, 462), (1, 47), (0, 464), (11, 410), (6, 315), (5, 322), (0, 487), (0, 488), (0, 489)]
Compror: [4, 6, 9, 20, 23, 24, 29, 32, 35, 39, 45, 49, 52, 54, 55, 56, 59, 61, 65, 68, 72, 73, 74, 75, 76, 81, 82, 84, 86, 89, 90, 93, 98, 100, 102, 106, 107, 113, 114, 120, 121, 122, 123, 127, 129, 130, 132, 136, 139, 144, 146, 147, 148, 155, 158, 160, 162, 163, 164, 167, 175, 179, 183, 184, 187, 188, 193, 195, 201, 204, 207, 208, 210, 215, 218, 219, 220, 223, 231, 236, 238, 242, 244, 245, 249, 251, 279, 291, 293, 295, 296, 299, 300, 301, 303, 304, 307, 310, 311, 313, 318, 321, 327, 334, 359, 362, 363, 365, 367, 375, 378, 386, 389, 391, 394, 396, 398, 399, 415, 423, 424, 425, 427, 429, 437, 461, 463, 475, 481, 486]

How does IR correspond to the signal? (Might have to adjust number of frames averaged together in Oracle construction...) What about oracles on other features?

In [123]:
from scipy.io import wavfile
x = wavfile.read(filename)
x = [f[0] for f in x[1]]
figure()
s = specgram(x, NFFT=256)
fig = gcf()
ir = array(ir)
ir = ir / ir.max()
plot(range(0, len(x)/2, len(x)/2 / len(ir)+1), ir)
fig.set_size_inches(22, 4)
# xlim((0, len(x)))

Above, I chose the oracle distance threshold arbitrarily, but we can use IR to find the 'best' threshold. The total IR value changes as a function of the distance threshold, and we choose the threshold which yields maximum IR.

In [124]:
r = (0.0001, 0.5, 0.02) # set a range of threshold values to check (FOUND TO BE 0.051 for zerocrossings and mfcc)
ideal_t = pyoracle.calculate_ideal_threshold(r, features, 'chroma', frames_per_state=frames_per, alpha=1.1, old=True) 
In [125]:
x_a = [i[1] for i in ideal_t[1]]
y_a = [i[0] for i in ideal_t[1]]
plot(x_a, y_a)
title('IR vs. Threshold Value')
Out[125]:
<matplotlib.text.Text at 0x9065370>
In [82]:
# best threshold is
ideal_t[0]
Out[82]:
(845.8513267779111, 0.0201)

So now, in theory, this is the best oracle.

In [126]:
best_oracle = pyoracle.make_oracle(ideal_t[0][1], features, 'chroma', frames_per_state=frames_per)

This oracle seems to correspond more closely to the binary form of the Prokofiev work.

In [127]:
pyoracle.draw_oracle(best_oracle, 'boutput.png', size=(1200, 400))
Image(filename='boutput.png')
Out[127]:
In [128]:
figure()
ir, code, compror = pyoracle.calculate_ir(best_oracle, 0, old=True)
s = specgram(x, NFFT=256)
fig = gcf()
ir = array(ir)
ir = ir / ir.max()
plot(range(0, len(x)/2, len(x)/2 / len(ir)+1), ir)
fig.set_size_inches(24, 4)
# xlim((0, len(x)))