PyOracle is an ongoing project which uses Python to automatically analyze aspects of musical structure.

Audio Oracle, an algorithm based on the Factor Oracle string matching algorithm, is used to detect introductions and repetitions of musical materials. Through this analysis, aspects of musical structure can be understood, and new versions of the analyzed work can be created. PyOracle aims to be a toolbox for exploring this type of automated musical analysis.

PyOracle has applications for:

- automated music analysis
- algorithmic composition
- machine improvisation

PyOracle is currently implemented as a Python module, and can be used as a stand-alone CL application or embedded in Max or PD through the py/pyext externals.

PyOracle uses the *Audio Oracle* (AO) algorithm, which is based on the *Factor Oracle* (FO) string matching structure proposed by Allauzen et al. FO is an acyclic automaton which recognizes all substrings ("factors") in a given string. FO will also sometimes recognize factors which are not part of the original string, but this is not a problem for our application. The structure can be built in an online fashion, i.e. character-by-character, and allows for efficient substring searches.

*Forward transitions* (upper arcs in figures) from state 0 indicate the appearance of a new character, while forward transitions from other states indicate the appearance of sub-factors.

*Suffix links* (lower arcs) point backward to the occurence of repeated patterns. The connected states share some common context. The length of the repeated suffix (LRS) is not indicated in this visualization.

So, navigating this oracle structure can allow generation of new patterns which inherit some of the structure of the original sequence. Traveling along forward transitions gives us subfactors (sometimes false) of the original sequence, which jumping back along suffixes allows for creation of new sequences which share a common context.

This code is called every time a new state $i$ is added:

```
def add_state(states, new_data):
# create a new state numbered i
new_state = State.State(len(states))
states.append(new_state)
# assign a new transition from state i - 1 to state i
states[-2].add_transition(new_data, states[-1])
# iteratively backtrack suffixes from state i - 1
k = states[-2].suffix
while k and (new_data not in [s.data for s in k.transition]):
# while we haven't reached state zero and the suffix doesn't have a forward link to a state == the new state
k.add_transition(new_data, states[-1]) # add forward transition from k to the new state
k = k.suffix # continue to follow suffix links
if k:
# if backtrack ended before state 0, look at all transitions from where backtrack ended
# if one of those transitions points to the same letter as the new input, the new state
# will have a suffix to the state pointed to by that transition
for t in k.transition:
if t.data == new_data:
S_i = t
states[-1].add_suffix(S_i.pointer)
else:
# backtracked to state 0
states[-1].add_suffix(states[0])
```

AO is a sequential learning automaton which can incrementally learn repeating structure of an audio signal.

AO is used to index audio data, and is based on string-matching (FO) and information-theoretic approaches.

An audio signal is analyzed using one of many feature-extraction algorithms, and subdivided into states. These states are analogous to the characters of a string used in FO. The AO structure is constructed similarly to FO, but uses a distance function to determine similarity between states.

```
def add_state(states, new_data, threshold = 0, weights = None):
# create a new state numbered i
new_state = State.State(len(states))
states.append(new_state)
# assign a new transition from state i - 1 to state i
states[-2].add_transition(new_data, states[-1])
k = states[-2].suffix
pi_1 = states[-2]
# iteratively backtrack suffixes from state i - 1
# adding shlomo's suggested improvement - gs 06.04.2012
while k != 0:
dvec = [get_distance(new_data, s.data, weights) < threshold for s in k.transition]
# if there is not a transition from the suffix
if True not in dvec:
k.add_transition(new_data, states[-1])
pi_1 = k # from shlomo's code
k = k.suffix
else:
break
# if backtrack ended before state 0:
if k == None or k == 0:
states[-1].add_suffix(states[0])
else:
# filter out all above distance thresh - 06.04.2012
filtered_transitions = filter(lambda x: get_distance(x.data, new_data, weights) <= threshold, k.transition)
# sort the possible suffixes by lrs
sorted_list = sorted(filtered_transitions, key = lambda x: x.pointer.lrs)
for t in sorted_list:
if get_distance(t.data, new_data, weights) <= threshold:
S_i = t
states[-1].add_suffix(S_i.pointer)
# add reverse suffix
S_i.pointer.add_reverse_suffix(states[-1])
break
# LRS algorithm from shlomo's AOS.m
ss = states[-1].suffix
if ss == 0 or ss.number == 0 or ss.number == 1:
states[-1].lrs = 0
else:
pi_2 = states[ss.number - 1]
if pi_2 == pi_1.suffix:
states[-1].lrs = pi_1.lrs + 1
else:
while pi_2.suffix != pi_1.suffix:
pi_2 = pi_2.suffix
states[-1].lrs = min(pi_1.lrs, pi_2.lrs) + 1
```

PyOracle makes it easy to experiment with Oracle structures

In [1]:

```
%load_ext autoreload
%autoreload 2
```

In [103]:

```
import pyoracle
```

In [105]:

```
filename = 'audio/prokofiev.wav'
fft_size = 8192
hop_size = 8192
```

In [15]:

```
features = pyoracle.make_features(filename, fft_size, hop_size)
```

In [109]:

```
cur_feature = 'rms'
# features is a dictionary, so we can access a particular feature vector with the corresponding key
plot(features[cur_feature])
xlim((0, len(features[cur_feature])))
title(cur_feature + ' per Analysis Frame', fontsize=18)
xlabel('Analysis Frame', fontsize=14)
ylabel(cur_feature, fontsize=14)
fig = gcf()
fig.set_size_inches(20, 4)
```

pyoracle.make_oracle() takes arguments for:

- a threshold value for the distance function
- the complete feature vector from the feature extractor
- a string indicating the feature to build the oracle on
- an optional number of feature frames to average

In [120]:

```
frames_per = 1
threshold = 0.1
oracle = pyoracle.make_oracle(threshold, features, 'chroma', frames_per)
```

pyoracle.draw_oracle() writes a visualization of the oracle to disk.

IPython Notebook allows the image to be loaded from disk and displayed inline.

Suffix links to state 0 are omitted for clarity.

In [121]:

```
from IPython.core.display import Image
pyoracle.draw_oracle(oracle, 'output.png', size=(1000, 400))
Image(filename='output.png')
```

Out[121]:

Information Rate (IR) is a measure of the reduction in uncertainty about the present given what we know about the past, and is defined as the difference between the unconditional complexity of $x$ and the conditional complexity of $x_n$ given $x_{past}$:

$$ IR(x_{past}, x_n) = C(x_n) - C(x_n | x_{past}) $$where

$$ C(\cdot) $$is a compressed version of the oracle and is used to estimate the complexity. It is possible to have multiple suffixes which are varying lengths of the same pattern. We can compress the oracle by keeping only the longest repeated suffixes. The longest repeated suffix (LRS) of a state is the longest suffix of the prefix to a state $i$ that appears at least twice in the prefix. I.e. the longest sequence of states leading up to $i$ that appears at least twice before $i$.

Compression in AO is as follows:

- We first need the size of the 'alphabet' - this is straight-forward in the symbolic FO, but in AO we use the number of transitions from state 0. Each transition from 0 corresponds to a new state. The number of unique states = size of the alphabet.
- The number of bits needed to encode that set is $log2(# of transitions from 0)$. This is used as a measure of unconditional complexity, giving us: $C(x_n)$
- An array $K = [1]$ of encoding events is generated:
- If no repetition of $i$ is found, then $sfx(i) = 0$ . This is a new frame, and will need to be individually encoded. It is added to $K$
- Otherwise, if a suffix link to an earlier state is found, and the LRS < the number of states since the last encoding event, the entire segment is encoded as $(length, position)$.

- Using $K$, we can determine $C(x_n|x_{past})$. For every state i, # bits required to represent the state = # bits required to represent the pair (length, position) / length of encoded block. $$ C(x_n|x_{past}) = \frac{log_2(N) + log_2(M)}{K(i+1) - K(i)} $$ where $M = max(LRS){ }$ and $N$ is sequence length.

Code is an indication of the compressed version of the oracle. Each tuple indicates the number and source of repetitions.

In [122]:

```
ir, code, compror = pyoracle.calculate_ir(oracle, alpha='None', old=True)
plot(ir)
ylim((0, max(ir) + 0.25))
xlim((0, len(ir)))
title('IR of Oracle with threshold' + ' ' + str(threshold), fontsize=18)
fig = gcf()
fig.set_size_inches(20, 4)
print 'Code:', code
print 'Compror:', compror
```

In [123]:

```
from scipy.io import wavfile
x = wavfile.read(filename)
x = [f[0] for f in x[1]]
figure()
s = specgram(x, NFFT=256)
fig = gcf()
ir = array(ir)
ir = ir / ir.max()
plot(range(0, len(x)/2, len(x)/2 / len(ir)+1), ir)
fig.set_size_inches(22, 4)
# xlim((0, len(x)))
```

In [124]:

```
r = (0.0001, 0.5, 0.02) # set a range of threshold values to check (FOUND TO BE 0.051 for zerocrossings and mfcc)
ideal_t = pyoracle.calculate_ideal_threshold(r, features, 'chroma', frames_per_state=frames_per, alpha=1.1, old=True)
```

In [125]:

```
x_a = [i[1] for i in ideal_t[1]]
y_a = [i[0] for i in ideal_t[1]]
plot(x_a, y_a)
title('IR vs. Threshold Value')
```

Out[125]:

In [82]:

```
# best threshold is
ideal_t[0]
```

Out[82]:

So now, in theory, this is the best oracle.

In [126]:

```
best_oracle = pyoracle.make_oracle(ideal_t[0][1], features, 'chroma', frames_per_state=frames_per)
```

This oracle seems to correspond more closely to the binary form of the Prokofiev work.

In [127]:

```
pyoracle.draw_oracle(best_oracle, 'boutput.png', size=(1200, 400))
Image(filename='boutput.png')
```

Out[127]:

In [128]:

```
figure()
ir, code, compror = pyoracle.calculate_ir(best_oracle, 0, old=True)
s = specgram(x, NFFT=256)
fig = gcf()
ir = array(ir)
ir = ir / ir.max()
plot(range(0, len(x)/2, len(x)/2 / len(ir)+1), ir)
fig.set_size_inches(24, 4)
# xlim((0, len(x)))
```