In this notebook, we will explore the computational investigation of musical audio features. In particular, we will examine a workflow for finding associations between recordings in an unfamiliar collection: here, the corpus is a set of tracks from someone's personal music collection, but in principle we know nothing more about it than that, other than perhaps some per-track metadata.
We’ll import some Python libraries here to use later on in the notebook. Don’t worry about this at the moment; for now just execute the cell, which makes the libraries available.
import glob
import csv
import numpy as np
import matplotlib.pylab as plt
We will also import the library for the audio feature database, giving access to its functionality through the name adb:
from pyadb import adb
The first step is to create an empty database. In the next cell, the first line simply gives a name to a string; the second creates a database file with the given filename, returning a Python object which can act as a handle or interface to that database file.
dbpathname = 'ecda2015.adb'
db = adb.Pyadb(dbpathname)
If you later create a new Pyadb object with the same filename, the existing database (with all its existing data) will be opened. Usually, that is what you want: as we will see below, part of the point of the audio database is to store feature vectors for later computation. However, sometimes things go wrong and you want to start afresh; in that case, you can simply delete the database from the filesystem, either using a regular File Manager, or through the IPython notebook using:
# import os
# os.remove(dbpathname)
# db = adb.Pyadb(dbpathname)
Note that this is commented out, to avoid deletion through accidental evaluation of the code; if you actually want to remove the database, you will need to uncomment the code. Also note the third line here: if you do delete the database for whatever reason, you will have to create a new Pyadb object to go with a new database.
There is a downloadable archive of features (specifically, NNLS chroma features at a 1-second granularity) here. You should be able to download that archive file and extract it to the data subdirectory of your IPython process.
The next cell is likely to seem cryptic unless you are already experienced in Python or scripting. However, what it does is simple to describe: for each comma-separated values (CSV) file, it reads in the data and inserts the numerical feature values into the database, with the music filename (minus the .flac extension) as the database key. Again, don't worry if the details of the code in the cell escape you; what is important is that after executing this cell, the database in memory (and stored on the filesystem) will know about the features.
for filename in glob.glob("data/Music/*/*/*.csv"):
    with open(filename) as f:
        x = []
        k = ''
        for line in csv.reader(f):
            if k == '': k = line[0][0:-5]
            x.append(list(map(float, line[2:14])))
    if k != '': db.insert(featData=np.array(x), key=k)
We can ask the database to tell us about its contents; this is done using the liszt method of the database object. (This is not just a bad pun; there are good reasons for not calling a method by the name list.) The liszt_results function is a helper to print the output of the liszt method as an HTML table rather than as a Python data structure.
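The liszt_results helper itself is assumed to be defined elsewhere in the session and is not shown in this notebook. A minimal sketch of such a helper might look like the following, assuming that db.liszt() yields (key, number-of-vectors) pairs; the exact return shape may differ in your pyadb version, so treat this as illustrative only:

```python
def liszt_results(entries):
    """Render (key, n_vectors) pairs as an HTML table string.

    A hypothetical reimplementation, not pyadb's own helper: wrap the
    returned string in IPython.display.HTML to render it in a notebook.
    """
    rows = "".join("<tr><td>{}</td><td>{}</td></tr>".format(key, n)
                   for key, n in entries)
    return ("<table><tr><th>Key</th><th>Vectors</th></tr>"
            + rows + "</table>")

print(liszt_results([("Various Artists/Insalata (I Fagiolini)/02 - La Bomba", 147)]))
```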
liszt_results(db.liszt()[0:10])
| Key | Vectors |
|---|---|
| Various Artists/Insalata (I Fagiolini)/02 - La Bomba | 147 |
| Various Artists/Insalata (I Fagiolini)/16 - Losing my mind | 252 |
| Various Artists/Insalata (I Fagiolini)/11 - Dieu! qu'il la fait bon regarder | 134 |
| Various Artists/Insalata (I Fagiolini)/05 - Hey trolly loly lo | 235 |
| Various Artists/Insalata (I Fagiolini)/10 - Singet dem Herrn: III. Lobet den Herrn | 247 |
| Various Artists/Insalata (I Fagiolini)/01 - J'ay mis mon cueur | 208 |
| Various Artists/Insalata (I Fagiolini)/08 - Audi coelum | 483 |
| Various Artists/Insalata (I Fagiolini)/12 - A death | 176 |
| Various Artists/Insalata (I Fagiolini)/13 - Early one morning | 205 |
| Various Artists/Insalata (I Fagiolini)/15 - Nellie was a lady | 279 |
We can also ask the database to retrieve the numerical features we have inserted for a given key, and visualize them. The visualization here is not as polished as in Sonic Visualiser, but it can be useful to give a quick idea of the feature:
key = "Nicolas Gombert/Church Music (Henry's Eight feat. Director Jonathan Brown)/11 - Lugebat David Absalon"
xx = db.retrieve_datum(key, features=True)
plt.imshow(xx.transpose(), cmap=plt.cm.Greys)
plt.show()
One advantage of working in Python, rather than a more specialized tool, is that the visualizations we can produce are not limited by the tool itself. For example, we can calculate the self-similarity of the feature vector by comparing every time slice of the feature with every other, which sounds like a potentially complicated operation but can be expressed very simply:
plt.imshow(np.dot(xx,xx.transpose()), cmap=plt.cm.Greys)
plt.show()
Note the black line going down the main diagonal, indicating that each time slice is similar to itself. Regions of interest elsewhere in the plot are dark patches, which suggest that one passage is in some way similar to another (for example, the passage between about 250 and 290 seconds into the track is more similar than might be expected to the material between about 460 and 490 seconds).
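To check a claim like that, it can help to zoom in on just the sub-block of the similarity matrix relating the two passages. The sketch below uses a random stand-in for xx so that the cell is self-contained; with the real data, drop the stand-in and use the xx retrieved above:

```python
import numpy as np
import matplotlib.pylab as plt

# Random stand-in for the chroma features; with the real database, use
# xx = db.retrieve_datum(key, features=True) instead of these two lines.
rng = np.random.default_rng(0)
xx = rng.random((500, 12))

# Cross-similarity between the two passages of interest: rows are the
# seconds from 250 to 289, columns the seconds from 460 to 489.
block = np.dot(xx[250:290], xx[460:490].transpose())

plt.imshow(block, cmap=plt.cm.Greys)
plt.show()
```

With the real features, dark cells in this small plot pick out exactly which seconds of the first passage resemble which seconds of the second.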
The essential functionality that a database of audio features can offer us is to search for and retrieve data matching some criterion. At an absolute minimum, a database of audio features should be able to retrieve, given some track identifier, the features that have been stored for that track.
When investigating and navigating through a collection of unknown audio, however, it is not very useful to retrieve features corresponding to track identifiers, as that does not provide us with any information to go between tracks. Instead, what we will concentrate on here is searching for content.
When searching for content, we are looking for material in the database which matches, in some sense, the query content. We do not want to retrieve only content that exactly matches the query: if we did that, our search would not give us any new information. Instead, we want to retrieve data in the database which corresponds to sufficiently similar material as the query. The AudioDB software performs this conceptual operation by ordering the search results based on geometric distance between feature vectors.
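A minimal sketch of the kind of distance involved, assuming that 'eucNorm' denotes Euclidean distance between unit-normalised feature sequences (this is an assumption about AudioDB's internals for illustration, not its documented formula):

```python
import numpy as np

def seq_distance(a, b):
    """Euclidean distance between two equal-shape feature sequences,
    each flattened and scaled to unit norm first (hypothetical helper)."""
    a = a.ravel() / np.linalg.norm(a)
    b = b.ravel() / np.linalg.norm(b)
    return np.linalg.norm(a - b)

rng = np.random.default_rng(1)
query = rng.random((15, 12))  # a 15-second, 12-bin chroma sequence

print(seq_distance(query, query))                 # identical sequences: 0.0
print(seq_distance(query, rng.random((15, 12))))  # a positive distance
```

Ordering every candidate sequence in the database by a distance like this is what produces the ranked result lists we see below.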
To start with, we will perform a search using a single sequence as a query.
Firstly, why a sequence? As you might have seen from the self-similarity matrix (above), it can be hard to find structure in similarity, even when the features are at the relatively coarse granularity of one per second (some features are more typically extracted at 10 to 50 feature vectors per second). In addition, music is inherently time-based; we experience it not as isolated stimuli but as a continuous sequence. To replicate that, we use as a query not just one second's worth of features, but (in this case) 15 seconds (the seqLength dictionary entry, below).
The single sequence that we will use for this query starts at 30 seconds (the seqStart entry), and we will ask for 10 retrieved tracks (ntracks), ordered by their single (npoints) closest-matching 15-second sequence of features.
db.configQuery = {
    'seqLength': 15,
    'seqStart': 30,
    'ntracks': 10,
    'npoints': 1,
    'accumulation': 'track',
    'distance': 'eucNorm',
    'exhaustive': False,  # shouldn't need this but do seem to
    'resFmt': 'list'}
r = db.query(key=key)
query_results(r)
Note in this results list firstly that the “identity” match has been found (with a distance of 0, as it should be); and secondly that we have also retrieved (as we asked) nine other tracks, with the associated position of closest match and distance at that position. This is not telling us that these matches are necessarily interesting, or even particularly close – merely that in this single database, these are the nine tracks with the closest matches.
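Like liszt_results, the query_results helper used above is assumed to be defined elsewhere in the session. A sketch of what such a helper might do, assuming each entry in the result list is a (key, distance, position) triple; the actual shape returned by pyadb with resFmt='list' may differ, so adapt the unpacking to what your version returns:

```python
def query_results(results):
    """Format (key, distance, position) triples, one per line.

    A hypothetical stand-in for the session's helper, not pyadb's own.
    """
    return "\n".join("{:<60} {:>10.4f} {:>6}".format(key, dist, pos)
                     for key, dist, pos in results)

print(query_results([
    ("Nicolas Gombert/Church Music/11 - Lugebat David Absalon", 0.0, 30),
]))
```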
Searching using a single sequence is an operation to use when you are specifically looking for instances of a particular fragment: say a motif, or sample (such as the “Amen break”). You might instead be in the situation that you are interested to find any material similar to any portion of a source track; for that, you would want to search exhaustively: to match all possible sequences in the query with all possible sequences in the database. This is accomplished by using the exhaustive dictionary entry, as below:
db.configQuery = {
    'seqLength': 15,
    'seqStart': 30,
    'accumulation': 'track',
    'distance': 'eucNorm',
    'exhaustive': True,
    'npoints': 1,
    'ntracks': 10,
    'resFmt': 'list'}
r = db.query(key=key)
query_results(r)
What this results list indicates is perhaps more interesting. The first result, as before, is the identity match (you may have a confusing-looking value in the distance field, which is an artifact of how calculations are done: it is effectively zero). The second result should be a different recording of Lugebat David Absalon (attributed to “Anon./Gombert?” when the record was first published), and the third should be a chanson “J'ay mis mon cueur”, which shares a significant amount of musical material with the Lugebat.
What is presented in this notebook is just a taste. If you have time during the session, or are interested to continue with this later in the week, here are some thoughts for experimentation:

- Compute a cross-similarity matrix between two recordings by using np.dot on both feature arrays at once. Can you identify from this where in the Lugebat the source material from the chanson is used?
- Experiment with seqLength. What happens when you use very short sequence lengths? Very long ones?

This notebook has introduced you to creating and searching databases of audio features (Mauch and Dixon, 2010), using a relatively small collection of audio (Rhodes et al., 2010). It so happens that the collection of music used contains examples of relatively recently attributed works (Picker, 2001), and the results of searching through the features extracted from these recordings are enough to be strongly suggestive – even if there were no metadata to hand.
Matthias Mauch and Simon Dixon, Approximate Note Transcription for the Improved Identification of Difficult Chords. In Proc. ISMIR, pp. 135–140, 2010.

Martin Picker, A spurious motet of Josquin, a chanson by Gombert, and some related works: A case study in contrafactum and parody. In Peter Niedermüller, Cristina Urchueguía and Oliver Wiener (eds), Quellenstudium und musikalische Analyse: Festschrift Martin Just zum 70. Geburtstag, pp. 33–45, Ergon-Verlag, 2001.

Christophe Rhodes, Tim Crawford, Michael Casey and Mark d'Inverno, Investigating Music Collections at Different Scales with AudioDB. Journal of New Music Research 39:4, pp. 337–348, 2010.