To start, we're going to unzip the file automatically with a few bash commands. A .mxl file is a compressed MusicXML score, stored as a ZIP archive.
%%bash
cd files
mkdir -p tmp  # -p: don't fail if tmp already exists
unzip -o "Joseph Kosma, English: Johnny Mercer, French: Jacques Prevert - Autumn Leaves.mxl" -d tmp  # -o: overwrite without prompting
Archive: Joseph Kosma, English: Johnny Mercer, French: Jacques Prevert - Autumn Leaves.mxl
I extract into a temporary directory because I might want to delete tmp/musicXML.xml once I'm done with it. For the time being, I'll just leave it there. After this step, the score is a plain XML file that we can parse.
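If you'd rather not shell out to unzip, Python's zipfile module can read the score straight out of the .mxl archive. This is a sketch that assumes the standard compressed-MusicXML layout, where META-INF/container.xml lists the main score in the full-path attribute of a rootfile element; read_mxl is a hypothetical helper, not part of any library.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_mxl(path):
    """Return the bytes of the main score stored in a compressed MusicXML (.mxl) file.

    `path` can be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        container = ET.fromstring(archive.read('META-INF/container.xml'))
        # Match on the local tag name so a namespaced container.xml would also work
        for elem in container.iter():
            if elem.tag.split('}')[-1] == 'rootfile':
                # The first <rootfile> entry is taken to be the main score
                return archive.read(elem.get('full-path'))
    raise ValueError("no <rootfile> entry found in META-INF/container.xml")
```

With this, `ET.fromstring(read_mxl(path))` should give a tree for the score without going through a temporary directory.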
What we're really interested in is extracting the chord progression of a song. It turns out the developers of MusicXML have a nice tutorial on chords. Essentially, it boils down to parsing some XML. Here are two example chords, a G major sixth with D in the bass (G6/D) and an A with an added ninth, A(9):
g_chord = """
<harmony default-y="100">
<root>
<root-step>G</root-step>
</root>
<kind halign="center" text="6">major-sixth</kind>
<bass>
<bass-step>D</bass-step>
</bass>
</harmony>
"""
a_chord ="""
<harmony default-y="100">
<root>
<root-step>A</root-step>
</root>
<kind halign="center" parentheses-degrees="yes">major</kind>
<degree>
<degree-value>9</degree-value>
<degree-alter>0</degree-alter>
<degree-type text="">add</degree-type>
</degree>
</harmony>
"""
Let's see if we can parse those chords correctly.
import xml.etree.ElementTree as ET  # cElementTree is deprecated and was removed in Python 3.9
tree = ET.fromstring(g_chord)
tree
<Element 'harmony' at 0x9e3bd88>
The first thing to extract is the chord's root and its kind (the chord type).
for elem in tree.findall('root/root-step'):
    print(elem.text)
G
for elem in tree.findall('kind'):
    print(elem.text)
major-sixth
def get_chords(xml):
    # fromstringlist accepts a string or a list of lines (e.g. from readlines())
    tree = ET.fromstringlist(xml)
    roots = []
    kinds = []
    for elem in tree.findall('root/root-step'):
        roots.append(elem.text)
    for elem in tree.findall('kind'):
        kinds.append(elem.text)
    return (roots, kinds)
get_chords(g_chord)
(['G'], ['major-sixth'])
get_chords(a_chord)
(['A'], ['major'])
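Note that get_chords(a_chord) reports a plain A major: the added ninth lives in the degree element, which we never read. If we wanted to keep it, a minimal sketch could look like this (get_degrees is a hypothetical helper that only reads the three child elements shown in a_chord):

```python
import xml.etree.ElementTree as ET


def get_degrees(harmony):
    """Return a (value, alter, type) tuple for each <degree> child of a <harmony> element."""
    degrees = []
    for degree in harmony.findall('degree'):
        value = int(degree.find('degree-value').text)
        alter_elem = degree.find('degree-alter')
        alter = int(alter_elem.text) if alter_elem is not None else 0
        # degree-type is one of 'add', 'alter' or 'subtract'
        degrees.append((value, alter, degree.find('degree-type').text))
    return degrees


a_chord = """
<harmony>
  <root><root-step>A</root-step></root>
  <kind parentheses-degrees="yes">major</kind>
  <degree>
    <degree-value>9</degree-value>
    <degree-alter>0</degree-alter>
    <degree-type text="">add</degree-type>
  </degree>
</harmony>
"""
print(get_degrees(ET.fromstring(a_chord)))  # [(9, 0, 'add')]
```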
with open("files/tmp/musicXML.xml", 'r') as f:
    xml = f.readlines()
len(xml)
4280
print(get_chords(xml))
([], [])
Our function finds nothing in the real file. That's because we calibrated it on the examples, where harmony was the top-level element, so 'root/root-step' only looked at its direct children. In a full score, the harmony elements sit deeper in the tree (inside the measures), so we modify the code to search for root/root-step elements at any depth, using the './/' prefix.
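To see the difference between the two paths, here's a hypothetical, heavily trimmed score in which harmony sits inside part/measure, roughly as in a score-partwise file:

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed score: <harmony> is nested, not at the top level
score = ET.fromstring("""
<score-partwise>
  <part id="P1">
    <measure number="1">
      <harmony><root><root-step>E</root-step></root><kind>minor</kind></harmony>
    </measure>
    <measure number="2">
      <harmony><root><root-step>A</root-step></root><kind>minor</kind></harmony>
    </measure>
  </part>
</score-partwise>
""")

# Relative path: only matches <root> elements that are direct children of the document root
direct = [e.text for e in score.findall('root/root-step')]
# './/' prefix: matches <root> at any depth below the document root
anywhere = [e.text for e in score.findall('.//root/root-step')]
print(direct, anywhere)  # [] ['E', 'A']
```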
def get_chords(xml):
    tree = ET.fromstringlist(xml)
    roots = []
    kinds = []
    # './/' searches the whole tree, not just the children of the root element
    for elem in tree.findall('.//root/root-step'):
        roots.append(elem.text)
    for elem in tree.findall('.//kind'):
        kinds.append(elem.text)
    return (roots, kinds)
print(get_chords(xml))
(['E', 'A', 'F', 'B', 'E', 'E', 'C', 'B', 'A', 'B', 'E', 'E', 'E', 'A', 'B', 'A', 'B', 'E', 'E', 'C', 'B', 'A', 'B', 'E', 'E', 'A', 'B', 'A', 'B', 'E', 'E', 'F', 'B', 'E', 'D', 'G', 'F', 'B', 'E', 'C', 'B', 'B', 'A', 'D', 'G', 'C', 'F', 'B', 'E', 'A', 'D', 'G', 'C', 'F', 'B', 'E', 'F', 'B', 'E', 'A', 'D', 'G', 'F', 'B', 'E', 'A', 'D', 'G', 'A', 'A', 'B', 'E', 'E'], ['minor', 'minor', 'half-diminished', 'dominant', 'minor', 'minor-seventh', 'dominant', 'dominant', 'minor', 'dominant', 'minor', 'dominant', 'dominant', 'minor', 'dominant', 'minor', 'dominant', 'minor', 'minor-seventh', 'dominant', 'dominant', 'minor', 'dominant', 'minor', 'dominant', 'minor', 'dominant', 'minor', 'dominant', 'minor', 'minor-seventh', 'dominant', 'dominant', 'minor', 'dominant', 'major', 'half-diminished', 'dominant', 'minor', 'dominant', 'augmented-seventh', 'dominant', 'minor-seventh', 'dominant', 'major-seventh', 'major-seventh', 'half-diminished', 'dominant', 'minor-seventh', 'minor-seventh', 'dominant', 'major-seventh', 'major-seventh', 'half-diminished', 'dominant', 'minor-seventh', 'half-diminished', 'dominant', 'minor-seventh', 'minor-seventh', 'dominant', 'major-seventh', 'half-diminished', 'dominant', 'minor-seventh', 'dominant', 'minor-seventh', 'dominant', 'major', 'minor', 'dominant', 'minor-seventh', 'minor-seventh'])
If we zip the two lists together, we get the sequence of chord names.
(roots, kinds) = get_chords(xml)
chords = []
for (root, kind) in zip(roots, kinds):
    chords.append(" ".join([root, kind]))
print(chords)
['E minor', 'A minor', 'F half-diminished', 'B dominant', 'E minor', 'E minor-seventh', 'C dominant', 'B dominant', 'A minor', 'B dominant', 'E minor', 'E dominant', 'E dominant', 'A minor', 'B dominant', 'A minor', 'B dominant', 'E minor', 'E minor-seventh', 'C dominant', 'B dominant', 'A minor', 'B dominant', 'E minor', 'E dominant', 'A minor', 'B dominant', 'A minor', 'B dominant', 'E minor', 'E minor-seventh', 'F dominant', 'B dominant', 'E minor', 'D dominant', 'G major', 'F half-diminished', 'B dominant', 'E minor', 'C dominant', 'B augmented-seventh', 'B dominant', 'A minor-seventh', 'D dominant', 'G major-seventh', 'C major-seventh', 'F half-diminished', 'B dominant', 'E minor-seventh', 'A minor-seventh', 'D dominant', 'G major-seventh', 'C major-seventh', 'F half-diminished', 'B dominant', 'E minor-seventh', 'F half-diminished', 'B dominant', 'E minor-seventh', 'A minor-seventh', 'D dominant', 'G major-seventh', 'F half-diminished', 'B dominant', 'E minor-seventh', 'A dominant', 'D minor-seventh', 'G dominant', 'A major', 'A minor', 'B dominant', 'E minor-seventh', 'E minor-seventh']
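Note to self: this way of parsing is not very robust. Roots and kinds are collected in two separate passes, so if one harmony in a file lacks a root or a kind, every pairing after it is shifted and the chord names come out wrong. I'd like to think I'd notice, but zip won't tell me: it silently stops at the shorter list instead of raising an error.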
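Because zip() stops at the shorter of its inputs, a mismatch between the two lists goes unnoticed unless we check for it. A small guard turns it into a loud failure; pair_roots_and_kinds is a hypothetical name. It only catches a difference in counts, not every structural problem; reading the root and kind from the same harmony element is the sturdier fix. On Python 3.10+, `zip(roots, kinds, strict=True)` performs the same length check.

```python
def pair_roots_and_kinds(roots, kinds):
    """Join roots and kinds into chord names, refusing to silently drop data."""
    if len(roots) != len(kinds):
        raise ValueError(f"{len(roots)} roots but {len(kinds)} kinds")
    return [" ".join(pair) for pair in zip(roots, kinds)]


print(pair_roots_and_kinds(['E', 'A'], ['minor', 'dominant']))  # ['E minor', 'A dominant']
```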
While comparing the chords from the file above with the website, I noticed that altered roots (sharps and flats) were not detected. Upon checking, I realized that the alteration is described as follows:
f_sharp_chord = """
<harmony print-frame="no">
<root>
<root-step>F</root-step>
<root-alter>1</root-alter>
</root>
<kind text="m7b5">half-diminished</kind>
</harmony>
"""
Therefore, to parse this correctly, we need to account for the root-alter tag, which gives the alteration in half-tones (1 for a sharp, -1 for a flat). Let's see if we can integrate this into our parsing function.
tree = ET.fromstring(f_sharp_chord)
tree
<Element 'harmony' at 0xa31c34c>
for elem in tree.findall('.//root'):
    alter = elem.find('root-alter')
    # compare with None explicitly: an Element with no children is falsy
    if alter is not None:
        print(alter.text)
1
We can now update the get_chords function to take this into account.
def get_chords(xml):
    tree = ET.fromstringlist(xml)
    chords = []
    # read root, alteration and kind from the same <harmony> element
    for elem in tree.findall('.//harmony'):
        root = elem.find('root/root-step').text
        alter_elem = elem.find('root/root-alter')
        # the root-alter text (e.g. '1'), or False when there is no alteration
        alter = alter_elem is not None and alter_elem.text
        kind = elem.find('kind').text
        chords.append((root, alter, kind))
    return chords
print(get_chords(xml))
[('E', False, 'minor'), ('A', False, 'minor'), ('F', '1', 'half-diminished'), ('B', False, 'dominant'), ('E', False, 'minor'), ('E', False, 'minor-seventh'), ('C', False, 'dominant'), ('B', False, 'dominant'), ('A', False, 'minor'), ('B', False, 'dominant'), ('E', False, 'minor'), ('E', False, 'dominant'), ('E', False, 'dominant'), ('A', False, 'minor'), ('B', False, 'dominant'), ('A', False, 'minor'), ('B', False, 'dominant'), ('E', False, 'minor'), ('E', False, 'minor-seventh'), ('C', False, 'dominant'), ('B', False, 'dominant'), ('A', False, 'minor'), ('B', False, 'dominant'), ('E', False, 'minor'), ('E', False, 'dominant'), ('A', False, 'minor'), ('B', False, 'dominant'), ('A', False, 'minor'), ('B', False, 'dominant'), ('E', False, 'minor'), ('E', False, 'minor-seventh'), ('F', '1', 'dominant'), ('B', False, 'dominant'), ('E', False, 'minor'), ('D', False, 'dominant'), ('G', False, 'major'), ('F', '1', 'half-diminished'), ('B', False, 'dominant'), ('E', False, 'minor'), ('C', False, 'dominant'), ('B', False, 'augmented-seventh'), ('B', False, 'dominant'), ('A', False, 'minor-seventh'), ('D', False, 'dominant'), ('G', False, 'major-seventh'), ('C', False, 'major-seventh'), ('F', '1', 'half-diminished'), ('B', False, 'dominant'), ('E', False, 'minor-seventh'), ('A', False, 'minor-seventh'), ('D', False, 'dominant'), ('G', False, 'major-seventh'), ('C', False, 'major-seventh'), ('F', '1', 'half-diminished'), ('B', False, 'dominant'), ('E', False, 'minor-seventh'), ('F', '1', 'half-diminished'), ('B', False, 'dominant'), ('E', False, 'minor-seventh'), ('A', False, 'minor-seventh'), ('D', False, 'dominant'), ('G', False, 'major-seventh'), ('F', '1', 'half-diminished'), ('B', False, 'dominant'), ('E', False, 'minor-seventh'), ('A', False, 'dominant'), ('D', False, 'minor-seventh'), ('G', False, 'dominant'), ('A', False, 'major'), ('A', False, 'minor'), ('B', False, 'dominant'), ('E', False, 'minor-seventh'), ('E', False, 'minor-seventh')]
Finally, we can define a couple of utility functions to turn our chord triplets into readable chord names.
def increment_note(note, half_tones):
    # chromatic scale, spelled with sharps only
    notes = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    ind = notes.index(note)
    return notes[(ind + half_tones) % 12]
increment_note('C', 6)
'F#'
def triplet_to_string(triplet):
    (root, alter, kind) = triplet
    if alter is False:
        return " ".join([root, kind])
    else:
        # shift the natural root by the alteration (in half-tones)
        return " ".join([increment_note(root, int(alter)), kind])
print(list(map(triplet_to_string, get_chords(xml))))
['E minor', 'A minor', 'F# half-diminished', 'B dominant', 'E minor', 'E minor-seventh', 'C dominant', 'B dominant', 'A minor', 'B dominant', 'E minor', 'E dominant', 'E dominant', 'A minor', 'B dominant', 'A minor', 'B dominant', 'E minor', 'E minor-seventh', 'C dominant', 'B dominant', 'A minor', 'B dominant', 'E minor', 'E dominant', 'A minor', 'B dominant', 'A minor', 'B dominant', 'E minor', 'E minor-seventh', 'F# dominant', 'B dominant', 'E minor', 'D dominant', 'G major', 'F# half-diminished', 'B dominant', 'E minor', 'C dominant', 'B augmented-seventh', 'B dominant', 'A minor-seventh', 'D dominant', 'G major-seventh', 'C major-seventh', 'F# half-diminished', 'B dominant', 'E minor-seventh', 'A minor-seventh', 'D dominant', 'G major-seventh', 'C major-seventh', 'F# half-diminished', 'B dominant', 'E minor-seventh', 'F# half-diminished', 'B dominant', 'E minor-seventh', 'A minor-seventh', 'D dominant', 'G major-seventh', 'F# half-diminished', 'B dominant', 'E minor-seventh', 'A dominant', 'D minor-seventh', 'G dominant', 'A major', 'A minor', 'B dominant', 'E minor-seventh', 'E minor-seventh']
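The triplets throw away two things the file already provides: the display text in the text attribute of kind (text="6" and text="m7b5" in the snippets above) and the bass note. Here is a minimal sketch that keeps both, assuming every harmony has a root-step (harmonies written without a root would need another branch). chord_symbol and spell are hypothetical helpers, and flats come out spelled as sharps, as with increment_note.

```python
import xml.etree.ElementTree as ET

# Chromatic scale spelled with sharps only, like increment_note
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def spell(step, alter_elem):
    """Apply an optional *-alter element (in half-tones) to a natural note name."""
    shift = int(alter_elem.text) if alter_elem is not None else 0
    return NOTES[(NOTES.index(step) + shift) % 12]


def chord_symbol(harmony):
    """Return a compact label such as 'F#m7b5' or 'G6/D' for a <harmony> element."""
    label = spell(harmony.find('root/root-step').text,
                  harmony.find('root/root-alter'))
    kind = harmony.find('kind')
    text = kind.get('text')
    # Prefer the display text stored in the file; fall back to the kind name
    label += text if text is not None else ' ' + kind.text
    bass_step = harmony.find('bass/bass-step')
    if bass_step is not None:
        label += '/' + spell(bass_step.text, harmony.find('bass/bass-alter'))
    return label
```

Applied to the f_sharp_chord snippet, this yields 'F#m7b5', and to g_chord, 'G6/D'.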
Now that we know how to parse a MusicXML file, we can shift the chords to the key of C major, provided we know the song's key. In our case, Les feuilles mortes has one sharp (F#) in its key signature, so we're in G major or, equivalently, its relative minor, E minor (here E minor, because there are some D sharps in the melody, the leading tone of E minor). So what we need to do is compute the distance from G up to C and shift all the chords by it. Since E minor is the relative minor of G major, this lands the song in A minor, the relative minor of C major.
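Concretely, with C at index 0 in the chromatic list used by increment_note (so G is 7, E is 4 and A is 9), the shift and its effect on the tonic work out as:

$$d = (i_{\mathrm{C}} - i_{\mathrm{G}}) \bmod 12 = (0 - 7) \bmod 12 = 5$$

$$(i_{\mathrm{E}} + d) \bmod 12 = (4 + 5) \bmod 12 = 9 = i_{\mathrm{A}}$$

so every chord moves up five half-tones and E minor becomes A minor.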
def distance_between_notes(first_note, second_note):
    """Return the distance from first_note up to second_note, in half-tones (0-11)."""
    notes = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    first_index = notes.index(first_note)
    second_index = notes.index(second_note)
    # Python's % returns a non-negative result for a positive modulus
    return (second_index - first_index) % 12
print(distance_between_notes('C', 'D'))
print(distance_between_notes('G', 'C'))
2
5
increment_note('G', 5)
'C'
Here is the main loop, which transposes every chord root:
chord_information = get_chords(xml)
distance_to_C_major = distance_between_notes('G', 'C')
new_chord_information = []
for (root, alter, kind) in chord_information:
    # transpose the natural root; the alteration is applied later by triplet_to_string
    new_chord_information.append((increment_note(root, distance_to_C_major), alter, kind))
chords = list(map(triplet_to_string, new_chord_information))
print(chords)
['A minor', 'D minor', 'B half-diminished', 'E dominant', 'A minor', 'A minor-seventh', 'F dominant', 'E dominant', 'D minor', 'E dominant', 'A minor', 'A dominant', 'A dominant', 'D minor', 'E dominant', 'D minor', 'E dominant', 'A minor', 'A minor-seventh', 'F dominant', 'E dominant', 'D minor', 'E dominant', 'A minor', 'A dominant', 'D minor', 'E dominant', 'D minor', 'E dominant', 'A minor', 'A minor-seventh', 'B dominant', 'E dominant', 'A minor', 'G dominant', 'C major', 'B half-diminished', 'E dominant', 'A minor', 'F dominant', 'E augmented-seventh', 'E dominant', 'D minor-seventh', 'G dominant', 'C major-seventh', 'F major-seventh', 'B half-diminished', 'E dominant', 'A minor-seventh', 'D minor-seventh', 'G dominant', 'C major-seventh', 'F major-seventh', 'B half-diminished', 'E dominant', 'A minor-seventh', 'B half-diminished', 'E dominant', 'A minor-seventh', 'D minor-seventh', 'G dominant', 'C major-seventh', 'B half-diminished', 'E dominant', 'A minor-seventh', 'D dominant', 'G minor-seventh', 'C dominant', 'D major', 'D minor', 'E dominant', 'A minor-seventh', 'A minor-seventh']
chord_changes = {}
# count each pair of consecutive chords
for ind, val in enumerate(chords[:-1]):
    transition = "->".join([val, chords[ind + 1]])
    if transition in chord_changes:
        chord_changes[transition] += 1
    else:
        chord_changes[transition] = 1
chord_changes
{'A dominant->A dominant': 1, 'A dominant->D minor': 2, 'A minor->A dominant': 2, 'A minor->A minor-seventh': 3, 'A minor->D minor': 1, 'A minor->F dominant': 1, 'A minor->G dominant': 1, 'A minor-seventh->A minor-seventh': 1, 'A minor-seventh->B dominant': 1, 'A minor-seventh->B half-diminished': 1, 'A minor-seventh->D dominant': 1, 'A minor-seventh->D minor-seventh': 2, 'A minor-seventh->F dominant': 2, 'B dominant->E dominant': 1, 'B half-diminished->E dominant': 6, 'C dominant->D major': 1, 'C major->B half-diminished': 1, 'C major-seventh->B half-diminished': 1, 'C major-seventh->F major-seventh': 2, 'D dominant->G minor-seventh': 1, 'D major->D minor': 1, 'D minor->B half-diminished': 1, 'D minor->E dominant': 7, 'D minor-seventh->G dominant': 3, 'E augmented-seventh->E dominant': 1, 'E dominant->A minor': 7, 'E dominant->A minor-seventh': 5, 'E dominant->D minor': 4, 'E dominant->D minor-seventh': 1, 'F dominant->E augmented-seventh': 1, 'F dominant->E dominant': 2, 'F major-seventh->B half-diminished': 2, 'G dominant->C major': 1, 'G dominant->C major-seventh': 3, 'G minor-seventh->C dominant': 1}
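The same counting can be written more compactly with collections.Counter, pairing each chord with its successor via zip(chords, chords[1:]). A minimal sketch (count_transitions is a hypothetical name); Counter.most_common then gives the most frequent changes directly.

```python
from collections import Counter


def count_transitions(chords):
    """Count consecutive chord pairs, keyed like 'E dominant->A minor'."""
    # zip(chords, chords[1:]) walks the list two neighbours at a time
    return Counter(f"{a}->{b}" for a, b in zip(chords, chords[1:]))


print(count_transitions(['A minor', 'E dominant', 'A minor', 'E dominant']).most_common(1))
# [('A minor->E dominant', 2)]
```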
import numpy as np
import matplotlib.pyplot as plt

labels = list(chord_changes.keys())
occurrences = [chord_changes[key] for key in labels]
positions = np.arange(len(occurrences))
plt.bar(positions, occurrences)  # bars are centered on their x positions
plt.xticks(positions, labels, rotation=90);
%%bash
cd files
rm -rf tmp/  # -f: don't fail if tmp was already removed
ls
big.txt Demokratie.jpg High_tone.wav Joseph Kosma, English: Johnny Mercer, French: Jacques Prevert - Autumn Leaves.mxl Low_tone.wav WiiBoard_data.npy