carat: computer-aided rhythm analysis toolbox
This notebook shows how to extract onsets from a recording using the carat library.
The procedure consists of computing an accentuation feature function, which captures the changes in the spectral magnitude of the audio signal along different frequency bands, and then picking peaks in that function to identify onset candidates.
The following steps show how to do this.
You can download the notebook and run it locally on your computer, or run it in Google Colab.
You should install the following packages by running the next two cells.
!pip install carat
import matplotlib.pyplot as plt
import os, sys
import numpy as np
import IPython.display as ipd
from carat import features, annotations, audio, display, onsets, util
%matplotlib inline
This first step loads the audio file and the manual annotations of onsets.
# use an example audio file provided
audio_path = util.example("chico_audio")
# load audio file (only 30 seconds)
y, sr = audio.load(audio_path, duration=30.0)
# time corresponding to the audio signal
time = np.arange(0, y.size)/sr
plt.figure(figsize=(12,6))
ax1 = plt.subplot(211)
display.wave_plot(y, sr, ax=ax1)
plt.tight_layout()
We can listen to the first 30 seconds of the audio file.
Note: This is a separate track from a performance comprising three drums. The track corresponds to the chico drum, which is the timekeeper of the ensemble. The performance starts by playing the clave pattern (timeline pattern). After a few rhythmic cycles the chico drum starts playing an ostinato pattern that articulates the four subdivisions of the beat.
ipd.Audio(y, rate=sr)
# use onset annotations provided for the example audio file
onset_annotations_file = util.example("chico_onsets")
# load onset annotations
onsets_ann, _ = annotations.load_onsets(onset_annotations_file)
# plot waveform and annotated onsets for the first 30 seconds
plt.figure(figsize=(12,6))
ax1 = plt.subplot(211)
display.wave_plot(y, sr, ax=ax1, onsets=onsets_ann)
plt.tight_layout()
This second step shows how to compute an accentuation feature from the audio waveform based on the spectral flux, which captures the changes in the spectral magnitude of the audio signal along different frequency bands. In principle, the feature value is high when a note has been articulated and close to zero otherwise.
Note: This example is tailored towards the onsets of the chico drum, the highest sounding of the three candombe drum types, so the analysis focuses on the higher frequencies (500 to 3000 Hz). Some other parameters are also tweaked, such as the hop size for computing the spectrogram.
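The description above can be made concrete with a small NumPy sketch. It is not carat's implementation: it uses plain FFT bins instead of mel filters, and the function name spectral_flux_sketch, the 20 ms window length and the frame-centre time convention are assumptions made for illustration. It only shows the idea of log-compressing the magnitude spectrum, keeping a frequency band, and summing the positive frame-to-frame changes.

```python
import numpy as np

def spectral_flux_sketch(y, sr, win=20e-3, hop=5e-3, alpha=10e4,
                         minfreq=500, maxfreq=3000):
    """Hypothetical helper: band-limited, half-wave rectified spectral flux."""
    n_win = int(round(win * sr))   # window length in samples (assumed 20 ms)
    n_hop = int(round(hop * sr))   # hop size in samples
    window = np.hanning(n_win)
    n_frames = 1 + (len(y) - n_win) // n_hop
    # windowed frames, one row per frame
    frames = np.stack([y[i * n_hop:i * n_hop + n_win] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # keep only the bins inside the band of interest
    freqs = np.fft.rfftfreq(n_win, d=1.0 / sr)
    band = (freqs >= minfreq) & (freqs <= maxfreq)
    # logarithmic compression, mirroring log10(alpha*abs(S)+1)
    mag = np.log10(alpha * mag[:, band] + 1)
    # positive frame-to-frame changes only; decreases are discarded
    diff = np.maximum(np.diff(mag, axis=0), 0.0)
    flux = np.concatenate([[0.0], diff.sum(axis=1)])
    # time of each frame centre, in seconds
    times = (np.arange(n_frames) * n_hop + n_win / 2) / sr
    return flux, times
```

On a signal that is silent and then starts a tone, this feature stays at zero during the silence and peaks around the frame where the tone enters the analysis window.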
hop = 5e-3 # hop size
nfilts = 80 # Number of MEL filters
log_flag = True # If LOG should be taken before taking differentiation
alpha = 10e4 # compression parameter for dB conversion - log10(alpha*abs(S)+1)
freqs = [500, 3000] # chico bound frequencies for summing frequency band
acce, times, _ = features.accentuation_feature(y, sr, hop=hop, nfilts=nfilts, log_flag=log_flag, alpha=alpha,
minfreq=freqs[0], maxfreq=freqs[1])
# plot waveform and accentuation feature
plt.figure(figsize=(12,6))
# plot waveform
ax1 = plt.subplot(2, 1, 1)
display.wave_plot(y, sr, ax=ax1)
# plot accentuation feature
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
display.feature_plot(acce, times, ax=ax2)
plt.tight_layout()
Peak picking follows a method proposed in [1] and later modified in [2]: a set of simple peak selection rules is implemented in which an onset candidate, apart from being a local maximum, has to exceed a threshold that combines a fixed and an adaptive value.
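As a rough illustration of such rules, the sketch below selects frames that are the maximum of a surrounding window and that exceed a fixed threshold plus a moving average. It is a simplified stand-in, not carat's code nor the exact methods of [1] or [2]: the name pick_peaks_sketch, the normalisation of the feature to a maximum of 1, and the handling of the window edges are assumptions.

```python
import numpy as np

def pick_peaks_sketch(feature, threshold=0.18, pre_avg=14, pos_avg=10,
                      pre_max=14, pos_max=10):
    """Hypothetical helper: local maxima above a fixed + adaptive threshold."""
    feature = np.asarray(feature, dtype=float)
    # assumption: normalise so the fixed threshold does not depend on scale
    scale = np.max(np.abs(feature))
    if scale > 0:
        feature = feature / scale
    peaks = []
    for n in range(len(feature)):
        # windows are truncated at the signal edges
        avg_win = feature[max(0, n - pre_avg):n + pos_avg + 1]
        max_win = feature[max(0, n - pre_max):n + pos_max + 1]
        is_local_max = feature[n] == max_win.max()
        # adaptive part: moving average; fixed part: threshold
        above_threshold = feature[n] >= avg_win.mean() + threshold
        if is_local_max and above_threshold:
            peaks.append(n)
    return np.array(peaks, dtype=int)
```

With this formulation, a smaller peak that falls inside the moving-maximum window of a larger one is discarded, and lowering the fixed threshold admits weaker peaks.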
threshd = 0.180 # threshold for peak-picking (chico)
pre_avg = 14 # number of past frames for moving average
pos_avg = 10 # number of future frames for moving average
pre_max = 14 # number of past frames for moving maximum
pos_max = 10 # number of future frames for moving maximum
peak_indxs, mov_avg, mov_max = features.peak_detection(acce, threshold=threshd,
pre_avg=pre_avg, pos_avg=pos_avg,
pre_max=pre_max, pos_max=pos_max)
# time instants of the onsets
onset_times = times[peak_indxs]
# plot waveform and accentuation feature
plt.figure(figsize=(12,6))
# plot waveform
ax1 = plt.subplot(2, 1, 1)
display.wave_plot(y, sr, ax=ax1)
# plot accentuation feature and detected onsets
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
display.feature_plot(acce, times, ax=ax2, onsets=onset_times)
plt.tight_layout()
All of the above can be done with a single function, onsets.detection(), as shown below. Note that you can specify all the parameter values used by the low-level functions, such as features.accentuation_feature or features.peak_detection.
onsets_all, _ = onsets.detection(y, fs=sr, hop=hop, nfilts=nfilts, log_flag=log_flag, alpha=alpha,
minfreq=freqs[0], maxfreq=freqs[1], threshold=threshd,
pre_avg=pre_avg, pos_avg=pos_avg, pre_max=pre_max, pos_max=pos_max)
If you are not sure what values to use for onset detection of a specific drum, we offer a set of standardized presets to choose from. A preset automatically loads recommended values for detecting the onsets of that instrument.
# Display available carat presets
util.list_presets()
# Select one of the available presets
selected_preset = 'chico'
# you can set a path to your own json file with presets
# eg: json_file = '../MyPresets/rock_presets.json'
preset = util.load_preset(selected_preset, json_file = 'default')
print('Values in', selected_preset, 'preset are:', preset)
onsets_all_preset, _ = onsets.detection(y, fs=sr, **preset)
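Since the preset is passed as **preset, it behaves like a dictionary of keyword arguments. The sketch below shows one way such a dictionary could be stored in and read from a JSON file using only the standard library. The file layout, the preset name chico_custom and the helper load_preset_sketch are assumptions for illustration; the schema that util.load_preset actually expects is not shown in this notebook and may differ. The keys and values are the ones used in the explicit call to onsets.detection() above.

```python
import json
import os
import tempfile

# Hypothetical preset, built from the parameter values used earlier in this notebook.
my_presets = {
    "chico_custom": {
        "hop": 5e-3, "nfilts": 80, "log_flag": True, "alpha": 10e4,
        "minfreq": 500, "maxfreq": 3000, "threshold": 0.18,
        "pre_avg": 14, "pos_avg": 10, "pre_max": 14, "pos_max": 10,
    }
}

def load_preset_sketch(name, json_file):
    """Hypothetical stand-in for a preset loader: one named dict from a JSON file."""
    with open(json_file) as f:
        return json.load(f)[name]

# write the presets to a temporary file, then read one of them back
preset_path = os.path.join(tempfile.mkdtemp(), "my_presets.json")
with open(preset_path, "w") as f:
    json.dump(my_presets, f, indent=2)

custom = load_preset_sketch("chico_custom", preset_path)
# custom could then be unpacked as keyword arguments,
# e.g. onsets.detection(y, fs=sr, **custom)
```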
We can now check that the onsets obtained with the step-by-step procedure and with onsets.detection() are the same.
# check if the onsets obtained are the same
np.testing.assert_allclose(onset_times, onsets_all)
In the following plot we depict the detected onsets with the manual annotations side by side to compare them visually.
# plot waveform and annotated onsets
plt.figure(figsize=(12,6))
ax1 = plt.subplot(2, 1, 1)
display.wave_plot(y, sr, ax=ax1, onsets=onsets_ann)
plt.title('annotated onsets')
# plot accentuation feature and detected onsets
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
display.feature_plot(acce, times, ax=ax2, onsets=onset_times)
plt.title('detected onsets')
plt.tight_layout()
Note: The first difference we can notice is that there are several detected onsets at the beginning that were not annotated. This is because they do not correspond to the typical chico ostinato pattern the annotator was interested in. Recall that the performance starts by playing the clave pattern (timeline pattern). After a few rhythmic cycles the chico drum starts playing an ostinato pattern that articulates the four subdivisions of the beat. This is the annotated pattern.
Note: Besides, as shown in the following plot, an onset is missing in the automatic detection (around second 14). You could try to tweak the parameters to fix it (as done in the next section), or save the onsets to a CSV file and edit it in Sonic Visualiser.
limts = [12.8, 16]
# plot waveform and annotated onsets
plt.figure(figsize=(12,6))
ax1 = plt.subplot(2, 1, 1)
display.wave_plot(y, sr, ax=ax1, onsets=onsets_ann)
plt.title('annotated onsets')
# plot accentuation feature and detected onsets
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
display.feature_plot(acce, times, ax=ax2, onsets=onset_times)
plt.title('detected onsets')
plt.xlim(limts)
plt.tight_layout()
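The visual comparison above can be complemented with a numerical one. The sketch below pairs each annotated onset with the closest unused detected onset and reports annotations without a match (missed) and detections without a match (extra). The helper name match_onsets_sketch, the greedy matching strategy and the 50 ms tolerance are assumptions, not part of carat.

```python
import numpy as np

def match_onsets_sketch(detected, annotated, tol=0.05):
    """Hypothetical helper: greedy one-to-one matching within +/- tol seconds."""
    detected = np.sort(np.asarray(detected, dtype=float))
    annotated = np.sort(np.asarray(annotated, dtype=float))
    used = np.zeros(len(detected), dtype=bool)
    missed = []
    for t in annotated:
        # closest detection that has not been matched yet
        free = np.flatnonzero(~used)
        if free.size:
            k = free[np.argmin(np.abs(detected[free] - t))]
            if abs(detected[k] - t) <= tol:
                used[k] = True
                continue
        missed.append(t)
    extra = detected[~used]
    return np.array(missed), extra
```

In this notebook it could be called as match_onsets_sketch(onset_times, onsets_ann); the detections before the annotated pattern starts would then show up as extra, and the onset around second 14 as missed.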
As noted before, there is an onset missing in the automatic detection. The following plot shows the feature values and the thresholds used for the detection (moving average: 'mov-avg', moving maximum: 'mov-max', and resulting overall threshold: 'threshold'). The moving maximum is used to avoid selecting feature peaks that are very close in time: only the one with the highest value is considered an onset candidate. Besides, only the peaks that are above the threshold are considered onset candidates. This threshold is the sum of a fixed threshold value (threshd = 0.180, in this case) and the moving average. In order to detect the missing onset we could decrease the fixed threshold value.
limts = [12.8, 16]
max_val = acce.max()
plt.figure(figsize=(12,6))
# plot feature values
ax1 = plt.subplot(2, 1, 1)
display.feature_plot(acce/max_val, times, ax=ax1)
# plot thresholds
plt.plot(times, mov_avg, '-', linewidth=3, color=0.6*np.array([1, 1, 1]), label='mov-avg', alpha=0.7)
plt.plot(times, mov_max, ':', linewidth=1, color=0.6*np.array([1, 1, 1]), label='mov-max')
plt.plot(times, mov_avg + threshd, 'r', linewidth=1, label='threshold')
plt.legend(prop={'size':10})
plt.xlim(limts)
plt.tight_layout()
We now perform the onset detection but with a smaller threshold value.
onsets_all, _ = onsets.detection(y, fs=sr, hop=hop, nfilts=nfilts, log_flag=log_flag, alpha=alpha,
minfreq=freqs[0], maxfreq=freqs[1], threshold=0.16,
pre_avg=pre_avg, pos_avg=pos_avg, pre_max=pre_max, pos_max=pos_max)
# compare the original and the new detected onsets
plt.figure(figsize=(12,6))
ax1 = plt.subplot(2, 1, 1)
# plot accentuation feature and original detected onsets
display.feature_plot(acce, times, ax=ax1, onsets=onset_times)
plt.title('original detected onsets')
# plot accentuation feature and new detected onsets
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
display.feature_plot(acce, times, ax=ax2, onsets=onsets_all)
plt.title('new detected onsets')
plt.xlim(limts)
plt.tight_layout()
Note: The missing onset is now detected, but a spurious onset is also added. This could be fixed by extending the length of the moving maximum.
onsets_all, _ = onsets.detection(y, fs=sr, hop=hop, nfilts=nfilts, log_flag=log_flag, alpha=alpha,
minfreq=freqs[0], maxfreq=freqs[1], threshold=0.16,
pre_avg=pre_avg, pos_avg=pos_avg, pre_max=16, pos_max=16)
# compare the original and the new detected onsets
plt.figure(figsize=(12,6))
ax1 = plt.subplot(2, 1, 1)
# plot accentuation feature and original detected onsets
display.feature_plot(acce, times, ax=ax1, onsets=onset_times)
plt.title('original detected onsets')
# plot accentuation feature and new detected onsets
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
display.feature_plot(acce, times, ax=ax2, onsets=onsets_all)
plt.title('new detected onsets')
plt.xlim(limts)
plt.tight_layout()
Note: Now the missing onset is detected and the spurious onset is avoided.
Now that we have detected the onsets, we can save them to a CSV file.
annotations.save_onsets("detected_onsets_chico.csv", onsets_all)
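If you want to write or check such a file without carat, a plain NumPy round trip is one option. The sketch below assumes a single column of onset times in seconds; the exact layout written by annotations.save_onsets is not shown in this notebook and may differ. The onset values and the file name are made up for illustration.

```python
import os
import tempfile
import numpy as np

# made-up onset times in seconds, for illustration only
onset_times_demo = np.array([0.512, 0.748, 1.004])

# write one onset time per line, then read the file back
csv_path = os.path.join(tempfile.mkdtemp(), "onsets_demo.csv")
np.savetxt(csv_path, onset_times_demo, fmt="%.6f")
reloaded = np.loadtxt(csv_path, ndmin=1)
```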