This notebook and its helper script (Helper_SpeakerDiar.py) contain everything needed to make plots summarizing our experiments on the speaker diarization dataset (known to bnpy code as SpeakerDiar).
This notebook makes plots comparing inference algorithms for unsupervised learning of HDP-HMM models. The results of these algorithms must already be completed and saved to disk. They are expected to live in the following directory: $BNPYOUTDIR/SpeakerDiar<meetingNum>/nipsexperiment-*
where meetingNum identifies one of the 21 independent meetings (1, 2, ... 20, 21) for which we have audio recording data.
If these runs do not exist, you need to run the experiments first. The relevant scripts can be found in the experiments/ directory of the x-hdphmm-nips2015 repository.
experiments/Launch_SpeakerDiar_memo.py
experiments/Launch_SpeakerDiar_sampler.py
experiments/Launch_SpeakerDiar_delmerge.py
experiments/Launch_SpeakerDiar_createanddestroy.py
Note that each script launches a separate run for each of the 21 meetings. Each run processes only ONE sequence, so there is no need for "online" learning and no need to consider the stochastic variational inference algorithm.
Load required modules, configure notebook to display figures in-line, etc.
doSaveEPS = 1  # If set to 1, save each plot to .eps for LaTeX publication.
import numpy as np
import glob  # for a simple check that saved runs exist
import os
import bnpy  # bnpy package for learning and plotting BNP models
import SpeakerDiar # Module for the dataset
from bnpy.viz.PlotUtil import pylab
%pylab inline
Populating the interactive namespace from numpy and matplotlib
bnpy.viz.PlotUtil.ConfigPylabDefaults(pylab);
import Helper_SpeakerDiar as Helper
reload(Helper);
reload(Helper.PlotUtil);
jobName = 'nips2015'
pathPattern = "%s/%s/%s" % (os.environ['BNPYOUTDIR'], 'SpeakerDiar*', jobName + '*')
dirList = glob.glob(pathPattern)
if len(dirList) == 0:
raise ValueError("STOP! You have not run the expected experiments yet!")
print "Success! Found %d directories for jobs under the name '%s' for SpeakerDiar" % (len(dirList), jobName)
Success! Found 84 directories for jobs under the name 'nips2015' for SpeakerDiar
Jdict maps keys like "birth Sticky=100 K=25" to full path names like "nipsexperiment-alg=bnpyHDPHMMstoch-lik=Gauss-hmmKappa=0-..."
Jdict = Helper.setUp()
for key in Jdict.keys():
print key
sampler Sticky=100 K=25
memo Sticky=100 K=25
delmerge Sticky=100 K=25
birth Sticky=100 K=25
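The mapping that Helper.setUp builds can be illustrated with a small sketch. The directory name below and the regex fields are hypothetical stand-ins for what the real helper parses; the actual Helper_SpeakerDiar.py code may differ.

```python
import re

def short_key(dirname):
    """Map a long bnpy output directory name to a short legend key.

    Illustrative only: assumes names embed alg=..., hmmKappa=..., K=...
    fields, as in the nipsexperiment-* examples shown above.
    """
    alg = re.search(r'alg=\w*?HDPHMM(\w+)', dirname).group(1)
    kappa = re.search(r'hmmKappa=(\d+)', dirname).group(1)
    K = re.search(r'K=(\d+)', dirname).group(1)
    return "%s Sticky=%s K=%s" % (alg, kappa, K)

print(short_key('nipsexperiment-alg=bnpyHDPHMMmemo-lik=Gauss-hmmKappa=100-K=25'))
```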
TODO: show how to load a bnpy dataset object from the dataset module SpeakerDiar, plot the data assigned to one sequence, and show the ideal segmentation of that sequence.
scores = Helper.plotScatterComparison(Jdict['delmerge Sticky=100 K=25'],
Jdict['sampler Sticky=100 K=25'],
nTask=10, pylab=pylab)
if doSaveEPS:
pylab.savefig('SpeakerDiar_CompareFinalHamming.eps', bbox_inches='tight');
Helper.makeLegendInOwnFigure(Jdict, names=['sampler', 'memo', 'delmerge:delete,merge', 'birth:birth,delete,merge'])
if doSaveEPS:
pylab.savefig('SpeakerDiar_LegendForTracePlots.eps', bbox_inches='tight');
# Trim the bounding box a bit
!sed -i -e 's/BoundingBox: 171 275 440 516/BoundingBox: 205 390 425 505/' SpeakerDiar_LegendForTracePlots.eps
Lower Hamming distance indicates a better segmentation. A value of zero means the segmentation perfectly matches ground truth.
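Because the learned state labels are arbitrary, Hamming distance is typically computed after finding the best one-to-one alignment between estimated and true states. A minimal brute-force sketch under that assumption (not the bnpy implementation, which handles larger state counts more efficiently):

```python
import itertools
import numpy as np

def aligned_hamming(z_true, z_est):
    """Fraction of frames mislabeled under the best one-to-one
    relabeling of estimated states (brute force; only for small K)."""
    z_true = np.asarray(z_true)
    z_est = np.asarray(z_est)
    est_states = np.unique(z_est)
    true_states = np.unique(z_true)
    best = 1.0
    # Try every injective map from estimated labels to true labels.
    for perm in itertools.permutations(true_states, len(est_states)):
        relabeled = np.empty_like(z_est)
        for s, t in zip(est_states, perm):
            relabeled[z_est == s] = t
        best = min(best, float(np.mean(relabeled != z_true)))
    return best
```

For example, an estimate that is a pure relabeling of the truth scores exactly zero, while each genuinely misassigned frame adds 1/T.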
n = '11'
Helper.PlotUtil.plotHammingDist(Jdict, n=n, loc=None, xscale='log', xvar='times');
if doSaveEPS:
pylab.savefig('SpeakerDiar11_HammingVsTime.eps', bbox_inches='tight');
n = '11'
Helper.PlotUtil.plotELBO(Jdict, n=n, loc=None, xscale='log', xvar='times');
if doSaveEPS:
pylab.savefig('SpeakerDiar11_ELBOVsTime.eps', bbox_inches='tight');
How to read this plot
Helper.PlotUtil.plotStateSeq(Jdict['birth Sticky=100 K=25'], taskids='.best', sequences=['11']);
ignoring state -1 Ttrue = 54
ignoring state -2 Ttrue = 109
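The "ignoring state" messages above report frames whose ground-truth labels are negative. Assuming (as the messages suggest) that negative labels flag unscored audio such as non-speech or overlap, those frames would be dropped before evaluation, as in this hypothetical sketch:

```python
import numpy as np

def masked_frames(z_true, z_est, ignore=(-1, -2)):
    """Drop frames whose ground-truth label marks unscored audio.

    Assumption for illustration: negative ground-truth labels denote
    frames excluded from Hamming-distance scoring.
    """
    z_true = np.asarray(z_true)
    z_est = np.asarray(z_est)
    keep = ~np.isin(z_true, ignore)
    return z_true[keep], z_est[keep]
```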
n = '21'
Helper.PlotUtil.plotHammingDist(Jdict, n=n, loc=None, xscale='log', xvar='times');
#pylab.ylim([-.01, 0.4]);
#pylab.gca().set_yticks([0, 0.1, 0.2, 0.3]);
if doSaveEPS:
pylab.savefig('SpeakerDiar21_HammingVsTime.eps', bbox_inches='tight');
n = '21'
Helper.PlotUtil.plotELBO(Jdict, n=n, loc=None, xscale='log', xvar='times');
pylab.gca().set_yticks([-2.55, -2.5, -2.45, -2.4]);
pylab.ylim([-2.56, -2.39]);
if doSaveEPS:
pylab.savefig('SpeakerDiar21_ELBOVsTime.eps', bbox_inches='tight');
How to read this plot
Helper.PlotUtil.plotStateSeq(Jdict['delmerge Sticky=100 K=25'], taskids='.best', sequences=['21']);
ignoring state -1 Ttrue = 3061
ignoring state -2 Ttrue = 95
n = '16'
Helper.PlotUtil.plotHammingDist(Jdict, n=n, loc=None, xscale='log', xvar='times');
if doSaveEPS:
pylab.savefig('SpeakerDiar16_HammingVsTime.eps', bbox_inches='tight');
n = '16'
Helper.PlotUtil.plotELBO(Jdict, n=n, loc=None, xscale='log', xvar='times');
pylab.gca().set_yticks([-2.7, -2.65, -2.6, -2.55]);
pylab.ylim([-2.71, -2.54]);
if doSaveEPS:
pylab.savefig('SpeakerDiar16_ELBOVsTime.eps', bbox_inches='tight');
Helper.PlotUtil.plotStateSeq(Jdict['delmerge Sticky=100 K=25'], taskids='.best', sequences=['16']);
ignoring state -1 Ttrue = 143
ignoring state -2 Ttrue = 280