This notebook and its helper script (Helper_MoCap6.py) contain all information necessary to make plots summarizing our experiments on the small 6-sequence motion capture dataset we call MoCap6
.
This notebook makes plots to compare inference algorithms for unsupervised learning of HDP-HMM models from the 6-sequence motion capture dataset. To make these plots, the results of these algorithms must be already completed and saved to disk. They are expected to live in the following directory: $BNPYOUTDIR/MoCap6/nipsexperiment-*
If these runs do not exist, you need to go run the experiments. The relevant script can be found in the experiments/ directory of the x-hdphmm-nips2015 repository.
experiments/Launch_MoCap6_memo.py
experiments/Launch_MoCap6_stoch.py
experiments/Launch_MoCap6_sampler.py
experiments/Launch_MoCap6_delmerge.py
experiments/Launch_MoCap6_createanddestroy.py
Load required modules, configure notebook to display figures in-line, etc.
doSaveEPS = 1; # If set to 1, will save each plots to .eps for LaTeX publication.
import numpy as np
import glob; # for simple check that saved runs exist
import os;
import bnpy; # bnpy package for learning and plotting BNP models
import MoCap6 # Module for the dataset
from bnpy.viz.PlotUtil import pylab
%pylab inline
Populating the interactive namespace from numpy and matplotlib
bnpy.viz.PlotUtil.ConfigPylabDefaults(pylab);
pathPattern = "%s/%s/%s" % (os.environ['BNPYOUTDIR'], 'MoCap6', 'nipsexperiment*')
dirList = glob.glob(pathPattern)
if len(dirList) == 0:
raise ValueError("STOP! You have not run the expected experiments yet!")
print "Success! Found %d directories for jobs under the name 'nipsexperiment' for dataset MoCap6" % (len(dirList))
Success! Found 9 directories for jobs under the name 'nipsexperiment' for dataset MoCap6
Jdict maps keys like "birth Sticky=100 K=30" to full paths names like "nipsexperiment-alg=bnpyHDPHMMstoch-lik=Gauss-hmmKappa=0-..."
import Helper_MoCap6 as Helper
reload(Helper);
Jdict = Helper.setUp()
for key in Jdict.keys():
print key
stoch Sticky=100 K=30 stoch Sticky=100 K=60 sampler Sticky=100 K=30 sampler Sticky=100 K=60 memo Sticky=100 K=30 memo Sticky=100 K=60 delmerge Sticky=100 K=30 delmerge Sticky=100 K=60 birth Sticky=100 K=1
Here, we show how to load a bnpy dataset object from the dataset module MoCap6
, plot the data assigned to one sequence, and show the ideal segmentation of all sequences.
Data = MoCap6.get_data()
print "Dataset has %d sequences." % (Data.nDoc)
print "There are %d total atoms (observations), across all sequences." % (Data.X.shape[0])
print "Each observation has dimension %d" % (Data.X.shape[1])
for seqID in [0,1,2,3,4,5]:
startLoc = Data.doc_range[seqID]
stopLoc = Data.doc_range[seqID+1]
print "Sequence #%d starts at index %4d and stops at index %4d" % (seqID, startLoc, stopLoc)
Dataset has 6 sequences. There are 2058 total atoms (observations), across all sequences. Each observation has dimension 12 Sequence #0 starts at index 0 and stops at index 382 Sequence #1 starts at index 382 and stops at index 587 Sequence #2 starts at index 587 and stops at index 838 Sequence #3 starts at index 838 and stops at index 1284 Sequence #4 starts at index 1284 and stops at index 1671 Sequence #5 starts at index 1671 and stops at index 2058
start1 = Data.doc_range[1]
stop1 = Data.doc_range[1+1]
pylab.plot(Data.X[start1:stop1]);
pylab.ylim([-100, 100]);
pylab.ylabel('joint angle (deg)');
pylab.xlabel('time (bins of 100 ms duration)');
Each of the six sequences has variable length. The black region in each plot represents empty timesteps.
# Ztrue is a 1D array. Each entry is an integer in {0, 1, .. 10, 11} that indicates specific exercise type.
Ztrue = Data.TrueParams['Z']
maxT = np.diff(Data.doc_range).max()
ColorMap = Helper.getStateSeqColorMap(nErrorStates=1)
pylab.subplots(nrows=6, ncols=1, figsize=(10,4));
# Loop thru each sequence at plot it as a horizontal image subplot.
for n in [0, 1, 2, 3, 4, 5]:
ax_n = pylab.subplot(6, 1, n+1);
T_n = Data.doc_range[n+1] - Data.doc_range[n]
SegmentIm_n = 1000 * np.ones((1,maxT))
SegmentIm_n[0, :T_n] = Ztrue[Data.doc_range[n]:Data.doc_range[n+1]]
pylab.imshow(SegmentIm_n, interpolation='nearest',
vmin=0, vmax=ColorMap.N, aspect=maxT/12.0, cmap=ColorMap);
pylab.yticks([]);
if n < 5:
pylab.xticks([]);
else:
pylab.xlabel('time (bins of 100ms)');
pylab.ylabel('Seq. %d' % (n), fontsize=12);
pylab.subplots_adjust(left=0.1, right=0.99, top=0.99, bottom=0.1);
with open('../datasets/MoCap6/ActionNames.txt', 'r') as f:
ActionNames = [line.strip() for line in f.readlines()]
for name in ActionNames:
print name,
JumpJack Jog Squat KneeRaise ArmCircle Twist SideReach Box UpDown ToeTouchOneHand SideBend ToeTouchTwoHands
nActions = len(ActionNames)
ax = pylab.imshow(np.arange(nActions)[np.newaxis,:],
cmap=ColorMap,
vmin=0, vmax=ColorMap.N,
interpolation='nearest');
# Disable the image, so we visually focus on the colorbar
pylab.title('Example sequence.', fontsize=20);
pylab.xticks([]); pylab.yticks([]);
# Configure the colorbar legend
cbar = pylab.colorbar();
cbar.set_ticks(0.5 + np.arange(nActions));
cbar.set_ticklabels(ActionNames);
Helper.makeLegendInOwnFigure(Jdict, names=['sampler', 'stoch', 'memo', 'delmerge:delete,merge', 'birth:birth,delete,merge'])
# Don't need to save legend. Can reuse from DDToyHMM experiments.
Stochastic (yellow) drops clusters quickly. This occurs because the global update relies on data from a single sequence. Each sequence in this dataset has a few unique clusters, used by no other sequence. Thus, we may go 3-4 batches in a row without seeing these unique states. Under aggressive learning rate schedules (such as our default), such unique states often disappear (probability pushed to zero).
** Sampler/memoized** (green/blue) largely stay at the initial number of clusters.
** Delete / Merge ** (red) sensibly level off to between 6-15 clusters, no matter which initialization (K=30 or K=60).
Helper.PlotUtil.plotKeff(Jdict, loc=None, xscale='log');
pylab.ylabel('num states K');
if doSaveEPS:
pylab.savefig('MoCap6_KeffVsLaps_sticky=100.eps', bbox_inches='tight');
Lower Hamming distance indicates a better segmentation. A value of zero means the segmentation perfectly matches ground truth.
Stochastic / Sampler / Memoized (yellow/green/blue). These fixed-truncation methods can only improve slightly from their initialization. As expected, those initialized with K=30 states (dashed lines) have better final Hamming distance than those with K=60 states, since there are only K=12 true states.
** Delete / Merge ** (red) These methods quickly reach better distances than the fixed-truncation methods. Removing junk states via delete/merge moves improves the alignment to ground truth.
Helper.PlotUtil.plotHammingDist(Jdict, loc=None, xscale='log');
if doSaveEPS:
pylab.savefig('MoCap6_HammingVsLaps_sticky=100.eps', bbox_inches='tight');
Larger values of the objective $\mathcal{L}$ indicate better fit to the data.
Stochastic / Sampler / Memoized (yellow/green/blue). These fixed-truncation methods plateau at rather low objective values near the initialization.
** Delete / Merge ** (red) All runs of this method quickly climbs to better values than any run of the fixed algorithms.
** Birth/ Delete/ Merge** (purple). All runs of this method climb to better objective scores than any run of the delete/merge only jobs.
Helper.PlotUtil.plotELBO(Jdict, loc=None, xscale='log');
if doSaveEPS:
pylab.savefig('MoCap6_ELBOVsLaps_sticky=100.eps', bbox_inches='tight');
Each figure will show the segmentations of one algorithm after some number of iterations.
We show the segmentations of all 6 sequences in the dataset.
We'll use a many-to-one colormap. Every learned state will be mapped to one "true" state color. This helps show the more serious errors, but disguises cases where multiple learned states explain the data from one true state.
However, we'll still display the Hamming distance computed from all learned states, not the set of aligned many-to-one labels.
seqNames = ['', '', '', '', '', '', '']
sequences = [1,2,3,4,5,6]
Understanding this function is not necessary for reproducing experiments.
def MakeTitleForSegmentation(algName, lapQuery, Kinit=30.0, taskidstr='1'):
''' Reads info from stored job on disk to make title with stats on segmentation.
Uses cool IPython syntax to execute command-line utilities like sed and fgrep,
for reading/writing from text files stored in the output directories for
each experiment.
Returns
-------
titleStr : string of form "sampler after 100 min (30 laps). Kinit=50, Kfinal=8."
'''
if algName == 'sampler':
lapQuery = str(int(lapQuery))
else:
lapQuery = str(float(lapQuery))
Kinit = str(Kinit)
path = Helper.PlotUtil.MakePath(Jdict[algName + ' Sticky=100 K=' + str(int(float(Kinit)))] + "/" + taskidstr + "/")
# Use fgrep to find "50.0" instead of "0.50"
lineNum = !fgrep -n $lapQuery $path/laps-saved-params.txt
lineNum = lineNum[0].split(":")[0] # convert 25:1000 to 25
lineNum_p = lineNum + "p"
lap = !sed -n $lineNum_p $path/laps-saved-params.txt
time_sec = !sed -n $lineNum_p $path/times-saved-params.txt
Kfinal = !sed -n $lineNum_p $path/Keff-saved-params.txt
Hfinal = !sed -n $lineNum_p $path/hamming-distance.txt
lap = lap[0]
time_sec = time_sec[0]
Kfinal = Kfinal[0]
Hfinal = float(Hfinal[0])
time_min = float(time_sec) / 60 # convert to minutes
return "%s: Hdist=%.2f K=%s @ %s laps" % (algName, Hfinal, Kfinal, str(int(float(lapQuery))))
# Just test out the MakeTitle function
MakeTitleForSegmentation("sampler", 500)
'sampler: Hdist=0.50 K=29 @ 500 laps'
Conclusion: This many-to-one segmentation looks broadly OK. However, it makes several glaring over-segmentations, particularly on the red state of sequence 4 or the purple state of sequence 6.
lapToShow = 1000
algName = 'sampler'
taskidstr = '.best'
Kinit = 30
ax = Helper.PlotUtil.plotStateSeq(Jdict[algName + ' Sticky=100 K=%d' % (Kinit)],
taskids=taskidstr, lap=lapToShow, colorManyToOne=1,
seqNames=seqNames, sequences=sequences, showELBOInTitle=0);
ax[0].set_title(MakeTitleForSegmentation(algName, lapToShow, Kinit=Kinit, taskidstr=taskidstr));
if doSaveEPS:
pylab.savefig('MoCap6_EstZ_sampler_best_lap=%d.eps' % (lapToShow));
Conclusion: This many-to-one segmentation looks best among all methods. It avoids the glaring fast-switching seen in the sampler.
lapToShow = 100
algName = 'birth'
taskidstr = '.best'
Kinit = 1
ax = Helper.PlotUtil.plotStateSeq(Jdict[algName + ' Sticky=100 K=%d' % (Kinit)],
taskids=taskidstr, lap=lapToShow, colorManyToOne=1,
seqNames=seqNames, sequences=sequences, showELBOInTitle=0);
ax[0].set_title(MakeTitleForSegmentation(algName, lapToShow, Kinit=Kinit, taskidstr=taskidstr));
if doSaveEPS:
pylab.savefig('MoCap6_EstZ_birth_best_lap=%d.eps' % (lapToShow));
Conclusion: This run looks almost as good as the best run, but is totally missing the pink state in sequences 1 and 3.
lapToShow = 100
algName = 'birth'
taskidstr = '.worst'
Kinit = 1
ax = Helper.PlotUtil.plotStateSeq(Jdict[algName + ' Sticky=100 K=%d' % (Kinit)],
taskids=taskidstr, lap=lapToShow, colorManyToOne=1,
seqNames=seqNames, sequences=sequences, showELBOInTitle=0);
ax[0].set_title(MakeTitleForSegmentation(algName, lapToShow, Kinit=Kinit, taskidstr=taskidstr));
if doSaveEPS:
pylab.savefig('MoCap6_EstZ_birth_worst_lap=%d.eps' % (lapToShow));
Conclusion: This many-to-one segmentation looks decent, but is totally missing the red and peach states in sequence 4.
lapToShow = 100
algName = 'delmerge'
taskidstr = '.best'
Kinit = 30
ax = Helper.PlotUtil.plotStateSeq(Jdict[algName + ' Sticky=100 K=%d' % (Kinit)],
taskids=taskidstr, lap=lapToShow, colorManyToOne=1,
seqNames=seqNames, sequences=sequences, showELBOInTitle=0);
title = MakeTitleForSegmentation(algName, lapToShow, Kinit=Kinit, taskidstr=taskidstr)
ax[0].set_title(title.replace('delmerge', 'del/merge'));
if doSaveEPS:
pylab.savefig('MoCap6_EstZ_delmerge_best_lap=%d.eps' % (lapToShow));
Conclusion: This many-to-one segmentation captures only 5 of the true states. Not at all very good.
lapToShow = 100
algName = 'delmerge'
taskidstr = '.worst'
Kinit = 30
ax = Helper.PlotUtil.plotStateSeq(Jdict[algName + ' Sticky=100 K=%d' % (Kinit)],
taskids=taskidstr, lap=lapToShow, colorManyToOne=1,
seqNames=seqNames, sequences=sequences, showELBOInTitle=0);
ax[0].set_title(MakeTitleForSegmentation(algName, lapToShow, Kinit=Kinit, taskidstr=taskidstr));
if doSaveEPS:
pylab.savefig('MoCap6_EstZ_delmerge_worst_lap=%d.eps' % (lapToShow));