DDToyHMM Experiments for NIPS 2015¶

This notebook and its helper script (Helper_DDToyHMM.py) contain all information necessary to make plots summarizing our experiments on a diagonally-dominant toy dataset we call DDToyHMM. These plots are shown in Fig. 2 of our paper on scalable learning for the HDP-HMM.

Pre-requistes¶

You have completed all runs from the DDToyHMM dataset experiments.

See the scripts experiments/Launch_DDToyHMM_*.py in our experiment repository.

Setup¶

Load required modules, configure notebook to display figures in-line, etc.

In [1]:

import numpy as np

In [2]:

import bnpy; # bnpy package for learning and plotting BNP models
import DDToyHMM # Module describing the toy dataset in question
import glob; # for simple check that saved runs exist
import os;

In [3]:

from bnpy.viz.PlotUtil import pylab
%pylab inline

Populating the interactive namespace from numpy and matplotlib

In [4]:

bnpy.viz.PlotUtil.ConfigPylabDefaults(pylab);

In [5]:

doSaveEPS = 1; # Switch this to 1 to save these plots to .eps, for LaTeX publications.

Verify required saved runs exist¶

This notebook requires saved runs located in the following directory: `$BNPYOUTDIR/DDToyHMM/nipsexperiment...

If these runs do not exist, you need to go run the experiments matching the pattern experiments/Launch_DDToyHMM_*.py

In [6]:

pathPattern = "%s/%s/%s" % (os.environ['BNPYOUTDIR'], 'DDToyHMM', 'nipsexperiment*')
dirList = glob.glob(pathPattern)
if len(dirList) == 0:
    raise ValueError("STOP! You have not run the expected experiments yet!")
print "Found %d directories for jobs under the name 'nipsexperiment' for toy dataset DDToyHMM" % (len(dirList))

Found 22 directories for jobs under the name 'nipsexperiment' for toy dataset DDToyHMM

Load paths to saved runs into job dictionary.¶

Jdict maps keys like "birth Sticky=0 K=50" to full paths names like "nipsexperiment-alg=bnpyHDPHMMstoch-lik=Gauss-hmmKappa=0-..."

In [7]:

import Helper_DDToyHMM as Helper
reload(Helper);
Jdict = Helper.setUp(lineStyleByKey='Sticky')

In [8]:

for key in Jdict.keys():
    print key

stoch Sticky=0 K=50
stoch Sticky=0 K=100
stoch Sticky=50 K=50
stoch Sticky=50 K=100
sampler Sticky=0 K=50
sampler Sticky=0 K=100
sampler Sticky=50 K=50
sampler Sticky=50 K=100
memo Sticky=0 K=50
memo Sticky=0 K=100
memo Sticky=50 K=50
memo Sticky=50 K=100
delmerge Sticky=0 K=50
delmerge Sticky=0 K=100
delmerge Sticky=50 K=50
delmerge Sticky=50 K=100
birth Sticky=0 K=1
birth Sticky=0 K=10
birth Sticky=0 K=50
birth Sticky=50 K=1
birth Sticky=50 K=10
birth Sticky=50 K=50

Separate all jobs into two groups: those that are non-sticky (hmmKappa=0) and sticky (hmmKappa=50).

In [9]:

Jnonsticky = Helper.getSubsetByName(Jdict, 'Sticky=0')
Jsticky = Helper.getSubsetByName(Jdict, 'Sticky=50')
J50 = Helper.getSubsetByName(Jdict, 'K=50')

Dataset visualization¶

We have 8 true states that are each well-separated Gaussian blobs.

The generative HMM state transition model has high stickiness. Each state will self-transition with probability 0.999, and move to one other state (illustrated by arrow) with probability 0.001.

In [10]:

DDToyHMM.transPi

Out[10]:

array([[ 0.999,  0.001,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.999,  0.001,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  0.999,  0.001,  0.   ,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  0.   ,  0.999,  0.001,  0.   ,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  0.   ,  0.   ,  0.999,  0.001,  0.   ,  0.   ],
       [ 0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.999,  0.001,  0.   ],
       [ 0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.999,  0.001],
       [ 0.001,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.999]])

In [12]:

DDToyHMM.illustrate(Colors=Helper.getStateSeqColorMap());
pylab.xticks([-30, -15, 0, 15, 30]);
pylab.yticks([-30, -15, 0, 15, 30]);
if doSaveEPS:
    pylab.savefig('DDToyHMM_Illustrated.eps', bbox_inches='tight')

Legend for Trace Plots¶

In [47]:

Helper.makeLegendInOwnFigure(Jdict, 
                             names=['sampler', 'stoch', 'memo',
                                    'delmerge:delete,merge', 'birth:birth,delete,merge',
                                    'Sticky=0:Non-stick, kappa=0',
                                    'Sticky=50:Sticky, kappa=50'])
if doSaveEPS:
    pylab.savefig('LegendForAllAlgorithms.eps', bbox_inches='tight');
    !sed -i -e 's/BoundingBox: 174 276 437 515/BoundingBox: 230 375 430 505/' LegendForAllAlgorithms.eps

Trace plots¶

Fixed initialization of 50 states, examining sticky vs. non-sticky models¶

Effective number of states $K$ as more data is seen¶

Dashed lines: non-sticky model ($\kappa=0$)
Solid lines: sticky model ($\kappa=50$)

Conclusions:

Across the board, non-adaptive models are slower to converge without a sticky bias.
With stickiness, most methods quickly discard states even before a full pass of the dataset. Due to our abundant initialization, some initial states are so bad they are not used even in the first step of the algorithm (hence the lower values of solid lines vs. dashed lines at the y intercept).
Stochastic runs (yellow) are very quick to lose states because they use a single batch (4 sequences) rather than the whole-dataset to update parameters at each step. If current batch's usage of a state is not large (esp. in early iterations), it is likely to see global usage drop soon too.

In [11]:

Helper.PlotUtil.plotKeff(J50, loc=None, xscale='log');
pylab.ylim([0, 60]);
pylab.gca().set_yticks([8, 25, 50]);
pylab.gca().set_yticklabels(['*8', '25', '50']);

if doSaveEPS:
    pylab.savefig('DDToyHMM_KeffVsLaps_Kinit=50.eps', bbox_inches='tight');

¶

Hamming distance as more data is seen¶

Dashed lines: non-sticky model ($\kappa=0$)
Solid lines: sticky model ($\kappa=50$)

Conclusions:

Across the board, non-adaptive models are slower to converge without a sticky bias.
With stickiness, most methods quickly discard states even before a full pass of the dataset. Due to our abundant initialization, some initial states are so bad they are not used even in the first step of the algorithm (hence the lower values of solid lines vs. dashed lines at the y intercept).
The sampler (green) eventually wanders to ideal segmentations (zero Hamming dist.), but takes a while to do so.
Stochastic (yellow) shows the widest spread in performance between sticky and non-sticky versions of the model.

In [12]:

Helper.PlotUtil.plotHammingDist(J50, loc=None, xscale='log');
if doSaveEPS:
    pylab.savefig('DDToyHMM_HammingVsLaps_Kinit=50.eps', bbox_inches='tight');

Trace plots of only non-sticky runs ($\kappa = 0$) from different numbers of initial states $K$¶

Quick change to line styles for all future plots¶

Solid: any birth method, or other method initialized to $K=100$ states
Dashed: other method initialized to $K=50$ states

Note that birth methods (purple) are always solid, though we consider also initializing with K=1, K=10, or K=50 states.

In [13]:

# Setup
Helper.PlotUtil.LineStyleMap = Helper.getLineStyleMap_ByKValue()

Trace of effective number of states $K$ as more data is seen¶

Conclusions:

Our methods (red=delete/merge only, purple=with birth moves too) effectively converge to ideal $K=8$ states from variety of initializations.

In [14]:

pylab.gca().set_yticks([8, 25, 50]);
Helper.PlotUtil.plotKeff(Jnonsticky, loc=None, xscale='log');
pylab.ylabel('num states K');
if doSaveEPS:
    pylab.savefig('DDToyHMM_KeffVsLaps_nonsticky.eps', bbox_inches='tight');

Trace of Hamming distance as more data is seen¶

Conclusions

Our methods quickly converge to ideal segmentations (zero Hamming distance) within a few dozen iterations.
Competitors take a long time to converge in this non-sticky scenario.
- Stochastic (yellow) barely makes progress after 5000 laps
- Sampler (green), memoized (blue) eventually get near perfect performance after ~1000 laps.

In [15]:

Helper.PlotUtil.plotHammingDist(Jnonsticky, loc=None, xscale='log');
if doSaveEPS:
    pylab.savefig('DDToyHMM_HammingVsLaps_nonsticky.eps', bbox_inches='tight');

Trace of ELBO variational training objective as more data is seen¶

Conclusions

Our methods (red/purple) quickly converge.
Competitors take a long time to converge in this non-sticky scenario.
- Stochastic (yellow) barely makes progress after 5000 laps
- Sampler (green) eventually gets to the quality levels of our methods around almost 1000 laps
- Memoized (blue) with fixed-truncation is notably worse than the sampler on this benchmark

In [16]:

Helper.PlotUtil.plotELBO(Jnonsticky, loc=None, xscale='log');
if doSaveEPS:
    pylab.savefig('DDToyHMM_ELBOVsLaps_nonsticky.eps', bbox_inches='tight');

nipsexperiment-alg=foxHDPHMMsampler-lik=Gauss-hmmKappa=0-ECovMat-eye-sF=1.0-K=50-initname=randcontigblocks-nBatch=8/1/evidence-saved-params.txt
nipsexperiment-alg=foxHDPHMMsampler-lik=Gauss-hmmKappa=0-ECovMat-eye-sF=1.0-K=100-initname=randcontigblocks-nBatch=8/1/evidence-saved-params.txt

Trace plots using the sticky model ($\kappa = 50$)¶

Again, we'll look at algorithm performance across a variety of initializations.

Num. effective states as more data is seen¶

Conclusions:

Our methods continue to converge reliably within ~10 laps or so to the right number of clusters $K=8$ regardless of initialization.
Competitors are generally slower (though much improved than in the non-sticky model scenario)
- Sampler reaches ideal performance after about 100 laps
- Memoized and stochastic runs often converge to noticeably worse local optima.

In [17]:

Helper.PlotUtil.plotKeff(Jsticky, loc=None, xscale='log');
pylab.ylabel('num states K');
if doSaveEPS:
    pylab.savefig('DDToyHMM_KeffVsLaps_hmmKappa=50.eps', bbox_inches='tight');

Hamming distance as more data is seen¶

In [18]:

Helper.PlotUtil.plotHammingDist(Jsticky, loc=None, xscale='log');
if doSaveEPS:
    pylab.savefig('DDToyHMM_HammingVsLaps_hmmKappa=50.eps', bbox_inches='tight');

ELBO training objective as more data is seen¶

In [19]:

Helper.PlotUtil.plotELBO(Jsticky, loc=None, xscale='log');
if doSaveEPS:
    pylab.savefig('DDToyHMM_ELBOVsLaps_hmmKappa=50.eps', bbox_inches='tight');

nipsexperiment-alg=foxHDPHMMsampler-lik=Gauss-hmmKappa=50-ECovMat-eye-sF=1.0-K=50-initname=randcontigblocks-nBatch=8/1/evidence-saved-params.txt
nipsexperiment-alg=foxHDPHMMsampler-lik=Gauss-hmmKappa=50-ECovMat-eye-sF=1.0-K=100-initname=randcontigblocks-nBatch=8/1/evidence-saved-params.txt

Segmentation visualizations (with non-sticky model)¶

Each figure will show the segmentations of one algorithm after some number of iterations.

We show the segmentations of sequences 1, 3, 5, and 7, which were chosen because together they contain long segments of each of the 8 true states of interest.

In [20]:

seqNames = ['', '', '', '', '', '', '']
sequences = [1,3,5,7]
xticks = [0, 200, 400, 600, 800]

In [21]:

def MakeTitleForSegmentation(algName, lapQuery, Kinit=50.0):
    ''' Reads info from stored job on disk to make title with stats on segmentation.
    
    Returns
    -------
    titleStr : string of form "sampler after 100 min (30 laps). Kinit=50, Kfinal=8."
    '''
    lapQuery = str(lapQuery)
    Kinit = str(Kinit)
    path = Helper.PlotUtil.MakePath(Jnonsticky[algName + '  K=' + str(int(float(Kinit)))] + "/1/")
    lineNum = !grep -n $lapQuery $path/laps-saved-params.txt  
    lineNum = lineNum[0].split(":")[0] # convert 25:1000 to 25
    lineNum_p = lineNum + "p"
    lap = !sed -n $lineNum_p $path/laps-saved-params.txt
    time_sec = !sed -n $lineNum_p $path/times-saved-params.txt
    Kfinal = !sed -n $lineNum_p $path/Keff-saved-params.txt
    lap = lap[0]
    time_sec = time_sec[0]
    Kfinal = Kfinal[0]
    
    time_min = float(time_sec) / 60 # convert to minutes
    
    return "%s:  K=%s after %s laps in %.0f min." % (algName, Kfinal, lapQuery, time_min)

In [22]:

MakeTitleForSegmentation("sampler", 1000)

Out[22]:

'sampler:  K=11 after 1000 laps in 40 min.'

Sampler after 2000 iterations (about 1 hr)¶

In [23]:

lapToShow = 2000
algName = 'sampler'
ax = Helper.PlotUtil.plotStateSeq(Jnonsticky[algName + '  K=50'], taskids='1', 
                                  sequences=sequences, seqNames=seqNames, 
                                  xticks=xticks, showELBOInTitle=0, lap=lapToShow)
ax[0].set_title(MakeTitleForSegmentation(algName, lapToShow));
if doSaveEPS:
    pylab.savefig('DDToyHMM_EstZ_%s_lap=%d.eps' % (algName, lapToShow));

Sampler after 5000 iterations (seems to converge to ideal)¶

In [24]:

lapToShow = 5000
algName = 'sampler'
ax = Helper.PlotUtil.plotStateSeq(Jnonsticky[algName + '  K=50'], taskids='1', 
                                  sequences=sequences, seqNames=seqNames, 
                                  xticks=xticks, showELBOInTitle=0, lap=lapToShow)
ax[0].set_title(MakeTitleForSegmentation(algName, lapToShow));
if doSaveEPS:
    pylab.savefig('DDToyHMM_EstZ_%s_lap=%d.eps' % (algName, lapToShow));

Stochastic after 1000 iterations¶

Takes about 3 hrs, bc some laps do viterbi as well as local fwd/bwd alg step.

In [25]:

lapToShow = 2000
algName = 'stoch'
ax = Helper.PlotUtil.plotStateSeq(Jnonsticky[algName + '  K=50'], taskids='1', 
                                  sequences=sequences, seqNames=seqNames, 
                                  xticks=xticks, showELBOInTitle=0, lap=lapToShow)
ax[0].set_title(MakeTitleForSegmentation(algName, lapToShow));
if doSaveEPS:
    pylab.savefig('DDToyHMM_EstZ_%s_lap=%d.eps' % (algName, lapToShow));

Our delete/merge alg after just 5 minutes¶

In [26]:

lapToShow = 100
algName = 'delmerge'
ax = Helper.PlotUtil.plotStateSeq(Jnonsticky[algName + '  K=50'], taskids='1', 
                                  sequences=sequences, seqNames=seqNames, 
                                  xticks=xticks, showELBOInTitle=0, lap=lapToShow)
ax[0].set_title(MakeTitleForSegmentation(algName, lapToShow).replace('delmerge', 'delete,merge'));
if doSaveEPS:
    pylab.savefig('DDToyHMM_EstZ_%s_lap=%d.eps' % (algName, lapToShow));

Initialization: Description and visualization¶

A crucial part of our experimental set-up is that we used matched initializations for all algorithms, to try to be as fair as possible in the comparison. This means that for each experimental model condition (e.g. $\kappa=50$ and 50 initial states), the runs for the Gibbs sampler algorithm, the memoized variational algorithm, and the stochastic variational algorithms started from the same (randomly-generated) segmentation. We repeated each condition across 10 trials, so there were 10 random initial segmentations.

Below, we sketch out the "random-contiguous-blocks" procedure we used to make the initial segmentation, which is specified via the flag --initname randcontigblocks in bnpy. The code is easily found in the source file FromScratchGauss.py.

INPUT: number of states $K$, complete dataset of $N=32$ toy data sequences
For each cluster $k$,
- Visit the next available sequence $n$ in the dataset, cycling as necessary. Let the length of this sequence be $T_n$.
- Select a contiguous window of size $L$ from this sequence at random. All starting points from $0, 1, ... T_n -L$ are eligible.
- Assign all data items in this window to cluster $k$
- Perform a global step using this assigned data only to initialize $q(\phi_k)$

This procedure yields a valid, data-driven initialization for global parameters, so that our variational algorithms can immediately start optimization steps.

However, the sampler is not parameterized by posteriors over $\phi_k$. To transfer this initialization to the sampler, we first segment all the sequences (via Viterbi) from these posteriors, and then use the resulting hard segmentation to initialize the sampler. We can see from the plots below and the early iterations of the ELBO objective plots that this yields very consistent initial configurations for all methods.

In [27]:

lapToShow = 0
algName = 'sampler'
ax = Helper.PlotUtil.plotStateSeq(Jnonsticky[algName + '  K=50'], taskids='1', 
                                  sequences=sequences, seqNames=seqNames, 
                                  xticks=xticks, showELBOInTitle=0, lap=lapToShow)
ax[0].set_title(MakeTitleForSegmentation(algName, lapToShow));

In [28]:

lapToShow = 0
algName = 'memo'
ax = Helper.PlotUtil.plotStateSeq(Jnonsticky[algName + '  K=50'], taskids='1', 
                                  sequences=sequences, seqNames=seqNames, 
                                  xticks=xticks, showELBOInTitle=0, lap=lapToShow)
ax[0].set_title(MakeTitleForSegmentation(algName, lapToShow));