This demo walks through training a multinomial mixture model with the EM algorithm on the toy bars dataset.
We can use the following import statements to load bnpy and other necessary packages.
import numpy as np
import bnpy
from matplotlib import pylab
%pylab inline

imshowArgs = dict(interpolation='nearest',
                  cmap='bone_r',
                  vmin=0.0,
                  vmax=10./900,
                  )
Populating the interactive namespace from numpy and matplotlib
MixBarsK10V900
We'll use a simple "toy bars" dataset generated from a mixture model. The MixBarsK10V900 dataset consists of documents drawn from 10 different topics over a vocabulary of 900 words. Below is a visualization of several example documents: each pixel represents a word, and its intensity gives that word's count in the document (a nonnegative integer). The dataset has the same form as a document-by-word count matrix built from a text corpus.
Our task here is to assign exactly one cluster to each whole document. Be aware that this is different from assigning a cluster/topic to each word token, as in LDA or HDP topic models.
import MixBarsK10V900
Data = MixBarsK10V900.get_data()
Data.name = 'MixBarsK10V900'
bnpy.viz.BarsViz.plotExampleBarsDocs(Data)
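To double-check what we loaded, we can print a couple of summary attributes. The attribute names below (nDoc, vocab_size) match bnpy's bag-of-words data objects as of this writing, but may differ across versions.
# Quick sanity checks on the loaded dataset.
# nDoc / vocab_size are attributes of bnpy's bag-of-words data object;
# exact names may vary across bnpy versions.
print(Data.nDoc)        # expected: 2000 documents
print(Data.vocab_size)  # expected: 900 word types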
We can now visualize the 10 "true" topics. Each topic is a distribution over the 900 words in the vocabulary.
bnpy.viz.BarsViz.showTopicsAsSquareImages(MixBarsK10V900.Defaults['topics'], **imshowArgs);
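The square-image visualization works because the vocabulary size is a perfect square: each 900-dimensional topic can be reshaped to a 30x30 grid. Here is a minimal matplotlib sketch of the same idea, assuming the topics array has shape (10, 900); this is not bnpy's actual plotting code.
# Render each 900-dim topic-word distribution as a 30x30 image.
topics = MixBarsK10V900.Defaults['topics']  # assumed shape: (10, 900)
fig, axes = pylab.subplots(2, 5, figsize=(10, 4))
for k, ax in enumerate(axes.flatten()):
    ax.imshow(topics[k].reshape(30, 30), **imshowArgs)
    ax.set_xticks([])
    ax.set_yticks([])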
We now fit a mixture model with a multinomial likelihood using the EM algorithm. We set the number of clusters to the true number of topics, i.e., $K=10$.
hmodel, RInfo = bnpy.run(Data, 'FiniteMixtureModel', 'Mult', 'EM',
                         jobname='true-K-randexamples', K=10, nLap=100, nTask=5,
                         initname='randexamples')
Toy Bars Data with 10 true topics. Each doc uses ONE topic.
  size: 2000 units (documents)
  vocab size: 900
     min    5%   50%   95%   max
     154   163   171   180   189   nUniqueTokensPerDoc
     360   360   360   360   360   nTotalTokensPerDoc
Hist of word_count across tokens
       1     2     3   <10  <100 >=100
    0.40  0.29  0.18  0.13     2     0
Hist of unique docs per word type
      <1   <10  <100 <0.10 <0.20 <0.50 >=0.50
       0     0     0     0  0.83  0.17     0
Allocation Model: Finite mixture with K=10. Dir prior param 1.00
Obs. Data Model: Multinomial over finite vocabulary.
Obs. Data Prior: Dirichlet over finite vocabulary
  lam = [ 0.1  0.1] ...
Learn Alg: EM
Trial 1/5 | alg. seed: 148736 | data order seed: 8541952
savepath: /Users/mni/bnpy-dev/results/MixBarsK10V900/true-K-randexamples/1
    1/100 after 0 sec. | K 10 | ev -7.542317373e+00 |
    2/100 after 0 sec. | K 10 | ev -5.831483579e+00 | Ndiff 253.784
    3/100 after 0 sec. | K 10 | ev -5.606701414e+00 | Ndiff  90.436
    4/100 after 0 sec. | K 10 | ev -5.530232929e+00 | Ndiff 104.168
    5/100 after 0 sec. | K 10 | ev -5.470397894e+00 | Ndiff  36.057
    6/100 after 0 sec. | K 10 | ev -5.453857799e+00 | Ndiff   0.016
... done. converged.
Trial 2/5 | alg. seed: 4646912 | data order seed: 7673856
savepath: /Users/mni/bnpy-dev/results/MixBarsK10V900/true-K-randexamples/2
    1/100 after 0 sec. | K 10 | ev -7.483458549e+00 |
    2/100 after 0 sec. | K 10 | ev -5.739500119e+00 | Ndiff 148.630
    3/100 after 0 sec. | K 10 | ev -5.512282152e+00 | Ndiff  95.739
    4/100 after 0 sec. | K 10 | ev -5.456792673e+00 | Ndiff   5.999
    5/100 after 0 sec. | K 10 | ev -5.453857391e+00 | Ndiff   0.000
... done. converged.
Trial 3/5 | alg. seed: 6302080 | data order seed: 7360256
savepath: /Users/mni/bnpy-dev/results/MixBarsK10V900/true-K-randexamples/3
    1/100 after 0 sec. | K 10 | ev -7.602629156e+00 |
    2/100 after 0 sec. | K 10 | ev -5.857246004e+00 | Ndiff 132.504
    3/100 after 0 sec. | K 10 | ev -5.604489205e+00 | Ndiff 145.529
    4/100 after 0 sec. | K 10 | ev -5.548839437e+00 | Ndiff   0.402
    5/100 after 0 sec. | K 10 | ev -5.548829321e+00 | Ndiff   0.674
    6/100 after 0 sec. | K 10 | ev -5.548823318e+00 | Ndiff   0.465
    7/100 after 0 sec. | K 10 | ev -5.548820026e+00 | Ndiff   0.356
    8/100 after 0 sec. | K 10 | ev -5.548818171e+00 | Ndiff   0.349
    9/100 after 0 sec. | K 10 | ev -5.548808897e+00 | Ndiff   0.475
   10/100 after 0 sec. | K 10 | ev -5.548799965e+00 | Ndiff   0.025
... done. converged.
Trial 4/5 | alg. seed: 8728576 | data order seed: 900864
savepath: /Users/mni/bnpy-dev/results/MixBarsK10V900/true-K-randexamples/4
    1/100 after 0 sec. | K 10 | ev -8.188722338e+00 |
    2/100 after 0 sec. | K 10 | ev -5.855973091e+00 | Ndiff 192.865
    3/100 after 0 sec. | K 10 | ev -5.581170143e+00 | Ndiff 145.047
    4/100 after 0 sec. | K 10 | ev -5.540985315e+00 | Ndiff   0.000
... done. converged.
Trial 5/5 | alg. seed: 2384256 | data order seed: 6479872
savepath: /Users/mni/bnpy-dev/results/MixBarsK10V900/true-K-randexamples/5
    1/100 after 0 sec. | K 10 | ev -7.176624792e+00 |
    2/100 after 0 sec. | K 10 | ev -5.669426901e+00 | Ndiff 137.281
    3/100 after 0 sec. | K 10 | ev -5.554618076e+00 | Ndiff   2.048
    4/100 after 0 sec. | K 10 | ev -5.554472417e+00 | Ndiff   0.020
... done. converged.
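To make the per-lap updates concrete, here is a minimal NumPy sketch of EM for a finite mixture of multinomials on a dense document-by-vocabulary count matrix X. This is illustrative only: it uses plain maximum-likelihood updates with a small smoothing constant, whereas bnpy's EM also incorporates the Dirichlet priors reported in the log above.
def em_mult_mixture(X, K, n_iters=100, seed=0):
    ''' Fit a K-component multinomial mixture to count matrix X (D x V).

    Illustrative sketch only, not bnpy's implementation.
    '''
    D, V = X.shape
    rng = np.random.RandomState(seed)
    pi = np.ones(K) / K                       # mixture weights
    phi = rng.dirichlet(np.ones(V), size=K)   # topic-word probs, shape (K, V)
    for _ in range(n_iters):
        # E step: unnormalized log-responsibilities; the multinomial
        # coefficient is constant across clusters, so it cancels.
        logR = X.dot(np.log(phi).T) + np.log(pi)          # shape (D, K)
        logR -= logR.max(axis=1, keepdims=True)
        R = np.exp(logR)
        R /= R.sum(axis=1, keepdims=True)                 # responsibilities
        # M step: update weights and per-cluster word distributions
        pi = R.sum(axis=0) / D
        phi = R.T.dot(X) + 1e-8                           # smoothed counts
        phi /= phi.sum(axis=1, keepdims=True)
    return pi, phi, R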
Below we visualize the learned topic-word parameters. We can see that the EM algorithm successfully recovers all 10 topics.
bnpy.viz.PlotComps.plotCompsForJob('MixBarsK10V900/true-K-randexamples', **imshowArgs)
Next, we try a different initialization: initname='kmeansplusplus' runs k-means on the empirical document word-count vectors, using the specified number of clusters $K$.
hmodel, RInfo = bnpy.run(Data, 'FiniteMixtureModel', 'Mult', 'EM',
                         jobname='true-K-kmeans', K=10, nLap=500, nTask=5,
                         initname='kmeansplusplus')
Toy Bars Data with 10 true topics. Each doc uses ONE topic.
  size: 2000 units (documents)
  vocab size: 900
     min    5%   50%   95%   max
     154   163   171   180   189   nUniqueTokensPerDoc
     360   360   360   360   360   nTotalTokensPerDoc
Hist of word_count across tokens
       1     2     3   <10  <100 >=100
    0.40  0.29  0.18  0.13     2     0
Hist of unique docs per word type
      <1   <10  <100 <0.10 <0.20 <0.50 >=0.50
       0     0     0     0  0.83  0.17     0
Allocation Model: Finite mixture with K=10. Dir prior param 1.00
Obs. Data Model: Multinomial over finite vocabulary.
Obs. Data Prior: Dirichlet over finite vocabulary
  lam = [ 0.1  0.1] ...
Learn Alg: EM
Trial 1/5 | alg. seed: 148736 | data order seed: 8541952
savepath: /Users/mni/bnpy-dev/results/MixBarsK10V900/true-K-kmeans/1
    1/500 after 0 sec. | K 10 | ev -5.552926250e+00 |
    2/500 after 0 sec. | K 10 | ev -5.550570336e+00 | Ndiff   0.000
... done. converged.
Trial 2/5 | alg. seed: 4646912 | data order seed: 7673856
savepath: /Users/mni/bnpy-dev/results/MixBarsK10V900/true-K-kmeans/2
    1/500 after 0 sec. | K 10 | ev -5.543849940e+00 |
    2/500 after 0 sec. | K 10 | ev -5.541390292e+00 | Ndiff   1.000
    3/500 after 0 sec. | K 10 | ev -5.540979315e+00 | Ndiff   0.000
... done. converged.
Trial 3/5 | alg. seed: 6302080 | data order seed: 7360256
savepath: /Users/mni/bnpy-dev/results/MixBarsK10V900/true-K-kmeans/3
    1/500 after 0 sec. | K 10 | ev -5.553421943e+00 |
    2/500 after 0 sec. | K 10 | ev -5.551124852e+00 | Ndiff   0.000
... done. converged.
Trial 4/5 | alg. seed: 8728576 | data order seed: 900864
savepath: /Users/mni/bnpy-dev/results/MixBarsK10V900/true-K-kmeans/4
    1/500 after 0 sec. | K 10 | ev -5.554716444e+00 |
    2/500 after 0 sec. | K 10 | ev -5.552402473e+00 | Ndiff   0.000
... done. converged.
Trial 5/5 | alg. seed: 2384256 | data order seed: 6479872
savepath: /Users/mni/bnpy-dev/results/MixBarsK10V900/true-K-kmeans/5
    1/500 after 0 sec. | K 10 | ev -5.554779268e+00 |
    2/500 after 0 sec. | K 10 | ev -5.552495903e+00 | Ndiff   0.949
    3/500 after 0 sec. | K 10 | ev -5.552481912e+00 | Ndiff   0.078
    4/500 after 0 sec. | K 10 | ev -5.552480817e+00 | Ndiff   0.181
    5/500 after 0 sec. | K 10 | ev -5.552472098e+00 | Ndiff   0.105
    6/500 after 0 sec. | K 10 | ev -5.552463151e+00 | Ndiff   0.101
    7/500 after 0 sec. | K 10 | ev -5.552463072e+00 | Ndiff   0.000
... done. converged.
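For reference, k-means++ seeding chooses each new cluster center from the data with probability proportional to its squared distance from the nearest center picked so far, which tends to spread the initial clusters out. Here is a minimal NumPy sketch of the seeding step on document count vectors; bnpy's initname='kmeansplusplus' implementation may differ in its details.
def kmeanspp_seeds(X, K, seed=0):
    ''' Choose K initial centers from the rows of X via k-means++ seeding. '''
    rng = np.random.RandomState(seed)
    D = X.shape[0]
    centers = [X[rng.randint(D)]]   # first center: uniform at random
    for _ in range(K - 1):
        # squared distance from each doc to its nearest chosen center
        dist2 = np.min(
            [np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = dist2 / dist2.sum()
        centers.append(X[rng.choice(D, p=probs)])
    return np.vstack(centers)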
bnpy.viz.PlotELBO.plotJobsThatMatchKeywords('MixBarsK10V900/true-K-*')
pylab.ylim([-10, -5])
pylab.legend(loc='lower right');
Both initialization methods reach essentially the same objective value, but the kmeansplusplus initialization converges in far fewer laps (2-3 versus up to 10 for random initialization).
The value of $K$ sets the number of clusters in the mixture model. We can repeat the experiment above with $K=5, 10, 20, 50$.
for K in [5, 10, 20, 50]:
    hmodel, RInfo = bnpy.run(Data, 'FiniteMixtureModel', 'Mult', 'EM',
                             jobname='em-K=%d' % (K),
                             K=K, nLap=100, doWriteStdOut=False, nTask=1)
    print('K=%3d | final ev % .5f' % (K, RInfo['evBound']))
K=  5 | final ev -5.97316
K= 10 | final ev -5.63832
K= 20 | final ev -5.44802
K= 50 | final ev -5.43334
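To see the diminishing returns directly, we can plot the final objective values printed above against $K$:
# Final objective vs. K, using the values from the runs above.
Ks = [5, 10, 20, 50]
finalEv = [-5.97316, -5.63832, -5.44802, -5.43334]
pylab.plot(Ks, finalEv, 'o-')
pylab.xlabel('K')
pylab.ylabel('final ev');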
From the results above, we can see that EM converges to a different objective value for each choice of $K$. We can visualize this by plotting the log evidence (ELBO) against the number of laps, as shown below. In general, the log evidence increases as $K$ increases. In particular, when $K=5$ the model has fewer clusters than true topics, so it cannot represent the data well and reaches a noticeably lower objective. These differences in log evidence shrink as $K$ grows beyond the true number of topics.
bnpy.viz.PlotELBO.plotJobsThatMatchKeywords('MixBarsK10V900/em-K=*')
pylab.ylim([-10, -5]);