This notebook is associated with this article on my blog.
We use pyLDAvis to visualize several LDA models trained on the followers of the @alexip account.
The different LDA models were trained with the parameters encoded in the filenames below (number of topics, number of passes, alpha).
For the best results, set lambda around 0.5–0.6. Lowering lambda gives more weight to words that are discriminative for the selected topic, i.e. the words that best define it.
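To see why lowering lambda surfaces discriminative words, here is a minimal sketch of the relevance score that pyLDAvis uses to rank words (the toy probabilities are made up for illustration; the formula itself is the one behind the lambda slider):

```python
import math

def relevance(p_w_given_t, p_w, lam):
    """Relevance of word w for topic t: lam weights the word's
    in-topic probability against its lift (in-topic vs. overall)."""
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)

# Toy probabilities: a word that is frequent everywhere vs. one that
# is rare outside the topic (hypothetical values, for illustration only)
common = dict(p_w_given_t=0.04, p_w=0.05)     # frequent in the whole corpus
specific = dict(p_w_given_t=0.03, p_w=0.003)  # concentrated in this topic

for lam in (1.0, 0.6):
    print(lam, relevance(**common, lam=lam), relevance(**specific, lam=lam))
```

At lambda = 1 the ranking is by raw in-topic frequency, so the common word wins; at lambda = 0.6 the topic-specific word overtakes it, which is exactly the effect described above.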
You can skip the first two models and jump straight to the last one, which gives the best results (40 topics).
A working version of this notebook is available on nbviewer
```python
# Load the corpus and dictionary
from gensim import corpora, models
import pyLDAvis.gensim

corpus = corpora.MmCorpus('data/alexip_followers_py27.mm')
dictionary = corpora.Dictionary.load('data/alexip_followers_py27.dict')
```
```python
# First LDA model with 10 topics, 10 passes, alpha = 0.001
lda = models.LdaModel.load('data/alexip_followers_py27_t10_p10_a001_b01.lda')
followers_data = pyLDAvis.gensim.prepare(lda, corpus, dictionary)
pyLDAvis.display(followers_data)
```
With K=10 topics, nearly all the topics overlap and are difficult to distinguish. Even the topics that stand apart (4 and 9) are not very cohesive: topic 4, for instance, mixes bitcoin, ruby/rails, and London. In the following example, we set K to 50 topics and increase the number of passes from 10 to 50.
```python
# Second LDA model with 50 topics, 50 passes, alpha = 0.001
lda = models.LdaModel.load('data/alexip_followers_py27_t50_p50_a001.lda')
followers_data = pyLDAvis.gensim.prepare(lda, corpus, dictionary)
pyLDAvis.display(followers_data)
```