This is a series of notebooks (in progress) to document my learning, and hopefully to help others learn machine learning. I would love suggestions / corrections / feedback for these notebooks.
Email me: email.ryan.kelly@gmail.com
I'd love for you to share if you liked this post.
This notebook explores the idea of recommending news posts to a reader based on their search query. To do this, we also have to introduce basic text processing. Clustering can be defined as grouping unlabelled data by a measure of similarity.
One of the simplest and most widely used ways to quantify meaning in textual data is the bag-of-words approach. For each post, we count the number of occurrences of every word and store these counts in a vector (vectorization). Any one post uses only a small fraction of all words, so most entries are zero. This means the data can be stored efficiently in a sparse matrix structure.
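To make the idea concrete, here is a minimal pure-Python sketch of a bag of words. It is written for Python 3, unlike the Python 2 notebook code. It assumes that tokens are produced by lowercasing and splitting on whitespace, which is much cruder than a real tokenizer. The sketch builds one shared, sorted vocabulary and turns each post into a vector of counts over it. `bag_of_words` is a hypothetical helper, not part of scikit-learn.

```python
from collections import Counter

def bag_of_words(docs):
    """Return (vocabulary, count_rows) for a list of raw text documents.

    Assumption: tokens come from lowercasing and splitting on whitespace,
    which ignores punctuation handling that a real tokenizer would do.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # One shared, sorted vocabulary so every vector uses the same columns
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not appear in this document
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows

vocab, rows = bag_of_words(["Beer bottles or beer cans", "open a beer"])
print(vocab)  # ['a', 'beer', 'bottles', 'cans', 'open', 'or']
print(rows)   # [[0, 2, 1, 1, 0, 1], [1, 1, 0, 0, 1, 0]]
```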
First we have to convert the text into a bag of words. We can do this using scikit-learn's built-in CountVectorizer. The min_df parameter determines how the vectorizer treats words that are used infrequently. If it is set to an integer, words that appear in fewer than that many documents are dropped. If it is set to a fraction (a float between 0 and 1), words that appear in less than that fraction of the documents are dropped. There are also many other options, which we will get into later.
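The following is a rough sketch of what min_df does, together with its counterpart max_df, which is used later. It assumes the documented rule: an integer is an absolute number of documents, a float is a fraction of the corpus, and a word is kept only if its document frequency lies between the two limits. Note that this counts the documents containing a word, not its total occurrences. `filter_vocabulary` is a hypothetical helper, not part of scikit-learn.

```python
from collections import Counter

def filter_vocabulary(tokenized_docs, min_df=1, max_df=1.0):
    """Keep words whose document frequency lies within [min_df, max_df]."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # set() so repeats within one doc count once
    # Integers are absolute document counts, floats are fractions of the corpus
    min_count = min_df if isinstance(min_df, int) else min_df * n_docs
    max_count = max_df if isinstance(max_df, int) else max_df * n_docs
    return sorted(w for w, df in doc_freq.items() if min_count <= df <= max_count)

docs = [["beer", "can", "beer"], ["beer", "bottle"], ["beer", "can"]]
print(filter_vocabulary(docs, min_df=2))              # ['beer', 'can']
print(filter_vocabulary(docs, max_df=0.5))            # ['bottle']
print(filter_vocabulary(docs, min_df=2, max_df=0.9))  # ['can']
```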
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer(min_df=1)
print vect
CountVectorizer(analyzer=word, binary=False, charset=None, charset_error=None, decode_error=strict, dtype=<type 'numpy.int64'>, encoding=utf-8, input=content, lowercase=True, max_df=1.0, max_features=None, min_df=1, ngram_range=(1, 1), preprocessor=None, stop_words=None, strip_accents=None, token_pattern=(?u)\b\w\w+\b, tokenizer=None, vocabulary=None)
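We see that for now the counting is done at the word level (analyzer='word'). Two other defaults also matter here. The text is lowercased (lowercase=True). The default token_pattern only keeps tokens of two or more word characters, which is why "a" does not show up in the vocabulary below.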
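The vocabulary below can be reproduced with a small sketch of the default word analyzer. The sketch assumes that this analyzer amounts to lowercasing followed by the default token_pattern shown in the printout above. scikit-learn also handles decoding, accents and n-grams, which are left out here. `tokenize` is a hypothetical helper.

```python
import re

# The default token_pattern from the CountVectorizer printout:
# runs of 2+ word characters, so single letters such as "a" are dropped
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(doc):
    """Lowercase a document and split it into tokens."""
    return TOKEN_PATTERN.findall(doc.lower())

content = ['how to open a beer without a bottle opener',
           'Beer bottles or beer cans']
vocabulary = sorted({tok for doc in content for tok in tokenize(doc)})
print(tokenize(content[1]))  # ['beer', 'bottles', 'or', 'beer', 'cans']
print(vocabulary)
```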
content = ['how to open a beer without a bottle opener',
'Beer bottles or beer cans',]
X = vect.fit_transform(content)
vect.get_feature_names()
[u'beer', u'bottle', u'bottles', u'cans', u'how', u'open', u'opener', u'or', u'to', u'without']
# Print the word counts, first in sparse form, then as a dense array
print X
print X.toarray()
(0, 0) 1
(1, 0) 2
(0, 1) 1
(1, 2) 1
(1, 3) 1
(0, 4) 1
(0, 5) 1
(0, 6) 1
(1, 7) 1
(0, 8) 1
(0, 9) 1
[[1 1 0 0 1 1 1 0 1 1]
 [2 0 1 1 0 0 0 1 0 0]]
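The results of fit_transform are stored as a SciPy sparse matrix. This format only keeps the non-zero entries, printed above as (row, column) count pairs, so it is much more memory efficient. To inspect the full vectors, we convert them to a dense array with toarray(). Let's add some more data.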
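The idea behind the sparse format can be sketched in a few lines: keep only the non-zero entries as (row, column, value) triples, which is the coordinate (COO) layout. This is a simplification. scikit-learn actually returns a SciPy CSR matrix, which compresses the row indices further, and SciPy may list entries in a different order than this sketch. `to_triples` and `to_dense` are hypothetical helpers.

```python
def to_triples(dense):
    """Keep only the non-zero entries of a list-of-lists matrix."""
    return [(i, j, value)
            for i, row in enumerate(dense)
            for j, value in enumerate(row)
            if value != 0]

def to_dense(triples, n_rows, n_cols):
    """Rebuild the full matrix, filling every missing entry with 0."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i, j, value in triples:
        dense[i][j] = value
    return dense

# The dense counts printed above for the two example sentences
X_dense = [[1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
           [2, 0, 1, 1, 0, 0, 0, 1, 0, 0]]
triples = to_triples(X_dense)
print(len(triples))  # 11 stored values instead of 20
assert to_dense(triples, 2, 10) == X_dense  # nothing is lost
```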
posts = ['how to open a beer without a bottle opener',
'Do girls like beer bottles or beer cans?',
'where did all my beer go?',
'where did all my beer go? where did all my beer go?',
'recycling beer bottles and cans',
'Is it worth recycling?',
'do not bring bottles to my backyard party, only cans please.',
'This is useless']
X_train = vect.fit_transform(posts)
num_samples, num_features = X_train.shape
print '#samples: {}, #features: {}'.format(num_samples, num_features)
#samples: 8, #features: 31
Let's vectorize a new post, then see how similar it is to our existing corpus.
new_post = 'Opening beer bottles and cans 101'
new_post_vect = vect.transform([new_post])
print new_post_vect.toarray()
[[0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
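import scipy as sp
import scipy.linalg  # make sure the linalg submodule is loaded so sp.linalg works

def dists(v1, v2):
    delta = v1 - v2
    # Euclidean distance: the L2 norm of the difference vector
    return sp.linalg.norm(delta.toarray())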
def similarity(new_post_vector, corpus):
    # Compare the new post with every post in the corpus and keep the closest.
    # Note: relies on the global lists `posts` and `new_post` for the raw text.
    best_dist = float('inf')
    best_i = None
    for i in xrange(corpus.shape[0]):
        post = posts[i]
        if post == new_post:
            continue  # skip an exact duplicate of the query
        post_vec = corpus.getrow(i)
        d = dists(post_vec, new_post_vector)
        print 'Post %i with dist = %.2f: %s' % (i, d, post)
        if d < best_dist:
            best_dist = d
            best_i = i
    print 'Best post is {} with dist = {}'.format(best_i, best_dist)
similarity(new_post_vect, X_train)
Post 0 with dist = 3.00: how to open a beer without a bottle opener
Post 1 with dist = 2.45: Do girls like beer bottles or beer cans?
Post 2 with dist = 2.83: where did all my beer go?
Post 3 with dist = 4.90: where did all my beer go? where did all my beer go?
Post 4 with dist = 1.00: recycling beer bottles and cans
Post 5 with dist = 2.83: Is it worth recycling?
Post 6 with dist = 3.32: do not bring bottles to my backyard party, only cans please.
Post 7 with dist = 2.65: This is useless
Best post is 4 with dist = 1.0
Great, our first text similarity measurement! We can see that post 4 is most similar to our new post. However, post 2 is "closer" to the new post than post 3 is, even though post 3 is simply post 2 doubled. Clearly, raw word counts are too simple. The next step is to normalize the count vectors to unit length, so that only the relative word frequencies matter and not the length of the post.
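The effect of normalization on posts 2 and 3 can be checked by hand with a small sketch. It stores vectors as {word: count} dicts. Only the new post's words that are in the fitted vocabulary (and, beer, bottles, cans) are kept, as the transformed vector above shows. `euclidean` and `unit` are hypothetical helpers written for Python 3.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two sparse vectors stored as {word: value} dicts."""
    words = set(u) | set(v)
    return math.sqrt(sum((u.get(w, 0) - v.get(w, 0)) ** 2 for w in words))

def unit(u):
    """Scale a vector to length 1 (assumes it is not all zeros)."""
    length = math.sqrt(sum(x * x for x in u.values()))
    return {w: x / length for w, x in u.items()}

# Only in-vocabulary words of the new post survive vectorization
new_vec = {"and": 1, "beer": 1, "bottles": 1, "cans": 1}
post2_vec = {w: 1 for w in ["where", "did", "all", "my", "beer", "go"]}
post3_vec = {w: 2 * n for w, n in post2_vec.items()}  # post 2 written twice

print(round(euclidean(new_vec, post2_vec), 2))              # 2.83
print(round(euclidean(new_vec, post3_vec), 2))              # 4.9
print(round(euclidean(unit(new_vec), unit(post2_vec)), 2))  # 1.26
print(round(euclidean(unit(new_vec), unit(post3_vec)), 2))  # 1.26
```

These numbers match the raw and normalized distances printed for posts 2 and 3.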
# Update our dists function
def dists(v1, v2):
    # Scale both vectors to unit length so only their direction matters
    v1_norm = v1 / sp.linalg.norm(v1.toarray())
    v2_norm = v2 / sp.linalg.norm(v2.toarray())
    delta = v1_norm - v2_norm
    # Euclidean distance between the normalized vectors
    return sp.linalg.norm(delta.toarray())
similarity(new_post_vect, X_train)
Post 0 with dist = 1.27: how to open a beer without a bottle opener
Post 1 with dist = 0.86: Do girls like beer bottles or beer cans?
Post 2 with dist = 1.26: where did all my beer go?
Post 3 with dist = 1.26: where did all my beer go? where did all my beer go?
Post 4 with dist = 0.46: recycling beer bottles and cans
Post 5 with dist = 1.41: Is it worth recycling?
Post 6 with dist = 1.18: do not bring bottles to my backyard party, only cans please.
Post 7 with dist = 1.41: This is useless
Best post is 4 with dist = 0.459505841095
Great, posts 2 & 3 are now equally similar to our new post.
Many words in a language carry little meaning for the overall interpretation of a message. Words like "it" should be much less meaningful than "beer" in our current context. These less important words are called stop words. They can be removed from the posts, since they do not help us distinguish between different posts.
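Conceptually, stop-word removal is just a filter over the tokens. The set below is a tiny hand-picked example for illustration only; scikit-learn's built-in 'english' list is much longer. `remove_stop_words` is a hypothetical helper.

```python
# Illustrative stop-word set only; not scikit-learn's 'english' list
STOP_WORDS = {"a", "is", "it", "my", "to", "the", "this"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that carry little meaning on their own."""
    return [tok for tok in tokens if tok not in stop_words]

print(remove_stop_words(["is", "it", "worth", "recycling"]))  # ['worth', 'recycling']
print(remove_stop_words(["this", "is", "useless"]))           # ['useless']
```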
#Add english stop words to our vectorizer object.
vect = CountVectorizer(min_df=1, stop_words='english')
#Display a sample
print sorted(vect.get_stop_words())[80:-150]
['empty', 'enough', 'etc', 'even', 'ever', 'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fify', 'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found', 'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt', 'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'i', 'ie', 'if', 'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last', 'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile', 'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must', 'my', 'myself', 'name']
If you already have a list of words in mind that you wish to stop, you can simply pass them as a list to the stop_words argument.
We also need to consider that similar words, such as "girl" and "girls", should probably be treated as the same word. We therefore need a function that reduces words to their stem. We can do this with the Natural Language Toolkit (NLTK). After installing NLTK, import the library and try out the English Snowball stemmer.
import nltk.stem
s = nltk.stem.SnowballStemmer('english')
print s.stem('bottles')
print s.stem('bottle')
print s.stem('perception')
print s.stem('perceptive')
print s.stem('crashing')
print s.stem('crashed')
bottl
bottl
percept
percept
crash
crash
We need to stem the posts before we feed them into the CountVectorizer. The best way to do this is to override its build_analyzer method.
This way we reuse the analyzer of the parent class, which preprocesses the raw posts (for example, converting them to lower case), tokenizes them and removes stop words. Our override then converts each resulting word into its stemmed version.
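The override pattern can be sketched without scikit-learn or NLTK. One function builds a base analyzer, and a second wraps it so that every token it yields is stemmed. This mirrors the shape of `StemmedCountVectorizer.build_analyzer` below. Since NLTK is not assumed here, `toy_stem` is a deliberately crude, hypothetical suffix stripper, not the Snowball algorithm. It happens to agree with Snowball on the six words above but will differ on many others (it leaves "opener" unchanged, for instance). All names are hypothetical.

```python
import re

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")
STOP_WORDS = {"or", "a", "to", "how", "without"}  # illustrative subset only

# Toy suffix stripper, NOT the Snowball algorithm: it only knows a few
# suffixes and always keeps at least 3 characters of stem
SUFFIXES = ("ing", "ion", "ive", "ed", "es", "s", "e")

def toy_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def build_base_analyzer():
    # Stand-in for the parent class's analyzer: lowercase, tokenize, drop stop words
    def analyze(doc):
        return [tok for tok in TOKEN_PATTERN.findall(doc.lower())
                if tok not in STOP_WORDS]
    return analyze

def build_stemmed_analyzer():
    # Same shape as the override: wrap the base analyzer, stem every token
    analyzer = build_base_analyzer()
    return lambda doc: (toy_stem(w) for w in analyzer(doc))

stemmed = build_stemmed_analyzer()
print(list(stemmed("Beer bottles or beer cans")))  # ['beer', 'bottl', 'beer', 'can']
```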
import nltk.stem
english_stemmer = nltk.stem.SnowballStemmer('english')
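class StemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        # Reuse the parent's analyzer (preprocessing, tokenizing, stop words)...
        analyzer = super(StemmedCountVectorizer, self).build_analyzer()
        # ...then stem every token it yields
        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))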
vectorizer = StemmedCountVectorizer(min_df=1, stop_words='english')
X = vectorizer.fit_transform(posts)
vectorizer.get_feature_names()
[u'backyard', u'beer', u'bottl', u'bring', u'can', u'did', u'girl', u'like', u'open', u'parti', u'recycl', u'useless', u'worth']
# Refit the posts with the stemming vectorizer
X_train = vectorizer.fit_transform(posts)
new_post_vect = vectorizer.transform([new_post])
similarity(new_post_vect, X_train)
Post 0 with dist = 0.61: how to open a beer without a bottle opener
Post 1 with dist = 0.77: Do girls like beer bottles or beer cans?
Post 2 with dist = 1.14: where did all my beer go?
Post 3 with dist = 1.14: where did all my beer go? where did all my beer go?
Post 4 with dist = 0.71: recycling beer bottles and cans
Post 5 with dist = 1.41: Is it worth recycling?
Post 6 with dist = 1.05: do not bring bottles to my backyard party, only cans please.
Post 7 with dist = 1.41: This is useless
Best post is 0 with dist = 0.605810893055
Post 0 is now the most similar to our new post. This is because "bottles" and "bottle" are now treated as the same word, and so are "Opening", "open" and "opener" (all stemmed to "open").
print new_post
print posts[0]
Opening beer bottles and cans 101
how to open a beer without a bottle opener
So far we have assumed that the more often a word occurs in a post, the more important it is to that post. While this is true to some extent, very frequent words often carry no real meaning. For example, a word like "Subject" that appears in every post does not communicate anything important, and it does not help us distinguish between posts.
We could set an occurrence cutoff in our vectorizer (its max_df parameter), so that words occurring in more than 90% of the posts are excluded. However, we still run into borderline cases, such as a word that occurs in 89% of the posts.
To solve these problems, we count the term frequencies for every post while discounting words that appear in many posts. This is the idea behind term frequency-inverse document frequency (TF-IDF). We can implement this using scikit-learn's TfidfVectorizer.
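A minimal sketch of TF-IDF weighting follows. It assumes the variant that scikit-learn documents as TfidfVectorizer's default: raw counts as term frequency, a smoothed idf of ln((1 + n) / (1 + df)) + 1, and each document vector scaled to unit (L2) length. Textbooks often use slightly different idf formulas. `tfidf` is a hypothetical helper that takes already tokenized and stemmed posts.

```python
import math
from collections import Counter

def tfidf(tokenized_docs):
    """Return (vocabulary, idf, rows) with L2-normalized TF-IDF rows."""
    n_docs = len(tokenized_docs)
    vocabulary = sorted({w for doc in tokenized_docs for w in doc})
    # Document frequency: number of documents containing each word
    doc_freq = Counter(w for doc in tokenized_docs for w in set(doc))
    # Smoothed idf: a word found in every document gets the minimum weight of 1
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in vocabulary}
    rows = []
    for doc in tokenized_docs:
        tf = Counter(doc)  # raw term counts
        row = [tf[w] * idf[w] for w in vocabulary]
        length = math.sqrt(sum(x * x for x in row))
        rows.append([x / length for x in row] if length else row)
    return vocabulary, idf, rows

docs = [["beer", "bottl"], ["beer", "can"], ["beer", "bottl", "can"]]
vocab, idf, rows = tfidf(docs)
print(idf["beer"])             # 1.0: 'beer' is in every post, so it is discounted
print(round(idf["bottl"], 3))  # 1.288
print([round(x, 3) for x in rows[0]])
```

In the first post, "beer" and "bottl" each occur once, yet "bottl" receives the larger weight because it is rarer across the corpus.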
from sklearn.feature_extraction.text import TfidfVectorizer
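# Subclass TfidfVectorizer so that its analyzer also stems each token
class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        # Pass this class (not TfidfVectorizer) to super() so the normal
        # method resolution order is followed
        analyzer = super(StemmedTfidfVectorizer, self).build_analyzer()
        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))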
vectorizer = StemmedTfidfVectorizer(min_df=1, stop_words='english', decode_error='ignore')
Now instead of counts, our document vectors will contain individual TF-IDF values per term (token).
# Refit the posts with the stemming TF-IDF vectorizer
X_train = vectorizer.fit_transform(posts)
new_post_vect = vectorizer.transform([new_post])
similarity(new_post_vect, X_train)
Post 0 with dist = 0.57: how to open a beer without a bottle opener
Post 1 with dist = 0.99: Do girls like beer bottles or beer cans?
Post 2 with dist = 1.26: where did all my beer go?
Post 3 with dist = 1.26: where did all my beer go? where did all my beer go?
Post 4 with dist = 0.90: recycling beer bottles and cans
Post 5 with dist = 1.41: Is it worth recycling?
Post 6 with dist = 1.17: do not bring bottles to my backyard party, only cans please.
Post 7 with dist = 1.41: This is useless
Best post is 0 with dist = 0.572957858071
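So far we have turned each post into a bag of words, removed English stop words, reduced the remaining words to their stems, and weighted the counts with TF-IDF.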
Limitations of the bag-of-words approach
Now that we can represent our posts quantitatively, at least to some degree, our goal is to cluster similar posts. There are two main types of clustering algorithms: flat and hierarchical.
Flat clustering divides the posts into a set of clusters, so that posts within a cluster are as similar as possible and posts in different clusters are as different as possible. Generally, we have to specify the number of clusters up front.
Hierarchical clustering does not require the number of clusters as an input. Instead, it builds a hierarchy. The most similar posts are grouped together first, then the most similar groups are merged, recursively, until a single cluster contains all the data. Once the hierarchy is built, the user can choose a suitable number of clusters by cutting it at the desired level.
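To illustrate the bottom-up idea, here is a naive agglomerative sketch on one-dimensional points. It assumes single linkage, where the distance between two clusters is the distance between their closest members. Real implementations, such as scipy.cluster.hierarchy, are far more efficient and offer other linkage rules. Stopping at `n_clusters` corresponds to cutting the hierarchy at that level. `agglomerate` is a hypothetical helper.

```python
def agglomerate(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Returns the remaining clusters and the distance of each merge, so the
    merge heights show where the hierarchy has natural gaps.
    """
    clusters = [[p] for p in points]  # start with every point on its own
    merge_heights = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merge_heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merge_heights

points = [1.0, 1.5, 5.0, 5.2, 9.0]
print(agglomerate(points, n_clusters=3)[0])  # [[1.0, 1.5], [5.0, 5.2], [9.0]]
_, heights = agglomerate(points)             # build the full hierarchy
print([round(h, 2) for h in heights])        # [0.2, 0.5, 3.5, 3.8]
```

The large jump in merge height between 0.5 and 3.5 suggests that three clusters is a natural place to cut.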
KMeans is probably the most common flat clustering algorithm. First you specify the desired number of clusters, k. The algorithm then picks k random posts as initial seeds (centroids) and assigns every post to its closest centroid. Next, each centroid is moved to the mean of the posts assigned to it, and the posts are reassigned to their new closest centroid. This repeats until the centroids move less than some threshold (or a maximum number of iterations is reached), at which point the algorithm is considered converged.
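The loop described above can be sketched in plain Python for 2-D points. To keep the example reproducible, the initial seeds are passed in explicitly instead of being drawn at random, which deliberately deviates from the description. The sketch also reports the inertia, which is the sum of squared distances from each point to its assigned centroid and is the quantity scikit-learn calls inertia. `kmeans` and `assign` are hypothetical helpers, not scikit-learn's implementation; `math.dist` needs Python 3.8+.

```python
import math

def assign(points, centroids):
    """Index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, seeds, max_iter=300, tol=1e-4):
    """Cluster points around len(seeds) centroids; return labels, centroids, inertia."""
    centroids = [tuple(s) for s in seeds]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        labels = assign(points, centroids)  # assignment step with the moved centroids
        if shift < tol:  # centroids barely moved: converged
            break
    inertia = sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
    return labels, centroids, inertia

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, inertia = kmeans(points, seeds=[(0, 0), (10, 10)])
print(labels)             # [0, 0, 0, 1, 1, 1]
print(round(inertia, 3))  # 2.667
```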
We will use a standard machine learning dataset that contains 18,828 posts from 20 different newsgroups, covering topics such as technology, politics, and religion. For now, however, we will only use the technical groups.
One question we could ask is: within a broad topic, can we effectively cluster the posts into distinct groups that match the newsgroups they were published in?
The data is already split into training and test sets. We can load it with sklearn's load_mlcomp, which reads a copy of the dataset that has already been downloaded into mlcomp_root.
import sklearn.datasets
save_dir = '/users/ryankelly/downloads/' # Your save file path
# Load the data using sklearn
df = sklearn.datasets.load_mlcomp("20news-18828", mlcomp_root=save_dir)
# Data files
print df.filenames
print len(df.filenames)
['/users/ryankelly/downloads/379/raw/comp.graphics/1190-38614' '/users/ryankelly/downloads/379/raw/comp.graphics/1383-38616' '/users/ryankelly/downloads/379/raw/alt.atheism/487-53344' ..., '/users/ryankelly/downloads/379/raw/rec.sport.hockey/10215-54303' '/users/ryankelly/downloads/379/raw/sci.crypt/10799-15660' '/users/ryankelly/downloads/379/raw/comp.os.ms-windows.misc/2732-10871'] 18828
# Data Topics
df.target_names
['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc']
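# Restrict data to only 'tech' categories
# NOTE: 'comp.sys.ma c.hardware' contains a stray space, so it matches no
# category folder and is silently skipped: only five groups are loaded below.
# Spell it 'comp.sys.mac.hardware' to include the Mac group (counts will change).
group = ['comp.graphics', 'comp.os.ms-windows.misc',
         'comp.sys.ibm.pc.hardware', 'comp.sys.ma c.hardware',
         'comp.windows.x', 'sci.space']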
# Reload in only training data with the desired categories
train_data = sklearn.datasets.load_mlcomp('20news-18828', 'train',
mlcomp_root=save_dir,
categories=group)
print(len(train_data.filenames))
3414
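When initializing our vectorizer, we have to remember that we are working with real data, which contains errors. In this case, some posts contain invalid characters that cannot be decoded, so we pass decode_error='ignore'. We also drop words that appear in fewer than 10 posts (min_df=10) and words that appear in more than half of the posts (max_df=0.5).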
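The effect of decode_error can be seen with plain Python bytes. As far as we understand scikit-learn's implementation, the vectorizer forwards decode_error as the `errors` argument of `bytes.decode`. With 'strict', decoding raises an error on an invalid byte, while 'ignore' silently drops it. The byte string is a made-up example: a Latin-1 encoded 'é' (0xE9) that is not valid UTF-8. `decode_post` is a hypothetical helper.

```python
def decode_post(raw_bytes, errors="strict"):
    # errors='strict' raises UnicodeDecodeError, 'ignore' drops bad bytes,
    # 'replace' substitutes the replacement character U+FFFD
    return raw_bytes.decode("utf-8", errors)

raw = b"caf\xe9 beer"  # hypothetical post containing a non-UTF-8 byte
try:
    decode_post(raw)
except UnicodeDecodeError as err:
    print("strict failed:", err.reason)
print(decode_post(raw, "ignore"))  # caf beer
print(repr(decode_post(raw, "replace")))
```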
vec = StemmedTfidfVectorizer(min_df=10, max_df=0.5,
stop_words='english', decode_error='ignore')
vecData = vec.fit_transform(train_data.data)
num_samples, num_features = vecData.shape
print '#samples: {}, #features: {}'.format(num_samples, num_features)
#samples: 3414, #features: 4331
This is the input we will use for KMeans clustering. The misspelled Mac hardware group is skipped, so the training data contains five topic groups. It therefore makes sense to look for 5 clusters first.
num_clusters = 5
from sklearn.cluster import KMeans
km = KMeans(n_clusters=num_clusters, init='random', n_init=1, verbose=1)
km.fit(vecData)
Initialization complete Iteration 0, inertia 6434.212 Iteration 1, inertia 3302.138 Iteration 2, inertia 3286.234 Iteration 3, inertia 3278.006 Iteration 4, inertia 3274.039 Iteration 5, inertia 3271.234 Iteration 6, inertia 3268.856 Iteration 7, inertia 3267.609 Iteration 8, inertia 3266.964 Iteration 9, inertia 3266.352 Iteration 10, inertia 3265.901 Iteration 11, inertia 3265.509 Iteration 12, inertia 3264.970 Iteration 13, inertia 3263.969 Iteration 14, inertia 3261.887 Iteration 15, inertia 3259.657 Iteration 16, inertia 3258.196 Iteration 17, inertia 3257.560 Iteration 18, inertia 3256.997 Iteration 19, inertia 3256.714 Iteration 20, inertia 3256.482 Iteration 21, inertia 3256.326 Iteration 22, inertia 3256.126 Iteration 23, inertia 3255.998 Iteration 24, inertia 3255.918 Iteration 25, inertia 3255.870 Iteration 26, inertia 3255.826 Iteration 27, inertia 3255.768 Iteration 28, inertia 3255.658 Iteration 29, inertia 3255.574 Iteration 30, inertia 3255.550 Iteration 31, inertia 3255.533 Iteration 32, inertia 3255.527 Iteration 33, inertia 3255.522 Iteration 34, inertia 3255.513 Iteration 35, inertia 3255.508 Iteration 36, inertia 3255.503 Converged at iteration 36
KMeans(copy_x=True, init='random', max_iter=300, n_clusters=5, n_init=1, n_jobs=1, precompute_distances=True, random_state=None, tol=0.0001, verbose=1)
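After fitting, the cluster assigned to each post is available in the labels_ attribute, and the cluster centres are available in cluster_centers_. To judge how well the clusters match the true newsgroups, we use the completeness score. It ranges from 0 to 1 and equals 1 when all posts from a given newsgroup end up in the same cluster. Note that it is not the percentage of correct predictions.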
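For intuition, completeness can be computed from entropies. This sketch assumes the standard definition c = 1 - H(K|C) / H(K), where C is the true class and K the cluster label, with c taken to be 1 when H(K) = 0 (everything in one cluster). It uses natural logarithms, and the base cancels in the ratio. `completeness` is a hypothetical re-implementation for illustration, not scikit-learn's code.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -math.fsum((c / n) * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(a, b):
    """H(A | B) for two equally long label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -math.fsum((n_ab / n) * math.log(n_ab / b_counts[b_val])
                      for (_, b_val), n_ab in joint.items())

def completeness(classes, clusters):
    """1 - H(clusters | classes) / H(clusters); 1.0 if there is only one cluster."""
    h_clusters = entropy(clusters)
    if h_clusters == 0:
        return 1.0
    return 1 - conditional_entropy(clusters, classes) / h_clusters

classes = [0, 0, 1, 1]
print(completeness(classes, [1, 1, 0, 0]))  # each class sits in a single cluster
print(completeness(classes, [0, 1, 0, 1]))  # every class is split in half
print(completeness(classes, [0, 0, 0, 0]))  # one big cluster is trivially complete
```

The last case shows why completeness alone rewards using fewer clusters.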
from sklearn import metrics
metrics.completeness_score(train_data.target, km.labels_)
0.40904043798434664
A completeness score of about 0.41 isn't great. This could be because, although there are five different groups, their contents are related (four of them are computing groups). Let's test several values of k and compare the scores.
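from sklearn.cluster import KMeans
def best_k():
    # Track the best k across all candidates (initialised once, outside the loop)
    best_k = 0
    best_score = 0
    for k in range(2, 40):
        # Use the loop variable, not the fixed num_clusters, so each run tries a different k
        km = KMeans(n_clusters=k, init='random', n_init=1, verbose=1)
        km.fit(vecData)
        score = metrics.completeness_score(train_data.target, km.labels_)
        if score > best_score:
            best_k = k
            best_score = score
    # Completeness alone tends to favour small k (a single cluster is trivially
    # complete); metrics.v_measure_score balances it with homogeneity.
    return [best_k, best_score]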
best_k()
Initialization complete Iteration 0, inertia 6445.479 Iteration 1, inertia 3292.339 Iteration 2, inertia 3275.461 Iteration 3, inertia 3270.621 Iteration 4, inertia 3268.049 Iteration 5, inertia 3266.777 Iteration 6, inertia 3266.141 Iteration 7, inertia 3265.889 Iteration 8, inertia 3265.754 Iteration 9, inertia 3265.668 Iteration 10, inertia 3265.602 Iteration 11, inertia 3265.509 Iteration 12, inertia 3265.367 Iteration 13, inertia 3265.151 Iteration 14, inertia 3264.775 Iteration 15, inertia 3264.314 Iteration 16, inertia 3263.827 Iteration 17, inertia 3263.243 Iteration 18, inertia 3262.592 Iteration 19, inertia 3262.179 Iteration 20, inertia 3261.991 Iteration 21, inertia 3261.915 Iteration 22, inertia 3261.842 Iteration 23, inertia 3261.741 Iteration 24, inertia 3261.661 Iteration 25, inertia 3261.614 Iteration 26, inertia 3261.582 Iteration 27, inertia 3261.569 Iteration 28, inertia 3261.557 Iteration 29, inertia 3261.539 Iteration 30, inertia 3261.525 Iteration 31, inertia 3261.499 Converged at iteration 31 Initialization complete Iteration 0, inertia 6524.930 Iteration 1, inertia 3308.247 Iteration 2, inertia 3292.389 Iteration 3, inertia 3283.365 Iteration 4, inertia 3278.358 Iteration 5, inertia 3276.421 Iteration 6, inertia 3275.128 Iteration 7, inertia 3273.981 Iteration 8, inertia 3272.630 Iteration 9, inertia 3270.863 Iteration 10, inertia 3268.894 Iteration 11, inertia 3267.018 Iteration 12, inertia 3265.305 Iteration 13, inertia 3263.985 Iteration 14, inertia 3263.395 Iteration 15, inertia 3262.957 Iteration 16, inertia 3262.720 Iteration 17, inertia 3262.581 Iteration 18, inertia 3262.501 Iteration 19, inertia 3262.414 Iteration 20, inertia 3262.318 Iteration 21, inertia 3262.253 Iteration 22, inertia 3262.192 Iteration 23, inertia 3262.085 Iteration 24, inertia 3261.962 Iteration 25, inertia 3261.815 Iteration 26, inertia 3261.625 Iteration 27, inertia 3261.492 Iteration 28, inertia 3261.394 Iteration 29, inertia 3261.278 Iteration 30, inertia 
3261.206 Iteration 31, inertia 3261.134 Iteration 32, inertia 3261.077 Iteration 33, inertia 3261.018 Iteration 34, inertia 3260.997 Iteration 35, inertia 3260.975 Iteration 36, inertia 3260.958 Iteration 37, inertia 3260.949 Converged at iteration 37 Initialization complete Iteration 0, inertia 6392.513 Iteration 1, inertia 3298.129 Iteration 2, inertia 3286.500 Iteration 3, inertia 3280.842 Iteration 4, inertia 3277.803 Iteration 5, inertia 3276.304 Iteration 6, inertia 3274.915 Iteration 7, inertia 3273.931 Iteration 8, inertia 3273.201 Iteration 9, inertia 3272.640 Iteration 10, inertia 3272.355 Iteration 11, inertia 3272.069 Iteration 12, inertia 3271.870 Iteration 13, inertia 3271.619 Iteration 14, inertia 3271.328 Iteration 15, inertia 3271.052 Iteration 16, inertia 3270.824 Iteration 17, inertia 3270.511 Iteration 18, inertia 3270.053 Iteration 19, inertia 3269.612 Iteration 20, inertia 3269.327 Iteration 21, inertia 3269.190 Iteration 22, inertia 3269.089 Iteration 23, inertia 3269.024 Iteration 24, inertia 3268.943 Iteration 25, inertia 3268.846 Iteration 26, inertia 3268.764 Iteration 27, inertia 3268.697 Iteration 28, inertia 3268.597 Iteration 29, inertia 3268.465 Iteration 30, inertia 3268.295 Iteration 31, inertia 3268.120 Iteration 32, inertia 3267.779 Iteration 33, inertia 3267.203 Iteration 34, inertia 3266.515 Iteration 35, inertia 3265.992 Iteration 36, inertia 3265.674 Iteration 37, inertia 3265.235 Iteration 38, inertia 3264.315 Iteration 39, inertia 3263.987 Iteration 40, inertia 3263.929 Iteration 41, inertia 3263.905 Iteration 42, inertia 3263.885 Iteration 43, inertia 3263.866 Iteration 44, inertia 3263.859 Iteration 45, inertia 3263.852 Converged at iteration 45 Initialization complete Iteration 0, inertia 6326.529 Iteration 1, inertia 3294.746 Iteration 2, inertia 3282.371 Iteration 3, inertia 3276.461 Iteration 4, inertia 3273.181 Iteration 5, inertia 3271.013 Iteration 6, inertia 3268.783 Iteration 7, inertia 3266.648 Iteration 8, 
inertia 3265.133 Iteration 9, inertia 3264.077 Iteration 10, inertia 3263.566 Iteration 11, inertia 3263.328 Iteration 12, inertia 3263.232 Iteration 13, inertia 3263.172 Iteration 14, inertia 3263.125 Iteration 15, inertia 3263.087 Iteration 16, inertia 3263.064 Iteration 17, inertia 3263.053 Iteration 18, inertia 3263.047 Iteration 19, inertia 3263.044 Converged at iteration 19 Initialization complete Iteration 0, inertia 6396.511 Iteration 1, inertia 3292.367 Iteration 2, inertia 3280.269 Iteration 3, inertia 3275.911 Iteration 4, inertia 3272.600 Iteration 5, inertia 3270.273 Iteration 6, inertia 3269.109 Iteration 7, inertia 3268.377 Iteration 8, inertia 3267.638 Iteration 9, inertia 3266.541 Iteration 10, inertia 3265.821 Iteration 11, inertia 3265.175 Iteration 12, inertia 3264.720 Iteration 13, inertia 3264.471 Iteration 14, inertia 3264.307 Iteration 15, inertia 3264.199 Iteration 16, inertia 3264.110 Iteration 17, inertia 3264.035 Iteration 18, inertia 3263.980 Iteration 19, inertia 3263.934 Iteration 20, inertia 3263.922 Iteration 21, inertia 3263.906 Iteration 22, inertia 3263.890 Iteration 23, inertia 3263.867 Iteration 24, inertia 3263.857 Iteration 25, inertia 3263.845 Iteration 26, inertia 3263.827 Iteration 27, inertia 3263.818 Iteration 28, inertia 3263.816 Converged at iteration 28 Initialization complete Iteration 0, inertia 6431.988 Iteration 1, inertia 3293.092 Iteration 2, inertia 3278.216 Iteration 3, inertia 3269.663 Iteration 4, inertia 3265.719 Iteration 5, inertia 3263.092 Iteration 6, inertia 3261.218 Iteration 7, inertia 3260.260 Iteration 8, inertia 3259.782 Iteration 9, inertia 3259.574 Iteration 10, inertia 3259.506 Iteration 11, inertia 3259.466 Iteration 12, inertia 3259.449 Iteration 13, inertia 3259.435 Iteration 14, inertia 3259.422 Converged at iteration 14 Initialization complete Iteration 0, inertia 6434.113 Iteration 1, inertia 3296.655 Iteration 2, inertia 3278.784 Iteration 3, inertia 3272.196 Iteration 4, inertia 
3270.036 Iteration 5, inertia 3268.580 Iteration 6, inertia 3266.836 Iteration 7, inertia 3265.345 Iteration 8, inertia 3264.172 Iteration 9, inertia 3263.147 Iteration 10, inertia 3262.455 Iteration 11, inertia 3261.793 Iteration 12, inertia 3261.236 Iteration 13, inertia 3260.754 Iteration 14, inertia 3260.035 Iteration 15, inertia 3259.548 Iteration 16, inertia 3259.407 Iteration 17, inertia 3259.335 Iteration 18, inertia 3259.323 Iteration 19, inertia 3259.319 Iteration 20, inertia 3259.313 Iteration 21, inertia 3259.307 Iteration 22, inertia 3259.302 Iteration 23, inertia 3259.298 Iteration 24, inertia 3259.296 Converged at iteration 24 Initialization complete Iteration 0, inertia 6421.814 Iteration 1, inertia 3300.660 Iteration 2, inertia 3287.858 Iteration 3, inertia 3281.381 Iteration 4, inertia 3276.546 Iteration 5, inertia 3271.531 Iteration 6, inertia 3267.330 Iteration 7, inertia 3264.234 Iteration 8, inertia 3263.418 Iteration 9, inertia 3262.728 Iteration 10, inertia 3262.077 Iteration 11, inertia 3261.563 Iteration 12, inertia 3261.202 Iteration 13, inertia 3260.836 Iteration 14, inertia 3260.469 Iteration 15, inertia 3260.095 Iteration 16, inertia 3259.766 Iteration 17, inertia 3259.590 Iteration 18, inertia 3259.492 Iteration 19, inertia 3259.396 Iteration 20, inertia 3259.263 Iteration 21, inertia 3259.172 Iteration 22, inertia 3259.122 Iteration 23, inertia 3259.087 Iteration 24, inertia 3259.059 Iteration 25, inertia 3259.021 Iteration 26, inertia 3258.983 Iteration 27, inertia 3258.919 Iteration 28, inertia 3258.870 Iteration 29, inertia 3258.826 Iteration 30, inertia 3258.756 Iteration 31, inertia 3258.694 Iteration 32, inertia 3258.621 Iteration 33, inertia 3258.534 Iteration 34, inertia 3258.440 Iteration 35, inertia 3258.277 Iteration 36, inertia 3258.160 Iteration 37, inertia 3258.098 Iteration 38, inertia 3258.041 Iteration 39, inertia 3257.966 Iteration 40, inertia 3257.909 Iteration 41, inertia 3257.860 Iteration 42, inertia 3257.774 
Iteration 43, inertia 3257.727 Iteration 44, inertia 3257.694 Iteration 45, inertia 3257.666 Iteration 46, inertia 3257.593 Iteration 47, inertia 3257.551 Iteration 48, inertia 3257.537 Converged at iteration 48 Initialization complete Iteration 0, inertia 6373.464 Iteration 1, inertia 3297.963 Iteration 2, inertia 3287.660 Iteration 3, inertia 3282.323 Iteration 4, inertia 3279.099 Iteration 5, inertia 3277.759 Iteration 6, inertia 3277.064 Iteration 7, inertia 3276.650 Iteration 8, inertia 3276.232 Iteration 9, inertia 3275.737 Iteration 10, inertia 3275.473 Iteration 11, inertia 3275.339 Iteration 12, inertia 3275.253 Iteration 13, inertia 3275.199 Iteration 14, inertia 3275.158 Iteration 15, inertia 3275.128 Iteration 16, inertia 3275.107 Iteration 17, inertia 3275.076 Iteration 18, inertia 3275.055 Iteration 19, inertia 3275.041 Iteration 20, inertia 3275.022 Iteration 21, inertia 3274.999 Iteration 22, inertia 3274.979 Iteration 23, inertia 3274.960 Iteration 24, inertia 3274.942 Iteration 25, inertia 3274.931 Iteration 26, inertia 3274.926 Iteration 27, inertia 3274.922 Iteration 28, inertia 3274.920 Converged at iteration 28 Initialization complete Iteration 0, inertia 6289.281 Iteration 1, inertia 3304.450 Iteration 2, inertia 3288.473 Iteration 3, inertia 3282.639 Iteration 4, inertia 3280.544 Iteration 5, inertia 3279.671 Iteration 6, inertia 3279.139 Iteration 7, inertia 3278.606 Iteration 8, inertia 3278.196 Iteration 9, inertia 3277.723 Iteration 10, inertia 3277.261 Iteration 11, inertia 3276.925 Iteration 12, inertia 3276.570 Iteration 13, inertia 3276.117 Iteration 14, inertia 3275.711 Iteration 15, inertia 3275.582 Iteration 16, inertia 3275.538 Iteration 17, inertia 3275.526 Iteration 18, inertia 3275.517 Iteration 19, inertia 3275.509 Converged at iteration 19 Initialization complete Iteration 0, inertia 6390.941 Iteration 1, inertia 3290.356 Iteration 2, inertia 3274.869 Iteration 3, inertia 3268.843 Iteration 4, inertia 3265.737 Iteration 5, 
inertia 3264.341 Iteration 6, inertia 3263.580 Iteration 7, inertia 3262.989 Iteration 8, inertia 3262.543 Iteration 9, inertia 3262.156 Iteration 10, inertia 3261.898 Iteration 11, inertia 3261.653 Iteration 12, inertia 3261.429 Iteration 13, inertia 3261.209 Iteration 14, inertia 3260.992 Iteration 15, inertia 3260.760 Iteration 16, inertia 3260.407 Iteration 17, inertia 3259.996 Iteration 18, inertia 3259.382 Iteration 19, inertia 3258.432 Iteration 20, inertia 3257.154 Iteration 21, inertia 3256.723 Iteration 22, inertia 3256.546 Iteration 23, inertia 3256.446 Iteration 24, inertia 3256.391 Iteration 25, inertia 3256.362 Iteration 26, inertia 3256.344 Iteration 27, inertia 3256.339 Converged at iteration 27 Initialization complete Iteration 0, inertia 6432.035 Iteration 1, inertia 3304.822 Iteration 2, inertia 3291.831 Iteration 3, inertia 3281.480 Iteration 4, inertia 3275.025 Iteration 5, inertia 3270.730 Iteration 6, inertia 3266.021 Iteration 7, inertia 3261.621 Iteration 8, inertia 3259.239 Iteration 9, inertia 3258.382 Iteration 10, inertia 3257.763 Iteration 11, inertia 3257.227 Iteration 12, inertia 3256.768 Iteration 13, inertia 3256.410 Iteration 14, inertia 3256.245 Iteration 15, inertia 3256.139 Iteration 16, inertia 3256.045 Iteration 17, inertia 3256.003 Iteration 18, inertia 3255.975 Iteration 19, inertia 3255.955 Iteration 20, inertia 3255.938 Iteration 21, inertia 3255.926 Iteration 22, inertia 3255.919 Iteration 23, inertia 3255.906 Iteration 24, inertia 3255.901 Iteration 25, inertia 3255.899 Iteration 26, inertia 3255.897 Converged at iteration 26 Initialization complete Iteration 0, inertia 6439.297 Iteration 1, inertia 3292.780 Iteration 2, inertia 3279.272 Iteration 3, inertia 3275.342 Iteration 4, inertia 3271.297 Iteration 5, inertia 3266.407 Iteration 6, inertia 3264.193 Iteration 7, inertia 3262.548 Iteration 8, inertia 3261.671 Iteration 9, inertia 3260.768 Iteration 10, inertia 3259.996 Iteration 11, inertia 3259.212 Iteration 12, 
[39, 0.40027932557045898]
Using 39 clusters gives 40% accuracy, which is only marginally better than our model with 5 clusters, so we will choose the simpler model moving forward. Remember, though, that these are still in-sample results and are probably better than we can expect on real data.
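The choice between 5 and 39 clusters follows a simple parsimony rule: among the settings whose score is close enough to the best one, keep the smallest. Below is a minimal sketch of that rule. It assumes the scores are stored in a dict keyed by the number of clusters. The helper name `simplest_within_tolerance` and the default `tolerance` of 0.02 are hypothetical choices for illustration, not values taken from this notebook.

```python
def simplest_within_tolerance(scores, tolerance=0.02):
    """Return the smallest number of clusters whose score is within `tolerance` of the best.

    scores: dict mapping number of clusters -> score (higher is better).
    The default tolerance is an arbitrary, hypothetical choice.
    """
    best = max(scores.values())
    # Every k that is "good enough" is a candidate; prefer the simplest (smallest) one
    return min(k for k, s in scores.items() if best - s <= tolerance)
```

Since the scores above are in-sample, this rule is safer to apply to scores measured on held-out posts.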
Now we can recommend similar articles to the user. This could be implemented as part of the search algorithm, or simply as a list of recommended posts to read after the current page.
We first need to vectorize the new post before we can predict its label.
new_post = '''hard drives can fail at any time,
it is important to always backup your data.'''
# Use transform (not fit_transform) so the post is mapped onto the training vocabulary
new_post_vec = vec.transform([new_post])
new_post_label = km.predict(new_post_vec)[0]  # predict the cluster it belongs to
# Select all posts with the same cluster label as the new post vector
similar_label = (km.labels_ == new_post_label).nonzero()[0]
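The sketch below shows what the `transform`-then-`predict` step does under the hood, in plain Python. It assumes a simple bag-of-words model and centroids that are the mean vector of each cluster. The helper names (`build_vocabulary`, `transform`, `predict`, `same_cluster`) are hypothetical and are not scikit-learn's API. The tokenizer mirrors the default `token_pattern` and `lowercase=True` from the `CountVectorizer` printout earlier. Any extra processing the notebook's `vec` may apply is not reproduced.

```python
import math
import re
from collections import Counter

# Same regex as CountVectorizer's default token_pattern: tokens of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    return TOKEN_RE.findall(doc.lower())


def build_vocabulary(docs):
    """Map every token seen in the training docs to a column index (alphabetical order)."""
    words = sorted({w for doc in docs for w in tokenize(doc)})
    return {w: i for i, w in enumerate(words)}


def transform(doc, vocabulary):
    """Count vector over a FIXED vocabulary; tokens never seen in training are dropped."""
    vec = [0] * len(vocabulary)
    for word, count in Counter(tokenize(doc)).items():
        if word in vocabulary:
            vec[vocabulary[word]] += count
    return vec


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict(vec, centroids):
    """Index of the nearest centroid by Euclidean distance, as k-means assigns points."""
    return min(range(len(centroids)), key=lambda c: euclidean(vec, centroids[c]))


def same_cluster(labels, label):
    """Indices of posts sharing `label`, like (labels == label).nonzero()[0]."""
    return [i for i, lab in enumerate(labels) if lab == label]
```

Suppose the training vocabulary contains `drive` but not `drives`. Then `transform("Hard drives can fail, backup your data", vocab)` silently drops `drives`, along with any other unseen word. Because the new post must live in the same feature space as the training posts, the notebook calls `vec.transform` rather than refitting the vectorizer.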
Now, for the posts we know are similar (those in the same cluster), we build a list of their distances to the new post, as we did in the earlier examples. A smaller distance means a more similar post.
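import scipy as sp
import scipy.linalg  # ensures sp.linalg is loaded
similar = []
for i in similar_label:
    # Euclidean distance between the new post and this same-cluster post
    dist = sp.linalg.norm((new_post_vec - vecData[i].toarray()))
    similar.append((dist, train_data.target[i], train_data.data[i]))
# Sort ascending: the smallest distance (most similar post) comes first
similar = sorted(similar)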
print(len(similar))
175
# Show the most similar post (smallest distance)
print(similar[0])
(1.1757159813728066, 2, 'From: gjp@sei.cmu.edu (George Pandelios)\nSubject: Help me select a Backup Solution\n\n\nHi Netters!\n\nI\'m looking at purchasing some sort of backup solution. After you read about\nmy situation, I\'d like your opinion. Here\'s the scenario:\n\n1. There are two computers in the house. One is a small 286 (40MB IDE drive).\n The other is a 386DX (213 SCSI drive w/ Adaptec 1522 controller). Both \n systems have PC TOOLS and will use Central Point Backup as the backup / \n restore program. Both systems have 3.5" and 5.25" floppies.\n\n2. The computers are not networked (nor will they be anytime soon).\n\nFrom what I have seen so far, there appear to be at least 4 possible\nsolutions (I\'m sure there are others I haven\'t thought about). For these \noptions, I would appreciate hearing from anyone who has tried them or sees \nany flaws (drive type X won\'t coexist with device Y, etc.) in my thinking \n(I don\'t know very much about these beasts):\n\n1. Put 2.88MB floppy drives (or a combination drive) on each system.\n Can someone supply cost and brand information? What\'s a good brand?\n What do the floppies themselves cost?\n\n\n2. Put an internal tape backup unit on the 386 using my SCSI adapter, and\n continue to back up the 286 with floppies. Again, can someone recommend a\n few manufacturers? The only brand I remember is Colorado Memories. Any\n happy or unhappy users (I know about the compression controversy)?\n \n\n3. Connect an external tape backup unit on the 386 using my SCSI adapter, and\n (maybe?) connect it to the 286 somehow (any suggestions?)\n\n\n4. Install a Floptical drive in each machine. Again, any gotcha\'s or \n recommendations for manufacturers? \n\nI appreciate your help. You may either post or send me e-mail. I will\nsummarize all responses for the net.\n\nThanks,\n\nGeorge\n=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\n George J. 
Pandelios\t\t\t\tInternet: gjp@sei.cmu.edu\n Software Engineering Institute\t\tusenet:\t sei!gjp\n 4500 Fifth Avenue\t\t\t\tVoice:\t (412) 268-7186\n Pittsburgh, PA 15213\t\t\t\tFAX:\t (412) 268-5758\n=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\nDisclaimer: These opinions are my own and do not reflect those of the\n\t Software Engineering Institute, its sponsors, customers, \n\t clients, affiliates, or Carnegie Mellon University. In fact,\n\t any resemblence of these opinions to any individual, living\n\t or dead, fictional or real, is purely coincidental. So there.\n=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\n')
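The ranking loop above computes a distance to every post in the cluster and then sorts the results. Below is a minimal dependency-free sketch of the same idea. It assumes dense vectors stored as Python lists, and `rank_by_distance` is a hypothetical helper. It sorts `(distance, index)` pairs instead of tuples that carry the full post text. When only the top `k` recommendations are needed, it uses `heapq.nsmallest`, which avoids sorting the whole list.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(query, vectors, candidates, k=None):
    """Return (distance, index) pairs for the candidate indices, closest first.

    If k is given, only the k closest pairs are returned.
    """
    scored = [(euclidean(query, vectors[i]), i) for i in candidates]
    if k is None:
        return sorted(scored)
    return heapq.nsmallest(k, scored)
```

The returned indices can then be used to look up the post text and label, much as `train_data.data[i]` and `train_data.target[i]` are used above.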