Last.fm

Author: Kartik Jagdale (https://github.com/kartikjagdale)

Last.fm is a music discovery service that gives you personalised recommendations based on the music you listen to.

Here we apply some machine learning and data analysis to a Last.fm dataset in order to recommend the next songs to a user.

We use the NearestNeighbors algorithm from scikit-learn to predict the songs a user is likely to want to hear next.

Note: The dataset, retrieved from Last.fm [LastFM_Matrix.csv], contains 1257 records and 285 songs.

In [4]:

# First, import some essential libraries
import os
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity # For calculating similarity matrix
from sklearn.neighbors import NearestNeighbors

In [5]:

DIR_PATH = os.getcwd()  # Get the current directory

lfm = pd.read_csv(os.path.join(DIR_PATH, "LastFM_Matrix.csv"))  # Load the dataset
lfm.head()  # Display the head of the dataset

Out[5]:

	user	adam green	...
0	1	0	...
1	33	1	...
2	42	0	...
3	51	0	...
4	62	0	...

5 rows × 286 columns

Let's list the column names of the dataset: the user column followed by the songs.

In [6]:

songs = pd.DataFrame(lfm.columns)
songs.head(10)

Out[6]:

	0
0	user
1	a perfect circle
2	abba
3	ac/dc
4	adam green
5	aerosmith
6	afi
7	air
8	alanis morissette
9	alexisonfire

Now let's keep only the song columns in a new DataFrame.

In [7]:

lfm_songs = lfm.drop("user", axis=1)  # Drop the user column
lfm_songs.head()  # Show the head

Out[7]:

	adam green	...
0	0	...
1	1	...
2	0	...
3	0	...
4	0	...

5 rows × 285 columns

In [8]:

lfm_songs.shape  # Number of rows and columns

Out[8]:

(1257, 285)

Calculate the cosine similarity between songs to get the similarity matrix.

In [9]:

data_similarity = cosine_similarity(lfm_songs.T)  # Transpose so each row is a song; the result is song-by-song
data_similarity

Out[9]:

array([[ 1.        ,  0.        ,  0.01791723, ...,  0.06506   ,
         0.05216405,  0.        ],
       [ 0.        ,  1.        ,  0.05227877, ...,  0.        ,
         0.02536731,  0.        ],
       [ 0.01791723,  0.05227877,  1.        , ...,  0.02039967,
         0.13084898,  0.        ],
       ..., 
       [ 0.06506   ,  0.        ,  0.02039967, ...,  1.        ,
         0.        ,  0.        ],
       [ 0.05216405,  0.02536731,  0.13084898, ...,  0.        ,
         1.        ,  0.02969569],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.02969569,  1.        ]])
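To see what cosine_similarity computes, here is a minimal sketch on a made-up 4-user by 3-song matrix, written with NumPy only. The `plays` array and the `cosine_similarity_columns` helper are hypothetical names for illustration, not part of the notebook; for columns that are not all zeros, the result should agree with `cosine_similarity(plays.T)`.

```python
import numpy as np

# Hypothetical toy data: 4 users (rows) x 3 songs (columns), 1 = listened.
plays = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
])

def cosine_similarity_columns(matrix):
    """Return the column-by-column cosine similarity of a 2-D array."""
    items = matrix.T.astype(float)         # one row per song
    norms = np.linalg.norm(items, axis=1)  # length of each song vector
    norms[norms == 0] = 1.0                # avoid dividing by zero for unplayed songs
    unit = items / norms[:, np.newaxis]    # scale every song vector to length 1
    return unit @ unit.T                   # dot products of unit vectors

sim = cosine_similarity_columns(plays)
print(np.round(sim, 3))
```

Songs 0 and 1 each have three listeners and share two of them, so their similarity is 2/3. Songs 0 and 2 share no listeners, so their similarity is 0.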

Now that we have the similarity matrix, let's use the nearest-neighbours algorithm to predict the recommendations. First, though, we will label the matrix with the song names.

In [10]:

type(data_similarity)

Out[10]:

numpy.ndarray

Let's convert it into a DataFrame.

In [11]:

data_similarity_df = pd.DataFrame(data_similarity, columns=(lfm_songs.columns), index=(lfm_songs.columns))

In [12]:

data_similarity_df.head()  # Similarity matrix

Out[12]:

	a perfect circle	abba	ac/dc	adam green	aerosmith	afi	air	alanis morissette	alicia keys	...	timbaland	tom waits	tool	tori amos	travis	trivium	u2	underoath	volbeat	yann tiersen
a perfect circle	1.000000	0.000000	0.017917	0.051554	0.062776	0.000000	0.051755	0.060718	0.000000	...	0.047338	0.081200	0.394709	0.125553	0.030359	0.111154	0.024398	0.06506	0.052164	0.000000
abba	0.000000	1.000000	0.052279	0.025071	0.061056	0.000000	0.016779	0.029527	0.000000	...	0.000000	0.000000	0.000000	0.061056	0.029527	0.000000	0.094916	0.00000	0.025367	0.000000
ac/dc	0.017917	0.052279	1.000000	0.113154	0.177153	0.067894	0.075730	0.038076	0.088333	...	0.044529	0.067894	0.058241	0.039367	0.000000	0.087131	0.122398	0.02040	0.130849	0.000000
adam green	0.051554	0.025071	0.113154	1.000000	0.056637	0.000000	0.093386	0.000000	0.025416	...	0.000000	0.146516	0.083789	0.056637	0.082169	0.025071	0.022011	0.00000	0.023531	0.088045
aerosmith	0.062776	0.061056	0.177153	0.056637	1.000000	0.000000	0.113715	0.100056	0.061898	...	0.052005	0.029735	0.025507	0.068966	0.033352	0.000000	0.214423	0.00000	0.057307	0.000000

5 rows × 285 columns

In [13]:

data_similarity_df.index.is_unique  # Check that no song is repeated

Out[13]:

True

Now we apply the NearestNeighbors algorithm to the similarity matrix to get the recommendations. With n_neighbors=285, every song is ranked against all 285 songs.

In [14]:

neigh = NearestNeighbors(n_neighbors=285)
neigh.fit(data_similarity_df) # Fit the data

Out[14]:

NearestNeighbors(algorithm='auto', leaf_size=30, metric='minkowski',
         metric_params=None, n_neighbors=285, p=2, radius=1.0)

In [15]:

#Copy the predicted data to a new DataFrame
model = pd.DataFrame(neigh.kneighbors(data_similarity_df, return_distance=False))
model.head()  # Integer positions rather than song names

Out[15]:

	0	1	2	3	4	5	6	7	8	9	...	275	276	277	278	279	280	281	282	283	284
0	0	277	81	70	189	206	108	235	264	80	...	216	147	60	90	159	254	261	57	32	218
1	1	221	88	165	174	175	83	208	113	103	...	230	33	213	172	19	79	162	150	125	241
2	2	128	172	36	190	75	182	116	258	140	...	218	39	263	248	57	68	179	261	17	32
3	3	255	267	25	276	47	84	104	266	59	...	213	11	90	20	238	79	92	162	150	125
4	4	281	157	158	115	93	106	78	103	262	...	253	10	19	162	22	241	39	125	20	150

5 rows × 285 columns
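With its default settings (metric='minkowski' and p=2 in the repr above), NearestNeighbors measures the Euclidean distance between rows of whatever it is fitted on. Here that means two songs count as neighbours when their rows of the similarity matrix look alike. A minimal sketch on a made-up 4 x 4 similarity matrix (`toy_similarity`, `toy_neigh` and `toy_order` are hypothetical names) shows the shape of the result:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical 4 x 4 similarity matrix: items 0 and 1 are alike, as are 2 and 3.
toy_similarity = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

toy_neigh = NearestNeighbors(n_neighbors=4)  # default metric: Euclidean distance
toy_neigh.fit(toy_similarity)

# For every row, the positions of all rows ordered from nearest to farthest.
toy_order = toy_neigh.kneighbors(toy_similarity, return_distance=False)
print(toy_order)
```

Each row of `toy_order` starts with its own position, because a row is at distance 0 from itself. The same effect is why column 0 of the model above repeats the row number.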

In [16]:

final_model = pd.DataFrame(data_similarity_df.columns.to_numpy()[model.to_numpy()], index=data_similarity_df.index)  # Map integer positions to song names

In [17]:

final_model.head()  # Preview the final model

Out[17]:

	0	1	2	3	4	5	6	7	8	9	...	275	276	277	278	279	280	281	282	283	284
a perfect circle	a perfect circle	tool	dredg	deftones	nine inch nails	porcupine tree	godsmack	staind	the smashing pumpkins	dream theater	...	red hot chili peppers	katy perry	coldplay	ensiferum	leona lewis	the kooks	the pussycat dolls	christina aguilera	beyonce	rihanna
abba	abba	robbie williams	elvis presley	madonna	michael jackson	mika	duffy	queen	groove coverage	frank sinatra	...	slipknot	billy talent	rammstein	metallica	arctic monkeys	disturbed	linkin park	killswitch engage	in flames	system of a down
ac/dc	ac/dc	iron maiden	metallica	black sabbath	nirvana	die toten hosen	motorhead	hammerfall	the offspring	judas priest	...	rihanna	bloc party	the shins	the decemberists	christina aguilera	death cab for cutie	modest mouse	the pussycat dolls	arcade fire	beyonce
adam green	adam green	the libertines	the strokes	babyshambles	tom waits	bright eyes	editors	franz ferdinand	the streets	cocorosie	...	rammstein	amon amarth	ensiferum	as i lay dying	subway to sally	disturbed	equilibrium	linkin park	killswitch engage	in flames
aerosmith	aerosmith	u2	led zeppelin	lenny kravitz	guns n roses	eric clapton	genesis	dire straits	frank sinatra	the rolling stones	...	the killers	all that remains	arctic monkeys	linkin park	atreyu	system of a down	bloc party	in flames	as i lay dying	killswitch engage

5 rows × 285 columns
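The step that turns integer positions into song names relies on NumPy integer-array indexing: indexing a 1-D array of names with a 2-D array of positions returns a 2-D array of names. A small sketch, assuming made-up labels and positions (`labels`, `neighbour_idx` and `named` are hypothetical names):

```python
import numpy as np
import pandas as pd

# Hypothetical labels and neighbour positions, standing in for the 285 songs.
labels = pd.Index(["abba", "queen", "tool", "opeth"])
neighbour_idx = np.array([
    [0, 1, 2, 3],
    [1, 0, 3, 2],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
])

# Each integer is replaced by the label stored at that position.
neighbour_names = labels.to_numpy()[neighbour_idx]
named = pd.DataFrame(neighbour_names, index=labels)
print(named)
```

Converting to plain NumPy arrays first keeps the lookup independent of how a particular pandas version treats 2-D indexing on an Index.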

The model above gives all 285 neighbours for each song, but we only want the top 10 recommendations, so let's trim the DataFrame. In the preview above, column 0 is the song itself, so we keep the first 11 columns.

In [18]:

top10 = final_model.iloc[:, :11]  # Column 0 (the song itself) plus its 10 nearest neighbours

In [19]:

top10.head()

Out[19]:

	0	1	2	3	4	5	6	7	8	9	10
a perfect circle	a perfect circle	tool	dredg	deftones	nine inch nails	porcupine tree	godsmack	staind	the smashing pumpkins	dream theater	opeth
abba	abba	robbie williams	elvis presley	madonna	michael jackson	mika	duffy	queen	groove coverage	frank sinatra	hans zimmer
ac/dc	ac/dc	iron maiden	metallica	black sabbath	nirvana	die toten hosen	motorhead	hammerfall	the offspring	judas priest	bloodhound gang
adam green	adam green	the libertines	the strokes	babyshambles	tom waits	bright eyes	editors	franz ferdinand	the streets	cocorosie	queens of the stone age
aerosmith	aerosmith	u2	led zeppelin	lenny kravitz	guns n roses	eric clapton	genesis	dire straits	frank sinatra	the rolling stones	deep purple

Now let's write our results to a CSV file called top10.csv.

In [20]:

top10.to_csv("top10.csv", index_label="Index")  # Store the data in a CSV file

Now let's read the CSV file back to check that it was saved.

In [21]:

pd.read_csv("top10.csv").head()

Out[21]:

	Index	0	1	2	3	4	5	6	7	8	9	10
0	a perfect circle	a perfect circle	tool	dredg	deftones	nine inch nails	porcupine tree	godsmack	staind	the smashing pumpkins	dream theater	opeth
1	abba	abba	robbie williams	elvis presley	madonna	michael jackson	mika	duffy	queen	groove coverage	frank sinatra	hans zimmer
2	ac/dc	ac/dc	iron maiden	metallica	black sabbath	nirvana	die toten hosen	motorhead	hammerfall	the offspring	judas priest	bloodhound gang
3	adam green	adam green	the libertines	the strokes	babyshambles	tom waits	bright eyes	editors	franz ferdinand	the streets	cocorosie	queens of the stone age
4	aerosmith	aerosmith	u2	led zeppelin	lenny kravitz	guns n roses	eric clapton	genesis	dire straits	frank sinatra	the rolling stones	deep purple
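Reading the file with `index_col="Index"` brings the song names back as row labels, which makes lookups by name straightforward. The sketch below assumes a trimmed, in-memory stand-in for top10.csv (three neighbours per song, values copied from the output above) and a hypothetical helper `recommendations_for`:

```python
import io
import pandas as pd

# Hypothetical stand-in for top10.csv, trimmed to three recommendations per song.
csv_text = """Index,0,1,2,3
abba,abba,robbie williams,elvis presley,madonna
ac/dc,ac/dc,iron maiden,metallica,black sabbath
"""

# index_col="Index" restores the song names as row labels instead of a plain column.
saved = pd.read_csv(io.StringIO(csv_text), index_col="Index")

def recommendations_for(song, table):
    """Return the neighbours stored for `song`, skipping column 0 (the song itself)."""
    return table.loc[song].iloc[1:].tolist()

print(recommendations_for("ac/dc", saved))
```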

Conclusion

To conclude, we have created a model that recommends the next songs a user is likely to want to hear, using Last.fm data.

Going further, we can wrap this model in an API and use it in a website or web app to recommend songs to the user.
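One possible way to go from the per-song table to a per-user suggestion, as an API endpoint might, is to merge the neighbours of everything the user has already played. The following is only a sketch under assumptions: the `neighbours` table is made-up toy data, and `recommend_for_user` with its 1/rank scoring is a hypothetical choice, not something the notebook implements.

```python
import pandas as pd

# Hypothetical neighbour table shaped like top10: column 0 is the song itself.
neighbours = pd.DataFrame(
    [
        ["abba", "queen", "madonna"],
        ["queen", "abba", "u2"],
        ["u2", "queen", "coldplay"],
    ],
    index=["abba", "queen", "u2"],
)

def recommend_for_user(listened, table, k=2):
    """Suggest up to k unheard songs, scored by how often and how early
    they appear among the neighbours of the songs in `listened`."""
    scores = {}
    for song in listened:
        if song not in table.index:
            continue  # unknown song: nothing to look up
        for rank, candidate in enumerate(table.loc[song].iloc[1:], start=1):
            if candidate in listened:
                continue  # skip songs the user has already heard
            scores[candidate] = scores.get(candidate, 0.0) + 1.0 / rank
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]

print(recommend_for_user(["abba", "queen"], neighbours))
```

The 1/rank weight simply favours candidates that sit near the front of a neighbour list. Any other monotone weighting would fit the same structure.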

GitHub link: https://github.com/kartikjagdale/Last.fm-Song-Recommender
