Last.fm is a music discovery service that gives you personalised recommendations based on the music you listen to.
Here we are going to do some machine learning and data analysis on a Last.fm dataset in order to recommend the next songs to a user.
We are going to use the NearestNeighbors algorithm to predict which songs a user will want to hear next.
Note: The dataset was retrieved from Last.fm (LastFM_Matrix.csv) and contains 1257 records (one row per user) and 285 song columns. The column labels are artist names such as "abba" and "ac/dc"; this notebook refers to them as songs.
# First, import some essential libraries
import os
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity  # for calculating the similarity matrix
from sklearn.neighbors import NearestNeighbors
DIR_PATH = os.getcwd()  # get the current directory
lfm = pd.read_csv(os.path.join(DIR_PATH, "LastFM_Matrix.csv"))  # load the dataset
lfm.head()  # display the first five rows of the dataset
| | user | a perfect circle | abba | ac/dc | adam green | aerosmith | afi | air | alanis morissette | alexisonfire | ... | timbaland | tom waits | tool | tori amos | travis | trivium | u2 | underoath | volbeat | yann tiersen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 33 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 62 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 286 columns
Let's look at some of the column names in the dataset: the user column followed by the song names.
songs = pd.DataFrame(lfm.columns)
songs.head(10)
| | 0 |
---|---|
0 | user |
1 | a perfect circle |
2 | abba |
3 | ac/dc |
4 | adam green |
5 | aerosmith |
6 | afi |
7 | air |
8 | alanis morissette |
9 | alexisonfire |
Now let's drop the user column and keep only the songs in a new DataFrame.
lfm_songs = lfm.drop("user", axis=1)  # drop the user column
lfm_songs.head()  # show the first five rows
| | a perfect circle | abba | ac/dc | adam green | aerosmith | afi | air | alanis morissette | alexisonfire | alicia keys | ... | timbaland | tom waits | tool | tori amos | travis | trivium | u2 | underoath | volbeat | yann tiersen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 285 columns
lfm_songs.shape  # (number of rows, number of columns)
(1257, 285)
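Calculate the cosine similarity between every pair of songs to get a similarity matrix. Transposing lfm_songs makes each song a row vector with one entry per user, so the result is a 285 × 285 song-song matrix.
data_similarity = cosine_similarity(lfm_songs.T)  # song-song cosine similarity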
data_similarity
array([[ 1.        ,  0.        ,  0.01791723, ...,  0.06506   ,  0.05216405,  0.        ],
       [ 0.        ,  1.        ,  0.05227877, ...,  0.        ,  0.02536731,  0.        ],
       [ 0.01791723,  0.05227877,  1.        , ...,  0.02039967,  0.13084898,  0.        ],
       ...,
       [ 0.06506   ,  0.        ,  0.02039967, ...,  1.        ,  0.        ,  0.        ],
       [ 0.05216405,  0.02536731,  0.13084898, ...,  0.        ,  1.        ,  0.02969569],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.02969569,  1.        ]])
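For two song vectors a and b, cosine similarity is a·b / (‖a‖ ‖b‖), which is why the diagonal of the matrix above is 1 and songs that share no listeners score 0. The sketch below computes this by hand with NumPy on a made-up 3-user × 3-song matrix (the names `plays` and `cosine_matrix` are hypothetical) and checks it against scikit-learn's `cosine_similarity`. It assumes no song column is all zeros, since that would divide by zero in the manual version.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Made-up user x song matrix: 3 users (rows) and 3 songs (columns).
plays = np.array([
    [1, 0, 2],
    [0, 3, 1],
    [4, 0, 0],
])

def cosine_matrix(m):
    """Song-by-song cosine similarity for a users x songs matrix.
    Assumes every song has at least one non-zero entry."""
    songs = m.T.astype(float)               # one row per song, like lfm_songs.T
    norms = np.linalg.norm(songs, axis=1)   # length of each song vector
    dots = songs @ songs.T                  # pairwise dot products a . b
    return dots / np.outer(norms, norms)    # divide by |a| * |b|

manual = cosine_matrix(plays)
library = cosine_similarity(plays.T)
print(np.allclose(manual, library))
```

Songs 0 and 1 have no user in common, so their entry is exactly 0, mirroring the many zeros in the real matrix.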
We now have the song-song similarity matrix. Next we will use the k-nearest neighbours algorithm to produce recommendations, but first we will label the matrix with the song names.
type(data_similarity)
numpy.ndarray
Let's convert it into a DataFrame, using the song names as both the row index and the column labels.
data_similarity_df = pd.DataFrame(data_similarity, columns=lfm_songs.columns, index=lfm_songs.columns)
data_similarity_df.head()  # similarity matrix
| | a perfect circle | abba | ac/dc | adam green | aerosmith | afi | air | alanis morissette | alexisonfire | alicia keys | ... | timbaland | tom waits | tool | tori amos | travis | trivium | u2 | underoath | volbeat | yann tiersen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a perfect circle | 1.000000 | 0.000000 | 0.017917 | 0.051554 | 0.062776 | 0.000000 | 0.051755 | 0.060718 | 0 | 0.000000 | ... | 0.047338 | 0.081200 | 0.394709 | 0.125553 | 0.030359 | 0.111154 | 0.024398 | 0.06506 | 0.052164 | 0.000000 |
abba | 0.000000 | 1.000000 | 0.052279 | 0.025071 | 0.061056 | 0.000000 | 0.016779 | 0.029527 | 0 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.061056 | 0.029527 | 0.000000 | 0.094916 | 0.00000 | 0.025367 | 0.000000 |
ac/dc | 0.017917 | 0.052279 | 1.000000 | 0.113154 | 0.177153 | 0.067894 | 0.075730 | 0.038076 | 0 | 0.088333 | ... | 0.044529 | 0.067894 | 0.058241 | 0.039367 | 0.000000 | 0.087131 | 0.122398 | 0.02040 | 0.130849 | 0.000000 |
adam green | 0.051554 | 0.025071 | 0.113154 | 1.000000 | 0.056637 | 0.000000 | 0.093386 | 0.000000 | 0 | 0.025416 | ... | 0.000000 | 0.146516 | 0.083789 | 0.056637 | 0.082169 | 0.025071 | 0.022011 | 0.00000 | 0.023531 | 0.088045 |
aerosmith | 0.062776 | 0.061056 | 0.177153 | 0.056637 | 1.000000 | 0.000000 | 0.113715 | 0.100056 | 0 | 0.061898 | ... | 0.052005 | 0.029735 | 0.025507 | 0.068966 | 0.033352 | 0.000000 | 0.214423 | 0.00000 | 0.057307 | 0.000000 |
5 rows × 285 columns
data_similarity_df.index.is_unique  # check that no song is repeated
True
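Now we will apply the NearestNeighbors algorithm to the similarity matrix to get the recommendations. Each song's row of similarity scores is treated as its feature vector, and with n_neighbors=285 every song is ranked from nearest to farthest. The fitted model uses the default metric, Minkowski with p=2, i.e. Euclidean distance.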
neigh = NearestNeighbors(n_neighbors=285)
neigh.fit(data_similarity_df) # Fit the data
NearestNeighbors(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_neighbors=285, p=2, radius=1.0)
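To see what this step does, here is a sketch on a hypothetical 4 × 4 similarity matrix (`sim` and `order` are illustrative names, and the values are made up). It reuses the notebook's settings: default Euclidean metric and `n_neighbors` equal to the number of songs. Note that the ranking is by Euclidean distance between whole similarity profiles, not by the raw cosine score between two songs, so the two can occasionally disagree.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Made-up symmetric song-song similarity matrix with ones on the diagonal.
sim = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

# Same setup as the notebook: default metric (Minkowski, p=2, i.e. Euclidean)
# and n_neighbors equal to the number of songs, so every song gets ranked.
neigh = NearestNeighbors(n_neighbors=len(sim))
neigh.fit(sim)

# For each song (row), indices of all songs ordered nearest first.
order = neigh.kneighbors(sim, return_distance=False)
print(order)
```

Each row starts with the song itself at distance 0, which matches column 0 of `model` equalling the row index in the output that follows.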
# Copy the neighbour indices (nearest first) to a new DataFrame
model = pd.DataFrame(neigh.kneighbors(data_similarity_df, return_distance=False))
model.head()  # gives integer column positions instead of song names
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 275 | 276 | 277 | 278 | 279 | 280 | 281 | 282 | 283 | 284 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 277 | 81 | 70 | 189 | 206 | 108 | 235 | 264 | 80 | ... | 216 | 147 | 60 | 90 | 159 | 254 | 261 | 57 | 32 | 218 |
1 | 1 | 221 | 88 | 165 | 174 | 175 | 83 | 208 | 113 | 103 | ... | 230 | 33 | 213 | 172 | 19 | 79 | 162 | 150 | 125 | 241 |
2 | 2 | 128 | 172 | 36 | 190 | 75 | 182 | 116 | 258 | 140 | ... | 218 | 39 | 263 | 248 | 57 | 68 | 179 | 261 | 17 | 32 |
3 | 3 | 255 | 267 | 25 | 276 | 47 | 84 | 104 | 266 | 59 | ... | 213 | 11 | 90 | 20 | 238 | 79 | 92 | 162 | 150 | 125 |
4 | 4 | 281 | 157 | 158 | 115 | 93 | 106 | 78 | 103 | 262 | ... | 253 | 10 | 19 | 162 | 22 | 241 | 39 | 125 | 20 | 150 |
5 rows × 285 columns
final_model = pd.DataFrame(data_similarity_df.columns.to_numpy()[model.to_numpy()], index=data_similarity_df.index)  # replace each position with its song name
final_model.head()  # preview the final model
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 275 | 276 | 277 | 278 | 279 | 280 | 281 | 282 | 283 | 284 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a perfect circle | a perfect circle | tool | dredg | deftones | nine inch nails | porcupine tree | godsmack | staind | the smashing pumpkins | dream theater | ... | red hot chili peppers | katy perry | coldplay | ensiferum | leona lewis | the kooks | the pussycat dolls | christina aguilera | beyonce | rihanna |
abba | abba | robbie williams | elvis presley | madonna | michael jackson | mika | duffy | queen | groove coverage | frank sinatra | ... | slipknot | billy talent | rammstein | metallica | arctic monkeys | disturbed | linkin park | killswitch engage | in flames | system of a down |
ac/dc | ac/dc | iron maiden | metallica | black sabbath | nirvana | die toten hosen | motorhead | hammerfall | the offspring | judas priest | ... | rihanna | bloc party | the shins | the decemberists | christina aguilera | death cab for cutie | modest mouse | the pussycat dolls | arcade fire | beyonce |
adam green | adam green | the libertines | the strokes | babyshambles | tom waits | bright eyes | editors | franz ferdinand | the streets | cocorosie | ... | rammstein | amon amarth | ensiferum | as i lay dying | subway to sally | disturbed | equilibrium | linkin park | killswitch engage | in flames |
aerosmith | aerosmith | u2 | led zeppelin | lenny kravitz | guns n roses | eric clapton | genesis | dire straits | frank sinatra | the rolling stones | ... | the killers | all that remains | arctic monkeys | linkin park | atreyu | system of a down | bloc party | in flames | as i lay dying | killswitch engage |
5 rows × 285 columns
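The mapping from positions to names is plain NumPy fancy indexing: every integer in the index matrix is replaced by the name at that position. Converting both sides with `to_numpy()` keeps this a 2-D NumPy operation, since recent pandas versions refuse multi-dimensional indexing on an `Index` directly. The names and the small index matrix below are hypothetical stand-ins for `data_similarity_df.columns` and `model`.

```python
import pandas as pd

# Hypothetical song names and a neighbour-index matrix shaped like `model`.
names = pd.Index(["abba", "ac/dc", "aerosmith"])
model = pd.DataFrame([[0, 2, 1], [1, 2, 0], [2, 1, 0]])

# Fancy indexing: each integer k in `model` becomes names[k].
labelled = pd.DataFrame(names.to_numpy()[model.to_numpy()], index=names)
print(labelled)
```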
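The above model ranks all 285 songs for every song, but we only want the top 10 recommendations. Column 0 is always the song itself (its own nearest neighbour), so we keep columns 0 to 10: the song plus its 10 closest matches.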
top10 = final_model[list(final_model.columns[:11])]
top10.head()
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
a perfect circle | a perfect circle | tool | dredg | deftones | nine inch nails | porcupine tree | godsmack | staind | the smashing pumpkins | dream theater | opeth |
abba | abba | robbie williams | elvis presley | madonna | michael jackson | mika | duffy | queen | groove coverage | frank sinatra | hans zimmer |
ac/dc | ac/dc | iron maiden | metallica | black sabbath | nirvana | die toten hosen | motorhead | hammerfall | the offspring | judas priest | bloodhound gang |
adam green | adam green | the libertines | the strokes | babyshambles | tom waits | bright eyes | editors | franz ferdinand | the streets | cocorosie | queens of the stone age |
aerosmith | aerosmith | u2 | led zeppelin | lenny kravitz | guns n roses | eric clapton | genesis | dire straits | frank sinatra | the rolling stones | deep purple |
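For comparison, a simpler alternative (not the method used in this notebook) is to rank songs directly by their cosine similarity score and drop the song itself. The helper `top_n_similar` and the small labelled matrix below are hypothetical, with illustrative values; on the real data the order may differ from the NearestNeighbors ranking, because that model compares whole similarity profiles with Euclidean distance.

```python
import pandas as pd

def top_n_similar(sim_df, song, n=10):
    """Hypothetical helper: the n songs with the highest cosine similarity
    to `song`, excluding the song itself. `sim_df` is a labelled song x song matrix."""
    scores = sim_df[song].drop(labels=[song])  # similarity to every other song
    return scores.nlargest(n).index.tolist()   # highest scores first

# Illustrative labelled similarity matrix (symmetric, ones on the diagonal).
songs = ["abba", "ac/dc", "aerosmith", "u2"]
sim_df = pd.DataFrame(
    [[1.00, 0.05, 0.06, 0.09],
     [0.05, 1.00, 0.18, 0.12],
     [0.06, 0.18, 1.00, 0.21],
     [0.09, 0.12, 0.21, 1.00]],
    index=songs, columns=songs,
)
print(top_n_similar(sim_df, "aerosmith", n=2))
```

Applied to `data_similarity_df`, the same call would return a list of song names for any column label.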
Now let's save our results to a CSV file called top10.csv.
top10.to_csv("top10.csv", index_label="Index")  # store the data in a CSV file
Now let's read the CSV file back to check that it was saved.
pd.read_csv("top10.csv").head()
| | Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | a perfect circle | a perfect circle | tool | dredg | deftones | nine inch nails | porcupine tree | godsmack | staind | the smashing pumpkins | dream theater | opeth |
1 | abba | abba | robbie williams | elvis presley | madonna | michael jackson | mika | duffy | queen | groove coverage | frank sinatra | hans zimmer |
2 | ac/dc | ac/dc | iron maiden | metallica | black sabbath | nirvana | die toten hosen | motorhead | hammerfall | the offspring | judas priest | bloodhound gang |
3 | adam green | adam green | the libertines | the strokes | babyshambles | tom waits | bright eyes | editors | franz ferdinand | the streets | cocorosie | queens of the stone age |
4 | aerosmith | aerosmith | u2 | led zeppelin | lenny kravitz | guns n roses | eric clapton | genesis | dire straits | frank sinatra | the rolling stones | deep purple |
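When read back without options, the song names land in an ordinary "Index" column and pandas adds a new integer index. Passing `index_col="Index"` restores the song names as the row index, so recommendations can be looked up by name. The sketch below round-trips a tiny stand-in for `top10` through an in-memory buffer (an assumption, used instead of writing top10.csv to disk).

```python
import io
import pandas as pd

# Tiny stand-in for `top10`: each song followed by its nearest neighbour.
top = pd.DataFrame(
    [["abba", "robbie williams"], ["ac/dc", "iron maiden"]],
    index=["abba", "ac/dc"],
)

buffer = io.StringIO()                    # in-memory file instead of top10.csv
top.to_csv(buffer, index_label="Index")   # same call shape as the notebook
buffer.seek(0)                            # rewind before reading

# index_col="Index" turns the saved song names back into the row index.
restored = pd.read_csv(buffer, index_col="Index")
print(restored.loc["ac/dc"].tolist())
```

Column labels come back as strings ("0", "1", ...), since CSV headers are text.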
Conclusion
To conclude, we have created a model that uses Last.fm data to recommend songs a user is likely to want to hear next.
As a next step, this model could be exposed through an API and used in a website or web app to recommend songs to users.
**GitHub link:** https://github.com/kartikjagdale/Last.fm-Song-Recommender