This IPython notebook combines the Top-N recommended items from different recommender methodologies (here, one list each from collaborative filtering, content-based, and most-popular) for a given user using interleaved ranking, in order to obtain a final recommended list.
A simple approach to combining recommendations from different sources is to add or multiply the scores that each item gets for a given user under each algorithm, but this might not change the recommendations much if the scores are dissimilar (e.g. on different scales) or if they come in the form of a ranking. Interleaved ranking – originally an algorithm for mixing search engine results – offers a method to force the final recommended list to be more “mixed” by making it contain elements from each list.
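As a rough illustration of why summing scores may barely change a ranking, here is a minimal sketch with made-up item ids and scores (none of these values come from the dataset). One score is assumed to be on a rating-like scale and the other on a much narrower, centered scale:

```python
# Hypothetical scores for one user; item ids and values are made up.
score_cf = {"a": 4.9, "b": 3.8, "c": 3.5, "d": 3.1}   # e.g. predicted ratings
score_cb = {"a": 0.2, "b": 0.1, "c": 0.3, "d": 0.4}   # e.g. centered ratings

def rank_by(scores):
    """Return item ids sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Naive combination: add the two scores item by item.
summed = {item: score_cf[item] + score_cb[item] for item in score_cf}

cf_ranking = rank_by(score_cf)
cb_ranking = rank_by(score_cb)
summed_ranking = rank_by(summed)

print(cf_ranking)
print(cb_ranking)
print(summed_ranking)
```

In this toy case the summed ranking is identical to the first ranking, even though the second algorithm prefers a completely different item, because the larger-scale scores dominate the sum.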
There are different algorithms for making an interleaved ranked list – here I’ll use the simplest algorithm, also known as the soccer team selection, which intuitively is as follows: the recommended lists take turns contributing items to the final list, each adding its top-ranked item but skipping items that were already put in the final list by another recommended list.
Here I’ll produce three different recommended lists of 20 items each using the MovieLens 1M dataset for the user numbered $100$ (userId = 100) as follows:
2. Producing a Most-Popular recommended list
3. Producing a Collaborative Filtering recommended list
4. Producing a Content-Based recommended list
5. Examining the recommendations
6. Combining recommended lists
Initializing Spark locally (will be used for most computations) and loading the necessary libraries:
import numpy as np, pandas as pd, re, findspark
from collections import defaultdict
from sklearn.decomposition import PCA
from scipy.sparse import csc_matrix
findspark.init("/home/david/Downloads/spark-2.1.1-bin-hadoop2.7/")
import pyspark
sc = pyspark.SparkContext()
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
from pyspark.mllib.regression import (LabeledPoint, RidgeRegressionWithSGD)
from pyspark.ml.regression import LinearRegression
from pyspark.ml.recommendation import ALS
Loading the MovieLens-1M ratings:
ratings=pd.read_table("/home/david/movielens/ml-1m/ml-1m/ratings.dat", sep="::", names=["userId","movieId","Rating","Timestamp"], engine='python')
ratings.head()
| | userId | movieId | Rating | Timestamp |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |
Loading the movie titles encoding - will be used later to examine recommended lists:
movie_titles=pd.read_csv('/home/david/movielens/ml-1m/ml-1m/movies.dat', sep="::", names=['movieId','MovieTitle','genres'],engine='python')
movie_titles={i.movieId:i.MovieTitle for i in movie_titles.itertuples()}
user=100
movies_watched_by_user=set(list(ratings.movieId.loc[ratings.userId==user]))
avg_ratings=ratings.groupby('movieId')['Rating'].mean().to_frame().rename(columns={'Rating':'AvgRating'})
num_ratings=ratings.groupby('movieId')['Rating'].count().to_frame().rename(columns={'Rating':'NumRatings'})
pop_rec=num_ratings.join(avg_ratings)
# keep only movies the user has not rated yet (the filtered frame has to be assigned back)
pop_rec=pop_rec.loc[~pop_rec.index.isin(movies_watched_by_user)]
pop_rec['score']=pop_rec.NumRatings*pop_rec.AvgRating
pop_rec=pop_rec.sort_values('score',ascending=False)
pop20=list(pop_rec.index[:20])
pop_rec['Title']=pop_rec.index.map(lambda x: movie_titles[x])
pop_rec.head()
| movieId | NumRatings | AvgRating | score | Title |
|---|---|---|---|---|
| 2858 | 3428 | 4.317386 | 14800.0 | American Beauty (1999) |
| 260 | 2991 | 4.453694 | 13321.0 | Star Wars: Episode IV - A New Hope (1977) |
| 1196 | 2990 | 4.292977 | 12836.0 | Star Wars: Episode V - The Empire Strikes Back... |
| 1210 | 2883 | 4.022893 | 11598.0 | Star Wars: Episode VI - Return of the Jedi (1983) |
| 2028 | 2653 | 4.337354 | 11507.0 | Saving Private Ryan (1998) |
Here I'm using ALS from PySpark to factorize the ratings matrix:
ratings_df=sqlContext.createDataFrame(ratings)
cfmodel=ALS(rank=50, regParam=0.5, userCol="userId", itemCol="movieId", ratingCol="Rating").fit(ratings_df)
movies_available=set(list(ratings.movieId))
movies_available=movies_available.difference(movies_watched_by_user)
preds=pd.DataFrame([(user,m) for m in movies_available],columns=['userId','movieId'])
preds_df=sqlContext.createDataFrame(preds)
preds_scores=cfmodel.transform(preds_df).collect()
preds_scores=pd.DataFrame(preds_scores, columns=['userId','movieId','score_cf'])
preds_scores=preds_scores.sort_values('score_cf',ascending=False)
cf20=list(preds_scores.movieId.iloc[:20])
preds_scores['Title']=preds_scores.movieId.map(lambda x: movie_titles[x])
preds_scores.head()
| | userId | movieId | score_cf | Title |
|---|---|---|---|---|
| 1405 | 100 | 3382 | 4.950840 | Song of Freedom (1936) |
| 3333 | 100 | 557 | 3.812159 | Mamma Roma (1962) |
| 1244 | 100 | 989 | 3.618343 | Schlafes Bruder (Brother of Sleep) (1995) |
| 512 | 100 | 578 | 3.510315 | Hour of the Pig, The (1993) |
| 2633 | 100 | 3233 | 3.498407 | Smashing Time (1967) |
The overall idea is to combine user and movie side information. On the user side, I take demographic info including their geographical region, which I get from their zip codes by using some free zip code databases. On the movie side, I take the movie tags from the latest MovieLens releases, matching them by title to the MovieLens-1M ratings, and add the movie genres and the release year as a discretized category.
Then, a regression is performed on the centered rating against the outer product of the user and movie features - a more detailed and explained version can be found here.
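To make the "outer product of the user and movie features" concrete, here is a small pure-Python sketch. The feature values and coefficients are invented for illustration, and the helper name `outer_product_features` is hypothetical; for 1-D inputs it produces the same ordering as `numpy.kron(user_feats, movie_feats)`:

```python
def outer_product_features(user_feats, movie_feats):
    """Flattened outer product: one feature per (user feature, movie feature) pair."""
    return [u * m for u in user_feats for m in movie_feats]

# Hypothetical toy features, not taken from the dataset.
user_feats = [1.0, 0.0, 1.0]   # e.g. gender flag and two age-group dummies
movie_feats = [1.0, 0.5]       # e.g. one genre dummy and one tag component

x = outer_product_features(user_feats, movie_feats)
print(x)

# A linear model on x has one coefficient per feature pair (values made up).
coeffs = [0.2, -0.1, 0.0, 0.0, 0.4, 0.3]
pred_centered = sum(c * v for c, v in zip(coeffs, x))
print(pred_centered)
```

Since the regression target is the centered rating, the prediction is an offset from the user's average rating; for ranking the movies of a single user that constant offset does not matter.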
movies=pd.read_csv('/home/david/movielens/ml-latest/ml-latest/movies.csv')
movies_humanreadable=movies.copy()
movies['hasYear']=movies.title.map(lambda x: bool(re.search(r"\s\((\d{4})\)$",x.strip())))
movies['Year']='unknown'
# assign through a single .loc call to avoid writing to a temporary copy
movies.loc[movies.hasYear,'Year']=movies.title.loc[movies.hasYear].map(lambda x: re.search(r"\s\((\d{4})\)$",x.strip()).group(1))
del movies['hasYear']
movies['genres']=movies.genres.map(lambda x: set(x.split('|')))
present_genres=set()
for movie in movies.itertuples():
    present_genres=present_genres.union(movie.genres)
for genre in present_genres:
    movies['genre'+genre]=movies.genres.map(lambda x: 1.0*(genre in x))
tags=pd.read_csv('/home/david/movielens/ml-latest/ml-latest/genome-scores.csv')
tags_wide=tags.pivot(index='movieId', columns='tagId', values='relevance')
tags_wide=tags_wide.fillna(0)
pca=PCA(svd_solver='full')
pca.fit(tags_wide)
tags_pca=pd.DataFrame(pca.transform(tags_wide)[:,:50])
tags_pca.columns=["pc"+str(x) for x in tags_pca.columns.values]
tags_pca['movieId']=tags_wide.index
movies=pd.merge(movies,tags_pca,how='inner',on='movieId')
def discretize_year(x):
    if x=='unknown':
        return x
    else:
        x=int(x)
        if x>=2000:
            return '>=2000'
        if x>=1995 and x<=1999:
            return str(x)
        if x>=1990 and x<=1994:
            return 'low90s'
        if x>=1980 and x<=1989:
            return '80s'
        if x>=1970 and x<=1979:
            return '70s'
        if x>=1960 and x<=1969:
            return '60s'
        if x>=1950 and x<=1959:
            return '50s'
        if x>=1940 and x<=1949:
            return '40s'
        if x<1940:
            return '<1940'
        else:
            return 'unknown'
movies_features=movies.copy()
del movies_features['title']
del movies_features['genres']
del movies_features['genre(no genres listed)']
movies_features['Year']=movies_features.Year.map(lambda x: discretize_year(x))
movies_features=pd.get_dummies(movies_features, columns=['Year'])
movies_features.set_index('movieId',inplace=True)
zipcode_abbs=pd.read_csv("/home/david/movielens/zips/states.csv")
zipcode_abbs_dct={z.State:z.Abbreviation for z in zipcode_abbs.itertuples()}
us_regs_table=[
('New England', 'Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont'),
('Middle Atlantic', 'Delaware, Maryland, New Jersey, New York, Pennsylvania'),
('South', 'Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, Missouri, North Carolina, South Carolina, Tennessee, Virginia, West Virginia'),
('Midwest', 'Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Nebraska, North Dakota, Ohio, South Dakota, Wisconsin'),
('Southwest', 'Arizona, New Mexico, Oklahoma, Texas'),
('West', 'Alaska, California, Colorado, Hawaii, Idaho, Montana, Nevada, Oregon, Utah, Washington, Wyoming')
]
us_regs_table=[(x[0],[i.strip() for i in x[1].split(",")]) for x in us_regs_table]
us_regs_dct=dict()
for r in us_regs_table:
    for s in r[1]:
        us_regs_dct[zipcode_abbs_dct[s]]=r[0]
zipcode_info=pd.read_csv("/home/david/movielens/free-zipcode-database.csv")
zipcode_info=zipcode_info.groupby('Zipcode').first().reset_index()
# assign through a single .loc call to avoid writing to a temporary copy
zipcode_info.loc[zipcode_info.Country!="US",'State']='UnknownOrNonUS'
zipcode_info['Region']=zipcode_info['State'].copy()
zipcode_info.loc[zipcode_info.Country=="US",'Region']=zipcode_info.Region.loc[zipcode_info.Country=="US"].map(lambda x: us_regs_dct[x] if x in us_regs_dct else 'UsOther')
zipcode_info=zipcode_info[['Zipcode', 'Region']]
users=pd.read_table("/home/david/movielens/ml-1m/ml-1m/users.dat",sep='::',names=["userId","Gender","Age","Occupation","Zipcode"], engine='python')
users["Zipcode"]=users.Zipcode.map(lambda x: int(re.sub("-.*","",x)))
users=pd.merge(users,zipcode_info,on='Zipcode',how='left')
users['Region']=users.Region.fillna('UnknownOrNonUS')
users_features=users.copy()
users_features['Gender']=users_features.Gender.map(lambda x: 1.0*(x=='M'))
del users_features['Zipcode']
users_features['Age']=users_features.Age.map(lambda x: str(x))
users_features['Occupation']=users_features.Occupation.map(lambda x: str(x))
users_features=pd.get_dummies(users_features, columns=['Age', 'Occupation', 'Region'])
users_features.set_index('userId',inplace=True)
movies_w_sideinfo=set(list(movies.movieId))
ratings=ratings.loc[ratings.movieId.map(lambda x: x in movies_w_sideinfo)]
avg_rating_by_user=ratings.groupby('userId')['Rating'].mean().to_frame().rename(columns={'Rating':'AvgRating'})
ratings_train=pd.merge(ratings, avg_rating_by_user, left_on='userId',right_index=True)
ratings_train['RatingCentered']=ratings_train.Rating-ratings_train.AvgRating
def generate_features(user,movie,users_features_bc,movies_features_bc):
    # look up the broadcasted feature rows and take their flattened outer product
    user_feats=users_features_bc.value.loc[user].values
    movie_feats=movies_features_bc.value.loc[movie].values
    return csc_matrix(np.kron(user_feats,movie_feats).reshape(-1,1))
users_features_bc=sc.broadcast(users_features)
movies_features_bc=sc.broadcast(movies_features)
trainset=sc.parallelize([(i.userId,i.movieId,i.RatingCentered) for i in ratings_train.itertuples()])\
    .map(lambda x: LabeledPoint(x[2],generate_features(x[0],x[1],users_features_bc,movies_features_bc)))\
    .map(lambda x: (float(x.label),x.features.asML())).toDF(['label','features'])
# repartition returns a new DataFrame, so the result has to be assigned
trainset=trainset.repartition(50)
recommender=LinearRegression(regParam=1e-4).fit(trainset)
formula_coeffs=recommender.coefficients.toArray()
def generate_features_series(user,movie):
    user_feats=users_features.loc[user].values
    movie_feats=movies_features.loc[movie].values
    return pd.Series(np.kron(user_feats,movie_feats).astype('float64'))
preds_scores=preds_scores.loc[preds_scores.movieId.map(lambda x: x in movies_w_sideinfo)]
X_predict=preds_scores.movieId.apply(lambda x: generate_features_series(user,x))
preds_scores['score_cb']=X_predict.dot(formula_coeffs)
preds_scores=preds_scores.sort_values('score_cb',ascending=False)
cb20=list(preds_scores.movieId.iloc[:20])
preds_scores.head()
| | userId | movieId | score_cf | Title | score_cb |
|---|---|---|---|---|---|
| 3191 | 100 | 1262 | 3.136641 | Great Escape, The (1963) | 1.030581 |
| 835 | 100 | 3030 | 3.163622 | Yojimbo (1961) | 1.015274 |
| 398 | 100 | 908 | 3.137593 | North by Northwest (1959) | 1.012968 |
| 354 | 100 | 3435 | 3.167782 | Double Indemnity (1944) | 1.003191 |
| 473 | 100 | 3196 | 3.033045 | Stalag 17 (1953) | 0.998952 |
Now taking a look at what each of these lists actually recommends: their recommendations are very different with little intersection and, as expected, collaborative filtering tends to favor less popular items for this user. First, the Most-Popular recommended list:
def print_reclist(reclist):
    # one line per item: rank, title, average rating and number of ratings
    list_w_info=[str(m+1)+") - "+movie_titles[reclist[m]]+\
        " - Average Rating: "+str(np.round(avg_ratings.loc[reclist[m]].iloc[0],2))+\
        " - Number of ratings: "+str(num_ratings.loc[reclist[m]].iloc[0]) for m in range(len(reclist))]
    print("\n".join(list_w_info))
print_reclist(pop20)
1) - American Beauty (1999) - Average Rating: 4.32 - Number of ratings: 3428
2) - Star Wars: Episode IV - A New Hope (1977) - Average Rating: 4.45 - Number of ratings: 2991
3) - Star Wars: Episode V - The Empire Strikes Back (1980) - Average Rating: 4.29 - Number of ratings: 2990
4) - Star Wars: Episode VI - Return of the Jedi (1983) - Average Rating: 4.02 - Number of ratings: 2883
5) - Saving Private Ryan (1998) - Average Rating: 4.34 - Number of ratings: 2653
6) - Raiders of the Lost Ark (1981) - Average Rating: 4.48 - Number of ratings: 2514
7) - Silence of the Lambs, The (1991) - Average Rating: 4.35 - Number of ratings: 2578
8) - Matrix, The (1999) - Average Rating: 4.32 - Number of ratings: 2590
9) - Sixth Sense, The (1999) - Average Rating: 4.41 - Number of ratings: 2459
10) - Terminator 2: Judgment Day (1991) - Average Rating: 4.06 - Number of ratings: 2649
11) - Fargo (1996) - Average Rating: 4.25 - Number of ratings: 2513
12) - Schindler's List (1993) - Average Rating: 4.51 - Number of ratings: 2304
13) - Braveheart (1995) - Average Rating: 4.23 - Number of ratings: 2443
14) - Back to the Future (1985) - Average Rating: 3.99 - Number of ratings: 2583
15) - Shawshank Redemption, The (1994) - Average Rating: 4.55 - Number of ratings: 2227
16) - Godfather, The (1972) - Average Rating: 4.52 - Number of ratings: 2223
17) - Jurassic Park (1993) - Average Rating: 3.76 - Number of ratings: 2672
18) - Princess Bride, The (1987) - Average Rating: 4.3 - Number of ratings: 2318
19) - Shakespeare in Love (1998) - Average Rating: 4.13 - Number of ratings: 2369
20) - L.A. Confidential (1997) - Average Rating: 4.22 - Number of ratings: 2288
Collaborative filtering recommended list:
print_reclist(cf20)
1) - Song of Freedom (1936) - Average Rating: 5.0 - Number of ratings: 1
2) - Mamma Roma (1962) - Average Rating: 4.5 - Number of ratings: 2
3) - Schlafes Bruder (Brother of Sleep) (1995) - Average Rating: 5.0 - Number of ratings: 1
4) - Hour of the Pig, The (1993) - Average Rating: 4.5 - Number of ratings: 2
5) - Smashing Time (1967) - Average Rating: 5.0 - Number of ratings: 2
6) - Gate of Heavenly Peace, The (1995) - Average Rating: 5.0 - Number of ratings: 3
7) - Apple, The (Sib) (1998) - Average Rating: 4.67 - Number of ratings: 9
8) - Ulysses (Ulisse) (1954) - Average Rating: 5.0 - Number of ratings: 1
9) - Follow the Bitch (1998) - Average Rating: 5.0 - Number of ratings: 1
10) - I Am Cuba (Soy Cuba/Ya Kuba) (1964) - Average Rating: 4.8 - Number of ratings: 5
11) - One Little Indian (1973) - Average Rating: 5.0 - Number of ratings: 1
12) - Lamerica (1994) - Average Rating: 4.75 - Number of ratings: 8
13) - Foreign Student (1994) - Average Rating: 3.0 - Number of ratings: 2
14) - Sanjuro (1962) - Average Rating: 4.61 - Number of ratings: 69
15) - Lured (1947) - Average Rating: 5.0 - Number of ratings: 1
16) - Bells, The (1926) - Average Rating: 4.5 - Number of ratings: 2
17) - Bittersweet Motel (2000) - Average Rating: 5.0 - Number of ratings: 1
18) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.56 - Number of ratings: 628
19) - Jar, The (Khomreh) (1992) - Average Rating: 4.0 - Number of ratings: 1
20) - For All Mankind (1989) - Average Rating: 4.44 - Number of ratings: 27
Content-based recommended list:
print_reclist(cb20)
1) - Great Escape, The (1963) - Average Rating: 4.38 - Number of ratings: 696
2) - Yojimbo (1961) - Average Rating: 4.4 - Number of ratings: 215
3) - North by Northwest (1959) - Average Rating: 4.38 - Number of ratings: 1315
4) - Double Indemnity (1944) - Average Rating: 4.42 - Number of ratings: 551
5) - Stalag 17 (1953) - Average Rating: 4.23 - Number of ratings: 394
6) - It Happened One Night (1934) - Average Rating: 4.28 - Number of ratings: 374
7) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.56 - Number of ratings: 628
8) - Gladiator (2000) - Average Rating: 4.11 - Number of ratings: 1924
9) - Casablanca (1942) - Average Rating: 4.41 - Number of ratings: 1669
10) - Third Man, The (1949) - Average Rating: 4.45 - Number of ratings: 480
11) - Maltese Falcon, The (1941) - Average Rating: 4.4 - Number of ratings: 1043
12) - To Kill a Mockingbird (1962) - Average Rating: 4.43 - Number of ratings: 928
13) - Treasure of the Sierra Madre, The (1948) - Average Rating: 4.29 - Number of ratings: 453
14) - Everest (1998) - Average Rating: 4.01 - Number of ratings: 167
15) - Wrong Trousers, The (1993) - Average Rating: 4.51 - Number of ratings: 882
16) - In the Heat of the Night (1967) - Average Rating: 4.13 - Number of ratings: 348
17) - Terminator 2: Judgment Day (1991) - Average Rating: 4.06 - Number of ratings: 2649
18) - Modern Times (1936) - Average Rating: 4.24 - Number of ratings: 305
19) - City Lights (1931) - Average Rating: 4.39 - Number of ratings: 271
20) - Terminator, The (1984) - Average Rating: 4.15 - Number of ratings: 2098
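The "little intersection" between the three lists printed above can also be checked programmatically. The following is a minimal sketch; the helper name `pairwise_overlap` and the toy ids are made up, and in the notebook the same function could be applied to `cf20`, `cb20` and `pop20`:

```python
from itertools import combinations

def pairwise_overlap(named_lists):
    """Map each pair of list names to the set of items that both lists contain."""
    return {
        (name_a, name_b): set(list_a) & set(list_b)
        for (name_a, list_a), (name_b, list_b) in combinations(named_lists, 2)
    }

# Toy stand-ins for the three recommended lists (ids are made up).
toy_lists = [
    ("cf", [1, 2, 3, 4]),
    ("cb", [5, 3, 6, 7]),
    ("pop", [8, 7, 9, 10]),
]

overlap = pairwise_overlap(toy_lists)
for pair in sorted(overlap):
    print(pair, sorted(overlap[pair]))
```

In the printed lists above, for example, Terminator 2: Judgment Day appears in both the most-popular and content-based lists, and Seven Samurai in both the collaborative filtering and content-based lists.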
Finally, combining these three lists through interleaved ranking, prioritizing them in this order: CF-CB-MP:
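def interleaved_ranking(lst_of_lists,n):
    final_list=list()
    while len(final_list)<n:
        added=False
        for lst in lst_of_lists:
            # candidates from this list that are not yet in the final list
            lst=[m for m in lst if m not in final_list]
            if len(lst)==0:
                continue
            new=lst[0]
            final_list.append(new)
            added=True
            if len(final_list)==n:
                break
        # stop if every list is exhausted, otherwise the loop would never end
        if not added:
            break
    return final_list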
mixed_list=interleaved_ranking([cf20,cb20,pop20],20)
print_reclist(mixed_list)
1) - Song of Freedom (1936) - Average Rating: 5.0 - Number of ratings: 1
2) - Great Escape, The (1963) - Average Rating: 4.38 - Number of ratings: 696
3) - American Beauty (1999) - Average Rating: 4.32 - Number of ratings: 3428
4) - Mamma Roma (1962) - Average Rating: 4.5 - Number of ratings: 2
5) - Yojimbo (1961) - Average Rating: 4.4 - Number of ratings: 215
6) - Star Wars: Episode IV - A New Hope (1977) - Average Rating: 4.45 - Number of ratings: 2991
7) - Schlafes Bruder (Brother of Sleep) (1995) - Average Rating: 5.0 - Number of ratings: 1
8) - North by Northwest (1959) - Average Rating: 4.38 - Number of ratings: 1315
9) - Star Wars: Episode V - The Empire Strikes Back (1980) - Average Rating: 4.29 - Number of ratings: 2990
10) - Hour of the Pig, The (1993) - Average Rating: 4.5 - Number of ratings: 2
11) - Double Indemnity (1944) - Average Rating: 4.42 - Number of ratings: 551
12) - Star Wars: Episode VI - Return of the Jedi (1983) - Average Rating: 4.02 - Number of ratings: 2883
13) - Smashing Time (1967) - Average Rating: 5.0 - Number of ratings: 2
14) - Stalag 17 (1953) - Average Rating: 4.23 - Number of ratings: 394
15) - Saving Private Ryan (1998) - Average Rating: 4.34 - Number of ratings: 2653
16) - Gate of Heavenly Peace, The (1995) - Average Rating: 5.0 - Number of ratings: 3
17) - It Happened One Night (1934) - Average Rating: 4.28 - Number of ratings: 374
18) - Raiders of the Lost Ark (1981) - Average Rating: 4.48 - Number of ratings: 2514
19) - Apple, The (Sib) (1998) - Average Rating: 4.67 - Number of ratings: 9
20) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.56 - Number of ratings: 628
The list now seems a lot more diverse, which is probably a good thing. Such a list can be further diversified following other heuristics as illustrated in this other IPython notebook, and the lists can be further rotated in time (e.g. starting from rank 10 instead of rank 1) to offer more items to the user.
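One possible way to rotate a list in time is sketched below. The helper name `rotate` is hypothetical, and wrapping around to the start of the list once the end is reached is an assumption of this sketch rather than something prescribed by the method:

```python
def rotate(lst, start):
    """Return a copy of lst that begins at position `start` and wraps around."""
    if not lst:
        return []
    start = start % len(lst)
    return lst[start:] + lst[:start]

# Toy stand-in for one recommended list (ids are made up).
toy_list = [101, 102, 103, 104, 105]
rotated = rotate(toy_list, 2)
print(rotated)
```

With the lists from this notebook, something like `interleaved_ranking([rotate(cf20, 9), rotate(cb20, 9), rotate(pop20, 9)], 20)` would start each list from its tenth-ranked item.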
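When there is a large degree of intersection between the items from each list, changing the order in which they are prioritized will change not only the relative orderings in the final list, but also the items that end up appearing. That effect is small here, as there is little intersection between the lists; the only item that changes in the output below is a consequence of the cutoff at 20, since the lists prioritized first get to contribute seven items and the last one only six: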
mixed_list=interleaved_ranking([pop20,cf20,cb20],20)
print_reclist(mixed_list)
1) - American Beauty (1999) - Average Rating: 4.32 - Number of ratings: 3428
2) - Song of Freedom (1936) - Average Rating: 5.0 - Number of ratings: 1
3) - Great Escape, The (1963) - Average Rating: 4.38 - Number of ratings: 696
4) - Star Wars: Episode IV - A New Hope (1977) - Average Rating: 4.45 - Number of ratings: 2991
5) - Mamma Roma (1962) - Average Rating: 4.5 - Number of ratings: 2
6) - Yojimbo (1961) - Average Rating: 4.4 - Number of ratings: 215
7) - Star Wars: Episode V - The Empire Strikes Back (1980) - Average Rating: 4.29 - Number of ratings: 2990
8) - Schlafes Bruder (Brother of Sleep) (1995) - Average Rating: 5.0 - Number of ratings: 1
9) - North by Northwest (1959) - Average Rating: 4.38 - Number of ratings: 1315
10) - Star Wars: Episode VI - Return of the Jedi (1983) - Average Rating: 4.02 - Number of ratings: 2883
11) - Hour of the Pig, The (1993) - Average Rating: 4.5 - Number of ratings: 2
12) - Double Indemnity (1944) - Average Rating: 4.42 - Number of ratings: 551
13) - Saving Private Ryan (1998) - Average Rating: 4.34 - Number of ratings: 2653
14) - Smashing Time (1967) - Average Rating: 5.0 - Number of ratings: 2
15) - Stalag 17 (1953) - Average Rating: 4.23 - Number of ratings: 394
16) - Raiders of the Lost Ark (1981) - Average Rating: 4.48 - Number of ratings: 2514
17) - Gate of Heavenly Peace, The (1995) - Average Rating: 5.0 - Number of ratings: 3
18) - It Happened One Night (1934) - Average Rating: 4.28 - Number of ratings: 374
19) - Silence of the Lambs, The (1991) - Average Rating: 4.35 - Number of ratings: 2578
20) - Apple, The (Sib) (1998) - Average Rating: 4.67 - Number of ratings: 9
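To see how the prioritization order can change which items appear when the lists do overlap, here is a toy sketch. The function `interleave` is a self-contained stand-in written for this example, and the item names are made up:

```python
def interleave(lists, n):
    """Round-robin interleaving that skips items already picked."""
    picked = []
    while len(picked) < n:
        added = False
        for lst in lists:
            remaining = [m for m in lst if m not in picked]
            if not remaining:
                continue
            picked.append(remaining[0])
            added = True
            if len(picked) == n:
                break
        if not added:  # every list is exhausted
            break
    return picked

# Two toy lists that share their top item.
list_a = ["x", "y", "z"]
list_b = ["x", "w", "v"]

a_first = interleave([list_a, list_b], 2)
b_first = interleave([list_b, list_a], 2)
print(a_first)
print(b_first)
```

Whichever list goes first claims the shared item, so the other list has to fall back to its second choice, and the two final lists end up containing different items.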
Interleaved ranking can also be used as a heuristic to move away from most-popular recommendations, by following the same algorithm as before but removing the items that came from most-popular (and this can be expanded by letting most-popular choose more than one item at once and other heuristics) - here is a simple version to force the list to contain less popular items (in this particular case it's the same as just excluding the most-popular list as there is pretty much no intersection):
def interleaved_ranking_decreased_popularity(most_popular,lst_of_lists,n):
    final_list=list()
    excl_list=set()
    while len(final_list)<n:
        # most-popular "picks" its top remaining item, which is then excluded
        most_popular=[m for m in most_popular if m not in excl_list]
        if len(most_popular)>0:
            excl_list.add(most_popular[0])
        added=False
        for lst in lst_of_lists:
            lst=[m for m in lst if m not in excl_list]
            if len(lst)==0:
                continue
            new=lst[0]
            final_list.append(new)
            excl_list.add(new)
            added=True
            if len(final_list)==n:
                break
        # stop once no list can contribute anything else
        if not added:
            break
    return final_list
mixed_list_dec_pop=interleaved_ranking_decreased_popularity(pop20,[cf20,cb20],10)
print_reclist(mixed_list_dec_pop)
1) - Song of Freedom (1936) - Average Rating: 5.0 - Number of ratings: 1
2) - Great Escape, The (1963) - Average Rating: 4.38 - Number of ratings: 696
3) - Mamma Roma (1962) - Average Rating: 4.5 - Number of ratings: 2
4) - Yojimbo (1961) - Average Rating: 4.4 - Number of ratings: 215
5) - Schlafes Bruder (Brother of Sleep) (1995) - Average Rating: 5.0 - Number of ratings: 1
6) - North by Northwest (1959) - Average Rating: 4.38 - Number of ratings: 1315
7) - Hour of the Pig, The (1993) - Average Rating: 4.5 - Number of ratings: 2
8) - Double Indemnity (1944) - Average Rating: 4.42 - Number of ratings: 551
9) - Smashing Time (1967) - Average Rating: 5.0 - Number of ratings: 2
10) - Stalag 17 (1953) - Average Rating: 4.23 - Number of ratings: 394
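The extension mentioned earlier, letting most-popular choose more than one item per round, could look roughly like the sketch below. The function name `interleave_excluding_popular` and the parameter `k_pop` are hypothetical, and the toy item names are made up:

```python
def interleave_excluding_popular(most_popular, lists, n, k_pop=1):
    """Interleave `lists`, discarding up to k_pop most-popular items per round."""
    final, excluded = [], set()
    pop_queue = list(most_popular)
    while len(final) < n:
        # Most-popular "picks" up to k_pop items; they are excluded, not recommended.
        pop_queue = [m for m in pop_queue if m not in excluded]
        excluded.update(pop_queue[:k_pop])
        added = False
        for lst in lists:
            remaining = [m for m in lst if m not in excluded]
            if not remaining:
                continue
            final.append(remaining[0])
            excluded.add(remaining[0])
            added = True
            if len(final) == n:
                break
        if not added:  # no list can contribute anything else
            break
    return final

# Toy lists in which some popular items (p1..p4) also appear elsewhere.
popular = ["p1", "p2", "p3", "p4"]
list_a = ["p2", "a1", "p4", "a2"]
list_b = ["p1", "b1", "b2"]

one_per_round = interleave_excluding_popular(popular, [list_a, list_b], 4, k_pop=1)
two_per_round = interleave_excluding_popular(popular, [list_a, list_b], 4, k_pop=2)
print(one_per_round)
print(two_per_round)
```

With `k_pop=1` a popular item still slips into the result in this toy case, while `k_pop=2` removes popular items faster than the other lists can pick them.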