Combining recommended lists

This IPython notebook combines the Top-N recommended items produced by different recommender methodologies (here one list each from collaborative filtering, content-based, and most-popular) for a given user, using interleaved ranking to obtain a final recommended list.

A simple approach to combining recommendations from different sources is to add or multiply the scores that each item gets for a given user under each algorithm, but this might not change the recommendations much if the scores are on dissimilar scales or if they come in the form of a ranking. Interleaved ranking, originally an algorithm for mixing search engine results, offers a way to force the final recommended list to be more “mixed” by making it contain elements from each list.
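To see why adding raw scores may barely change a ranking, here is a minimal sketch with made-up numbers: the item names and both score dictionaries are hypothetical, chosen only so that one source scores in the thousands and the other on a 1 to 5 scale.

```python
# Hypothetical scores for four items under two algorithms with very different scales.
popularity_score = {"A": 14800.0, "B": 13321.0, "C": 120.0, "D": 45.0}
predicted_rating = {"A": 3.1, "B": 2.9, "C": 4.9, "D": 4.7}

def rank(scores):
    """Return item ids sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Summing the raw scores: the large-scale source decides the order on its own.
summed = {item: popularity_score[item] + predicted_rating[item]
          for item in popularity_score}

print(rank(popularity_score))   # order according to the first source
print(rank(predicted_rating))   # order according to the second source
print(rank(summed))             # order after adding the two scores
```

In this toy case the summed ranking is identical to the popularity ranking, so the second source has no say at all. Rescaling the scores would help, but it still does not guarantee that every source is represented in the top positions.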

There are different algorithms for building an interleaved ranked list. Here I’ll use the simplest one, also known as soccer team selection, which intuitively works as follows: the recommended lists take turns contributing items to the final list, each adding its top-ranked item while skipping items that another list has already placed in the final list.
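The turn-taking procedure can be traced by hand on a toy input. The sketch below is only an illustration under assumed inputs: the function name `take_turns` and the three short lists of letters are hypothetical and are not the lists built later in the notebook.

```python
def take_turns(lists, n):
    """Each list in turn contributes its best-ranked item not yet selected."""
    merged = []
    while len(merged) < n:
        added = False
        for lst in lists:
            # Highest-ranked item of this list that is not in the merged list yet.
            pick = next((item for item in lst if item not in merged), None)
            if pick is None:
                continue
            merged.append(pick)
            added = True
            if len(merged) == n:
                break
        if not added:  # every list is exhausted, nothing more to add
            break
    return merged

first = ["a", "b", "c"]
second = ["a", "d", "e"]
third = ["d", "f", "g"]
print(take_turns([first, second, third], 6))  # ['a', 'd', 'f', 'b', 'e', 'g']
```

In the first round the second list cannot contribute "a" because the first list already did, so it falls back to "d"; the third list then skips "d" and contributes "f".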

Here I’ll produce three different recommended lists of 20 items each using the MovieLens 1M dataset for the user numbered $100$ (userId = 100) as follows:

Most-popular: each item’s score is the sum of the ratings it gets from all users (equivalently, its number of ratings times its average rating), thus favoring movies that are both highly rated and frequently rated. This is a non-personalized list (i.e. it’s the same for all users).
Collaborative filtering: a low-rank matrix factorization of the ratings matrix using alternating least squares.
Content-based: regression of the (centered) ratings against the outer product of user and movie features – this is a more involved process and the details can be found in this other IPython notebook.
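As a quick illustration of the most-popular score, the sketch below uses a handful of invented `(userId, movieId, rating)` triples (not MovieLens data) to show that the sum of ratings equals the number of ratings times the average rating, and that a movie with a single perfect rating still ranks last.

```python
from collections import defaultdict

# Hypothetical (userId, movieId, rating) triples, for illustration only.
toy_ratings = [(1, 10, 5), (2, 10, 4), (3, 10, 5),
               (1, 20, 5),
               (2, 30, 3), (3, 30, 3)]

rating_sum = defaultdict(float)
rating_count = defaultdict(int)
for _user, movie, rating in toy_ratings:
    rating_sum[movie] += rating
    rating_count[movie] += 1

# Number of ratings times average rating is just the sum of the ratings.
popularity = {m: rating_count[m] * (rating_sum[m] / rating_count[m])
              for m in rating_sum}
most_popular = sorted(popularity, key=popularity.get, reverse=True)
print(most_popular)  # movie 20 has the best average but only one rating
```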

Sections

1. Loading the data

2. Producing a Most-Popular recommended list

3. Producing a Collaborative Filtering recommended list

4. Producing a Content-Based recommended list

5. Examining the recommendations

6. Combining recommended lists

1. Loading the data

Initializing Spark locally (it will be used for most computations) and loading the necessary libraries:

In [1]:

import numpy as np, pandas as pd, re, findspark
from collections import defaultdict
from sklearn.decomposition import PCA
from scipy.sparse import csc_matrix

findspark.init("/home/david/Downloads/spark-2.1.1-bin-hadoop2.7/")

import pyspark
sc = pyspark.SparkContext()
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

from pyspark.mllib.regression import LabeledPoint
from pyspark.ml.regression import LinearRegression
from pyspark.ml.recommendation import ALS

Loading the MovieLens-1M ratings:

In [2]:

ratings=pd.read_table("/home/david/movielens/ml-1m/ml-1m/ratings.dat", sep="::", names=["userId","movieId","Rating","Timestamp"], engine='python')
ratings.head()

Out[2]:

	userId	movieId	Rating	Timestamp
0	1	1193	5	978300760
1	1	661	3	978302109
2	1	914	3	978301968
3	1	3408	4	978300275
4	1	2355	5	978824291

Loading the movie titles, which will be used later to examine the recommended lists:

In [3]:

movie_titles=pd.read_csv('/home/david/movielens/ml-1m/ml-1m/movies.dat', sep="::", names=['movieId','MovieTitle','genres'],engine='python')
movie_titles={i.movieId:i.MovieTitle for i in movie_titles.itertuples()}

2. Producing a Most-Popular recommended list

Items are ranked by the sum of their ratings:

In [4]:

user=100
movies_watched_by_user=set(list(ratings.movieId.loc[ratings.userId==user]))

avg_ratings=ratings.groupby('movieId')['Rating'].mean().to_frame().rename(columns={'Rating':'AvgRating'})
num_ratings=ratings.groupby('movieId')['Rating'].count().to_frame().rename(columns={'Rating':'NumRatings'})
pop_rec=num_ratings.join(avg_ratings)
# keep only movies that the user has not rated yet
pop_rec=pop_rec.loc[~pop_rec.index.isin(movies_watched_by_user)]
pop_rec['score']=pop_rec.NumRatings*pop_rec.AvgRating
pop_rec=pop_rec.sort_values('score',ascending=False)
pop20=list(pop_rec.index[:20])
pop_rec['Title']=pop_rec.index.map(lambda x: movie_titles[x])
pop_rec.head()

Out[4]:

	NumRatings	AvgRating	score	Title
movieId
2858	3428	4.317386	14800.0	American Beauty (1999)
260	2991	4.453694	13321.0	Star Wars: Episode IV - A New Hope (1977)
1196	2990	4.292977	12836.0	Star Wars: Episode V - The Empire Strikes Back...
1210	2883	4.022893	11598.0	Star Wars: Episode VI - Return of the Jedi (1983)
2028	2653	4.337354	11507.0	Saving Private Ryan (1998)

3. Producing a Collaborative Filtering recommended list

Here I'm using ALS from PySpark to factorize the ratings matrix:

In [5]:

ratings_df=sqlContext.createDataFrame(ratings)

cfmodel=ALS(rank=50, regParam=0.5, userCol="userId", itemCol="movieId", ratingCol="Rating").fit(ratings_df)
movies_available=set(list(ratings.movieId))
movies_available=movies_available.difference(movies_watched_by_user)
preds=pd.DataFrame([(user,m) for m in movies_available],columns=['userId','movieId'])
preds_df=sqlContext.createDataFrame(preds)
preds_scores=cfmodel.transform(preds_df).collect()
preds_scores=pd.DataFrame(preds_scores, columns=['userId','movieId','score_cf'])
preds_scores=preds_scores.sort_values('score_cf',ascending=False)
cf20=list(preds_scores.movieId.iloc[:20])
preds_scores['Title']=preds_scores.movieId.map(lambda x: movie_titles[x])
preds_scores.head()

Out[5]:

	userId	movieId	score_cf	Title
1405	100	3382	4.950840	Song of Freedom (1936)
3333	100	557	3.812159	Mamma Roma (1962)
1244	100	989	3.618343	Schlafes Bruder (Brother of Sleep) (1995)
512	100	578	3.510315	Hour of the Pig, The (1993)
2633	100	3233	3.498407	Smashing Time (1967)

4. Producing a Content-Based recommended list

The overall idea is to combine user and movie side information. The user features are demographic info plus geographical region, which I get from their zip codes by using some free zip code databases. The movie features come from the movie tags in the latest MovieLens release, matched by title to the MovieLens-1M ratings, plus the movie genres and the release year as a discretized category.

Then, a regression is performed on the centered ratings against the outer product of the user and movie features; a more detailed and better-explained version can be found here.
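To make the "outer product of user and movie features" concrete, here is a small sketch in plain Python. The feature vectors, their meanings and the coefficients are all invented for illustration. For one-dimensional inputs, the flattened outer product below has the same ordering as the `np.kron` call used in the cell that follows.

```python
# Hypothetical feature vectors (values and meanings invented for illustration).
user_feats = [1.0, 0.0, 1.0]   # e.g. gender flag, one age bracket, one region
movie_feats = [1.0, 0.5]       # e.g. one genre flag, one tag component

def outer_product_features(u, m):
    """Flattened outer product: one feature per (user feature, movie feature) pair."""
    return [ui * mj for ui in u for mj in m]

x = outer_product_features(user_feats, movie_feats)

# A linear model on x has one coefficient per pair (hypothetical values here).
coeffs = [0.2, -0.1, 0.0, 0.3, 0.1, 0.4]
predicted_centered_rating = sum(c * xi for c, xi in zip(coeffs, x))

print(x)                          # 3 user features * 2 movie features = 6 values
print(predicted_centered_rating)  # predicted deviation from the user's mean rating
```

Since the user's average rating is a constant for a given user, ranking movies by the centered prediction gives the same order as ranking them by the uncentered one.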

In [6]:

movies=pd.read_csv('/home/david/movielens/ml-latest/ml-latest/movies.csv')
movies_humanreadable=movies.copy()
# extract the release year from titles ending in "(YYYY)"
movies['hasYear']=movies.title.map(lambda x: bool(re.search(r"\s\((\d{4})\)$",x.strip())))
movies['Year']='unknown'
movies.loc[movies.hasYear,'Year']=movies.title.loc[movies.hasYear].map(lambda x: re.search(r"\s\((\d{4})\)$",x.strip()).group(1))
del movies['hasYear']

movies['genres']=movies.genres.map(lambda x: set(x.split('|')))
present_genres=set()
for movie in movies.itertuples():
    present_genres=present_genres.union(movie.genres)
for genre in present_genres:
    movies['genre'+genre]=movies.genres.map(lambda x: 1.0*(genre in x))

tags=pd.read_csv('/home/david/movielens/ml-latest/ml-latest/genome-scores.csv')
tags_wide=tags.pivot(index='movieId', columns='tagId', values='relevance')
tags_wide=tags_wide.fillna(0)
pca=PCA(svd_solver='full')
pca.fit(tags_wide)
tags_pca=pd.DataFrame(pca.transform(tags_wide)[:,:50])
tags_pca.columns=["pc"+str(x) for x in tags_pca.columns.values]
tags_pca['movieId']=tags_wide.index
movies=pd.merge(movies,tags_pca,how='inner',on='movieId')

def discretize_year(x):
    if x=='unknown':
        return x
    else:
        x=int(x)
        if x>=2000:
            return '>=2000'
        if x>=1995 and x<=1999:
            return str(x)
        if x>=1990 and x<=1994:
            return 'low90s'
        if x>=1980 and x<=1989:
            return '80s'
        if x>=1970 and x<=1979:
            return '70s'
        if x>=1960 and x<=1969:
            return '60s'
        if x>=1950 and x<=1959:
            return '50s'
        if x>=1940 and x<=1949:
            return '40s'
        if x<1940:
            return '<1940'
        else:
            return 'unknown'

movies_features=movies.copy()
del movies_features['title']
del movies_features['genres']
del movies_features['genre(no genres listed)']
movies_features['Year']=movies_features.Year.map(lambda x: discretize_year(x))
movies_features=pd.get_dummies(movies_features, columns=['Year'])
movies_features.set_index('movieId',inplace=True)

zipcode_abbs=pd.read_csv("/home/david/movielens/zips/states.csv")
zipcode_abbs_dct={z.State:z.Abbreviation for z in zipcode_abbs.itertuples()}
us_regs_table=[
    ('New England', 'Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont'),
    ('Middle Atlantic', 'Delaware, Maryland, New Jersey, New York, Pennsylvania'),
    ('South', 'Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, Missouri, North Carolina, South Carolina, Tennessee, Virginia, West Virginia'),
    ('Midwest', 'Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Nebraska, North Dakota, Ohio, South Dakota, Wisconsin'),
    ('Southwest', 'Arizona, New Mexico, Oklahoma, Texas'),
    ('West', 'Alaska, California, Colorado, Hawaii, Idaho, Montana, Nevada, Oregon, Utah, Washington, Wyoming')
    ]
us_regs_table=[(x[0],[i.strip() for i in x[1].split(",")]) for x in us_regs_table]
us_regs_dct=dict()
for r in us_regs_table:
    for s in r[1]:
        us_regs_dct[zipcode_abbs_dct[s]]=r[0]

zipcode_info=pd.read_csv("/home/david/movielens/free-zipcode-database.csv")
zipcode_info=zipcode_info.groupby('Zipcode').first().reset_index()
zipcode_info.loc[zipcode_info.Country!="US",'State']='UnknownOrNonUS'
zipcode_info['Region']=zipcode_info['State'].copy()
zipcode_info.loc[zipcode_info.Country=="US",'Region']=zipcode_info.Region.loc[zipcode_info.Country=="US"].map(lambda x: us_regs_dct[x] if x in us_regs_dct else 'UsOther')
zipcode_info=zipcode_info[['Zipcode', 'Region']]

users=pd.read_table("/home/david/movielens/ml-1m/ml-1m/users.dat",sep='::',names=["userId","Gender","Age","Occupation","Zipcode"], engine='python')
users["Zipcode"]=users.Zipcode.map(lambda x: int(re.sub("-.*","",x)))
users=pd.merge(users,zipcode_info,on='Zipcode',how='left')
users['Region']=users.Region.fillna('UnknownOrNonUS')

users_features=users.copy()
users_features['Gender']=users_features.Gender.map(lambda x: 1.0*(x=='M'))
del users_features['Zipcode']
users_features['Age']=users_features.Age.map(lambda x: str(x))
users_features['Occupation']=users_features.Occupation.map(lambda x: str(x))
users_features=pd.get_dummies(users_features, columns=['Age', 'Occupation', 'Region'])
users_features.set_index('userId',inplace=True)

movies_w_sideinfo=set(list(movies.movieId))
ratings=ratings.loc[ratings.movieId.map(lambda x: x in movies_w_sideinfo)]
avg_rating_by_user=ratings.groupby('userId')['Rating'].mean().to_frame().rename(columns={'Rating':'AvgRating'})
ratings_train=pd.merge(ratings, avg_rating_by_user, left_on='userId',right_index=True)
ratings_train['RatingCentered']=ratings_train.Rating-ratings_train.AvgRating

def generate_features(user,movie,users_features_bc,movies_features_bc):
    user_feats=users_features_bc.value.loc[user].values
    movie_feats=movies_features_bc.value.loc[movie].values
    return csc_matrix(np.kron(user_feats,movie_feats).reshape(-1,1))

users_features_bc=sc.broadcast(users_features)
movies_features_bc=sc.broadcast(movies_features)

trainset=sc.parallelize([(i.userId,i.movieId,i.RatingCentered) for i in ratings_train.itertuples()])\
.map(lambda x: LabeledPoint(x[2],generate_features(x[0],x[1],users_features_bc,movies_features_bc)))\
.map(lambda x: (float(x.label),x.features.asML())).toDF(['label','features'])
trainset=trainset.repartition(50)

recommender=LinearRegression(regParam=1e-4).fit(trainset)
formula_coeffs=recommender.coefficients.toArray()

def generate_features_series(user,movie):
    user_feats=users_features.loc[user].values
    movie_feats=movies_features.loc[movie].values
    return pd.Series(np.kron(user_feats,movie_feats).astype('float64'))

preds_scores=preds_scores.loc[preds_scores.movieId.map(lambda x: x in movies_w_sideinfo)]
X_predict=preds_scores.movieId.apply(lambda x: generate_features_series(user,x))
preds_scores['score_cb']=X_predict.dot(formula_coeffs)
preds_scores=preds_scores.sort_values('score_cb',ascending=False)
cb20=list(preds_scores.movieId.iloc[:20])
preds_scores.head()

Out[6]:

	userId	movieId	score_cf	Title	score_cb
3191	100	1262	3.136641	Great Escape, The (1963)	1.030581
835	100	3030	3.163622	Yojimbo (1961)	1.015274
398	100	908	3.137593	North by Northwest (1959)	1.012968
354	100	3435	3.167782	Double Indemnity (1944)	1.003191
473	100	3196	3.033045	Stalag 17 (1953)	0.998952

5. Examining the recommendations

Now let's look at what each of these lists actually recommends. Their recommendations are very different, with little intersection, and, as expected, collaborative filtering tends to favor less popular items for this user. First, the Most-Popular recommended list:

In [7]:

def print_reclist(reclist):
    list_w_info=[str(m+1)+") - "+movie_titles[reclist[m]]+\
        " - Average Rating: "+str(np.round(avg_ratings.loc[reclist[m]].iloc[0],2))+\
        " - Number of ratings: "+str(num_ratings.loc[reclist[m]].iloc[0]) for m in range(len(reclist))]
    print("\n".join(list_w_info))
    
print_reclist(pop20)

1) - American Beauty (1999) - Average Rating: 4.32 - Number of ratings: 3428
2) - Star Wars: Episode IV - A New Hope (1977) - Average Rating: 4.45 - Number of ratings: 2991
3) - Star Wars: Episode V - The Empire Strikes Back (1980) - Average Rating: 4.29 - Number of ratings: 2990
4) - Star Wars: Episode VI - Return of the Jedi (1983) - Average Rating: 4.02 - Number of ratings: 2883
5) - Saving Private Ryan (1998) - Average Rating: 4.34 - Number of ratings: 2653
6) - Raiders of the Lost Ark (1981) - Average Rating: 4.48 - Number of ratings: 2514
7) - Silence of the Lambs, The (1991) - Average Rating: 4.35 - Number of ratings: 2578
8) - Matrix, The (1999) - Average Rating: 4.32 - Number of ratings: 2590
9) - Sixth Sense, The (1999) - Average Rating: 4.41 - Number of ratings: 2459
10) - Terminator 2: Judgment Day (1991) - Average Rating: 4.06 - Number of ratings: 2649
11) - Fargo (1996) - Average Rating: 4.25 - Number of ratings: 2513
12) - Schindler's List (1993) - Average Rating: 4.51 - Number of ratings: 2304
13) - Braveheart (1995) - Average Rating: 4.23 - Number of ratings: 2443
14) - Back to the Future (1985) - Average Rating: 3.99 - Number of ratings: 2583
15) - Shawshank Redemption, The (1994) - Average Rating: 4.55 - Number of ratings: 2227
16) - Godfather, The (1972) - Average Rating: 4.52 - Number of ratings: 2223
17) - Jurassic Park (1993) - Average Rating: 3.76 - Number of ratings: 2672
18) - Princess Bride, The (1987) - Average Rating: 4.3 - Number of ratings: 2318
19) - Shakespeare in Love (1998) - Average Rating: 4.13 - Number of ratings: 2369
20) - L.A. Confidential (1997) - Average Rating: 4.22 - Number of ratings: 2288

Collaborative filtering recommended list:

In [8]:

print_reclist(cf20)

1) - Song of Freedom (1936) - Average Rating: 5.0 - Number of ratings: 1
2) - Mamma Roma (1962) - Average Rating: 4.5 - Number of ratings: 2
3) - Schlafes Bruder (Brother of Sleep) (1995) - Average Rating: 5.0 - Number of ratings: 1
4) - Hour of the Pig, The (1993) - Average Rating: 4.5 - Number of ratings: 2
5) - Smashing Time (1967) - Average Rating: 5.0 - Number of ratings: 2
6) - Gate of Heavenly Peace, The (1995) - Average Rating: 5.0 - Number of ratings: 3
7) - Apple, The (Sib) (1998) - Average Rating: 4.67 - Number of ratings: 9
8) - Ulysses (Ulisse) (1954) - Average Rating: 5.0 - Number of ratings: 1
9) - Follow the Bitch (1998) - Average Rating: 5.0 - Number of ratings: 1
10) - I Am Cuba (Soy Cuba/Ya Kuba) (1964) - Average Rating: 4.8 - Number of ratings: 5
11) - One Little Indian (1973) - Average Rating: 5.0 - Number of ratings: 1
12) - Lamerica (1994) - Average Rating: 4.75 - Number of ratings: 8
13) - Foreign Student (1994) - Average Rating: 3.0 - Number of ratings: 2
14) - Sanjuro (1962) - Average Rating: 4.61 - Number of ratings: 69
15) - Lured (1947) - Average Rating: 5.0 - Number of ratings: 1
16) - Bells, The (1926) - Average Rating: 4.5 - Number of ratings: 2
17) - Bittersweet Motel (2000) - Average Rating: 5.0 - Number of ratings: 1
18) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.56 - Number of ratings: 628
19) - Jar, The (Khomreh) (1992) - Average Rating: 4.0 - Number of ratings: 1
20) - For All Mankind (1989) - Average Rating: 4.44 - Number of ratings: 27

Content-based recommended list:

In [9]:

print_reclist(cb20)

1) - Great Escape, The (1963) - Average Rating: 4.38 - Number of ratings: 696
2) - Yojimbo (1961) - Average Rating: 4.4 - Number of ratings: 215
3) - North by Northwest (1959) - Average Rating: 4.38 - Number of ratings: 1315
4) - Double Indemnity (1944) - Average Rating: 4.42 - Number of ratings: 551
5) - Stalag 17 (1953) - Average Rating: 4.23 - Number of ratings: 394
6) - It Happened One Night (1934) - Average Rating: 4.28 - Number of ratings: 374
7) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.56 - Number of ratings: 628
8) - Gladiator (2000) - Average Rating: 4.11 - Number of ratings: 1924
9) - Casablanca (1942) - Average Rating: 4.41 - Number of ratings: 1669
10) - Third Man, The (1949) - Average Rating: 4.45 - Number of ratings: 480
11) - Maltese Falcon, The (1941) - Average Rating: 4.4 - Number of ratings: 1043
12) - To Kill a Mockingbird (1962) - Average Rating: 4.43 - Number of ratings: 928
13) - Treasure of the Sierra Madre, The (1948) - Average Rating: 4.29 - Number of ratings: 453
14) - Everest (1998) - Average Rating: 4.01 - Number of ratings: 167
15) - Wrong Trousers, The (1993) - Average Rating: 4.51 - Number of ratings: 882
16) - In the Heat of the Night (1967) - Average Rating: 4.13 - Number of ratings: 348
17) - Terminator 2: Judgment Day (1991) - Average Rating: 4.06 - Number of ratings: 2649
18) - Modern Times (1936) - Average Rating: 4.24 - Number of ratings: 305
19) - City Lights (1931) - Average Rating: 4.39 - Number of ratings: 271
20) - Terminator, The (1984) - Average Rating: 4.15 - Number of ratings: 2098

6. Combining recommended lists

Finally, the three lists are combined through interleaved ranking, prioritized in this order: collaborative filtering (CF), content-based (CB), most-popular (MP):

In [10]:

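def interleaved_ranking(lst_of_lists,n):
    # each list in turn contributes its best-ranked item not yet in the final list
    final_list=list()
    while len(final_list)<n:
        added=False
        for lst in lst_of_lists:
            lst=[m for m in lst if m not in final_list]
            if len(lst)==0:
                continue
            new=lst[0]
            final_list.append(new)
            added=True
            if len(final_list)==n:
                break
        if not added:
            # all lists are exhausted: stop instead of looping forever
            break
    return final_list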

mixed_list=interleaved_ranking([cf20,cb20,pop20],20)
print_reclist(mixed_list)

1) - Song of Freedom (1936) - Average Rating: 5.0 - Number of ratings: 1
2) - Great Escape, The (1963) - Average Rating: 4.38 - Number of ratings: 696
3) - American Beauty (1999) - Average Rating: 4.32 - Number of ratings: 3428
4) - Mamma Roma (1962) - Average Rating: 4.5 - Number of ratings: 2
5) - Yojimbo (1961) - Average Rating: 4.4 - Number of ratings: 215
6) - Star Wars: Episode IV - A New Hope (1977) - Average Rating: 4.45 - Number of ratings: 2991
7) - Schlafes Bruder (Brother of Sleep) (1995) - Average Rating: 5.0 - Number of ratings: 1
8) - North by Northwest (1959) - Average Rating: 4.38 - Number of ratings: 1315
9) - Star Wars: Episode V - The Empire Strikes Back (1980) - Average Rating: 4.29 - Number of ratings: 2990
10) - Hour of the Pig, The (1993) - Average Rating: 4.5 - Number of ratings: 2
11) - Double Indemnity (1944) - Average Rating: 4.42 - Number of ratings: 551
12) - Star Wars: Episode VI - Return of the Jedi (1983) - Average Rating: 4.02 - Number of ratings: 2883
13) - Smashing Time (1967) - Average Rating: 5.0 - Number of ratings: 2
14) - Stalag 17 (1953) - Average Rating: 4.23 - Number of ratings: 394
15) - Saving Private Ryan (1998) - Average Rating: 4.34 - Number of ratings: 2653
16) - Gate of Heavenly Peace, The (1995) - Average Rating: 5.0 - Number of ratings: 3
17) - It Happened One Night (1934) - Average Rating: 4.28 - Number of ratings: 374
18) - Raiders of the Lost Ark (1981) - Average Rating: 4.48 - Number of ratings: 2514
19) - Apple, The (Sib) (1998) - Average Rating: 4.67 - Number of ratings: 9
20) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.56 - Number of ratings: 628

The list now seems a lot more diverse, which is probably a good thing. Such a list can be further diversified following other heuristics, as illustrated in this other IPython notebook, and the lists can also be rotated over time (e.g. starting from rank 10 instead of rank 1) to offer more items to the user.
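One possible reading of "rotating" a list over time is to start reading it at a later rank and wrap around. The helper below is a hypothetical sketch of that idea, not something taken from the other notebook; the name `rotate` and the stand-in list of ranks are assumptions.

```python
def rotate(ranked, start):
    """Start reading a ranked list at position `start` (0-based), wrapping around."""
    if not ranked:
        return []
    start %= len(ranked)
    return ranked[start:] + ranked[:start]

ranked = list(range(1, 21))      # stand-in for a 20-item list, ranks 1..20
print(rotate(ranked, 9)[:5])     # [10, 11, 12, 13, 14]: begins at rank 10
```

Each rotated list could then be passed to the interleaving function in place of the original one, so that a returning user sees items from further down every list.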

When there is a large degree of intersection between the items from each list, changing the order in which they are prioritized will change not only the relative ordering in the final list, but also which items end up appearing. That is hardly the case here, as there is little intersection between the lists. With 20 slots shared among three lists, the list that goes last contributes six items instead of seven, so only one item differs between the two orderings:

In [11]:

mixed_list=interleaved_ranking([pop20,cf20,cb20],20)
print_reclist(mixed_list)

1) - American Beauty (1999) - Average Rating: 4.32 - Number of ratings: 3428
2) - Song of Freedom (1936) - Average Rating: 5.0 - Number of ratings: 1
3) - Great Escape, The (1963) - Average Rating: 4.38 - Number of ratings: 696
4) - Star Wars: Episode IV - A New Hope (1977) - Average Rating: 4.45 - Number of ratings: 2991
5) - Mamma Roma (1962) - Average Rating: 4.5 - Number of ratings: 2
6) - Yojimbo (1961) - Average Rating: 4.4 - Number of ratings: 215
7) - Star Wars: Episode V - The Empire Strikes Back (1980) - Average Rating: 4.29 - Number of ratings: 2990
8) - Schlafes Bruder (Brother of Sleep) (1995) - Average Rating: 5.0 - Number of ratings: 1
9) - North by Northwest (1959) - Average Rating: 4.38 - Number of ratings: 1315
10) - Star Wars: Episode VI - Return of the Jedi (1983) - Average Rating: 4.02 - Number of ratings: 2883
11) - Hour of the Pig, The (1993) - Average Rating: 4.5 - Number of ratings: 2
12) - Double Indemnity (1944) - Average Rating: 4.42 - Number of ratings: 551
13) - Saving Private Ryan (1998) - Average Rating: 4.34 - Number of ratings: 2653
14) - Smashing Time (1967) - Average Rating: 5.0 - Number of ratings: 2
15) - Stalag 17 (1953) - Average Rating: 4.23 - Number of ratings: 394
16) - Raiders of the Lost Ark (1981) - Average Rating: 4.48 - Number of ratings: 2514
17) - Gate of Heavenly Peace, The (1995) - Average Rating: 5.0 - Number of ratings: 3
18) - It Happened One Night (1934) - Average Rating: 4.28 - Number of ratings: 374
19) - Silence of the Lambs, The (1991) - Average Rating: 4.35 - Number of ratings: 2578
20) - Apple, The (Sib) (1998) - Average Rating: 4.67 - Number of ratings: 9
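How much the priority order matters depends on the overlap between the lists, which can be measured with set intersections. In the lists printed in section 5, the only shared titles appear to be Seven Samurai (collaborative filtering and content-based) and Terminator 2 (content-based and most-popular). The sketch below shows the check on hypothetical item ids; the helper name `pairwise_overlap` is an assumption, and the same call could be made on `cf20`, `cb20` and `pop20`.

```python
from itertools import combinations

def pairwise_overlap(named_lists):
    """Number of shared items for every pair of recommended lists."""
    return {(a, b): len(set(named_lists[a]) & set(named_lists[b]))
            for a, b in combinations(sorted(named_lists), 2)}

# Hypothetical item ids, for illustration only.
toy = {"cf": [1, 2, 3, 4], "cb": [3, 5, 6, 7], "pop": [7, 8, 9, 3]}
print(pairwise_overlap(toy))
```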

Interleaved ranking can also be used as a heuristic to move away from most-popular recommendations, by following the same algorithm as before but removing the items that came from most-popular (this can be expanded by letting most-popular choose more than one item at once, among other heuristics). Here is a simple version that forces the list to contain less popular items (in this particular case it's the same as just excluding the most-popular list, as there is pretty much no intersection):

In [12]:

def interleaved_ranking_decreased_popularity(most_popular,lst_of_lists,n):
    final_list=list()
    excl_list=set()
    while len(final_list)<n:
        # the most-popular list "picks" first, but its pick is discarded
        most_popular=[m for m in most_popular if m not in excl_list]
        if len(most_popular)>0:
            excl_list.add(most_popular[0])
        added=False
        for lst in lst_of_lists:
            lst=[m for m in lst if m not in excl_list]
            if len(lst)==0:
                continue
            new=lst[0]
            final_list.append(new)
            excl_list.add(new)
            added=True
            if len(final_list)==n:
                break
        if not added:
            # nothing left to add: stop instead of looping forever
            break
    return final_list

mixed_list_dec_pop=interleaved_ranking_decreased_popularity(pop20,[cf20,cb20],10)
print_reclist(mixed_list_dec_pop)

1) - Song of Freedom (1936) - Average Rating: 5.0 - Number of ratings: 1
2) - Great Escape, The (1963) - Average Rating: 4.38 - Number of ratings: 696
3) - Mamma Roma (1962) - Average Rating: 4.5 - Number of ratings: 2
4) - Yojimbo (1961) - Average Rating: 4.4 - Number of ratings: 215
5) - Schlafes Bruder (Brother of Sleep) (1995) - Average Rating: 5.0 - Number of ratings: 1
6) - North by Northwest (1959) - Average Rating: 4.38 - Number of ratings: 1315
7) - Hour of the Pig, The (1993) - Average Rating: 4.5 - Number of ratings: 2
8) - Double Indemnity (1944) - Average Rating: 4.42 - Number of ratings: 551
9) - Smashing Time (1967) - Average Rating: 5.0 - Number of ratings: 2
10) - Stalag 17 (1953) - Average Rating: 4.23 - Number of ratings: 394
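The variant mentioned earlier, in which most-popular discards more than one item per round, could be sketched as follows. This is only an illustration under assumptions: the function name, the parameter `k` (how many popular items are discarded per round) and the toy lists are all hypothetical.

```python
def interleave_excluding_popular(most_popular, lists, n, k=2):
    """Like the function above, but each round first discards the top-k
    remaining most-popular items (k is a hypothetical tuning knob)."""
    final, excluded = [], set()
    while len(final) < n:
        remaining_popular = [m for m in most_popular if m not in excluded]
        excluded.update(remaining_popular[:k])
        added = False
        for lst in lists:
            # Best-ranked item of this list that is neither used nor discarded.
            pick = next((m for m in lst if m not in excluded), None)
            if pick is None:
                continue
            final.append(pick)
            excluded.add(pick)
            added = True
            if len(final) == n:
                break
        if not added:  # all lists exhausted
            break
    return final

# Toy lists in which the personalized lists contain some popular items.
popular = ["p1", "p2", "p3", "p4"]
cf = ["c1", "p2", "c2", "p4"]
cb = ["p1", "b1", "p3", "b2"]
print(interleave_excluding_popular(popular, [cf, cb], 4, k=1))  # ['c1', 'b1', 'c2', 'p3']
print(interleave_excluding_popular(popular, [cf, cb], 4, k=2))  # ['c1', 'b1', 'c2', 'b2']
```

With `k=1` the popular item "p3" still makes it into the final list, while with `k=2` it is discarded before the content-based list gets to pick it.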