Someone made a compilation of musicians and other celebrities answering the question, "Lennon or McCartney?"
I wasn't sure how I'd answer it myself. It's not really fair to compare their post-Beatle careers, McCartney having had 30+ extra years in which to write and produce. When you look at their character, John is certainly the more divisive figure. On the one hand, he was admittedly abusive to his first wife and son, and was violent towards others as well. On the other hand, towards the end of his life he became a force for peace, and his Rolling Stone cover with Yoko Ono remains one of the best artistic and cultural statements I've seen. Undoubtedly McCartney is kinder than Lennon. But the question is not "Who is the better person?"
That leaves the (Beatles) music! Just thinking of random songs, I'm not sure who I like more or even who wrote the songs I like best. So I grabbed a list of all their songs from Wikipedia, covered the credits column, and rated them:
Let's get to analyzing! We start by importing the file and making sure it looks like we expect:
import pandas
import numpy as np
# Load the hand-rated song list and peek at the first five rows.
df = pandas.read_csv("./beatles.csv")
print(df[:5])
| | Title | Album | Ratings | Writer |
|---|---|---|---|---|
| 0 | "12-Bar Original" | Anthology 2 | 0 | Other |
| 1 | "Across the Universe" | Let It Be | 3 | Lennon |
| 2 | "Act Naturally" | Help! | 2 | NonBeatle |
| 3 | "Ain't She Sweet" | Anthology 1 | 2 | NonBeatle |
| 4 | "All I've Got to Do" | With the Beatles | 0 | Lennon |
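"Looks like we expect" can also be checked in code. Here's a rough sketch that rebuilds the five rows printed above (so it runs without `beatles.csv`) and tests them against what I assume the rating scale to be: -1 for hated, 0 for not recognized, 1 to 5 otherwise. The `check_frame` helper and the two constants are made up for illustration.

```python
import pandas as pd

# The five rows shown above, rebuilt inline so the sketch runs without beatles.csv.
sample = pd.DataFrame({
    "Title": ['"12-Bar Original"', '"Across the Universe"', '"Act Naturally"',
              '"Ain\'t She Sweet"', '"All I\'ve Got to Do"'],
    "Album": ["Anthology 2", "Let It Be", "Help!", "Anthology 1", "With the Beatles"],
    "Ratings": [0, 3, 2, 2, 0],
    "Writer": ["Other", "Lennon", "NonBeatle", "NonBeatle", "Lennon"],
})

# Assumption: ratings run from -1 (hated) through 0 (unknown) up to 5 (favorite).
ALLOWED_RATINGS = list(range(-1, 6))
EXPECTED_COLUMNS = ["Title", "Album", "Ratings", "Writer"]


def check_frame(frame):
    """Return a list of problems; an empty list means the frame looks as expected."""
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        return ["missing columns: {}".format(missing)]
    problems = []
    bad = frame[~frame["Ratings"].isin(ALLOWED_RATINGS)]
    if len(bad) > 0:
        problems.append("unexpected ratings in rows: {}".format(list(bad.index)))
    if frame["Writer"].isna().any():
        problems.append("some rows have no writer")
    return problems


problems = check_frame(sample)
print(problems or "looks fine")
```

On the real data the same call would be `check_frame(df)`.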
Let's start by comparing numbers. Who has written the most songs?
df.groupby(['Writer'])['Ratings'].count()
| Writer | Ratings |
|---|---|
| NonBeatle | 76 |
| Harrison | 27 |
| Lennon | 57 |
| Lennon and McCartney | 16 |
| Lennon with McCartney | 27 |
| McCartney | 64 |
| McCartney with Lennon | 27 |
| Other | 15 |

Name: Ratings, dtype: int64
We can see a few things here. One, Lennon and McCartney wrote far more songs on their own than they did together: 121 solo credits versus 70 shared ones, or nearly twice as many. McCartney is slightly more prolific than Lennon, but not by much. The single most common writing category, though, is "NonBeatle".
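The arithmetic behind those comparisons is quick to spell out. This sketch just copies the counts from the output above into a dictionary; nothing is read from the CSV.

```python
# Song counts per writing credit, copied from the groupby output above.
counts = {
    "NonBeatle": 76,
    "Harrison": 27,
    "Lennon": 57,
    "Lennon and McCartney": 16,
    "Lennon with McCartney": 27,
    "McCartney": 64,
    "McCartney with Lennon": 27,
    "Other": 15,
}

# Songs credited to Lennon alone or McCartney alone.
solo = counts["Lennon"] + counts["McCartney"]

# Songs whose credit names both of them, in any order.
together = sum(n for writer, n in counts.items()
               if "Lennon" in writer and "McCartney" in writer)

ratio = solo / together
most_common = max(counts, key=counts.get)

print("solo:", solo, "together:", together, "ratio:", round(ratio, 2))
print("most common credit:", most_common)
```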
Next, I'm curious to see who wrote the most songs that I didn't even recognize.
mysterySongs = df[df['Ratings'] == 0]
mysterySongs.groupby(['Writer'])['Ratings'].count()
| Writer | Ratings |
|---|---|
| NonBeatle | 64 |
| Harrison | 11 |
| Lennon | 21 |
| Lennon and McCartney | 7 |
| Lennon with McCartney | 7 |
| McCartney | 24 |
| McCartney with Lennon | 3 |
| Other | 9 |

Name: Ratings, dtype: int64
A clearer way to look at this might be to ask who had the highest percentages of songs I didn't recognize.
mysterySongs.groupby(['Writer'])['Ratings'].count() / df.groupby(['Writer'])['Ratings'].count()
| Writer | Ratings |
|---|---|
| NonBeatle | 0.842105 |
| Harrison | 0.407407 |
| Lennon | 0.368421 |
| Lennon and McCartney | 0.437500 |
| Lennon with McCartney | 0.259259 |
| McCartney | 0.375000 |
| McCartney with Lennon | 0.111111 |
| Other | 0.600000 |

Name: Ratings, dtype: float64
Not surprised that I'm disproportionately less likely to know Beatles songs written by non-Beatles. I am surprised to realize I don't know 40% of George Harrison's songs. Sorry, George! The percentage of unknown songs by "Lennon and McCartney" is surprisingly high as well, although that may be skewed by the small number of songs credited to them both equally.
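A side note on the mechanics: dividing one groupby count by another works, but the same share can be computed in one step, because the mean of a True/False column is the fraction of True values. The sketch below uses a tiny made-up dataset (`toy` is hypothetical, not my real ratings) just to show that both routes agree.

```python
import pandas as pd

# Hypothetical mini-dataset, only to show the mechanics; not the real ratings.
toy = pd.DataFrame({
    "Writer": ["Lennon", "Lennon", "McCartney", "McCartney", "McCartney", "Harrison"],
    "Ratings": [0, 3, 4, 0, 5, 0],
})

# Route 1 (as in the post): unknown-song count divided by total count.
by_division = (toy[toy["Ratings"] == 0].groupby("Writer")["Ratings"].count()
               / toy.groupby("Writer")["Ratings"].count())

# Route 2: the mean of a boolean column is the share of True values.
by_mean = (toy["Ratings"] == 0).groupby(toy["Writer"]).mean()

print(by_division)
print(by_mean)
```

One difference worth knowing: if a writer had no unknown songs at all, the division route would produce NaN for that writer (they are missing from the numerator), while the boolean mean would give 0.0.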
Okay, let's move on to the real question - whose songs do I like more?
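# Keep only songs I recognized; .copy() lets us add columns later without a SettingWithCopyWarning.
songs = df[df['Ratings'] != 0].copy()
songs.groupby(['Writer'])['Ratings'].agg(['mean', 'std'])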
| Writer | mean | std |
|---|---|---|
| NonBeatle | 2.833333 | 0.717741 |
| Harrison | 3.250000 | 0.856349 |
| Lennon | 3.055556 | 0.629941 |
| Lennon and McCartney | 3.222222 | 0.666667 |
| Lennon with McCartney | 3.150000 | 1.136708 |
| McCartney | 3.625000 | 0.774183 |
| McCartney with Lennon | 3.000000 | 0.589768 |
| Other | 2.666667 | 0.516398 |
If we're looking just at means, my favorite writer is McCartney alone, and my least favorite writer (other than non-Beatles and weird combos) is McCartney with Lennon. I'd point out that that doesn't make much sense, but the standard deviations show that they're all within a reasonable range of each other. Another way to view this is to look at all songs which Lennon has credits on vs all songs which McCartney has credits on.
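A caveat on reading the standard deviations: they describe how much individual song ratings vary, whereas the uncertainty of an average is better captured by the standard error of the mean, std / sqrt(n). Here's a rough sketch for the two solo categories, using only numbers from the tables above; the count of rated songs is the total minus the ones I didn't recognize.

```python
import math

# Solo-writer figures copied from the tables above.
total = {"Lennon": 57, "McCartney": 64}
unknown = {"Lennon": 21, "McCartney": 24}
mean = {"Lennon": 3.055556, "McCartney": 3.625000}
std = {"Lennon": 0.629941, "McCartney": 0.774183}

sem = {}
for writer in ("Lennon", "McCartney"):
    n = total[writer] - unknown[writer]       # songs that actually got a rating
    sem[writer] = std[writer] / math.sqrt(n)  # standard error of the mean
    print(writer, "n =", n, "mean =", mean[writer], "sem =", round(sem[writer], 3))

gap = mean["McCartney"] - mean["Lennon"]
print("gap between solo means:", round(gap, 3))
```

The gap of about 0.57 is several standard errors wide, so the solo McCartney lead may be less of a fluke than the raw standard deviations suggest. Treat that as a rough hint, not a formal test.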
songs['IsLennon'] = np.where(songs['Writer'].str.contains("Lennon"), 1, 0)
songs['IsMcCartney'] = np.where(songs['Writer'].str.contains("McCartney"), 1, 0)
print(songs[:5])
| | Title | Album | Ratings | Writer | IsLennon | IsMcCartney |
|---|---|---|---|---|---|---|
| 1 | "Across the Universe" | Let It Be | 3 | Lennon | 1 | 0 |
| 2 | "Act Naturally" | Help! | 2 | NonBeatle | 0 | 0 |
| 3 | "Ain't She Sweet" | Anthology 1 | 2 | NonBeatle | 0 | 0 |
| 5 | "All My Loving" | With the Beatles | 3 | McCartney | 0 | 1 |
| 6 | "All Things Must Pass" | Anthology 3 | 4 | Harrison | 0 | 0 |
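To see exactly which credits each flag picks up, here's a small sketch that runs the same substring match over the eight category names from the data. The `flags` frame exists only for this illustration, and I'm using `.astype(int)` in place of `np.where`, which gives the same 1/0 values.

```python
import pandas as pd

# The eight writing credits that appear in the data.
writers = pd.Series([
    "NonBeatle", "Harrison", "Lennon", "Lennon and McCartney",
    "Lennon with McCartney", "McCartney", "McCartney with Lennon", "Other",
])

# regex=False treats the pattern as a plain substring.
flags = pd.DataFrame({
    "Writer": writers,
    "IsLennon": writers.str.contains("Lennon", regex=False).astype(int),
    "IsMcCartney": writers.str.contains("McCartney", regex=False).astype(int),
})

# Credits that set both flags, so their songs land in both groups.
both = flags[(flags["IsLennon"] == 1) & (flags["IsMcCartney"] == 1)]

print(flags)
print(list(both["Writer"]))
```

The three shared credits set both flags, so any comparison built on these columns counts co-written songs once for each writer.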
Now let's compare!
print("Lennon:")
print(songs[songs['IsLennon'] == 1]['Ratings'].mean(), songs[songs['IsLennon'] == 1]['Ratings'].std())
print("McCartney:")
print(songs[songs['IsMcCartney'] == 1]['Ratings'].mean(), songs[songs['IsMcCartney'] == 1]['Ratings'].std())
Lennon:
3.07865168539 0.757158550424
McCartney:
3.32258064516 0.849056897804
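These pooled averages can be cross-checked from the per-credit tables: each one is a weighted average of the category means, weighted by how many songs in that category I actually rated. A sketch using only numbers already shown (the `pooled_mean` helper is mine, for illustration):

```python
# Figures copied from the earlier tables, for credits involving Lennon or McCartney.
total = {"Lennon": 57, "Lennon and McCartney": 16, "Lennon with McCartney": 27,
         "McCartney": 64, "McCartney with Lennon": 27}
unknown = {"Lennon": 21, "Lennon and McCartney": 7, "Lennon with McCartney": 7,
           "McCartney": 24, "McCartney with Lennon": 3}
mean = {"Lennon": 3.055556, "Lennon and McCartney": 3.222222,
        "Lennon with McCartney": 3.150000, "McCartney": 3.625000,
        "McCartney with Lennon": 3.000000}

# Rated songs per credit: everything except the ones marked 0.
rated = {w: total[w] - unknown[w] for w in total}


def pooled_mean(name):
    """Weighted average rating over every credit that mentions `name`."""
    credits = [w for w in rated if name in w]
    n = sum(rated[w] for w in credits)
    return sum(rated[w] * mean[w] for w in credits) / n, n


lennon_mean, lennon_n = pooled_mean("Lennon")
mccartney_mean, mccartney_n = pooled_mean("McCartney")
shared_n = sum(rated[w] for w in rated if "Lennon" in w and "McCartney" in w)

print("Lennon:", round(lennon_mean, 4), "over", lennon_n, "songs")
print("McCartney:", round(mccartney_mean, 4), "over", mccartney_n, "songs")
print("songs counted in both groups:", shared_n)
```

More than half of each group is made up of the same co-written songs, which pulls the two averages toward each other.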
Yup, still like McCartney more. I have a few more questions. I'm curious who wrote my favorite and least favorite songs. Let's take a look:
bestSongs = df[df['Ratings'] == 5]
print(bestSongs)
| | Title | Album | Ratings | Writer |
|---|---|---|---|---|
| 63 | "Eleanor Rigby" | Revolver | 5 | McCartney |
| 72 | "For No One" | Revolver | 5 | McCartney |
| 97 | "Here Comes the Sun" | Abbey Road | 5 | Harrison |
| 98 | "Here, There and Everywhere" | Revolver | 5 | McCartney |
| 228 | "I Will" | The Beatles | 5 | McCartney |
I've always liked the melodic, melancholic stuff the best. (Also, I've always liked Revolver! Maybe I'll tack on a by-album analysis at the end.) Now, how about my least favorite songs? Turns out there's a bunch I labelled "meh", so here are some counts:
worstSongs = songs[songs['Ratings'] < 3].groupby(['Writer'])['Ratings'].count()
print(worstSongs)
| Writer | Ratings |
|---|---|
| NonBeatle | 4 |
| Harrison | 3 |
| Lennon | 6 |
| Lennon and McCartney | 1 |
| Lennon with McCartney | 2 |
| McCartney | 3 |
| McCartney with Lennon | 4 |
| Other | 2 |

Name: Ratings, dtype: int64
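Raw counts favor whoever has fewer rated songs, so it may be fairer to look at the share of each writer's rated songs that scored below 3. This sketch reuses the totals, unknown counts, and below-3 counts from the outputs above; no new data.

```python
# Counts copied from the outputs above.
total = {"NonBeatle": 76, "Harrison": 27, "Lennon": 57, "Lennon and McCartney": 16,
         "Lennon with McCartney": 27, "McCartney": 64, "McCartney with Lennon": 27,
         "Other": 15}
unknown = {"NonBeatle": 64, "Harrison": 11, "Lennon": 21, "Lennon and McCartney": 7,
           "Lennon with McCartney": 7, "McCartney": 24, "McCartney with Lennon": 3,
           "Other": 9}
below_three = {"NonBeatle": 4, "Harrison": 3, "Lennon": 6, "Lennon and McCartney": 1,
               "Lennon with McCartney": 2, "McCartney": 3, "McCartney with Lennon": 4,
               "Other": 2}

# Share of rated (recognized) songs that scored below 3, per writing credit.
rate = {w: below_three[w] / (total[w] - unknown[w]) for w in total}

for writer, share in sorted(rate.items(), key=lambda item: item[1], reverse=True):
    print("{:<24}{:.1%}".format(writer, share))
```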
Guess I do not < 3 Lennon by himself. (Get it? :P)
I distinctly remember creating the rating -1 for the only Beatles song I actually hate. Who wrote that one?
worstSongs = df[df['Ratings'] == -1]
print(worstSongs)
| | Title | Album | Ratings | Writer |
|---|---|---|---|---|
| 217 | "Run for Your Life" | Rubber Soul | -1 | Lennon with McCartney |
From Wikipedia: "The song's lyrics establish a threatening tone towards the singer's unnamed girlfriend (referred to throughout the song as "little girl"), claiming "I'd rather see you dead, little girl, than to be with another man." ... Lennon designated this song as his "least favourite Beatles song" in a 1973 interview and later said it was the song he most regretted writing."
In conclusion, I appear to like McCartney slightly more than Lennon.
One more analysis! What are my favorite albums (in order)? I'm leaving the unknown songs in here because I think they're a sign of me not liking the album as much.
Let's also only select the 'studio albums'.
studioAlbums = ['Please Please Me', 'With the Beatles', 'A Hard Day\'s Night', 'Beatles for Sale', 'Help!', 'Rubber Soul', 'Revolver', 'Sgt. Pepper\'s Lonely Hearts Club Band', 'Magical Mystery Tour', 'The Beatles', 'Yellow Submarine', 'Let It Be', 'Abbey Road']
albums = df[df['Album'].isin(studioAlbums)]
albums.groupby(['Album'])['Ratings'].agg(['mean', 'std'])
| Album | mean | std |
|---|---|---|
| A Hard Day's Night | 2.615385 | 1.556624 |
| Abbey Road | 3.352941 | 0.785905 |
| Beatles for Sale | 1.428571 | 1.741542 |
| Help! | 3.214286 | 0.892582 |
| Let It Be | 2.083333 | 1.880925 |
| Magical Mystery Tour | 2.818182 | 1.078720 |
| Please Please Me | 1.285714 | 1.589803 |
| Revolver | 3.428571 | 1.342460 |
| Rubber Soul | 2.785714 | 1.251373 |
| Sgt. Pepper's Lonely Hearts Club Band | 2.615385 | 1.260850 |
| The Beatles | 2.066667 | 1.638614 |
| With the Beatles | 1.500000 | 1.605280 |
| Yellow Submarine | 2.000000 | 1.414214 |
Looks like my favorite is in fact Revolver, followed closely by Abbey Road. Not surprised to see the early stuff (Please Please Me, Beatles for Sale, With the Beatles) farther down.
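Since the table comes out alphabetically, here's a quick sketch that puts the albums "in order". It sorts the mean ratings copied from the table; on the real data, appending `.sort_values('mean', ascending=False)` to the aggregation should give the same ranking.

```python
import pandas as pd

# Mean rating per studio album, copied from the table above.
album_means = pd.Series({
    "A Hard Day's Night": 2.615385,
    "Abbey Road": 3.352941,
    "Beatles for Sale": 1.428571,
    "Help!": 3.214286,
    "Let It Be": 2.083333,
    "Magical Mystery Tour": 2.818182,
    "Please Please Me": 1.285714,
    "Revolver": 3.428571,
    "Rubber Soul": 2.785714,
    "Sgt. Pepper's Lonely Hearts Club Band": 2.615385,
    "The Beatles": 2.066667,
    "With the Beatles": 1.500000,
    "Yellow Submarine": 2.000000,
}, name="mean")

# Highest-rated album first.
ranked = album_means.sort_values(ascending=False)
print(ranked.round(2))
```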