Remember, you need to import pandas before you can use it:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#you need to press enter
In the cells below, import the EdX data set from your notebooks directory as a dataframe. (It's at "./HMXPC13_DI_v2_5-14-14.csv" relative to your IPython notebooks.)
Using pandas, slice and dice to get:
Load the earthquakes csv from the Foundations class using
The csv includes the labels for the columns.
Using the magnitudes of the earthquakes--the 'mag' column--calculate:
the mean of all earthquake magnitudes
the five earthquakes with the greatest magnitudes
give the row number, time, magnitude and place for each
hint: use the .size() method or the value_counts() method
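A minimal sketch of these calculations on a tiny made-up dataframe. Only the column names 'time', 'mag' and 'place' follow the exercise; the rows are invented for illustration, and grouping by 'place' is just an assumed example of what the counting hint might apply to.

```python
import pandas as pd

# Hypothetical stand-in for the earthquakes data (rows are invented).
quakes = pd.DataFrame({
    "time": ["2015-01-01T00:00", "2015-01-01T01:00", "2015-01-01T02:00",
             "2015-01-01T03:00", "2015-01-01T04:00", "2015-01-01T05:00"],
    "mag": [1.2, 4.5, 2.0, 6.1, 3.3, 5.0],
    "place": ["A", "B", "A", "C", "B", "A"],
})

# Mean of all magnitudes.
mean_mag = quakes["mag"].mean()

# The five largest magnitudes; the index carries the original row number.
top_five = quakes.nlargest(5, "mag")[["time", "mag", "place"]]

# Two equivalent ways to count rows per category (here, per place).
counts_by_place = quakes["place"].value_counts()
sizes_by_place = quakes.groupby("place").size()

print(mean_mag)
print(top_five)
print(counts_by_place)
```

value_counts() sorts by count, while groupby(...).size() sorts by the group key; the numbers themselves agree.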
happydataframe is a dataframe with 200 rows and two columns, "activity" and "endorphin_level".
Explain briefly the difference between
Using the HarvardX dataset, compute how much video (nplay_video) on average the following watched:
Use boolean indexing.
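As a rough illustration of boolean indexing, here is a sketch on an invented dataframe. Only nplay_video is named in the exercise; the "gender" and "country" columns, their values, and the particular groups selected are assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature of the HarvardX data (rows are invented).
edx = pd.DataFrame({
    "gender": ["f", "m", "f", "m", "f"],
    "country": ["India", "India", "France", "France", "India"],
    "nplay_video": [10.0, 20.0, 50.0, 40.0, 30.0],
})

# Each comparison yields a boolean Series, one True/False per row.
is_female = edx["gender"] == "f"
in_india = edx["country"] == "India"

# Select rows with the mask, then average the column of interest.
mean_female = edx.loc[is_female, "nplay_video"].mean()

# Combine masks with & (and) or | (or).
mean_female_india = edx.loc[is_female & in_india, "nplay_video"].mean()

print(mean_female, mean_female_india)
```

When writing the comparisons inline rather than naming the masks, each one needs its own parentheses, e.g. `(edx["gender"] == "f") & (edx["country"] == "India")`.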
Using the .groupby method, create a data frame of how much video on average people from different countries of different genders watched.
something roughly like:
F 10 M 20
F 300 M 10
Precise formatting is not at issue.
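One possible shape for such a grouped summary, sketched on invented data. The column names "country" and "gender" are assumptions; only nplay_video comes from the exercise.

```python
import pandas as pd

# Hypothetical miniature of the HarvardX data (rows are invented).
edx = pd.DataFrame({
    "gender": ["f", "m", "f", "m", "f"],
    "country": ["India", "India", "France", "France", "India"],
    "nplay_video": [10.0, 20.0, 50.0, 40.0, 30.0],
})

# Group on two keys at once; the result is a Series with a two-level index.
by_country_gender = edx.groupby(["country", "gender"])["nplay_video"].mean()

# Optionally pivot the gender level into columns to get a data frame
# with one row per country.
table = by_country_gender.unstack("gender")

print(by_country_gender)
print(table)
```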
Turn now to the files in the directory ml-100k. In the lecture, we manually converted the field names for the u.users file when loading it into a pandas dataframe.
Using regular expressions, convert the string "user id | age | gender | occupation | zip code" into a list named labels of strings of the names of the columns. Replace any spaces within the names with underscores (_), so "zip code" will become "zip_code" &c.
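One way this could be done with the standard re module is sketched below: split on the pipe together with its surrounding whitespace, then replace the remaining runs of whitespace inside each name. The variable names raw and fields are illustrative only.

```python
import re

raw = "user id | age | gender | occupation | zip code"

# Split on "|" and swallow the whitespace on either side of it.
fields = re.split(r"\s*\|\s*", raw.strip())

# Replace any whitespace left inside a name with a single underscore.
labels = [re.sub(r"\s+", "_", field) for field in fields]

print(labels)
# ['user_id', 'age', 'gender', 'occupation', 'zip_code']
```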
In the README file, find the names for the columns for u.data. Using regular expressions, parse each set of names into a list of strings of the names of the columns. Replace any spaces within the names with underscores (_).
Drawing upon the two lists of labels you've just created, use pd.read_csv to load the u.data files as dataframes.
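A sketch of passing label lists to pd.read_csv through its names parameter. To stay self-contained it reads from in-memory text rather than the real files. The sample rows, the label lists, and the separators (pipe for the user file, tab for the ratings file) are assumptions that should be checked against the README and the files themselves.

```python
import io
import pandas as pd

# Assumed label lists; verify them against the README.
user_labels = ["user_id", "age", "gender", "occupation", "zip_code"]
data_labels = ["user_id", "item_id", "rating", "timestamp"]

# Invented sample rows standing in for the real files.
fake_users = io.StringIO("1|24|M|technician|85711\n2|53|F|other|94043\n")
fake_data = io.StringIO("1\t10\t3\t881250949\n2\t10\t5\t891717742\n")

# header=None because the files carry no header row; names supplies the labels.
users = pd.read_csv(fake_users, sep="|", header=None, names=user_labels,
                    dtype={"zip_code": str})
ratings = pd.read_csv(fake_data, sep="\t", header=None, names=data_labels)

print(users)
print(ratings)
```

With the real files, the StringIO objects would be replaced by the file paths. Reading zip_code as a string keeps leading zeros and tolerates non-numeric codes.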
Using the dataframe you've created from
Finally, take the item numbers to which user 42 gave a rating greater than his/her mean rating. Using the data in u.item, give the titles of the movies corresponding to those item numbers.
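The lookup can be sketched on invented data as follows. The column names "user_id", "item_id", "rating" and "movie_title", and all of the rows, are assumptions for illustration; the real labels come from the README.

```python
import pandas as pd

# Hypothetical ratings and item tables (rows are invented).
ratings = pd.DataFrame({
    "user_id": [42, 42, 42, 42, 7],
    "item_id": [1, 2, 3, 4, 1],
    "rating": [5, 2, 4, 1, 3],
})
items = pd.DataFrame({
    "item_id": [1, 2, 3, 4],
    "movie_title": ["Film A", "Film B", "Film C", "Film D"],
})

# Keep only user 42's ratings, then those above that user's own mean.
user42 = ratings[ratings["user_id"] == 42]
above_mean = user42[user42["rating"] > user42["rating"].mean()]

# Look up the titles for the surviving item numbers.
liked_ids = above_mean["item_id"]
titles = items.loc[items["item_id"].isin(liked_ids), "movie_title"].tolist()

print(titles)
# ['Film A', 'Film C']
```

A merge of above_mean with items on "item_id" would give the same titles alongside the ratings.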