If you have a dataset that contains latitude and longitude data, one natural thing to do with it is put it on a map.
In this notebook, we'll look at how to put the Code Club Eats data onto a map...
The original data can be found in the Code Club Eats Google spreadsheet.
In Code Club Week 6, you saw how we could geocode the data using the Google Maps geocoder in OpenRefine:
Select Edit column -> Add column by fetching URLs, name the column something like jsondata, and then use the following GREL expression to generate the URLs: 'http://maps.googleapis.com/maps/api/geocode/json?address='+escape(value,'url')+'&sensor=False'
From the jsondata column, create two new columns (Edit column -> Add column based on this column), one called Latitude and one called Longitude (or something similar...), using the following expressions:
parseJson(value)['results'][0]['geometry']['location']['lat']
parseJson(value)['results'][0]['geometry']['location']['lng']
From the Export menu (top right), select Custom tabular exporter, select all the columns except the jsondata column, and save the file as a comma separated CSV file. (By the by, it is also possible to upload data from OpenRefine to Google spreadsheets using the Custom Tabular Exporter.) When you export the file, it should be placed into your normal Downloads directory.
import pandas as pd
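As an aside, the two OpenRefine steps above (build a URL from an address, then pick the latitude and longitude out of the JSON that comes back) have a fairly direct Python analogue. The following is only a sketch: the function names geocode_url and extract_latlng are made up for this example, the response text is hand-written rather than real geocoder output, and no request is actually sent, so it says nothing about whether the geocoder still accepts requests in this form.

```python
import json
from urllib.parse import quote_plus

def geocode_url(address):
    """Build a geocoder request URL, URL-escaping the address text."""
    base = 'http://maps.googleapis.com/maps/api/geocode/json?address='
    return base + quote_plus(address) + '&sensor=False'

# A hand-written stand-in for a geocoder response (not real API output)
sample_response = '{"results": [{"geometry": {"location": {"lat": 51.5, "lng": -0.2}}}]}'

def extract_latlng(json_text):
    """Pull the first result's latitude and longitude out of the JSON text."""
    location = json.loads(json_text)['results'][0]['geometry']['location']
    return location['lat'], location['lng']

print(geocode_url('NW10 3NB'))
print(extract_latlng(sample_response))
```

The path through the parsed data (results, then the first item, then geometry, location and lat or lng) is the same one used in the GREL expressions.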
Load the exported data file into a pandas dataframe.
#Use the path to your own file
#If the file is in the current working directory, you just need the filename
#!pwd on Mac, !cd on Windows to find the current working directory, remember...
#Note that depending on your machine settings, sometimes when you look at the file in a file browser,
## the file suffix (.csv) may not be displayed.. But it is still there and does need adding to the filename
cceats=pd.read_csv('/Users/ajh59/Downloads/Code-Club-s-Fave-Restaurants-Sheet1-csv.csv')
cceats
 | Name | Postcode | longitude | latitude | Person recommended | Type of food |
---|---|---|---|---|---|---|
0 | Sacro Cuore | NW10 3NB | -0.217232 | 51.531975 | Marcus | Pizza |
1 | Casse-Croûte | SE1 3XB | -0.081578 | 51.500636 | Timur | Bistro |
2 | Donna Margherita | SW11 5TE | -0.159551 | 51.464603 | Laia | Italian |
3 | Le Mercury | N1 1QY | -0.102773 | 51.539820 | Julia | French |
4 | Mondello | W1T 2QN | -0.135335 | 51.519695 | Danni | Italian |
5 | McDonald's | SW12 9AU | -0.151805 | 51.444311 | Andy | American |
6 | Silk Road | SE5 8TR | -0.089592 | 51.474050 | Steph | Chinese |
7 | Pedlar | SE15 4JR | -0.066710 | 51.465554 | Sam | Anything seasonal |
8 | Polpo Covent Garden | WC2E 7NA | -0.122964 | 51.510676 | Silvia | Venetian |
9 | McDonald's | WC1V 2JS | -0.117823 | 51.517905 | Tara | Fast food |
10 | Il Convivio | SW1W 9QN | -0.150446 | 51.492894 | Francesca | Italian |
11 | Al Maeda | E26DG | -0.071187 | 51.524944 | Ashraf | Turkish |
12 | Climpson's Arch | E8 3SB | -0.058345 | 51.539453 | Joe | Thai |
13 | Pizza Express | SE1 9QQ | -0.088722 | 51.506211 | Adrian | Pizza |
14 | Janetira | W1F 0SR | -0.134633 | 51.512193 | Madeleine | Thai |
15 | Pizza Express | SW17 7HR | -0.166470 | 51.442481 | Hina | Pizza |
16 | Bodean's | SW4 7SS | -0.136328 | 51.462159 | Jon | BBQ |
You may get an error when trying to load the data saying something along the lines of an ASCII codec not being able to handle UTF-8 (?!)... If so, here's the fix... (this is a Good Fragment to Remember...)
#http://stackoverflow.com/questions/3828723/why-we-need-sys-setdefaultencodingutf-8-in-a-py-script
#If you get the ascii/utf-8 error, uncomment and run the following
#import sys
#reload(sys)
#sys.setdefaultencoding("utf-8")
#NOTE - you should only need to do this once
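Note that the fix above only applies to Python 2: in Python 3 there is no sys.setdefaultencoding() and reload() is no longer a built-in function. An alternative that may be worth trying in either case is to tell pandas what encoding the file uses via the encoding parameter of read_csv(). A minimal sketch, assuming the file is UTF-8 encoded; the temporary file and its contents are made up purely for the example:

```python
import os
import tempfile
import pandas as pd

# Write a small UTF-8 encoded CSV file to a temporary location (illustrative data only)
tmp_path = os.path.join(tempfile.mkdtemp(), 'example.csv')
with open(tmp_path, 'w', encoding='utf-8') as f:
    f.write('Name,Postcode\nCasse-Croûte,SE1 3XB\n')

# State the file's encoding explicitly when loading it
example = pd.read_csv(tmp_path, encoding='utf-8')
print(example['Name'][0])
```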
To plot markers onto a map identifying each Code Club Eats location, we need to do a couple of things:
To make life easier working with maps, we can use the folium package, which contains a set of functions that do lots of heavy lifting for us and give us a simple means by which we can work with maps.
folium is not part of the standard Anaconda python distribution, so we need to install the package before we can load it.
#This should work - uncomment (remove the #) and run the following command line command:
#!pip install folium
#NOTE - you only need to install the package once.
#What the command does is fetch the package files from an online directory of python packages and then
## install them into the python distribution on your own computer.
#If it doesn't work (i.e. you get error messages...) then try to run the command directly:
## - open a terminal/command prompt
## - go to the Anaconda directory
## - go into the bin directory (which is where the pip command lives...) and run:
#pip install folium
#Having installed the package, we can load it in to the notebook
import folium
#If the package has not been installed, you will see something like:
#ImportError: No module named folium
#If you ever see that sort of error, you need to install the missing package...
#So if you try to run:
#import foo
#and see: ImportError: No module named foo
#what do you need to do?
#Try: pip install foo
To embed maps in notebooks, we need to add a couple of helper functions that allow us to insert generated maps into a notebook in a couple of ways.
From the gist at bit.ly/ccweek7code, grab the code from folium_base.py and run it in a code cell.
#Programmers share code - and reuse each others' code - all the time...
#(that's partly what libraries and packages are about).
#Copy and paste the folium_base.py code here and run the cell
from IPython.display import HTML
import folium

def inline_map(map):
    """
    Embeds the HTML source of the map directly into the IPython notebook.
    This method will not work if the map depends on any files (json data). Also this uses
    the HTML5 srcdoc attribute, which may not be supported in all browsers.
    """
    map._build_map()
    #Escape double quotes so that the map HTML can sit inside the srcdoc="..." attribute
    return HTML('<iframe srcdoc="{srcdoc}" style="width: 100%; height: 510px; border: none"></iframe>'.format(srcdoc=map.HTML.replace('"', '&quot;')))

def embed_map(map, path="map.html"):
    """
    Embeds a linked iframe to the map into the IPython notebook.
    Note: this method will not capture the source of the map into the notebook.
    This method should work for all maps (as long as they use relative urls).
    """
    map.create_map(path=path)
    return HTML('<iframe src="files/{path}" style="width: 100%; height: 510px; border: none"></iframe>'.format(path=path))
The folium_base.py code contains two functions: one embeds a map directly in the notebook (inline_map()); the other (embed_map()) generates a file, by default called map.html (can you figure out how to change that to mymap.html?), that is saved to your current working directory and then loaded into an HTML iframe in the notebook.
To create a map and pop markers on it using folium, we need to:
create a base map (folium.Map(location=[LAT, LONG])), specifying an initial central point as a list containing a latitude and a longitude, and optionally a zoom level (e.g. zoom_start=9);
add a marker to the map for each location using .simple_marker( [MARKER_LAT, MARKER_LONG] ).
cceats
fmap=folium.Map(location=[51.5, 0])
def plotmarker(row):
    fmap.simple_marker( [row['latitude'], row['longitude']] )

# "for" loops are common to a wide variety of programming languages, allowing you to do something a particular number
# of times or for each item in a list or set of things.
#The iterrows() method enables you to iterate through each row in the dataframe.
#This allows you to do something to each row in turn
#iterrows() actually returns a couple of items at each pass - the row index value, and the row values by column name
#We want to access the second of those items, the row values by column name, so count to the second item: 0,1,..
#Once we have that second item, we need to say which column value we want from the row
for row in cceats.iterrows():
    #The 'latitude' and 'longitude' names correspond to column names in the original cceats dataframe
    latlon = [ row[1]['latitude'], row[1]['longitude'] ]
    fmap.simple_marker( latlon )
embed_map(fmap)
GOTCHA - if maps stop displaying for any reason, save the notebook, shut it down, then reopen it... (i.e. switch it off and on again?!;-) Note that you will need to re-run the code cells (but won't need to install folium again from pip).
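A note on the row[1] indexing used with iterrows(): because each pass hands back a pair of items (the row index and the row values), Python lets us unpack the pair into two names in the for statement itself, which some people find easier to read. A small sketch using a two-row stand-in for the cceats dataframe (the name sample is made up for this example):

```python
import pandas as pd

# A two-row stand-in for the cceats dataframe, using values from the table above
sample = pd.DataFrame({
    'Name': ['Sacro Cuore', 'Le Mercury'],
    'latitude': [51.531975, 51.539820],
    'longitude': [-0.217232, -0.102773],
})

points = []
# Each pass yields a pair: the row index and the row values, unpacked here into two names
for index, row in sample.iterrows():
    points.append([row['latitude'], row['longitude']])

print(points)
```

Here row plays the role that row[1] plays in the loop above.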
There are always other ways of doing things... To apply a function to each row of a pandas dataframe, we can use the .apply() method applied to axis=1. For example, we can create a simple function to add a marker to a map, and then "apply" that function to each row of the cceats dataframe.
In the following example, note how we also calculate an initial central point for the map automatically. (What other strategies might you use for finding a mid-point?)
lat=cceats['latitude'].mean()
lon=cceats['longitude'].mean()
fmap=folium.Map(location=[lat, lon], zoom_start=9)
def plotmarker(row):
    fmap.simple_marker( [row['latitude'], row['longitude']] )
cceats.apply( plotmarker, axis=1)
inline_map(fmap)
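On the question of other mid-point strategies: one possibility (my suggestion, not the only answer) is to take the centre of the bounding box, that is, the point halfway between the smallest and largest latitude and halfway between the smallest and largest longitude. A sketch in plain Python, using three coordinate pairs copied from the table:

```python
# A few latitude/longitude pairs copied from the Code Club Eats table
coords = [
    (51.531975, -0.217232),
    (51.500636, -0.081578),
    (51.464603, -0.159551),
]

lats = [lat for lat, lon in coords]
lons = [lon for lat, lon in coords]

# Centre of the bounding box: halfway between the extreme values on each axis
mid_lat = (min(lats) + max(lats)) / 2
mid_lon = (min(lons) + max(lons)) / 2

print(mid_lat, mid_lon)
```

The mean gets pulled towards wherever most of the points cluster, whereas the bounding box centre depends only on the outermost points, so the two can give noticeably different map centres.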
If you click on a marker, you will notice that a popup box appears containing some not very interesting text...
We can customise the text using the popup parameter. The text we use for the popup can contain HTML tags.
fmap=folium.Map(location=[lat, lon], zoom_start=9)
for row in cceats.iterrows():
    latlon = [ row[1]['latitude'], row[1]['longitude'] ]
    fmap.simple_marker( latlon, popup='This is my <strong>label</strong>' )
inline_map(fmap)
See if you can change the popup text in the previous code cell and then regenerate the map.
Does it work as you expected?
Having the same popup text for each marker is not very interesting. What text would be more informative?
I think it could be more interesting to put the name of the eatery into the popup box. But how can we do that?
One way would be to set the popup string value to the name of the establishment. You've already seen how to get the latitude and longitude for each row, so how would you set popup equal to the name?
fmap=folium.Map(location=[lat, lon], zoom_start=9)
for row in cceats.iterrows():
    latlon = [ row[1]['latitude'], row[1]['longitude'] ]
    fmap.simple_marker( latlon, popup=row[1]['Name'] )
inline_map(fmap)
SEE IF YOU CAN CHANGE THE CODE SO THAT THE POPUP DISPLAYS THE NAME OF THE PERSON WHO RECOMMENDED THAT LOCATION.
Once you've done that, try this... How would you modify the plotmarker() function to plot the name of each eatery in the .apply() route to adding map markers?
Copy the original code cell and hack the code yourself to see if you can generate popups containing the name of the eatery using that method.
Adding the name of the eatery or person who recommended it to the popup box is one thing, but sometimes we may want to have more elaborate popup messages.
To construct complex strings that blend static content and "variable" content, we can use the "".format() string method to produce template generated sentences.
"This is my string.".format()
print("This is my {} string.".format("variable"))
print("This is my {} string{}.".format("variable",", okay?"))
print("This is my {var1} string{other}.".format(other="variable",var1=", okay?"))
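If the values you want to drop into a template are already sitting in a dict, the ** operator can hand them all to format() in one go, as long as the keys match the placeholder names. A minimal sketch; the record and template here are purely illustrative:

```python
# An illustrative record; the keys are chosen to match the placeholder names in the template
record = {'title': 'War and Peace', 'verdict': 'too long'}

template = "My view of <strong>{title}</strong>: {verdict}."

# ** unpacks the dict so that each key is passed to format() as a named argument
sentence = template.format(**record)
print(sentence)
```

Note that the template can contain HTML tags, which is handy when the finished string is going to end up in a popup.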
So how might you construct some (templated) popup text that says something taking the form: Wagamama (suggested by Sam).
#Stuck? HINT: popup='Name: {name}'.format(name=row[1]['Name'])
At the current time, a map created using embed_map() and saved to an HTML file (eg as map.html) expects to be viewed via a webserver. If you share the map file with someone (eg by emailing it to them) and they just double click on it to load it into a browser, the markers won't display because certain files used to display the markers can't be found (the browser looks for them on your computer rather than on the web).
The following hack will patch the file so that the files can be found...
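def patcher(fn='map.html'):
    #Read in the HTML of the saved map file
    with open(fn,'r') as f:
        html=f.read()
    #Rewrite links that start with "// so that they start with "http:// instead
    html=html.replace('"//','"http://')
    #Write the patched HTML back to the same file
    with open(fn,'w') as f:
        f.write(html)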
#Run the patcher - by default, the file we look for is map.html in the current working directory
patcher()
#You should now be able to double click on the file to open and view it correctly in your browser, share it by email etc
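To see what the patch is actually doing, it can help to try the same replacement on a single line of HTML. Addresses that start with // borrow the scheme (http:, file: and so on) of the page they appear in, which is why they go astray when the page is opened straight from disk. The line below is a made-up example of that kind of link, not one copied from a real map file:

```python
# A hypothetical line of the kind found in a saved map file: the address starts with //
before = '<script src="//cdn.example.com/library.js"></script>'

# The same replacement the patcher applies to the whole file
after = before.replace('"//', '"http://')
print(after)
```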
The Food Standards Agency (FSA) collates food hygiene ratings for all food related establishments in the UK. Its site allows you to search for establishments by name and postcode, and displays the rating for that location. The FSA also publishes an API that allows machines to get hold of the same information in a machine readable way.
In Python, the requests library provides a range of tools that make it easy to call an API and pull data (and web pages) down from the web.
import requests
#http://api.ratings.food.gov.uk/help
#http://docs.python-requests.org/en/latest/user/quickstart/
params={'name':"McDonald's",'address':'SW12 9AU'}
#Header values need to be passed as strings, hence "2" rather than 2
r=requests.get('http://api.ratings.food.gov.uk/Establishments',
               headers={"x-api-version":"2"},
               params=params)
r.content
The data is returned as JSON - the same sort of stuff we got back from the Google geocoding API. Just as we could parse that data in OpenRefine, we can do much the same in Python.
In this case, the json library has the tools we need to parse the JSON into a Python dict.
import json
j=json.loads(r.content)
j
j['establishments'][0]['BusinessName']
j['establishments'][0]['geocode']['latitude']
How would you pull out the postcode?
How would you pull out the Hygiene score?
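One thing to watch out for with the [0] indexing used above: if a search matches nothing, there is no first item to pull out and Python will raise an IndexError. The sketch below assumes that a search with no matches comes back with an empty establishments list (I haven't confirmed that against the API), and both responses are hand-written stand-ins rather than real FSA data. The function name first_business_name is made up for this example.

```python
import json

# Hand-written stand-ins for two API responses (not real FSA data):
# one with a single match and one with no matches at all
one_match = json.loads('{"establishments": [{"BusinessName": "Example Cafe"}]}')
no_match = json.loads('{"establishments": []}')

def first_business_name(parsed):
    """Return the first establishment's name, or None if the list is empty."""
    establishments = parsed['establishments']
    if not establishments:
        return None
    return establishments[0]['BusinessName']

print(first_business_name(one_match))
print(first_business_name(no_match))
```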
Let's have a go at trying to pull down the food ratings scores for Code Club Eateries. We could annotate the data we got from OpenRefine, but instead - noting that the FSA data includes latitude and longitude co-ordinates - let's just work from the original data.
To grab a Google Spreadsheet file as a CSV file, use the URL pattern (SHEETNUMBER starts at 0):
url='https://docs.google.com/a/okfn.org/spreadsheets/d/1M14S4hqG4F5P8H78VdOMMeoITOPBpVZEGoiCvXEFBQg/export?gid=0&format=csv'
cceats_google=pd.read_csv(url)
cceats_google
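If you find yourself grabbing more than one sheet, the URL can be built from a template rather than typed out each time. The sketch below infers the pattern from the URL above, assuming that the long string after /d/ is the spreadsheet key and that SHEETNUMBER is the gid value; the function name sheet_csv_url is made up for this example.

```python
def sheet_csv_url(key, sheetnumber=0):
    """Build a CSV export URL for a Google spreadsheet (pattern inferred from the URL above)."""
    pattern = 'https://docs.google.com/a/okfn.org/spreadsheets/d/{key}/export?gid={gid}&format=csv'
    return pattern.format(key=key, gid=sheetnumber)

url = sheet_csv_url('1M14S4hqG4F5P8H78VdOMMeoITOPBpVZEGoiCvXEFBQg')
print(url)
```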
#Be lazy... we can turn the original example of calling the FSA website into a function
def getFoodRatingData(name,address):
    params={'name':name,'address':address}
    #Header values need to be passed as strings, hence "2" rather than 2
    r=requests.get('http://api.ratings.food.gov.uk/Establishments',
                   headers={"x-api-version":"2"},
                   params=params)
    return r
tmp=getFoodRatingData("Mcdonald's","SW12 9AU")
tmp.content
tmp=getFoodRatingData('Sacro Cuore','NW10 3NB')
tmp.content
Whilst we could manually grab the data for each establishment, that's not the lazy way. If you find yourself repeating yourself, let some code take the strain and automate the hassle away...
We're going to do that in a couple of ways:
#The pandas DataFrame .append() method can be used to add a python dict to a dataframe
stuff={'book':'War and Peace', 'opinion':'too long'}
df_tmp = pd.DataFrame()
df_tmp = df_tmp.append(stuff,ignore_index=True)
df_tmp
df_tmp=df_tmp.append({'opinion':'cracking read','book':'Flash Boys'},ignore_index=True)
df_tmp
We can also append one dataframe onto the end of another:
df_tmp.append(df_tmp)
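A word of warning if you are running a recent version of pandas: the DataFrame .append() method was deprecated and then removed in pandas 2.0, so the cells above will raise an error there. The usual replacement is pd.concat(), which joins a list of dataframes end to end. A sketch of the same steps done that way (the names df_new, more and doubled are made up for this example):

```python
import pandas as pd

stuff = {'book': 'War and Peace', 'opinion': 'too long'}

# Wrap the dict in a list to make a one-row dataframe
df_new = pd.DataFrame([stuff])

# Add a further row by concatenating another one-row dataframe
more = pd.DataFrame([{'opinion': 'cracking read', 'book': 'Flash Boys'}])
df_new = pd.concat([df_new, more], ignore_index=True)

# Dataframes can be joined end to end in the same way
doubled = pd.concat([df_new, df_new], ignore_index=True)
print(doubled)
```

The same swap should work wherever .append() is used later in this notebook: wrap the two things being joined in a list and pass them to pd.concat().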
We can parse the JSON data we got back from the FSA into a Python dict.
#Let's parse a json response from the FSA API as a function
def parseFoodRatingData(jdata):
    df=pd.DataFrame()
    #The FSA return a list of establishments, though the list may only contain one establishment
    #Generate one row per establishment we get back
    for establishment in jdata['establishments']:
        #Create an empty dict to hold the data we want from the FSA API
        info={}
        #Here are some of the data items I want
        for item in ['BusinessName','FHRSID','PostCode','RatingValue','RatingDate']:
            #Take those items from the data returned from the FSA and put them into my 'useful data' dict
            info[item]= establishment[item]
        #We can also iterate through the items contained in nested elements of the FSA data dict
        for item in establishment['geocode']:
            #..that is, the latitude and longitude elements...
            info[item]= establishment['geocode'][item]
        for item in establishment['scores']:
            #..and here we grab the individual score components
            info[item]= establishment['scores'][item]
        #Now use the data we grabbed as the basis for a dataframe row
        df=df.append(info,ignore_index=True)
    return df
#Try the function out on the data we parsed earlier (we called it j)
parseFoodRatingData(j)
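The heart of the function above is flattening: taking a record that has dicts nested inside it and producing a single flat dict that can become one dataframe row. Here is the same idea in isolation, as a sketch using the dict .update() method. The record is hand-written to have the same general shape as an FSA establishment, and all the values in it are made up.

```python
# A hand-written record shaped like one FSA establishment (all values are made up)
establishment = {
    'BusinessName': 'Example Cafe',
    'PostCode': 'AB1 2CD',
    'RatingValue': '5',
    'geocode': {'latitude': '51.5', 'longitude': '-0.1'},
    'scores': {'Hygiene': 0, 'Structural': 5, 'ConfidenceInManagement': 0},
}

# Start with the top-level items we want to keep
info = {key: establishment[key] for key in ['BusinessName', 'PostCode', 'RatingValue']}

# Fold the items from the nested dicts into the same flat dict
info.update(establishment['geocode'])
info.update(establishment['scores'])

print(sorted(info))
```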
#Let's simplify further - create another function that:
#-- gets the data from the FSA website
#-- parses it
#--returns it as a dataframe
def getAndParseFoodRatingData(name,address):
    r=getFoodRatingData(name,address)
    jdata=json.loads(r.content)
    df=parseFoodRatingData(jdata)
    return df
getAndParseFoodRatingData('Sacro Cuore','NW10 3NB')
Now it's time to construct a dataset that contains FSA information for each of the Code Club eateries. Can you think of a way to do that?
One way might be to iterate through each item in the cceats_google dataframe and build a dataframe (that starts out empty) using dataframes generated from the FSA data.
#TRY HACKING SOMETHING TO SEE IF YOU CAN FIGURE OUT A WAY OF DOING IT..
#Here's one solution
#Create a dummy dataframe to put stuff into
cceats_fsa=pd.DataFrame()
#Iterate through each eatery in the data we grabbed from the Google spreadsheet
for place in cceats_google.iterrows():
    #Using the name and postcode, grab the FSA rating for that establishment and add it to the growing cceats_fsa dataframe
    cceats_fsa=cceats_fsa.append(getAndParseFoodRatingData(place[1]['Name'],place[1]['Postcode']),ignore_index=True)
cceats_fsa
Fortuitously, the postcodes in the original dataset are all unique, and as strings they are also exactly the same as the ones returned from the FSA. This means we can use them as unique identifiers to merge the two datasets (the original one from the Google spreadsheet, the other constructed from the FSA data).
cceats_bigdata=pd.merge(cceats_fsa,cceats_google,left_on='PostCode',right_on='Postcode')
cceats_bigdata
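When the keys do not line up quite so neatly, it is useful to know which rows failed to find a partner. By default pd.merge() quietly drops them; asking for an outer merge with indicator=True keeps them and adds a _merge column saying where each row came from. A sketch with two tiny stand-in dataframes (the postcodes and names are taken from the table earlier, the rating values are made up):

```python
import pandas as pd

# Two tiny stand-in dataframes; note the key column is spelled differently in each
left = pd.DataFrame({'PostCode': ['NW10 3NB', 'SE1 3XB'], 'RatingValue': ['5', '4']})
right = pd.DataFrame({'Postcode': ['NW10 3NB', 'N1 1QY'], 'Name': ['Sacro Cuore', 'Le Mercury']})

# Default (inner) merge: only postcodes present in both dataframes survive
inner = pd.merge(left, right, left_on='PostCode', right_on='Postcode')

# Outer merge with indicator=True adds a _merge column showing where each row came from
outer = pd.merge(left, right, left_on='PostCode', right_on='Postcode',
                 how='outer', indicator=True)

print(inner)
print(outer)
```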
#Display the dataframe using reordered columns
cceats_bigdata[['FHRSID','BusinessName','Person recommended','Type of food','PostCode','RatingDate','RatingValue','latitude','longitude',
                'Structural','Hygiene','ConfidenceInManagement']]
Using this dataset, see if you can generate a map with popup markers that display the name of the establishment, the name of the person who recommended it, the food type, the rating value and (in brackets) the last inspection date. Remember, the popup text can include HTML, so if you know HTML, you should be able to add in things like line breaks (<br/>) or emphasis, eg using <strong></strong> to display the establishment name using strong emphasis (that is, in a bold font).