#!/usr/bin/env python
# coding: utf-8

# # Mobile Data Explorer
#
# Whilst the original interactive application for exploring the data collected from Malte Spitz's mobile phone is no longer available, the original data *does* still seem to be available as a public Google spreadsheet, linked from articles such as the contemporary *Flowing Data* blog post [Tell-all telephone reveals politician’s life](https://flowingdata.com/2011/03/30/tell-all-telephone-reveals-politicians-life/):
#
# https://docs.google.com/spreadsheets/d/1PMjIkymwzYNGhENCi9BZst63H-UPagYgPO6DwHVdskU/edit?authkey=COCjw-kG&hl=en_GB&hl=en_GB&authkey=COCjw-kG#gid=0
#
# Google spreadsheets also expose the data as a CSV (comma-separated values) data file, which we can access by rewriting the URL in the form:
#
# `https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv`
#
# This notebook describes a DIY solution to loading and exploring the data. At the end of the notebook, follow the activity to explore an embedded, interactive map view of what Spitz was doing on a particular date.
#
# If you would like to explore the data for a different period, you can download this notebook, upload it to a "live" Jupyter notebook environment, and run the code cells using a date range of your own choosing.
#
# __*NOTE: you can jump straight to the embedded map activity at the end of this notebook without reading through the code that generated the map.
# However, you may find it instructive to see how the map was created.*__

# In[16]:


# Data URL
sheet_id = "1PMjIkymwzYNGhENCi9BZst63H-UPagYgPO6DwHVdskU"
data_url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv"


# We can load the data from the Google spreadsheet into a two-dimensional data structure using the powerful *pandas* Python package:

# In[18]:


import pandas as pd

# Preview just the first few rows of the data
pd.read_csv(data_url, nrows=5)


# Let's tidy things up a little, firstly by parsing the start and end times associated with each location record as we load the complete dataset into a *pandas* dataframe object:

# In[19]:


# The spreadsheet uses German column names: "Beginn" (start) and "Ende" (end)
start = "Beginn"
end = "Ende"

df = pd.read_csv(data_url, parse_dates=[start, end])
df.head(5)


# We can review how many data points are available by checking the number of rows in, which is to say the *length* of, the dataframe:

# In[72]:


len(df)


# As we are particularly interested in data points at a particular time and place, we can drop rows where either location coordinate is null, or where both the start and end times are null:

# In[21]:


# "Laenge" is the longitude column and "Breite" is the latitude column
lon = "Laenge"
lat = "Breite"

# Drop rows that are missing either coordinate
df = df.dropna(how="any", subset=[lat, lon])
# Drop rows that are missing both the start and the end time
df = df.dropna(how="all", subset=[start, end])


# We can create a map using the Python `folium` package.
#
# A function is defined that adds a marker to the map based on the data contained within a particular row in the data table. This function can be applied to multiple rows in the table to generate a map showing location markers associated with the points defined by the data rows.

# In[73]:


import folium


def add_marker(row, m, color='red'):
    """Add a circle marker for a single data row to the map m."""
    folium.Circle(
        location=[row[lat], row[lon]],
        color=color,
        # Popup text is rendered as HTML; <br> puts the end time on a new line
        popup=f"From: {row[start]}<br>To: {row[end]}",
        radius=50,
        fill=True,
        fill_opacity=1.0,
    ).add_to(m)


# We can now plot a map showing the location of the phone at different points in time.
#
# For example, because we have parsed the dates as `datetime` objects, we can select rows of data within a particular time range. Suppose we want to look at data from the start of October 1st until mid-afternoon on October 2nd, 2009. We can create datetime objects for the start and end of the period we are interested in:

# In[58]:


time_from = pd.to_datetime("October 1st, 2009")
time_to = pd.to_datetime("October 2nd, 2009, 3pm")

time_from, time_to


# We can filter the data to include just those records where the start time falls in that period:

# In[59]:


# Boolean mask; between() includes both endpoints by default
filtered_period = df[start].between(time_from, time_to)
df[filtered_period]


# To center the map, let's find the "average" location, here taken as the median latitude and longitude:

# In[60]:


# Convert to a plain [latitude, longitude] list for folium
AVERAGE_LOCATION = df[filtered_period][[lat, lon]].median().tolist()
AVERAGE_LOCATION


# Now we can create a *folium* map object, add the data to it and display it:

# In[61]:


# Create map
m = folium.Map(location=AVERAGE_LOCATION,
               # Specify some map properties
               width=500, height=800)

# Add markers for the filtered rows
df[filtered_period].apply(add_marker, m=m, axis=1)

# Display map
m


# ## Activity: On This Day...
#
# The following code fragment specifies the time range over which we might wish to inspect the activity of Malte Spitz.
#
# Hint: note that 12pm is parsed as 12 noon.

# In[75]:


time_from = pd.to_datetime("October 2nd, 2009, 6pm")
# A colon between hours and minutes is unambiguous for the date parser
time_to = pd.to_datetime("October 2nd, 2009, 11:59pm")


# Running the following cell generates a map of the selected data points. Use the zoom controls to modify the map view. Click on a red marker to pop up a label showing the times at which Spitz was at that location.

# In[83]:


# Before running this cell,
# ensure that all cells above this cell have been run
# You can do this from the notebook Cell menu

filtered_period = df[start].between(time_from, time_to)

# Median position as a plain [latitude, longitude] list for folium
AVERAGE_LOCATION = df[filtered_period][[lat, lon]].median().tolist()

m = folium.Map(location=AVERAGE_LOCATION,
               # Specify some map properties
               width=500, height=800, zoom_start=7)

# Add markers for the filtered rows
df[filtered_period].apply(add_marker, m=m, axis=1)

# Display map
m


# ### Discussion
#
# With the range set for the evening of October 2nd, 2009, between 6pm and 11.59pm, what sort of pattern do the location markers make? What does it suggest if consecutive markers are widely spaced? What does it mean if they are closely grouped? If you zoom the map in on a set of closely grouped markers, is there anything revealing about the location as displayed on the map?
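
# The spacing between consecutive markers can also be put into numbers. The cell below is a minimal sketch rather than part of the original analysis: it uses the haversine formula to estimate the great-circle distance between consecutive location fixes, along with the average speed each gap would imply. The names `haversine_km`, `consecutive_legs` and `sample_fixes` are hypothetical, and the sample coordinates are made up for illustration rather than taken from the spreadsheet.

# In[ ]:


from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (an approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def consecutive_legs(fixes):
    """Return (distance_km, speed_kmh) for each pair of consecutive fixes."""
    legs = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(fixes, fixes[1:]):
        km = haversine_km(la1, lo1, la2, lo2)
        hours = (t2 - t1).total_seconds() / 3600
        # Guard against two fixes that share the same timestamp
        speed = km / hours if hours > 0 else float("nan")
        legs.append((km, speed))
    return legs


# Hypothetical (time, latitude, longitude) fixes, NOT taken from the dataset
sample_fixes = [
    (datetime(2009, 10, 2, 18, 0), 52.52, 13.405),
    (datetime(2009, 10, 2, 18, 30), 52.52, 13.405),
    (datetime(2009, 10, 2, 20, 30), 53.5511, 9.9937),
]

for km, kmh in consecutive_legs(sample_fixes):
    print(f"{km:6.1f} km at roughly {kmh:5.1f} km/h")


# To try this on the real data, the `(time, latitude, longitude)` tuples could be built from the `Beginn`, `Breite` and `Laenge` columns of the filtered rows, sorted by start time. A near-zero distance suggests the phone stayed put, whereas a large distance covered in a short time suggests travel.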