
Processing GPS data in Python

Last updated: Thu Nov 21 22:57:12 GMT 2013.

Source code for this notebook is at: https://gist.github.com/MHenderson/6279740

The notebook itself can be viewed at: http://nbviewer.ipython.org/6279740

Computing distances between points

Using pyproj (http://code.google.com/p/pyproj/) we implement four functions for convenience.

distance(lat1, lng1, lat2, lng2) computes the distance in metres between the points (lat1, lng1) and (lat2, lng2), measured on the WGS84 ellipsoid (http://en.wikipedia.org/wiki/World_Geodetic_System).
distance_between(p1, p2) computes the same distance when the arguments p1, p2 are (latitude, longitude) pairs.
nearest_mile(distance_in_metres) converts a distance in metres to a whole number of miles; because it uses int(), the result is rounded down rather than to the nearest mile.
total_distance(points) calculates the total length of the path through a list of points, i.e. the sum of the distances between consecutive points.

In [1]:

import pyproj

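def distance(lat1, lng1, lat2, lng2, ellps='WGS84'):
    # Geodesic distance in metres on the given ellipsoid.
    # Geod.inv takes longitudes before latitudes and returns
    # (forward azimuth, back azimuth, distance).
    g = pyproj.Geod(ellps=ellps)
    return g.inv(lng1, lat1, lng2, lat2)[2]

def distance_between(p1, p2):
    # p1 and p2 are (latitude, longitude) pairs.
    return distance(p1[0], p1[1], p2[0], p2[1])

def nearest_mile(distance_in_metres):
    # Metres -> kilometres -> miles; int() truncates the fractional part.
    return int(0.621371*distance_in_metres/1000)

def total_distance(points):
    # Sum the distance from each point to the next one along the list.
    return sum(map(distance_between, points[:-1], points[1:]))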
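
pyproj wraps a compiled library and is not always installed. As a rough, dependency-free cross-check of distance, the sketch below uses the haversine formula instead. This swaps in a different model: it treats the Earth as a sphere with an assumed mean radius of 6371 km rather than the WGS84 ellipsoid, so its results differ slightly (typically by well under 1%) from pyproj's. The names haversine_distance and EARTH_RADIUS_M are illustrative, not part of pyproj, and the sketch is written for Python 3.

```python
import math

# Assumed mean Earth radius in metres (spherical model, not WGS84).
EARTH_RADIUS_M = 6371000.0

def haversine_distance(lat1, lng1, lat2, lng2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against tiny floating-point excursions above 1.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))

nottingham = (52.9548, -1.1581)
louisville = (38.253284, -85.758786)
metres = haversine_distance(*(nottingham + louisville))
print(int(0.621371 * metres / 1000))  # same metres-to-miles factor as nearest_mile
```

For Nottingham and Louisville this should land within a few tens of miles of the pyproj result; the spherical model is fine for sanity checks but is not a substitute for the ellipsoidal calculation.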

So now, for example, if we know that Nottingham, England has latitude and longitude (52.9548, -1.1581) and Louisville, Kentucky has latitude and longitude (38.253284, -85.758786), then we can compute the geodesic distance between those two points (the ellipsoidal counterpart of the great-circle distance) by using the distance_between function.

In [2]:

p1 = (52.9548, -1.1581)        # Nottingham, England
p2 = (38.253284, -85.758786)   # Louisville, KY
print("Distance (to the nearest mile): " + str(nearest_mile(distance_between(p1, p2))))

Distance (to the nearest mile): 3976
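
Note that nearest_mile truncates rather than rounds: int() drops the fractional part, so a distance of 3976.9 miles would be reported as 3976. If true rounding is wanted, a variant like the hypothetical rounded_miles below uses round() instead. The sketch reuses the conversion factor of 0.621371 miles per kilometre from nearest_mile.

```python
MILES_PER_KM = 0.621371  # same factor as nearest_mile

def truncated_miles(distance_in_metres):
    # Behaves like nearest_mile: int() discards the fractional part.
    return int(MILES_PER_KM * distance_in_metres / 1000)

def rounded_miles(distance_in_metres):
    # Rounds to the nearest whole mile instead.
    return int(round(MILES_PER_KM * distance_in_metres / 1000))

d = 1609.0  # just under one mile (1 mile is 1609.344 m)
print(truncated_miles(d), rounded_miles(d))
```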

Working with CSV files

The data we are given is in CSV format (http://en.wikipedia.org/wiki/Comma-separated_values). Each row records the GPS position (in the columns headed latitude and longitude) of a specific van (van_id) at a specific time (timestamp), along with other information such as the address, speed and heading. To read a CSV file in Python we use the standard library module csv, whose DictReader class presents each row as a dictionary keyed by column heading. To instantiate a DictReader we provide an open file object for the CSV file and, optionally, a list of column headings; if the list is omitted, the headings are taken from the first row of the file.
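
When the first row of a CSV file already holds the column headings, DictReader can read them itself if no list of headings is passed, which avoids both the hard-coded label list and manually skipping the heading row. A minimal sketch, written for Python 3, with a small made-up sample (not the real van data) standing in for the file:

```python
import csv
import io

# Made-up sample with a heading row; the real file has more columns.
sample = io.StringIO(
    "id,van_id,timestamp,latitude,longitude\n"
    "1,7,2013-11-01 08:00:00,52.9548,-1.1581\n"
    "2,7,2013-11-01 08:05:00,52.9600,-1.1500\n"
)

# With no fieldnames argument, DictReader takes the keys from the first row.
reader = csv.DictReader(sample)
rows = list(reader)
print(reader.fieldnames)
print(rows[0]["latitude"], rows[0]["van_id"])
```

Every value comes back as a string, so numeric columns such as latitude and longitude still need converting with float().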

In [3]:

import os

data_dir_path = '/home/matthew/workspace/resources/G/Geographical Information Science/'
van_activity_csv_filename = 'gps-activity.csv'
van_activity_csv_path = os.path.join(data_dir_path, van_activity_csv_filename)
labels = ['id', 'van_id', 'timestamp', 'latitude', 'longitude', 'type', 'address', 'speed', 'heading', 'created']

With this information we can create our DictReader object:

In [4]:

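import csv

# Python 2's csv module expects the file in binary mode ('rb');
# under Python 3, use open(van_activity_csv_path, newline='') instead.
csv_file = open(van_activity_csv_path, 'rb')
van_activity_reader = csv.DictReader(csv_file, labels, delimiter=',', quotechar='"')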

The keyword arguments delimiter and quotechar can be customised, for example to read tab-separated values.
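
For instance, a tab-separated file could be read by passing a tab as the delimiter. The sketch below, written for Python 3, uses a made-up one-row sample rather than the real data:

```python
import csv
import io

# Made-up tab-separated sample; the comma inside the address needs no quoting.
tsv = io.StringIO(
    "van_id\tlatitude\tlongitude\taddress\n"
    "7\t52.9548\t-1.1581\t1 Example Street, Nottingham\n"
)

reader = csv.DictReader(tsv, delimiter="\t", quotechar='"')
row = next(reader)
print(row["address"])
```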

We immediately advance van_activity_reader past the first row, because that row holds the headings and we don't want to do any calculations with it. After that we build a list of points by iterating over the remaining rows of the data.

In [5]:

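next(van_activity_reader)  # skip the heading row
points = []
for van_activity in van_activity_reader:
    # csv returns strings, so convert the coordinates to floats for pyproj.
    points.append((float(van_activity['latitude']), float(van_activity['longitude'])))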
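
Real GPS logs sometimes contain blank or malformed coordinates, which would make a plain float() conversion fail part-way through the file. A slightly more defensive way to build the list of points is sketched below for Python 3. The helper name parse_points, and the choice to silently skip rows that do not parse or fall outside the valid coordinate ranges, are assumptions for illustration, not part of the original analysis.

```python
import csv
import io

def parse_points(rows):
    """Return (latitude, longitude) float pairs, skipping rows that do not parse."""
    points = []
    for row in rows:
        try:
            lat = float(row["latitude"])
            lng = float(row["longitude"])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, missing value, or non-numeric text
        # Discard values outside the valid latitude/longitude ranges.
        if -90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0:
            points.append((lat, lng))
    return points

sample = io.StringIO(
    "latitude,longitude\n"
    "52.9548,-1.1581\n"
    ",\n"
    "not-a-number,-1.15\n"
    "52.9600,-1.1500\n"
)
print(parse_points(csv.DictReader(sample)))
```

Whether skipping is right depends on the analysis: for anomaly detection, counting the skipped rows may itself be a useful signal.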

In [6]:

len(points)

Out[6]:

The ultimate task is to inspect the data for anomalies. The vans should follow the same routes day after day and so should produce broadly similar data every day. We want to find features of the data that let us recognise automatically whether a van's activity is anomalous. To start with, we look at the total distance travelled. Note that points has not yet been split by van or by day, so the calculation below sums over every row in the file.

In [7]:

print("Distance (to the nearest mile): " + str(nearest_mile(total_distance(points))))

Distance (to the nearest mile): 937658
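
The figure above sums over every consecutive pair of rows in the file, regardless of which van or day they belong to, so it can include jumps between points from different vans or days. A sketch of per-van, per-day totals, plus a simple way to flag unusual days, is below (Python 3). It assumes, hypothetically, that timestamp strings begin with an ISO date (YYYY-MM-DD), so the first ten characters identify the day and string order is chronological; the real timestamp format is not shown here. The distance function is passed in (for real data, distance_between), and daily_distances, flag_anomalies and the 50% tolerance are illustrative choices.

```python
from collections import defaultdict
from statistics import median

def daily_distances(rows, dist):
    """Total distance per (van_id, day), summing consecutive points in time order.

    Assumes each row is a dict with 'van_id', 'timestamp', 'latitude' and
    'longitude', and that timestamps start with an ISO 'YYYY-MM-DD' date.
    """
    tracks = defaultdict(list)
    for row in rows:
        day = row["timestamp"][:10]  # hypothetical: ISO date prefix
        point = (float(row["latitude"]), float(row["longitude"]))
        tracks[(row["van_id"], day)].append((row["timestamp"], point))
    totals = {}
    for key, track in tracks.items():
        track.sort()  # ISO timestamps sort chronologically as strings
        pts = [p for _, p in track]
        totals[key] = sum(dist(a, b) for a, b in zip(pts, pts[1:]))
    return totals

def flag_anomalies(totals, tolerance=0.5):
    """Return (van_id, day) keys whose total differs from that van's median
    daily total by more than `tolerance` (a fraction; 0.5 means 50%)."""
    by_van = defaultdict(list)
    for (van, _), total in totals.items():
        by_van[van].append(total)
    flagged = []
    for (van, day), total in sorted(totals.items()):
        typical = median(by_van[van])
        if typical > 0 and abs(total - typical) > tolerance * typical:
            flagged.append((van, day))
    return flagged

# Toy rows for one van over three days; the third day's route is much longer.
toy = [
    {"van_id": "7", "timestamp": "2013-11-01 08:00", "latitude": "0", "longitude": "0"},
    {"van_id": "7", "timestamp": "2013-11-01 09:00", "latitude": "0", "longitude": "1"},
    {"van_id": "7", "timestamp": "2013-11-02 08:00", "latitude": "0", "longitude": "0"},
    {"van_id": "7", "timestamp": "2013-11-02 09:00", "latitude": "0", "longitude": "1"},
    {"van_id": "7", "timestamp": "2013-11-03 08:00", "latitude": "0", "longitude": "0"},
    {"van_id": "7", "timestamp": "2013-11-03 09:00", "latitude": "0", "longitude": "5"},
]

def manhattan(a, b):
    # Stand-in distance for the toy data only; use distance_between for real points.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

totals = daily_distances(toy, manhattan)
print(flag_anomalies(totals))  # [('7', '2013-11-03')]
```

Comparing each day against the same van's median, rather than a fixed threshold, keeps the check meaningful for vans whose normal routes differ greatly in length.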
