This is a featured recipe from the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
In this featured recipe, we create a simple GPS-like route planner in Python. We retrieve California's road network data from the United States Census Bureau in order to find shortest paths and compute road itineraries between any two locations in California.
For this recipe, you need IPython, NumPy, Pandas, matplotlib.
You also need NetworkX and GDAL/OGR. On Windows, you can find binary installers on Chris Gohlke's webpage.
At the time of this writing, NetworkX's support of Shapefile doesn't seem to be compatible with Python 3.x. For this reason, this recipe has only been successfully tested with Python 2.x.
You also need Smopy and Pillow for displaying OpenStreetMap maps:
!pip install smopy
Finally, you need to download the Road dataset on the book's website an extract it in the current folder. The data comes from the United States Census Bureau website.
import networkx as nx
import numpy as np
import pandas as pd
import json
import smopy
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = mpl.rcParams['savefig.dpi'] = 300
read_shp
function returns a graph, where each node is a geographical position, and each edge contains information about the road linking the two nodes.g = nx.read_shp("data/tl_2013_06_prisecroads.shp")
connected_component_subgraphs
function.sg = list(nx.connected_component_subgraphs(g.to_undirected()))[0]
len(sg)
464
pos0 = (36.6026, -121.9026)
pos1 = (34.0569, -118.2427)
def get_path(n0, n1):
"""If n0 and n1 are connected nodes in the graph, this function
return an array of point coordinates along the road linking
these two nodes."""
return np.array(json.loads(sg[n0][n1]['Json'])['coordinates'])
EARTH_R = 6372.8
def geocalc(lat0, lon0, lat1, lon1):
"""Return the distance (in km) between two points in
geographical coordinates."""
lat0 = np.radians(lat0)
lon0 = np.radians(lon0)
lat1 = np.radians(lat1)
lon1 = np.radians(lon1)
dlon = lon0 - lon1
y = np.sqrt(
(np.cos(lat1) * np.sin(dlon)) ** 2
+ (np.cos(lat0) * np.sin(lat1)
- np.sin(lat0) * np.cos(lat1) * np.cos(dlon)) ** 2)
x = np.sin(lat0) * np.sin(lat1) + \
np.cos(lat0) * np.cos(lat1) * np.cos(dlon)
c = np.arctan2(y, x)
return EARTH_R * c
def get_path_length(path):
return np.sum(geocalc(path[1:,0], path[1:,1],
path[:-1,0], path[:-1,1]))
distance
attribute of the edges.# Compute the length of the road segments.
for n0, n1 in sg.edges_iter():
path = get_path(n0, n1)
distance = get_path_length(path)
sg.edge[n0][n1]['distance'] = distance
nodes = np.array(sg.nodes())
# Get the closest nodes in the graph.
pos0_i = np.argmin(np.sum((nodes[:,::-1] - pos0)**2, axis=1))
pos1_i = np.argmin(np.sum((nodes[:,::-1] - pos1)**2, axis=1))
shortest_path
function to compute the shortest path between our two positions. We specify that the weight of every edge is the length of the road between them.# Compute the shortest path.
path = nx.shortest_path(sg,
source=tuple(nodes[pos0_i]),
target=tuple(nodes[pos1_i]),
weight='distance')
len(path)
19
path
variable contains the list of edges that form the shortest path between our two positions. Now, we can get information about the itinerary with Pandas. The dataset has a few fields of interest, including the name and type (State, Interstate, etc.) of the roads.roads = pd.DataFrame([sg.edge[path[i]][path[i + 1]]
for i in range(len(path) - 1)],
columns=['FULLNAME', 'MTFCC',
'RTTYP', 'distance'])
roads
FULLNAME | MTFCC | RTTYP | distance | |
---|---|---|---|---|
0 | State Rte 1 | S1200 | S | 100.658130 |
1 | State Rte 1 | S1200 | S | 33.419556 |
2 | Cabrillo Hwy | S1200 | M | 4.399051 |
3 | State Rte 1 | S1200 | S | 12.400382 |
4 | Cabrillo Hwy | S1200 | M | 36.693272 |
5 | Cabrillo Hwy | S1200 | M | 0.017746 |
6 | Cabrillo Hwy | S1200 | M | 0.439355 |
7 | Cabrillo Hwy | S1200 | M | 0.130107 |
8 | State Hwy 1 | S1200 | S | 0.007007 |
9 | el Camino Real | S1200 | M | 5.774056 |
10 | el Camino Real | S1200 | M | 0.507131 |
11 | el Camino Real | S1200 | M | 33.550742 |
12 | US Hwy 101 | S1200 | U | 140.786519 |
13 | US Hwy 101 | S1200 | U | 75.852281 |
14 | Ventura Fwy | S1200 | M | 49.045475 |
15 | Hollywood Fwy | S1200 | M | 0.885826 |
16 | Hollywood Fwy | S1200 | M | 14.087603 |
17 | Hollywood Fwy | S1200 | M | 0.010107 |
Here is the total length of this itinerary.
roads['distance'].sum()
508.66434555909808
map = smopy.Map(pos0, pos1, z=7, margin=.1)
def get_full_path(path):
"""Return the positions along a path."""
p_list = []
curp = None
for i in range(len(path)-1):
p = get_path(path[i], path[i+1])
if curp is None:
curp = p
if np.sum((p[0]-curp)**2) > np.sum((p[-1]-curp)**2):
p = p[::-1,:]
p_list.append(p)
curp = p[-1]
return np.vstack(p_list)
linepath = get_full_path(path)
x, y = map.to_pixels(linepath[:,1], linepath[:,0])
plt.figure(figsize=(6,6));
map.show_mpl();
# Plot the itinerary.
plt.plot(x, y, '-k', lw=1.5);
# Mark our two positions.
plt.plot(x[0], y[0], 'ob', ms=10);
plt.plot(x[-1], y[-1], 'or', ms=10);
We computed the shortest path with NetworkX's shortest_path
function. Here, this function used Dijkstra's algorithm. This algorithm has a wide variety of applications, for example in network routing protocols.
There are different ways to compute the geographical distance between two points. Here, we used a relatively precise formula: the orthodromic distance (also called great-circle distance), which assumes that the Earth is a perfect sphere. We could also have used a simpler formula since the distance between two successive points on a road is small.
You can find more information about shortest path problems and Dijkstra's algorithm in the following references:
Here are a few references about geographical distances:
This was a featured recipe from the IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014.