Shapely is a Python package that can be used to create and manipulate 2D Spatial data. This notebook will just be an overview of some of the most useful functions available through the Shapely library. For more information, check out http://toblerity.org/shapely/manual.html
Github: https://github.com/nosila/Shapely_PyOhio
NbViewer: http://nbviewer.ipython.org/github/nosila/Shapely_PyOhio/blob/master/Shapely_PyOhio_2014.ipynb
To get started, let's look at some potential applications of 2D data. For example, one could reimagine an implementation of Conway's game of life that is free of any grid.
from IPython.core.display import Image
Image(filename='files/conway.png')
I use Shapely regularly when dealing with geospatial datasets, mostly in a geospatial context. Below is a map I created by layering data about different types of alcohol drinking habits. It reaveals the divide in the state between the Appalacian foothills and the coastal plain.
from IPython.core.display import Image
Image(filename='files/GA.png')
The example below is courtesy of Mike Higgins. He used our mapping implementation to draw a picture of Luna the Cat near the equator. In this case, he mapped latitiude and longitudes and then used a radius measure to draw circles around those points, forming a grid.
Image(filename='files/luna.jpg')
Image(filename='files/luna_real.JPG')
First, here are the packages we will need to work with geospatial data.
import os
import numpy as np
from matplotlib.patches import Polygon
import matplotlib.pyplot as plt
import pysal
import pylab as pl
import shapely
import shapely.geometry
from shapely.geometry import shape
from shapely.geometry import Point, LineString, Polygon, MultiPolygon
from shapely.wkt import dumps, loads
The Code below will help us draw our shapes on the screen using matplotlib. Code for drawing shapes was borrowed from this notebook and modified to have more control over coloration.
import random
#grab a random color for coloring a figure
def get_random_color():
r = lambda: random.randint(0,255)
return('#%02X%02X%02X' % (r(),r(),r()))
#initialize the figure and draw the shape
def plot(shapelyGeometries, color_dict={"fill":"#AADDCC", "line":"#666666","hole_fill":"#ffffff", "hole_line":"#999999" }):
'Plot shapelyGeometries'
figure = pl.figure(num=None, figsize=(10, 10), dpi=180)
axes = pl.axes()
axes.set_aspect('equal', 'datalim')
axes.xaxis.set_visible(False)
axes.yaxis.set_visible(False)
draw(shapelyGeometries, color_dict)
#Check the type and break up multipolygons
def draw(gs, color_dict):
'Draw shapelyGeometries'
# Handle single and lists of geometries
try:
gs = iter(gs)
except TypeError:
gs = [gs]
#Route polygons and multipolygons to the right place
for g in gs:
gType = g.geom_type
if gType.startswith('Multi') or gType == 'GeometryCollection':
draw(g.geoms, color_dict)
else:
draw_(g, color_dict)
#Break the shape into its interior and exterior rings
def draw_(g, color_dict):
'Draw a shapelyGeometry; thanks to Sean Gilles'
gType = g.geom_type
if gType == 'Point':
pl.plot(g.x, g.y, 'k,')
elif gType == 'LineString':
x, y = g.xy
pl.plot(x, y, 'b-', color=color_dict["line"])
elif gType == 'Polygon':
#can draw parts as multiple colors
if not color_dict:
color_dict={"fill":get_random_color(),
"line":"#666666",
"hole_fill":"#FFFFFF",
"hole_line":"#999999" }
x, y = g.exterior.xy
pl.fill(x, y, color=color_dict["fill"], aa=True)
pl.plot(x, y, color=color_dict["line"], aa=True, lw=1.0)
for hole in g.interiors:
x, y = hole.xy
pl.fill(x, y, color=color_dict["hole_fill"], aa=True)
pl.plot(x, y, color=color_dict["hole_line"], aa=True, lw=1.0)
First, let's draw a point.
pt = Point(0, 0)
print pt
plot(pt)
POINT (0 0)
Okay, that is not much to look at. ENHANCE!!
circle = pt.buffer(.5)
plot([circle, pt])
print circle
POLYGON ((0.5 0, 0.4975923633360985 -0.04900857016478025, 0.4903926402016152 -0.09754516100806404, 0.4784701678661045 -0.1451423386272311, 0.4619397662556435 -0.1913417161825447, 0.4409606321741776 -0.2356983684129986, 0.4157348061512728 -0.2777851165098009, 0.3865052266813687 -0.3171966420818225, 0.3535533905932741 -0.3535533905932735, 0.3171966420818231 -0.3865052266813682, 0.2777851165098016 -0.4157348061512723, 0.2356983684129993 -0.4409606321741772, 0.1913417161825454 -0.4619397662556431, 0.1451423386272318 -0.4784701678661042, 0.09754516100806482 -0.4903926402016151, 0.04900857016478104 -0.4975923633360984, 8.077722872162934e-16 -0.5, -0.04900857016477944 -0.4975923633360985, -0.09754516100806324 -0.4903926402016154, -0.1451423386272302 -0.4784701678661047, -0.1913417161825439 -0.4619397662556438, -0.2356983684129978 -0.440960632174178, -0.2777851165098002 -0.4157348061512732, -0.317196642081822 -0.3865052266813691, -0.3535533905932731 -0.3535533905932745, -0.3865052266813679 -0.3171966420818234, -0.4157348061512722 -0.2777851165098018, -0.4409606321741771 -0.2356983684129995, -0.4619397662556431 -0.1913417161825456, -0.4784701678661042 -0.1451423386272318, -0.4903926402016151 -0.09754516100806473, -0.4975923633360984 -0.04900857016478086, -0.5 -5.053215498074303e-16, -0.4975923633360985 0.04900857016477985, -0.4903926402016153 0.09754516100806375, -0.4784701678661045 0.1451423386272309, -0.4619397662556435 0.1913417161825446, -0.4409606321741776 0.2356983684129986, -0.4157348061512727 0.277785116509801, -0.3865052266813686 0.3171966420818226, -0.3535533905932738 0.3535533905932737, -0.317196642081823 0.3865052266813683, -0.2777851165098015 0.4157348061512723, -0.2356983684129993 0.4409606321741772, -0.1913417161825456 0.4619397662556431, -0.1451423386272321 0.4784701678661042, -0.09754516100806521 0.490392640201615, -0.04900857016478155 0.4975923633360983, -1.424116139486239e-15 0.5, 0.04900857016477872 0.4975923633360986, 0.0975451610080624 0.4903926402016155, 0.1451423386272293 0.478470167866105, 0.1913417161825429 0.4619397662556442, 0.2356983684129968 0.4409606321741786, 0.2777851165097991 0.415734806151274, 0.3171966420818207 0.3865052266813701, 0.3535533905932718 0.3535533905932757, 0.3865052266813666 0.317196642081825, 0.4157348061512709 0.2777851165098037, 0.440960632174176 0.2356983684130017, 0.461939766255642 0.1913417161825481, 0.4784701678661034 0.1451423386272346, 0.4903926402016145 0.09754516100806784, 0.4975923633360981 0.04900857016478424, 0.5 4.119267568565299e-15, 0.5 0))
small_circle = pt.buffer(.25)
plot([circle, small_circle, pt], color_dict=None)
You can also use one polygon to make a hole in the other.
donut = circle.difference(small_circle)
plot(donut)
Let's draw a line. It's defined by its two end points.
lineString1 = LineString(((-3, -5), (3, 5)))
print lineString1
plot(lineString1)
LINESTRING (-3 -5, 3 5)
How about a super crazy line? Oh yes, this is super crazy.
lineString2 = LineString(((-3, -5), (3, 5), (-3, 5), (3, -5)))
print lineString2
plot(lineString2)
LINESTRING (-3 -5, 3 5, -3 5, 3 -5)
And now we can add polygons to our linestrings to make a sad bunny.
pt1 = Point(-3, 6.5)
pt2 = Point(3, 6.5)
left = pt1.buffer(.5)
right = pt2.buffer(.5)
poly = shapely.geometry.Polygon(
((-3, 5), (3, 5), (0, 0), (-3, 5)))
plot([lineString2, left, right, poly], {"fill":"#003300", "line":"#000000","hole_fill":"#000000", "hole_line":"#000000" })
Image(filename='files/emoticon_bunny.jpeg')
Hat tip to cheezburger.com
Let's look at a Polygon with holes.
polygon1 = shapely.geometry.Polygon(
((1, 1), (5, 1), (5, 5), (1, 5), (1, 1)), # Shell
[
((2, 2), (2, 3), (3, 3), (3, 2), (2, 2)), # Hole
((3, 3), (3, 4), (4, 4), (4, 3), (3, 3)), # Hole
]
)
print polygon1
plot(polygon1)
POLYGON ((1 1, 5 1, 5 5, 1 5, 1 1), (2 2, 2 3, 3 3, 3 2, 2 2), (3 3, 3 4, 4 4, 4 3, 3 3))
Polygons are made of one exterior ring and its interior rings. Interior rings are also known as holes. Let's break it into pieces
exterior = polygon1.exterior
print "Exterior Ring: " + str(exterior)
print "\nInterior Rings: "
for hole in polygon1.interiors:
print "\t" + str(hole)
Exterior Ring: LINEARRING (1 1, 5 1, 5 5, 1 5, 1 1) Interior Rings: LINEARRING (2 2, 2 3, 3 3, 3 2, 2 2) LINEARRING (3 3, 3 4, 4 4, 4 3, 3 3)
You can use union() to create more complex shapes.
circle = Point(5,2.5).buffer(2)
plot([circle, polygon1], color_dict=None)
Now, let's look at some more complicated shapes.
new_polygon = polygon1.union(circle)
plot(new_polygon)
print new_polygon.type
Polygon
circle2 = Point(7,7).buffer(2)
plot([circle2, polygon1], color_dict=None)
new_multipolygon = polygon1.union(circle2)
plot(new_multipolygon)
print new_multipolygon.type
print "CIRCLE AREA: %f" % (circle2.area)
print "POLYGON AREA: %f" % (polygon1.area)
print "MULTIPOLYGON AREA: %f" % (new_multipolygon.area)
MultiPolygon CIRCLE AREA: 12.546194 POLYGON AREA: 14.000000 MULTIPOLYGON AREA: 26.546194
exteriors = new_multipolygon.geoms
for exterior in exteriors:
print "Exterior Ring:"
print "\t" + str(exterior)
if not exterior.interiors:
continue
print "\n\tInterior Rings: "
for hole in exterior.interiors:
print "\t\t" + str(hole)
Exterior Ring: POLYGON ((1 1, 1 5, 5 5, 5 1, 1 1), (3 3, 2 3, 2 2, 3 2, 3 3), (3 3, 4 3, 4 4, 3 4, 3 3)) Interior Rings: LINEARRING (3 3, 2 3, 2 2, 3 2, 3 3) LINEARRING (3 3, 4 3, 4 4, 3 4, 3 3) Exterior Ring: POLYGON ((9 7, 8.990369453344394 6.803965719340879, 8.961570560806461 6.609819355967744, 8.913880671464417 6.419430645491076, 8.847759065022574 6.234633135269821, 8.76384252869671 6.057206526348006, 8.662939224605092 5.888859533960797, 8.546020906725476 5.73121343167271, 8.414213562373096 5.585786437626906, 8.268786568327293 5.453979093274527, 8.111140466039206 5.33706077539491, 7.942793473651998 5.236157471303291, 7.765366864730182 5.152240934977428, 7.580569354508927 5.086119328535583, 7.390180644032259 5.038429439193539, 7.196034280659124 5.009630546655607, 7.000000000000004 5, 6.803965719340882 5.009630546655606, 6.609819355967747 5.038429439193538, 6.419430645491079 5.086119328535581, 6.234633135269824 5.152240934977425, 6.057206526348009 5.236157471303288, 5.888859533960799 5.337060775394907, 5.731213431672712 5.453979093274524, 5.585786437626908 5.585786437626902, 5.453979093274528 5.731213431672707, 5.337060775394911 5.888859533960793, 5.236157471303292 6.057206526348002, 5.152240934977428 6.234633135269818, 5.086119328535583 6.419430645491072, 5.038429439193539 6.609819355967741, 5.009630546655607 6.803965719340876, 5 6.999999999999998, 5.009630546655606 7.196034280659119, 5.038429439193539 7.390180644032255, 5.086119328535582 7.580569354508923, 5.152240934977426 7.765366864730178, 5.236157471303289 7.942793473651994, 5.337060775394909 8.111140466039204, 5.453979093274526 8.26878656832729, 5.585786437626904 8.414213562373096, 5.731213431672709 8.546020906725474, 5.888859533960794 8.66293922460509, 6.057206526348002 8.76384252869671, 6.234633135269817 8.847759065022572, 6.419430645491071 8.913880671464417, 6.609819355967739 8.961570560806461, 6.803965719340874 8.990369453344393, 6.999999999999995 9, 7.196034280659115 8.990369453344394, 7.39018064403225 8.961570560806463, 7.580569354508917 8.913880671464419, 7.765366864730172 8.847759065022577, 7.942793473651987 8.763842528696713, 8.111140466039195 8.662939224605097, 8.268786568327283 8.546020906725481, 8.414213562373087 8.414213562373103, 8.546020906725467 8.2687865683273, 8.662939224605083 8.111140466039215, 8.763842528696705 7.942793473652006, 8.847759065022569 7.765366864730193, 8.913880671464414 7.580569354508938, 8.961570560806457 7.390180644032271, 8.990369453344393 7.196034280659137, 9 7.000000000000017, 9 7))
One other way to get shape data is to get it from a file. Below is an example of a function that creates a dictionary of shapes keyed by an id column
import fiona
def get_from_fiona(filename, id_attr):
from shapely.geometry import shape
data = {}
c = fiona.open(filename, 'r')
for shape_data in c:
name = shape_data['properties'][id_attr]
parsed_shape = shape(shape_data['geometry'])
data[name] = parsed_shape
return data
Just a quick note, you can get your own shape data from the US Census here: https://www.census.gov/cgi-bin/geo/shapefiles2013/main
We're going to look at state shapes from the US Census. Just to give you an idea of what is in the file, here is a glance at the 'properties' of a few records.
from shapely.wkt import dumps, loads
state_dict = get_from_fiona("files/tl_2013_us_state/tl_2013_us_state.shp", "NAME")
#print(shapely.wkt.dumps(state_dict["Ohio"]))
ohio = state_dict["Ohio"]
print ohio.convex_hull
plot(ohio)
POLYGON ((-82.54957 38.403423, -82.55010799999999 38.403434, -82.55034499999999 38.403445, -82.55098699999999 38.403489, -82.56656799999999 38.404645, -82.570571 38.40496, -82.571613 38.405138, -83.668718 38.626542, -83.76319599999999 38.652241, -83.76418799999999 38.652525, -84.205879 38.802685, -84.21200999999999 38.805255, -84.82015699999999 39.105489, -84.82020399999999 39.112007, -84.82030499999999 39.156126, -84.82015899999999 39.227373, -84.82010199999999 39.25141, -84.806209 41.674584, -84.805972 41.696118, -83.11246 41.95941, -80.519851 42.327132, -80.519463 42.204048, -80.51921899999999 41.919275, -80.518705 41.249811, -80.518922 40.7182, -80.51899 40.639874, -80.518991 40.638801, -80.59989499999999 40.317666, -80.60289499999999 40.307069, -80.83082899999999 39.70782699999999, -80.83187099999999 39.705655, -80.87664099999999 39.625905, -80.88036 39.620706, -82.32739099999999 38.446484, -82.33033499999999 38.44450000000001, -82.33145499999999 38.444052, -82.332735 38.44354, -82.336992 38.44202, -82.34063999999999 38.440948, -82.53308299999999 38.404919, -82.533523 38.404837, -82.53769 38.404157, -82.538234 38.404081, -82.539659 38.403932, -82.541006 38.4038, -82.541662 38.403762, -82.54392299999999 38.403658, -82.548783 38.403451, -82.54957 38.403423))
plot([state_dict["Pennsylvania"],
state_dict["Ohio"],
state_dict["West Virginia"],
state_dict["Maryland"],
state_dict["Kentucky"],
state_dict["Indiana"],
state_dict["Michigan"]], color_dict=None)
print
You might notice that Michigan and Maryland look a little/lot wrong. This is because the USGS sometimes includes water features that are counted as part of a state's territory as part of their borders. Maryland has the Chesapeake as part of its shape and Michigan includes a great deal of the Great Lakes. Below is a more familiar picture of what Michigan looks like.
Image(filename='files/MI.png')
It's also handy to simplify shapes. This can be a net good for speed, especially where details aren't as pertinent. However, there are drawbacks in the form of loss of precision and potential introduction of unsound polygons, which can happen even with the parameter preserve_topology = True
#This function will help total up a count of all exterior ring points in a polygon or multipolygon
def points(s):
if s.type == 'Polygon':
return len(s.exterior.coords)
if s.type == 'Point':
return s
else:
x = 0
for g in s.geoms:
if g.type != 'Point' and g.type != 'LineString':
x += len(g.exterior.coords)
return x
import time
hawaii = (state_dict["Hawaii"])
now = time.time()
hawaii_simple = hawaii.simplify(.1)
simple_time = time.time() - now
now = time.time()
hawaii_simple_not_preserved = hawaii.simplify(.1, preserve_topology=False)
simple_not_preserved_time = time.time() - now
print "Points with preserving topology " + str((points(hawaii_simple)))
print "Time cost: " + str(simple_time)
print
print "Points without preserving topology " + str((points(hawaii_simple_not_preserved)))
print "Time cost: " + str(simple_not_preserved_time)
plot(hawaii_simple)
plot(hawaii_simple_preserved)
Points with preserving topology 89 Time cost: 0.00633096694946 Points without preserving topology 52 Time cost: 0.000607013702393
Shapely's non-topology-preserving simplification methodology is based on the Ramer-Douglas-Peucker algorithm. For more information see here: http://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm
"Douglas-Peucker animated" by Mysid - Own work; self-made in Inkscape and Gimp. Based on File:Douglas Peucker.png by de:User:Leupold.. Licensed under CC BY-SA 3.0 via Wikimedia Commons.
texas = state_dict["Texas"]
print "TOTAL POINTS: %d" % (points(texas))
plot(state_dict["Texas"])
TOTAL POINTS: 58439
texas_simple = texas.simplify(.001)
plot(texas_simple)
print "TOTAL POINTS: %d" % (points(texas_simple))
TOTAL POINTS: 4509
texas_simplest = texas.simplify(.1)
plot(texas_simplest)
print "TOTAL POINTS: %d" % (points(texas_simplest))
TOTAL POINTS: 47
not_texas = texas.simplify(1.1)
plot(not_texas)
print "TOTAL POINTS: %d" % (points(not_texas))
TOTAL POINTS: 11
Let's look at some further operations that can help us simplify and understand the shapes we are generating.
plot(texas.convex_hull)
plot(texas.envelope)
print(texas.bounds)
(-106.645646, 25.837163999999998, -93.508039, 36.500704)
print(texas.centroid)
center_circle1 = texas.centroid.buffer(.2)
center_circle2 = texas.centroid.buffer(.5)
center_circle3 = texas.centroid.buffer(1)
plot([texas, center_circle3, center_circle2, center_circle1, texas.centroid], color_dict=None)
POINT (-99.31713786238799 31.44721635815321)
You can also union together series of shapes. This is can be faster than a plain union, but be careful of the pitfalls, mainly the fact that it is possible to end up in an infinite loop with more complex shapes.
from shapely.ops import cascaded_union, unary_union
buffered_points_list = []
x_ys = range(5);
for x in x_ys:
buffered_points_list.append(Point(x,x).buffer(1))
union_shape = cascaded_union(buffered_points_list)
plot(buffered_points_list, color_dict=None)
plot(union_shape)
Intersections are great for showing the degree of overlap as well as creating new shapes.
intersection_shape = buffered_points_list[0].intersection(buffered_points_list[1])
plot([buffered_points_list[0], buffered_points_list[1], intersection_shape], color_dict=None)
Difference is the inverse of intersection and shows the parts of a shape that do not overlap. Unlike the intersection function, this operation is not symmetrical
difference_shape1 = buffered_points_list[0].difference(buffered_points_list[1])
difference_shape2 = buffered_points_list[1].difference(buffered_points_list[0])
plot([difference_shape1,difference_shape2], color_dict=None)
Here are a few examples of shape tests just to show how subtle differences in test definitions can affect results.
print "Do the full circle shapes intersect?"
print(buffered_points_list[0].intersects(buffered_points_list[1]))
print "Do the difference shapes intersect?"
print(difference_shape1.intersects(difference_shape2))
print "Are the difference shapes overlapping?"
print(difference_shape1.overlaps(difference_shape2))
Do the full circle shapes intersect? True Do the difference shapes intersect? True Are the difference shapes overlapping? False
difference_shape3 = buffered_points_list[1].symmetric_difference(buffered_points_list[0])
plot([difference_shape3])
First, there are self-intersections. In this inital example, there is a line sticking out of a polygon.
sad_polygon1 = Polygon(((-3, -5),(-3,0),(-5,0),(-3,0),(-3, 5),(3, 5), (3, -5), (-3, -5)))
plot(sad_polygon1)
print explain_validity(sad_polygon1)
Self-intersection[-5 0]
One of the ways to fix this is to use our friend the buffer to turn that line from 1-D to 2-D.
print "SAD POLYGON AREA: %f" % (sad_polygon1.area)
fixed_polygon1 = sad_polygon1.buffer(.03)
plot(fixed_polygon1)
print explain_validity(fixed_polygon1)
print "HAPPY POLYGON AREA: %f" % (fixed_polygon1.area)
SAD POLYGON AREA: 60.000000 Valid Geometry HAPPY POLYGON AREA: 61.082434
Let's look at another type of self-intersection, the bowtie. It can be more tricky than dealing with the extra line.
from shapely.validation import explain_validity
sad_polygon2 = Polygon(((-3, -5), (3, 5), (-3, 5), (3, -5), (-3, -5)))
plot(sad_polygon2)
print explain_validity(sad_polygon2)
print "The bowtie's area: " + str(sad_polygon2.area)
Self-intersection[0 0] The bowtie's area: 0.0
Using the buffer method for cleanup can lead to a deletion of part of the unsound shape. Notice that the area of this shape is calculated as 0.
fixed_polygon2 = sad_polygon2.buffer(0)
plot(fixed_polygon2)
print explain_validity(fixed_polygon2)
print fixed_polygon2
print "The area of our fixed shape:" + str(fixed_polygon2.area)
Valid Geometry POLYGON ((0 0, -3 5, 3 5, 0 0)) The area of our fixed shape:15.0
Other solutions also fail due to the fact that the shape is too unsound for unions.
from shapely.validation import explain_validity
print explain_validity(sad_polygon2)
pt = Point(0,0)
band_aid = pt.buffer(.2)
plot([sad_polygon2, band_aid], color_dict=None)
try:
trythis = sad_polygon2.union(band_aid)
except(shapely.geos.TopologicalError):
print "ERROR"
Self-intersection[0 0] ERROR
Here is a quick look at the stack trace without the exception handling for your edification.
---------------------------------------------------------------------------
TopologicalError Traceback (most recent call last)
<ipython-input-157-cfbdb28f0e7f> in <module>()
4 band_aid = pt.buffer(.2)
5 plot([sad_polygon2, band_aid])
----> 6 trythis = sad_polygon2.union(band_aid)
7
8 try:
/Users/alisonalvarez/anaconda/lib/python2.7/site-packages/shapely/geometry/base.pyc in union(self, other)
489 def union(self, other):
490 """Returns the union of the geometries (Shapely geometry)"""
--> 491 return geom_factory(self.impl['union'](self, other))
492
493 # Unary predicates
/Users/alisonalvarez/anaconda/lib/python2.7/site-packages/shapely/topology.pyc in __call__(self, this, other, *args)
45 if not this.is_valid:
46 raise TopologicalError(
---> 47 "The operation '%s' produced a null geometry. Likely cause is invalidity of the geometry %s" % (self.fn.__name__, repr(this)))
48 elif not other.is_valid:
49 raise TopologicalError(
TopologicalError: The operation 'GEOSUnion_r' produced a null geometry. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x110846350>
File "<ipython-input-266-a87d6c900e35>", line 2 --------------------------------------------------------------------------- ^ SyntaxError: invalid syntax
The best way to deal with this kind of unsound shape is to not make it in the first place. In the case where this isn't really an option (such as with simplification) it's possible to clean up and simplify component shapes to prevent this. Complexity should be tamed.
Let's take a look at some unsound shapes that are a result of holes in the wrong place.
First, let's look at what happens when a hole is drawn outside of a shape. Notice what happens to the total area calculation.
sad_polygon3 = shapely.geometry.Polygon(
((1, 1), (5, 1), (5, 5), (1, 5), (1, 1)), # Shell
[
((2, 2), (2, 3), (3, 3), (3, 2), (2, 2)), # Hole
((6, 6), (6, 7), (7, 7), (7, 6), (6, 6)), # Hole
]
)
plot(sad_polygon3)
print explain_validity(sad_polygon3)
print "SAD POLYGON AREA: %f" % (sad_polygon3.area)
Hole lies outside shell[6 6] SAD POLYGON AREA: 14.000000
We can fix this with out friend the buffer.
fixed_polygon3 = sad_polygon3.buffer(0)
plot(fixed_polygon3)
print explain_validity(fixed_polygon3)
print "FIXED POLYGON AREA: %f" % (fixed_polygon3.area)
print fixed_polygon3
Valid Geometry FIXED POLYGON AREA: 15.000000 POLYGON ((1 1, 1 5, 5 5, 5 1, 1 1), (2 2, 3 2, 3 3, 2 3, 2 2))
Another example of an unsound shape is one with nested holes. We can also fix this one using a buffer.
sad_polygon4 = shapely.geometry.Polygon(
((1, 1), (5, 1), (5, 5), (1, 5), (1, 1)), # Shell
[
((2, 2), (2, 4), (4, 4), (4, 2), (2, 2)), # Hole
((3, 3), (3, 3.5), (3.5, 3.5), (3.5, 3), (3, 3)), # Hole
]
)
plot(sad_polygon4)
print explain_validity(sad_polygon4)
print "SAD POLYGON AREA: %f" % (sad_polygon4.area)
Holes are nested[3 3] SAD POLYGON AREA: 11.750000
fixed_polygon4 = sad_polygon4.buffer(0)
plot(fixed_polygon4)
print explain_validity(fixed_polygon4)
print "FIXED POLYGON AREA: %f" % (fixed_polygon4.area)
print fixed_polygon4
Valid Geometry FIXED POLYGON AREA: 12.000000 POLYGON ((1 1, 1 5, 5 5, 5 1, 1 1), (2 2, 4 2, 4 4, 2 4, 2 2))
Another type of unsound shape is where interior rings or exterior rings touch by more than one point. Just as before, buffers can be used to fix this.
sad_polygon5 = shapely.geometry.Polygon(
((1, 1), (5, 1), (5, 5), (1, 5), (1, 1)), # Shell
[
((2, 2), (2, 3.5), (3, 3.5), (3, 2), (2, 2)), # Hole
((3, 3), (3, 4), (4, 4), (4, 3), (3, 3)), # Hole
]
)
print explain_validity(sad_polygon5)
print sad_polygon5.is_valid
plot(sad_polygon5)
print "SAD POLYGON AREA: %f" % (sad_polygon5.area)
Self-intersection[3 3] False
fixed_polygon5 = sad_polygon5.buffer(0)
plot(fixed_polygon5)
print explain_validity(fixed_polygon5)
print "FIXED POLYGON AREA: %f" % (fixed_polygon5.area)
print fixed_polygon5
Valid Geometry FIXED POLYGON AREA: 13.500000 POLYGON ((1 1, 1 5, 5 5, 5 1, 1 1), (2 2, 3 2, 3 3, 4 3, 4 4, 3 4, 3 3.5, 2 3.5, 2 2))
Finally, there are shapes which include interior or exterior rings that contain less than three points.
sad_polygon6 = shapely.geometry.Polygon(
((1, 1), (5, 1), (5, 5), (1, 5), (1, 1)), # Shell
[
((2, 2), (2, 2), (3, 3), (3, 3), (2, 2)), # Hole
]
)
plot(sad_polygon6)
print explain_validity(sad_polygon6)
Too few points in geometry component[2 2]
We can fix these by traversing the shape components and removing the offending piece.
def remove_bad_shapes(shape):
if shape.type == 'Multipolygon':
shape_list = shape.geoms
elif shape.type == 'Polygon':
shape_list = [shape]
else:
return None
polygon_list = []
for g in shape_list:
print g
print explain_validity(g)
hole_list = []
for hole in g.interiors:
hole_shape = Polygon(hole)
if "Too few points" not in explain_validity(hole_shape):
hole_list.append(hole)
else:
print "Removed hole: "
print "\t" + str(hole)
if "Too few points" not in explain_validity(Polygon(g.exterior.coords)):
print "\t" + explain_validity(Polygon(g.exterior.coords))
polygon_list.append(Polygon(g.exterior.coords, hole_list))
else:
print "Removed: "
print "\t" + str(g.exterior.coords)
return MultiPolygon(polygon_list)
fixed_polygon6 = remove_bad_shapes(sad_polygon6)
plot(fixed_polygon6)
print "\nThe validity of the fixed shape:"
print explain_validity(fixed_polygon6)
POLYGON ((1 1, 5 1, 5 5, 1 5, 1 1), (2 2, 2 2, 3 3, 3 3, 2 2)) Too few points in geometry component[2 2] Removed hole: LINEARRING (2 2, 2 2, 3 3, 3 3, 2 2) Valid Geometry The validity of the fixed shape: Valid Geometry
At Rhiza we do a lot of work generating novel shapes from many components of varying quality. Sometimes we need to do some cleanup.
Image(filename='files/zip_union.png')
Here is another example from the union of zip codes.
Image(filename='files/separate_zips.png')
The joined zip codes ended up with a moderate amount of noise.
Image(filename='files/not_cleaned.png')
Here some of the code I use to clean up my shape. Note that we are once again traversing interior and exterior components.
Image(filename='files/cleaned_up.png')
Below is code for taking the curvature of the earth into account when calculating the size of polygons
radius_earth = 3963.207 # miles
def degreesToRadians(d):
if (type(d) == type('')):
print "ERROR: d is of type string....value = ", d
try: d = float(d)
except: return 0.0
return (d / 360.0) * 2.0 * math.pi
def greatCircleDistance(A, B):
(lat1, lon1) = map(degreesToRadians, A)
(lat2, lon2) = map(degreesToRadians, B)
global radius_earth
x1 = math.cos(lat1) * math.cos(lon1)
y1 = math.cos(lat1) * math.sin(lon1)
z1 = math.sin(lat1)
x2 = math.cos(lat2) * math.cos(lon2)
y2 = math.cos(lat2) * math.sin(lon2)
z2 = math.sin(lat2)
C = math.sqrt( ((x1-x2) * (x1-x2)) +
((y1-y2) * (y1-y2)) +
((z1-z2) * (z1-z2)) )
alpha = math.asin(C/2.0)
return (2.0 * alpha * radius_earth)
#use the bounding box of a shape to set the degree of curve correction
def get_sq_mi(shape):
(minx, miny, maxx, maxy) = shape.bounds
top_left = [maxy, maxx]
top_right = [miny, maxx]
bottom_left = [maxy, minx]
bottom_right = [miny, minx]
width = greatCircleDistance(top_left, top_right)
height = greatCircleDistance(bottom_right, top_right)
return (width * height)
This function will take a list of shapes, plus thresholds for interior and exterior shapes and return a list of shapes with components that are smaller than then threshold removed
#this code will traverse the components of a list of polygons or multipolygons and remove any pieces not meeting the area threshold
def simplify_shapes(to_denoise_list, polygon_threshold=0.15, hole_threshold=0.1):
denoised_list = []
for shape in to_denoise_list:
#handle polgyon vs. multipolygon case
if shape.type == 'Polygon':
before_shapes = 1
shape_list = [shape]
if shape.type == 'MultiPolygon':
before_shapes = len(shape.geoms)
shape_list = shape.geoms
polygon_list = []
for g in shape_list:
polygon_sq_mi = get_sq_mi(g)
if polygon_sq_mi > polygon_threshold:
if len(g.interiors) > 0:
hole_list = []
for hole in g.interiors:
hole_shape = shapely_Polygon(hole)
hole_sq_mi = get_sq_mi(hole_shape)
if hole_sq_mi > hole_threshold:
hole_list.append(hole)
else:
g = Polygon(g.exterior.coords, hole_list)
polygon_list.append(g)
else:
#this polygon is below the threshold and should be left out
continue
denoised_list.append(MultiPolygon(polygon_list))
print "Total polygons before: " + str(before_shapes)
print "Total polygons after: " + str(len(polygon_list))
return denoised_list
hawaii_list = simplify_shapes([hawaii], polygon_threshold=250, hole_threshold=250)
plot(hawaii_list)
Total polygons before: 17 Total polygons after: 9