GIS with Python and IPython

Getting some data


Let's import the packages we will use and set the paths for outputs.

In [1]:
# Let's import pandas and some other basic packages we will use 
from __future__ import division
%pylab --no-import-all
%matplotlib inline
import pandas as pd
import numpy as np
import os, sys
Using matplotlib backend: MacOSX
Populating the interactive namespace from numpy and matplotlib
In [2]:
# GIS packages
import geopandas as gpd
from import overlay
from shapely.geometry import Polygon, Point
import georasters as gr
# Alias for Geopandas
gp = gpd
In [3]:
# Plotting
import matplotlib as mpl
import seaborn as sns
# Setup seaborn
In [4]:
# Paths
pathout = './data/'

if not os.path.exists(pathout):
pathgraphs = './graphs/'
if not os.path.exists(pathgraphs):

Initial Example -- Natural Earth Country Shapefile

Let's download a shapefile with all the polygons for countries so we can visualize and analyze some of the data we have downloaded in other notebooks. Natural Earth provides lots of free data so let's use that one.

For shapefiles and other polygon type data geopandas is the most useful package. geopandas is to GIS what pandas is to other data. Since gepandas extends the functionality of pandas to a GIS dataset, all the nice functions and properties of pandas are also available in geopandas. Of course, geopandas includes functions and properties unique to GIS data.

Next we will use it to download the shapefile (which is contained in a zip archive). geopandas extends pandas for use with GIS data. We can use many functions and properties of the GeoDataFrame to analyze our data.

In [38]:
countries = gpd.read_file('')

Let's look inside this GeoDataFrame

In [6]:
0 Admin-0 country 5 2 Indonesia IDN 0 2 Sovereign country Indonesia IDN ... 인도네시아 Indonesië Indonezja Indonésia Индонезия Indonesien Endonezya Indonesia 印度尼西亚 (POLYGON ((117.7036079039552 4.163414542001791...
1 Admin-0 country 5 3 Malaysia MYS 0 2 Sovereign country Malaysia MYS ... 말레이시아 Maleisië Malezja Malásia Малайзия Malaysia Malezya Malaysia 马来西亚 (POLYGON ((117.7036079039552 4.163414542001791...
2 Admin-0 country 6 2 Chile CHL 0 2 Sovereign country Chile CHL ... 칠레 Chili Chile Chile Чили Chile Şili Chile 智利 (POLYGON ((-69.51008875199994 -17.506588197999...
3 Admin-0 country 0 3 Bolivia BOL 0 2 Sovereign country Bolivia BOL ... 볼리비아 Bolivia Boliwia Bolívia Боливия Bolivia Bolivya Bolivia 玻利維亞 POLYGON ((-69.51008875199994 -17.5065881979999...
4 Admin-0 country 0 2 Peru PER 0 2 Sovereign country Peru PER ... 페루 Peru Peru Peru Перу Peru Peru Peru 秘鲁 (POLYGON ((-69.51008875199994 -17.506588197999...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
250 Admin-0 country 0 4 China CH1 1 2 Country Macao S.A.R MAC ... 마카오 Macau Makau Macau Макао Macao Makao Ma Cao 澳門 (POLYGON ((113.5586043630001 22.16303131700005...
251 Admin-0 country 6 5 Australia AU1 1 2 Dependency Ashmore and Cartier Islands ATC ... 애시모어 카르티에 제도 Ashmore- en Cartiereilanden Wyspy Ashmore i Cartiera Ilhas Ashmore e Cartier Острова Ашмор и Картье Ashmore- och Cartieröarna Ashmore ve Cartier Adaları Quần đảo Ashmore và Cartier 阿什莫尔和卡捷岛 POLYGON ((123.5970158210001 -12.42831796699988...
252 Admin-0 country 6 8 Bajo Nuevo Bank (Petrel Is.) BJN 0 2 Indeterminate Bajo Nuevo Bank (Petrel Is.) BJN ... 바호 누에보 뱅크 Bajo Nuevo Bajo Nuevo Ilha Baixo Novo Бахо-Нуэво Bajo Nuevo Bajo Nuevo Bank Bajo Nuevo Bank 巴霍努埃沃礁 POLYGON ((-79.9892878899999 15.79494863500008,...
253 Admin-0 country 6 5 Serranilla Bank SER 0 2 Indeterminate Serranilla Bank SER ... 세라냐 뱅크 Serranilla Isla Serranilla Ilha Serranilla Серранилья-Банк Serranilla Bank Serranilla Bank Serranilla Bank 塞拉纳浅滩 POLYGON ((-78.63707434799994 15.86208730700008...
254 Admin-0 country 6 6 Scarborough Reef SCR 0 2 Indeterminate Scarborough Reef SCR ... 스카버러 암초 Scarborough-rif Huangyan Dao Recife de Scarborough Скарборо-Шол Scarboroughrevet Scarborough Shoal Bãi cạn Scarborough 黄岩岛 POLYGON ((117.753887178 15.15436926300009, 117...

255 rows × 95 columns

Each row contains the information for one country.

Each column is one property or variable.

Unlike pandas DataFrames, geopandas always must have a geometry column.

Let's plot this data

In [7]:
%matplotlib inline
fig, ax = plt.subplots(figsize=(30,20))
ax.set_title("WGS84 (lat/lon)", fontdict={'fontsize':34})
Text(0.5, 1, 'WGS84 (lat/lon)')

We can also get some additional information on this data. For example its projection

In [8]:
{'init': 'epsg:4326'}

We can reproject the data from its current WGS84 projection to other ones. Let's do this and plot the results so we can see how different projections distort results.

In [9]:
fig, ax = plt.subplots(figsize=(30,20))
countries_merc = countries.to_crs(epsg=3395)
ax.set_title("Mercator", fontdict={'fontsize':34})
Text(0.5, 1, 'Mercator')
In [10]:
cea = {'datum': 'WGS84',
 'lat_ts': 0,
 'lon_0': 0,
 'no_defs': True,
 'over': True,
 'proj': 'cea',
 'units': 'm',
 'x_0': 0,
 'y_0': 0}

fig, ax = plt.subplots(figsize=(30,20))
countries_cea = countries.to_crs(crs=cea)
ax.set_title("Cylindrical Equal Area", fontdict={'fontsize':34})
Text(0.5, 1, 'Cylindrical Equal Area')

Notice that each projection shows the world in a very different manner, distoring areas, distances etc. So you need to take care when doing computations to use the correct projection. An important issue to remember is that you need a projected (not geographical) projection to compute areas and distances. Let's compare these three a bit. Start with the boundaries of each.

In [11]:
print('[xmin, ymin, xmax, ymax] in three projections')
[xmin, ymin, xmax, ymax] in three projections
[-180.          -90.          180.           83.63410065]
[-20037508.34278923 -34636982.07825699                inf
[-20037508.34278923  -6363885.33192604  20037508.34278924

Let's describe the areas of these countries in the three projections

In [12]:
print('Area distribution in WGS84')
print(countries.area.describe(), '\n')
Area distribution in WGS84
count     255.000000
mean       84.030717
std       446.307748
min         0.000001
25%         0.069381
50%         5.920517
75%        37.990619
max      6049.574693
dtype: float64 

In [13]:
print('Area distribution in Mercator')
print(countries_merc.area.describe(), '\n')
Area distribution in Mercator
count    2.540000e+02
mean     1.274530e+12
std      6.697560e+12
min      2.196509e+04
25%      9.736132e+08
50%      9.232570e+10
75%      5.377488e+11
max      8.288355e+13
dtype: float64 

In [14]:
print('Area distribution in CEA')
print(countries_cea.area.describe(), '\n')
Area distribution in CEA
count    2.550000e+02
mean     5.757889e+11
std      1.836629e+12
min      1.220383e+04
25%      6.820803e+08
50%      5.507806e+10
75%      3.655901e+11
max      1.698019e+13
dtype: float64 

Let's compare the area of each country in the two projected projections

In [15]:
countries_merc = countries_merc.set_index('ADM0_A3')
countries_cea = countries_cea.set_index('ADM0_A3')
countries_merc['ratio_area'] = countries_merc.area / countries_cea.area
countries_cea['ratio_area'] = countries_merc.area / countries_cea.area
fig, ax = plt.subplots()
sns.scatterplot(x=countries_cea.area/1e6, y=countries_merc.area/1e6, ax=ax)
sns.lineplot(x=countries_cea.area/1e6, y=countries_cea.area/1e6, color='r', ax=ax)
Text(0.5, 1.0, 'Areas')