Inspired by Mike Bostock's Let's Make a Map, we want to make a map too using matta. We will display the communes of Santiago, Chile. To do that we will perform the following steps:
import matta
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs/')
Note: we delete the data directory so that we start from scratch.
!rm -fr data
!mkdir data
!wget http://siit2.bcn.cl/obtienearchivo?id=repositorio/10221/10396/1/division_comunal.zip -O data/division_comunal.zip
--2015-01-08 13:34:16-- http://siit2.bcn.cl/obtienearchivo?id=repositorio/10221/10396/1/division_comunal.zip
Resolving siit2.bcn.cl (siit2.bcn.cl)... 200.0.66.71
Connecting to siit2.bcn.cl (siit2.bcn.cl)[200.0.66.71]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 29000232 (28M) [application/zip]
Saving to: “data/division_comunal.zip”
100%[======================================>] 29.000.232 1,26MB/s in 22s
2015-01-08 13:34:39 (1,26 MB/s) - “data/division_comunal.zip” saved [29000232/29000232]
!unzip data/division_comunal.zip -d data/
Archive: data/division_comunal.zip
inflating: data/Disclaimer.txt
inflating: data/division_comunal.dbf
inflating: data/division_comunal.prj
inflating: data/division_comunal.sbn
inflating: data/division_comunal.sbx
inflating: data/division_comunal.shp
inflating: data/division_comunal.shp.xml
inflating: data/division_comunal.shx
You can use ogrinfo to see the structure of the source shapefile.
!ogrinfo data/division_comunal.shp 'division_comunal' | head -n 30
INFO: Open of `data/division_comunal.shp'
      using driver `ESRI Shapefile' successful.
Layer name: division_comunal
Geometry: Polygon
Feature Count: 346
Extent: (-3701712.293900, 3794823.357600) - (704690.560200, 8065196.816300)
Layer SRS WKT:
PROJCS["WGS_1984_UTM_Zone_19S",
    GEOGCS["GCS_WGS_1984",
        DATUM["WGS_1984",
            SPHEROID["WGS_84",6378137.0,298.257223563]],
        PRIMEM["Greenwich",0.0],
        UNIT["Degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["False_Easting",500000.0],
    PARAMETER["False_Northing",10000000.0],
    PARAMETER["Central_Meridian",-69.0],
    PARAMETER["Scale_Factor",0.9996],
    PARAMETER["Latitude_Of_Origin",0.0],
    UNIT["Meter",1.0]]
NOM_REG: String (50.0)
NOM_PROV: String (20.0)
NOM_COM: String (30.0)
SHAPE_LENG: Real (19.11)
DIS_ELEC: Integer (4.0)
CIR_SENA: Integer (4.0)
COD_COMUNA: Integer (4.0)
SHAPE_Le_1: Real (19.11)
SHAPE_Area: Real (19.11)
Now we use ogr2ogr to convert the shapefile into GeoJSON.
Notes:
- We first delete santiago-comunas.json in case it exists (that is, when we re-run the notebook :) ).
- We use the -clipdst option to specify a bounding box obtained in this site.
- We use the -t_srs EPSG:4326 option to convert the data coordinates to (longitude, latitude) pairs.
!rm data/santiago-comunas.json
!ogr2ogr -where "NOM_PROV IN ('Santiago', 'Maipo', 'Cordillera')" -f GeoJSON \
    -clipdst -70.828155 -33.635036 -70.452573 -33.302953 -t_srs EPSG:4326 \
    data/santiago-comunas.json data/division_comunal.shp
rm: cannot remove «data/santiago-comunas.json»: No such file or directory
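When the same conversion has to run outside a notebook, the command can be assembled programmatically. The following is a minimal sketch, not part of matta: the helper name build_ogr2ogr_args is hypothetical, and it only builds the argument list from the ogr2ogr options discussed in the notes (-where, -f, -clipdst, -t_srs). Running it for real assumes that GDAL's ogr2ogr binary is available on the PATH.

```python
import subprocess

def build_ogr2ogr_args(src, dst, provinces, bbox, srs='EPSG:4326'):
    """Build the ogr2ogr argument list (hypothetical helper, for illustration).

    bbox is (xmin, ymin, xmax, ymax), expressed in the destination SRS,
    which is what -clipdst expects.
    """
    quoted = ', '.join("'{}'".format(p) for p in provinces)
    args = ['ogr2ogr',
            '-where', 'NOM_PROV IN ({})'.format(quoted),
            '-f', 'GeoJSON',
            '-clipdst']
    args += [str(v) for v in bbox]
    args += ['-t_srs', srs, dst, src]
    return args

args = build_ogr2ogr_args(
    src='data/division_comunal.shp',
    dst='data/santiago-comunas.json',
    provinces=['Santiago', 'Maipo', 'Cordillera'],
    bbox=(-70.828155, -33.635036, -70.452573, -33.302953))
print(' '.join(args))

# To actually run the conversion (requires ogr2ogr on the PATH):
# subprocess.check_call(args)
```

Passing the arguments as a list avoids shell quoting problems with the -where expression.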
!topojson -p --id-property NOM_COM -s 0 -o data/topojson-santiago-comunas.json data/santiago-comunas.json
bounds: -70.828155 -33.635036 -70.452573 -33.302953 (spherical)
pre-quantization: 0.0418m (3.76e-7°) 0.0369m (3.32e-7°)
topology: 253 arcs, 5154 points
post-quantization: 4.18m (0.0000376°) 3.69m (0.0000332°)
prune: retained 253 / 253 arcs (100%)
import json
import unicodedata

def strip_accents(s):
    # Decompose accented characters (NFD) and drop the combining marks ('Mn').
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

with open('data/topojson-santiago-comunas.json', 'r') as f:
    stgo = json.load(f)

# Use the upper-cased, accent-free commune name as the feature id.
for g in stgo['objects']['santiago-comunas']['geometries']:
    g['id'] = strip_accents(g['id'].upper())
    g['properties']['id'] = g['id']

stgo['objects']['santiago-comunas']['geometries'][7]
{u'arcs': [[61, 62, 63, -15, 64, 65, 66, 67, 68, 69, 70, 71]],
 u'id': u'SAN JOAQUIN',
 u'properties': {u'CIR_SENA': 8,
  u'COD_COMUNA': 1312,
  u'DIS_ELEC': 25,
  u'NOM_COM': u'San Joaqu\xedn',
  u'NOM_PROV': u'Santiago',
  u'NOM_REG': u'Regi\xf3n Metropolitana de Santiago',
  u'SHAPE_Area': 9876876.69845,
  u'SHAPE_LENG': 13987.3267808,
  u'SHAPE_Le_1': 13986.8273946,
  'id': u'SAN JOAQUIN'},
 u'type': u'Polygon'}
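To see what the normalization does to a single name, here is a small self-contained sketch. It is written for Python 3, where all strings are Unicode, and repeats the strip_accents function defined earlier; the sample names are only illustrative inputs.

```python
import unicodedata

def strip_accents(s):
    # NFD splits a character such as 'Í' into 'I' plus a combining acute
    # accent (category 'Mn'); the combining marks are then filtered out.
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

for name in ['San Joaquín', 'Conchalí', 'Estación Central', 'Ñuñoa']:
    print(name, '->', strip_accents(name.upper()))
```

Note that ñ is also reduced to n, so every data source must be normalized the same way for the names to match.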
from matta import topojson
topojson(geometry=stgo)
import requests
wikipage = requests.get('https://es.wikipedia.org/wiki/Anexo:Comunas_de_Santiago_de_Chile')
wikipage
<Response [200]>
%load_ext autoreload
%autoreload 2
import pandas as pd
df = pd.read_html(wikipage.text, attrs={'class': 'sortable'}, header=0)[0]
df.head()
| | Comuna | Ubicación? | Población? | Viviendas? | Densidad poblacional? | Crecimiento demográfico? | IDH? | Pobreza? |
|---|---|---|---|---|---|---|---|---|
| 0 | Cerrillos | Surponiente | 71.906 | 19.811 | 4.32908 | -10 | 0,743 (54) | 83 |
| 1 | Cerro Navia | Norponiente | 148.312 | 35.277 | 13.48291 | -48 | 0,683 (165) | 175 |
| 2 | Conchalí | Norte | 133.256 | 32.609 | 12.07029 | -129 | 0,707 (118) | 80 |
| 3 | El Bosque | Sur | 175.594 | 42.808 | 12.27072 | 16 | 0,711 (106) | 158 |
| 4 | Estación Central | Surponiente | 130.394 | 32.357 | 9.03631 | -75 | 0,735 (60) | 73 |
The data is not clean. Fortunately, we only need the IDH (Human Development Index) column, which is easy to convert to a meaningful float.
df['Comuna'] = [strip_accents(c).replace('?', '').upper() for c in df['Comuna']]
df['IDH'] = [float(c.split()[0].replace(',', '.')) for c in df['IDH?']]
del df['IDH?']
df.head()
| | Comuna | Ubicación? | Población? | Viviendas? | Densidad poblacional? | Crecimiento demográfico? | Pobreza? | IDH |
|---|---|---|---|---|---|---|---|---|
| 0 | CERRILLOS | Surponiente | 71.906 | 19.811 | 4.32908 | -10 | 83 | 0.743 |
| 1 | CERRO NAVIA | Norponiente | 148.312 | 35.277 | 13.48291 | -48 | 175 | 0.683 |
| 2 | CONCHALI | Norte | 133.256 | 32.609 | 12.07029 | -129 | 80 | 0.707 |
| 3 | EL BOSQUE | Sur | 175.594 | 42.808 | 12.27072 | 16 | 158 | 0.711 |
| 4 | ESTACION CENTRAL | Surponiente | 130.394 | 32.357 | 9.03631 | -75 | 73 | 0.735 |
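The raw IDH cells have the form '0,743 (54)': a value written with a decimal comma, followed by a second number in parentheses. As a quick check of the conversion used above, this sketch applies the same steps to plain strings. The name parse_idh is a hypothetical helper, not part of pandas or matta.

```python
def parse_idh(cell):
    # Keep the first whitespace-separated token and swap the decimal comma
    # for a dot so that float() can parse it.
    return float(cell.split()[0].replace(',', '.'))

for cell in ['0,743 (54)', '0,683 (165)']:
    print(cell, '->', parse_idh(cell))
```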
df.IDH.describe()
count    37.000000
mean      0.762865
std       0.076155
min       0.657000
25%       0.709000
50%       0.737000
75%       0.804000
max       0.949000
Name: IDH, dtype: float64
We use seaborn to create a color palette.
%matplotlib inline
import seaborn as sns
palette = sns.color_palette("GnBu_d", 5)
sns.palplot(palette)
from matta.scales import threshold_scale
scale = threshold_scale(df.IDH, palette, extend_by=0.05)
scale
{'domain': [0.65700000000000003,
  0.7543333333333333,
  0.85166666666666668,
  0.94899999999999995],
 'extent': [0.64240000000000008, 0.9635999999999999],
 'range': [u'#385965', u'#3d8099', u'#43a6cc', u'#68bac6', u'#8fcec0']}
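The printed scale suggests how it is laid out: four evenly spaced thresholds between the minimum and maximum IDH, five colors, and an extent widened by 5% of the data span on each side. The sketch below reproduces those numbers and shows how a threshold scale can map a value to a color. It is an illustration inferred from the printed output, not matta's actual implementation. The function names are hypothetical, and the lookup follows the usual threshold-scale convention (as in d3), which the source does not confirm.

```python
from bisect import bisect_right

def sketch_threshold_scale(values, colors, extend_by=0.05):
    """Hypothetical re-creation of the printed scale, for illustration only."""
    lo, hi = min(values), max(values)
    span = hi - lo
    n = len(colors) - 1  # one fewer threshold than colors
    domain = [lo + i * span / (n - 1) for i in range(n)]
    extent = [lo - extend_by * span, hi + extend_by * span]
    return {'domain': domain, 'range': list(colors), 'extent': extent}

def color_for(value, scale):
    # The number of thresholds <= value is the index of the color bucket.
    return scale['range'][bisect_right(scale['domain'], value)]

colors = ['#385965', '#3d8099', '#43a6cc', '#68bac6', '#8fcec0']
# Only the minimum and maximum matter here; 0.743 is one sample value.
demo_scale = sketch_threshold_scale([0.657, 0.743, 0.949], colors)
print(demo_scale['domain'])
print(demo_scale['extent'])
print(color_for(0.743, demo_scale))
```

Under this convention, values below the first threshold get the first color and values at or above the last threshold get the last one.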
topojson(geometry=stgo, area_dataframe=df, area_feature_name='Comuna', area_value='IDH',
         area_color_scale_domain=scale['domain'], area_color_scale_range=scale['range'],
         area_color_scale_extent=scale['extent'], leaflet=False)
topojson(geometry=stgo, mark_dataframe=df, mark_feature_name='Comuna', mark_value='Pobreza?',
         mark_color='indigo', mark_scale=0.5)
topojson(geometry=stgo, area_dataframe=df, area_feature_name='Comuna', area_value='IDH',
         area_color_scale_domain=scale['domain'], area_color_scale_range=scale['range'],
         area_color_scale_extent=scale['extent'],
         mark_dataframe=df, mark_feature_name='Comuna', mark_value='Pobreza?',
         mark_color='indigo', mark_scale=0.5,
         mark_max_ratio=15, mark_min_ratio=0, mark_opacity=0.5, leaflet=True)
Mixing a choropleth with a symbol map does not make much sense in our case, but surely you have a more interesting use case!
You can see an example of scaffolded visualizations using matta.topojson here.