Before using the clustering algorithms included in Clusterpy we need to load/import a shapefile.
The data_examples directory in Clusterpy contains some shapefiles that we can use to start exploring the different components in the library.
Let's start by importing clusterpy:
import clusterpy
# To use this same version use git to checkout this commit: 434b415f
# $ git checkout 434b415f
# Current version is v0.9.9
ClusterPy: Library of spatially constrained clustering algorithms
The basic data structure for Clusterpy is the 'Layer' object.
This object holds the information from our shapefile, along with a reference to any later modifications of the shape that result from running the clustering algorithms.
To create a Layer object we can import an ESRI shapefile or generate a regular lattice of polygons.
# Use the importArcData function
ca_layer = clusterpy.importArcData('clusterpy/data_examples/CA_Polygons')
Loading clusterpy/data_examples/CA_Polygons.dbf Loading clusterpy/data_examples/CA_Polygons.shp Done
# You can use the CPhelp function to quickly get the function's __doc__ like this:
clusterpy.CPhelp('importArcData')
Creates a new Layer from a shapefile (<file>.shp) :param filename: filename without extension :type filename: string :rtype: Layer (CP project) **Description** `ESRI <http://www.esri.com/>`_ shapefile is a binary file used to save and transport maps. During the last times it has become the most used format for the spatial scientists around the world. On clusterPy's "data_examples" folder you can find some shapefiles. To load a shapefile in clusterPy just follow the example bellow. **Example** :: import clusterpy china = clusterpy.importArcData("clusterpy/data_examples/china")
Another way to create a Layer object is to generate a regular lattice with the createGrid function.
clusterpy.CPhelp('createGrid')
Creates a new Layer with a regular lattice :param nRows: number of rows :type nRows: integer :param nCols: number of columns :type nCols: integer :type lowerLeft: tuple or none, lower-left corner coordinates; default is (0,0) :type upperRight: tuple or none, upper-right corner coordinates; default is (100,100) :rtype: Layer new lattice **Description** Regular lattices are widely used in both theoretical and empirical applications in Regional Science. The example below shows how easy the creation of this kind of maps is using clusterPy. **Examples** Create a grid of ten by ten points.:: import clusterpy points = clusterpy.createGrid(10,10) Create a grid of ten by ten points on the bounding box (0,0,100,100).:: import clusterpy points = clusterpy.createGrid(10, 10, lowerLeft=(0, 0), upperRight=(100, 100))
# The examples in the help text create a 10x10 grid;
# let's create a smaller 5x5 grid for simplicity.
grid_layer = clusterpy.createGrid(5, 5)
Creating grid Done
Now that we have our basic Layer data structure loaded with a shape, we can start working on it and exploring its contents.
After loading a shapefile we can easily get its contiguity matrices and the attribute data stored in the .dbf file (if we loaded an ESRI shapefile).
Contiguity matrices: We can use either Wqueen or Wrook to specify the neighbouring areas in our shapefile. In Wqueen, two areas are neighbours if they touch at all, even at a single point. In Wrook, two areas are neighbours only if they share a border, so areas that meet only at a corner are not neighbours.
"""
In Wqueen, the area 'A' has all neighbors named 'Q'
+=====+=====+=====+
|  Q  |  Q  |  Q  |
+=====+=====+=====+
|  Q  |  A  |  Q  |
+=====+=====+=====+
|  Q  |  Q  |  Q  |
+=====+=====+=====+
In Wrook, the area 'A' has all neighbors named 'R'
+=====+=====+=====+
|     |  R  |     |
+=====+=====+=====+
|  R  |  A  |  R  |
+=====+=====+=====+
|     |  R  |     |
+=====+=====+=====+
"""
# Contiguity matrices of our layers
print "ca_layer Wqueen: ", ca_layer.Wqueen
print "\nca_layer Wrook: ", ca_layer.Wrook
print "\ngrid_layer Wqueen: ", grid_layer.Wqueen
print "\ngrid_layer Wrook: ", grid_layer.Wrook
ca_layer Wqueen: {0: [6, 38, 40, 42], 1: [2, 4, 8, 25, 54], 2: [1, 4, 8, 33, 38], 3: [5, 10, 31, 50, 51, 57], 4: [1, 2, 38, 49, 54], 5: [3, 10, 16, 50, 56], 6: [0, 33, 38, 47], 7: [11, 46], 8: [1, 2, 30, 33], 9: [13, 15, 19, 23, 25, 26, 34, 53], 10: [3, 5, 16, 22, 51], 11: [7, 22, 46, 52], 12: [32, 36], 13: [9, 14, 25, 35, 53], 14: [13, 15, 18, 35, 39, 41, 53, 55], 15: [14, 9, 26, 39, 53], 16: [10, 5, 22, 27, 48, 56], 17: [24, 31, 44, 45], 18: [14, 29, 35, 55], 19: [9, 21, 23, 25, 54], 20: [48], 21: [19, 23, 49, 54], 22: [11, 10, 16, 48, 51, 52], 23: [19, 9, 21, 34, 42, 49], 24: [17, 44, 46], 25: [13, 9, 19, 1, 54], 26: [9, 15, 34, 39, 43], 27: [16, 47, 48, 56], 28: [30, 45, 57], 29: [18, 32, 35, 36], 30: [8, 28, 33, 50, 57], 31: [17, 3, 44, 45, 51, 57], 32: [12, 29, 35, 36], 33: [30, 8, 2, 6, 38, 47, 50, 56], 34: [23, 9, 26, 42, 43], 35: [13, 32, 29, 18, 14], 36: [32, 12, 29], 37: [40], 38: [6, 33, 2, 4, 0, 49], 39: [26, 15, 14, 41], 40: [37, 0, 42, 43], 41: [39, 14, 55], 42: [0, 23, 34, 40, 43, 49], 43: [42, 34, 26, 40], 44: [24, 17, 31, 46, 51, 52], 45: [31, 17, 28, 57], 46: [24, 44, 11, 7, 52], 47: [33, 6, 27, 48, 56], 48: [20, 22, 16, 27, 47], 49: [21, 23, 42, 38, 4, 54], 50: [3, 30, 33, 5, 56, 57], 51: [44, 31, 3, 10, 22, 52], 52: [46, 44, 51, 22, 11], 53: [13, 14, 15, 9], 54: [1, 25, 19, 21, 49, 4], 55: [14, 18, 41], 56: [5, 50, 33, 47, 27, 16], 57: [45, 28, 30, 50, 3, 31]}

ca_layer Wrook: {0: [6, 38, 40, 42], 1: [2, 4, 8, 25, 54], 2: [1, 4, 8, 33, 38], 3: [5, 10, 31, 50, 51, 57], 4: [1, 2, 38, 49, 54], 5: [3, 10, 16, 50, 56], 6: [0, 33, 38, 47], 7: [11, 46], 8: [1, 2, 30, 33], 9: [13, 15, 19, 23, 25, 26, 34, 53], 10: [3, 5, 16, 22, 51], 11: [7, 22, 46, 52], 12: [32, 36], 13: [9, 14, 25, 35, 53], 14: [13, 15, 18, 35, 39, 41, 53, 55], 15: [14, 9, 26, 39, 53], 16: [10, 5, 22, 27, 48, 56], 17: [24, 31, 44, 45], 18: [14, 29, 35, 55], 19: [9, 21, 23, 25, 54], 20: [48], 21: [19, 23, 54], 22: [11, 10, 16, 48, 51, 52], 23: [19, 9, 21, 34, 42, 49], 24: [17, 46], 25: [13, 9, 19, 1, 54], 26: [9, 15, 34, 39, 43], 27: [16, 47, 48, 56], 28: [30, 45, 57], 29: [18, 32, 35, 36], 30: [8, 28, 33, 50, 57], 31: [17, 3, 44, 45, 51, 57], 32: [12, 29, 35, 36], 33: [30, 8, 2, 6, 38, 47, 50, 56], 34: [23, 9, 26, 42, 43], 35: [13, 32, 29, 18, 14], 36: [32, 12, 29], 37: [40], 38: [6, 33, 2, 4, 0, 49], 39: [26, 15, 14, 41], 40: [37, 0, 42, 43], 41: [39, 14, 55], 42: [0, 23, 34, 40, 43, 49], 43: [42, 34, 26, 40], 44: [17, 31, 46, 51, 52], 45: [31, 17, 28, 57], 46: [24, 44, 11, 7, 52], 47: [33, 6, 27, 56], 48: [20, 22, 16, 27], 49: [23, 42, 38, 4, 54], 50: [3, 30, 33, 5, 56, 57], 51: [44, 31, 3, 10, 22, 52], 52: [46, 44, 51, 22, 11], 53: [13, 14, 15, 9], 54: [1, 25, 19, 21, 49, 4], 55: [14, 18, 41], 56: [5, 50, 33, 47, 27, 16], 57: [45, 28, 30, 50, 3, 31]}

grid_layer Wqueen: {0: [1, 5, 6], 1: [0, 2, 5, 6, 7], 2: [1, 3, 6, 7, 8], 3: [2, 4, 7, 8, 9], 4: [3, 8, 9], 5: [0, 1, 6, 10, 11], 6: [0, 1, 5, 2, 7, 10, 11, 12], 7: [1, 2, 6, 3, 8, 11, 12, 13], 8: [2, 3, 7, 4, 9, 12, 13, 14], 9: [3, 4, 8, 13, 14], 10: [5, 6, 11, 15, 16], 11: [5, 6, 10, 7, 12, 15, 16, 17], 12: [6, 7, 11, 8, 13, 16, 17, 18], 13: [7, 8, 12, 9, 14, 17, 18, 19], 14: [8, 9, 13, 18, 19], 15: [10, 11, 16, 20, 21], 16: [10, 11, 15, 12, 17, 20, 21, 22], 17: [11, 12, 16, 13, 18, 21, 22, 23], 18: [12, 13, 17, 14, 19, 22, 23, 24], 19: [13, 14, 18, 23, 24], 20: [15, 16, 21], 21: [15, 16, 20, 17, 22], 22: [16, 17, 21, 18, 23], 23: [17, 18, 22, 19, 24], 24: [18, 19, 23]}

grid_layer Wrook: {0: [1, 5], 1: [0, 2, 6], 2: [1, 3, 7], 3: [2, 4, 8], 4: [3, 9], 5: [0, 6, 10], 6: [1, 5, 7, 11], 7: [2, 6, 8, 12], 8: [3, 7, 9, 13], 9: [4, 8, 14], 10: [5, 11, 15], 11: [6, 10, 12, 16], 12: [7, 11, 13, 17], 13: [8, 12, 14, 18], 14: [9, 13, 19], 15: [10, 16, 20], 16: [11, 15, 17, 21], 17: [12, 16, 18, 22], 18: [13, 17, 19, 23], 19: [14, 18, 24], 20: [15, 21], 21: [16, 20, 22], 22: [17, 21, 23], 23: [18, 22, 24], 24: [19, 23]}
Both Wqueen and Wrook are properties of the Layer object. Each is a Python dictionary where the key is the area ID and the value is a Python list containing the IDs of the neighbouring areas.
Layer's fieldNames: The fieldNames property is a list with the identifiers of the data loaded with the shapefile.
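To make the two definitions concrete, here is a minimal pure-Python sketch that rebuilds rook and queen neighbour dictionaries for a regular lattice. It does not use clusterpy; the function name grid_neighbours is hypothetical, and it assumes area IDs are assigned row by row starting at 0, which is what the grid_layer output above suggests.

```python
def grid_neighbours(n_rows, n_cols, queen=False):
    """Return {area_id: [neighbour ids]} for a regular lattice.

    Assumption: IDs are assigned row by row, starting at 0.
    """
    # Rook steps cross a shared edge; queen steps add the four diagonals.
    steps = [(-1, 0), (0, -1), (0, 1), (1, 0)]
    if queen:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    w = {}
    for row in range(n_rows):
        for col in range(n_cols):
            ids = []
            for d_row, d_col in steps:
                r, c = row + d_row, col + d_col
                # Skip steps that fall outside the lattice.
                if 0 <= r < n_rows and 0 <= c < n_cols:
                    ids.append(r * n_cols + c)
            w[row * n_cols + col] = sorted(ids)
    return w


w_rook = grid_neighbours(5, 5)
w_queen = grid_neighbours(5, 5, queen=True)

# Neighbours that touch only at a corner: present in Wqueen, absent in Wrook.
corner_only = {
    area: sorted(set(w_queen[area]) - set(w_rook[area]))
    for area in w_queen
}

print(w_rook[6])       # [1, 5, 7, 11]
print(w_queen[0])      # [1, 5, 6]
print(corner_only[6])  # [0, 2, 10, 12]
```

The sketch sorts each neighbour list, so the order can differ from what clusterpy prints even when the set of neighbours is the same.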
ca_layer.fieldNames
['ID', 'NAME', 'STATE_FIP', 'STATE', 'MYID', 'AREA', 'PERIMETER', 'PCR1969', 'PCR1970', 'PCR1971', 'PCR1972', 'PCR1973', 'PCR1974', 'PCR1975', 'PCR1976', 'PCR1977', 'PCR1978', 'PCR1979', 'PCR1980', 'PCR1981', 'PCR1982', 'PCR1983', 'PCR1984', 'PCR1985', 'PCR1986', 'PCR1987', 'PCR1988', 'PCR1989', 'PCR1990', 'PCR1991', 'PCR1992', 'PCR1993', 'PCR1994', 'PCR1995', 'PCR1996', 'PCR1997', 'PCR1998', 'PCR1999', 'PCR2000', 'PCR2001', 'PCR2002', 'POP1969', 'POP1970', 'POP1971', 'POP1972', 'POP1973', 'POP1974', 'POP1975', 'POP1976', 'POP1977', 'POP1978', 'POP1979', 'POP1980', 'POP1981', 'POP1982', 'POP1983', 'POP1984', 'POP1985', 'POP1986', 'POP1987', 'POP1988', 'POP1989', 'POP1990', 'POP1991', 'POP1992', 'POP1993', 'POP1994', 'POP1995', 'POP1996', 'POP1997', 'POP1998', 'POP1999', 'POP2000', 'POP2001', 'POP2002']
With the data identifiers in hand, we can extract the data using the getVars method, which returns a Python dictionary where the key is the area ID and the value is a list with the requested values for that area.
# Extract one column
print ca_layer.getVars('NAME')
# Or extract multiple columns
print ca_layer.getVars(['NAME','POP2002'])
Getting variables Variables successfully extracted
{0: ['Alameda'], 1: ['Alpine'], 2: ['Amador'], 3: ['Butte'], 4: ['Calaveras'], 5: ['Colusa'], 6: ['Contra Costa'], 7: ['Del Norte'], 8: ['El Dorado'], 9: ['Fresno'], 10: ['Glenn'], 11: ['Humboldt'], 12: ['Imperial'], 13: ['Inyo'], 14: ['Kern'], 15: ['Kings'], 16: ['Lake'], 17: ['Lassen'], 18: ['Los Angeles'], 19: ['Madera'], 20: ['Marin'], 21: ['Mariposa'], 22: ['Mendocino'], 23: ['Merced'], 24: ['Modoc'], 25: ['Mono'], 26: ['Monterey'], 27: ['Napa'], 28: ['Nevada'], 29: ['Orange'], 30: ['Placer'], 31: ['Plumas'], 32: ['Riverside'], 33: ['Sacramento'], 34: ['San Benito'], 35: ['San Bernardino'], 36: ['San Diego'], 37: ['San Francisco'], 38: ['San Joaquin'], 39: ['San Luis Obispo'], 40: ['San Mateo'], 41: ['Santa Barbara'], 42: ['Santa Clara'], 43: ['Santa Cruz'], 44: ['Shasta'], 45: ['Sierra'], 46: ['Siskiyou'], 47: ['Solano'], 48: ['Sonoma'], 49: ['Stanislaus'], 50: ['Sutter'], 51: ['Tehama'], 52: ['Trinity'], 53: ['Tulare'], 54: ['Tuolumne'], 55: ['Ventura'], 56: ['Yolo'], 57: ['Yuba']}

Getting variables Variables successfully extracted
{0: ['Alameda', 1465923.0], 1: ['Alpine', 1219.0], 2: ['Amador', 36742.0], 3: ['Butte', 208779.0], 4: ['Calaveras', 43113.0], 5: ['Colusa', 19363.0], 6: ['Contra Costa', 989340.0], 7: ['Del Norte', 27481.0], 8: ['El Dorado', 165513.0], 9: ['Fresno', 831946.0], 10: ['Glenn', 26797.0], 11: ['Humboldt', 127348.0], 12: ['Imperial', 145843.0], 13: ['Inyo', 18273.0], 14: ['Kern', 692474.0], 15: ['Kings', 134802.0], 16: ['Lake', 62223.0], 17: ['Lassen', 33576.0], 18: ['Los Angeles', 9768236.0], 19: ['Madera', 128824.0], 20: ['Marin', 246824.0], 21: ['Mariposa', 17318.0], 22: ['Mendocino', 87516.0], 23: ['Merced', 224976.0], 24: ['Modoc', 9306.0], 25: ['Mono', 13043.0], 26: ['Monterey', 411140.0], 27: ['Napa', 129894.0], 28: ['Nevada', 95093.0], 29: ['Orange', 2926160.0], 30: ['Placer', 278515.0], 31: ['Plumas', 21003.0], 32: ['Riverside', 1695369.0], 33: ['Sacramento', 1301627.0], 34: ['San Benito', 55757.0], 35: ['San Bernardino', 1806450.0], 36: ['San Diego', 2904687.0], 37: ['San Francisco', 761983.0], 38: ['San Joaquin', 613153.0], 39: ['San Luis Obispo', 252064.0], 40: ['San Mateo', 700341.0], 41: ['Santa Barbara', 401757.0], 42: ['Santa Clara', 1677426.0], 43: ['Santa Cruz', 253295.0], 44: ['Shasta', 171784.0], 45: ['Sierra', 3489.0], 46: ['Siskiyou', 44231.0], 47: ['Solano', 409510.0], 48: ['Sonoma', 465862.0], 49: ['Stanislaus', 481014.0], 50: ['Sutter', 82273.0], 51: ['Tehama', 57454.0], 52: ['Trinity', 13254.0], 53: ['Tulare', 381039.0], 54: ['Tuolumne', 56008.0], 55: ['Ventura', 781159.0], 56: ['Yolo', 180011.0], 57: ['Yuba', 62386.0]}
From there you can use these dictionaries as you normally would in Python.
Geometry: The last thing we will explore this time is the geometry of the shape we just loaded. To do this we can use the following methods: getBbox, getCentroids and getGeometricAreas.
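As a small illustration of working with these dictionaries, the sketch below splits a getVars-style result into one dictionary per column and ranks areas by population. It does not call clusterpy: the name_pop dictionary is a hand-copied excerpt of four entries from the output above, and all variable names are only illustrative.

```python
# A few entries copied from the getVars(['NAME', 'POP2002']) output above.
name_pop = {
    0: ['Alameda', 1465923.0],
    1: ['Alpine', 1219.0],
    18: ['Los Angeles', 9768236.0],
    36: ['San Diego', 2904687.0],
}

# Split each row into one dictionary per column, keyed by area ID.
names = {area: row[0] for area, row in name_pop.items()}
population = {area: row[1] for area, row in name_pop.items()}

# Area ID with the largest 2002 population.
largest = max(population, key=population.get)
print(names[largest])  # Los Angeles

# Area names ordered from least to most populated.
by_population = sorted(names, key=population.get)
print([names[area] for area in by_population])
# ['Alpine', 'Alameda', 'San Diego', 'Los Angeles']
```

Because every dictionary is keyed by the same area ID, the same pattern should also let you look up an area's neighbours in Wrook or Wqueen and read their values.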
print "Bounding box of the layer:\n", ca_layer.getBbox()
print "\nCoordinates of each polygon's centroid:\n", ca_layer.getCentroids()
print "\nGeometric area of each polygon:\n", ca_layer.getGeometricAreas()
Bounding box of the layer:
(-13849215.0216, 3810651.8715000004, -12705407.2537, 5133802.977899998)

Coordinates of each polygon's centroid:
Processing geometric areas Done
{0: [-13568659.919586366, 4503413.063612468], 1: [-13338372.215979917, 4637531.6768424865], 2: [-13430882.045796787, 4616096.512727589], 3: [-13536520.309179187, 4790502.499745239], 4: [-13419984.710444089, 4582013.482539735], 5: [-13607377.60687068, 4720192.822275058], 6: [-13573009.426717965, 4541750.500248381], 7: [-13792155.732439086, 5094215.939646447], 8: [-13416714.916405657, 4663289.480117444], 9: [-13319197.243995998, 4380231.914094166], 10: [-13624642.23431504, 4780557.530761684], 11: [-13789745.01937965, 4940740.548065312], 12: [-12842419.767405918, 3877368.457248818], 13: [-13070398.220065115, 4346359.124208251], 14: [-13216999.188649021, 4186062.4704793678], 15: [-13337783.804150525, 4285929.164384436], 16: [-13664853.450876404, 4709153.371158138], 17: [-13424562.10058535, 4936793.228410435], 18: [-13160714.601288402, 4048240.6935433187], 19: [-13331807.234915387, 4443863.697048869], 20: [-13661436.093311014, 4563414.568985493], 21: [-13347796.208296021, 4494482.175947623], 22: [-13735889.12302683, 4758165.55509784], 23: [-13438225.181055542, 4440263.2055810755], 24: [-13439052.651975656, 5071553.099204304], 25: [-13234556.374408357, 4544807.466138684], 26: [-13496355.326297682, 4305487.455694987], 27: [-13617772.683528531, 4624707.534311376], 28: [-13443815.780664323, 4737914.395268852], 29: [-13109123.387110595, 3965293.3584607164], 30: [-13438152.9326567, 4703903.310220157], 31: [-13451723.687715685, 4839259.111399066], 32: [-12912367.250883367, 3970782.0348184193], 33: [-13507935.889340824, 4616562.908849987], 34: [-13478055.376873886, 4358891.890145417], 35: [-12933022.016070718, 4118558.098666614], 36: [-12994910.64984747, 3876686.1000765674], 37: [-13630999.501299726, 4518811.469500262], 38: [-13499919.919868827, 4544013.408017299], 39: [-13403416.238293068, 4192003.5300912694], 40: [-13617402.38900018, 4472389.9554926595], 41: [-13360251.215654947, 4095464.879514745], 42: [-13547194.159427388, 4445769.069474665], 43: [-13581197.11435818, 4421214.244132896], 44: [-13585433.711149536, 4949833.469471982], 45: [-13415787.879356084, 4777934.4592928], 46: [-13641136.58655161, 5071983.217357874], 47: [-13573367.305089545, 4591299.189713263], 48: [-13679783.383839797, 4627817.291038316], 49: [-13469357.95243801, 4491395.628752196], 50: [-13547026.266442705, 4699770.334356141], 51: [-13606986.489451626, 4856779.688482221], 52: [-13704749.605066169, 4933513.55758898], 53: [-13224817.616867151, 4305885.060021143], 54: [-13353290.198194833, 4557087.129823579], 55: [-13256343.395780759, 4066212.6464521573], 56: [-13570056.24491133, 4650245.633251788], 57: [-13508731.568547059, 4733319.941590071]}

Geometric area of each polygon:
Processing geometric areas Done
{0: 3108321800.5820312, 1: 3141322375.7226562, 2: 2547049328.8671875, 3: 7310071012.136719, 4: 4338156715.0234375, 5: 4969871133.613281, 6: 3149200450.6289062, 7: 4705894364.058594, 8: 7599260770.140625, 9: 24223594696.015625, 10: 5773798622.015625, 11: 16226764037.457031, 12: 16483687478.710938, 13: 40911603668.70703, 14: 31697765299.308594, 15: 5503534454.2421875, 16: 5703143196.484375, 17: 21192838314.5, 18: 15538836545.734375, 19: 8773033085.925781, 20: 2273285525.9296875, 21: 6017271483.527344, 22: 15218627771.582031, 23: 8027682224.042969, 24: 19404443747.214844, 25: 13009352723.707031, 26: 13149450775.453125, 27: 3324242362.1484375, 28: 4203259051.8867188, 29: 2986245536.53125, 30: 6439967903.96875, 31: 11505610290.414062, 32: 27301274272.67578, 33: 4192057172.7734375, 34: 5577376003.628906, 35: 77145020084.69531, 36: 15661603577.3125, 37: 202862566.37890625, 38: 5923598317.199219, 39: 12917199330.90625, 40: 1889379664.0625, 41: 10501274949.964844, 42: 5315647723.28125, 43: 1811574275.0820312, 44: 17319422674.6875, 45: 4182072183.6835938, 46: 29305219389.515625, 47: 3722485525.2265625, 48: 6711028502.027344, 49: 6227170623.6796875, 50: 2604709852.703125, 51: 13085397427.78125, 52: 14390904869.09375, 53: 19211393642.875, 54: 9468555258.746094, 55: 7061990877.597656, 56: 4335799114.730469, 57: 2773976792.0351562}
This is what the geometric information looks like for a grid.
print "Bounding box of the layer:\n", grid_layer.getBbox()
print "\nCoordinates of each polygon's centroid:\n", grid_layer.getCentroids()
print "\nGeometric area of each polygon:\n", grid_layer.getGeometricAreas()
Bounding box of the layer:
(0, 0.0, 100.0, 100)

Coordinates of each polygon's centroid:
Processing geometric areas Done
{0: [10.0, 90.0], 1: [30.0, 90.0], 2: [50.0, 90.0], 3: [70.0, 90.0], 4: [90.0, 90.0], 5: [10.0, 70.0], 6: [30.0, 70.0], 7: [50.0, 70.0], 8: [70.0, 70.0], 9: [90.0, 70.0], 10: [10.0, 50.0], 11: [30.0, 50.0], 12: [50.0, 50.0], 13: [70.0, 50.0], 14: [90.0, 50.0], 15: [10.0, 30.0], 16: [30.0, 30.0], 17: [50.0, 30.0], 18: [70.0, 30.0], 19: [90.0, 30.0], 20: [10.0, 10.0], 21: [30.0, 10.0], 22: [50.0, 10.0], 23: [70.0, 10.0], 24: [90.0, 10.0]}

Geometric area of each polygon:
Processing geometric areas Done
{0: 400.0, 1: 400.0, 2: 400.0, 3: 400.0, 4: 400.0, 5: 400.0, 6: 400.0, 7: 400.0, 8: 400.0, 9: 400.0, 10: 400.0, 11: 400.0, 12: 400.0, 13: 400.0, 14: 400.0, 15: 400.0, 16: 400.0, 17: 400.0, 18: 400.0, 19: 400.0, 20: 400.0, 21: 400.0, 22: 400.0, 23: 400.0, 24: 400.0}
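The grid numbers are easy to check by hand, since every cell is a rectangle. The following sketch recomputes the centroids and areas for a regular lattice without clusterpy. The function name grid_geometry and its parameter names are hypothetical, and it assumes IDs run row by row from the top-left cell, which matches the centroid output above (area 0 sits at [10.0, 90.0]).

```python
def grid_geometry(n_rows, n_cols, lower_left=(0, 0), upper_right=(100, 100)):
    """Return (centroids, areas) for a lattice of equal rectangular cells.

    Assumption: IDs run row by row, starting from the top-left cell.
    """
    width = (upper_right[0] - lower_left[0]) / float(n_cols)
    height = (upper_right[1] - lower_left[1]) / float(n_rows)
    centroids, areas = {}, {}
    for row in range(n_rows):
        for col in range(n_cols):
            area_id = row * n_cols + col
            x = lower_left[0] + (col + 0.5) * width
            # Row 0 is the top row, so y decreases as the row index grows.
            y = upper_right[1] - (row + 0.5) * height
            centroids[area_id] = [x, y]
            areas[area_id] = width * height
    return centroids, areas


centroids, areas = grid_geometry(5, 5)
print(centroids[0])   # [10.0, 90.0]
print(centroids[24])  # [90.0, 10.0]
print(areas[12])      # 400.0
```

With the default bounding box of (0, 0) to (100, 100), each of the 25 cells is 20 by 20 units, which is where the area of 400.0 comes from.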
We will leave it there. We have only covered the basics here, but if you are interested, check out the full documentation for this version here.