Notebook

Neighborhood Structures in the ArcGIS Spatial Statistics Library¶

Spatial Weights Matrix
On-the-fly Neighborhood Iterators [GA Table]
Contructing PySAL Spatial Weights

Spatial Weight Matrix File¶

Stores the spatial weights so they do not have to be re-calculated for each analysis.
In row-compressed format.
Little endian byte encoded.
Requires a unique long/short field to identify each features. Can NOT be the OID/FID.

Construction¶

In [31]:

import Weights as WEIGHTS
import os as OS
inputFC = r'../data/CA_Polygons.shp'
fullFC = OS.path.abspath(inputFC)
fullPath, fcName = OS.path.split(fullFC)
masterField = "MYID"

Distance-Based Options¶

INPUTS:
    inputFC (str): path to the input feature class
    swmFile (str): path to the SWM file.
    masterField (str): field in table that serves as the mapping.
    fixed (boolean): fixed (1) or inverse (0) distance? 
    concept: {str, EUCLIDEAN}: EUCLIDEAN or MANHATTAN 
    exponent {float, 1.0}: distance decay
    threshold {float, None}: distance threshold
    kNeighs (int): number of neighbors to return
    rowStandard {bool, True}: row standardize weights?

Example: Fixed Distance

In [32]:

swmFile = OS.path.join(fullPath, "fixed250k.swm")
fixedSWM = WEIGHTS.distance2SWM(fullFC, swmFile, masterField, 
                                threshold = 250000)

Example: Inverse Distance Squared

In [33]:

swmFile = OS.path.join(fullPath, "inv2_250k.swm")
fixedSWM = WEIGHTS.distance2SWM(fullFC, swmFile, masterField, fixed = False,
                                exponent = 2.0, threshold = 250000)

k-Nearest Neighbors Options¶

INPUTS:
    inputFC (str): path to the input feature class
    swmFile (str): path to the SWM file.
    masterField (str): field in table that serves as the mapping.
    concept: {str, EUCLIDEAN}: EUCLIDEAN or MANHATTAN 
    kNeighs {int, 1}: number of neighbors to return
    rowStandard {bool, True}: row standardize weights?

Example: 8-nearest neighbors

In [34]:

swmFile = OS.path.join(fullPath, "knn8.swm")
fixedSWM = WEIGHTS.kNearest2SWM(fullFC, swmFile, masterField, kNeighs = 8)

Example: Fixed Distance - k-nearest neighbor hybrid [i.e. at least k neighbors but may have more...]

In [35]:

swmFile = OS.path.join(fullPath, "fixed250k_knn8.swm")
fixedSWM = WEIGHTS.distance2SWM(fullFC, swmFile, masterField, kNeighs = 8,
                                threshold = 250000)

Delaunay Triangulation Options¶

INPUTS:
    inputFC (str): path to the input feature class
    swmFile (str): path to the SWM file.
    masterField (str): field in table that serves as the mapping.
    rowStandard {bool, True}: row standardize weights?

Example: delaunay

In [36]:

swmFile = OS.path.join(fullPath, "delaunay.swm")
fixedSWM = WEIGHTS.delaunay2SWM(fullFC, swmFile, masterField)

Polygon Contiguity Options ¶

INPUTS:
    inputFC (str): path to the input feature class
    swmFile (str): path to the SWM file.
    masterField (str): field in table that serves as the mapping.
    concept: {str, EUCLIDEAN}: EUCLIDEAN or MANHATTAN
    kNeighs {int, 0}: number of neighbors to return (1)
    rowStandard {bool, True}: row standardize weights?
    contiguityType {str, Rook}: {Rook = Edges Only, Queen = Edges/Vertices}

    NOTES:
    (1) kNeighs is an option often used when you know there are polygon
        features that are not contiguous (e.g. islands).  A kNeighs value 
        of 2 will assure that ALL features have at least 2 neighbors.  
        If a polygon is determined to only touch a single other polygon, 
        then a nearest neighbor search based on true centroids are used 
        to find the additional neighbor.

Example: Rook [Binary]

In [37]:

swmFile = OS.path.join(fullPath, "rook_bin.swm")
WEIGHTS.polygon2SWM(inputFC, swmFile, masterField, rowStandard = False)

*Example: Queen Contiguity [Row Standardized]

In [38]:

swmFile = OS.path.join(fullPath, "queen.swm")
WEIGHTS.polygon2SWM(inputFC, swmFile, masterField, contiguityType = "QUEEN")

Example: Queen Contiguity - KNN Hybrid [Prevents Islands w/ no Neighbors][ (1)](#poly_options)

In [39]:

swmFile = OS.path.join(fullPath, "hybrid.swm")
WEIGHTS.polygon2SWM(inputFC, swmFile, masterField, kNeighs = 4)

On-the-fly Neighborhood Iterators [GA Table]¶

Reads centroids of input features into spatial tree structure.
Distance Based Queries.
Scalable: In-memory/disk-space swap for large data.
Requires a unique long/short field to identify each features. Can be the OID/FID.
Uses requireSearch = True when using ssdo.obtainData

Pre-Example: Load the Data into GA Version of SSDataObject

In [1]:

import SSDataObject as SSDO
inputFC = r'../data/CA_Polygons.shp'
ssdo = SSDO.SSDataObject(inputFC)
uniqueIDField = ssdo.oidName
ssdo.obtainData(uniqueIDField, requireSearch = True)

Example: NeighborSearch - When you only need your Neighbor IDs

gaSearch.init_nearest(distance_band, minimum_num_neighs, {"euclidean", "manhattan")

In [2]:

import arcgisscripting as ARC
import WeightsUtilities as WU
import gapy as GAPY
gaSearch = GAPY.ga_nsearch(ssdo.gaTable)
concept, gaConcept = WU.validateDistanceMethod('EUCLIDEAN', ssdo.spatialRef)
gaSearch.init_nearest(0.0, 4, gaConcept)
neighSearch = ARC._ss.NeighborSearch(ssdo.gaTable, gaSearch)
for i in range(len(neighSearch)):
    neighOrderIDs = neighSearch[i]
    if i < 5:
        print(neighOrderIDs)

[23 27  1  2]
[ 2  0 27 23]
[ 1 27  0 38]
[4 1 5 6]
[3 5 2 1]

In [10]:

import arcgisscripting as ARC
import WeightsUtilities as WU
import gapy as GAPY
import SSUtilities as UTILS
inputGrid = r'D:\Data\UC\UC17\Island\Dykstra\Dykstra.gdb\emerge'
ssdo = SSDO.SSDataObject(inputGrid)
ssdo.obtainData(ssdo.oidName, requireSearch = True)
gaSearch = GAPY.ga_nsearch(ssdo.gaTable)
concept, gaConcept = WU.validateDistanceMethod('EUCLIDEAN', ssdo.spatialRef)
gaSearch.init_nearest(300., 0, gaConcept)
neighSearch = ARC._ss.NeighborSearch(ssdo.gaTable, gaSearch)
print(ssdo.distanceInfo.name)
for i in range(len(neighSearch)):
    neighOrderIDs = neighSearch[i]
    x0,y0 = ssdo.xyCoords[i]
    if i < 5:
        nhs = ", ".join([str(i) for i in neighOrderIDs])
        dist =  []
        for nh in neighOrderIDs:
            x1,y1 = ssdo.xyCoords[nh]
            dij = WU.euclideanDistance(x0,x1,y0,y1)
            dist.append(UTILS.formatValue(dij, "%0.2f"))
        print("ID {0} has {1} neighs, they are {2}".format(i, len(neighOrderIDs), nhs))
        print("The Distances are... {0}".format(", ".join(dist)))
        

METER
ID 0 has 3 neighs, they are 3, 1, 4
The Distances are... 200.00, 200.00, 282.84
ID 1 has 5 neighs, they are 2, 0, 4, 3, 5
The Distances are... 200.00, 200.00, 200.00, 282.84, 282.84
ID 2 has 4 neighs, they are 5, 1, 11, 4
The Distances are... 200.00, 200.00, 282.84, 282.84
ID 3 has 5 neighs, they are 0, 4, 6, 1, 7
The Distances are... 200.00, 200.00, 200.00, 282.84, 282.84
ID 4 has 8 neighs, they are 5, 1, 3, 7, 6, 2, 0, 8
The Distances are... 200.00, 200.00, 200.00, 200.00, 282.84, 282.84, 282.84, 282.84

Example: NeighborWeights - When you need non-uniform spatial weights (E.g. Inverse Distance Squared)

NeighborWeights(gaTable, gaSearch, weight_type [0: inverse_distance, 1: fixed_distance], exponent = 1.0, row_standard = True, include_self = False)

In [42]:

gaSearch = GAPY.ga_nsearch(ssdo.gaTable)
gaSearch.init_nearest(250000, 0, gaConcept)
neighSearch = ARC._ss.NeighborWeights(ssdo.gaTable, gaSearch, weight_type = 0, exponent = 2.0)
for i in range(len(neighSearch)):
    neighOrderIDs, neighWeights = neighSearch[i]
    if i < 3:
        print(neighOrderIDs)
        print(neighWeights)

[23 27  1  2 26 22 25 29 38 24 21  4  5 34 28  3 42 41 35 31 33]
[ 0.24396241  0.15676701  0.13038913  0.09121172  0.07828545  0.05179581
  0.03150761  0.02784675  0.02374503  0.02282472  0.01887078  0.01625245
  0.01579714  0.01565997  0.01562422  0.01241947  0.01037579  0.00957457
  0.00936652  0.00911591  0.00860753]
[ 2  0 27 23  4 26 38  3  5 29 22 25 24 28 42 34 21 46 41]
[ 0.36443032  0.15271508  0.09036452  0.0526977   0.04303312  0.04294889
  0.03623002  0.03043097  0.03002032  0.02873118  0.02299465  0.02071245
  0.01442619  0.01389529  0.01204944  0.01178428  0.01173792  0.01047316
  0.01032449]
[ 1 27  0 38 26  5 29 23  4  3 25 22 28 42 24 46 41 34  9 45 21 10]
[ 0.30264594  0.13468572  0.08871796  0.06402787  0.05237183  0.0432296
  0.04296864  0.04237608  0.04219547  0.02342009  0.02259267  0.01885412
  0.01616961  0.01484024  0.01375183  0.01230613  0.01214131  0.01207259
  0.01106552  0.01034444  0.01003139  0.00919095]

Contructing PySAL Spatial Weights¶

Convert masterID to orderID when using ssdo.obtainData (SWM File, Polygon Contiguity)
Data is already in orderID when using ssdo.obtainDataGA (Distance Based)

Methods in next cell can be imported from pysal2ArcGIS.py

In [ ]:

import pysal as PYSAL
import WeightsUtilities as WU
import SSUtilities as UTILS

def swm2Weights(ssdo, swmfile):
    """Converts ArcGIS Sparse Spatial Weights Matrix (*.swm) file to 
    PySAL Sparse Spatial Weights Class.
    
    INPUTS:
    ssdo (class): instance of SSDataObject [1,2]
    swmFile (str): full path to swm file
    
    NOTES:
    (1) Data must already be obtained using ssdo.obtainData()
    (2) The masterField for the swm file and the ssdo object must be
        the same and may NOT be the OID/FID/ObjectID
    """
    neighbors = {}
    weights = {}
    
    #### Create SWM Reader Object ####
    swm = WU.SWMReader(swmfile)
    
    #### SWM May NOT be a Subset of the Data ####
    if ssdo.numObs > swm.numObs:
        ARCPY.AddIDMessage("ERROR", 842, ssdo.numObs, swm.numObs)
        raise SystemExit()
        
    #### Parse All SWM Records ####
    for r in UTILS.ssRange(swm.numObs):
        info = swm.swm.readEntry()
        masterID, nn, nhs, w, sumUnstandard = info
        
        #### Must Have at Least One Neighbor ####
        if nn:
            #### Must be in Selection Set (If Exists) ####
            if masterID in ssdo.master2Order:
                outNHS = []
                outW = []
                
                #### Transform Master ID to Order ID ####
                orderID = ssdo.master2Order[masterID]
                
                #### Neighbors and Weights Adjusted for Selection ####
                for nhInd, nhVal in enumerate(nhs):
                    try:
                        nhOrder = ssdo.master2Order[nhVal]
                        outNHS.append(nhOrder)
                        weightVal = w[nhInd]
                        if swm.rowStandard:
                            weightVal = weightVal * sumUnstandard[0]
                        outW.append(weightVal)
                    except KeyError:
                        pass
                
                #### Add Selected Neighbors/Weights ####
                if len(outNHS):
                    neighbors[orderID] = outNHS
                    weights[orderID] = outW
    swm.close()
    
    #### Construct PySAL Spatial Weights and Standardize as per SWM ####
    w = PYSAL.W(neighbors, weights)
    if swm.rowStandard:
        w.transform = 'R'
        
    return w

def poly2Weights(ssdo, contiguityType = "ROOK", rowStandard = True):
    """Uses GP Polygon Neighbor Tool to construct contiguity relationships
    and stores them in PySAL Sparse Spatial Weights class.
    
    INPUTS:
    ssdo (class): instance of SSDataObject [1]
    contiguityType {str, ROOK}: ROOK or QUEEN contiguity
    rowStandard {bool, True}: whether to row standardize the spatial weights
    
    NOTES:
    (1) Data must already be obtained using ssdo.obtainData() or ssdo.obtainDataGA ()
    """
    
    neighbors = {}
    weights = {}
    polyNeighDict = WU.polygonNeighborDict(ssdo.inputFC, ssdo.masterField,
                                           contiguityType = contiguityType)
    
    for masterID, neighIDs in UTILS.iteritems(polyNeighDict):
        orderID = ssdo.master2Order[masterID]
        neighbors[orderID] = [ssdo.master2Order[i] for i in neighIDs]

    w = PYSAL.W(neighbors)
    if rowStandard:
        w.transform = 'R'
    return w

def distance2Weights(ssdo, neighborType = 1, distanceBand = 0.0, numNeighs = 0, distanceType = "euclidean",
                     exponent = 1.0, rowStandard = True, includeSelf = False):
    """Uses ArcGIS Neighborhood Searching Structure to create a PySAL Sparse Spatial Weights Matrix.
    
    INPUTS:
    ssdo (class): instance of SSDataObject [1]
    neighborType {int, 1}: 0 = inverse distance, 1 = fixed distance, 
                           2 = k-nearest-neighbors, 3 = delaunay
    distanceBand {float, 0.0}: return all neighbors within this distance for inverse/fixed distance
    numNeighs {int, 0}: number of neighbors for k-nearest-neighbor, can also be used to set a minimum 
                        number of neighbors for inverse/fixed distance
    distanceType {str, euclidean}: manhattan or euclidean distance [2]  
    exponent {float, 1.0}: distance decay factor for inverse distance
    rowStandard {bool, True}: whether to row standardize the spatial weights
    includeSelf {bool, False}: whether to return self as a neighbor
    
    NOTES:
    (1) Data must already be obtained using ssdo.obtainDataGA()
    (2) Chordal Distance is used for GCS Data
    """
    
    neighbors = {}
    weights = {}
    gaSearch = GAPY.ga_nsearch(ssdo.gaTable)
    if neighborType == 3:
        gaSearch.init_delaunay()
        neighSearch = ARC._ss.NeighborWeights(ssdo.gaTable, gaSearch, weight_type = 1)
    else:
        if neighborType == 2:
            distanceBand = 0.0
            weightType = 1
        else:
            weightType = neighborType
        
        concept, gaConcept = WU.validateDistanceMethod(distanceType.upper(), ssdo.spatialRef)
        gaSearch.init_nearest(distanceBand, numNeighs, gaConcept)
        neighSearch = ARC._ss.NeighborWeights(ssdo.gaTable, gaSearch, weight_type = weightType, 
                                              exponent = exponent, include_self = includeSelf)
        
    for i in range(len(neighSearch)):
        neighOrderIDs, neighWeights = neighSearch[i]
        neighbors[i] = neighOrderIDs
        weights[i] = neighWeights
        
    w = PYSAL.W(neighbors, weights)
    if rowStandard:
        w.transform = 'R'
    return w 

Converting Spatial Weight Matrix Formats (e.g. .swm, .gwt, *.gal)¶

Follow directions at the PySAL-ArcGIS-Toolbox Git Repository [https://github.com/Esri/PySAL-ArcGIS-Toolbox]
Please make note of the section on Adding a Git Project to your ArcGIS Installation Python Path.

In [ ]:

import WeightConvertor as W_CONVERT
swmFile = OS.path.join(fullPath, "queen.swm")
galFile = OS.path.join(fullPath, "queen.gal")
convert = W_CONVERT.WeightConvertor(swmFile, galFile, inputFC, "MYID", "SWM", "GAL")
convert.createOutput()

Calling MaxP Regions Using SWM Based on Rook Contiguity, No Row Standardization

In [ ]:

import numpy as NUM
NUM.random.seed(100)
ssdo = SSDO.SSDataObject(inputFC)
uniqueIDField = "MYID"
fieldNames = ['PCR2010', 'POP2010', 'PERCNOHS']
ssdo.obtainDataGA(uniqueIDField, fieldNames)
df = ssdo.getDataFrame()
X = df.as_matrix()
swmFile = OS.path.join(fullPath, "rook_bin.swm")
w = swm2Weights(ssdo, swmFile)
maxp = PYSAL.region.Maxp(w, X[:,0:2], 3000000., floor_variable = X[:,2])
maxpGroups = NUM.empty((ssdo.numObs,), int)
for regionID, orderIDs in enumerate(maxp.regions):
    maxpGroups[orderIDs] = regionID
    print((regionID, orderIDs))

Calling MaxP Regions Using Rook Contiguity, No Row Standardization

In [ ]:

NUM.random.seed(100)
w = poly2Weights(ssdo, rowStandard = False)
maxp = PYSAL.region.Maxp(w, X[:,0:2], 3000000., floor_variable = X[:,2])
maxpGroups = NUM.empty((ssdo.numObs,), int)
for regionID, orderIDs in enumerate(maxp.regions):
    maxpGroups[orderIDs] = regionID
    print((regionID, orderIDs))

Identical results because the random seed was set to 100 and they have the same spatial neighborhood

Calling MaxP Regions Using Fixed Distance 250000, Hyrbid to Assure at least 2 Neighbors

In [ ]:

NUM.random.seed(100)
w = distance2Weights(ssdo, distanceBand = 250000.0, numNeighs = 2)
maxp = PYSAL.region.Maxp(w, X[:,0:2], 3000000., floor_variable = X[:,2])
maxpGroups = NUM.empty((ssdo.numObs,), int)
for regionID, orderIDs in enumerate(maxp.regions):
    maxpGroups[orderIDs] = regionID
    print((regionID, orderIDs))

Same random seed, different result as neighborhood is different than the two previous