The building blocks of cenpy

Cenpy (sen - pie) is a package that exposes APIs from the US Census Bureau and makes it easy to pull down and work with Census data in Pandas.

Below, we'll discuss the APIConnection interface. It provides the building blocks for the data products supported in cenpy.products. If you're looking for datasets that are not supported in cenpy.products, or are interested in building your own application on top of cenpy, this interface is probably the better fit for you. However, most users will probably want to use the entries in cenpy.products.

In [1]:
import cenpy as c
import pandas

On import, cenpy.explorer requests all currently available APIs from the Census Bureau's API listing. In the future, it may also be able to read a JSON collection describing the databases from disk, if asked.

Explorer has two functions, available and explain. available will provide a list of the identifiers of all the APIs that cenpy knows about. If run with verbose=True, cenpy will also include the title of the database as a dictionary. It's a good idea to not process this directly, and instead use it to explore currently available APIs.

Also, beware that the US Census Bureau can change the names of the resources. This means that the index of the following table is not necessarily stable over time; sometimes, the same resource can change its identifier, like when the 2010 decennial census changed from 2010sf1 to DECENNIALSF12010. So, consult the table built by cenpy.explorer.available() if the keys appear to have changed.

Here, I'll just show the first five entries:

In [2]:
c.explorer.available().head()
Out[2]:
title temporal spatial publisher programCode modified keyword distribution description contactPoint ... c_isTimeseries c_isCube c_isAvailable c_isAggregate c_groupsLink c_geographyLink c_examplesLink c_dataset bureauCode accessLevel
2000sf1 2000 Decennial: Summary File 1 2000 US US Census Bureau NaN NaN () {'@type': 'dcat:Distribution', 'accessURL': 'h... Data files available from Census 2000 and the ... {'fn': 'Census Bureau Call Center', 'hasEmail'... ... NaN NaN True True https://api.census.gov/data/2000/sf1/groups.json https://api.census.gov/data/2000/sf1/geography... https://api.census.gov/data/2000/sf1/examples.... (sf1,) 006:07 public
2000sf3 2000 Decennial: Summary File 3 2000 US US Census Bureau NaN 2017-05-23 () {'@type': 'dcat:Distribution', 'accessURL': 'h... This Census 2000 file presents data on the pop... {'fn': 'Census Bureau Call Center', 'hasEmail'... ... NaN NaN True True https://api.census.gov/data/2000/sf3/groups.json https://api.census.gov/data/2000/sf3/geography... https://api.census.gov/data/2000/sf3/examples.... (sf3,) 006:07 public
2012acs1 2012 American Community Survey: 1-Year Estimates 2012 US US Census Bureau NaN NaN () {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is a natio... {'fn': 'Census Bureau Call Center', 'hasEmail'... ... NaN NaN True True https://api.census.gov/data/2012/acs1/groups.json https://api.census.gov/data/2012/acs1/geograph... https://api.census.gov/data/2012/acs1/examples... (acs1,) 006:07 public
2012acs3 2012 American Community Survey: 3-Year Estimates 2012 US US Census Bureau NaN NaN () {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is a natio... {'fn': 'Census Bureau Call Center', 'hasEmail'... ... NaN NaN True True https://api.census.gov/data/2012/acs3/groups.json https://api.census.gov/data/2012/acs3/geograph... https://api.census.gov/data/2012/acs3/examples... (acs3,) 006:07 public
2012acs3profile 2012 American Community Survey: 3-Year Profile... 2012 US US Census Bureau NaN NaN () {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is a natio... {'fn': 'Census Bureau Call Center', 'hasEmail'... ... NaN NaN True True https://api.census.gov/data/2012/acs3/profile/... https://api.census.gov/data/2012/acs3/profile/... https://api.census.gov/data/2012/acs3/profile/... (acs3, profile) 006:07 public

5 rows × 24 columns
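
Since identifiers can drift, searching by title is often more robust than hard-coding a key. Here's a minimal sketch of that idea. It assumes a plain dictionary mapping identifiers to titles; the catalog below is a stand-in copied by hand from the table above, and find_by_title is a hypothetical helper, not part of cenpy.

```python
# Hypothetical stand-in for an identifier -> title mapping. The entries are
# copied by hand from the table above, not fetched from the Census Bureau.
catalog = {
    "2000sf1": "2000 Decennial: Summary File 1",
    "2000sf3": "2000 Decennial: Summary File 3",
    "2012acs1": "2012 American Community Survey: 1-Year Estimates",
    "2012acs3": "2012 American Community Survey: 3-Year Estimates",
}


def find_by_title(catalog, phrase):
    """Return the identifiers whose title contains `phrase`, ignoring case."""
    needle = phrase.lower()
    return sorted(key for key, title in catalog.items() if needle in title.lower())


decennial = find_by_title(catalog, "decennial")
acs = find_by_title(catalog, "american community survey")
print(decennial)
print(acs)
```

If a key you relied on disappears, a title search like this is a quick way to find what it was renamed to.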

The explain command provides the title and full description of the data source. If run in verbose mode, the function returns the full JSON listing of the API.

In [3]:
c.explorer.explain('DECENNIALSF12010')
Out[3]:
{'Decennial SF1': 'Summary File 1 (SF 1) contains detailed tables focusing on age, sex, households, families, and housing units. These tables provide in-depth figures by race and Hispanic origin> some tables are repeated for each of nine race/Latino groups. Counts also are provided for over forty American Indian and Alaska Native tribes and for groups within race categories. The race categories include eighteen Asian groups and twelve Native Hawaiian and Other Pacific Islander groups. Counts of persons of Hispanic origin by country of origin (twenty-eight groups) are also shown. Summary File 1 presents data for the United States, the 50 states, and the District of Columbia in a hierarchical sequence down to the block level for many tabulations, but only to the census tract level for others. Summaries are included for other geographic areas such as ZIP Code Tabulation Areas (ZCTAs) and Congressional districts. Geographic coverage for Puerto Rico is comparable to the 50 states. Data are presented in a hierarchical sequence down the block level for many tabulations, but only to the census tract level for others. Geographic areas include barrios, barrios-pueblo, subbarrios, places, census tracts, block groups, and blocks. Summaries also are included for other geographic areas such as ZIP Code Tabulation Areas (ZCTAs).'}

To actually connect to a database resource, you create a Connection. A Connection works like a very simplified connection from the sqlalchemy world. The Connection class has a method, query, that constructs a query string and requests it from the Census server. The JSON response is then parsed into a dataframe and returned to the user.

In [6]:
conn = c.remote.APIConnection('DECENNIALSF12010')

That may have taken longer than you'd've expected. This is because, when the Connection constructor is called, it populates the connection object with a bit of metadata that makes it possible to construct queries without referring to the census handbooks.

For instance, a connection's variables represent all available search parameters for a given dataset.

In [7]:
conn.variables.head()
Out[7]:
attributes concept group label limit predicateOnly predicateType required values
for NaN Census API Geography Specification N/A Census API FIPS 'for' clause 0 True fips-for NaN NaN
in NaN Census API Geography Specification N/A Census API FIPS 'in' clause 0 True fips-in NaN NaN
ucgid NaN Census API Geography Specification N/A Uniform Census Geography Identifier clause 0 True ucgid NaN NaN
P029009 NaN HOUSEHOLD TYPE BY RELATIONSHIP P29 Total!!In households!!In family households!!Ad... 0 NaN int NaN NaN
P029007 NaN HOUSEHOLD TYPE BY RELATIONSHIP P29 Total!!In households!!In family households!!Sp... 0 NaN int NaN NaN

This dataframe is populated just like the census's table describing the variables on the corresponding API website. Fortunately, this means that you can modify and filter this dataframe just like you can regular pandas dataframes, so working out the exact codes to use in your query is easy.

I've added a function, varslike, that globs variables that fit a regular expression pattern. It can use the builtin python re module, in addition to the fnmatch module. It also can use any filtering function you want.

In [9]:
conn.varslike('H011[AB]')
Out[9]:
attributes concept group label limit predicateOnly predicateType required values
H011A002 NaN TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY ... H11A Population in occupied housing units!!Owned wi... 0 NaN int NaN NaN
H011A001 NaN TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY ... H11A Population in occupied housing units 0 NaN int NaN NaN
H011A004 NaN TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY ... H11A Population in occupied housing units!!Renter o... 0 NaN int NaN NaN
H011A003 NaN TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY ... H11A Population in occupied housing units!!Owned fr... 0 NaN int NaN NaN
H011B004 NaN TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY ... H11B Population in occupied housing units!!Renter o... 0 NaN int NaN NaN
H011B002 NaN TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY ... H11B Population in occupied housing units!!Owned wi... 0 NaN int NaN NaN
H011B003 NaN TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY ... H11B Population in occupied housing units!!Owned fr... 0 NaN int NaN NaN
H011B001 NaN TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY ... H11B Population in occupied housing units 0 NaN int NaN NaN
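
To make the difference between the two matching engines concrete, here's a standard-library sketch. names_like is a hypothetical helper, not cenpy's implementation, and anchoring the regular expression at the start of the name is my assumption about the behaviour. The variable codes are copied from the tables in this walkthrough.

```python
import fnmatch
import re

# A few variable codes copied from the tables in this walkthrough.
names = ["H001001", "H002001", "H011A001", "H011B001", "P029009"]


def names_like(names, pattern, engine="re"):
    """Hypothetical helper: keep the names that match `pattern`.

    engine="re" anchors a regular expression at the start of each name;
    engine="fnmatch" applies a shell-style glob to the whole name.
    """
    if engine == "re":
        return [name for name in names if re.match(pattern, name)]
    if engine == "fnmatch":
        return fnmatch.filter(names, pattern)
    raise ValueError(f"unknown engine: {engine!r}")


regex_hits = names_like(names, "H011[AB]")
glob_hits = names_like(names, "H00[012]*", engine="fnmatch")
# A glob has to cover the entire name, so the bare prefix finds nothing.
no_glob_hits = names_like(names, "H011[AB]", engine="fnmatch")
print(regex_hits, glob_hits, no_glob_hits)
```

The practical upshot: a regex prefix is enough on its own, while a glob needs a trailing * to match the rest of the code.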

You can also use this functionality to filter variables using an arbitrary field:

In [12]:
conn.varslike('Dominican', by='label')
Out[12]:
attributes concept group label limit predicateOnly predicateType required values
PCT011007 NaN HISPANIC OR LATINO BY SPECIFIC ORIGIN PCT11 Total!!Hispanic or Latino (200-299)!!Dominican... 0 NaN int NaN NaN

Likewise, the different levels of geographic scale are determined from the metadata in the overall API listing and recorded.

However, many Census products have multiple possible geographical indexing systems, like the deprecated fips code system and the new Geographical Names Information System, gnis. Thus, the geographies property is a dictionary of dataframes, where each key is the name of the identifier system and the value is the dataframe describing the identifier system.

For the 2010 census, the following systems are available:

In [13]:
conn.geographies.keys()
Out[13]:
dict_keys(['fips'])

For an explanation of the geographic hierarchies, the geographies tables show the geographies at which the data is summarized:

In [14]:
conn.geographies['fips'].head()
Out[14]:
   geoLevelDisplay  name      optionalWithWCFor  referenceDate  requires  wildcard
0  010              us        NaN                2010-01-01     NaN       NaN
1  020              region    NaN                2010-01-01     NaN       NaN
2  030              division  NaN                2010-01-01     NaN       NaN
3  040              state     NaN                2010-01-01     NaN       NaN
4  050              county    state              2010-01-01     [state]   [state]

Note that some geographies in the fips system have a requires filter to prevent drawing too much data. This will get passed to the query method later.
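
If you want to catch a missing filter before sending a request, a check along these lines works. This is only a sketch: missing_requirements is a hypothetical helper, and the requires list is typed in by hand to mirror the county row of the table above.

```python
def missing_requirements(requires, geo_filter):
    """Hypothetical helper: list required parent geographies absent from geo_filter."""
    return [level for level in requires if level not in geo_filter]


# Mirrors the `requires` entry shown for the county level above.
county_requires = ["state"]

unfiltered = missing_requirements(county_requires, {})
filtered = missing_requirements(county_requires, {"state": "04"})
print(unfiltered, filtered)
```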

So, let's just grab the housing information from the 2010 Census Short Form. Using the variables table above, we picked out a subset of the fields we wanted. Since the variables table is indexed by the identifiers, we can grab the indexes of the filtered dataframe as query parameters.

In addition, adding the NAME field smart-fills the table with the name of the geographic entity being pulled from the Census.

In [21]:
cols = conn.varslike('H00[012]*', engine='fnmatch').index.tolist()
In [22]:
cols.append('NAME')
In [23]:
cols
Out[23]:
['H001001',
 'H002001',
 'H002006',
 'H002003',
 'H002002',
 'H002005',
 'H002004',
 'NAME']

Now the query. The query is constructed just like the API query, and works as follows.

  1. cols - list of columns desired from the database, maps to census API's get=
  2. geo_unit - string denoting the unit of study to pull, maps to census API's for=
  3. geo_filter - dictionary containing groupings of geo_units, if required, maps to census API's in=

To be specific, a full query tells the server what columns to pull, for what underlying geography, from within what aggregation units. It's structured using these heterogeneous datatypes so it's easy to change the smallest units quickly, while providing sufficient granularity to change the filters and columns as you go.
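
To see how the three arguments line up with the request, here's a rough sketch that assembles the URL by hand. build_query_url is a hypothetical helper written for illustration, not cenpy's own code; the base URL is the 2010 SF1 endpoint, and joining several filter entries with "+" is an assumption on my part.

```python
BASE_URL = "https://api.census.gov/data/2010/dec/sf1"


def build_query_url(base_url, cols, geo_unit, geo_filter=None):
    """Hypothetical helper that assembles a Census API request URL."""
    # cols -> get=, geo_unit -> for=
    parts = ["get=" + ",".join(cols), "for=" + geo_unit]
    if geo_filter:
        # geo_filter -> in= (the "+" separator between entries is an assumption)
        parents = "+".join(f"{level}:{code}" for level, code in geo_filter.items())
        parts.append("in=" + parents)
    return base_url + "?" + "&".join(parts)


url = build_query_url(BASE_URL, ["H001001", "NAME"], "place:*", {"state": "04"})
print(url)
```

Keeping the unit as a string and the filter as a dictionary means you can swap 'place:*' for another unit without touching the filter.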

The query below grabs the names and housing unit counts from the 2010 Decennial SF1 for every place in Arizona.

In [24]:
data = conn.query(cols, geo_unit = 'place:*', geo_filter = {'state':'04'})

Once constructed, the query executes as fast as your internet connection will move. This query has:

In [25]:
data.shape
Out[25]:
(451, 10)

10 columns and 451 rows. So, rather fast.

For validity and ease of use, we store the last executed query on the object. If you're concerned about your Census API key being shown in plaintext, never print this property!

In [26]:
conn.last_query
Out[26]:
'https://api.census.gov/data/2010/dec/sf1?get=H001001,H002001,H002006,H002003,H002002,H002005,H002004,NAME&for=place:*&in=state:04&key=174dc2099125916233a42788cc0ffd0336d2ca85'
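
If you do need to log or share a query string, one option is to mask the key first. A minimal sketch, using only the standard library: redact_key is a hypothetical helper, and the key in the example URL is made up.

```python
import re


def redact_key(url, placeholder="REDACTED"):
    """Replace the value of the key= parameter in a query URL."""
    # Match "?key=" or "&key=" and everything up to the next "&" or "#".
    return re.sub(r"([?&]key=)[^&#]*", lambda m: m.group(1) + placeholder, url)


# The key below is a made-up placeholder, not a real credential.
example = "https://api.census.gov/data/2010/dec/sf1?get=NAME&for=place:*&in=state:04&key=abc123"
safe = redact_key(example)
print(safe)
```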

So, you have a dataframe with the information requested, plus the fields specified in the geo_filter and geo_unit. Sometimes, pandas' DataFrame.infer_objects() method is not able to infer the types or structures of the data in the ways that you might expect. Thus, you may need to format the final data to ensure that the data types are correct.
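
Here's a small sketch of what explicit casting can look like, written against plain Python lists rather than a dataframe so it runs without any dependencies. The payload is a hypothetical header-plus-rows structure with values copied from the Arizona results in this walkthrough, and to_records is a made-up helper. Note that the state and place codes stay as strings, so their leading zeros survive.

```python
# Hypothetical raw payload: a header row followed by rows of strings.
payload = [
    ["H001001", "NAME", "state", "place"],
    ["94404", "Chandler city, Arizona", "04", "12000"],
    ["74907", "Gilbert town, Arizona", "04", "27400"],
]


def to_records(payload, int_columns):
    """Turn header-plus-rows data into dicts, casting only the named columns."""
    header, *rows = payload
    records = []
    for row in rows:
        record = dict(zip(header, row))
        for column in int_columns:
            record[column] = int(record[column])
        records.append(record)
    return records


records = to_records(payload, int_columns=["H001001"])
print(records[0])
```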

So, the following is a dataframe of the data requested. I've filtered it to only look at places with more than 40 thousand housing units.

Pretty neat!

In [27]:
data[data['H001001'].astype(int) > 40000]
Out[27]:
H001001 H002001 H002006 H002003 H002002 H002005 H002004 NAME state place
63 94404 94404 0 94394 94394 10 0 Chandler city, Arizona 04 12000
146 74907 74907 0 74880 74880 27 0 Gilbert town, Arizona 04 27400
148 90505 90505 0 90493 90493 12 0 Glendale city, Arizona 04 27820
224 201173 201173 0 200979 200979 194 0 Mesa city, Arizona 04 46000
266 64818 64818 0 60939 64133 685 3194 Peoria city, Arizona 04 54050
268 590149 590149 0 587936 587936 2213 0 Phoenix city, Arizona 04 55000
328 124001 124001 0 120049 120049 3952 0 Scottsdale city, Arizona 04 65000
366 52586 52586 0 51082 51082 1504 0 Surprise city, Arizona 04 71510
375 73462 73462 0 73462 73462 0 0 Tempe city, Arizona 04 73000
394 229762 229762 0 228506 228577 1185 71 Tucson city, Arizona 04 77000

And, just in case you're liable to forget your FIPS codes, the explorer module can look up some FIPS code listings for you.

In [28]:
c.explorer.fips_table('place', in_state='AZ')
Out[28]:
AZ 04 00730 Aguila CDP Census Designated Place S Maricopa County
0 AZ 4 870 Ajo CDP Census Designated Place S Pima County
1 AZ 4 940 Ak Chin CDP Census Designated Place S Pima County
2 AZ 4 1090 Ak-Chin Village CDP Census Designated Place S Pinal County
3 AZ 4 1170 Alamo Lake CDP Census Designated Place S La Paz County
4 AZ 4 1560 Ali Chuk CDP Census Designated Place S Pima County
5 AZ 4 1570 Ali Chukson CDP Census Designated Place S Pima County
6 AZ 4 1620 Ali Molina CDP Census Designated Place S Pima County
7 AZ 4 1920 Alpine CDP Census Designated Place S Apache County
8 AZ 4 1990 Amado CDP Census Designated Place S Santa Cruz County
9 AZ 4 2270 Anegam CDP Census Designated Place S Pima County
10 AZ 4 2410 Antares CDP Census Designated Place S Mohave County
11 AZ 4 2430 Anthem CDP Census Designated Place S Maricopa County
12 AZ 4 2830 Apache Junction city Incorporated Place A Maricopa County, Pinal County
13 AZ 4 3320 Arivaca CDP Census Designated Place S Pima County
14 AZ 4 3380 Arivaca Junction CDP Census Designated Place S Pima County
15 AZ 4 3530 Arizona City CDP Census Designated Place S Pinal County
16 AZ 4 3915 Arizona Village CDP Census Designated Place S Mohave County
17 AZ 4 4020 Arlington CDP Census Designated Place S Maricopa County
18 AZ 4 4440 Ash Fork CDP Census Designated Place S Yavapai County
19 AZ 4 4710 Avenue B and C CDP Census Designated Place S Yuma County
20 AZ 4 4720 Avondale city Incorporated Place A Maricopa County
21 AZ 4 4880 Avra Valley CDP Census Designated Place S Pima County
22 AZ 4 4930 Aztec CDP Census Designated Place S Yuma County
23 AZ 4 5140 Bagdad CDP Census Designated Place S Yavapai County
24 AZ 4 5450 Bear Flat CDP Census Designated Place S Gila County
25 AZ 4 5490 Beaver Dam CDP Census Designated Place S Mohave County
26 AZ 4 5495 Beaver Valley CDP Census Designated Place S Gila County
27 AZ 4 5770 Benson city Incorporated Place A Cochise County
28 AZ 4 5970 Beyerville CDP Census Designated Place S Santa Cruz County
29 AZ 4 6260 Bisbee city Incorporated Place A Cochise County
... ... ... ... ... ... ... ...
420 AZ 4 82120 Wheatfields CDP Census Designated Place S Gila County
421 AZ 4 82155 Whetstone CDP Census Designated Place S Cochise County
422 AZ 4 82270 Whispering Pines CDP Census Designated Place S Gila County
423 AZ 4 82390 Whitecone CDP Census Designated Place S Navajo County
424 AZ 4 82425 White Hills CDP Census Designated Place S Mohave County
425 AZ 4 82450 White Mountain Lake CDP Census Designated Place S Navajo County
426 AZ 4 82530 Whiteriver CDP Census Designated Place S Navajo County
427 AZ 4 82660 Why CDP Census Designated Place S Pima County
428 AZ 4 82740 Wickenburg town Incorporated Place A Maricopa County, Yavapai County
429 AZ 4 82810 Wide Ruins CDP Census Designated Place S Apache County
430 AZ 4 82880 Wikieup CDP Census Designated Place S Mohave County
431 AZ 4 82950 Wilhoit CDP Census Designated Place S Yavapai County
432 AZ 4 83090 Willcox city Incorporated Place A Cochise County
433 AZ 4 83160 Williams city Incorporated Place A Coconino County
434 AZ 4 83388 Williamson CDP Census Designated Place S Yavapai County
435 AZ 4 83475 Willow Canyon CDP Census Designated Place S Pima County
436 AZ 4 83570 Willow Valley CDP Census Designated Place S Mohave County
437 AZ 4 83720 Window Rock CDP Census Designated Place S Apache County
438 AZ 4 83790 Winkelman town Incorporated Place A Gila County, Pinal County
439 AZ 4 83930 Winslow city Incorporated Place A Navajo County
440 AZ 4 83960 Winslow West CDP Census Designated Place S Coconino County, Navajo County
441 AZ 4 84000 Wintersburg CDP Census Designated Place S Maricopa County
442 AZ 4 84140 Wittmann CDP Census Designated Place S Maricopa County
443 AZ 4 84350 Woodruff CDP Census Designated Place S Navajo County
444 AZ 4 84980 Yarnell CDP Census Designated Place S Yavapai County
445 AZ 4 85260 York CDP Census Designated Place S Greenlee County
446 AZ 4 85330 Young CDP Census Designated Place S Gila County
447 AZ 4 85400 Youngtown town Incorporated Place A Maricopa County
448 AZ 4 85470 Yucca CDP Census Designated Place S Mohave County
449 AZ 4 85540 Yuma city Incorporated Place A Yuma County

450 rows × 7 columns
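
One thing to watch for: the listing above shows the codes as bare numbers (4, 870), while the query results use zero-padded strings ('04', '00730'). State FIPS codes are two digits and place FIPS codes are five, so padding them back is straightforward. A minimal sketch, where pad_fips is a hypothetical helper:

```python
def pad_fips(code, width):
    """Return `code` as a zero-padded string of exactly `width` digits."""
    text = str(code).strip()
    if not text.isdigit():
        raise ValueError(f"not a numeric code: {code!r}")
    if len(text) > width:
        raise ValueError(f"{code!r} is longer than {width} digits")
    return text.zfill(width)


state = pad_fips(4, 2)    # state FIPS codes are two digits
place = pad_fips(870, 5)  # place FIPS codes are five digits
print(state, place)
```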

GEO & Tiger Integration

The Census TIGER geometry API is substantively different from every other API, in that it's an ArcGIS REST API. But, I've tried to expose a consistent interface. It works like this:

In [29]:
import cenpy.tiger as tiger
In [30]:
tiger.available()
Out[30]:
[{'name': 'AIANNHA', 'type': 'MapServer'},
 {'name': 'CBSA', 'type': 'MapServer'},
 {'name': 'Hydro', 'type': 'MapServer'},
 {'name': 'Labels', 'type': 'MapServer'},
 {'name': 'Legislative', 'type': 'MapServer'},
 {'name': 'Places_CouSub_ConCity_SubMCD', 'type': 'MapServer'},
 {'name': 'PUMA_TAD_TAZ_UGA_ZCTA', 'type': 'MapServer'},
 {'name': 'Region_Division', 'type': 'MapServer'},
 {'name': 'School', 'type': 'MapServer'},
 {'name': 'Special_Land_Use_Areas', 'type': 'MapServer'},
 {'name': 'State_County', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2013', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2014', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2015', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2016', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2017', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2018', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2019', 'type': 'MapServer'},
 {'name': 'tigerWMS_Census2010', 'type': 'MapServer'},
 {'name': 'tigerWMS_Current', 'type': 'MapServer'},
 {'name': 'tigerWMS_ECON2012', 'type': 'MapServer'},
 {'name': 'tigerWMS_PhysicalFeatures', 'type': 'MapServer'},
 {'name': 'Tracts_Blocks', 'type': 'MapServer'},
 {'name': 'Transportation', 'type': 'MapServer'},
 {'name': 'TribalTracts', 'type': 'MapServer'},
 {'name': 'Urban', 'type': 'MapServer'},
 {'name': 'USLandmass', 'type': 'MapServer'}]

In some cases, it makes quite a bit of sense to "attach" a map server to your connection. In the case of the US Census 2010 we've been using, there is an obvious data product match in tigerWMS_Census2010. So, let's attach it to the connection.

In [31]:
conn.set_mapservice('tigerWMS_Census2010')
Out[31]:
Connection to Decennial SF1(ID: https://api.census.gov/data/id/DECENNIALSF12010)
With MapServer: Census 2010 WMS
In [32]:
conn.mapservice
Out[32]:
<cenpy.tiger.TigerConnection at 0x7fc715d04ef0>

Neat! This is the same as calling:

tiger.TigerConnection('tigerWMS_Census2010')

but this attaches that object to the connection you've been using. The connection also updates with this information:

In [33]:
conn
Out[33]:
Connection to Decennial SF1(ID: https://api.census.gov/data/id/DECENNIALSF12010)
With MapServer: Census 2010 WMS

An ESRI MapServer is a big thing, and cenpy doesn't support all of its features. Since cenpy is designed to support retrieval of data from the US Census, we only support GET statements for defined geographic units, and ignore the various other functionalities in the service.

To work with a service, note that any map server is composed of layers:

In [34]:
conn.mapservice.layers
Out[34]:
[(ESRILayer) Public Use Microdata Areas,
 (ESRILayer) Public Use Microdata Areas Labels,
 (ESRILayer) Traffic Analysis Districts,
 (ESRILayer) Traffic Analysis Districts Labels,
 (ESRILayer) Traffic Analysis Zones,
 (ESRILayer) Traffic Analysis Zones Labels,
 (ESRILayer) Urban Growth Areas,
 (ESRILayer) Urban Growth Areas Labels,
 (ESRILayer) ZIP Code Tabulation Areas,
 (ESRILayer) ZIP Code Tabulation Areas Labels,
 (ESRILayer) Tribal Census Tracts,
 (ESRILayer) Tribal Census Tracts Labels,
 (ESRILayer) Tribal Block Groups,
 (ESRILayer) Tribal Block Groups Labels,
 (ESRILayer) Census Tracts,
 (ESRILayer) Census Tracts Labels,
 (ESRILayer) Census Block Groups,
 (ESRILayer) Census Block Groups Labels,
 (ESRILayer) Census Blocks,
 (ESRILayer) Census Blocks Labels,
 (ESRILayer) Unified School Districts,
 (ESRILayer) Unified School Districts Labels,
 (ESRILayer) Secondary School Districts,
 (ESRILayer) Secondary School Districts Labels,
 (ESRILayer) Elementary School Districts,
 (ESRILayer) Elementary School Districts Labels,
 (ESRILayer) Estates,
 (ESRILayer) Estates Labels,
 (ESRILayer) County Subdivisions,
 (ESRILayer) County Subdivisions Labels,
 (ESRILayer) Subbarrios,
 (ESRILayer) Subbarrios Labels,
 (ESRILayer) Consolidated Cities,
 (ESRILayer) Consolidated Cities Labels,
 (ESRILayer) Incorporated Places,
 (ESRILayer) Incorporated Places Labels,
 (ESRILayer) Census Designated Places,
 (ESRILayer) Census Designated Places Labels,
 (ESRILayer) Alaska Native Regional Corporations,
 (ESRILayer) Alaska Native Regional Corporations Labels,
 (ESRILayer) Tribal Subdivisions,
 (ESRILayer) Tribal Subdivisions Labels,
 (ESRILayer) Federal American Indian Reservations,
 (ESRILayer) Federal American Indian Reservations Labels,
 (ESRILayer) Off-Reservation Trust Lands,
 (ESRILayer) Off-Reservation Trust Lands Labels,
 (ESRILayer) State American Indian Reservations,
 (ESRILayer) State American Indian Reservations Labels,
 (ESRILayer) Hawaiian Home Lands,
 (ESRILayer) Hawaiian Home Lands Labels,
 (ESRILayer) Alaska Native Village Statistical Areas,
 (ESRILayer) Alaska Native Village Statistical Areas Labels,
 (ESRILayer) Oklahoma Tribal Statistical Areas,
 (ESRILayer) Oklahoma Tribal Statistical Areas Labels,
 (ESRILayer) State Designated Tribal Statistical Areas,
 (ESRILayer) State Designated Tribal Statistical Areas Labels,
 (ESRILayer) Tribal Designated Statistical Areas,
 (ESRILayer) Tribal Designated Statistical Areas Labels,
 (ESRILayer) American Indian Joint-Use Areas,
 (ESRILayer) American Indian Joint-Use Areas Labels,
 (ESRILayer) 113th Congressional Districts,
 (ESRILayer) 113th Congressional Districts Labels,
 (ESRILayer) 111th Congressional Districts,
 (ESRILayer) 111th Congressional Districts Labels,
 (ESRILayer) 2013 State Legislative Districts - Upper,
 (ESRILayer) 2013 State Legislative Districts - Upper Labels,
 (ESRILayer) 2013 State Legislative Districts - Lower,
 (ESRILayer) 2013 State Legislative Districts - Lower Labels,
 (ESRILayer) 2010 State Legislative Districts - Upper,
 (ESRILayer) 2010 State Legislative Districts - Upper Labels,
 (ESRILayer) 2010 State Legislative Districts - Lower,
 (ESRILayer) 2010 State Legislative Districts - Lower Labels,
 (ESRILayer) Voting Districts,
 (ESRILayer) Voting Districts Labels,
 (ESRILayer) Census Divisions,
 (ESRILayer) Census Divisions Labels,
 (ESRILayer) Census Regions,
 (ESRILayer) Census Regions Labels,
 (ESRILayer) Urbanized Areas,
 (ESRILayer) Urbanized Areas Labels,
 (ESRILayer) Urban Clusters,
 (ESRILayer) Urban Clusters Labels,
 (ESRILayer) Combined New England City and Town Areas,
 (ESRILayer) Combined New England City and Town Areas Labels,
 (ESRILayer) New England City and Town Area Divisions,
 (ESRILayer) New England City and Town Area  Divisions Labels,
 (ESRILayer) Metropolitan New England City and Town Areas,
 (ESRILayer) Metropolitan New England City and Town Areas Labels,
 (ESRILayer) Micropolitan New England City and Town Areas,
 (ESRILayer) Micropolitan New England City and Town Areas Labels,
 (ESRILayer) Combined Statistical Areas,
 (ESRILayer) Combined Statistical Areas Labels,
 (ESRILayer) Metropolitan Divisions,
 (ESRILayer) Metropolitan Divisions Labels,
 (ESRILayer) Metropolitan Statistical Areas,
 (ESRILayer) Metropolitan Statistical Areas Labels,
 (ESRILayer) Micropolitan Statistical Areas,
 (ESRILayer) Micropolitan Statistical Areas Labels,
 (ESRILayer) States,
 (ESRILayer) States Labels,
 (ESRILayer) Counties,
 (ESRILayer) Counties Labels]

These layers are what actually implement query operations. For now, let's focus on the same "class" of units we were using before, Census Designated Places:

In [23]:
conn.mapservice.layers[36]
Out[23]:
(ESRILayer) Census Designated Places

A query function is implemented both at the mapservice level and the layer level. At the mapservice level, a layer ID is required in order to complete the query.

Mapservice queries are driven by SQL. So, to grab all of the geodata that fits the CDPs we pulled before, you could start to construct it like this.

First, just like the main connection, each layer has a set of variables:

In [35]:
conn.mapservice.layers[36].variables
Out[35]:
alias domain length name type
0 MTFCC None 5.0 MTFCC esriFieldTypeString
1 OID None NaN OID esriFieldTypeDouble
2 GEOID None 7.0 GEOID esriFieldTypeString
3 STATE None 2.0 STATE esriFieldTypeString
4 PLACE None 5.0 PLACE esriFieldTypeString
5 BASENAME None 100.0 BASENAME esriFieldTypeString
6 NAME None 100.0 NAME esriFieldTypeString
7 LSADC None 2.0 LSADC esriFieldTypeString
8 FUNCSTAT None 1.0 FUNCSTAT esriFieldTypeString
9 PLACECC None 2.0 PLACECC esriFieldTypeString
10 AREALAND None NaN AREALAND esriFieldTypeDouble
11 AREAWATER None NaN AREAWATER esriFieldTypeDouble
12 UR None 1.0 UR esriFieldTypeString
13 CBSAPCI None 1.0 CBSAPCI esriFieldTypeString
14 NECTAPCI None 1.0 NECTAPCI esriFieldTypeString
15 STGEOMETRY None NaN STGEOMETRY esriFieldTypeGeometry
16 CENTLAT None 11.0 CENTLAT esriFieldTypeString
17 CENTLON None 12.0 CENTLON esriFieldTypeString
18 INTPTLAT None 11.0 INTPTLAT esriFieldTypeString
19 INTPTLON None 12.0 INTPTLON esriFieldTypeString
20 PLACENS None 8.0 PLACENS esriFieldTypeString
21 HU100 None NaN HU100 esriFieldTypeDouble
22 POP100 None NaN POP100 esriFieldTypeDouble
23 OBJECTID None NaN OBJECTID esriFieldTypeOID

Our prior query grabbed the places in AZ. So, we could use a SQL query that focuses on that.
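
As a sketch of how such a clause might be put together: STATE is listed above as a string field, so the version below quotes the value the way standard SQL does for strings. That quoting is my assumption; the query in this walkthrough passes the value unquoted. where_equals is a hypothetical helper, not part of cenpy.

```python
def where_equals(field, value):
    """Hypothetical helper: build `FIELD = 'value'` for a string-typed field."""
    # Double any single quotes so the value cannot end the SQL string early.
    escaped = str(value).replace("'", "''")
    return f"{field} = '{escaped}'"


state_clause = where_equals("STATE", "04")
combined = " AND ".join([state_clause, where_equals("PLACE", "12000")])
print(state_clause)
print(combined)
```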

I try to pack the geometries into containers that people are used to using. Without knowing if GEOS is installed on a user's computer, I use PySAL as the target geometry type.

If you do have GEOS, that means you can use Shapely or GeoPandas. So, to choose your backend, you can use the following two arguments to this query function. The pkg argument lets you choose among the three types of Python objects to output.

PySAL is the default. If you select Shapely, the result will just be a pandas dataframe with Shapely geometries instead of PySAL geometries. If you choose geopandas (or pass the gpize option), cenpy will try to convert the pandas dataframe into a GeoPandas dataframe.

In [36]:
geodata = conn.mapservice.query(layer=36, where='STATE = 04')
In [37]:
geodata.head()
Out[37]:
AREALAND AREAWATER BASENAME CBSAPCI CENTLAT CENTLON FUNCSTAT GEOID HU100 INTPTLAT ... NECTAPCI OBJECTID OID PLACE PLACECC PLACENS POP100 STATE UR geometry
0 314183 0 Donovan Estates N +32.7093536 -114.6782229 S 0419790 394 +32.7093536 ... N 19810 280403717389013 19790 U2 02582773 1508 04 U POLYGON ((-12766151.5981 3857230.2984, -127661...
1 3034369 0 Kohls Ranch N +34.3210530 -111.0838546 S 0438600 127 +34.3210530 ... N 20308 280403717476719 38600 U1 02582809 46 04 R POLYGON ((-12368084.4537 4072701.559100002, -1...
2 20884268 0 Wheatfields N +33.4805233 -110.8366065 S 0482120 465 +33.5553700 ... N 20326 280403717476654 82120 U2 02582899 785 04 R POLYGON ((-12343973.9878 3969254.195299998, -1...
3 8679902 0 Goodyear Village N +33.1973130 -111.8723437 S 0428465 121 +33.1973130 ... N 20216 280403861091191 28465 U2 02612139 457 04 M POLYGON ((-12456879.4474 3922507.554499999, -1...
4 23385412 55226 Carrizo N +33.9866282 -110.3314358 S 0410320 40 +33.9793542 ... N 20283 280403717231648 10320 U1 02582748 127 04 R POLYGON ((-12289756.4977 4027500.7016, -122896...

5 rows × 24 columns

To join the geodata to the other data, use pandas functions:

In [38]:
import pandas as pd
In [39]:
newdata = pd.merge(data, geodata, left_on='place', right_on='PLACE')
In [40]:
newdata.head()
Out[40]:
H001001 H002001 H002006 H002003 H002002 H002005 H002004 NAME_x state place ... NECTAPCI OBJECTID OID PLACE PLACECC PLACENS POP100 STATE UR geometry
0 304 304 0 0 0 304 0 Aguila CDP, Arizona 04 00730 ... N 28835 280403717476713 00730 U1 02582720 798 04 R POLYGON ((-12599756.2327 4021163.631899998, -1...
1 2175 2175 0 0 2006 169 2006 Ajo CDP, Arizona 04 00870 ... N 29791 280401254189026 00870 U1 02407704 3304 04 M POLYGON ((-12573640.5688 3815514.326800004, -1...
2 11 11 0 0 0 11 0 Ak Chin CDP, Arizona 04 00940 ... N 23411 280403717476626 00940 U1 02582721 30 04 R POLYGON ((-12469657.0325 3801071.388700001, -1...
3 256 256 0 0 141 115 141 Ak-Chin Village CDP, Arizona 04 01090 ... N 24700 280401260231698 01090 U1 02407705 862 04 M POLYGON ((-12480838.2961 3895295.468500003, -1...
4 31 31 0 0 0 31 0 Alamo Lake CDP, Arizona 04 01170 ... N 21621 280403717388977 01170 U2 02582722 25 04 R POLYGON ((-12647299.4514 4059688.195799999, -1...

5 rows × 34 columns
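
Joining on the place code alone works here because everything comes from one state. Place codes are only unique within a state, so for multi-state pulls it's safer to join on the state and place together. Here's a sketch of that idea using plain dictionaries instead of pandas.merge, so it runs with no dependencies. The rows are trimmed copies of the tables above, except the last geo row, which is made up to show a code collision.

```python
data_rows = [
    {"state": "04", "place": "00730", "NAME": "Aguila CDP, Arizona"},
    {"state": "04", "place": "00870", "NAME": "Ajo CDP, Arizona"},
]
geo_rows = [
    {"STATE": "04", "PLACE": "00730", "POP100": 798},
    {"STATE": "04", "PLACE": "00870", "POP100": 3304},
    # Made-up row: same place code under a different, fictional state code.
    {"STATE": "99", "PLACE": "00730", "POP100": 0},
]


def join_on_state_place(data_rows, geo_rows):
    """Inner join keyed on the (state, place) pair rather than place alone."""
    geo_index = {(geo["STATE"], geo["PLACE"]): geo for geo in geo_rows}
    joined = []
    for row in data_rows:
        match = geo_index.get((row["state"], row["place"]))
        if match is not None:
            joined.append({**row, **match})
    return joined


joined = join_on_state_place(data_rows, geo_rows)
print(len(joined), joined[0]["POP100"])
```

With pandas, the same idea is expressed by passing lists of column names to left_on and right_on in pd.merge.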

So, that's how you get your geodata in addition to your regular data!

OK, that's one API, does it work for others?

We'll try County Business Patterns, one of the Census Bureau's economic data products:

In [41]:
conn2 = c.remote.APIConnection('CBP2012')

Alright, let's look at the available columns:

In [42]:
conn2.variables
Out[42]:
attributes concept group label limit predicateOnly predicateType required values
for NaN Census API Geography Specification N/A Census API FIPS 'for' clause 0 True fips-for NaN NaN
in NaN Census API Geography Specification N/A Census API FIPS 'in' clause 0 True fips-in NaN NaN
ucgid NaN Census API Geography Specification N/A Uniform Census Geography Identifier clause 0 True ucgid NaN NaN
EMP_N EMP_N_F Geography Area Series: County Business Pattern... CB1200CBP Noise range for number of paid employees for p... 0 NaN int NaN NaN
FOOTID_GEO NaN NaN N/A Geo Footnote 0 NaN string NaN NaN
PAYQTR1_N PAYQTR1_N_F Geography Area Series: County Business Pattern... CB1200CBP Noise range for first-quarter payroll (%) 0 NaN int NaN NaN
CSA NaN NaN N/A FIPS Combined Statistical Area code 0 NaN string NaN NaN
YEAR YEAR_TTL Geography Area Series: County Business Pattern... CB1200CBP Year 0 NaN string NaN {'item': {'1982': '1982', '1983': '1983', '198...
LFO LFO_TTL Geography Area Series: County Business Pattern... CB1200CBP Legal form of organization code 0 NaN string default displayed {'item': {'001': 'All establishments', '002': ...
MD NaN NaN N/A FIPS Metropolitan Division code 0 NaN string NaN NaN
FOOTID_NAICS NaN NaN N/A Naics Footnote 0 NaN string NaN NaN
MSA NaN NaN N/A FIPS Metropolitan Statistical Area or Micropol... 0 NaN NaN NaN NaN
COUNTY NaN NaN N/A FIPS county code 0 NaN NaN NaN NaN
ST NaN NaN N/A FIPS state code 0 NaN NaN NaN NaN
PAYANN_N PAYANN_N_F Geography Area Series: County Business Pattern... CB1200CBP Noise range for annual payroll 0 NaN int NaN NaN
PAYQTR1 PAYQTR1_F Geography Area Series: County Business Pattern... CB1200CBP First-quarter payroll ($1,000) 0 NaN int NaN NaN
EMP EMP_F Geography Area Series: County Business Pattern... CB1200CBP Paid employees for pay period including March ... 0 NaN NaN NaN NaN
NAICS2012 NAICS2012_TTL,NAICS2012_F,INDLEVEL,SECTOR,SUBS... Geography Area Series: County Business Pattern... CB1200CBP 2012 NAICS code 2135 NaN string default displayed {'item': {'00': 'Total for all sectors', '0000...
GEO_ID GEO_TTL,GEO_ID_F Geography Area Series: County Business Pattern... CB1200CBP Geographic identifier code 0 NaN NaN NaN NaN
GEOTYPE NaN NaN N/A Type of geography flag 0 NaN string NaN NaN
ESTAB ESTAB_F Geography Area Series: County Business Pattern... CB1200CBP Number of establishments 0 NaN NaN NaN NaN
PAYANN PAYANN_F Geography Area Series: County Business Pattern... CB1200CBP Annual payroll ($1,000) 0 NaN int NaN NaN
EMPSZES EMPSZES_TTL Geography Area Series: County Business Pattern... CB1200CBP Employment size of establishment 14 NaN string default displayed {'item': {'001': 'All establishments', '204': ...

To show the required predicates, we can filter the variables dataframe by the required field. Note that required means that the query will fail if these are not passed as keyword arguments. They don't have to specify a single value, though, so they can be left as a wildcard, like we did with place:* in the prior query:

In [43]:
conn2.variables[~ conn2.variables.required.isnull()]
Out[43]:
attributes concept group label limit predicateOnly predicateType required values
LFO LFO_TTL Geography Area Series: County Business Pattern... CB1200CBP Legal form of organization code 0 NaN string default displayed {'item': {'001': 'All establishments', '002': ...
NAICS2012 NAICS2012_TTL,NAICS2012_F,INDLEVEL,SECTOR,SUBS... Geography Area Series: County Business Pattern... CB1200CBP 2012 NAICS code 2135 NaN string default displayed {'item': {'00': 'Total for all sectors', '0000...
EMPSZES EMPSZES_TTL Geography Area Series: County Business Pattern... CB1200CBP Employment size of establishment 14 NaN string default displayed {'item': {'001': 'All establishments', '204': ...
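
As a minimal sketch of the same filter, the snippet below builds a small stand-in for the variables table (three rows copied from the output above; the real table comes from the API) and keeps only the rows whose required field is filled in. The names `variables`, `required_only` and `required_names` are illustrative and not part of cenpy.

```python
import pandas

# Toy stand-in for conn2.variables: three rows taken from the output above.
# The real table is built by cenpy from the API's variable listing.
variables = pandas.DataFrame(
    {
        "label": [
            "Number of establishments",
            "2012 NAICS code",
            "Legal form of organization code",
        ],
        "required": [None, "default displayed", "default displayed"],
    },
    index=["ESTAB", "NAICS2012", "LFO"],
)

# Keep only rows where `required` is not missing; this is equivalent to
# negating isnull() with ~, as in the cell above.
required_only = variables[variables["required"].notnull()]
required_names = required_only.index.tolist()
print(required_names)
```

Using notnull() rather than ~isnull() is only a readability choice; both select the same rows.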

Like before, geographies are shown within a given hierarchy. Here, the only geography is the fips geography.

In [44]:
conn2.geographies.keys()
Out[44]:
dict_keys(['fips'])
In [45]:
conn2.geographies['fips']
Out[45]:
geoLevelDisplay limit name optionalWithWCFor referenceDate requires wildcard
0 NaN 1 us NaN 2012-01-01 NaN NaN
1 08,09 939 metropolitan statistical area/micropolitan sta... NaN 2012-01-01 NaN NaN
2 NaN 3249 county state 2012-01-01 [state] [state]
3 NaN 51 state NaN 2012-01-01 NaN NaN

Now, let's have some fun with error handling and with passing additional arguments to the query. Any "extra" required predicates beyond get, for and in are added at the end of the query as keyword arguments. These are caught and inserted into the query following the API specifications.

First, though, let's see what happens when we submit a malformed query!

Here, we query for the columns matching ESTAB* for every county in California (fips = 06). Some datasets, such as the Economic Census, also require an extra predicate like OPTAX, which identifies the "type of operation or tax status code" along which to slice the data. Just like the other arguments, such predicates are mapped to keywords in the API string, and a wildcard represents a slice of all possible values.
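
To make the keyword mapping concrete, here is a rough sketch of how columns, geography arguments and extra predicates might be assembled into a query string. This is not cenpy's internal code: `build_query_url` is a hypothetical helper, the base URL is a placeholder, and the argument names simply mirror those of the query call used in this notebook.

```python
from urllib.parse import urlencode


def build_query_url(base_url, cols, geo_unit, geo_filter=None, **predicates):
    """Hypothetical helper: assemble a Census-style query string.

    `cols` becomes the `get` clause, `geo_unit` the `for` clause,
    `geo_filter` the `in` clause, and any extra keyword arguments are
    appended as additional predicates.
    """
    params = {"get": ",".join(cols), "for": geo_unit}
    if geo_filter:
        # Several filters are joined with a space, e.g. "state:06 county:001".
        params["in"] = " ".join(f"{k}:{v}" for k, v in geo_filter.items())
    params.update(predicates)
    # Leave ':', '*' and ',' unescaped so the clauses stay readable.
    return base_url + "?" + urlencode(params, safe=":*,")


# Placeholder base URL, used for illustration only.
url = build_query_url(
    "https://api.example.test/data",
    cols=["ESTAB"],
    geo_unit="county:*",
    geo_filter={"state": "06"},
    NAICS2012="*",  # an extra predicate left as a wildcard
)
print(url)
```

The extra predicate lands at the end of the query string, after the get, for and in clauses.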

In [48]:
cols = conn2.varslike('ESTAB*', engine='fnmatch').index.tolist()
In [49]:
data2 = conn2.query(cols=cols, geo_unit='county:*', geo_filter={'state':'06'})
In [50]:
data2.head()
Out[50]:
ESTAB state county
0 36700 06 001
1 43 06 003
2 801 06 005
3 4615 06 007
4 891 06 009

And so you get the number of establishments in each California county. Since we're using counties as our unit of analysis, we can grab the geodata for counties as well.

In [51]:
conn2.set_mapservice('State_County')
Out[51]:
Connection to 2012 County Business Patterns(ID: https://api.census.gov/data/id/CBP2012)
With MapServer: States and Counties

But, there are quite a few layers in this MapService:

In [52]:
len(conn2.mapservice.layers)
Out[52]:
71

Oof. If you ever want to check out the web interface to see what it looks like, you can retrieve the URLs of most objects using:

In [53]:
conn2.mapservice._baseurl
Out[53]:
'http://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/State_County/MapServer'

Anyway, we know counties don't really change all that much. So, let's just pick a counties layer and pull it down for California:

In [54]:
geodata2 = conn2.mapservice.query(layer=1, where='STATE = 06')
In [55]:
newdata2 = pandas.merge(data2, geodata2, left_on='county', right_on='COUNTY')
In [56]:
newdata2.head()
Out[56]:
ESTAB state county AREALAND AREAWATER BASENAME CENTLAT CENTLON COUNTY COUNTYCC ... GEOID INTPTLAT INTPTLON LSADC MTFCC NAME OBJECTID OID STATE geometry
0 36700 06 001 1909614756 216907015 Alameda +37.6505687 -121.9177578 001 H1 ... 06001 +37.6471385 -121.9124880 06 G4020 Alameda County 2098 27590141293924 06 POLYGON ((-13612245.2954 4538149.388899997, -1...
1 43 06 003 1912292608 12557304 Alpine +38.5971043 -119.8206026 003 H1 ... 06003 +38.6217831 -119.7983522 06 G4020 Alpine County 1317 27590289634197 06 POLYGON ((-13366502.0648 4678945.273900002, -1...
2 801 06 005 1539933596 29470567 Amador +38.4466174 -120.6516693 005 H1 ... 06005 +38.4435501 -120.6538563 06 G4020 Amador County 2724 27590143912562 06 POLYGON ((-13472696.4062 4647651.505999997, -1...
3 4615 06 007 4238488156 105261063 Butte +39.6665788 -121.6007017 007 H1 ... 06007 +39.6659588 -121.6019188 06 G4020 Butte County 2237 27590417130535 06 POLYGON ((-13565003.3072 4798393.384000003, -1...
4 891 06 009 2641784992 43841871 Calaveras +38.2044678 -120.5546688 009 H1 ... 06009 +38.1838996 -120.5614415 06 G4020 Calaveras County 347 27590202403841 06 POLYGON ((-13428574.0355 4627724.500200003, -1...

5 rows × 22 columns
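
The join above relies on the county codes being strings with leading zeros on both sides. As a small sketch with toy data (two rows copied from the tables above, with the geometry column left out), the snippet below joins on both state and county codes and checks that every tabular row found a match. The frames `tabular` and `shapes` are stand-ins, not cenpy objects.

```python
import pandas

# Toy stand-ins for the tabular result and the geodata; values are copied
# from the output above, and the geometry column is omitted for brevity.
tabular = pandas.DataFrame(
    {"ESTAB": [36700, 43], "state": ["06", "06"], "county": ["001", "003"]}
)
shapes = pandas.DataFrame(
    {"STATE": ["06", "06"], "COUNTY": ["003", "001"], "BASENAME": ["Alpine", "Alameda"]}
)

# Join on state and county together, so the same join still works if more
# than one state is present. FIPS codes are kept as strings: "001" != 1.
merged = pandas.merge(
    tabular,
    shapes,
    left_on=["state", "county"],
    right_on=["STATE", "COUNTY"],
    how="left",
    validate="one_to_one",  # raise if either side has duplicate keys
    indicator=True,         # adds a _merge column describing each row's match
)

# Rows that did not find a matching shape would show up here.
unmatched = merged[merged["_merge"] != "both"]
print(merged[["ESTAB", "county", "BASENAME"]])
print(len(unmatched))
```

Joining on county alone is enough when the data cover a single state, as in this notebook; the two-key join is just a safer default.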

And that's all there is to it! Geodata and tabular data from the Census APIs in one place.

File an issue if you have concerns!