Read in JSON and DataFrame Basics
# read population in
import json
import requests
from pandas import DataFrame
# pop_json_url holds a URL for a JSON list of country populations
pop_json_url = "https://gist.github.com/rdhyee/8511607/raw/f16257434352916574473e63612fcea55a0c1b1c/population_of_countries.json"
pop_list = requests.get(pop_json_url).json()
df = DataFrame(pop_list)
df[:5]
 | 0 | 1 | 2 |
---|---|---|---|
0 | 1 | China | 1385566537 |
1 | 2 | India | 1252139596 |
2 | 3 | United States | 320050716 |
3 | 4 | Indonesia | 249865631 |
4 | 5 | Brazil | 200361925 |
5 rows × 3 columns
df.dtypes
0    float64
1     object
2      int64
dtype: object
Q: Based on the above statement, which of these would you expect to see in pop_list?
['1', 'United States', '320050716']
[1, 'United States', 320050716]
['United States', 320050716]
[1, 'United States', '320050716']
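A minimal sketch of how DataFrame picks a dtype for each column from a list of rows. The rows here are toy values made up for illustration, not entries from pop_list, and the variable names are hypothetical.

```python
from pandas import DataFrame
from pandas.api.types import is_integer_dtype

# Toy rows (made up): Python ints stay integers, quoted digits stay strings
rows_mixed = [[1, 'Aland', 100], [2, 'Bland', 200]]
rows_text = [['1', 'Aland', '100'], ['2', 'Bland', '200']]

df_mixed = DataFrame(rows_mixed)
df_text = DataFrame(rows_text)

# Check, column by column, whether pandas chose an integer dtype
mixed_is_int = [is_integer_dtype(t) for t in df_mixed.dtypes]
text_is_int = [is_integer_dtype(t) for t in df_text.dtypes]

print(df_mixed.dtypes)
print(df_text.dtypes)
```

Comparing the two printed dtype listings shows that the dtypes reflect the Python types stored in the rows, not what the values look like.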
Q: What is the relationship between s and the population of China?
s = sum(df[df[1].str.startswith('C')][2])
s
is greater than the population of China
is the same as the population of China
is less than the population of China
is not a number.
Q: What does this statement do?
df.columns = ['Number','Country','Population']
columns
Q: How would you rewrite this statement to get the same result
s = sum(df[df[1].str.startswith('C')][2])
after running:
df.columns = ['Number','Country','Population']
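As an illustration of label-based column access, here is a small sketch on a toy DataFrame. The rows and the name toy are made up for this example and are not the population data loaded above.

```python
from pandas import DataFrame

# Toy data, made up for illustration
toy = DataFrame([[1, 'Chad', 10], [2, 'India', 20], [3, 'Cuba', 30]])

# With the default integer labels, column 1 holds names and column 2 holds values
by_position = sum(toy[toy[1].str.startswith('C')][2])

# After renaming, the same columns are reached through their new names
toy.columns = ['Number', 'Country', 'Population']
by_name = toy[toy['Country'].str.startswith('C')]['Population'].sum()

print(by_position, by_name)
```

Once the columns are renamed, the old integer labels no longer exist, so toy[1] would raise a KeyError.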
Series Examples
from pandas import DataFrame, Series
import numpy as np
s1 = Series(np.arange(1,4))
s1
0    1
1    2
2    3
dtype: int64
Q: What is
s1 + 1
Q: What is
s1.apply(lambda k: 2*k).sum()
Q: What is
s1.cumsum()[1]
Q: What is
s1.cumsum() + s1.cumsum()
Q: Describe what is happening in these statements:
s1 + 1
and
s1.cumsum() + s1.cumsum()
Q: What is
np.any(s1 > 2)
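The operations in these questions can be tried on a different Series. The sketch below uses toy values (not s1), and the variable names are only for illustration.

```python
import numpy as np
from pandas import Series

t = Series([10, 20, 30])      # toy values, not the s1 defined above

plus_one = t + 1              # the scalar is applied to every element
running = t.cumsum()          # running total of the elements
doubled = running + running   # two Series are added element by element, aligned on index
any_big = bool(np.any(t > 25))  # True if at least one element passes the test

print(plus_one.tolist(), running.tolist(), doubled.tolist(), any_big)
```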
Census API Examples
from census import Census
from us import states
import settings
c = Census(settings.CENSUS_KEY)
c.sf1.get(('NAME', 'P0010001'), {'for': 'state:%s' % states.CA.fips})
[{u'NAME': u'California', u'P0010001': u'37253956', u'state': u'06'}]
Q: What is the purpose of settings.CENSUS_KEY?
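The contents of the settings module are not shown in this notebook. The sketch below is only an assumption about what such a module might contain: the helper load_census_key and the environment variable name are hypothetical.

```python
import os


def load_census_key(env_var="CENSUS_KEY"):
    """Return the API key stored in the given environment variable, or None."""
    return os.environ.get(env_var)


# Hypothetical module-level name, so that `settings.CENSUS_KEY` would resolve
CENSUS_KEY = load_census_key()
```

Keeping the key in a separate module or environment variable means the notebook itself can be shared without exposing it.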
Q: What is the difference between r1 and r2?
r1 = c.sf1.get(('NAME', 'P0010001'), {'for': 'county:*', 'in': 'state:%s' % states.CA.fips})
r2 = c.sf1.get(('NAME', 'P0010001'), {'for': 'county:*', 'in': 'state:*' })
Q: Which is the correct geographic hierarchy?
Nation > States = Nation is subdivided into States
from pandas import DataFrame
r = c.sf1.get(('NAME', 'P0010001'), {'for': 'state:*'})
df = DataFrame(r)
df.head()
 | NAME | P0010001 | state |
---|---|---|---|
0 | Alabama | 4779736 | 01 |
1 | Alaska | 710231 | 02 |
2 | Arizona | 6392017 | 04 |
3 | Arkansas | 2915918 | 05 |
4 | California | 37253956 | 06 |
5 rows × 3 columns
Q: Why does df have 52 items? Please explain.
len(df)
52
Q: Why are the results below different? Please explain
print(df.P0010001.sum())
print()
print(df.P0010001.astype(int).sum())
477973671023163920172915918372539565029196357409789793460172318801310968765313603011567582128306326483802304635528531184339367453337213283615773552654762998836405303925296729759889279894151826341270055113164708791894205917919378102953548367259111536504375135138310741270237910525674625364814180634610525145561276388562574180010246724540185299456869865636263725789 312471327
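The same contrast can be reproduced on a tiny Series. The values below are toy strings made up for illustration, and dtype=object is set explicitly as an assumption so the strings are stored as plain Python objects.

```python
from pandas import Series

# Toy values stored as strings, the way the API returned the counts above
counts = Series(['1', '2', '3'], dtype=object)

as_text = counts.sum()                # adding strings joins them end to end
as_number = counts.astype(int).sum()  # convert to integers first, then add

print(repr(as_text), as_number)
```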
Q: Describe the output of the following:
df.P0010001 = df.P0010001.astype(int)
df[['NAME','P0010001']].sort_values('P0010001', ascending=False).head()
Q: After running:
df.set_index('NAME', inplace=True)
how would you access the Series for the state of Alaska?
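A short sketch of sorting and of label-based row access, using a toy DataFrame with made-up names and counts, not the census results.

```python
from pandas import DataFrame

# Toy data, made up for illustration
toy = DataFrame({'NAME': ['Aland', 'Bland', 'Cland'],
                 'P0010001': [30, 10, 20]})

# sort_values is the current pandas spelling of the older DataFrame.sort
largest_first = toy.sort_values('P0010001', ascending=False)

# With NAME as the index, .loc selects a row by its label and returns a Series
toy.set_index('NAME', inplace=True)
row = toy.loc['Bland']

print(largest_first)
print(row)
```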
np.in1d([ s.fips for s in states.STATES], df.state)
array([ True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True], dtype=bool)
df[np.in1d(df.state, [ s.fips for s in states.STATES])]
 | NAME | P0010001 | state |
---|---|---|---|
0 | Alabama | 4779736 | 01 |
1 | Alaska | 710231 | 02 |
2 | Arizona | 6392017 | 04 |
3 | Arkansas | 2915918 | 05 |
4 | California | 37253956 | 06 |
5 | Colorado | 5029196 | 08 |
6 | Connecticut | 3574097 | 09 |
7 | Delaware | 897934 | 10 |
8 | District of Columbia | 601723 | 11 |
9 | Florida | 18801310 | 12 |
10 | Georgia | 9687653 | 13 |
11 | Hawaii | 1360301 | 15 |
12 | Idaho | 1567582 | 16 |
13 | Illinois | 12830632 | 17 |
14 | Indiana | 6483802 | 18 |
15 | Iowa | 3046355 | 19 |
16 | Kansas | 2853118 | 20 |
17 | Kentucky | 4339367 | 21 |
18 | Louisiana | 4533372 | 22 |
19 | Maine | 1328361 | 23 |
20 | Maryland | 5773552 | 24 |
21 | Massachusetts | 6547629 | 25 |
22 | Michigan | 9883640 | 26 |
23 | Minnesota | 5303925 | 27 |
24 | Mississippi | 2967297 | 28 |
25 | Missouri | 5988927 | 29 |
26 | Montana | 989415 | 30 |
27 | Nebraska | 1826341 | 31 |
28 | Nevada | 2700551 | 32 |
29 | New Hampshire | 1316470 | 33 |
30 | New Jersey | 8791894 | 34 |
31 | New Mexico | 2059179 | 35 |
32 | New York | 19378102 | 36 |
33 | North Carolina | 9535483 | 37 |
34 | North Dakota | 672591 | 38 |
35 | Ohio | 11536504 | 39 |
36 | Oklahoma | 3751351 | 40 |
37 | Oregon | 3831074 | 41 |
38 | Pennsylvania | 12702379 | 42 |
39 | Rhode Island | 1052567 | 44 |
40 | South Carolina | 4625364 | 45 |
41 | South Dakota | 814180 | 46 |
42 | Tennessee | 6346105 | 47 |
43 | Texas | 25145561 | 48 |
44 | Utah | 2763885 | 49 |
45 | Vermont | 625741 | 50 |
46 | Virginia | 8001024 | 51 |
47 | Washington | 6724540 | 53 |
48 | West Virginia | 1852994 | 54 |
49 | Wisconsin | 5686986 | 55 |
50 | Wyoming | 563626 | 56 |
51 rows × 3 columns
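The filtering step above relies on a boolean membership mask. A minimal sketch with toy codes follows; the code '99' is made up, and np.isin is used here in place of np.in1d (for one-dimensional inputs like these the two give the same mask).

```python
import numpy as np

returned = np.array(['01', '02', '99'])  # toy codes; '99' is made up
known = ['01', '02', '04']               # toy list of accepted codes

# True where an element of `returned` also appears in `known`
mask = np.isin(returned, known)

# Indexing with the mask keeps only the matching elements
kept = returned[mask]

print(mask, kept)
```

Swapping the two arguments answers the opposite question: which of the known codes were actually returned.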