About this document¶

This is an IJulia notebook developed to explore some of the data relevant to Chicago's housing retrofitting project. More information about this project and data sources can be found here: https://github.com/dssg/data-challenges/tree/master/RetrofitChicagoTargeting

In [1]:

using DataFrames
using DataStructures

In [2]:

using Gadfly
using Color

In [214]:

colors = distinguishable_colors(8)
gajomidark = Theme(default_color = color("black"),
                    major_label_color = color("black"),major_label_font_size=18px,
                    minor_label_color = color("black"),minor_label_font_size=14px,
                    panel_stroke = color("black"),
                    default_point_size = 0.5mm,
                    line_width = 1mm,
                    grid_line_width = 0px,
                    highlight_width = 0px)
set_default_plot_size(20cm, 15cm)

Download all data¶

The energy efficiency data can be downloaded manually at https://data.cityofchicago.org/api/views/8yq3-m6wp/rows.csv?accessType=DOWNLOAD and the socioeconomic data can be dowloaded at https://data.cityofchicago.org/api/views/kn9c-c2s2/rows.csv?accessType=DOWNLOAD.

The following julia code should accomplish this auomatically.

In [302]:

download("https://data.cityofchicago.org/api/views/8yq3-m6wp/rows.csv?accessType=DOWNLOAD","energy.csv")
download("https://data.cityofchicago.org/api/views/kn9c-c2s2/rows.csv?accessType=DOWNLOAD","socio.csv")

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24.7M  100 24.7M    0     0   252k      0  0:01:40  0:01:40 --:--:--  197k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3862  100  3862    0     0   3672      0  0:00:01  0:00:01 --:--:--  3674

Out[302]:

"socio.csv"

Inspecting socioeconomic data¶

In [303]:

fulleconsummary = readtable("socio.csv")
map(display,names(fullecondata));

:Community_Area_Number

:COMMUNITY_AREA_NAME

:PERCENT_OF_HOUSING_CROWDED

:PERCENT_HOUSEHOLDS_BELOW_POVERTY

:PERCENT_AGED_16_UNEMPLOYED

:PERCENT_AGED_25_WITHOUT_HIGH_SCHOOL_DIPLOMA

:PERCENT_AGED_UNDER_18_OR_OVER_64

:PER_CAPITA_INCOME

:HARDSHIP_INDEX

We see that the table contains information for different communities realted to housing, employment, income and other socioeconomic summary statistics. Since for the moment we are just interested in getting a feel for the basic patterns in our dataset, we will focus on two column: those pertaining to per capita income and poverty percentages. We extract this information into a new dataframe, sorting the communities by the precentage of households below the poverty level.

In [304]:

fullnames = [:COMMUNITY_AREA_NAME,:PERCENT_HOUSEHOLDS_BELOW_POVERTY, :PER_CAPITA_INCOME]
shortnames = [:community,:percent_poverty,:per_capita_income]
econsummary = sort!(names!(fullecondata[:,fullnames],shortnames),cols=:percent_poverty,rev=true)

Out[304]:

	community	percent_poverty	per_capita_income
1	Riverdale	56.5	8201
2	Fuller Park	51.2	10432
3	Englewood	46.6	11888
4	North Lawndale	43.1	12034
5	East Garfield Park	42.4	12961
6	Washington Park	42.1	13785
7	West Garfield Park	41.7	10934
8	Armour Square	40.1	16148
9	Oakland	39.7	19252
10	West Englewood	34.4	11317
11	Humboldt park	33.9	13781
12	Burnside	33.0	12515
13	South Shore	31.1	19398
14	South Lawndale	30.7	10402
15	Woodlawn	30.7	18672
16	South Chicago	29.8	16579
17	Douglas	29.6	23791
18	Greater Grand Crossing	29.6	17285
19	Grand Boulevard	29.3	23472
20	South Deering	29.2	14685
21	New City	29.0	12765
22	Austin	28.6	15957
23	Chicago Lawn	27.9	13231
24	Chatham	27.8	18881
25	Auburn Gresham	27.6	15528
26	West Pullman	25.9	16563
27	Lower West Side	25.8	16444
28	Uptown	24.0	35787
29	Rogers Park	23.6	23939
30	Brighton Park	23.6	13089
⋮	⋮	⋮	⋮

Since per capita income is a measure of average wealth and the percentage of households below the poverty level a measure of absense of wealth, we might expect that these two statistics correlate negatively with one another, assuming that communities are no excessively hetereogenous. We can see this kind of correlation readily in a simple scatter plot.

In [305]:

plot(econsummary,x="percent_poverty",y="per_capita_income",label=:community,Geom.label,Geom.point,gajomidark)

Out[305]:

In addition to these socioeconomic summary statistics at the community level we have extensive data on energy usage and residency at sub-census block level. Energy, in this context refers to energy assocaited with electricity (with total energy for a period in kilowatt-hours) and energy associated with natural gas (recorded "therms", which can be converted into energy units given a conversion factor). The full list of columns in this dataset is quite large, and can be seen below.

Inspecting energy data¶

In [307]:

fullenergydetail = readtable("energy.csv")
map(display,names(fullenergydetail));

:COMMUNITY_AREA_NAME

:CENSUS_BLOCK

:BUILDING_TYPE

:BUILDING_SUBTYPE

:KWH_JANUARY_2010

:KWH_FEBRUARY_2010

:KWH_MARCH_2010

:KWH_APRIL_2010

:KWH_MAY_2010

:KWH_JUNE_2010

:KWH_JULY_2010

:KWH_AUGUST_2010

:KWH_SEPTEMBER_2010

:KWH_OCTOBER_2010

:KWH_NOVEMBER_2010

:KWH_DECEMBER_2010

:TOTAL_KWH

:ELECTRICITY_ACCOUNTS

:ZERO_KWH_ACCOUNTS

:THERM_JANUARY_2010

:THERM_FEBRUARY_2010

:THERM_MARCH_2010

:TERM_APRIL_2010

:THERM_MAY_2010

:THERM_JUNE_2010

:THERM_JULY_2010

:THERM_AUGUST_2010

:THERM_SEPTEMBER_2010

:THERM_OCTOBER_2010

:THERM_NOVEMBER_2010

:THERM_DECEMBER_2010

:TOTAL_THERMS

:GAS_ACCOUNTS

:KWH_TOTAL_SQFT

:THERMS_TOTAL_SQFT

:KWH_MEAN_2010

:KWH_STANDARD_DEVIATION_2010

:KWH_MINIMUM_2010

:KWH_1ST_QUARTILE_2010

:KWH_2ND_QUARTILE_2010

:KWH_3RD_QUARTILE_2010

:KWH_MAXIMUM_2010

:KWH_SQFT_MEAN_2010

:KWH_SQFT_STANDARD_DEVIATION_2010

:KWH_SQFT_MINIMUM_2010

:KWH_SQFT_1ST_QUARTILE_2010

:KWH_SQFT_2ND_QUARTILE_2010

:KWH_SQFT_3RD_QUARTILE_2010

:KWH_SQFT_MAXIMUM_2010

:THERM_MEAN_2010

:THERM_STANDARD_DEVIATION_2010

:THERM_MINIMUM_2010

:THERM_1ST_QUARTILE_2010

:THERM_2ND_QUARTILE_2010

:THERM_3RD_QUARTILE_2010

:THERM_MAXIMUM_2010

:THERMS_SQFT_MEAN_2010

:THERMS_SQFT_STANDARD_DEVIATION_2010

:THERMS_SQFT_MINIMUM_2010

:THERMS_SQFT_1ST_QUARTILE_2010

:THERMS_SQFT_2ND_QUARTILE_2010

:THERMS_SQFT_3RD_QUARTILE_2010

:THERMS_SQFT_MAXIMUM_2010

:TOTAL_POPULATION

:TOTAL_UNITS

:AVERAGE_STORIES

:AVERAGE_BUILDING_AGE

:AVERAGE_HOUSESIZE

:OCCUPIED_UNITS

:OCCUPIED_UNITS_PERCENTAGE

:RENTER_OCCUPIED_HOUSING_UNITS

:RENTER_OCCUPIED_HOUSING_PERCENTAGE

:OCCUPIED_HOUSING_UNITS

We see that the dataset includes information about energy usage for the year on a month by month basis as well as information averaged over the whole year. Temporal information may be especially relevant when considering the differences in electricity and gas usage, but for a first pass we will consider year averaged data. Since we are interested not jsut in total energy usage but in energy efficieny, we also take note of the total population and energy usage relevant square footage, which can be used to compute specific energy efficiencies. Finally, we note both the building type and it's age.

In [308]:

fullnames = [:COMMUNITY_AREA_NAME,:CENSUS_BLOCK,:TOTAL_POPULATION,:BUILDING_TYPE,:THERMS_TOTAL_SQFT,:TOTAL_KWH,:TOTAL_THERMS]
shortnames = [:community,:block,:population,:building_type,:sqft,:kwh,:therms]
energydetail = complete_cases!(names!(fullenergydetail[:,fullnames],shortnames))

Out[308]:

	community	block	population	building_type	sqft	kwh	therms
1	Albany Park	170311401001001	132	Residential	48825	201032	36822
2	Albany Park	170311401001001	132	Residential	3306	20608	755
3	Albany Park	170311401001001	132	Residential	9472	61020	6065
4	Albany Park	170311401001002	228	Residential	14407	16752	14653
5	Albany Park	170311401001002	228	Residential	58835	244341	51590
6	Albany Park	170311401001003	231	Commercial	8240	45330	11217
7	Albany Park	170311401001003	231	Residential	13305	25300	9485
8	Albany Park	170311401001003	231	Residential	16654	65982	14139
9	Albany Park	170311401001003	231	Residential	10840	79663	14780
10	Albany Park	170311401001004	456	Commercial	127916	369231	114547
11	Albany Park	170311401001004	456	Residential	25086	63378	25518
12	Albany Park	170311401001005	73	Residential	25339	100191	16191
13	Albany Park	170311401001005	73	Residential	6804	7070	1948
14	Albany Park	170311401001006	0	Commercial	28700	108743	10107
15	Albany Park	170311401001007	208	Commercial	5128	28935	5240
16	Albany Park	170311401001007	208	Residential	48205	142707	40087
17	Albany Park	170311401001007	208	Residential	3726	51150	5687
18	Albany Park	170311401001008	200	Residential	49278	209392	41226
19	Albany Park	170311401001009	116	Residential	17388	49878	14529
20	Albany Park	170311401001009	116	Residential	19261	124140	20472
21	Albany Park	170311401002000	232	Commercial	5496	13196	6537
22	Albany Park	170311401002000	232	Residential	39952	215223	32761
23	Albany Park	170311401002000	232	Residential	14486	119681	11934
24	Albany Park	170311401002001	258	Commercial	5628	29183	4074
25	Albany Park	170311401002001	258	Residential	51390	207428	73929
26	Albany Park	170311401002001	258	Residential	2708	24653	3734
27	Albany Park	170311401002002	192	Commercial	10872	33619	8474
28	Albany Park	170311401002002	192	Residential	41996	139524	31640
29	Albany Park	170311401002002	192	Residential	6866	56482	7221
30	Albany Park	170311401002003	213	Commercial	2100	53596	15075
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮

The above table shows that there may be multiple energy usage data rows for the same census block. Since the population statistics are collected for the entire census block this presents a significant complication in the analysis of per capita energy usage statistics. As such, for out initial look into the data, we select only census block with unique rows in the above table. We also see that a census block may contain commercial, industrial and residential and building types. We are interested in residential housing only, and so select the rows that fit this criterion. Finally, we see that census blocks may be deviod of people! As such, we remove rows with zero population.

In [309]:

dupcounts = counter(energydetail[:block])
rows = (&)(array(energydetail[:building_type].=="Residential"),
           array(energydetail[:population].>0),
           array(map(i->dupcounts[i],energydetail[:block]).==1))
energydetail = delete!(energydetail[rows,:],[:block,:building_type])

Out[309]:

	community	population	sqft	kwh	therms
1	Albany Park	200	49278	209392	41226
2	Albany Park	216	38424	160811	37675
3	Albany Park	101	16119	128006	19418
4	Albany Park	198	18800	82883	19183
5	Albany Park	67	18866	112954	15332
6	Albany Park	286	56929	229624	31643
7	Albany Park	158	10163	54492	8002
8	Albany Park	3	3236	20594	2006
9	Albany Park	25	12851	76119	12858
10	Albany Park	53	24691	123490	24588
11	Albany Park	236	47073	191120	46149
12	Albany Park	19	9150	51920	8543
13	Albany Park	206	48777	184677	44127
14	Albany Park	23	17024	86464	15304
15	Albany Park	47	23893	150530	20576
16	Albany Park	125	6833	7153	25864
17	Albany Park	14	6176	36171	9010
18	Albany Park	41	21749	120511	23344
19	Albany Park	61	15637	126508	19276
20	Albany Park	40	23042	149660	21980
21	Albany Park	64	24074	147661	24933
22	Albany Park	46	23380	134328	21894
23	Albany Park	80	1036	17164	3122
24	Albany Park	62	19871	187969	23087
25	Albany Park	46	20054	143696	22168
26	Albany Park	10	4723	31525	4898
27	Albany Park	15	6167	42429	6465
28	Albany Park	44	22047	120338	23807
29	Albany Park	71	21583	159578	24718
30	Albany Park	47	19350	123013	17779
⋮	⋮	⋮	⋮	⋮	⋮

Having filtered the data down to the subset we are interested in, we add to the table simple statistics relevant to energy efficiency

In [310]:

per(denominator::Symbol,numerator::Symbol,data::DataFrame) = data[numerator]./data[denominator]

Out[310]:

per (generic function with 1 method)

In [311]:

#add stats
energydata[:kwh_per_capita] = per(:population,:kwh,energydata)
energydata[:therms_per_capita] = per(:population,:therms,energydata)
energydata[:kwh_per_sqft] = per(:sqft,:kwh,energydata)
energydata[:therms_per_sqft] = per(:sqft,:therms,energydata)
energydata

Out[311]:

	community	population	sqft	kwh	therms	kwh_per_capita	therms_per_capita	kwh_per_sqft	therms_per_sqft
1	Albany Park	67	22140	147942	26344	2208.089552238806	393.1940298507463	6.682113821138211	1.1898825654923215
2	Albany Park	45	20125	117665	22167	2614.777777777778	492.6	5.8467080745341615	1.1014658385093168
3	Albany Park	64	18536	113633	21252	1775.515625	332.0625	6.130394907207596	1.146525679758308
4	Albany Park	29	17255	91166	18912	3143.655172413793	652.1379310344828	5.2834540712836855	1.096030136192408
5	Albany Park	52	11766	99825	11996	1919.7115384615386	230.69230769230768	8.48419173890872	1.019547849736529
6	Albany Park	52	8783	54886	10901	1055.5	209.6346153846154	6.249117613571673	1.2411476716383925
7	Albany Park	85	25196	99683	23426	1172.7411764705882	275.6	3.9563025877123352	0.9297507540879505
8	Albany Park	136	34516	194872	26558	1432.8823529411766	195.27941176470588	5.645845405029552	0.7694402595897555
9	Albany Park	120	35570	192173	43028	1601.4416666666666	358.56666666666666	5.402670789991566	1.2096710711273546
10	Albany Park	357	57079	394016	80580	1103.686274509804	225.71428571428572	6.90299409590217	1.4117276055992571
11	Albany Park	24	2428	20970	1603	873.75	66.79166666666667	8.63673805601318	0.6602141680395387
12	Albany Park	17	6749	38454	5138	2262.0	302.2352941176471	5.697732997481109	0.76129797006964
13	Albany Park	46	24539	110585	18679	2404.021739130435	406.0652173913044	4.506499857369901	0.7611964627735441
14	Albany Park	45	19377	68842	16018	1529.8222222222223	355.9555555555556	3.5527687464519793	0.8266501522423492
15	Albany Park	23	21184	104351	19060	4537.0	828.695652173913	4.925934667673716	0.8997356495468278
16	Albany Park	51	30746	130256	27238	2554.0392156862745	534.0784313725491	4.236518571521499	0.8859038574123463
17	Albany Park	71	34447	126550	28684	1782.394366197183	404.0	3.6737596887972828	0.8326995093912387
18	Albany Park	17	9479	40598	10379	2388.1176470588234	610.5294117647059	4.282941238527271	1.0949467243380104
19	Albany Park	22	10683	56937	10446	2588.0454545454545	474.8181818181818	5.329682673406347	0.9778152204436956
20	Albany Park	47	9126	19024	8089	404.7659574468085	172.10638297872342	2.084593469208854	0.8863686171378479
21	Albany Park	30	21589	119606	20596	3986.866666666667	686.5333333333333	5.540136180462272	0.9540043540692019
22	Albany Park	34	20113	25416	19622	747.5294117647059	577.1176470588235	1.2636603191965396	0.9755879282056381
23	Albany Park	39	19038	174602	19719	4476.974358974359	505.61538461538464	9.171236474419581	1.0357705641348882
24	Albany Park	37	20178	141019	18864	3811.324324324324	509.8378378378378	6.988750123897314	0.9348795718108831
25	Albany Park	31	23213	97469	19783	3144.1612903225805	638.1612903225806	4.198897169689398	0.8522379701029595
26	Albany Park	162	13489	54196	13437	334.5432098765432	82.94444444444444	4.0177922751871895	0.9961450070427756
27	Albany Park	80	8113	32345	7285	404.3125	91.0625	3.9868112905213855	0.8979415752495994
28	Albany Park	53	5593	23875	5864	450.47169811320754	110.64150943396227	4.268728768102986	1.0484534239227605
29	Albany Park	11	11862	101576	12332	9234.181818181818	1121.090909090909	8.56314280896982	1.039622323385601
30	Albany Park	32	20863	99212	22103	3100.375	690.71875	4.755404304270718	1.0594353640416048
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮

Since the socioeconomic data we have available to us gives statistics at the level of communities, it makes sense to aggregate this census block level data into data at the community level. the simplest way to achieve this is to simple average over the values in the rows of common communities.

In [312]:

stats = [:kwh_per_capita,:therms_per_capita,:kwh_per_sqft,:therms_per_sqft]
energysummary = aggregate(energydata[[:community, stats...]],:community, mean)

Out[312]:

	community	kwh_per_capita_mean	therms_per_capita_mean	kwh_per_sqft_mean	therms_per_sqft_mean
1	Albany Park	2468.0411169673425	412.0055855560169	6.106879778389706	1.077010168649753
2	Archer Heights	1999.36664605425	348.30086113709245	7.879389696865459	1.3642276528193051
3	Armour Square	1337.9031640356106	186.10719646873602	6.75172078525251	0.9176220611320854
4	Ashburn	2798.6655243808614	392.16244966081865	7.722845761167268	1.0695066758819596
5	Auburn Gresham	2955.1741795199755	460.9206848204113	7.230493884586336	1.238707564940622
6	Austin	2265.560691870814	399.1870113049785	6.071752200804417	1.046794235832325
7	Avalon Park	3143.1419295917417	554.5910732142593	7.1926166587045435	1.2466691679956463
8	Avondale	1500.46123468815	268.7833607378459	6.004555695320488	1.0535979223923637
9	Belmont Cragin	1834.5544956552503	305.21773038359953	6.7052515063731315	1.0973223202640137
10	Beverly	3608.4793997049537	554.60626032916	6.346808073028228	0.9603103958629925
11	Bridgeport	1574.7693070576802	261.7049152880214	4.964600678867906	0.8631429947408497
12	Brighton Park	1458.6191240676173	246.8853196972466	6.4216642429263056	1.0003623986754846
13	Burnside	2513.7282846480666	452.44247361533155	7.536603821567134	1.2014653339268535
14	Calumet Heights	3472.0095358597555	554.324665313044	6.883460956288404	1.0956179693923338
15	Chatham	3138.6836620964605	558.3983500879795	6.632272540044313	1.159983987023555
16	Chicago Lawn	1871.3779921079536	289.08136796648034	7.88727349490481	1.2176286052460588
17	Clearing	3137.418666321823	385.0173852954983	7.799318610070047	0.9690108791701314
18	Douglas	2612.514773084323	675.617274019734	3.8018340422600296	0.7702503940141541
19	Dunning	2806.8455920536626	402.4769039894256	6.832504000539365	0.9736882375518406
20	East Garfield Park	2461.231343906299	477.4500548942718	6.157234436434369	1.28299956755393
21	East Side	2268.0444216159412	358.87899204759617	6.552013061814929	1.0321531929846388
22	Edgewater	2067.5433224620224	337.5904164305514	5.384499427082707	0.8896255300621512
23	Edison Park	3473.3050391461725	496.5842613988439	6.45390071602274	0.9321651837849423
24	Englewood	1644.610438663598	350.2684909902882	6.02828764363686	1.1845251972542743
25	Forest Glen	3655.537828137374	504.07121768167417	6.198106263041437	0.855189567209813
26	Fuller Park	1833.3618850996181	335.12071420071186	6.115995838434476	1.1226769592334431
27	Gage Park	1945.0561084470085	309.89295056876966	7.353070979428966	1.2548033213811771
28	Garfield Ridge	2900.242942171233	375.08610998489553	7.683310667893575	1.004097867112446
29	Grand Boulevard	2145.7691233349487	488.9259053595046	4.089155930011542	0.7617395282368347
30	Greater Grand Crossing	2176.906873888835	430.36087239243125	5.7337731901144515	1.1612619801624018
⋮	⋮	⋮	⋮	⋮	⋮

Combining socioeconomic and energy data¶

The two dataset above share common pools of communities. We can join the dataframes on these values.

In [313]:

summary = join(econdata, energysummary, on = :community)

Out[313]:

	community	percent_poverty	per_capita_income	kwh_per_capita_mean	therms_per_capita_mean	kwh_per_sqft_mean	therms_per_sqft_mean
1	Albany Park	19.2	21323	2468.0411169673425	412.0055855560169	6.106879778389706	1.077010168649753
2	Archer Heights	14.1	16134	1999.36664605425	348.30086113709245	7.879389696865459	1.3642276528193051
3	Armour Square	40.1	16148	1337.9031640356106	186.10719646873602	6.75172078525251	0.9176220611320854
4	Ashburn	10.4	23482	2798.6655243808614	392.16244966081865	7.722845761167268	1.0695066758819596
5	Auburn Gresham	27.6	15528	2955.1741795199755	460.9206848204113	7.230493884586336	1.238707564940622
6	Austin	28.6	15957	2265.560691870814	399.1870113049785	6.071752200804417	1.046794235832325
7	Avalon Park	17.2	24454	3143.1419295917417	554.5910732142593	7.1926166587045435	1.2466691679956463
8	Avondale	15.3	20039	1500.46123468815	268.7833607378459	6.004555695320488	1.0535979223923637
9	Belmont Cragin	18.7	15461	1834.5544956552503	305.21773038359953	6.7052515063731315	1.0973223202640137
10	Beverly	5.1	39523	3608.4793997049537	554.60626032916	6.346808073028228	0.9603103958629925
11	Bridgeport	18.9	22694	1574.7693070576802	261.7049152880214	4.964600678867906	0.8631429947408497
12	Brighton Park	23.6	13089	1458.6191240676173	246.8853196972466	6.4216642429263056	1.0003623986754846
13	Burnside	33.0	12515	2513.7282846480666	452.44247361533155	7.536603821567134	1.2014653339268535
14	Calumet Heights	11.5	28887	3472.0095358597555	554.324665313044	6.883460956288404	1.0956179693923338
15	Chatham	27.8	18881	3138.6836620964605	558.3983500879795	6.632272540044313	1.159983987023555
16	Chicago Lawn	27.9	13231	1871.3779921079536	289.08136796648034	7.88727349490481	1.2176286052460588
17	Clearing	8.9	25113	3137.418666321823	385.0173852954983	7.799318610070047	0.9690108791701314
18	Douglas	29.6	23791	2612.514773084323	675.617274019734	3.8018340422600296	0.7702503940141541
19	Dunning	10.6	26282	2806.8455920536626	402.4769039894256	6.832504000539365	0.9736882375518406
20	East Garfield Park	42.4	12961	2461.231343906299	477.4500548942718	6.157234436434369	1.28299956755393
21	East Side	19.2	17104	2268.0444216159412	358.87899204759617	6.552013061814929	1.0321531929846388
22	Edgewater	18.2	33385	2067.5433224620224	337.5904164305514	5.384499427082707	0.8896255300621512
23	Edison Park	3.3	40959	3473.3050391461725	496.5842613988439	6.45390071602274	0.9321651837849423
24	Englewood	46.6	11888	1644.610438663598	350.2684909902882	6.02828764363686	1.1845251972542743
25	Forest Glen	7.5	44164	3655.537828137374	504.07121768167417	6.198106263041437	0.855189567209813
26	Fuller Park	51.2	10432	1833.3618850996181	335.12071420071186	6.115995838434476	1.1226769592334431
27	Gage Park	23.4	12171	1945.0561084470085	309.89295056876966	7.353070979428966	1.2548033213811771
28	Garfield Ridge	8.8	26353	2900.242942171233	375.08610998489553	7.683310667893575	1.004097867112446
29	Grand Boulevard	29.3	23472	2145.7691233349487	488.9259053595046	4.089155930011542	0.7617395282368347
30	Greater Grand Crossing	29.6	17285	2176.906873888835	430.36087239243125	5.7337731901144515	1.1612619801624018
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮

Having joined the energy usage data with the socioeconomic data we can see how the different rows between these infleunce one another. A simple example is to be found in looking gas usage per square foot across differnt communities as a function of the poverty percentage in those communities. The Loop is the obvious outlier (the density of skyscrapers probably has something to do with this). There is an apparent trend of reduced efficieny with increasing poverty (the converse can also be seen for per capita income).

In [314]:

set_default_plot_size(25cm, 10cm)
hstack(
plot(fulldata,x = :percent_poverty, y = :therms_per_sqft_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark),
plot(fulldata,x=:per_capita_income,y = :therms_per_sqft_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark)
)

Out[314]:

One can make analogous plots for electricity usage. The analogous trend may be present, but are certainly not clear (further quantitative analysis could shed light on this). Interestingly, the Near North Side appears to use roughly four times as much electricity per square foot than the Loop.

In [315]:

set_default_plot_size(25cm, 10cm)
hstack(
plot(fulldata,x=:per_capita_income,y = :kwh_per_sqft_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark),
plot(fulldata,x = :percent_poverty, y = :kwh_per_sqft_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark)
)

Out[315]:

Now in addition to considering energy efficiency on the basis of building area, it might also make sense to consider efficieny on normalized against population, as this would say something about the amount of energy used by individuals. This is of interest in its own right from an energy equality standpoint, but also is directly relevant in estimating energy costs for individuals. Unlike the case of area based efficiency there are no readily visually apparent correlation in the per captia efficiency plots (with the possible exception of decreased electricity efficiency with increaseing income). A more quantitative analysis might be helpful in unconering hidden patterns.

In [316]:

set_default_plot_size(25cm, 10cm)
hstack(
plot(fulldata,x = :percent_poverty, y = :kwh_per_capita_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark),
plot(fulldata,x= :per_capita_income,y = :kwh_per_capita_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark)
)

Out[316]:

In [317]:

set_default_plot_size(25cm, 10cm)
hstack(
plot(fulldata,x = :percent_poverty, y = :therms_per_capita_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark),
plot(fulldata,x=:per_capita_income,y = :therms_per_capita_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark)
)

Out[317]:

The above plots highlight the distribution of efficiencies for both electric and gas consumption separately. In the final set of plots we look at both quantities simultaneously. Each community is color coded according to the fraction of households below the poverty level, so that the tagert communities stand out more. We see that there are at least three low income communities (Oakland,Riverdae and Washington Park) that lie at the edge of the bulk of points in the efficiency by area plots. This makes them interesting targets for further investigation. Notably, some of the communities that have low efficiencies on a per area basis have high efficiencies on a per capita basis. It is not unlikely that the higher population densities in the low income nieghboorhoods contributes to increased per capita efficiency, despite the decreased per area efficiency.

In [318]:

set_default_plot_size(20cm, 15cm)
plot(fulldata,x=:therms_per_sqft_mean,y = :kwh_per_sqft_mean,color=:percent_poverty,label=:community,
    Geom.label,Geom.point,Scale.ContinuousColorScale(p -> RGB(1-p,1-p,1-p)),Scale.y_log10,Scale.x_log10,gajomidark)

Out[318]:

In [319]:

plot(fulldata,x=:therms_per_capita_mean,y = :kwh_per_capita_mean,color=:percent_poverty,label=:community,
    Geom.label,Geom.point,Scale.ContinuousColorScale(p -> RGB(1-p,1-p,1-p)),Scale.y_log10,Scale.x_log10,gajomidark)

Out[319]:

In the final plot for the current notebook we revisit the data at the census block level. While it may be practical to target communities in a retrofitting effort, there may be considerable variation in efficiency within a community, and taking note of the extent of this variation shoudl be important in thinking about hwo any initiave should proceed. In the plots below the distribution of the four different types of efficiencies are shown together superimposed alongside the distribution of efficiencies for all the communities in Chicago. We find that some of these communities in some efficiency measure have distributions shifted away from one another and fromt he whole of Chicago. These cases suggest that ther may be opportunties to identify factors unique to a community responsible for shifts in efficiency. On the other hand, the distributions overlap significantly in other places, suggesting that a retrofitting strategy may have to carefully target particular census blocks within communities. In this case further investigation of the energy usage data including seasonal variation, census block location, building information and so on could shed light on the best way to proceed.

In [320]:

#an ugly hack to get Gadfly to plot bulk statstic alongside lect communities
unlabeledall = deepcopy(energydata)
unlabeledall[:community] = utf8("All Chicago")
indexes = findin(energydata[:community],["Oakland","Washington Park","Riverdale"])
labledsubset = deepcopy(energydata[indexes,:])
superhackyappendeddata = append!(unlabeledall,labledsubset);

In [321]:

#indexes = findin(energydata[:community],["Lincoln Park","Hyde Park","Washington Park"])
plots = [[plot(superhackyappendeddata,x = units*"_per_"*normalizer,color = :community, 
        Scale.discrete_color_manual("black",colors[6:end]...),Geom.density,Scale.x_log10,gajomidark)
        for units in ["kwh","therms"]] for normalizer in ["capita","sqft"]]

vstack(hstack(plots[1]...),hstack(plots[2]...))

Out[321]:

About this document¶

Download all data¶

Inspecting socioeconomic data¶

Inspecting energy data¶

Combining socioeconomic and energy data¶

To be continued...¶