This is an IJulia notebook developed to explore some of the data relevant to Chicago's housing retrofitting project. More information about this project and data sources can be found here: https://github.com/dssg/data-challenges/tree/master/RetrofitChicagoTargeting
using DataFrames
using DataStructures
using Gadfly
using Color
colors = distinguishable_colors(8)
gajomidark = Theme(default_color = color("black"),
major_label_color = color("black"),major_label_font_size=18px,
minor_label_color = color("black"),minor_label_font_size=14px,
panel_stroke = color("black"),
default_point_size = 0.5mm,
line_width = 1mm,
grid_line_width = 0px,
highlight_width = 0px)
set_default_plot_size(20cm, 15cm)
The energy efficiency data can be downloaded manually at https://data.cityofchicago.org/api/views/8yq3-m6wp/rows.csv?accessType=DOWNLOAD and the socioeconomic data can be dowloaded at https://data.cityofchicago.org/api/views/kn9c-c2s2/rows.csv?accessType=DOWNLOAD.
The following julia code should accomplish this auomatically.
download("https://data.cityofchicago.org/api/views/8yq3-m6wp/rows.csv?accessType=DOWNLOAD","energy.csv")
download("https://data.cityofchicago.org/api/views/kn9c-c2s2/rows.csv?accessType=DOWNLOAD","socio.csv")
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 24.7M 100 24.7M 0 0 252k 0 0:01:40 0:01:40 --:--:-- 197k % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 3862 100 3862 0 0 3672 0 0:00:01 0:00:01 --:--:-- 3674
"socio.csv"
fulleconsummary = readtable("socio.csv")
map(display,names(fullecondata));
:Community_Area_Number
:COMMUNITY_AREA_NAME
:PERCENT_OF_HOUSING_CROWDED
:PERCENT_HOUSEHOLDS_BELOW_POVERTY
:PERCENT_AGED_16_UNEMPLOYED
:PERCENT_AGED_25_WITHOUT_HIGH_SCHOOL_DIPLOMA
:PERCENT_AGED_UNDER_18_OR_OVER_64
:PER_CAPITA_INCOME
:HARDSHIP_INDEX
We see that the table contains information for different communities realted to housing, employment, income and other socioeconomic summary statistics. Since for the moment we are just interested in getting a feel for the basic patterns in our dataset, we will focus on two column: those pertaining to per capita income and poverty percentages. We extract this information into a new dataframe, sorting the communities by the precentage of households below the poverty level.
fullnames = [:COMMUNITY_AREA_NAME,:PERCENT_HOUSEHOLDS_BELOW_POVERTY, :PER_CAPITA_INCOME]
shortnames = [:community,:percent_poverty,:per_capita_income]
econsummary = sort!(names!(fullecondata[:,fullnames],shortnames),cols=:percent_poverty,rev=true)
community | percent_poverty | per_capita_income | |
---|---|---|---|
1 | Riverdale | 56.5 | 8201 |
2 | Fuller Park | 51.2 | 10432 |
3 | Englewood | 46.6 | 11888 |
4 | North Lawndale | 43.1 | 12034 |
5 | East Garfield Park | 42.4 | 12961 |
6 | Washington Park | 42.1 | 13785 |
7 | West Garfield Park | 41.7 | 10934 |
8 | Armour Square | 40.1 | 16148 |
9 | Oakland | 39.7 | 19252 |
10 | West Englewood | 34.4 | 11317 |
11 | Humboldt park | 33.9 | 13781 |
12 | Burnside | 33.0 | 12515 |
13 | South Shore | 31.1 | 19398 |
14 | South Lawndale | 30.7 | 10402 |
15 | Woodlawn | 30.7 | 18672 |
16 | South Chicago | 29.8 | 16579 |
17 | Douglas | 29.6 | 23791 |
18 | Greater Grand Crossing | 29.6 | 17285 |
19 | Grand Boulevard | 29.3 | 23472 |
20 | South Deering | 29.2 | 14685 |
21 | New City | 29.0 | 12765 |
22 | Austin | 28.6 | 15957 |
23 | Chicago Lawn | 27.9 | 13231 |
24 | Chatham | 27.8 | 18881 |
25 | Auburn Gresham | 27.6 | 15528 |
26 | West Pullman | 25.9 | 16563 |
27 | Lower West Side | 25.8 | 16444 |
28 | Uptown | 24.0 | 35787 |
29 | Rogers Park | 23.6 | 23939 |
30 | Brighton Park | 23.6 | 13089 |
⋮ | ⋮ | ⋮ | ⋮ |
Since per capita income is a measure of average wealth and the percentage of households below the poverty level a measure of absense of wealth, we might expect that these two statistics correlate negatively with one another, assuming that communities are no excessively hetereogenous. We can see this kind of correlation readily in a simple scatter plot.
plot(econsummary,x="percent_poverty",y="per_capita_income",label=:community,Geom.label,Geom.point,gajomidark)
In addition to these socioeconomic summary statistics at the community level we have extensive data on energy usage and residency at sub-census block level. Energy, in this context refers to energy assocaited with electricity (with total energy for a period in kilowatt-hours) and energy associated with natural gas (recorded "therms", which can be converted into energy units given a conversion factor). The full list of columns in this dataset is quite large, and can be seen below.
fullenergydetail = readtable("energy.csv")
map(display,names(fullenergydetail));
:COMMUNITY_AREA_NAME
:CENSUS_BLOCK
:BUILDING_TYPE
:BUILDING_SUBTYPE
:KWH_JANUARY_2010
:KWH_FEBRUARY_2010
:KWH_MARCH_2010
:KWH_APRIL_2010
:KWH_MAY_2010
:KWH_JUNE_2010
:KWH_JULY_2010
:KWH_AUGUST_2010
:KWH_SEPTEMBER_2010
:KWH_OCTOBER_2010
:KWH_NOVEMBER_2010
:KWH_DECEMBER_2010
:TOTAL_KWH
:ELECTRICITY_ACCOUNTS
:ZERO_KWH_ACCOUNTS
:THERM_JANUARY_2010
:THERM_FEBRUARY_2010
:THERM_MARCH_2010
:TERM_APRIL_2010
:THERM_MAY_2010
:THERM_JUNE_2010
:THERM_JULY_2010
:THERM_AUGUST_2010
:THERM_SEPTEMBER_2010
:THERM_OCTOBER_2010
:THERM_NOVEMBER_2010
:THERM_DECEMBER_2010
:TOTAL_THERMS
:GAS_ACCOUNTS
:KWH_TOTAL_SQFT
:THERMS_TOTAL_SQFT
:KWH_MEAN_2010
:KWH_STANDARD_DEVIATION_2010
:KWH_MINIMUM_2010
:KWH_1ST_QUARTILE_2010
:KWH_2ND_QUARTILE_2010
:KWH_3RD_QUARTILE_2010
:KWH_MAXIMUM_2010
:KWH_SQFT_MEAN_2010
:KWH_SQFT_STANDARD_DEVIATION_2010
:KWH_SQFT_MINIMUM_2010
:KWH_SQFT_1ST_QUARTILE_2010
:KWH_SQFT_2ND_QUARTILE_2010
:KWH_SQFT_3RD_QUARTILE_2010
:KWH_SQFT_MAXIMUM_2010
:THERM_MEAN_2010
:THERM_STANDARD_DEVIATION_2010
:THERM_MINIMUM_2010
:THERM_1ST_QUARTILE_2010
:THERM_2ND_QUARTILE_2010
:THERM_3RD_QUARTILE_2010
:THERM_MAXIMUM_2010
:THERMS_SQFT_MEAN_2010
:THERMS_SQFT_STANDARD_DEVIATION_2010
:THERMS_SQFT_MINIMUM_2010
:THERMS_SQFT_1ST_QUARTILE_2010
:THERMS_SQFT_2ND_QUARTILE_2010
:THERMS_SQFT_3RD_QUARTILE_2010
:THERMS_SQFT_MAXIMUM_2010
:TOTAL_POPULATION
:TOTAL_UNITS
:AVERAGE_STORIES
:AVERAGE_BUILDING_AGE
:AVERAGE_HOUSESIZE
:OCCUPIED_UNITS
:OCCUPIED_UNITS_PERCENTAGE
:RENTER_OCCUPIED_HOUSING_UNITS
:RENTER_OCCUPIED_HOUSING_PERCENTAGE
:OCCUPIED_HOUSING_UNITS
We see that the dataset includes information about energy usage for the year on a month by month basis as well as information averaged over the whole year. Temporal information may be especially relevant when considering the differences in electricity and gas usage, but for a first pass we will consider year averaged data. Since we are interested not jsut in total energy usage but in energy efficieny, we also take note of the total population and energy usage relevant square footage, which can be used to compute specific energy efficiencies. Finally, we note both the building type and it's age.
fullnames = [:COMMUNITY_AREA_NAME,:CENSUS_BLOCK,:TOTAL_POPULATION,:BUILDING_TYPE,:THERMS_TOTAL_SQFT,:TOTAL_KWH,:TOTAL_THERMS]
shortnames = [:community,:block,:population,:building_type,:sqft,:kwh,:therms]
energydetail = complete_cases!(names!(fullenergydetail[:,fullnames],shortnames))
community | block | population | building_type | sqft | kwh | therms | |
---|---|---|---|---|---|---|---|
1 | Albany Park | 170311401001001 | 132 | Residential | 48825 | 201032 | 36822 |
2 | Albany Park | 170311401001001 | 132 | Residential | 3306 | 20608 | 755 |
3 | Albany Park | 170311401001001 | 132 | Residential | 9472 | 61020 | 6065 |
4 | Albany Park | 170311401001002 | 228 | Residential | 14407 | 16752 | 14653 |
5 | Albany Park | 170311401001002 | 228 | Residential | 58835 | 244341 | 51590 |
6 | Albany Park | 170311401001003 | 231 | Commercial | 8240 | 45330 | 11217 |
7 | Albany Park | 170311401001003 | 231 | Residential | 13305 | 25300 | 9485 |
8 | Albany Park | 170311401001003 | 231 | Residential | 16654 | 65982 | 14139 |
9 | Albany Park | 170311401001003 | 231 | Residential | 10840 | 79663 | 14780 |
10 | Albany Park | 170311401001004 | 456 | Commercial | 127916 | 369231 | 114547 |
11 | Albany Park | 170311401001004 | 456 | Residential | 25086 | 63378 | 25518 |
12 | Albany Park | 170311401001005 | 73 | Residential | 25339 | 100191 | 16191 |
13 | Albany Park | 170311401001005 | 73 | Residential | 6804 | 7070 | 1948 |
14 | Albany Park | 170311401001006 | 0 | Commercial | 28700 | 108743 | 10107 |
15 | Albany Park | 170311401001007 | 208 | Commercial | 5128 | 28935 | 5240 |
16 | Albany Park | 170311401001007 | 208 | Residential | 48205 | 142707 | 40087 |
17 | Albany Park | 170311401001007 | 208 | Residential | 3726 | 51150 | 5687 |
18 | Albany Park | 170311401001008 | 200 | Residential | 49278 | 209392 | 41226 |
19 | Albany Park | 170311401001009 | 116 | Residential | 17388 | 49878 | 14529 |
20 | Albany Park | 170311401001009 | 116 | Residential | 19261 | 124140 | 20472 |
21 | Albany Park | 170311401002000 | 232 | Commercial | 5496 | 13196 | 6537 |
22 | Albany Park | 170311401002000 | 232 | Residential | 39952 | 215223 | 32761 |
23 | Albany Park | 170311401002000 | 232 | Residential | 14486 | 119681 | 11934 |
24 | Albany Park | 170311401002001 | 258 | Commercial | 5628 | 29183 | 4074 |
25 | Albany Park | 170311401002001 | 258 | Residential | 51390 | 207428 | 73929 |
26 | Albany Park | 170311401002001 | 258 | Residential | 2708 | 24653 | 3734 |
27 | Albany Park | 170311401002002 | 192 | Commercial | 10872 | 33619 | 8474 |
28 | Albany Park | 170311401002002 | 192 | Residential | 41996 | 139524 | 31640 |
29 | Albany Park | 170311401002002 | 192 | Residential | 6866 | 56482 | 7221 |
30 | Albany Park | 170311401002003 | 213 | Commercial | 2100 | 53596 | 15075 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
The above table shows that there may be multiple energy usage data rows for the same census block. Since the population statistics are collected for the entire census block this presents a significant complication in the analysis of per capita energy usage statistics. As such, for out initial look into the data, we select only census block with unique rows in the above table. We also see that a census block may contain commercial, industrial and residential and building types. We are interested in residential housing only, and so select the rows that fit this criterion. Finally, we see that census blocks may be deviod of people! As such, we remove rows with zero population.
dupcounts = counter(energydetail[:block])
rows = (&)(array(energydetail[:building_type].=="Residential"),
array(energydetail[:population].>0),
array(map(i->dupcounts[i],energydetail[:block]).==1))
energydetail = delete!(energydetail[rows,:],[:block,:building_type])
community | population | sqft | kwh | therms | |
---|---|---|---|---|---|
1 | Albany Park | 200 | 49278 | 209392 | 41226 |
2 | Albany Park | 216 | 38424 | 160811 | 37675 |
3 | Albany Park | 101 | 16119 | 128006 | 19418 |
4 | Albany Park | 198 | 18800 | 82883 | 19183 |
5 | Albany Park | 67 | 18866 | 112954 | 15332 |
6 | Albany Park | 286 | 56929 | 229624 | 31643 |
7 | Albany Park | 158 | 10163 | 54492 | 8002 |
8 | Albany Park | 3 | 3236 | 20594 | 2006 |
9 | Albany Park | 25 | 12851 | 76119 | 12858 |
10 | Albany Park | 53 | 24691 | 123490 | 24588 |
11 | Albany Park | 236 | 47073 | 191120 | 46149 |
12 | Albany Park | 19 | 9150 | 51920 | 8543 |
13 | Albany Park | 206 | 48777 | 184677 | 44127 |
14 | Albany Park | 23 | 17024 | 86464 | 15304 |
15 | Albany Park | 47 | 23893 | 150530 | 20576 |
16 | Albany Park | 125 | 6833 | 7153 | 25864 |
17 | Albany Park | 14 | 6176 | 36171 | 9010 |
18 | Albany Park | 41 | 21749 | 120511 | 23344 |
19 | Albany Park | 61 | 15637 | 126508 | 19276 |
20 | Albany Park | 40 | 23042 | 149660 | 21980 |
21 | Albany Park | 64 | 24074 | 147661 | 24933 |
22 | Albany Park | 46 | 23380 | 134328 | 21894 |
23 | Albany Park | 80 | 1036 | 17164 | 3122 |
24 | Albany Park | 62 | 19871 | 187969 | 23087 |
25 | Albany Park | 46 | 20054 | 143696 | 22168 |
26 | Albany Park | 10 | 4723 | 31525 | 4898 |
27 | Albany Park | 15 | 6167 | 42429 | 6465 |
28 | Albany Park | 44 | 22047 | 120338 | 23807 |
29 | Albany Park | 71 | 21583 | 159578 | 24718 |
30 | Albany Park | 47 | 19350 | 123013 | 17779 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
Having filtered the data down to the subset we are interested in, we add to the table simple statistics relevant to energy efficiency
per(denominator::Symbol,numerator::Symbol,data::DataFrame) = data[numerator]./data[denominator]
per (generic function with 1 method)
#add stats
energydata[:kwh_per_capita] = per(:population,:kwh,energydata)
energydata[:therms_per_capita] = per(:population,:therms,energydata)
energydata[:kwh_per_sqft] = per(:sqft,:kwh,energydata)
energydata[:therms_per_sqft] = per(:sqft,:therms,energydata)
energydata
community | population | sqft | kwh | therms | kwh_per_capita | therms_per_capita | kwh_per_sqft | therms_per_sqft | |
---|---|---|---|---|---|---|---|---|---|
1 | Albany Park | 67 | 22140 | 147942 | 26344 | 2208.089552238806 | 393.1940298507463 | 6.682113821138211 | 1.1898825654923215 |
2 | Albany Park | 45 | 20125 | 117665 | 22167 | 2614.777777777778 | 492.6 | 5.8467080745341615 | 1.1014658385093168 |
3 | Albany Park | 64 | 18536 | 113633 | 21252 | 1775.515625 | 332.0625 | 6.130394907207596 | 1.146525679758308 |
4 | Albany Park | 29 | 17255 | 91166 | 18912 | 3143.655172413793 | 652.1379310344828 | 5.2834540712836855 | 1.096030136192408 |
5 | Albany Park | 52 | 11766 | 99825 | 11996 | 1919.7115384615386 | 230.69230769230768 | 8.48419173890872 | 1.019547849736529 |
6 | Albany Park | 52 | 8783 | 54886 | 10901 | 1055.5 | 209.6346153846154 | 6.249117613571673 | 1.2411476716383925 |
7 | Albany Park | 85 | 25196 | 99683 | 23426 | 1172.7411764705882 | 275.6 | 3.9563025877123352 | 0.9297507540879505 |
8 | Albany Park | 136 | 34516 | 194872 | 26558 | 1432.8823529411766 | 195.27941176470588 | 5.645845405029552 | 0.7694402595897555 |
9 | Albany Park | 120 | 35570 | 192173 | 43028 | 1601.4416666666666 | 358.56666666666666 | 5.402670789991566 | 1.2096710711273546 |
10 | Albany Park | 357 | 57079 | 394016 | 80580 | 1103.686274509804 | 225.71428571428572 | 6.90299409590217 | 1.4117276055992571 |
11 | Albany Park | 24 | 2428 | 20970 | 1603 | 873.75 | 66.79166666666667 | 8.63673805601318 | 0.6602141680395387 |
12 | Albany Park | 17 | 6749 | 38454 | 5138 | 2262.0 | 302.2352941176471 | 5.697732997481109 | 0.76129797006964 |
13 | Albany Park | 46 | 24539 | 110585 | 18679 | 2404.021739130435 | 406.0652173913044 | 4.506499857369901 | 0.7611964627735441 |
14 | Albany Park | 45 | 19377 | 68842 | 16018 | 1529.8222222222223 | 355.9555555555556 | 3.5527687464519793 | 0.8266501522423492 |
15 | Albany Park | 23 | 21184 | 104351 | 19060 | 4537.0 | 828.695652173913 | 4.925934667673716 | 0.8997356495468278 |
16 | Albany Park | 51 | 30746 | 130256 | 27238 | 2554.0392156862745 | 534.0784313725491 | 4.236518571521499 | 0.8859038574123463 |
17 | Albany Park | 71 | 34447 | 126550 | 28684 | 1782.394366197183 | 404.0 | 3.6737596887972828 | 0.8326995093912387 |
18 | Albany Park | 17 | 9479 | 40598 | 10379 | 2388.1176470588234 | 610.5294117647059 | 4.282941238527271 | 1.0949467243380104 |
19 | Albany Park | 22 | 10683 | 56937 | 10446 | 2588.0454545454545 | 474.8181818181818 | 5.329682673406347 | 0.9778152204436956 |
20 | Albany Park | 47 | 9126 | 19024 | 8089 | 404.7659574468085 | 172.10638297872342 | 2.084593469208854 | 0.8863686171378479 |
21 | Albany Park | 30 | 21589 | 119606 | 20596 | 3986.866666666667 | 686.5333333333333 | 5.540136180462272 | 0.9540043540692019 |
22 | Albany Park | 34 | 20113 | 25416 | 19622 | 747.5294117647059 | 577.1176470588235 | 1.2636603191965396 | 0.9755879282056381 |
23 | Albany Park | 39 | 19038 | 174602 | 19719 | 4476.974358974359 | 505.61538461538464 | 9.171236474419581 | 1.0357705641348882 |
24 | Albany Park | 37 | 20178 | 141019 | 18864 | 3811.324324324324 | 509.8378378378378 | 6.988750123897314 | 0.9348795718108831 |
25 | Albany Park | 31 | 23213 | 97469 | 19783 | 3144.1612903225805 | 638.1612903225806 | 4.198897169689398 | 0.8522379701029595 |
26 | Albany Park | 162 | 13489 | 54196 | 13437 | 334.5432098765432 | 82.94444444444444 | 4.0177922751871895 | 0.9961450070427756 |
27 | Albany Park | 80 | 8113 | 32345 | 7285 | 404.3125 | 91.0625 | 3.9868112905213855 | 0.8979415752495994 |
28 | Albany Park | 53 | 5593 | 23875 | 5864 | 450.47169811320754 | 110.64150943396227 | 4.268728768102986 | 1.0484534239227605 |
29 | Albany Park | 11 | 11862 | 101576 | 12332 | 9234.181818181818 | 1121.090909090909 | 8.56314280896982 | 1.039622323385601 |
30 | Albany Park | 32 | 20863 | 99212 | 22103 | 3100.375 | 690.71875 | 4.755404304270718 | 1.0594353640416048 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
Since the socioeconomic data we have available to us gives statistics at the level of communities, it makes sense to aggregate this census block level data into data at the community level. the simplest way to achieve this is to simple average over the values in the rows of common communities.
stats = [:kwh_per_capita,:therms_per_capita,:kwh_per_sqft,:therms_per_sqft]
energysummary = aggregate(energydata[[:community, stats...]],:community, mean)
community | kwh_per_capita_mean | therms_per_capita_mean | kwh_per_sqft_mean | therms_per_sqft_mean | |
---|---|---|---|---|---|
1 | Albany Park | 2468.0411169673425 | 412.0055855560169 | 6.106879778389706 | 1.077010168649753 |
2 | Archer Heights | 1999.36664605425 | 348.30086113709245 | 7.879389696865459 | 1.3642276528193051 |
3 | Armour Square | 1337.9031640356106 | 186.10719646873602 | 6.75172078525251 | 0.9176220611320854 |
4 | Ashburn | 2798.6655243808614 | 392.16244966081865 | 7.722845761167268 | 1.0695066758819596 |
5 | Auburn Gresham | 2955.1741795199755 | 460.9206848204113 | 7.230493884586336 | 1.238707564940622 |
6 | Austin | 2265.560691870814 | 399.1870113049785 | 6.071752200804417 | 1.046794235832325 |
7 | Avalon Park | 3143.1419295917417 | 554.5910732142593 | 7.1926166587045435 | 1.2466691679956463 |
8 | Avondale | 1500.46123468815 | 268.7833607378459 | 6.004555695320488 | 1.0535979223923637 |
9 | Belmont Cragin | 1834.5544956552503 | 305.21773038359953 | 6.7052515063731315 | 1.0973223202640137 |
10 | Beverly | 3608.4793997049537 | 554.60626032916 | 6.346808073028228 | 0.9603103958629925 |
11 | Bridgeport | 1574.7693070576802 | 261.7049152880214 | 4.964600678867906 | 0.8631429947408497 |
12 | Brighton Park | 1458.6191240676173 | 246.8853196972466 | 6.4216642429263056 | 1.0003623986754846 |
13 | Burnside | 2513.7282846480666 | 452.44247361533155 | 7.536603821567134 | 1.2014653339268535 |
14 | Calumet Heights | 3472.0095358597555 | 554.324665313044 | 6.883460956288404 | 1.0956179693923338 |
15 | Chatham | 3138.6836620964605 | 558.3983500879795 | 6.632272540044313 | 1.159983987023555 |
16 | Chicago Lawn | 1871.3779921079536 | 289.08136796648034 | 7.88727349490481 | 1.2176286052460588 |
17 | Clearing | 3137.418666321823 | 385.0173852954983 | 7.799318610070047 | 0.9690108791701314 |
18 | Douglas | 2612.514773084323 | 675.617274019734 | 3.8018340422600296 | 0.7702503940141541 |
19 | Dunning | 2806.8455920536626 | 402.4769039894256 | 6.832504000539365 | 0.9736882375518406 |
20 | East Garfield Park | 2461.231343906299 | 477.4500548942718 | 6.157234436434369 | 1.28299956755393 |
21 | East Side | 2268.0444216159412 | 358.87899204759617 | 6.552013061814929 | 1.0321531929846388 |
22 | Edgewater | 2067.5433224620224 | 337.5904164305514 | 5.384499427082707 | 0.8896255300621512 |
23 | Edison Park | 3473.3050391461725 | 496.5842613988439 | 6.45390071602274 | 0.9321651837849423 |
24 | Englewood | 1644.610438663598 | 350.2684909902882 | 6.02828764363686 | 1.1845251972542743 |
25 | Forest Glen | 3655.537828137374 | 504.07121768167417 | 6.198106263041437 | 0.855189567209813 |
26 | Fuller Park | 1833.3618850996181 | 335.12071420071186 | 6.115995838434476 | 1.1226769592334431 |
27 | Gage Park | 1945.0561084470085 | 309.89295056876966 | 7.353070979428966 | 1.2548033213811771 |
28 | Garfield Ridge | 2900.242942171233 | 375.08610998489553 | 7.683310667893575 | 1.004097867112446 |
29 | Grand Boulevard | 2145.7691233349487 | 488.9259053595046 | 4.089155930011542 | 0.7617395282368347 |
30 | Greater Grand Crossing | 2176.906873888835 | 430.36087239243125 | 5.7337731901144515 | 1.1612619801624018 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
The two dataset above share common pools of communities. We can join the dataframes on these values.
summary = join(econdata, energysummary, on = :community)
community | percent_poverty | per_capita_income | kwh_per_capita_mean | therms_per_capita_mean | kwh_per_sqft_mean | therms_per_sqft_mean | |
---|---|---|---|---|---|---|---|
1 | Albany Park | 19.2 | 21323 | 2468.0411169673425 | 412.0055855560169 | 6.106879778389706 | 1.077010168649753 |
2 | Archer Heights | 14.1 | 16134 | 1999.36664605425 | 348.30086113709245 | 7.879389696865459 | 1.3642276528193051 |
3 | Armour Square | 40.1 | 16148 | 1337.9031640356106 | 186.10719646873602 | 6.75172078525251 | 0.9176220611320854 |
4 | Ashburn | 10.4 | 23482 | 2798.6655243808614 | 392.16244966081865 | 7.722845761167268 | 1.0695066758819596 |
5 | Auburn Gresham | 27.6 | 15528 | 2955.1741795199755 | 460.9206848204113 | 7.230493884586336 | 1.238707564940622 |
6 | Austin | 28.6 | 15957 | 2265.560691870814 | 399.1870113049785 | 6.071752200804417 | 1.046794235832325 |
7 | Avalon Park | 17.2 | 24454 | 3143.1419295917417 | 554.5910732142593 | 7.1926166587045435 | 1.2466691679956463 |
8 | Avondale | 15.3 | 20039 | 1500.46123468815 | 268.7833607378459 | 6.004555695320488 | 1.0535979223923637 |
9 | Belmont Cragin | 18.7 | 15461 | 1834.5544956552503 | 305.21773038359953 | 6.7052515063731315 | 1.0973223202640137 |
10 | Beverly | 5.1 | 39523 | 3608.4793997049537 | 554.60626032916 | 6.346808073028228 | 0.9603103958629925 |
11 | Bridgeport | 18.9 | 22694 | 1574.7693070576802 | 261.7049152880214 | 4.964600678867906 | 0.8631429947408497 |
12 | Brighton Park | 23.6 | 13089 | 1458.6191240676173 | 246.8853196972466 | 6.4216642429263056 | 1.0003623986754846 |
13 | Burnside | 33.0 | 12515 | 2513.7282846480666 | 452.44247361533155 | 7.536603821567134 | 1.2014653339268535 |
14 | Calumet Heights | 11.5 | 28887 | 3472.0095358597555 | 554.324665313044 | 6.883460956288404 | 1.0956179693923338 |
15 | Chatham | 27.8 | 18881 | 3138.6836620964605 | 558.3983500879795 | 6.632272540044313 | 1.159983987023555 |
16 | Chicago Lawn | 27.9 | 13231 | 1871.3779921079536 | 289.08136796648034 | 7.88727349490481 | 1.2176286052460588 |
17 | Clearing | 8.9 | 25113 | 3137.418666321823 | 385.0173852954983 | 7.799318610070047 | 0.9690108791701314 |
18 | Douglas | 29.6 | 23791 | 2612.514773084323 | 675.617274019734 | 3.8018340422600296 | 0.7702503940141541 |
19 | Dunning | 10.6 | 26282 | 2806.8455920536626 | 402.4769039894256 | 6.832504000539365 | 0.9736882375518406 |
20 | East Garfield Park | 42.4 | 12961 | 2461.231343906299 | 477.4500548942718 | 6.157234436434369 | 1.28299956755393 |
21 | East Side | 19.2 | 17104 | 2268.0444216159412 | 358.87899204759617 | 6.552013061814929 | 1.0321531929846388 |
22 | Edgewater | 18.2 | 33385 | 2067.5433224620224 | 337.5904164305514 | 5.384499427082707 | 0.8896255300621512 |
23 | Edison Park | 3.3 | 40959 | 3473.3050391461725 | 496.5842613988439 | 6.45390071602274 | 0.9321651837849423 |
24 | Englewood | 46.6 | 11888 | 1644.610438663598 | 350.2684909902882 | 6.02828764363686 | 1.1845251972542743 |
25 | Forest Glen | 7.5 | 44164 | 3655.537828137374 | 504.07121768167417 | 6.198106263041437 | 0.855189567209813 |
26 | Fuller Park | 51.2 | 10432 | 1833.3618850996181 | 335.12071420071186 | 6.115995838434476 | 1.1226769592334431 |
27 | Gage Park | 23.4 | 12171 | 1945.0561084470085 | 309.89295056876966 | 7.353070979428966 | 1.2548033213811771 |
28 | Garfield Ridge | 8.8 | 26353 | 2900.242942171233 | 375.08610998489553 | 7.683310667893575 | 1.004097867112446 |
29 | Grand Boulevard | 29.3 | 23472 | 2145.7691233349487 | 488.9259053595046 | 4.089155930011542 | 0.7617395282368347 |
30 | Greater Grand Crossing | 29.6 | 17285 | 2176.906873888835 | 430.36087239243125 | 5.7337731901144515 | 1.1612619801624018 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
Having joined the energy usage data with the socioeconomic data we can see how the different rows between these infleunce one another. A simple example is to be found in looking gas usage per square foot across differnt communities as a function of the poverty percentage in those communities. The Loop is the obvious outlier (the density of skyscrapers probably has something to do with this). There is an apparent trend of reduced efficieny with increasing poverty (the converse can also be seen for per capita income).
set_default_plot_size(25cm, 10cm)
hstack(
plot(fulldata,x = :percent_poverty, y = :therms_per_sqft_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark),
plot(fulldata,x=:per_capita_income,y = :therms_per_sqft_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark)
)
One can make analogous plots for electricity usage. The analogous trend may be present, but are certainly not clear (further quantitative analysis could shed light on this). Interestingly, the Near North Side appears to use roughly four times as much electricity per square foot than the Loop.
set_default_plot_size(25cm, 10cm)
hstack(
plot(fulldata,x=:per_capita_income,y = :kwh_per_sqft_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark),
plot(fulldata,x = :percent_poverty, y = :kwh_per_sqft_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark)
)
Now in addition to considering energy efficiency on the basis of building area, it might also make sense to consider efficieny on normalized against population, as this would say something about the amount of energy used by individuals. This is of interest in its own right from an energy equality standpoint, but also is directly relevant in estimating energy costs for individuals. Unlike the case of area based efficiency there are no readily visually apparent correlation in the per captia efficiency plots (with the possible exception of decreased electricity efficiency with increaseing income). A more quantitative analysis might be helpful in unconering hidden patterns.
set_default_plot_size(25cm, 10cm)
hstack(
plot(fulldata,x = :percent_poverty, y = :kwh_per_capita_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark),
plot(fulldata,x= :per_capita_income,y = :kwh_per_capita_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark)
)
set_default_plot_size(25cm, 10cm)
hstack(
plot(fulldata,x = :percent_poverty, y = :therms_per_capita_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark),
plot(fulldata,x=:per_capita_income,y = :therms_per_capita_mean,label=:community,Geom.label,Geom.point,Scale.y_log10,gajomidark)
)
The above plots highlight the distribution of efficiencies for both electric and gas consumption separately. In the final set of plots we look at both quantities simultaneously. Each community is color coded according to the fraction of households below the poverty level, so that the tagert communities stand out more. We see that there are at least three low income communities (Oakland,Riverdae and Washington Park) that lie at the edge of the bulk of points in the efficiency by area plots. This makes them interesting targets for further investigation. Notably, some of the communities that have low efficiencies on a per area basis have high efficiencies on a per capita basis. It is not unlikely that the higher population densities in the low income nieghboorhoods contributes to increased per capita efficiency, despite the decreased per area efficiency.
set_default_plot_size(20cm, 15cm)
plot(fulldata,x=:therms_per_sqft_mean,y = :kwh_per_sqft_mean,color=:percent_poverty,label=:community,
Geom.label,Geom.point,Scale.ContinuousColorScale(p -> RGB(1-p,1-p,1-p)),Scale.y_log10,Scale.x_log10,gajomidark)
plot(fulldata,x=:therms_per_capita_mean,y = :kwh_per_capita_mean,color=:percent_poverty,label=:community,
Geom.label,Geom.point,Scale.ContinuousColorScale(p -> RGB(1-p,1-p,1-p)),Scale.y_log10,Scale.x_log10,gajomidark)
In the final plot for the current notebook we revisit the data at the census block level. While it may be practical to target communities in a retrofitting effort, there may be considerable variation in efficiency within a community, and taking note of the extent of this variation shoudl be important in thinking about hwo any initiave should proceed. In the plots below the distribution of the four different types of efficiencies are shown together superimposed alongside the distribution of efficiencies for all the communities in Chicago. We find that some of these communities in some efficiency measure have distributions shifted away from one another and fromt he whole of Chicago. These cases suggest that ther may be opportunties to identify factors unique to a community responsible for shifts in efficiency. On the other hand, the distributions overlap significantly in other places, suggesting that a retrofitting strategy may have to carefully target particular census blocks within communities. In this case further investigation of the energy usage data including seasonal variation, census block location, building information and so on could shed light on the best way to proceed.
#an ugly hack to get Gadfly to plot bulk statstic alongside lect communities
unlabeledall = deepcopy(energydata)
unlabeledall[:community] = utf8("All Chicago")
indexes = findin(energydata[:community],["Oakland","Washington Park","Riverdale"])
labledsubset = deepcopy(energydata[indexes,:])
superhackyappendeddata = append!(unlabeledall,labledsubset);
#indexes = findin(energydata[:community],["Lincoln Park","Hyde Park","Washington Park"])
plots = [[plot(superhackyappendeddata,x = units*"_per_"*normalizer,color = :community,
Scale.discrete_color_manual("black",colors[6:end]...),Geom.density,Scale.x_log10,gajomidark)
for units in ["kwh","therms"]] for normalizer in ["capita","sqft"]]
vstack(hstack(plots[1]...),hstack(plots[2]...))