CrisisNET provides a simple, powerful resource for accessing crisis-relevant data. Below is a short introduction to pulling down data from CrisisNET and completing some simple data wrangling tasks, setting you up to start pulling and using data on your own.
The example below uses Python 3.3 and was written in an IPython notebook. The notebook is available on Nbviewer.
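The request in the code that follows assumes the call succeeds. As a minimal sketch, assuming the response body is JSON with the items under a `data` key (which the dataframe code further down relies on), a small helper can fail loudly on HTTP errors. The function name `fetch_items`, the injectable `get` parameter, and the 30 second timeout are illustrative choices, not part of CrisisNET's API.

```python
import requests


def fetch_items(url, api_key, get=requests.get):
    """Return the list stored under the 'data' key of the JSON response.

    `get` defaults to requests.get; it is a parameter only so the helper
    can be exercised without network access (a hypothetical convenience).
    """
    headers = {"Authorization": "Bearer " + api_key}
    response = get(url, headers=headers, timeout=30)
    # Raise requests.HTTPError on a 4xx or 5xx status instead of
    # silently building a dataframe from an error payload
    response.raise_for_status()
    return response.json()["data"]
```

The returned list of items can be passed straight to `pd.DataFrame`.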
import requests
from mpl_toolkits.basemap import Basemap
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Show up to 30 columns when displaying a dataframe
pd.set_option('display.max_columns', 30)
# Send the API key as a bearer token
api_key = '532d8dc4ed3329652f114b73'
headers = {'Authorization': 'Bearer ' + api_key}
# Request up to 200 items
url = 'http://api.crisis.net/item?limit=200'
r = requests.get(url, headers=headers)
request_df = pd.DataFrame(r.json())
# Create a dataframe from the request's data
df = request_df['data'].apply(pd.Series)
# Set the row index of the dataframe to be the time the report was updated
df["updatedAt"] = pd.to_datetime(df["updatedAt"])
df.index = df['updatedAt']
# Expand the geo column into a full dataframe
geo_df = df['geo'].apply(pd.Series)
# Expand the address components column into its own dataframe
geo_admin_df = geo_df['addressComponents'].apply(pd.Series)
# Join the two geo dataframes to the primary dataframe
df = pd.concat([df[:], geo_admin_df[:], geo_df[:]], axis=1)
# Extract the latitude and longitude coordinates into their own columns
df['latitude'] = df['coords'].str[1]
df['longitude'] = df['coords'].str[0]
# Expand the tags column into its own dataframe
tags_df = df['tags'].apply(pd.Series)
# Drop every column after the second column
tags_df = tags_df.iloc[:, 0:2]
tags_df.columns = ['tag1', 'tag2']
# Extract the tags
def tag_extractor(x):
    # If x is a float (a missing value, NaN),
    if type(x) is float:
        # just return it untouched
        return x
    # but, if not, convert x to a dict() and return the value from the name key
    elif x:
        x = dict(x)
        return x['name']
    # and return None for everything else
    else:
        return
tags_df = tags_df.applymap(tag_extractor)
# Attach the tags to the main dataframe
df = pd.concat([df[:], tags_df[:]], axis=1)
# Expand the language key:value pair
lang_df = df['language'].apply(pd.Series)
# Attach the language code as a column
df['lang'] = lang_df['code']
# print the length and view the first row
print(len(df))
df.head(1)
200
content | createdAt | entities | geo | id | language | license | lifespan | publishedAt | remoteID | source | summary | tags | updatedAt | adminArea1 | adminArea3 | adminArea4 | adminArea5 | formattedAddress | postalCode | streetAddress | addressComponents | coords | latitude | longitude | tag1 | tag2 | lang | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
updatedAt | ||||||||||||||||||||||||||||
2014-05-28 22:03:57.208119 | As at 31 Mar 2014, 11 provinces (nine in north... | 2014-05-09T19:07:06.931101 | [Afghanistan, Badakhshan, Faryab] | {'addressComponents': {'adminArea1': 'Afghanis... | AcVAF46mQC6X_qpvC2SQpA | {'code': 'en'} | unknown | temporary | 2014-04-25T00:00:00.000Z | 14539 | reliefweb | Afghanistan: Flash Floods and Landslides - Apr... | [{'name': 'flood', 'confidence': 1}, {'name': ... | 2014-05-28 22:03:57.208119 | Afghanistan | NaN | NaN | NaN | Afghanistan | NaN | NaN | {'adminArea1': 'Afghanistan', 'formattedAddres... | [72, 36.75] | 36.75 | 72 | flood | flash-flood | en |
1 rows × 28 columns
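The cell above flattens the nested `geo`, `tags`, and `language` fields one `apply(pd.Series)` call at a time. As an alternative sketch, recent pandas releases provide `pd.json_normalize`, which expands nested dictionaries in one step. The two records below are made up to mirror the shape of the output above; they are not real API results.

```python
import pandas as pd

# Hypothetical records shaped like the items shown in the output above
items = [
    {
        "id": "a1",
        "source": "reliefweb",
        "language": {"code": "en"},
        "geo": {
            "addressComponents": {"adminArea1": "Afghanistan"},
            "coords": [72, 36.75],
        },
        "tags": [
            {"name": "flood", "confidence": 1},
            {"name": "flash-flood", "confidence": 1},
        ],
    },
    {
        "id": "b2",
        "source": "facebook",
        "language": {"code": "ar"},
        "geo": {"addressComponents": {"adminArea1": "Syria"}},
        "tags": [],
    },
]

# Nested dicts become dotted column names; lists are left as they are
flat = pd.json_normalize(items)

# coords is [longitude, latitude], as in the cell above
flat["longitude"] = flat["geo.coords"].str[0]
flat["latitude"] = flat["geo.coords"].str[1]

# Keep only the tag names from each list of tag dicts
flat["tag_names"] = flat["tags"].apply(lambda tags: [t["name"] for t in tags])
```

Records that lack a nested key, such as the second item's missing `coords`, end up with `NaN` in the corresponding column.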
df['source'].value_counts().fillna(0).plot(kind='bar')
df['lang'].value_counts().fillna(0).plot(kind='bar')
df['adminArea1'].value_counts().fillna(0).plot(kind='bar')
tags = pd.DataFrame([df['tag1'], df['tag2']])
tag_counts = tags.stack().value_counts()
tag_counts
armed-conflict    30
conflict          28
military           3
death              3
air-combat         3
human-rights       2
Votes              2
disaster           2
criminal           2
fire               2
government         2
She                1
rebels             1
arrest             1
dtype: int64
tags.stack().value_counts().plot(kind='bar')
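The counts above cover only the first two tags of each report. A rough sketch of counting every tag, assuming each `tags` value is either a list of dicts with a `name` key or a missing value, can use `Series.explode` (available in recent pandas releases). The sample rows are invented for illustration.

```python
import pandas as pd

# Made-up rows in the same shape as the tags column
sample = pd.DataFrame({
    "tags": [
        [
            {"name": "conflict", "confidence": 1},
            {"name": "armed-conflict", "confidence": 1},
            {"name": "death", "confidence": 1},
        ],
        [{"name": "conflict", "confidence": 1}],
        float("nan"),  # a report with no tags
    ]
})

# One row per tag dict; missing values and empty lists become NaN
exploded = sample["tags"].explode().dropna()

# Pull out the tag name and count how often each appears
all_tag_counts = exploded.map(lambda tag: tag["name"]).value_counts()
```

Here the third tag of the first report (`death`) is counted, whereas keeping only two tag columns would drop it.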
pd.crosstab(df['adminArea1'], df['source'], rownames=['LOCATION:'], colnames=['SOURCE:'])
SOURCE: | vdc_syria | youtube | ||
---|---|---|---|---|
LOCATION: | ||||
Egypt | 0 | 1 | 0 | 0 |
General-conflict | 0 | 5 | 0 | 0 |
Libya | 0 | 8 | 0 | 0 |
Syria | 169 | 0 | 2 | 9 |
Thailand | 6 | 0 | 0 | 0 |
5 rows × 4 columns
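In `pd.crosstab`, `rownames` labels the first argument (the rows) and `colnames` labels the second (the columns). As a small sketch on invented data, passing `margins=True` also adds an `All` row and column with the totals, which makes it easy to check that the table accounts for every report.

```python
import pandas as pd

# Invented location/source pairs, not real CrisisNET results
sample = pd.DataFrame({
    "adminArea1": ["Syria", "Syria", "Syria", "Libya", "Egypt"],
    "source": ["facebook", "facebook", "youtube", "youtube", "facebook"],
})

# Rows are locations, columns are sources; margins adds the totals
table = pd.crosstab(
    sample["adminArea1"],
    sample["source"],
    rownames=["LOCATION:"],
    colnames=["SOURCE:"],
    margins=True,
)
```

The bottom-right cell, `table.loc["All", "All"]`, should equal the number of rows in the input.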
pd.crosstab(df['adminArea1'], df['lang'], rownames=['LOCATION:'], colnames=['LANGUAGE:'])
LANGUAGE: | ar | en | es | nl | pt | th | vi |
---|---|---|---|---|---|---|---|
LOCATION: | |||||||
Egypt | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
General-conflict | 0 | 4 | 0 | 0 | 0 | 0 | 1 |
Libya | 4 | 4 | 0 | 0 | 0 | 0 | 0 |
Syria | 86 | 41 | 1 | 1 | 3 | 0 | 0 |
Thailand | 0 | 3 | 0 | 0 | 0 | 3 | 0 |
5 rows × 7 columns
df.head()
author | content | contentEnglish | createdAt | entities | fromURL | geo | id | image | language | license | lifespan | publishedAt | remoteID | source | summary | tags | updatedAt | video | adminArea1 | adminArea3 | adminArea5 | formattedAddress | neighborhood | streetAddress | addressComponents | coords | locationIdentifiers | latitude | longitude | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
updatedAt | |||||||||||||||||||||||||||||||
2014-06-01 09:51:14.942730 | NaN | Mohammed Yehya al-Houndi killed by Shooting. | NaN | 2014-05-28T20:19:51.945796 | NaN | http://www.vdc-sy.info/index.php/en/details/ma... | {'addressComponents': {'adminArea1': 'Syria', ... | S4aCYMNeRM27iFuKi_mJPw | NaN | NaN | cc | temporary | 2014-09-16T00:00:00 | 111181 | vdc_syria | Adult - Male Civilian killed by Shooting | [{'name': 'conflict', 'confidence': 1}, {'name... | 2014-06-01 09:51:14.942730 | NaN | Syria | Daraa | NaN | NaN | Sanamain | NaN | {'adminArea1': 'Syria', 'adminArea3': 'Daraa',... | NaN | NaN | NaN | NaN | ... |
2014-06-01 09:53:11.429487 | NaN | Mohammed Yehya al-Houndi killed by Shooting. | NaN | 2014-05-28T04:20:13.815495 | NaN | http://www.vdc-sy.info/index.php/en/details/ma... | {'addressComponents': {'adminArea1': 'Syria', ... | JY3h2j-_RWaGSoBdD6Ey0w | NaN | NaN | cc | temporary | 2014-09-16T00:00:00 | 111181 | vdc_syria | Adult - Male Civilian killed by Shooting | [{'name': 'conflict', 'confidence': 1}, {'name... | 2014-06-01 09:53:11.429487 | NaN | Syria | Daraa | NaN | NaN | Sanamain | NaN | {'adminArea1': 'Syria', 'adminArea3': 'Daraa',... | NaN | NaN | NaN | NaN | ... |
2014-06-04 02:36:55.047509 | NaN | قناة السوري الحر الفضائية ( صلاة الفجر ) مكة ا... | Free Syrian satellite channel (dawn prayers) M... | 2014-06-04T02:36:45.356415 | [Syria] | http://facebook.com/422696344520369_5240711310... | {'coords': [38.473472595214844, 35.03312683105... | InCnRUdbQ36Cf1cWKUqzNQ | https://fbexternal-a.akamaihd.net/safe_image.p... | {'nativeName': 'العربية', 'code': 'ar', 'name'... | temporary | 2014-06-04T02:32:35+00:00 | 422696344520369_524071131049556 | قناة السوري الحر الفضائية ( صلاة الفجر ) مكة ا... | NaN | 2014-06-04 02:36:55.047509 | NaN | Syria | NaN | NaN | Syria | NaN | NaN | {'formattedAddress': 'Syria', 'adminArea1': 'S... | [38.473472595214844, 35.03312683105469] | {'authorLocationName': 'تركيا'} | 35.033127 | 38.473473 | ... | ||
2014-06-04 02:30:14.558210 | NaN | مجموعة البحث الموسيقي - حسين مروّة\n\nمن المست... | Find music group - Marwa Hussein\n\nOf the swa... | 2014-06-04T02:30:04.295555 | [Syria] | http://facebook.com/261252337411762 | {'coords': [38.473472595214844, 35.03312683105... | 5C9rzBWsROCxCvbLnygeBQ | https://fbcdn-vthumb-a.akamaihd.net/hvthumb-ak... | {'nativeName': 'العربية', 'code': 'ar', 'name'... | temporary | 2014-06-04T02:30:25+00:00 | 261252337411762 | مجموعة البحث الموسيقي - حسين مروّة\n\nمن المست... | NaN | 2014-06-04 02:30:14.558210 | https://fbcdn-video-a.akamaihd.net/hvideo-ak-x... | Syria | NaN | NaN | Syria | NaN | NaN | {'formattedAddress': 'Syria', 'adminArea1': 'S... | [38.473472595214844, 35.03312683105469] | NaN | 35.033127 | 38.473473 | ... | ||
2014-06-04 02:34:40.901612 | NaN | عاهدناكم يا شعب سورية على النصر\n#الجيش_العربي... | Aahidnakm O people of Syria to victory\nArmy _... | 2014-06-04T02:34:31.039971 | [Syria] | http://facebook.com/727851910594949 | {'coords': [38.473472595214844, 35.03312683105... | r2ZLDpQSS-SLMxZfHmHHAQ | https://fbcdn-photos-a-a.akamaihd.net/hphotos-... | {'nativeName': 'العربية', 'code': 'ar', 'name'... | temporary | 2014-06-04T02:30:00+00:00 | 727851910594949 | عاهدناكم يا شعب سورية على النصر\n#الجيش_العربي... | NaN | 2014-06-04 02:34:40.901612 | NaN | Syria | NaN | NaN | Syria | NaN | NaN | {'formattedAddress': 'Syria', 'adminArea1': 'S... | [38.473472595214844, 35.03312683105469] | {'authorLocationName': ' Damascus, Syria'} | 35.033127 | 38.473473 | ... |
5 rows × 33 columns
df.tail()
author | content | contentEnglish | createdAt | entities | fromURL | geo | id | image | language | license | lifespan | publishedAt | remoteID | source | summary | tags | updatedAt | video | adminArea1 | adminArea3 | adminArea5 | formattedAddress | neighborhood | streetAddress | addressComponents | coords | locationIdentifiers | latitude | longitude | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
updatedAt | |||||||||||||||||||||||||||||||
2014-06-04 01:33:51.140934 | NaN | يرفع الان آذان الفجر بتوقيت دمشق المحررة قريبا... | Raises far ears Dawn Time Damascus liberated s... | 2014-06-04T01:33:41.406417 | [Syria] | http://facebook.com/1506835209537961 | {'coords': [38.473472595214844, 35.03312683105... | K4YA2Da9Q5eJUyPACC5Bgg | https://fbcdn-photos-h-a.akamaihd.net/hphotos-... | {'nativeName': 'العربية', 'code': 'ar', 'name'... | temporary | 2014-06-04T00:38:14+00:00 | 1506835209537961 | يرفع الان آذان الفجر بتوقيت دمشق المحررة قريبا... | [{'name': 'armed-conflict', 'confidence': 1}, ... | 2014-06-04 01:33:51.140934 | NaN | Syria | NaN | NaN | Syria | NaN | NaN | {'formattedAddress': 'Syria', 'adminArea1': 'S... | [38.473472595214844, 35.03312683105469] | {'authorLocationName': 'Damascus, Syria Damasc... | 35.033127 | 38.473473 | ... | ||
2014-06-04 01:34:56.196292 | NaN | Governor of Homs: More than 644 000 in Homs Vo... | NaN | 2014-06-04T01:34:41.689972 | [Syria] | http://facebook.com/298382103530912_6851488448... | {'coords': [38.473472595214844, 35.03312683105... | HFpDxoLCSou7ODjj9BGAYw | NaN | {'nativeName': 'English', 'code': 'en', 'name'... | temporary | 2014-06-04T00:38:08+00:00 | 298382103530912_685148844854234 | Governor of Homs: More than 644 000 in Homs Vo... | NaN | 2014-06-04 01:34:56.196292 | NaN | Syria | NaN | NaN | Syria | NaN | NaN | {'formattedAddress': 'Syria', 'adminArea1': 'S... | [38.473472595214844, 35.03312683105469] | NaN | 35.033127 | 38.473473 | ... | ||
2014-06-04 01:34:41.729906 | NaN | "The will of the Syrian people through voting ... | NaN | 2014-06-04T01:34:41.729895 | NaN | http://facebook.com/685148648187587 | {'addressComponents': {'adminArea1': 'Syria'}} | hCGXMjzTRsO1DKn-pdV9PQ | https://fbcdn-photos-h-a.akamaihd.net/hphotos-... | NaN | temporary | 2014-06-04T00:37:23+00:00 | 685148648187587 | "The will of the Syrian people through voting ... | NaN | 2014-06-04 01:34:41.729906 | NaN | Syria | NaN | NaN | NaN | NaN | NaN | {'adminArea1': 'Syria'} | NaN | NaN | NaN | NaN | ... | ||
2014-06-04 00:40:31.436039 | NaN | abo jad tours \n\nتنويه : فيما يخص إشاعة الغاء... | abo jad tours\n\nDisclaimer: Regarding rumor c... | 2014-06-04T00:40:21.687859 | [Syria] | http://facebook.com/369949626434443_6228387244... | {'coords': [38.473472595214844, 35.03312683105... | ywNp1DXjQSuoSt3U-I5JJA | NaN | {'nativeName': 'العربية', 'code': 'ar', 'name'... | temporary | 2014-06-04T00:36:52+00:00 | 369949626434443_622838724478864 | abo jad tours \n\nتنويه : فيما يخص إشاعة الغاء... | [{'name': 'fire', 'confidence': 1}, {'name': '... | 2014-06-04 00:40:31.436039 | NaN | Syria | NaN | NaN | Syria | NaN | NaN | {'formattedAddress': 'Syria', 'adminArea1': 'S... | [38.473472595214844, 35.03312683105469] | {'authorLocationName': 'مصر Cairo, Egypt'} | 35.033127 | 38.473473 | ... | ||
2014-06-04 02:38:09.314566 | NaN | حين سئل الشاعر أحمد مطر ما هي نصائحك إلى القرا... | NaN | 2014-06-04T02:38:09.314554 | NaN | http://facebook.com/670941639653421 | {'locationIdentifiers': {'authorLocationName':... | pGgUaeOxSbiel3kmT-noPA | https://fbcdn-photos-g-a.akamaihd.net/hphotos-... | NaN | temporary | 2014-06-04T00:35:46+00:00 | 670941639653421 | حين سئل الشاعر أحمد مطر ما هي نصائحك إلى القرا... | NaN | 2014-06-04 02:38:09.314566 | NaN | Syria | NaN | NaN | NaN | NaN | NaN | {'adminArea1': 'Syria'} | NaN | {'authorLocationName': ' Duma, Syria'} | NaN | NaN | ... |
5 rows × 33 columns
# Convert createdAt to datetimes and count the reports created on each day
df['createdAt'] = pd.to_datetime(df['createdAt'])
d2 = df.set_index('createdAt').resample('D')['content'].count()
d2 = d2.rename('daily_total')
d2.plot(kind='bar')
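To make the daily count concrete, here is a minimal sketch on three made-up timestamps, written with the method-chaining form of `resample` used by recent pandas releases. Days with no reports appear with a count of zero rather than being skipped.

```python
import pandas as pd

# Three invented reports: two on 1 June, none on 2 June, one on 3 June
sample = pd.DataFrame({
    "createdAt": pd.to_datetime([
        "2014-06-01T09:00:00",
        "2014-06-01T17:30:00",
        "2014-06-03T02:15:00",
    ]),
    "content": ["a", "b", "c"],
})

# Bucket the rows by calendar day and count non-null content values
daily_total = sample.set_index("createdAt").resample("D")["content"].count()
```

The result is a Series indexed by day, which can be plotted directly with `daily_total.plot(kind='bar')`.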
df.apply(lambda x: x.isnull().value_counts()).T.plot(kind='bar', stacked=True)
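The stacked bar chart shows missing versus present values per column. If a table is easier to read than a chart, a short sketch (on an invented dataframe with hypothetical values) can summarise the same information with `isnull().sum()` and `isnull().mean()`.

```python
import pandas as pd

# Invented dataframe with some missing values
sample = pd.DataFrame({
    "id": ["1", "2", "3", "4"],
    "summary": ["a", None, "c", None],
    "video": [None, None, None, "v"],
})

# Number of missing values per column, most incomplete first
missing = sample.isnull().sum().sort_values(ascending=False)

# Share of missing values per column, between 0 and 1
missing_share = sample.isnull().mean()
```

Columns with a share close to 1, such as `video` here, are mostly empty and may not be worth analysing.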