We will cover some functions of the YouTube Data API v3, which is set up through the Google Developers Console.
Important Links:
We will use the following function:
Video Tutorial:
Google provides a client library for Python, but here we will access the API directly with HTTP requests.
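As a rough illustration of what "using HTTP requests" means here, the sketch below shows how a parameter dictionary turns into the GET URL for the search endpoint. It assumes Python 3 and only the standard library; `build_search_url` and the `YOUR_API_KEY` placeholder are made up for this example and are not part of the notebook.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"

def build_search_url(params):
    """Return the full GET URL for a search request (hypothetical helper)."""
    # urlencode percent-escapes the values; sorting keeps the output stable.
    return SEARCH_URL + "?" + urlencode(sorted(params.items()))

example_url = build_search_url({"part": "snippet",
                                "q": "Mark Udall",
                                "type": "video",
                                "key": "YOUR_API_KEY"})  # placeholder, not a real key
print(example_url)
```

The `requests` library performs the same encoding when a dictionary is passed through its `params` argument, which is what the cells below rely on.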
api_key = ""
from __future__ import division
from datetime import datetime
import requests
from lxml import html, etree
import json
from textblob import TextBlob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
pd.options.display.max_columns = 100
pd.options.display.max_rows = 35
pd.options.display.width = 120
youtube.search.list
https://developers.google.com/youtube/v3/docs/search
GET https://www.googleapis.com/youtube/v3/search
Parameter name | Value | Description |
---|---|---|
Required parameters | ||
part |
string |
The part parameter specifies a comma-separated list of one or more search resource properties that the API response will include. Set the parameter value to snippet . The snippet part has a quota cost of 1 unit.
|
Filters (specify 0 or 1 of the following parameters) | ||
forContentOwner |
boolean |
This parameter can only be used in a properly authorized request. Note: This parameter is intended exclusively for YouTube content partners. The forContentOwner parameter restricts the search to only retrieve resources owned by the content owner specified by the onBehalfOfContentOwner parameter. The user must be authenticated using a CMS account linked to the specified content owner and onBehalfOfContentOwner must be provided. |
forMine |
boolean |
This parameter can only be used in a properly authorized request. The forMine parameter restricts the search to only retrieve videos owned by the authenticated user. If you set this parameter to true , then the type parameter's value must also be set to video . |
relatedToVideoId |
string |
The relatedToVideoId parameter retrieves a list of videos that are related to the video that the parameter value identifies. The parameter value must be set to a YouTube video ID and, if you are using this parameter, the type parameter must be set to video . |
Optional parameters | ||
channelId |
string |
The channelId parameter indicates that the API response should only contain resources created by the channel |
channelType |
string |
The channelType parameter lets you restrict a search to a particular type of channel.Acceptable values are:
|
eventType |
string |
The eventType parameter restricts a search to broadcast events. If you specify a value for this parameter, you must also set the type parameter's value to video .Acceptable values are:
|
location |
string |
The location parameter, in conjunction with the locationRadius parameter, defines a circular geographic area and also restricts a search to videos that specify, in their metadata, a geographic location that falls within that area. The parameter value is a string that specifies latitude/longitude coordinates e.g. (37.42307,-122.08427 ).
The API returns an error if your request specifies a value for the location parameter but does not also specify a value for the locationRadius parameter. |
locationRadius |
string |
The locationRadius parameter, in conjunction with the location parameter, defines a circular geographic area.The parameter value must be a floating point number followed by a measurement unit. Valid measurement units are m , km , ft , and mi . For example, valid parameter values include 1500m , 5km , 10000ft , and 0.75mi . The API does not support locationRadius parameter values larger than 1000 kilometers.Note: See the definition of the location parameter for more information. |
maxResults |
unsigned integer |
The maxResults parameter specifies the maximum number of items that should be returned in the result set. Acceptable values are 0 to 50 , inclusive. The default value is 5 . |
onBehalfOfContentOwner |
string |
This parameter can only be used in a properly authorized request. Note: This parameter is intended exclusively for YouTube content partners. The onBehalfOfContentOwner parameter indicates that the request's authorization credentials identify a YouTube CMS user who is acting on behalf of the content owner specified in the parameter value. This parameter is intended for YouTube content partners that own and manage many different YouTube channels. It allows content owners to authenticate once and get access to all their video and channel data, without having to provide authentication credentials for each individual channel. The CMS account that the user authenticates with must be linked to the specified YouTube content owner. |
order |
string |
The order parameter specifies the method that will be used to order resources in the API response. The default value is relevance .Acceptable values are:
|
pageToken |
string |
The pageToken parameter identifies a specific page in the result set that should be returned. In an API response, the nextPageToken and prevPageToken properties identify other pages that could be retrieved. |
publishedAfter |
datetime |
The publishedAfter parameter indicates that the API response should only contain resources created after the specified time. The value is an RFC 3339 formatted date-time value (1970-01-01T00:00:00Z). |
publishedBefore |
datetime |
The publishedBefore parameter indicates that the API response should only contain resources created before the specified time. The value is an RFC 3339 formatted date-time value (1970-01-01T00:00:00Z). |
q |
string |
The q parameter specifies the query term to search for.Your request can also use the Boolean NOT ( - ) and OR (| ) operators to exclude videos or to find videos that are associated with one of several search terms. For example, to search for videos matching either "boating" or "sailing", set the q parameter value to boating|sailing . Similarly, to search for videos matching either "boating" or "sailing" but not "fishing", set the q parameter value to boating|sailing -fishing . Note that the pipe character must be URL-escaped when it is sent in your API request. The URL-escaped value for the pipe character is %7C . |
regionCode |
string |
The regionCode parameter instructs the API to return search results for the specified country. The parameter value is an ISO 3166-1 alpha-2 country code. |
safeSearch |
string |
The safeSearch parameter indicates whether the search results should include restricted content as well as standard content.Acceptable values are:
|
topicId |
string |
The topicId parameter indicates that the API response should only contain resources associated with the specified topic. The value identifies a Freebase topic ID. |
type |
string |
The type parameter restricts a search query to only retrieve a particular type of resource. The value is a comma-separated list of resource types. The default value is video,channel,playlist .Acceptable values are:
|
videoCaption |
string |
The videoCaption parameter indicates whether the API should filter video search results based on whether they have captions. If you specify a value for this parameter, you must also set the type parameter's value to video .Acceptable values are:
|
videoCategoryId |
string |
The videoCategoryId parameter filters video search results based on their category. If you specify a value for this parameter, you must also set the type parameter's value to video . |
videoDefinition |
string |
The videoDefinition parameter lets you restrict a search to only include either high definition (HD) or standard definition (SD) videos. HD videos are available for playback in at least 720p, though higher resolutions, like 1080p, might also be available. If you specify a value for this parameter, you must also set the type parameter's value to video .Acceptable values are:
|
videoDimension |
string |
The videoDimension parameter lets you restrict a search to only retrieve 2D or 3D videos. If you specify a value for this parameter, you must also set the type parameter's value to video .Acceptable values are:
|
videoDuration |
string |
The videoDuration parameter filters video search results based on their duration. If you specify a value for this parameter, you must also set the type parameter's value to video .Acceptable values are:
|
videoEmbeddable |
string |
The videoEmbeddable parameter lets you restrict a search to only videos that can be embedded into a webpage. If you specify a value for this parameter, you must also set the type parameter's value to video. Acceptable values are:
|
videoLicense |
string |
The videoLicense parameter filters search results to only include videos with a particular license. YouTube lets video uploaders choose to attach either the Creative Commons license or the standard YouTube license to each of their videos. If you specify a value for this parameter, you must also set the type parameter's value to video .Acceptable values are:
|
videoSyndicated |
string |
The videoSyndicated parameter lets you restrict a search to only videos that can be played outside youtube.com. If you specify a value for this parameter, you must also set the type parameter's value to video. Acceptable values are:
|
videoType |
string |
The videoType parameter lets you restrict a search to a particular type of videos. If you specify a value for this parameter, you must also set the type parameter's value to video .Acceptable values are:
|
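The publishedAfter and publishedBefore filters in the table above expect RFC 3339 timestamps. A minimal sketch of producing one from a Python `datetime` follows; `to_rfc3339` is a hypothetical helper, and it assumes the datetime is already in UTC.

```python
from datetime import datetime

def to_rfc3339(moment):
    """Format a naive datetime, assumed to be UTC, as an RFC 3339 string."""
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")

published_after = to_rfc3339(datetime(2008, 8, 4))
print(published_after)
```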
part
:
id
: Returns only resource ID data
snippet
: Returns some basic meta data about the resource
channelId
:
Filter results to a single channelId.
maxResults
:
Between 0 and 50 results per page. The default is 5.
order
:
date: Resources are sorted in reverse chronological order based on the date they were uploaded.
rating: Resources are sorted from highest to lowest rating.
relevance: Resources are sorted based on their relevance to the search query. This is the default value for this parameter.
title: Resources are sorted alphabetically by title.
videoCount: Channels are sorted in descending order of their number of uploaded videos.
viewCount: Resources are sorted from highest to lowest number of views.
pageToken
:
A string token to select results page
publishedAfter
:
Use RFC 3339 format for the date-time, including the time zone, e.g. 2000-12-31T23:59:59Z
publishedBefore
:
Use RFC 3339 format for the date-time, including the time zone, e.g. 2000-12-31T23:59:59Z
q
:
Query term(s)
You can use multiple search terms
For OR operator use |
For NOT operator use -
key
:
Your API key
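The OR and NOT operators described for q can be combined into one query string. The sketch below assumes Python 3; `build_query` is a hypothetical helper. It also shows the URL-escaped form, in which the pipe becomes %7C.

```python
from urllib.parse import quote

def build_query(any_of, none_of=()):
    """Join terms with | (OR) and prefix excluded terms with - (NOT)."""
    query = "|".join(any_of)
    for term in none_of:
        query += " -" + term
    return query

q = build_query(["boating", "sailing"], none_of=["fishing"])
escaped = quote(q, safe="")  # manual escaping, shown only for illustration
print(q)
print(escaped)
```

When the query is passed to `requests` through `params`, the escaping is done automatically, so the unescaped string is the one to put in the parameter dictionary.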
parameters = {"part": "snippet",
"maxResults": 5,
"order": "date",
"pageToken": "",
"publishedAfter": "2008-08-04T00:00:00Z",
"publishedBefore": "2008-11-04T00:00:00Z",
"q": "",
"key": api_key,
"type": "video",
}
url = "https://www.googleapis.com/youtube/v3/search"
parameters["q"] = "Mark Udall"
page = requests.request(method="get", url=url, params=parameters)
j_results = json.loads(page.text)
print(page.text)
{ "kind": "youtube#searchListResponse", "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/_2hFMhP6zvFl7CAy5D9Ir40dMWE\"", "nextPageToken": "CAUQAA", "pageInfo": { "totalResults": 2325, "resultsPerPage": 5 }, "items": [ { "kind": "youtube#searchResult", "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/tmAMwya2pvXlrX05odd04vzKBSQ\"", "id": { "kind": "youtube#video", "videoId": "5Q98TvXjIZg" }, "snippet": { "publishedAt": "2008-11-03T15:31:30.000Z", "channelId": "UC52X5wxOL_s5yw0dQk7NtgA", "title": "Cousins Vying to Ride Democratic Wave to Senate", "description": "Cousins Tom and Mark Udall are vying to become U.S. Senators in New Mexico and Colorado. The two are hoping to ride an emerging Democratic wave in the ...", "thumbnails": { "default": { "url": "https://i.ytimg.com/vi/5Q98TvXjIZg/default.jpg" }, "medium": { "url": "https://i.ytimg.com/vi/5Q98TvXjIZg/mqdefault.jpg" }, "high": { "url": "https://i.ytimg.com/vi/5Q98TvXjIZg/hqdefault.jpg" } }, "channelTitle": "AssociatedPress", "liveBroadcastContent": "none" } }, { "kind": "youtube#searchResult", "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/Pqtrk7f6rZM5jwPLKtXoJ98nNtg\"", "id": { "kind": "youtube#video", "videoId": "nnghUTeSKW0" }, "snippet": { "publishedAt": "2008-11-03T00:06:40.000Z", "channelId": "UC9ZGcEDoHfuY8lB5_SknuLA", "title": "mark udall", "description": "gov project.", "thumbnails": { "default": { "url": "https://i.ytimg.com/vi/nnghUTeSKW0/default.jpg" }, "medium": { "url": "https://i.ytimg.com/vi/nnghUTeSKW0/mqdefault.jpg" }, "high": { "url": "https://i.ytimg.com/vi/nnghUTeSKW0/hqdefault.jpg" } }, "channelTitle": "g072091", "liveBroadcastContent": "none" } }, { "kind": "youtube#searchResult", "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/ay0GP1CevugYOb4FvBtJzXG_A0c\"", "id": { "kind": "youtube#video", "videoId": "Pq-KnAMpDHs" }, "snippet": { "publishedAt": "2008-11-01T00:55:26.000Z", "channelId": "UC5QhjJAjxtRvJ9ujFxNiJbA", "title": "Eden Lane One on One with Congressman Mark Udall", "description": "Senate candidate, Congressman Mark 
Udall spoke with me at a campaign event. Congressional candidate Jared Polis, and State Senate Candidate Joe ...", "thumbnails": { "default": { "url": "https://i.ytimg.com/vi/Pq-KnAMpDHs/default.jpg" }, "medium": { "url": "https://i.ytimg.com/vi/Pq-KnAMpDHs/mqdefault.jpg" }, "high": { "url": "https://i.ytimg.com/vi/Pq-KnAMpDHs/hqdefault.jpg" } }, "channelTitle": "missedenlane", "liveBroadcastContent": "none" } }, { "kind": "youtube#searchResult", "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/y4ELzTNng5HKENL9FoHgkDV304k\"", "id": { "kind": "youtube#video", "videoId": "aITDlrkKOoY" }, "snippet": { "publishedAt": "2008-11-01T00:42:17.000Z", "channelId": "UCT3P1V7_N5HzV1vEZUSdNNQ", "title": "CO: AFGE, APWU, NALC, and NPMHU leaflet with Mark Udall", "description": "APWU, NALC, and NPMHU are out at the worksite when it matters most!", "thumbnails": { "default": { "url": "https://i.ytimg.com/vi/aITDlrkKOoY/default.jpg" }, "medium": { "url": "https://i.ytimg.com/vi/aITDlrkKOoY/mqdefault.jpg" }, "high": { "url": "https://i.ytimg.com/vi/aITDlrkKOoY/hqdefault.jpg" } }, "channelTitle": "shubi10", "liveBroadcastContent": "none" } }, { "kind": "youtube#searchResult", "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/JgS_14GwklWRyGGiyMW8NbsyiQA\"", "id": { "kind": "youtube#video", "videoId": "JAHI1pSiEPM" }, "snippet": { "publishedAt": "2008-10-30T04:43:12.000Z", "channelId": "UCxdp8upAlGFfB4jjTH3wAHw", "title": "[SEN-CO] Udall: Reason", "description": "http://politicalrealm.blogspot.com A new campaign ad from Democrat Mark Udall.", "thumbnails": { "default": { "url": "https://i.ytimg.com/vi/JAHI1pSiEPM/default.jpg" }, "medium": { "url": "https://i.ytimg.com/vi/JAHI1pSiEPM/mqdefault.jpg" }, "high": { "url": "https://i.ytimg.com/vi/JAHI1pSiEPM/hqdefault.jpg" } }, "channelTitle": "PoliticalRealm", "liveBroadcastContent": "none" } } ] }
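To make the structure of the response above easier to see, here is a small sketch that pulls the video IDs, titles and the next page token out of a trimmed copy of it. Only the fields used are kept in `sample`; everything else about the response is omitted.

```python
import json

# Two items from the response above, reduced to the fields we read.
sample = '''{"nextPageToken": "CAUQAA", "items": [
  {"id": {"kind": "youtube#video", "videoId": "5Q98TvXjIZg"},
   "snippet": {"title": "Cousins Vying to Ride Democratic Wave to Senate"}},
  {"id": {"kind": "youtube#video", "videoId": "nnghUTeSKW0"},
   "snippet": {"title": "mark udall"}}]}'''

response = json.loads(sample)
video_ids = [item["id"]["videoId"] for item in response["items"]]
titles = [item["snippet"]["title"] for item in response["items"]]
next_page = response.get("nextPageToken")  # None when there is no further page
print(video_ids, next_page)
```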
https://developers.google.com/youtube/v3/docs/videos/list
GET https://www.googleapis.com/youtube/v3/videos
Parameter name | Value | Description |
---|---|---|
Required parameters | ||
part |
string |
The part parameter specifies a comma-separated list of one or more video resource properties that the API response will include.If the parameter identifies a property that contains child properties, the child properties will be included in the response. For example, in a video resource, the snippet property contains the channelId , title , description , tags , and categoryId properties. As such, if you set part=snippet , the API response will contain all of those properties.The list below contains the part names that you can include in the parameter value and the quota cost for each part:
|
Filters (specify exactly one of the following parameters) | ||
chart |
string |
The chart parameter identifies the chart that you want to retrieve.Acceptable values are:
|
id |
string |
The id parameter specifies a comma-separated list of the YouTube video ID(s) for the resource(s) that are being retrieved. In a video resource, the id property specifies the video's ID. |
myRating |
string |
This parameter can only be used in a properly authorized request. Set this parameter's value to like or dislike to instruct the API to only return videos liked or disliked by the authenticated user.Acceptable values are:
|
Optional parameters | ||
maxResults |
unsigned integer |
The maxResults parameter specifies the maximum number of items that should be returned in the result set.Note: This parameter is supported for use in conjunction with the myRating parameter, but it is not supported for use in conjunction with the id parameter. Acceptable values are 1 to 50 , inclusive. The default value is 5 . |
onBehalfOfContentOwner |
string |
This parameter can only be used in a properly authorized request. Note: This parameter is intended exclusively for YouTube content partners. The onBehalfOfContentOwner parameter indicates that the request's authorization credentials identify a YouTube CMS user who is acting on behalf of the content owner specified in the parameter value. This parameter is intended for YouTube content partners that own and manage many different YouTube channels. It allows content owners to authenticate once and get access to all their video and channel data, without having to provide authentication credentials for each individual channel. The CMS account that the user authenticates with must be linked to the specified YouTube content owner. |
pageToken |
string |
The pageToken parameter identifies a specific page in the result set that should be returned. In an API response, the nextPageToken and prevPageToken properties identify other pages that could be retrieved.Note: This parameter is supported for use in conjunction with the myRating parameter, but it is not supported for use in conjunction with the id parameter. |
regionCode |
string |
The regionCode parameter instructs the API to select a video chart available in the specified region. This parameter can only be used in conjunction with the chart parameter. The parameter value is an ISO 3166-1 alpha-2 country code. |
videoCategoryId |
string |
The videoCategoryId parameter identifies the video category for which the chart should be retrieved. This parameter can only be used in conjunction with the chart parameter. By default, charts are not restricted to a particular category. The default value is 0 . |
parameters = {"part": "statistics",
"id": "5Q98TvXjIZg",
"key": api_key,
}
url = "https://www.googleapis.com/youtube/v3/videos"
page = requests.request(method="get", url=url, params=parameters)
j_results = json.loads(page.text)
print(page.text)
{ "kind": "youtube#videoListResponse", "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/RI2HLqoe4gS1QbNV867B5089lmY\"", "pageInfo": { "totalResults": 1, "resultsPerPage": 1 }, "items": [ { "kind": "youtube#video", "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/HdPfiQBFpxUe-eEq-EYCkg3p4b8\"", "id": "5Q98TvXjIZg", "statistics": { "viewCount": "58", "likeCount": "0", "dislikeCount": "0", "favoriteCount": "0", "commentCount": "0" } } ] }
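Note that the counters in the statistics part arrive as strings. A minimal sketch of converting them to integers, using a trimmed copy of the response above:

```python
import json

# The single item from the response above, reduced to id and statistics.
sample = '''{"items": [{"id": "5Q98TvXjIZg", "statistics": {
  "viewCount": "58", "likeCount": "0", "dislikeCount": "0",
  "favoriteCount": "0", "commentCount": "0"}}]}'''

stats = {item["id"]: {name: int(value) for name, value in item["statistics"].items()}
         for item in json.loads(sample)["items"]}
print(stats["5Q98TvXjIZg"]["viewCount"])
```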
I'll check the correlation between Senate election results and YouTube statistics, using the 2014 and 2012 races below.
Colorado Senate: Cory Gardner (R) vs. Mark Udall (D)
def _search_list(q="", publishedAfter=None, publishedBefore=None, pageToken=""):
    # Request one page (up to 50 results) of video IDs matching the query.
    parameters = {"part": "id",
                  "maxResults": 50,
                  "order": "viewCount",
                  "pageToken": pageToken,
                  "q": q,
                  "type": "video",
                  "key": api_key,
                  }
    url = "https://www.googleapis.com/youtube/v3/search"
    # Add the date filters only when they are given.
    if publishedAfter:
        parameters["publishedAfter"] = publishedAfter
    if publishedBefore:
        parameters["publishedBefore"] = publishedBefore
    page = requests.request(method="get", url=url, params=parameters)
    return json.loads(page.text)
def search_list(q="", publishedAfter=None, publishedBefore=None, max_requests=10):
    # Collect video IDs page by page, making at most max_requests requests.
    pageToken = ""
    results = []
    for counter in range(max_requests):
        j_results = _search_list(q=q, publishedAfter=publishedAfter,
                                 publishedBefore=publishedBefore, pageToken=pageToken)
        items = j_results.get("items", None)
        if not items:
            break
        results += [item["id"]["videoId"] for item in items]
        # Stop when the API reports no further page.
        if "nextPageToken" not in j_results:
            break
        pageToken = j_results["nextPageToken"]
    return results
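The paging logic can be tried without touching the network. The sketch below follows nextPageToken over a hypothetical two-page result set held in memory; `collect_video_ids`, `FAKE_PAGES` and the IDs in it are invented for the example.

```python
def collect_video_ids(fetch_page, max_requests=10):
    """Follow nextPageToken until it disappears or the request budget runs out."""
    ids, token = [], ""
    for _ in range(max_requests):
        page = fetch_page(token)
        ids += [item["id"]["videoId"] for item in page.get("items", [])]
        token = page.get("nextPageToken")
        if not token:
            break
    return ids

# Hypothetical two-page result set standing in for real API responses.
FAKE_PAGES = {
    "": {"items": [{"id": {"videoId": "a1"}}, {"id": {"videoId": "b2"}}],
         "nextPageToken": "PAGE2"},
    "PAGE2": {"items": [{"id": {"videoId": "c3"}}]},
}

all_ids = collect_video_ids(lambda token: FAKE_PAGES[token])
first_page_only = collect_video_ids(lambda token: FAKE_PAGES[token], max_requests=1)
print(all_ids, first_page_only)
```

Passing the page fetcher in as a function keeps the loop testable; in the notebook the fetcher would be a call to the search endpoint.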
def _video_list(video_id_list):
    # Fetch statistics and snippet data for one batch of video IDs.
    parameters = {"part": "statistics",
                  "id": ",".join(video_id_list),
                  "key": api_key,
                  "maxResults": 50
                  }
    url = "https://www.googleapis.com/youtube/v3/videos"
    page = requests.request(method="get", url=url, params=parameters)
    j_results = json.loads(page.text)
    # The API returns the counters as strings; cast them to integers.
    df = pd.DataFrame([item["statistics"] for item in j_results["items"]], dtype=np.int64)
    df["video_id"] = [item["id"] for item in j_results["items"]]
    # Second request: same IDs, this time for the descriptive metadata.
    parameters["part"] = "snippet"
    page = requests.request(method="get", url=url, params=parameters)
    j_results = json.loads(page.text)
    df["publishedAt"] = [item["snippet"]["publishedAt"] for item in j_results["items"]]
    df["publishedAt"] = df["publishedAt"].apply(lambda x: datetime.strptime(x, "%Y-%m-%dT%H:%M:%S.000Z"))
    df["date"] = df["publishedAt"].apply(lambda x: x.date())
    # ISO week number, used later to group the videos by week.
    df["week"] = df["date"].apply(lambda x: x.isocalendar()[1])
    df["channelId"] = [item["snippet"]["channelId"] for item in j_results["items"]]
    df["title"] = [item["snippet"]["title"] for item in j_results["items"]]
    df["description"] = [item["snippet"]["description"] for item in j_results["items"]]
    df["channelTitle"] = [item["snippet"]["channelTitle"] for item in j_results["items"]]
    df["categoryId"] = [item["snippet"]["categoryId"] for item in j_results["items"]]
    return df
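The date handling in the function above can be checked in isolation. The sketch below parses a publishedAt value from the earlier search response and derives the ISO week number. The fallback format without fractional seconds is an assumption made for robustness; `parse_published_at` is a hypothetical helper.

```python
from datetime import datetime

def parse_published_at(text):
    """Parse a publishedAt string, with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised timestamp: %r" % text)

moment = parse_published_at("2008-11-03T15:31:30.000Z")
week = moment.isocalendar()[1]  # ISO week number of the publication date
print(moment.date(), week)
```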
def video_list(video_id_list):
    # Query the videos endpoint in batches of 50 IDs and stack the results.
    values = []
    for t_index in range(0, len(video_id_list), 50):
        values.append(_video_list(video_id_list[t_index:t_index+50]))
    return pd.concat(values)
def get_data(candidates, publishedAfter, publishedBefore):
    # Search for each candidate's videos in the date range and collect their statistics.
    results_list = []
    for q in candidates:
        results = search_list(q=q,
                              publishedAfter=publishedAfter,
                              publishedBefore=publishedBefore,
                              max_requests=50)
        stat_data_set = video_list(results)
        stat_data_set["candidate_name"] = q
        results_list.append(stat_data_set)
    data_set = pd.concat(results_list)
    return data_set
def get_2008_data(candidates):
    return get_data(candidates, publishedAfter="2008-08-04T00:00:00Z", publishedBefore="2008-11-04T00:00:00Z")

def get_2010_data(candidates):
    return get_data(candidates, publishedAfter="2010-08-04T00:00:00Z", publishedBefore="2010-11-04T00:00:00Z")

def get_2012_data(candidates):
    return get_data(candidates, publishedAfter="2012-08-04T00:00:00Z", publishedBefore="2012-11-04T00:00:00Z")

def get_2014_data(candidates):
    return get_data(candidates, publishedAfter="2014-08-04T00:00:00Z", publishedBefore="2014-11-04T00:00:00Z")
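The four wrappers differ only in the year, so the date window could also be computed. This is one possible sketch, not the notebook's approach; `campaign_window` is a hypothetical helper that reproduces the same 4 August to 4 November range.

```python
def campaign_window(year):
    """Return (publishedAfter, publishedBefore) for 4 August to 4 November of a year."""
    return ("%d-08-04T00:00:00Z" % year, "%d-11-04T00:00:00Z" % year)

published_after, published_before = campaign_window(2014)
print(published_after, published_before)
```

With this helper, `get_data(candidates, *campaign_window(2014))` would request the same window as `get_2014_data(candidates)`.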
candidates = ["Cory Gardner", "Mark Udall"] # Cory Gardner (R), Mark Udall (D)*
colorado_2014_ds = get_2014_data(candidates)
pd.pivot_table(colorado_2014_ds, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
               aggfunc='sum', index="candidate_name")
candidate_name | commentCount | dislikeCount | favoriteCount | likeCount | viewCount |
---|---|---|---|---|---|
Cory Gardner | 304 | 167 | 0 | 437 | 234669 |
Mark Udall | 195 | 470 | 0 | 450 | 144744 |
for candidate, color in zip(candidates, ["r", "b"]):
    cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
    by_date = cand["week"].value_counts()
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos Published")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
    cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["viewCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos viewCount")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
    cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["likeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos likeCount")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
    cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["dislikeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos dislikeCount")
plt.xlabel("Week")
plt.show()
candidates = ["George Allen", "Tim Kaine"] # George Allen (R), Tim Kaine (D, winner)
va_2012_ds = get_2012_data(candidates)
pd.pivot_table(va_2012_ds, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
               aggfunc='sum', index="candidate_name")
candidate_name | commentCount | dislikeCount | favoriteCount | likeCount | viewCount |
---|---|---|---|---|---|
George Allen | 297 | 352 | 0 | 475 | 203297 |
Tim Kaine | 174 | 97 | 0 | 553 | 248367 |
for candidate, color in zip(candidates, ["r", "b"]):
    cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
    by_date = cand["week"].value_counts()
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos Published")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
    cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["viewCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos viewCount")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
    cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["likeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos likeCount")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
    cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["dislikeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos dislikeCount")
plt.xlabel("Week")
plt.show()
candidates = ["Dean Heller", "Shelley Berkley"] # Dean Heller (R)* winner, Shelley Berkley (D)
nv_2012_ds = get_2012_data(candidates)
print(pd.pivot_table(nv_2012_ds, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
                     aggfunc='sum', index="candidate_name"))
for candidate, color in zip(candidates, ["r", "b"]):
    cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
    by_date = cand["week"].value_counts()
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos Published")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
    cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["viewCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos viewCount")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
    cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["likeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos likeCount")
plt.xlabel("Week")
plt.show()
for candidate, color in zip(candidates, ["r", "b"]):
    cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["dislikeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos dislikeCount")
plt.xlabel("Week")
plt.show()
candidate_name | commentCount | dislikeCount | favoriteCount | likeCount | viewCount |
---|---|---|---|---|---|
Dean Heller | 248 | 644 | 0 | 926 | 870677 |
Shelley Berkley | 222 | 206 | 0 | 472 | 679636 |
url = "http://www.senate.gov/general/contact_information/senators_cfm.xml"
response = requests.get(url)
# Parse the raw bytes so lxml can honour the XML encoding declaration.
tree = etree.fromstring(response.content)
print(tree)
<Element contact_information at 0x7ff0804d4170>
member_full = [member.xpath("member_full")[0].text for member in tree.xpath("//member")]
senators = pd.DataFrame(member_full, columns=["member_full"])
senators["last_name"] = [member.xpath("last_name")[0].text for member in tree.xpath("//member")]
senators["first_name"] = [member.xpath("first_name")[0].text for member in tree.xpath("//member")]
senators["party"] = [member.xpath("party")[0].text for member in tree.xpath("//member")]
senators["state"] = [member.xpath("state")[0].text for member in tree.xpath("//member")]
senators["address"] = [member.xpath("address")[0].text for member in tree.xpath("//member")]
senators["phone"] = [member.xpath("phone")[0].text for member in tree.xpath("//member")]
senators["website"] = [member.xpath("website")[0].text for member in tree.xpath("//member")]
senators["bioguide_id"] = [member.xpath("bioguide_id")[0].text for member in tree.xpath("//member")]
senators["class"] = [member.xpath("class")[0].text for member in tree.xpath("//member")]
senators
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 10 columns):
member_full    100 non-null values
last_name      100 non-null values
first_name     100 non-null values
party          100 non-null values
state          100 non-null values
address        100 non-null values
phone          100 non-null values
website        100 non-null values
bioguide_id    100 non-null values
class          100 non-null values
dtypes: object(10)
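The same kind of extraction can be sketched with the standard library alone, reading every field of a member in one pass. The XML below is a hypothetical two-member sample that only mimics the element names used above (member, last_name, party, state, class); the names and values in it are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real feed has more fields and 100 members.
SAMPLE_XML = """<contact_information>
  <member><last_name>Doe</last_name><party>D</party><state>CO</state><class>Class II</class></member>
  <member><last_name>Roe</last_name><party>R</party><state>NV</state><class>Class I</class></member>
</contact_information>"""

FIELDS = ["last_name", "party", "state", "class"]
root = ET.fromstring(SAMPLE_XML)
# One dictionary per member, built in a single pass over the tree.
records = [{field: member.findtext(field) for field in FIELDS}
           for member in root.iter("member")]
print(records)
```

A list of dictionaries like `records` can be handed directly to `pd.DataFrame`, which avoids repeating the member query once per column.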
by_party = senators["party"].value_counts()
by_party = by_party.sort_values(ascending=False)
print(by_party)
color_dict = {"D": "b",
"R": "r",
"I": "g"}
labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))
plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()
fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(False)
axes.axvline(x=50, color="black", alpha=0.7, linewidth=2)
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats Controlled by Party")
plt.show()
D    53
R    45
I     2
dtype: int64
Class II senators are up for re-election.
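Which class is up in a given year follows from the six-year terms. The sketch below is a small arithmetic illustration; `next_election_year` is a hypothetical helper, and the base years (Class I in 2012, Class II in 2014, Class III in 2016) are the regular election years for each class.

```python
# A regular election year for each Senate class; terms last six years.
BASE_YEAR = {"Class I": 2012, "Class II": 2014, "Class III": 2016}

def next_election_year(senate_class, year):
    """Return the first regular election year >= year for the class."""
    base = BASE_YEAR[senate_class]
    # Elections fall on base + 6k; round (year - base) / 6 up to the next integer k.
    cycles = -(-(year - base) // 6)
    return base + 6 * cycles

for name in ("Class I", "Class II", "Class III"):
    print(name, next_election_year(name, 2013))
```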
class_2_senators = senators[senators["class"]=="Class II"]
by_party = class_2_senators["party"].value_counts()
by_party = by_party.sort_values(ascending=False)
print(by_party)
labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))
plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()
color_dict = {"D": "b",
"R": "r",
"I": "g"}
fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(False)
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats of $Class II$ Controlled by Party")
plt.show()
D    20
R    13
dtype: int64
class_3_senators = senators[senators["class"]=="Class III"]
by_party =class_3_senators["party"].value_counts()
by_party.sort(ascending=False)
print by_party
labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))
plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()
color_dict = {"D": "b",
"R": "r",
"I": "g"}
fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(False)  # hide the axes frame; the string "off" is truthy in newer matplotlib
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats of $Class III$ Controlled by Party")
plt.show()
R    24
D    10
dtype: int64
class_1_senators = senators[senators["class"]=="Class I"]
by_party =class_1_senators["party"].value_counts()
by_party.sort(ascending=False)
print by_party
labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))
plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()
color_dict = {"D": "b",
"R": "r",
"I": "g"}
fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(False)  # hide the axes frame; the string "off" is truthy in newer matplotlib
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats of $Class I$ Controlled by Party")
plt.show()
D    23
R     8
I     2
dtype: int64
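The three per-class cells above repeat the same counting step. A minimal sketch of factoring that step into one helper follows. It uses plain Python 3 and a hypothetical list of (party, class) tuples in place of the `senators` DataFrame; the name `seats_by_party` and the sample rows are assumptions made for illustration, not part of the notebook.

```python
from collections import Counter

# Hypothetical sample rows (party, senate_class); the notebook reads these
# values from the `senators` DataFrame instead.
SAMPLE_SENATORS = [
    ("D", "Class I"), ("R", "Class I"), ("I", "Class I"),
    ("D", "Class II"), ("D", "Class II"), ("R", "Class II"),
    ("R", "Class III"), ("R", "Class III"), ("D", "Class III"),
]


def seats_by_party(rows, senate_class=None):
    """Count seats per party, optionally restricted to one Senate class.

    Returns a list of (party, seats) pairs, largest seat count first.
    """
    parties = [party for party, cls in rows
               if senate_class is None or cls == senate_class]
    return Counter(parties).most_common()


for cls in ("Class I", "Class II", "Class III"):
    print(cls, seats_by_party(SAMPLE_SENATORS, cls))
```

The same pattern would let the pie and bar chart code take the class name as a parameter instead of being copied per class.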
Start by listing all seats in $Class II$
class_2_senators = senators[senators["class"]=="Class II"].sort("state")
class_2_senators
<class 'pandas.core.frame.DataFrame'>
Int64Index: 33 entries, 4 to 29
Data columns (total 10 columns):
member_full    33 non-null values
last_name      33 non-null values
first_name     33 non-null values
party          33 non-null values
state          33 non-null values
address        33 non-null values
phone          33 non-null values
website        33 non-null values
bioguide_id    33 non-null values
class          33 non-null values
dtypes: object(10)
url = "http://www.fec.gov/data/CandidateSummary.do?format=xml"
response = requests.get(url)
page = html.fromstring(str(response.text))
print response.text[:1000]
<data.fec.gov xmlns:fecdc="http://www.w3.org/2001/XMLSchema-instance" fecdc:schemaLocation="/data /finance/disclosure/schema/CandidateSummary.xsd"><title>Candidate Summary</title><description>This file contains information for each candidate who has registered with the FEC or appears on an official state ballot for an election to the U.S. House of Representatives, U.S. Senate or U.S. President. The table is available for the current election cycle and for election cycles through 2008.</description><timestamp>2014-10-09T05:06:27-05:00</timestamp><copyright>Copyright 2014, Federal Election Commission.</copyright><can_sum><lin_ima>http://www.fec.gov/fecviewer/CandidateCommitteeDetail.do?candidateCommitteeId=H4UT04052&tabIndex=1</lin_ima><can_id>H4UT04052</can_id><can_nam>AALDERS, TIM</can_nam><can_off>H</can_off><can_off_sta>UT</can_off_sta><can_off_dis>04</can_off_dis><can_par_aff>IAP</can_par_aff><can_inc_cha_ope_sea>OPEN</can_inc_cha_ope_sea><can_str1>5306 WEST 10320 NORTH</can_str
for item in page[:10]:
    print item.tag
title
description
timestamp
copyright
can_sum
can_sum
can_sum
can_sum
can_sum
can_sum
Notice that each <can_sum> element encapsulates one candidate's data.
for item in page.xpath("//can_sum")[0]:
    print "<%s>%s</%s>" % (item.tag, str(item.text), item.tag)
<lin_ima>http://www.fec.gov/fecviewer/CandidateCommitteeDetail.do?candidateCommitteeId=H4UT04052&tabIndex=1</lin_ima>
<can_id>H4UT04052</can_id>
<can_nam>AALDERS, TIM</can_nam>
<can_off>H</can_off>
<can_off_sta>UT</can_off_sta>
<can_off_dis>04</can_off_dis>
<can_par_aff>IAP</can_par_aff>
<can_inc_cha_ope_sea>OPEN</can_inc_cha_ope_sea>
<can_str1>5306 WEST 10320 NORTH</can_str1>
<can_str2>None</can_str2>
<can_cit>HIGHLAND</can_cit>
<can_sta>UT</can_sta>
<can_zip>84003</can_zip>
<ind_ite_con>None</ind_ite_con>
<ind_uni_con>None</ind_uni_con>
<ind_con>None</ind_con>
<par_com_con>None</par_com_con>
<oth_com_con>None</oth_com_con>
<can_con>None</can_con>
<tot_con>None</tot_con>
<tra_fro_oth_aut_com>None</tra_fro_oth_aut_com>
<can_loa>None</can_loa>
<oth_loa>None</oth_loa>
<tot_loa>None</tot_loa>
<off_to_ope_exp>None</off_to_ope_exp>
<off_to_fun>None</off_to_fun>
<off_to_leg_acc>None</off_to_leg_acc>
<oth_rec>None</oth_rec>
<tot_rec>None</tot_rec>
<ope_exp>None</ope_exp>
<exe_leg_acc_dis>None</exe_leg_acc_dis>
<fun_dis>None</fun_dis>
<tra_to_oth_aut_com>None</tra_to_oth_aut_com>
<can_loa_rep>None</can_loa_rep>
<oth_loa_rep>None</oth_loa_rep>
<tot_loa_rep>None</tot_loa_rep>
<ind_ref>None</ind_ref>
<par_com_ref>None</par_com_ref>
<oth_com_ref>None</oth_com_ref>
<tot_con_ref>None</tot_con_ref>
<oth_dis>None</oth_dis>
<tot_dis>None</tot_dis>
<cas_on_han_beg_of_per>None</cas_on_han_beg_of_per>
<cas_on_han_clo_of_per>None</cas_on_han_clo_of_per>
<net_con>None</net_con>
<net_ope_exp>None</net_ope_exp>
<deb_owe_by_com>None</deb_owe_by_com>
<deb_owe_to_com>None</deb_owe_to_com>
<cov_sta_dat>None</cov_sta_dat>
<cov_end_dat>None</cov_end_dat>
cand_list = [cand for cand in page.xpath("//can_sum") if cand.xpath("can_off")[0].text=="S"]
lin_ima = [cand.xpath("lin_ima")[0].text for cand in cand_list]
len(lin_ima)
412
senate_cadidate = pd.DataFrame(lin_ima, columns=["lin_ima"])
senate_cadidate["can_id"] = [cand.xpath("can_id")[0].text for cand in cand_list]
senate_cadidate["can_nam"] = [cand.xpath("can_nam")[0].text for cand in cand_list]
senate_cadidate["can_off"] = [cand.xpath("can_off")[0].text for cand in cand_list]
senate_cadidate["can_off_sta"] = [cand.xpath("can_off_sta")[0].text for cand in cand_list]
senate_cadidate["can_par_aff"] = [cand.xpath("can_par_aff")[0].text for cand in cand_list]
senate_cadidate["can_inc_cha_ope_sea"] = [cand.xpath("can_inc_cha_ope_sea")[0].text for cand in cand_list]
senate_cadidate["ind_ite_con"] = [cand.xpath("ind_ite_con")[0].text for cand in cand_list]
senate_cadidate["ind_uni_con"] = [cand.xpath("ind_uni_con")[0].text for cand in cand_list]
senate_cadidate["ind_con"] = [cand.xpath("ind_con")[0].text for cand in cand_list]
senate_cadidate["par_com_con"] = [cand.xpath("par_com_con")[0].text for cand in cand_list]
senate_cadidate["oth_com_con"] = [cand.xpath("oth_com_con")[0].text for cand in cand_list]
senate_cadidate["can_con"] = [cand.xpath("can_con")[0].text for cand in cand_list]
senate_cadidate["tot_con"] = [cand.xpath("tot_con")[0].text for cand in cand_list]
senate_cadidate["tra_fro_oth_aut_com"] = [cand.xpath("tra_fro_oth_aut_com")[0].text for cand in cand_list]
senate_cadidate["can_loa"] = [cand.xpath("can_loa")[0].text for cand in cand_list]
senate_cadidate["oth_loa"] = [cand.xpath("oth_loa")[0].text for cand in cand_list]
senate_cadidate["tot_loa"] = [cand.xpath("tot_loa")[0].text for cand in cand_list]
senate_cadidate["off_to_ope_exp"] = [cand.xpath("off_to_ope_exp")[0].text for cand in cand_list]
senate_cadidate["off_to_fun"] = [cand.xpath("off_to_fun")[0].text for cand in cand_list]
senate_cadidate["off_to_leg_acc"] = [cand.xpath("off_to_leg_acc")[0].text for cand in cand_list]
senate_cadidate["oth_rec"] = [cand.xpath("oth_rec")[0].text for cand in cand_list]
senate_cadidate["tot_rec"] = [cand.xpath("tot_rec")[0].text for cand in cand_list]
senate_cadidate["ope_exp"] = [cand.xpath("ope_exp")[0].text for cand in cand_list]
senate_cadidate["fun_dis"] = [cand.xpath("fun_dis")[0].text for cand in cand_list]
senate_cadidate["exe_leg_acc_dis"] = [cand.xpath("exe_leg_acc_dis")[0].text for cand in cand_list]
senate_cadidate["tra_to_oth_aut_com"] = [cand.xpath("tra_to_oth_aut_com")[0].text for cand in cand_list]
senate_cadidate["can_loa_rep"] = [cand.xpath("can_loa_rep")[0].text for cand in cand_list]
senate_cadidate["oth_loa_rep"] = [cand.xpath("oth_loa_rep")[0].text for cand in cand_list]
senate_cadidate["tot_loa_rep"] = [cand.xpath("tot_loa_rep")[0].text for cand in cand_list]
senate_cadidate["ind_ref"] = [cand.xpath("ind_ref")[0].text for cand in cand_list]
senate_cadidate["par_com_ref"] = [cand.xpath("par_com_ref")[0].text for cand in cand_list]
senate_cadidate["oth_com_ref"] = [cand.xpath("oth_com_ref")[0].text for cand in cand_list]
senate_cadidate["tot_con_ref"] = [cand.xpath("tot_con_ref")[0].text for cand in cand_list]
senate_cadidate["oth_dis"] = [cand.xpath("oth_dis")[0].text for cand in cand_list]
senate_cadidate["tot_dis"] = [cand.xpath("tot_dis")[0].text for cand in cand_list]
senate_cadidate["cas_on_han_beg_of_per"] = [cand.xpath("cas_on_han_beg_of_per")[0].text for cand in cand_list]
senate_cadidate["cas_on_han_clo_of_per"] = [cand.xpath("cas_on_han_clo_of_per")[0].text for cand in cand_list]
senate_cadidate["net_con"] = [cand.xpath("net_con")[0].text for cand in cand_list]
senate_cadidate["net_ope_exp"] = [cand.xpath("net_ope_exp")[0].text for cand in cand_list]
senate_cadidate["deb_owe_by_com"] = [cand.xpath("deb_owe_by_com")[0].text for cand in cand_list]
senate_cadidate["deb_owe_to_com"] = [cand.xpath("deb_owe_to_com")[0].text for cand in cand_list]
senate_cadidate["cov_sta_dat"] = [cand.xpath("cov_sta_dat")[0].text for cand in cand_list]
senate_cadidate["cov_end_dat"] = [cand.xpath("cov_end_dat")[0].text for cand in cand_list]
senate_cadidate
<class 'pandas.core.frame.DataFrame'>
Int64Index: 412 entries, 0 to 411
Data columns (total 44 columns):
lin_ima                  412 non-null values
can_id                   412 non-null values
can_nam                  412 non-null values
can_off                  412 non-null values
can_off_sta              412 non-null values
can_par_aff              412 non-null values
can_inc_cha_ope_sea      411 non-null values
ind_ite_con              197 non-null values
ind_uni_con              186 non-null values
ind_con                  204 non-null values
par_com_con              37 non-null values
oth_com_con              116 non-null values
can_con                  101 non-null values
tot_con                  220 non-null values
tra_fro_oth_aut_com      61 non-null values
can_loa                  103 non-null values
oth_loa                  11 non-null values
tot_loa                  102 non-null values
off_to_ope_exp           104 non-null values
off_to_fun               0 non-null values
off_to_leg_acc           0 non-null values
oth_rec                  76 non-null values
tot_rec                  223 non-null values
ope_exp                  221 non-null values
fun_dis                  0 non-null values
exe_leg_acc_dis          0 non-null values
tra_to_oth_aut_com       22 non-null values
can_loa_rep              34 non-null values
oth_loa_rep              4 non-null values
tot_loa_rep              38 non-null values
ind_ref                  117 non-null values
par_com_ref              4 non-null values
oth_com_ref              47 non-null values
tot_con_ref              121 non-null values
oth_dis                  86 non-null values
tot_dis                  221 non-null values
cas_on_han_beg_of_per    69 non-null values
cas_on_han_clo_of_per    192 non-null values
net_con                  215 non-null values
net_ope_exp              217 non-null values
deb_owe_by_com           108 non-null values
deb_owe_to_com           4 non-null values
cov_sta_dat              231 non-null values
cov_end_dat              231 non-null values
dtypes: object(44)
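The column-by-column assignments above all follow one pattern: take a child tag of each `<can_sum>` element and read its text. A minimal sketch of doing this in a single pass is shown here. It is written for Python 3 and uses the standard library's `xml.etree.ElementTree` as a stand-in for the `lxml` parser used in the notebook; the sample XML, its `<data>` root, and the helper name `candidate_records` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-candidate sample that mimics the <can_sum> layout of the feed.
SAMPLE_XML = """<data>
  <title>Candidate Summary</title>
  <can_sum><can_id>S0XX00001</can_id><can_nam>DOE, JANE</can_nam><can_off>S</can_off><tot_rec>$1,000.00</tot_rec></can_sum>
  <can_sum><can_id>H0XX00002</can_id><can_nam>ROE, RICHARD</can_nam><can_off>H</can_off><tot_rec></tot_rec></can_sum>
</data>"""


def candidate_records(xml_text, office="S"):
    """Return one dict per <can_sum> element whose <can_off> equals `office`."""
    root = ET.fromstring(xml_text)
    records = []
    for cand in root.iter("can_sum"):
        if cand.findtext("can_off") != office:
            continue
        # Map every child tag to its text; empty elements yield None.
        records.append({child.tag: child.text for child in cand})
    return records


records = candidate_records(SAMPLE_XML)
print(records)
```

Passing such a list of dicts to `pd.DataFrame(records)` gives one column per tag, so new tags in the feed would be picked up without further code.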
def get_state_data(candidates):
    data_set = get_2014_data(candidates)
    # Sum the YouTube counters per candidate
    t_ds = pd.pivot_table(data_set, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
                          aggfunc='sum', rows="candidate_name")
    # Likes as a fraction of all likes and dislikes
    t_ds["like_dislike_r"] = t_ds["likeCount"] / (t_ds["dislikeCount"] + t_ds["likeCount"])
    # Each candidate's share of the totals across the candidates passed in
    t_ds["views_share"] = t_ds["viewCount"] / t_ds["viewCount"].sum()
    t_ds["msgs_share"] = t_ds["commentCount"] / t_ds["commentCount"].sum()
    t_ds["likes_share"] = t_ds["likeCount"] / t_ds["likeCount"].sum()
    t_ds["dislikes_share"] = t_ds["dislikeCount"] / t_ds["dislikeCount"].sum()
    print t_ds
    return t_ds

def fix_name(val_name):
    # Turn "LAST, FIRST MIDDLE" into "First Last"
    val_names = val_name.split(", ")
    return "%s %s" % (val_names[1].split(" ")[0].capitalize(), val_names[0].capitalize())
values_list = []
for index, state in zip(class_2_senators.index, class_2_senators["state"]):
    print "%s: %s" % (state,
                      class_2_senators["member_full"][index])
    candidates = senate_cadidate[senate_cadidate["can_off_sta"]==state]
    # Build the mask from the same frame so its index matches the rows being filtered
    candidates = candidates[~candidates["tot_rec"].isnull()]
    # Drop the leading "$" and the thousands separators before converting to float
    candidates["tot_rec_num"] = candidates["tot_rec"].apply(lambda x: x[1:].replace(",","")).astype(np.float64)
    # Keep the two candidates with the largest total receipts
    top_candidates = candidates.sort("tot_rec_num", ascending=False)[:2][["can_nam",
                                                                          "can_par_aff",
                                                                          "can_inc_cha_ope_sea",
                                                                          "tot_rec_num",
                                                                          "can_off_sta"]]
    top_candidates["full_name"] = [fix_name(name) for name in top_candidates.values[:,0]]
    top_candidates = top_candidates.sort("full_name")
    print top_candidates["full_name"]
    try:
        ds = get_state_data([fix_name(name) for name in top_candidates.values[:,0]])
        ds["state"] = state
        ds["party"] = top_candidates["can_par_aff"].values
        ds["donations"] = top_candidates["tot_rec_num"].values
        values_list.append(ds)
    except Exception:
        print "NA"
sentate_2014 = pd.concat(values_list)
sentate_2014
AK: Begich (D-AK) 359 Dan Sullivan 27 Mark Begich Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Dan Sullivan 228 90 0 496 189278 0.846416 0.644151 Mark Begich 65 96 0 157 104563 0.620553 0.355849 msgs_share likes_share dislikes_share candidate_name Dan Sullivan 0.778157 0.759571 0.483871 Mark Begich 0.221843 0.240429 0.516129 AL: Sessions (R-AL) 335 Jeff Sessions Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Jeff Sessions 137 10 0 29 4800 0.74359 1 msgs_share likes_share dislikes_share candidate_name Jeff Sessions 1 1 1 AR: Pryor (D-AR) 290 Mark Pryor 89 Thomas Cotton Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Mark Pryor 139 37 0 175 228610 0.825472 0.263103 Thomas Cotton 152 73 0 270 640288 0.787172 0.736897 msgs_share likes_share dislikes_share candidate_name Mark Pryor 0.477663 0.393258 0.336364 Thomas Cotton 0.522337 0.606742 0.663636 CO: Udall (D-CO) 137 Cory Gardner 375 Mark Udall Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Cory Gardner 327 157 0 467 282673 0.748397 0.659577 Mark Udall 263 455 0 411 145894 0.474596 0.340423 msgs_share likes_share dislikes_share candidate_name Cory Gardner 0.554237 0.531891 0.256536 Mark Udall 0.445763 0.468109 0.743464 DE: Coons (D-DE) 85 Christopher Coons 379 Kevin Wade Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Christopher Coons 200 20 0 110 20040 0.846154 0.54844 Kevin Wade 110 30 0 190 16500 0.863636 0.45156 msgs_share likes_share dislikes_share candidate_name Christopher Coons 0.645161 0.366667 0.4 Kevin Wade 0.354839 0.633333 0.6 GA: Chambliss (R-GA) 197 John Kingston 261 
Mary Nunn Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name John Kingston 113 4 0 60 2597 0.937500 0.727247 Mary Nunn 70 8 0 18 974 0.692308 0.272753 msgs_share likes_share dislikes_share candidate_name John Kingston 0.617486 0.769231 0.333333 Mary Nunn 0.382514 0.230769 0.666667 IA: Harkin (D-IA) 43 Bruce Braley 180 Mark Jacobs Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Bruce Braley 260 70 0 278 87802 0.798851 0.056535 Mark Jacobs 7341 1053 0 66086 1465259 0.984316 0.943465 msgs_share likes_share dislikes_share candidate_name Bruce Braley 0.034206 0.004189 0.062333 Mark Jacobs 0.965794 0.995811 0.937667 ID: Risch (R-ID) 253 Briane Mitchell 306 James Risch Name: full_name, dtype: object NA IL: Durbin (D-IL) 264 James Oberweis 115 Richard Durbin Name: full_name, dtype: object NA KS: Roberts (R-KS) 403 Milton Wolf 307 Pat Roberts Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Milton Wolf 20 20 0 20 5210 0.500000 0.039943 Pat Roberts 488 98 0 1000 125227 0.910747 0.960057 msgs_share likes_share dislikes_share candidate_name Milton Wolf 0.03937 0.019608 0.169492 Pat Roberts 0.96063 0.980392 0.830508 KY: McConnell (R-KY) 152 Alison Grimes 239 Mitch Mcconnell Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Alison Grimes 1362 376 0 2797 706090 0.881500 0.432752 Mitch Mcconnell 2247 291 0 4839 925538 0.943275 0.567248 msgs_share likes_share dislikes_share candidate_name Alison Grimes 0.37739 0.366291 0.563718 Mitch Mcconnell 0.62261 0.633709 0.436282 LA: Landrieu (D-LA) 206 Mary Landrieu 69 William Cassidy Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ 
candidate_name Mary Landrieu 744 79 0 2113 407306 0.963960 0.984916 William Cassidy 84 1 0 94 6238 0.989474 0.015084 msgs_share likes_share dislikes_share candidate_name Mary Landrieu 0.898551 0.957408 0.9875 William Cassidy 0.101449 0.042592 0.0125 MA: Markey (D-MA) 231 Edward Markey 146 Gabriel Gomez Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Edward Markey 0 10 0 70 7994 0.875000 0.004272 Gabriel Gomez 920 515 0 1235 1863296 0.705714 0.995728 msgs_share likes_share dislikes_share candidate_name Edward Markey 0 0.05364 0.019048 Gabriel Gomez 1 0.94636 0.980952 ME: Collins (R-ME) 29 Shenna Bellows 80 Susan Collins Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Shenna Bellows 15 0 0 6 57891 1.00000 0.163068 Susan Collins 90 18 0 351 297121 0.95122 0.836932 msgs_share likes_share dislikes_share candidate_name Shenna Bellows 0.142857 0.016807 0 Susan Collins 0.857143 0.983193 1 MI: Levin (D-MI) 280 Gary Peters 205 Terri Land Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Gary Peters 89 45 0 424 37748 0.904051 0.012098 Terri Land 130 558 0 1784 3082566 0.761742 0.987902 msgs_share likes_share dislikes_share candidate_name Gary Peters 0.406393 0.192029 0.074627 Terri Land 0.593607 0.807971 0.925373 MN: Franken (D-MN) 133 Al Franken 242 Michael Mcfadden Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Al Franken 186 79 0 235 262735 0.748408 0.894219 Michael Mcfadden 57 40 0 106 31080 0.726027 0.105781 msgs_share likes_share dislikes_share candidate_name Al Franken 0.765432 0.68915 0.663866 Michael Mcfadden 0.234568 0.31085 0.336134 MS: Cochran (R-MS) 241 Christopher Mcdaniel 79 Thad Cochran Name: full_name, dtype: object 
commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Christopher Mcdaniel 0 0 0 6 1062 1.000000 0.061698 Thad Cochran 109 10 0 55 16151 0.846154 0.938302 msgs_share likes_share dislikes_share candidate_name Christopher Mcdaniel 0 0.098361 0 Thad Cochran 1 0.901639 1 MT: Walsh (D-MT) 384 John Walsh 104 Steven Daines Name: full_name, dtype: object NA NC: Hagan (D-NC) 155 Kay Hagan 370 Thom Tillis Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Kay Hagan 278 128 0 271 51704 0.679198 0.056229 Thom Tillis 231 206 0 561 867825 0.731421 0.943771 msgs_share likes_share dislikes_share candidate_name Kay Hagan 0.546169 0.325721 0.383234 Thom Tillis 0.453831 0.674279 0.616766 NE: Johanns (R-NE) 325 Benjamin Sasse 111 Sid Dinsdale Name: full_name, dtype: object NA NH: Shaheen (D-NH) 336 Jeanne Shaheen 50 Scott Brown Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Jeanne Shaheen 30 34 0 84 37123 0.711864 0.085959 Scott Brown 721 96 0 2465 394746 0.962515 0.914041 msgs_share likes_share dislikes_share candidate_name Jeanne Shaheen 0.039947 0.032954 0.261538 Scott Brown 0.960053 0.967046 0.738462 NJ: Booker (D-NJ) 36 Cory Booker 272 Frank Pallone Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Cory Booker 299 43 0 688 93738 0.941176 0.997956 Frank Pallone 1 0 0 0 192 inf 0.002044 msgs_share likes_share dislikes_share candidate_name Cory Booker 0.996667 1 1 Frank Pallone 0.003333 0 0 NM: Udall (D-NM) 391 Allen Weh 376 Tom Udall Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Allen Weh 800 70 0 250 1950780 0.781250 0.987187 Tom Udall 700 40 0 670 25320 0.943662 0.012813 msgs_share 
likes_share dislikes_share candidate_name Allen Weh 0.533333 0.271739 0.636364 Tom Udall 0.466667 0.728261 0.363636 OK: Inhofe (R-OK) 178 James Inhofe 207 James Lankford Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name James Inhofe 1927 412 0 2116 172284 0.837025 0.985471 James Lankford 30 10 0 10 2540 0.500000 0.014529 msgs_share likes_share dislikes_share candidate_name James Inhofe 0.98467 0.995296 0.976303 James Lankford 0.01533 0.004704 0.023697 OR: Merkley (D-OR) 249 Jeffrey Merkley 392 Monica Wehby Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Jeffrey Merkley 156 36 0 392 43632 0.915888 0.310716 Monica Wehby 135 305 0 660 96792 0.683938 0.689284 msgs_share likes_share dislikes_share candidate_name Jeffrey Merkley 0.536082 0.372624 0.105572 Monica Wehby 0.463918 0.627376 0.894428 RI: Reed (D-RI) 301 Jack Reed 410 Mark Zaccaria Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Jack Reed 70 9 0 180 41526 0.952381 0.945277 Mark Zaccaria 10 0 0 0 2404 inf 0.054723 msgs_share likes_share dislikes_share candidate_name Jack Reed 0.875 1 1 Mark Zaccaria 0.125 0 0 SC: Graham (R-SC) 148 Lindsey Graham 333 Timothy Scott Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Lindsey Graham 1274 184 0 1555 125933 0.894192 0.877735 Timothy Scott 198 0 0 332 17542 1.000000 0.122265 msgs_share likes_share dislikes_share candidate_name Lindsey Graham 0.865489 0.824059 1 Timothy Scott 0.134511 0.175941 0 SD: Johnson (D-SD) 40 Annette Bosworth 315 Marion Rounds Name: full_name, dtype: object NA TN: Alexander (R-TN) 131 George Flinn 8 Lamar Alexander Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount 
like_dislike_r views_share \ candidate_name George Flinn 279 27 0 51 5247 0.653846 0.437505 Lamar Alexander 37 4 0 49 6746 0.924528 0.562495 msgs_share likes_share dislikes_share candidate_name George Flinn 0.882911 0.51 0.870968 Lamar Alexander 0.117089 0.49 0.129032 TX: Cornyn (R-TX) 7 David Alameel 88 John Cornyn Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name David Alameel 3 2 0 2 105 0.500000 0.004871 John Cornyn 166 20 0 260 21450 0.928571 0.995129 msgs_share likes_share dislikes_share candidate_name David Alameel 0.017751 0.007634 0.090909 John Cornyn 0.982249 0.992366 0.909091 VA: Warner (D-VA) 140 Edward Gillespie 386 Mark Warner Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Edward Gillespie 2 0 0 8 434 1.000000 0.021218 Mark Warner 53 9 0 44 20020 0.830189 0.978782 msgs_share likes_share dislikes_share candidate_name Edward Gillespie 0.036364 0.153846 0 Mark Warner 0.963636 0.846154 1 WV: Rockefeller (D-WV) 366 Natalie Tennant 61 Shelley Capito Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Natalie Tennant 350 271 0 830 149271 0.753860 0.945921 Shelley Capito 48 24 0 152 8534 0.863636 0.054079 msgs_share likes_share dislikes_share candidate_name Natalie Tennant 0.879397 0.845214 0.918644 Shelley Capito 0.120603 0.154786 0.081356 WY: Enzi (R-WY) 72 Elizabeth Cheney 121 Michael Enzi Name: full_name, dtype: object commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Elizabeth Cheney 48 0 0 30 4482 1 0.749875 Michael Enzi 25 0 0 70 1495 1 0.250125 msgs_share likes_share dislikes_share candidate_name Elizabeth Cheney 0.657534 0.3 inf Michael Enzi 0.342466 0.7 inf
<class 'pandas.core.frame.DataFrame'>
Index: 55 entries, Dan Sullivan to Michael Enzi
Data columns (total 13 columns):
commentCount      55 non-null values
dislikeCount      55 non-null values
favoriteCount     55 non-null values
likeCount         55 non-null values
viewCount         55 non-null values
like_dislike_r    55 non-null values
views_share       55 non-null values
msgs_share        55 non-null values
likes_share       55 non-null values
dislikes_share    55 non-null values
state             55 non-null values
party             55 non-null values
donations         55 non-null values
dtypes: float64(6), int64(5), object(2)
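Two steps in the loop above are fragile: the receipts string is converted by slicing off its first character, and the ratio and share columns divide by sums that can be zero, which is where the `inf` entries in the printed tables come from. The following is a minimal sketch of more defensive versions, written for Python 3. The helper names `parse_money` and `safe_ratio` are hypothetical, and the handling of negative amounts is an assumption about how such values might be written, not something confirmed by the feed.

```python
def parse_money(text):
    """Convert a string such as "$1,234.50" to a float; missing values stay None."""
    if text is None:
        return None
    cleaned = text.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return None
    # Assumption: negative amounts may be written as "-$5.00" or "($5.00)".
    negative = cleaned.startswith("-") or (
        cleaned.startswith("(") and cleaned.endswith(")"))
    value = float(cleaned.strip("-()"))
    return -value if negative else value


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


print(parse_money("$1,234.50"))
print(safe_ratio(0, 0))
```

Returning `None` rather than `inf` keeps undefined ratios out of later comparisons such as `views_share > 0.5`.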
class_2_senators["state"]
4     AK
84    AL
73    AR
91    CO
22    DE
17    GA
38    IA
76    ID
28    IL
77    KS
62    KY
54    LA
59    MA
21    ME
57    MI
33    MN
20    MS
94    MT
37    NC
47    NE
85    NH
8     NJ
92    NM
45    OK
64    OR
74    RI
35    SC
49    SD
0     TN
24    TX
95    VA
78    WV
29    WY
Name: state, dtype: object
x_column = "views_share"
y_column = "viewCount"
s_column = "donations"
color_dict = {"DEM": "b", "REP": "r", "IND":"g", "NPA": "g", "DFL": "g"}
plt.figure(figsize=(18,12))
for party in sentate_2014["party"].unique():
    cands = sentate_2014[sentate_2014["party"]==party]
    x = cands[x_column]
    y = cands[y_column]
    # Marker size is proportional to donations
    size = cands[s_column] / 3000000
    plt.scatter(x,y, s=(np.array(size)) * 1000, c=color_dict[party], alpha=0.5)
print plt.ylim()[1]
plt.vlines(0.5, ymin=1, ymax=plt.ylim()[1]*0.9)
projected_winners = sentate_2014[sentate_2014[x_column]>0.5]["party"].value_counts()
result_text = []
# Label every point with its state
for item in sentate_2014.iterrows():
    plt.annotate(item[1]["state"], xy=(item[1][x_column], item[1][y_column]))
# List the candidates holding more than half of their state's views
for item in sentate_2014[sentate_2014[x_column]>0.5].iterrows():
    result_text += ["%s: %s (%s) - %0.1f%%" % (item[1]["state"], item[0], item[1]["party"], item[1]["views_share"] * 100.)]
result_text = "\n".join(result_text)
projected_winners = "\n".join(["%s:%s" % (party, value) for party, value in zip(projected_winners.index, projected_winners.values)])
plt.annotate(projected_winners, xy=(.65,plt.ylim()[1]*0.8))
plt.annotate(result_text, xy=(.8, 1.5))
plt.xlabel(x_column)
plt.ylabel(y_column + " (Log Scale)")
plt.grid()
plt.yscale("log")
#plt.axis("tight")
plt.title("Senate 2014 Elections Forecast (Size is relative and represents the amount of donations)")
plt.show()
3500000.0
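The plot applies a simple decision rule: within each state, the candidate holding more than half of the views is treated as the projected winner, and winners are then tallied by party. A minimal sketch of that rule in plain Python 3 follows. The rows, state codes, candidate names, and the helper name `project_winners` are hypothetical and exist only to illustrate the rule.

```python
from collections import Counter, defaultdict

# Hypothetical (state, candidate, party, view_count) rows for illustration only.
ROWS = [
    ("AA", "Candidate A", "DEM", 600),
    ("AA", "Candidate B", "REP", 400),
    ("BB", "Candidate C", "REP", 900),
    ("BB", "Candidate D", "DEM", 100),
    ("CC", "Candidate E", "DEM", 50),
    ("CC", "Candidate F", "REP", 50),
]


def project_winners(rows, threshold=0.5):
    """Return (state, candidate, party, share) for shares above `threshold`."""
    totals = defaultdict(int)
    for state, _, _, views in rows:
        totals[state] += views
    winners = []
    for state, name, party, views in rows:
        # A state with no views at all produces no winner.
        share = views / totals[state] if totals[state] else 0.0
        if share > threshold:
            winners.append((state, name, party, share))
    return winners


winners = project_winners(ROWS)
tally = Counter(party for _, _, party, _ in winners)
print(winners)
print(tally)
```

With a strict `>` comparison, an exact 50/50 split (state "CC" in the sample) yields no projected winner, which matches the filter `views_share > 0.5` used in the notebook.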
sentate_2014[sentate_2014[x_column]>0.5]
candidate_name | commentCount | dislikeCount | favoriteCount | likeCount | viewCount | like_dislike_r | views_share | msgs_share | likes_share | dislikes_share | state | party | donations |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Dan Sullivan | 360 | 94 | 0 | 623 | 165715 | 0.868898 | 0.613568 | 0.814480 | 0.774876 | 0.494737 | AK | DEM | 6340422.00 |
Jeff Sessions | 142 | 10 | 0 | 36 | 5114 | 0.782609 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | AL | REP | 1115688.00 |
Thomas Cotton | 100 | 91 | 0 | 195 | 797919 | 0.681818 | 0.814496 | 0.434783 | 0.537190 | 0.722222 | AR | REP | 7097224.06 |
Cory Gardner | 316 | 167 | 0 | 450 | 234223 | 0.729335 | 0.628876 | 0.633267 | 0.530660 | 0.268058 | CO | DEM | 10420571.00 |
Christopher Coons | 190 | 20 | 0 | 110 | 19950 | 0.846154 | 0.559278 | 0.627063 | 0.379310 | 0.400000 | DE | DEM | 4173447.00 |
John Kingston | 37 | 2 | 0 | 17 | 717 | 0.894737 | 0.540724 | 0.451220 | 0.447368 | 1.000000 | GA | DEM | 9211931.00 |
Mark Jacobs | 7334 | 1052 | 0 | 66064 | 1464016 | 0.984326 | 0.943058 | 0.959571 | 0.995765 | 0.932624 | IA | REP | 4810813.00 |
Pat Roberts | 256 | 83 | 0 | 374 | 97678 | 0.818381 | 0.943867 | 0.733524 | 0.869767 | 0.768519 | KS | REP | 1068018.00 |
Mitch Mcconnell | 2197 | 272 | 0 | 4760 | 893077 | 0.945946 | 0.569208 | 0.632230 | 0.640387 | 0.428346 | KY | DEM | 11353760.00 |
Mary Landrieu | 745 | 84 | 0 | 2015 | 395161 | 0.959981 | 0.986888 | 0.937107 | 0.971084 | 1.000000 | LA | DEM | 10190144.00 |
Gabriel Gomez | 899 | 514 | 0 | 1239 | 1855996 | 0.706788 | 0.995771 | 1.000000 | 0.946524 | 0.980916 | MA | REP | 4755654.00 |
Susan Collins | 81 | 9 | 0 | 351 | 291078 | 0.975000 | 0.834797 | 0.658537 | 0.983193 | 1.000000 | ME | DEM | 1333016.00 |
Terri Land | 110 | 560 | 0 | 1660 | 3077280 | 0.747748 | 0.988351 | 0.578947 | 0.783019 | 0.918033 | MI | DEM | 6994603.00 |
Al Franken | 141 | 71 | 0 | 208 | 260800 | 0.745520 | 0.897435 | 0.762162 | 0.753623 | 0.639640 | MN | DFL | 15126268.00 |
Thad Cochran | 74 | 10 | 0 | 35 | 15199 | 0.777778 | 0.955972 | 1.000000 | 0.897436 | 1.000000 | MS | REP | 2727209.00 |
Thom Tillis | 253 | 194 | 0 | 586 | 797943 | 0.751282 | 0.950310 | 0.496078 | 0.716381 | 0.619808 | NC | REP | 4764110.00 |
Scott Brown | 708 | 92 | 0 | 2456 | 388637 | 0.963893 | 0.916956 | 0.964578 | 0.967310 | 0.736000 | NH | REP | 3686708.00 |
Cory Booker | 355 | 60 | 0 | 894 | 130143 | 0.937107 | 0.998527 | 0.997191 | 1.000000 | 1.000000 | NJ | DEM | 16167874.00 |
Allen Weh | 821 | 71 | 0 | 256 | 1949518 | 0.782875 | 0.987909 | 0.539776 | 0.279476 | 0.702970 | NM | DEM | 5050539.00 |
James Inhofe | 1905 | 406 | 0 | 2094 | 170226 | 0.837600 | 0.981288 | 0.968480 | 0.984023 | 0.966667 | OK | REP | 2811701.00 |
Monica Wehby | 115 | 295 | 0 | 660 | 95280 | 0.691099 | 0.702572 | 0.481172 | 0.662651 | 0.891239 | OR | REP | 2049732.00 |
Jack Reed | 90 | 4 | 0 | 128 | 19412 | 0.969697 | 0.910763 | 0.865385 | 0.969697 | 1.000000 | RI | DEM | 2833802.97 |
Lindsey Graham | 1089 | 135 | 0 | 1412 | 119855 | 0.912734 | 0.891586 | 0.840927 | 0.811494 | 1.000000 | SC | REP | 6788544.00 |
Lamar Alexander | 34 | 4 | 0 | 48 | 6291 | 0.923077 | 0.590538 | 0.161905 | 0.578313 | 0.173913 | TN | REP | 1812250.00 |
John Cornyn | 166 | 20 | 0 | 260 | 21110 | 0.928571 | 0.995943 | 0.982249 | 0.996169 | 0.909091 | TX | DEM | 9673572.00 |
Natalie Tennant | 210 | 110 | 0 | 690 | 159430 | 0.862500 | 0.969238 | 0.860656 | 0.873418 | 0.873016 | WV | REP | 5482547.00 |
Elizabeth Cheney | 30 | 0 | 0 | 20 | 3900 | 1.000000 | 0.722892 | 0.545455 | 0.222222 | inf | WY | REP | 3016825.00 |
commentCount 360 dislikeCount 94 favoriteCount 0 likeCount 623 viewCount 165715 like_dislike_r 0.8688982 views_share 0.6135684 msgs_share 0.8144796 likes_share 0.7748756 dislikes_share 0.4947368 state AK party DEM donations 6340422 Name: Dan Sullivan, dtype: object commentCount 142 dislikeCount 10 favoriteCount 0 likeCount 36 viewCount 5114 like_dislike_r 0.7826087 views_share 1 msgs_share 1 likes_share 1 dislikes_share 1 state AL party REP donations 1115688 Name: Jeff Sessions, dtype: object commentCount 100 dislikeCount 91 favoriteCount 0 likeCount 195 viewCount 797919 like_dislike_r 0.6818182 views_share 0.8144956 msgs_share 0.4347826 likes_share 0.5371901 dislikes_share 0.7222222 state AR party REP donations 7097224 Name: Thomas Cotton, dtype: object commentCount 316 dislikeCount 167 favoriteCount 0 likeCount 450 viewCount 234223 like_dislike_r 0.7293355 views_share 0.6288761 msgs_share 0.6332665 likes_share 0.5306604 dislikes_share 0.2680578 state CO party DEM donations 1.042057e+07 Name: Cory Gardner, dtype: object commentCount 190 dislikeCount 20 favoriteCount 0 likeCount 110 viewCount 19950 like_dislike_r 0.8461538 views_share 0.5592778 msgs_share 0.6270627 likes_share 0.3793103 dislikes_share 0.4 state DE party DEM donations 4173447 Name: Christopher Coons, dtype: object commentCount 37 dislikeCount 2 favoriteCount 0 likeCount 17 viewCount 717 like_dislike_r 0.8947368 views_share 0.540724 msgs_share 0.4512195 likes_share 0.4473684 dislikes_share 1 state GA party DEM donations 9211931 Name: John Kingston, dtype: object commentCount 7334 dislikeCount 1052 favoriteCount 0 likeCount 66064 viewCount 1464016 like_dislike_r 0.9843256 views_share 0.9430583 msgs_share 0.9595708 likes_share 0.9957646 dislikes_share 0.9326241 state IA party REP donations 4810813 Name: Mark Jacobs, dtype: object commentCount 256 dislikeCount 83 favoriteCount 0 likeCount 374 viewCount 97678 like_dislike_r 0.8183807 views_share 0.9438673 msgs_share 0.7335244 likes_share 0.8697674 
dislikes_share 0.7685185 state KS party REP donations 1068018 Name: Pat Roberts, dtype: object commentCount 2197 dislikeCount 272 favoriteCount 0 likeCount 4760 viewCount 893077 like_dislike_r 0.9459459 views_share 0.5692079 msgs_share 0.6322302 likes_share 0.6403875 dislikes_share 0.4283465 state KY party DEM donations 1.135376e+07 Name: Mitch Mcconnell, dtype: object commentCount 745 dislikeCount 84 favoriteCount 0 likeCount 2015 viewCount 395161 like_dislike_r 0.9599809 views_share 0.9868885 msgs_share 0.9371069 likes_share 0.9710843 dislikes_share 1 state LA party DEM donations 1.019014e+07 Name: Mary Landrieu, dtype: object commentCount 899 dislikeCount 514 favoriteCount 0 likeCount 1239 viewCount 1855996 like_dislike_r 0.7067884 views_share 0.9957712 msgs_share 1 likes_share 0.9465241 dislikes_share 0.980916 state MA party REP donations 4755654 Name: Gabriel Gomez, dtype: object commentCount 81 dislikeCount 9 favoriteCount 0 likeCount 351 viewCount 291078 like_dislike_r 0.975 views_share 0.8347974 msgs_share 0.6585366 likes_share 0.9831933 dislikes_share 1 state ME party DEM donations 1333016 Name: Susan Collins, dtype: object commentCount 110 dislikeCount 560 favoriteCount 0 likeCount 1660 viewCount 3077280 like_dislike_r 0.7477477 views_share 0.9883509 msgs_share 0.5789474 likes_share 0.7830189 dislikes_share 0.9180328 state MI party DEM donations 6994603 Name: Terri Land, dtype: object commentCount 141 dislikeCount 71 favoriteCount 0 likeCount 208 viewCount 260800 like_dislike_r 0.7455197 views_share 0.897435 msgs_share 0.7621622 likes_share 0.7536232 dislikes_share 0.6396396 state MN party DFL donations 1.512627e+07 Name: Al Franken, dtype: object commentCount 74 dislikeCount 10 favoriteCount 0 likeCount 35 viewCount 15199 like_dislike_r 0.7777778 views_share 0.9559721 msgs_share 1 likes_share 0.8974359 dislikes_share 1 state MS party REP donations 2727209 Name: Thad Cochran, dtype: object commentCount 253 dislikeCount 194 favoriteCount 0 likeCount 586 
viewCount 797943 like_dislike_r 0.7512821 views_share 0.95031 msgs_share 0.4960784 likes_share 0.7163814 dislikes_share 0.6198083 state NC party REP donations 4764110 Name: Thom Tillis, dtype: object commentCount 708 dislikeCount 92 favoriteCount 0 likeCount 2456 viewCount 388637 like_dislike_r 0.9638932 views_share 0.9169557 msgs_share 0.9645777 likes_share 0.96731 dislikes_share 0.736 state NH party REP donations 3686708 Name: Scott Brown, dtype: object commentCount 355 dislikeCount 60 favoriteCount 0 likeCount 894 viewCount 130143 like_dislike_r 0.9371069 views_share 0.9985269 msgs_share 0.997191 likes_share 1 dislikes_share 1 state NJ party DEM donations 1.616787e+07 Name: Cory Booker, dtype: object commentCount 821 dislikeCount 71 favoriteCount 0 likeCount 256 viewCount 1949518 like_dislike_r 0.7828746 views_share 0.9879091 msgs_share 0.5397765 likes_share 0.279476 dislikes_share 0.7029703 state NM party DEM donations 5050539 Name: Allen Weh, dtype: object commentCount 1905 dislikeCount 406 favoriteCount 0 likeCount 2094 viewCount 170226 like_dislike_r 0.8376 views_share 0.981288 msgs_share 0.9684799 likes_share 0.9840226 dislikes_share 0.9666667 state OK party REP donations 2811701 Name: James Inhofe, dtype: object commentCount 115 dislikeCount 295 favoriteCount 0 likeCount 660 viewCount 95280 like_dislike_r 0.6910995 views_share 0.702572 msgs_share 0.4811715 likes_share 0.6626506 dislikes_share 0.8912387 state OR party REP donations 2049732 Name: Monica Wehby, dtype: object commentCount 90 dislikeCount 4 favoriteCount 0 likeCount 128 viewCount 19412 like_dislike_r 0.969697 views_share 0.9107629 msgs_share 0.8653846 likes_share 0.969697 dislikes_share 1 state RI party DEM donations 2833803 Name: Jack Reed, dtype: object commentCount 1089 dislikeCount 135 favoriteCount 0 likeCount 1412 viewCount 119855 like_dislike_r 0.9127343 views_share 0.8915859 msgs_share 0.8409266 likes_share 0.8114943 dislikes_share 1 state SC party REP donations 6788544 Name: Lindsey 
Graham, dtype: object commentCount 34 dislikeCount 4 favoriteCount 0 likeCount 48 viewCount 6291 like_dislike_r 0.9230769 views_share 0.5905379 msgs_share 0.1619048 likes_share 0.5783133 dislikes_share 0.173913 state TN party REP donations 1812250 Name: Lamar Alexander, dtype: object commentCount 166 dislikeCount 20 favoriteCount 0 likeCount 260 viewCount 21110 like_dislike_r 0.9285714 views_share 0.9959426 msgs_share 0.9822485 likes_share 0.9961686 dislikes_share 0.9090909 state TX party DEM donations 9673572 Name: John Cornyn, dtype: object commentCount 210 dislikeCount 110 favoriteCount 0 likeCount 690 viewCount 159430 like_dislike_r 0.8625 views_share 0.9692383 msgs_share 0.8606557 likes_share 0.8734177 dislikes_share 0.8730159 state WV party REP donations 5482547 Name: Natalie Tennant, dtype: object commentCount 30 dislikeCount 0 favoriteCount 0 likeCount 20 viewCount 3900 like_dislike_r 1 views_share 0.7228916 msgs_share 0.5454545 likes_share 0.2222222 dislikes_share inf state WY party REP donations 3016825 Name: Elizabeth Cheney, dtype: object
len(sentate_2014["state"].unique())
27
import numpy as np

def get_state_data(candidates):
    data_set = get_2012_data(candidates)
    # Sum the per-video counts for each candidate (older pandas used rows= instead of index=)
    t_ds = pd.pivot_table(data_set, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
                          aggfunc='sum', index="candidate_name")
    t_ds["like_dislike_r"] = t_ds["likeCount"] / (t_ds["dislikeCount"] + t_ds["likeCount"])
    t_ds["views_share"] = t_ds["viewCount"] / t_ds["viewCount"].sum()
    t_ds["msgs_share"] = t_ds["commentCount"] / t_ds["commentCount"].sum()
    t_ds["likes_share"] = t_ds["likeCount"] / t_ds["likeCount"].sum()
    t_ds["dislikes_share"] = t_ds["dislikeCount"] / t_ds["dislikeCount"].sum()
    # Sentiment analysis of the video titles: mean TextBlob polarity per candidate
    t_ds["sentiment"] = np.nan
    for cand in candidates:
        t_ds.loc[cand, "sentiment"] = np.mean(
            [TextBlob(title).polarity for title in data_set[data_set["candidate_name"]==cand]["title"]]
        )
    print(t_ds)
    return t_ds
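To make the ratio and share columns concrete, here is a minimal sketch on made-up numbers. The `videos` table and the `share_metrics` helper are hypothetical and only mirror two of the columns computed in `get_state_data`; they are not part of the original notebook.

```python
import pandas as pd

# Hypothetical per-video statistics for two candidates in one race.
videos = pd.DataFrame({
    "candidate_name": ["A", "A", "B"],
    "likeCount": [30, 10, 60],
    "dislikeCount": [5, 5, 0],
    "viewCount": [1000, 500, 500],
})

def share_metrics(videos):
    """Sum per-video counts per candidate, then derive a ratio and a share column."""
    cols = ["likeCount", "dislikeCount", "viewCount"]
    totals = videos.groupby("candidate_name")[cols].sum()
    # Fraction of all ratings (likes + dislikes) that are likes
    totals["like_dislike_r"] = totals["likeCount"] / (totals["likeCount"] + totals["dislikeCount"])
    # Each candidate's part of the total views in this race; the column sums to 1
    totals["views_share"] = totals["viewCount"] / totals["viewCount"].sum()
    return totals

metrics = share_metrics(videos)
print(metrics)
```

Because every share is divided by the total for the race, the shares are only comparable between the candidates passed in together.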
senate_2012 = pd.read_csv("data/2012_senate_results.csv")
senate_2012["Full Name"] = senate_2012["First Name"] + " " + senate_2012["Last Name"]
senate_2012
<class 'pandas.core.frame.DataFrame'>
Int64Index: 126 entries, 0 to 125
Data columns (total 9 columns):
State Postal    126 non-null values
County Name     126 non-null values
Party           126 non-null values
First Name      126 non-null values
Last Name       126 non-null values
Incumbent       126 non-null values
Vote Count      126 non-null values
Winner          33 non-null values
Full Name       126 non-null values
dtypes: int64(2), object(7)
senate_2012["commentCount"] = pd.Series()
senate_2012["dislikeCount"] = pd.Series()
senate_2012["favoriteCount"] = pd.Series()
senate_2012["likeCount"] = pd.Series()
senate_2012["viewCount"] = pd.Series()
senate_2012["like_dislike_r"] = pd.Series()
senate_2012["views_share"] = pd.Series()
senate_2012["msgs_share"] = pd.Series()
senate_2012["likes_share"] = pd.Series()
senate_2012["dislikes_share"] = pd.Series()
senate_2012["sentiment"] = pd.Series()
# Columns filled in from the YouTube statistics
stat_cols = ["commentCount", "dislikeCount", "favoriteCount", "likeCount", "viewCount",
             "like_dislike_r", "views_share", "msgs_share", "likes_share", "dislikes_share", "sentiment"]
for state in np.unique(senate_2012["State Postal"]):
    print(state + ":")
    cands = senate_2012[senate_2012["State Postal"] == state]
    # Keep the two candidates with the most votes (sort_values replaces the older DataFrame.sort)
    top_cands = cands.sort_values("Vote Count", ascending=False)[:2]
    try:
        youtube_stats = get_state_data(top_cands["Full Name"].values)
        # Store the statistics back on each candidate's row
        for cand, stats in youtube_stats.iterrows():
            # First row whose full name matches this candidate
            index = senate_2012[senate_2012["Full Name"] == cand].index[0]
            for col in stat_cols:
                senate_2012.loc[index, col] = stats[col]
    except Exception:
        # States whose lookup fails are skipped silently
        pass
AZ: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Jeff Flake 542 1240 0 2927 1829090 0.702424 0.58453 Richard Carmona 513 2112 0 4590 1300075 0.684870 0.41547 msgs_share likes_share dislikes_share sentiment candidate_name Jeff Flake 0.513744 0.389384 0.369928 0.016247 Richard Carmona 0.486256 0.610616 0.630072 0.044413 CA: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Dianne Feinstein 12130 1690 0 16760 3165370 0.908401 0.7182 Elizabeth Emken 4492 437 0 6777 1241994 0.939423 0.2818 msgs_share likes_share dislikes_share sentiment candidate_name Dianne Feinstein 0.729756 0.71207 0.794546 -0.000463 Elizabeth Emken 0.270244 0.28793 0.205454 -0.006897 CT: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Chris Murphy 233 88 0 447 159171 0.835514 0.371103 Linda McMahon 3962 418 0 4062 269742 0.906696 0.628897 msgs_share likes_share dislikes_share sentiment candidate_name Chris Murphy 0.055542 0.099135 0.173913 0.004893 Linda McMahon 0.944458 0.900865 0.826087 0.029649 DE: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Kevin Wade 60 10 0 70 11400 0.875 0.276123 Thomas Carper 28 10 0 70 29886 0.875 0.723877 msgs_share likes_share dislikes_share sentiment candidate_name Kevin Wade 0.681818 0.5 0.5 -0.031250 Thomas Carper 0.318182 0.5 0.5 0.018583 FL: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Bill Nelson 376 59 0 937 511466 0.940763 0.865892 Connie Mack 164 67 0 249 79215 0.787975 0.134108 msgs_share likes_share dislikes_share sentiment candidate_name Bill Nelson 0.696296 0.790051 0.468254 0.021819 Connie Mack 0.303704 0.209949 0.531746 0.015071 HI: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Linda Lingle 343 451 0 672 480724 
0.598397 0.476687 Mazie Hirono 367 577 0 924 527744 0.615590 0.523313 msgs_share likes_share dislikes_share sentiment candidate_name Linda Lingle 0.483099 0.421053 0.438716 0.023237 Mazie Hirono 0.516901 0.578947 0.561284 0.065901 IN: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Joe Donnelly 3011 931 0 1203 2646352 0.563730 0.756489 Richard Mourdock 7869 1467 0 8777 851850 0.856794 0.243511 msgs_share likes_share dislikes_share sentiment candidate_name Joe Donnelly 0.276746 0.120541 0.38824 -0.009662 Richard Mourdock 0.723254 0.879459 0.61176 0.043522 MA: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Elizabeth Warren 10007 2208 0 15744 2458138 0.877005 0.718429 Scott Brown 5226 1259 0 6490 963410 0.837527 0.281571 msgs_share likes_share dislikes_share sentiment candidate_name Elizabeth Warren 0.656929 0.708105 0.636862 0.014748 Scott Brown 0.343071 0.291895 0.363138 0.019217 MD: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Ben Cardin 301 90 0 386 61822 0.810924 0.82694 Daniel Bongino 162 10 0 162 12938 0.941860 0.17306 msgs_share likes_share dislikes_share sentiment candidate_name Ben Cardin 0.650108 0.70438 0.9 0.008083 Daniel Bongino 0.349892 0.29562 0.1 -0.018750 ME: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Angus King 91 133 0 385 39436 0.743243 0.911689 Charles Summers 0 10 0 40 3820 0.800000 0.088311 msgs_share likes_share dislikes_share sentiment candidate_name Angus King 1 0.905882 0.93007 0.053315 Charles Summers 0 0.094118 0.06993 0.016667 MI: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Debbie Stabenow 63 83 0 595 428223 0.877581 0.870664 Pete Hoekstra 57 187 0 398 63612 0.680342 0.129336 msgs_share likes_share dislikes_share sentiment candidate_name Debbie 
Stabenow 0.525 0.599194 0.307407 0.062909 Pete Hoekstra 0.475 0.400806 0.692593 0.121008 MN: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Amy Klobuchar 264 242 0 430 97354 0.639881 0.659651 Kurt Bills 300 100 0 460 50230 0.821429 0.340349 msgs_share likes_share dislikes_share sentiment candidate_name Amy Klobuchar 0.468085 0.483146 0.707602 0.030468 Kurt Bills 0.531915 0.516854 0.292398 0.160000 MO: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Claire McCaskill 525 438 0 765 141427 0.635910 0.036271 Todd Akin 44285 5817 0 66830 3757741 0.919928 0.963729 msgs_share likes_share dislikes_share sentiment candidate_name Claire McCaskill 0.011716 0.011317 0.070024 0.009500 Todd Akin 0.988284 0.988683 0.929976 -0.016794 MS: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Albert Gore 370 0 0 260 75470 1.000000 0.70684 Roger Wicker 120 93 0 203 31301 0.685811 0.29316 msgs_share likes_share dislikes_share sentiment candidate_name Albert Gore 0.755102 0.561555 0 -0.050000 Roger Wicker 0.244898 0.438445 1 0.029821 MT: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Denny Rehberg 4500 351 0 10651 390033 0.968097 0.807661 Jon Tester 480 72 0 1172 92884 0.942122 0.192339 msgs_share likes_share dislikes_share sentiment candidate_name Denny Rehberg 0.903614 0.900871 0.829787 0.003883 Jon Tester 0.096386 0.099129 0.170213 0.023900 ND: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Heidi Heitkamp 315 289 0 726 773197 0.715271 0.574998 Rick Berg 331 200 0 573 571499 0.741268 0.425002 msgs_share likes_share dislikes_share sentiment candidate_name Heidi Heitkamp 0.487616 0.558891 0.591002 0.007323 Rick Berg 0.512384 0.441109 0.408998 0.033317 NE: commentCount dislikeCount favoriteCount likeCount viewCount 
like_dislike_r views_share \ candidate_name Bob Kerrey 328 179 0 1526 526859 0.895015 0.616558 Deb Fischer 441 272 0 2076 327657 0.884157 0.383442 msgs_share likes_share dislikes_share sentiment candidate_name Bob Kerrey 0.426528 0.423654 0.396896 0.015867 Deb Fischer 0.573472 0.576346 0.603104 0.011278 NJ: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Bob Menendez 601 232 0 819 132137 0.779258 0.777057 Joe Kyrillos 87 189 0 198 37911 0.511628 0.222943 msgs_share likes_share dislikes_share sentiment candidate_name Bob Menendez 0.873547 0.80531 0.551069 -0.017692 Joe Kyrillos 0.126453 0.19469 0.448931 0.083939 NM: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Heather Wilson 170 330 0 460 112560 0.582278 0.116505 Martin Heinrich 380 790 0 2580 853580 0.765579 0.883495 msgs_share likes_share dislikes_share sentiment candidate_name Heather Wilson 0.309091 0.151316 0.294643 0.064444 Martin Heinrich 0.690909 0.848684 0.705357 0.058437 NV: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Dean Heller 286 748 0 1077 1014221 0.590137 0.429432 Shelley Berkley 434 402 0 884 1347552 0.687403 0.570568 msgs_share likes_share dislikes_share sentiment candidate_name Dean Heller 0.397222 0.54921 0.650435 -0.003463 Shelley Berkley 0.602778 0.45079 0.349565 0.005413 NY: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Kirsten Gillibrand 684 167 0 1452 399903 0.896850 0.638157 Wendy Long 494 83 0 740 226750 0.899149 0.361843 msgs_share likes_share dislikes_share sentiment candidate_name Kirsten Gillibrand 0.580645 0.662409 0.668 0.030444 Wendy Long 0.419355 0.337591 0.332 0.074461 OH: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Josh Mandel 484 311 0 423 163496 0.576294 0.317986 Sherrod Brown 292 70 0 331 
350665 0.825436 0.682014 msgs_share likes_share dislikes_share sentiment candidate_name Josh Mandel 0.623711 0.561008 0.816273 0.002392 Sherrod Brown 0.376289 0.438992 0.183727 0.034265 PA: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Bob Casey 117 139 0 316 57281 0.694505 0.203635 Tom Smith 793 209 0 1648 224012 0.887453 0.796365 msgs_share likes_share dislikes_share sentiment candidate_name Bob Casey 0.128571 0.160896 0.399425 0.014105 Tom Smith 0.871429 0.839104 0.600575 0.048067 RI: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Barry Hinckley 12 12 0 138 21111 0.920000 0.404712 Sheldon Whitehouse 10 10 0 134 31052 0.930556 0.595288 msgs_share likes_share dislikes_share sentiment candidate_name Barry Hinckley 0.545455 0.507353 0.545455 0.138462 Sheldon Whitehouse 0.454545 0.492647 0.454545 0.033625 TN: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Bob Corker 155 20 0 162 17499 0.890110 0.258238 Mark Clayton 230 50 0 230 50264 0.821429 0.741762 msgs_share likes_share dislikes_share sentiment candidate_name Bob Corker 0.402597 0.413265 0.285714 0.144643 Mark Clayton 0.597403 0.586735 0.714286 0.100538 TX: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Paul Sadler 840 257 0 1937 250148 0.882862 0.462666 Ted Cruz 1676 255 0 2410 290519 0.904315 0.537334 msgs_share likes_share dislikes_share sentiment candidate_name Paul Sadler 0.333863 0.445595 0.501953 0.015821 Ted Cruz 0.666137 0.554405 0.498047 0.009659 UT: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Orrin Hatch 495 336 0 284 38701 0.458065 0.511722 Scott Howell 19 13 0 175 36928 0.930851 0.488278 msgs_share likes_share dislikes_share sentiment candidate_name Orrin Hatch 0.963035 0.618736 0.962751 0.084909 Scott Howell 
0.036965 0.381264 0.037249 0.050492 VA: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name George Allen 324 443 0 500 206940 0.530223 0.718385 Timothy Kaine 174 90 0 272 81123 0.751381 0.281615 msgs_share likes_share dislikes_share sentiment candidate_name George Allen 0.650602 0.647668 0.831144 0.023067 Timothy Kaine 0.349398 0.352332 0.168856 0.022738 VT: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Bernie Sanders 2131 129 0 4878 223002 0.974236 0.992571 John MacGovern 6 0 0 2 1669 1.000000 0.007429 msgs_share likes_share dislikes_share sentiment candidate_name Bernie Sanders 0.997192 0.99959 1 0.011692 John MacGovern 0.002808 0.00041 0 0.000000 WA: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Maria Cantwell 234 52 0 244 62934 0.824324 0.736725 Michael Baumgartner 112 70 0 137 22490 0.661836 0.263275 msgs_share likes_share dislikes_share sentiment candidate_name Maria Cantwell 0.676301 0.64042 0.42623 0.107853 Michael Baumgartner 0.323699 0.35958 0.57377 0.031234 WI: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Tammy Baldwin 393 169 0 944 488400 0.848158 0.602401 Tommy Thompson 1494 550 0 2332 322355 0.809160 0.397599 msgs_share likes_share dislikes_share sentiment candidate_name Tammy Baldwin 0.208267 0.288156 0.235049 0.022898 Tommy Thompson 0.791733 0.711844 0.764951 0.010391 WV: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ candidate_name Joe Manchin 675 545 0 3005 604285 0.846479 0.964792 John Raese 74 68 0 118 22052 0.634409 0.035208 msgs_share likes_share dislikes_share sentiment candidate_name Joe Manchin 0.901202 0.962216 0.88907 0.006897 John Raese 0.098798 0.037784 0.11093 -0.057143 WY: commentCount dislikeCount favoriteCount likeCount viewCount like_dislike_r views_share \ 
candidate_name John Barrasso 630 290 0 380 74030 0.567164 0.996943 Tim Chesnut 4 2 0 2 227 0.500000 0.003057 msgs_share likes_share dislikes_share sentiment candidate_name John Barrasso 0.993691 0.994764 0.993151 0.09 Tim Chesnut 0.006309 0.005236 0.006849 0.00
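The write-back loop above copies each statistic cell by cell. As a rough alternative sketch, `DataFrame.update` can copy a whole state's statistics at once when both tables are indexed by candidate name. The `results` and `state_stats` frames below are made-up stand-ins for `senate_2012` and one `youtube_stats` table, and the approach assumes full names are unique.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for senate_2012 with empty statistics columns.
results = pd.DataFrame({
    "Full Name": ["Jane Roe", "John Doe", "Sam Poe"],
    "Vote Count": [500, 400, 100],
    "viewCount": np.nan,
    "views_share": np.nan,
})

# Hypothetical stand-in for one state's youtube_stats, indexed by candidate name.
state_stats = pd.DataFrame(
    {"viewCount": [3000.0, 1000.0], "views_share": [0.75, 0.25]},
    index=pd.Index(["Jane Roe", "John Doe"], name="candidate_name"),
)

# Index by name so rows align, copy the matching cells in place,
# then turn the name back into an ordinary column.
by_name = results.set_index("Full Name")
by_name.update(state_stats)
results = by_name.reset_index()
print(results)
```

Candidates without statistics (here `Sam Poe`) keep their missing values, which is what the later `isnull()` filter relies on.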
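# Copy so that adding a column does not write into a view of senate_2012
cands_with_stats = senate_2012[~senate_2012["viewCount"].isnull()].copy()
# Each candidate's share of all votes cast in their state (including candidates without YouTube stats)
cands_with_stats["VotesShare"] = cands_with_stats[["Vote Count", "State Postal"]].apply(
    lambda x: x["Vote Count"] / senate_2012[senate_2012["State Postal"] == x["State Postal"]]["Vote Count"].sum(), axis=1)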
x_col = "views_share"
y_col = "VotesShare"
plt.figure(figsize=(15,10))
color_dict = {"Dem": "b", "GOP": "r", "Ind":"g", "NPA": "orange"}
shape_dict = {"X": "*", "nan": "."}
wl_dp = [len(cands_with_stats[(cands_with_stats[x_col]>=0.5) &
(cands_with_stats["Winner"]=="X")]),
len(cands_with_stats[(cands_with_stats[x_col]>=0.5)])]
wl_dm = [len(cands_with_stats[(cands_with_stats[x_col]<0.5) &
(cands_with_stats["Winner"]=="X")]),
len(cands_with_stats[(cands_with_stats[x_col]<0.5)])]
# Raw strings keep the backslash in \% for matplotlib's math text
wl_50p = r"Winning Ratio %s/%s ($%0.1f \%%$)" % (wl_dp[0], wl_dp[1], wl_dp[0]/wl_dp[1]*100)
wl_50m = r"Winning Ratio %s/%s ($%0.1f \%%$)" % (wl_dm[0], wl_dm[1], wl_dm[0]/wl_dm[1]*100)
for cand in cands_with_stats.iterrows():
    stats = cand[1]
    x = stats[x_col]
    y = stats[y_col]
    c = color_dict[stats["Party"]]
    m = shape_dict[str(stats["Winner"])]
    plt.scatter(x, y, c=c, marker=m, s=500, alpha=0.5)
    if stats[x_col] > 0.9:
        plt.annotate(stats["Full Name"], xytext=(8,20), xy=(x,y),
                     textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
plt.xlabel("Youtube " + x_col + " Between Competing Candidates in a State Race")
plt.ylabel("Actual " + y_col)
plt.vlines(.5, ymin=0, ymax=1)
plt.annotate(wl_50p, xy=(0.7, 1))
plt.annotate(wl_50m, xy=(0.2, 1))
plt.title("YouTube Video Views per Candidate from 2012-08-04 to 2012-11-04 and Actual Votes")
plt.annotate("Stars represent winning candidates\nCircles represent losing candidates", xy=(0.03, 0.85))
plt.annotate("Red: GOP\nBlue: Dem\nGreen: Ind\nOrange: NPA", xy=(0.03, 0.7))
plt.axis("tight")
plt.box(False)
plt.show()
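The two "Winning Ratio" labels in the plot count how many candidates on each side of the 0.5 views-share line won their race. A small sketch of that bookkeeping, together with a grouped way to compute the vote share, is shown below. The `cands` table and the `winning_ratio` helper are hypothetical; the sketch assumes `Winner` holds "X" for winners and is missing otherwise, as in the results file.

```python
import pandas as pd

# Hypothetical results for two states with two candidates each.
cands = pd.DataFrame({
    "State Postal": ["AA", "AA", "BB", "BB"],
    "Vote Count": [600, 400, 300, 700],
    "Winner": ["X", None, None, "X"],
    "views_share": [0.8, 0.2, 0.9, 0.1],
})

# Vote share within each state: votes divided by the state's total votes.
state_totals = cands.groupby("State Postal")["Vote Count"].transform("sum")
cands["VotesShare"] = cands["Vote Count"] / state_totals

def winning_ratio(df, mask):
    """Return (number of winners, number of candidates) among the rows selected by mask."""
    subset = df[mask]
    wins = int((subset["Winner"] == "X").sum())
    return wins, len(subset)

ahead = winning_ratio(cands, cands["views_share"] >= 0.5)
behind = winning_ratio(cands, cands["views_share"] < 0.5)
print("ahead on views:", ahead, "behind on views:", behind)
```

Note that the grouped total only matches the notebook's `VotesShare` if it is computed on the full results table, before candidates without YouTube statistics are filtered out.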
cands_with_stats[cands_with_stats["State Postal"]=="MO"]
State Postal | County Name | Party | First Name | Last Name | Incumbent | Vote Count | Winner | Full Name | commentCount | dislikeCount | favoriteCount | likeCount | viewCount | like_dislike_r | views_share | msgs_share | likes_share | dislikes_share | sentiment | VotesShare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13 | MO | Missouri | Dem | Claire | McCaskill | 1 | 1484683 | X | Claire McCaskill | 525 | 438 | 0 | 765 | 141427 | 0.635910 | 0.036271 | 0.011716 | 0.011317 | 0.070024 | 0.009500 | 0.547173 |
46 | MO | Missouri | GOP | Todd | Akin | 0 | 1063698 | NaN | Todd Akin | 44285 | 5817 | 0 | 66830 | 3757741 | 0.919928 | 0.963729 | 0.988284 | 0.988683 | 0.929976 | -0.016794 | 0.392021 |