I like watching the Phillies. I do not have cable. Some Phillies games are broadcast on national television. This is how I made a list of those games.
Pandas is a data analysis tool for the Python programming language. It can do a tremendous amount of really powerful data analysis and visualization. It's a gun in this CSV knife fight.
import pandas as pd
A downloadable CSV schedule is available from mlb.com. Here is a direct link to the Phillies schedule.
The CSV schedule will be used to instantiate a Pandas DataFrame object.
schedule = pd.read_csv("phillies-2016.csv", index_col=0, parse_dates=True)
schedule.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 190 entries, 2016-03-07 to 2016-10-02
Data columns (total 16 columns):
START TIME          189 non-null object
START TIME ET       189 non-null object
SUBJECT             190 non-null object
LOCATION            190 non-null object
DESCRIPTION         187 non-null object
END DATE            190 non-null object
END DATE ET         190 non-null object
END TIME            189 non-null object
END TIME ET         189 non-null object
REMINDER OFF        190 non-null bool
REMINDER ON         190 non-null bool
REMINDER DATE       190 non-null object
REMINDER TIME       189 non-null object
REMINDER TIME ET    189 non-null object
SHOWTIMEAS FREE     190 non-null object
SHOWTIMEAS BUSY     190 non-null object
dtypes: bool(2), object(14)
memory usage: 22.6+ KB
That is 190 games, with 16 columns of data for each game.
schedule.head()
START DATE | START TIME | START TIME ET | SUBJECT | LOCATION | DESCRIPTION | END DATE | END DATE ET | END TIME | END TIME ET | REMINDER OFF | REMINDER ON | REMINDER DATE | REMINDER TIME | REMINDER TIME ET | SHOWTIMEAS FREE | SHOWTIMEAS BUSY |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2016-03-07 | 01:05 PM | 01:05 PM | Phillies at Pirates | McKechnie Field - Bradenton | Local TV: MLB.TV ----- Local Radio: MLB.com | 03/07/16 | 03/07/16 | 04:05 PM | 04:05 PM | False | True | 03/07/16 | 12:05 PM | 12:05 PM | FREE | BUSY |
2016-03-08 | 01:05 PM | 01:05 PM | Pirates at Phillies | Bright House Field - Clearwater | Local TV: TCN- MLB.TV | 03/08/16 | 03/08/16 | 04:05 PM | 04:05 PM | False | True | 03/08/16 | 12:05 PM | 12:05 PM | FREE | BUSY |
2016-03-09 | 01:05 PM | 01:05 PM | Phillies at Twins | CenturyLink Sports Complex - Fort Myers | NaN | 03/09/16 | 03/09/16 | 04:05 PM | 04:05 PM | False | True | 03/09/16 | 12:05 PM | 12:05 PM | FREE | BUSY |
2016-03-09 | 01:05 PM | 01:05 PM | Orioles at Phillies | Bright House Field - Clearwater | Local TV: TCN- MLB.TV | 03/09/16 | 03/09/16 | 04:05 PM | 04:05 PM | False | True | 03/09/16 | 12:05 PM | 12:05 PM | FREE | BUSY |
2016-03-10 | 01:05 PM | 01:05 PM | Tigers at Phillies | Bright House Field - Clearwater | Local TV: TCN- MLBN- MLB.TV | 03/10/16 | 03/10/16 | 04:05 PM | 04:05 PM | False | True | 03/10/16 | 12:05 PM | 12:05 PM | FREE | BUSY |
The DESCRIPTION column contains the broadcast information. Less interesting columns can be removed.
schedule.drop(["REMINDER OFF",
"REMINDER ON",
"START TIME ET",
"END DATE",
"END DATE ET",
"END TIME",
"END TIME ET",
"REMINDER TIME",
"REMINDER TIME ET",
"SHOWTIMEAS FREE",
"SHOWTIMEAS BUSY",
"REMINDER DATE"], axis=1, inplace=True)
schedule.head()
START DATE | START TIME | SUBJECT | LOCATION | DESCRIPTION |
---|---|---|---|---|
2016-03-07 | 01:05 PM | Phillies at Pirates | McKechnie Field - Bradenton | Local TV: MLB.TV ----- Local Radio: MLB.com |
2016-03-08 | 01:05 PM | Pirates at Phillies | Bright House Field - Clearwater | Local TV: TCN- MLB.TV |
2016-03-09 | 01:05 PM | Phillies at Twins | CenturyLink Sports Complex - Fort Myers | NaN |
2016-03-09 | 01:05 PM | Orioles at Phillies | Bright House Field - Clearwater | Local TV: TCN- MLB.TV |
2016-03-10 | 01:05 PM | Tigers at Phillies | Bright House Field - Clearwater | Local TV: TCN- MLBN- MLB.TV |
The DESCRIPTION column is nice because it mentions the stations that games are broadcast on. Sometimes a game is broadcast on two channels at once. There is also radio broadcast information that I'm not interested in right now.
schedule.DESCRIPTION.head(50)
START DATE
2016-03-07    Local TV: MLB.TV ----- Local Radio: MLB.com
2016-03-08    Local TV: TCN- MLB.TV
2016-03-09    NaN
2016-03-09    Local TV: TCN- MLB.TV
2016-03-10    Local TV: TCN- MLBN- MLB.TV
2016-03-11    Local Radio: MLB.com
2016-03-12    Local TV: CSN- MLB.TV ----- Local Radio: 94 WIP
2016-03-13    Local TV: MLB.TV ----- Local Radio: 94 WIP
2016-03-14    Local Radio: MLB.com
2016-03-15    Local Radio: MLB.com
2016-03-17    Local TV: TCN- MLB.TV
2016-03-18    Local TV: TCN- MLB.TV
2016-03-19    Local TV: MLB.TV ----- Local Radio: 94 WIP
2016-03-20    Local TV: CSN- MLB.TV ----- Local Radio: 94 WIP
2016-03-21    Local Radio: MLB.com
2016-03-22    Local TV: TCN- MLB.TV
2016-03-23    Local Radio: 94 WIP
2016-03-24    Local TV: MLB.TV ----- Local Radio: 94 WIP
2016-03-25    Local TV: CSN- MLB.TV ----- Local Radio: 94 WIP
2016-03-26    Local TV: CSN- MLB.TV ----- Local Radio: 94 WIP
2016-03-27    Local TV: MLB.TV ----- Local Radio: 94 WIP
2016-03-28    NaN
2016-03-29    Local TV: TCN- MLBN- MLB.TV
2016-03-30    Local TV: TCN- MLB.TV
2016-03-31    Local TV: TCN- MLB.TV ----- Local Radio: 94 WIP
2016-04-01    Local TV: TCN- MLB.TV ----- Local Radio: 94 WIP
2016-04-02    Local TV: TCN- MLB.TV ----- Local Radio: 94 WIP
2016-04-04    Local TV: CSN
2016-04-06    Local TV: CSN- ESPN2
2016-04-07    Local TV: CSN
2016-04-08    Local TV: CSN
2016-04-09    Local TV: CSN
2016-04-10    Local TV: TCN
2016-04-11    Local TV: NBC 10
2016-04-12    Local TV: TCN
2016-04-13    Local TV: TCN
2016-04-14    Local TV: CSN
2016-04-15    Local TV: CSN
2016-04-16    Local TV: CSN
2016-04-17    Local TV: CSN
2016-04-18    Local TV: CSN
2016-04-19    Local TV: CSN
2016-04-20    Local TV: CSN
2016-04-22    Local TV: TCN
2016-04-23    Local TV: CSN
2016-04-24    Local TV: CSN
2016-04-26    Local TV: CSN
2016-04-27    Local TV: CSN
2016-04-28    Local TV: CSN
2016-04-29    Local TV: CSN
Name: DESCRIPTION, dtype: object
DESCRIPTION
Thankfully, the DESCRIPTION column data is parseable. Getting a list of television broadcast stations for each game is not too difficult.
description = schedule.DESCRIPTION.iloc[6]
print(description)
Local TV: CSN- MLB.TV ----- Local Radio: 94 WIP
Grab the rough station string with a regular expression.
import re
TV_STATION_RE = re.compile(r"""Local\s+TV:\s+   # TV token
                               (?P<stations>.*) # Group everything following it greedily as stations
                            """, re.X)
Use that to pull them out and do some text wrangling.
def tv_stations_from_description(description):
    """Return a list of television stations embedded in the given description."""
    tv_stations = []
    # str() turns a missing description (NaN) into "nan", which simply fails to match
    result = re.search(TV_STATION_RE, str(description))
    if result:
        # Keep only the text before the TV/radio delimiter, then split the station list
        media_delimiter = "-----"
        tv_station_str = result.group("stations").split(media_delimiter)[0]
        tv_stations = tv_station_str.split("- ")
        tv_stations = [s.strip() for s in tv_stations]
    return tv_stations
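To see why the function has to split on the "-----" delimiter, it helps to look at what the pattern captures on its own. The sketch below rebuilds the same pattern and runs it on three description strings copied from the output above; the variable names are only for illustration.

```python
import re

# Same pattern as above: "Local TV:" followed by a greedy capture of the rest of the line.
TV_STATION_RE = re.compile(r"""Local\s+TV:\s+   # TV token
                               (?P<stations>.*) # everything after it, greedy
                            """, re.X)

with_radio = "Local TV: CSN- MLB.TV ----- Local Radio: 94 WIP"
tv_only = "Local TV: TCN- MLBN- MLB.TV"
radio_only = "Local Radio: MLB.com"

captured_with_radio = TV_STATION_RE.search(with_radio).group("stations")
captured_tv_only = TV_STATION_RE.search(tv_only).group("stations")

# The greedy group swallows the radio part too, so it still needs trimming.
print(captured_with_radio)  # CSN- MLB.TV ----- Local Radio: 94 WIP
print(captured_tv_only)     # TCN- MLBN- MLB.TV

# A description with no TV listing does not match at all.
print(TV_STATION_RE.search(radio_only))  # None
```

The capture is deliberately rough: the regular expression only finds where the TV listing starts, and plain string splitting does the rest.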
Test it out on all of the descriptions.
tv_stations = set()
for d in schedule.DESCRIPTION:
    tv_stations |= set(tv_stations_from_description(d))
tv_stations
{'CSN', 'ESPN2', 'MLB.TV', 'MLBN', 'NBC 10', 'TCN'}
Applying this function to the DESCRIPTION column yields a Series holding the list of television stations for each game.
stations_series = schedule.DESCRIPTION.apply(tv_stations_from_description)
stations_series
START DATE
2016-03-07    [MLB.TV]
2016-03-08    [TCN, MLB.TV]
2016-03-09    []
2016-03-09    [TCN, MLB.TV]
2016-03-10    [TCN, MLBN, MLB.TV]
2016-03-11    []
2016-03-12    [CSN, MLB.TV]
2016-03-13    [MLB.TV]
2016-03-14    []
2016-03-15    []
2016-03-17    [TCN, MLB.TV]
2016-03-18    [TCN, MLB.TV]
2016-03-19    [MLB.TV]
2016-03-20    [CSN, MLB.TV]
2016-03-21    []
2016-03-22    [TCN, MLB.TV]
2016-03-23    []
2016-03-24    [MLB.TV]
2016-03-25    [CSN, MLB.TV]
2016-03-26    [CSN, MLB.TV]
2016-03-27    [MLB.TV]
2016-03-28    []
2016-03-29    [TCN, MLBN, MLB.TV]
2016-03-30    [TCN, MLB.TV]
2016-03-31    [TCN, MLB.TV]
2016-04-01    [TCN, MLB.TV]
2016-04-02    [TCN, MLB.TV]
2016-04-04    [CSN]
2016-04-06    [CSN, ESPN2]
2016-04-07    [CSN]
...
2016-08-31    [CSN]
2016-09-02    [CSN]
2016-09-03    [CSN]
2016-09-04    [CSN]
2016-09-05    [CSN]
2016-09-06    [CSN]
2016-09-07    [CSN]
2016-09-08    [CSN]
2016-09-09    [CSN]
2016-09-10    [CSN]
2016-09-11    [CSN]
2016-09-12    [CSN]
2016-09-13    [CSN]
2016-09-14    [CSN]
2016-09-15    [CSN]
2016-09-16    [CSN]
2016-09-17    [CSN]
2016-09-18    [CSN]
2016-09-20    [CSN]
2016-09-21    [CSN]
2016-09-22    [CSN]
2016-09-23    [CSN]
2016-09-24    [CSN]
2016-09-25    [CSN]
2016-09-27    [CSN]
2016-09-28    [CSN]
2016-09-29    [CSN]
2016-09-30    [CSN]
2016-10-01    [CSN]
2016-10-02    [CSN]
Name: DESCRIPTION, dtype: object
Double check the set of stations from that Series.
set([station for stations in stations_series.values for station in stations])
{'CSN', 'ESPN2', 'MLB.TV', 'MLBN', 'NBC 10', 'TCN'}
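The same per-game lists can also be tallied to see how many games each station carries. The sketch below is not part of the original analysis. It uses `collections.Counter` on the first seven lists copied by hand from the output above (the name `sample_station_lists` is hypothetical), so the counts describe only that sample and not the whole season.

```python
from collections import Counter

# First seven per-game station lists, copied from the Series output above.
sample_station_lists = [
    ["MLB.TV"],
    ["TCN", "MLB.TV"],
    [],
    ["TCN", "MLB.TV"],
    ["TCN", "MLBN", "MLB.TV"],
    [],
    ["CSN", "MLB.TV"],
]

# Flatten the lists and count one appearance per game per station.
games_per_station = Counter(
    station for stations in sample_station_lists for station in stations
)

for station, games in games_per_station.most_common():
    print(station, games)
```

With the real data, the counter could presumably be fed from `stations_series` instead of the hand-copied sample.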
The 190 Phillies games are broadcast on 6 television channels. Unfortunately, only 1 of those 6 stations is available without a cable television subscription. This means that I can only watch games on NBC 10.
Filtering the DESCRIPTION column to national television broadcast stations yields only the games that I can watch over the air with my HD antenna.
national_broadcast_schedule = schedule[schedule.DESCRIPTION.str.contains("NBC 10", na=False)]
national_broadcast_schedule
START DATE | START TIME | SUBJECT | LOCATION | DESCRIPTION |
---|---|---|---|---|
2016-04-11 | 03:05 PM | Padres at Phillies | Citizens Bank Park - Philadelphia | Local TV: NBC 10 |
2016-06-03 | 07:05 PM | Brewers at Phillies | Citizens Bank Park - Philadelphia | Local TV: NBC 10 |
2016-06-10 | 07:05 PM | Phillies at Nationals | Nationals Park - Washington | Local TV: NBC 10 |
2016-06-17 | 07:05 PM | D-backs at Phillies | Citizens Bank Park - Philadelphia | Local TV: NBC 10 |
2016-06-23 | 01:10 PM | Phillies at Twins | Target Field - Minneapolis | Local TV: NBC 10 |
2016-07-15 | 07:05 PM | Mets at Phillies | Citizens Bank Park - Philadelphia | Local TV: NBC 10 |
2016-07-16 | 07:05 PM | Mets at Phillies | Citizens Bank Park - Philadelphia | Local TV: NBC 10 |
2016-07-22 | 07:05 PM | Phillies at Pirates | PNC Park - Pittsburgh | Local TV: NBC 10 |
2016-07-30 | 07:10 PM | Phillies at Braves | Turner Field - Atlanta | Local TV: NBC 10 |
2016-08-04 | 01:05 PM | Giants at Phillies | Citizens Bank Park - Philadelphia | Local TV: NBC 10 |
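Games with no description hold NaN, and `str.contains` passes that missing value through instead of returning False, so the raw result cannot be used directly as a filter. Below is a minimal sketch of that behaviour on a hand-made Series, with two descriptions copied from the output above plus one missing value. It assumes pandas and NumPy are installed, and it uses the `na=False` argument of `str.contains` to treat a missing description as a non-match.

```python
import numpy as np
import pandas as pd

# Hand-made sample: two descriptions from the schedule plus a missing one.
# dtype=object mirrors the object-dtype DESCRIPTION column shown by schedule.info().
descriptions = pd.Series(
    ["Local TV: NBC 10", np.nan, "Local TV: CSN- ESPN2"],
    dtype=object,
)

# Without na=..., the missing row stays missing in the result.
raw_mask = descriptions.str.contains("NBC 10")

# na=False counts a missing description as "not on NBC 10".
mask = descriptions.str.contains("NBC 10", na=False)

print(mask.tolist())                # [True, False, False]
print(descriptions[mask].tolist())  # ['Local TV: NBC 10']
```

Comparing the raw result with `== True` has the same effect, because a missing value never equals True.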
national_broadcast_schedule.describe()
 | START TIME | SUBJECT | LOCATION | DESCRIPTION |
---|---|---|---|---|
count | 10 | 10 | 10 | 10 |
unique | 5 | 9 | 5 | 1 |
top | 07:05 PM | Mets at Phillies | Citizens Bank Park - Philadelphia | Local TV: NBC 10 |
freq | 6 | 2 | 6 | 10 |
This means that I can watch 10 of the 190 Phillies games on the schedule this season, which is roughly 5%.
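The arithmetic behind that figure, as a quick sketch. The 162-game regular-season length is an assumption that does not come from the CSV. The schedule's 190 entries begin in early March at spring-training venues, so the regular-season share may be the fairer number.

```python
watchable_games = 10        # rows in the NBC 10 table above
scheduled_games = 190       # entries reported by schedule.info()
regular_season_games = 162  # assumption: standard MLB regular season, not taken from the CSV

share_of_schedule = watchable_games / scheduled_games
share_of_regular_season = watchable_games / regular_season_games

print(f"{share_of_schedule:.1%}")        # 5.3%
print(f"{share_of_regular_season:.1%}")  # 6.2%
```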