It is possible for pandas to directly read the data from xls .
import pandas as pd
import numpy as np
We are going to load the Company profile data
xl = pd.io.excel.read_excel('crunchbase_monthly_export_d43b44299ade53.xlsx', sheetname='Companies', index_col='permalink')
xl.info()
<class 'pandas.core.frame.DataFrame'> Index: 48636 entries, /organization/waywire to /organization/x Data columns (total 17 columns): name 48635 non-null object homepage_url 45286 non-null object category_list 44040 non-null object market 44034 non-null object funding_total_usd 48636 non-null int64 status 47193 non-null object country_code 43480 non-null object state_code 29679 non-null object region 43480 non-null object city 42628 non-null object funding_rounds 48636 non-null int64 founded_at 37976 non-null object founded_month 37904 non-null object founded_quarter 37904 non-null object founded_year 37904 non-null float64 first_funding_at 48636 non-null object last_funding_at 48636 non-null object dtypes: float64(1), int64(2), object(14) memory usage: 6.7+ MB
xl=xl[xl.founded_year >= 2000]
xl.info()
<class 'pandas.core.frame.DataFrame'> Index: 34275 entries, /organization/waywire to /organization/zzzzapp-com Data columns (total 17 columns): name 34274 non-null object homepage_url 32505 non-null object category_list 32168 non-null object market 32164 non-null object funding_total_usd 34275 non-null int64 status 33414 non-null object country_code 31431 non-null object state_code 21621 non-null object region 31431 non-null object city 31030 non-null object funding_rounds 34275 non-null int64 founded_at 34275 non-null object founded_month 34275 non-null object founded_quarter 34275 non-null object founded_year 34275 non-null float64 first_funding_at 34275 non-null object last_funding_at 34275 non-null object dtypes: float64(1), int64(2), object(14) memory usage: 4.7+ MB
len(xl.market.value_counts())
718
That is way too many. So lets look at the top 20.
xl.market.value_counts().head(20)
Software 3165 Biotechnology 2041 Mobile 1485 E-Commerce 1338 Curated Web 1273 Enterprise Software 932 Games 859 Advertising 837 Health Care 747 Social Media 711 Hardware + Software 701 Clean Technology 672 Education 608 Finance 601 Health and Wellness 544 Analytics 493 Manufacturing 405 Security 355 Hospitality 325 Real Estate 307 dtype: int64
Software seems to be a catch all phrase, when they couldn't place it in a specific category or when there are just way too many products produced by the company. The surprise is 'Biotechnology'. Who knew they were so many?
Ok, now let's create a graph!
from matplotlib import pyplot as plt
%matplotlib inline
from mpltools import style
style.use('ggplot')
xl.market.value_counts().head(30).plot(kind='bar', figsize=(15,8))
<matplotlib.axes._subplots.AxesSubplot at 0x7feb3bd28ed0>
xl.country_code.value_counts().head(15)
USA 20716 GBR 1797 CAN 954 DEU 681 FRA 607 IND 601 CHN 563 ISR 538 ESP 413 RUS 251 SGP 239 IRL 233 AUS 230 NLD 223 BRA 220 dtype: int64
xl.country_code.value_counts().head(15).plot(kind='bar', figsize=(15,8))
<matplotlib.axes._subplots.AxesSubplot at 0x7feb3bd28150>
xl.founded_year.value_counts(sort=False).tail(25).plot(kind='bar', figsize=(15,8))
<matplotlib.axes._subplots.AxesSubplot at 0x7feb3b9fac50>
xl.status.value_counts()
operating 29161 acquired 2327 closed 1926 dtype: int64
xl.status.value_counts()* 100/len(xl)
operating 85.079504 acquired 6.789205 closed 5.619256 dtype: float64
year_gp = xl.groupby('founded_year')
type(year_gp)
pandas.core.groupby.DataFrameGroupBy
for year, group in year_gp:
print year
print group.info()
break
2000.0 <class 'pandas.core.frame.DataFrame'> Index: 883 entries, /organization/1010data to /organization/zoomingo Data columns (total 17 columns): name 883 non-null object homepage_url 799 non-null object category_list 840 non-null object market 840 non-null object funding_total_usd 883 non-null int64 status 872 non-null object country_code 847 non-null object state_code 613 non-null object region 847 non-null object city 836 non-null object funding_rounds 883 non-null int64 founded_at 883 non-null object founded_month 883 non-null object founded_quarter 883 non-null object founded_year 883 non-null float64 first_funding_at 883 non-null object last_funding_at 883 non-null object dtypes: float64(1), int64(2), object(14) memory usage: 124.2+ KB None
year_gp = xl[xl.founded_year>=2000].groupby('founded_year')
for year, group in year_gp:
print year, len(group)
2000.0 883 2001.0 738 2002.0 788 2003.0 963 2004.0 1148 2005.0 1412 2006.0 1807 2007.0 2311 2008.0 2328 2009.0 2956 2010.0 3744 2011.0 4855 2012.0 5115 2013.0 3935 2014.0 1291 2015.0 1
We see that we get a single record for the year 2015. A startup getting funding before it is founded? Lets see what it is.
xl[xl.founded_year==2015]
name | homepage_url | category_list | market | funding_total_usd | status | country_code | state_code | region | city | funding_rounds | founded_at | founded_month | founded_quarter | founded_year | first_funding_at | last_funding_at | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
permalink | |||||||||||||||||
/organization/merchme | MerchMe | NaN | |Marketplaces|Technology|Web Development|E-Com... | Web Development | 0 | operating | USA | IL | Chicago | Chicago | 1 | 2015-01-15 00:00:00 | 2015-01 | 2015-Q1 | 2015 | 2014-09-17 00:00:00 | 2014-09-17 00:00:00 |
The company has had funding this year but the founded_year is set wrong. May be it is just bad data. Let us try to ignore this. So we need to add another criteria saying founded_year
should be less than 2015. but the above simple filtering syntax will not be enough. We need to create a mask.
mask = (xl.founded_year>=2000) & (xl.founded_year<2015)
year_gp = xl[(xl.founded_year>=2000) & (xl.founded_year<2015)].groupby('founded_year')
xl.status.value_counts()
operating 29161 acquired 2327 closed 1926 dtype: int64
type(xl.status.value_counts())
pandas.core.series.Series
xl.status.value_counts().operating2
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) <ipython-input-29-755ea729b897> in <module>() ----> 1 xl.status.value_counts().operating2 /home/sharmi/installs/virtualenv/crunchbase/local/lib/python2.7/site-packages/pandas/core/generic.pyc in __getattr__(self, name) 1934 return self[name] 1935 raise AttributeError("'%s' object has no attribute '%s'" % -> 1936 (type(self).__name__, name)) 1937 1938 def __setattr__(self, name, value): AttributeError: 'Series' object has no attribute 'operating2'
for year, group in year_gp:
print year, len(group)
print group.status.value_counts() *100/len(group)
2005.0 1412 operating 76.416431 acquired 14.235127 closed 7.365439 dtype: float64 2006.0 1807 operating 75.539568 acquired 13.060321 closed 9.241837 dtype: float64 2007.0 2311 operating 76.893120 acquired 10.731285 closed 10.514929 dtype: float64 2008.0 2328 operating 79.982818 closed 9.579038 acquired 8.290378 dtype: float64 2009.0 2956 operating 81.292287 closed 8.694181 acquired 7.408660 dtype: float64 2010.0 3744 operating 84.375000 closed 7.959402 acquired 5.368590 dtype: float64 2011.0 4855 operating 87.785788 closed 5.293512 acquired 3.872297 dtype: float64 2012.0 5115 operating 92.101662 closed 2.287390 acquired 1.955034 dtype: float64 2013.0 3935 operating 96.213469 closed 0.686150 acquired 0.584498 dtype: float64 2014.0 1291 operating 99.070488 closed 0.077459 dtype: float64 2015.0 1 operating 100 dtype: float64
index, closed, operating, acquired, unknown = [],[],[],[],[]
for year, group in year_gp:
index.append(year)
status = group.status.value_counts() *100/len(group)
operating.append(status.operating)
try:
acquired.append(status.acquired)
except AttributeError:
acquired.append(0)
try:
closed.append(status.closed )
except AttributeError:
closed.append(0)
unknown.append(100 - operating[-1] -acquired[-1] -closed[-1])
closed, operating, acquired, unknown = np.array(closed), np.array(operating), np.array(acquired), np.array(unknown)
width = 0.35
plt.figure(figsize=(15,5))
p1 = plt.bar(index, operating, width, color='g')
p2 = plt.bar(index, acquired, width, bottom=operating, color='b')
p3 = plt.bar(index, closed, width, bottom=acquired+operating, color='r')
p4 = plt.bar(index, unknown, width, bottom=closed+acquired+operating, color='y')
plt.ylabel('Success %')
plt.title('Percentage of Startup Success by Year')
#plt.xticks([int(i) for i in index])
#plt.yticks(xrange(0,101,10))
plt.legend( (p1[0], p2[0], p3[0], p4[0]), ('Operating', 'Acquired','Closed', 'Unknown'), bbox_to_anchor=(1.1, 1.05) )
<matplotlib.legend.Legend at 0x7f2ae0b39ad0>
We can see that in 2000, almost 20% of the companies are acquired. We can also see that between 5-10% of the companies are closed, peaking around 2006-2008.
We are fortunate that the data has both founding date and first funding date. But there are a few unspecified values. So let us select only the records that have the founding date and funding date.
xl.info()
<class 'pandas.core.frame.DataFrame'> Index: 34275 entries, /organization/waywire to /organization/zzzzapp-com Data columns (total 17 columns): name 34274 non-null object homepage_url 32505 non-null object category_list 32168 non-null object market 32164 non-null object funding_total_usd 34275 non-null int64 status 33414 non-null object country_code 31431 non-null object state_code 21621 non-null object region 31431 non-null object city 31030 non-null object funding_rounds 34275 non-null int64 founded_at 34275 non-null object founded_month 34275 non-null object founded_quarter 34275 non-null object founded_year 34275 non-null float64 first_funding_at 34275 non-null object last_funding_at 34275 non-null object dtypes: float64(1), int64(2), object(14) memory usage: 4.7+ MB
import datetime
first_funding_at = pd.to_datetime(xl.first_funding_at, dayfirst=True, errors='ignore')
print len(first_funding_at)
first_funding_at = first_funding_at[[isinstance(value, datetime.datetime) for index, value in first_funding_at.iteritems() ]]
print len(first_funding_at)
34275 34272
founded_at = pd.to_datetime(xl.founded_at, dayfirst=True)
founded_at = founded_at[[isinstance(value, datetime.datetime) for index, value in founded_at.iteritems()]]
daysdiff = first_funding_at - founded_at
daysdiff = daysdiff[pd.notnull(daysdiff)]
dayslist=[value.days for index, value in daysdiff.iteritems()]
plt.figure(figsize=(15,8))
graph = plt.hist(dayslist, bins=range(0, 6000, 365))
plt.figure(figsize=(15,8))
graph = plt.hist(dayslist, bins=[0, 30, 60, 90, 180, 365, 365+180, 2*365, 2*365 + 180, 3*365])
daysdiff[daysdiff<30]
permalink /organization/-qounter -187 days /organization/1 -12 days /organization/10-minutes-with 0 days /organization/1000memories -181 days /organization/10bestthings 0 days /organization/140-proof -194 days /organization/1calendar -290 days /organization/1daylater -117 days /organization/1daymakeover 0 days /organization/1eq -250 days /organization/1st-choice-lawn-care -10 days /organization/27-perry -69 days /organization/2code-online -274 days /organization/2crisk -91 days /organization/2vancouver -238 days ... /organization/zoosk -245 days /organization/zopa -59 days /organization/zopim -122 days /organization/zova -31 days /organization/zqgame -1186 days /organization/zsoup 0 days /organization/zubie 0 days /organization/zuki 0 days /organization/zulily -15 days /organization/zupcat 0 days /organization/zuzuche 0 days /organization/zyken-nightcove 0 days /organization/zykis 0 days /organization/zynga -181 days /organization/zzzzapp-com -194 days Length: 4965, dtype: timedelta64[ns]
daysdiff[(daysdiff<30)&(daysdiff>0)]
Series([], dtype: timedelta64[ns])
More than 5000 companies have acquired funds before starting. Zynga is one of them.
Now let us read in the investments data.
rounds = pd.io.excel.read_excel('crunchbase_monthly_export_d43b44299ade53.xlsx', sheetname='Rounds', index_col='funding_round_permalink')
rounds.info()
<class 'pandas.core.frame.DataFrame'> Index: 82308 entries, /funding-round/cc409188fa2b63482bd9008f682c2efa to /funding-round/5c5ebcc9d85c2a5f810e319c249a36e7 Data columns (total 15 columns): company_permalink 82308 non-null object company_name 82305 non-null object company_category_list 76985 non-null object company_market 76979 non-null object company_country_code 75889 non-null object company_state_code 55483 non-null object company_region 75889 non-null object company_city 74798 non-null object funding_round_type 82308 non-null object funding_round_code 22496 non-null object funded_at 82308 non-null object funded_month 82300 non-null object funded_quarter 82300 non-null object funded_year 82300 non-null float64 raised_amount_usd 69883 non-null float64 dtypes: float64(2), object(13) memory usage: 10.0+ MB
fund_type = rounds.funding_round_type.value_counts()
fund_type
venture 41191 seed 20613 debt_financing 5593 angel 4381 undisclosed 3727 equity_crowdfunding 2115 private_equity 1801 grant 1454 convertible_note 691 post_ipo_equity 381 product_crowdfunding 229 post_ipo_debt 79 secondary_market 53 dtype: int64
fund_type.plot(kind='pie', figsize=(10,10))
<matplotlib.axes._subplots.AxesSubplot at 0x7feb4183b090>
recent_funding = rounds[rounds.funded_year >= 2000]
funding_sum = recent_funding[['funded_quarter','raised_amount_usd']].groupby('funded_quarter').sum()
funding_sum.plot(kind='bar', figsize=(15, 8))
<matplotlib.axes._subplots.AxesSubplot at 0x7feb39f90510>
We can see that over all the funding keeps increasing, and there is more interest in Q1, peaking at Q2 and dipping in Q3 and Q4. So what is the story about 2006-Q3?
q3_2006 = recent_funding[recent_funding.funded_quarter=='2006-Q3'].sort("raised_amount_usd", ascending=False)
q3_2006[['company_name', 'company_market', 'raised_amount_usd']].head()
company_name | company_market | raised_amount_usd | |
---|---|---|---|
funding_round_permalink | |||
/funding-round/949ee7711c5604b77dfa2a660eb0ad89 | Cardinal Health | Hospitals | 78795064652 |
/funding-round/c7f31e1a300f9e4aa8d8ba8cf93ce6ec | Cortina Systems | Hardware | 132000000 |
/funding-round/fd84230c623d8a57e63217d5f1f69b1d | Lumenis | Health Care | 120000000 |
/funding-round/ec2e66080a7fee8bef30b8cb30f48470 | United Information Technology Co. | Enterprise Software | 101000000 |
/funding-round/4ef70909558ae7d85b9ac196d87d2c54 | Solexa | Biotechnology | 75000000 |
venture_funding = rounds[(rounds.funded_year >= 2000) & (rounds.funding_round_type == 'venture')]
venture_sum = venture_funding[['funded_quarter','raised_amount_usd']].groupby('funded_quarter').sum()
venture_sum.plot(kind='bar', figsize=(15, 8))
<matplotlib.axes._subplots.AxesSubplot at 0x7feb3a149910>
seed_funding = rounds[(rounds.funded_year >= 2000) & ((rounds.funding_round_type == 'seed') | (rounds.funding_round_type=='angel'))]
seed_sum = seed_funding[['funded_quarter','raised_amount_usd']].groupby('funded_quarter').sum()
seed_sum.plot(kind='bar', figsize=(15, 8))
<matplotlib.axes._subplots.AxesSubplot at 0x7feb37b10090>
For this we need the next sheet of data.
investments = pd.io.excel.read_excel('crunchbase_monthly_export_d43b44299ade53.xlsx', sheetname='Investments', index_col='funding_round_permalink')
To see the behaviours of investment firms, let us group the data by investment name.
investments.info()
<class 'pandas.core.frame.DataFrame'> Index: 112590 entries, /funding-round/3b01561dd8c054727c9ddc0705a73f4c to /funding-round/e732b70e579f61fd973820ea9e348d7d Data columns (total 23 columns): company_permalink 112590 non-null object company_name 112587 non-null object company_category_list 108648 non-null object company_market 108647 non-null object company_country_code 105300 non-null object company_state_code 77851 non-null object company_region 105300 non-null object company_city 103940 non-null object investor_permalink 112528 non-null object investor_name 112528 non-null object investor_category_list 28906 non-null object investor_market 28854 non-null object investor_country_code 85082 non-null object investor_state_code 61418 non-null object investor_region 85082 non-null object investor_city 84574 non-null object funding_round_type 112590 non-null object funding_round_code 53847 non-null object funded_at 112590 non-null datetime64[ns] funded_month 112590 non-null object funded_quarter 112590 non-null object funded_year 112590 non-null int64 raised_amount_usd 99563 non-null float64 dtypes: datetime64[ns](1), float64(1), int64(1), object(20) memory usage: 20.6+ MB
invest_gp = investments[['investor_permalink', 'raised_amount_usd']]\
.groupby('investor_permalink').sum()
type(invest_gp)
pandas.core.frame.DataFrame
invest_gp.sort('raised_amount_usd', ascending=False).head()
raised_amount_usd | |
---|---|
investor_permalink | |
/organization/kleiner-perkins-caufield-byers | 16460723024 |
/organization/new-enterprise-associates | 13187618119 |
/organization/sequoia-capital | 13019932802 |
/organization/accel-partners | 12206334596 |
/organization/intel-capital | 10195565636 |
investments[investments.investor_country_code == 'IND']
company_permalink | company_name | company_category_list | company_market | company_country_code | company_state_code | company_region | company_city | investor_permalink | investor_name | ... | investor_state_code | investor_region | investor_city | funding_round_type | funding_round_code | funded_at | funded_month | funded_quarter | funded_year | raised_amount_usd | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
funding_round_permalink | |||||||||||||||||||||
/funding-round/bbdedaea55b5c2f0a4814a91c7d30ace | /organization/kartrocket | KartRocket | |SaaS|E-Commerce| | SaaS | IND | NaN | New Delhi | New Delhi | /organization/5ideas-in | 5ideas.in | ... | NaN | New Delhi | Gurgaon | seed | NaN | 2013-07-23 | 2013-07 | 2013-Q3 | 2013 | NaN |
/funding-round/0c34e9aeda49a68b54b048c125c80365 | /organization/vyome-biosciences | Vyome Biosciences | |Life Sciences|Biotechnology| | Life Sciences | IND | NaN | Delhi | Delhi | /organization/aarin-capital | Aarin Capital | ... | NaN | Bangalore | Bangalore | venture | A | 2012-09-12 | 2012-09 | 2012-Q3 | 2012 | 3300000 |
/funding-round/54202ff7c6901052c657d61584b1f9f6 | /organization/invictus-oncology | Invictus Oncology | |Biotechnology| | Biotechnology | IND | NaN | Delhi | Delhi | /organization/aarin-capital | Aarin Capital | ... | NaN | Bangalore | Bangalore | venture | A | 2013-05-28 | 2013-05 | 2013-Q2 | 2013 | 1880000 |
/funding-round/514241eccd4edab7903b6c7d1f5f42ec | /organization/tabtor | Tabtor | |Education| | Education | USA | NJ | Newark | Kendall Park | /organization/aarin-capital | Aarin Capital | ... | NaN | Bangalore | Bangalore | seed | NaN | 2013-06-12 | 2013-06 | 2013-Q2 | 2013 | 1000000 |
/funding-round/e56ecdeebd8bd8970e41942e28797c63 | /organization/theramyt-novobiologics | Theramyt Novobiologics | |Biotechnology| | Biotechnology | NaN | NaN | NaN | NaN | /organization/aarin-capital | Aarin Capital | ... | NaN | Bangalore | Bangalore | venture | A | 2013-10-11 | 2013-10 | 2013-Q4 | 2013 | 4500000 |
/funding-round/339647785b60e71ab176256b8a062774 | /organization/vyome-biosciences | Vyome Biosciences | |Life Sciences|Biotechnology| | Life Sciences | IND | NaN | Delhi | Delhi | /organization/aarin-capital | Aarin Capital | ... | NaN | Bangalore | Bangalore | venture | B | 2014-08-12 | 2014-08 | 2014-Q3 | 2014 | 8000000 |
/funding-round/ba08e700e66fa5e12384e60e8cbcc7d8 | /organization/edcast-inc | EdCast Inc. | |All Students|Education|Portals| | Education | USA | CA | SF Bay Area | Mountain View | /organization/aarin-capital | Aarin Capital | ... | NaN | Bangalore | Bangalore | venture | A | 2014-09-22 | 2014-09 | 2014-Q3 | 2014 | 6000000 |
/funding-round/3925da8eee16114d6570c3a0985d8ad7 | /organization/milk-mantra | Milk Mantra | |Health and Wellness| | Health and Wellness | IND | NaN | Bhubaneswar | Bhubaneswar | /organization/aavishkaar-venture-management-se... | Aavishkaar Venture Management services | ... | NaN | Mumbai | Mumbai | undisclosed | NaN | 2011-02-02 | 2011-02 | 2011-Q1 | 2011 | NaN |
/funding-round/c7087296aaa1e6ee33e9d174022ad444 | /organization/electronic-payment-and-services | Electronic Payment and Services (EPS) | |Hardware + Software| | Hardware + Software | IND | NaN | Mumbai | Mumbai | /organization/aavishkaar-venture-management-se... | Aavishkaar Venture Management services | ... | NaN | Mumbai | Mumbai | venture | B | 2013-08-26 | 2013-08 | 2013-Q3 | 2013 | 6000000 |
/funding-round/cabd3c8428576ef3018e1c91812a732e | /organization/electronic-payment-and-services | Electronic Payment and Services (EPS) | |Hardware + Software| | Hardware + Software | IND | NaN | Mumbai | Mumbai | /organization/aavishkaar-venture-management-se... | Aavishkaar Venture Management services | ... | NaN | Mumbai | Mumbai | venture | NaN | 2013-12-17 | 2013-12 | 2013-Q4 | 2013 | 5000000 |
/funding-round/2107d9f9c8568d80d7972b6992c9c14c | /organization/mela-artisans | Mela Artisans | |Fashion| | Fashion | USA | FL | Palm Beaches | Boca Raton | /organization/aavishkaar-venture-management-se... | Aavishkaar Venture Management services | ... | NaN | Mumbai | Mumbai | venture | NaN | 2014-03-03 | 2014-03 | 2014-Q1 | 2014 | 3000000 |
/funding-round/924be083c917d8ac047241c53fdf651b | /organization/milk-mantra | Milk Mantra | |Health and Wellness| | Health and Wellness | IND | NaN | Bhubaneswar | Bhubaneswar | /organization/aavishkaar-venture-management-se... | Aavishkaar Venture Management services | ... | NaN | Mumbai | Mumbai | venture | C | 2014-06-30 | 2014-06 | 2014-Q2 | 2014 | 13080000 |
/funding-round/3fb22f83e95fb1cabfca2cbbcdb4da48 | /organization/esolar | eSolar | |Green|Clean Technology| | Clean Technology | USA | CA | Los Angeles | Pasadena | /organization/acme-group | ACME Group | ... | NaN | Mumbai | Mumbai | venture | F | 2009-01-01 | 2009-01 | 2009-Q1 | 2009 | 40000000 |
/funding-round/9a466a1e2602795b46eabc65b12277e0 | /organization/amiigo | Amiigo | |Exercise|Tracking|Hardware|Technology|Fitness... | Health and Wellness | USA | UT | Salt Lake City | Salt Lake City | /organization/alpha-investments | Alpha Investments | ... | NaN | Mumbai | Mumbai | seed | NaN | 2013-01-16 | 2013-01 | 2013-Q1 | 2013 | NaN |
/funding-round/e4183920092065834cd3ec1e498ac49d | /organization/unamia | unamia | |E-Commerce| | E-Commerce | IND | NaN | Bangalore | Bangalore | /organization/angelprime | AngelPrime | ... | NaN | Bangalore | Bengaluru | seed | NaN | 2012-10-17 | 2012-10 | 2012-Q4 | 2012 | 1200000 |
/funding-round/0155ffde94f5c477acd0e52ce5f87fca | /organization/hackerearth | HackerEarth | |Software| | Software | IND | NaN | Bangalore | Bangalore | /organization/angelprime | AngelPrime | ... | NaN | Bangalore | Bengaluru | seed | NaN | 2014-02-24 | 2014-02 | 2014-Q1 | 2014 | 500000 |
/funding-round/3a4629496f195ec5035271602d485234 | /organization/synup | Synup | |Software| | Software | USA | DE | Wilmington, Delaware | Wilmington | /organization/angelprime | AngelPrime | ... | NaN | Bangalore | Bengaluru | seed | NaN | 2014-08-11 | 2014-08 | 2014-Q3 | 2014 | NaN |
/funding-round/f3ca0e0715575437846695885f21349e | /organization/erc-eye-care | ERC Eye Care | |Biotechnology| | Biotechnology | IND | NaN | IND - Other | Jorhat | /organization/ankur-capital | Ankur Capital | ... | NaN | Mumbai | Mumbai | undisclosed | NaN | 2013-12-18 | 2013-12 | 2013-Q4 | 2013 | NaN |
/funding-round/2e76287d0606a929e926312e68d6c61e | /organization/swiftshift | Swift Shift | |Software| | Software | GBR | NaN | London | London | /organization/ankur-capital | Ankur Capital | ... | NaN | Mumbai | Mumbai | venture | NaN | 2014-05-01 | 2014-05 | 2014-Q2 | 2014 | 0 |
/funding-round/b1b5dbefec49800b9791f8974990e41c | /organization/mydentist | MyDentist | |Health Care| | Health Care | IND | NaN | Mumbai | Mumbai | /organization/asian-healthcare-fund | Asian Healthcare Fund | ... | NaN | New Delhi | New Delhi | venture | NaN | 2013-01-01 | 2013-01 | 2013-Q1 | 2013 | 10000000 |
/funding-round/f744530cce618681192f6d9e334d0ef5 | /organization/healthspring | HealthSpring | |Hospitals|Biotechnology| | Hospitals | USA | TN | Nashville | Franklin | /organization/asian-healthcare-fund | Asian Healthcare Fund | ... | NaN | New Delhi | New Delhi | venture | B | 2013-06-19 | 2013-06 | 2013-Q2 | 2013 | 3700000 |
/funding-round/ec9d6802389ed053dffb07ac9ee8acfd | /organization/forus-health | Forus Health | |Health Care| | Health Care | IND | NaN | Bangalore | Bangalore | /organization/asian-healthcare-fund | Asian Healthcare Fund | ... | NaN | New Delhi | New Delhi | venture | B | 2014-01-09 | 2014-01 | 2014-Q1 | 2014 | 8400000 |
/funding-round/734f972917a468cbaa0fced4c60310fe | /organization/omni-hospitals | Omni Hospitals | |Biotechnology| | Biotechnology | IND | NaN | Hyderabad | Hyderabad | /organization/ask-pravi | ASK Pravi | ... | NaN | Mumbai | Mumbai | venture | A | 2013-08-07 | 2013-08 | 2013-Q3 | 2013 | 9800000 |
/funding-round/635e75b6014396e40d84117b3c0687ff | /organization/thinklink | ThinkLink | |Consulting| | Consulting | IND | NaN | Haryana | Haryana | /organization/aspada | Aspada | ... | NaN | Bangalore | Bangalore | venture | NaN | 2014-01-07 | 2014-01 | 2014-Q1 | 2014 | 1600000 |
/funding-round/0f9f1cd76ce19e552f771abe59c32483 | /organization/neogrowth | Neogrowth | |Finance| | Finance | IND | NaN | Mumbai | Mumbai | /organization/aspada | Aspada | ... | NaN | Bangalore | Bangalore | venture | NaN | 2014-03-26 | 2014-03 | 2014-Q1 | 2014 | 1620000 |
/funding-round/2afd408dad65e43e9b1af3a852f8ea54 | /organization/capital-float | Capital Float | |Financial Services| | Financial Services | IND | NaN | Bangalore | Bangalore | /organization/aspada | Aspada | ... | NaN | Bangalore | Bangalore | seed | NaN | 2014-06-25 | 2014-06 | 2014-Q2 | 2014 | 2000000 |
/funding-round/f328070e282b09cdf034cc9b941e065e | /organization/xamcheck | Xamcheck | |Education| | Education | IND | NaN | Secunderabad | Secunderabad | /organization/aspada | Aspada | ... | NaN | Bangalore | Bangalore | venture | NaN | 2014-10-30 | 2014-10 | 2014-Q4 | 2014 | 1800000 |
/funding-round/c0b21533794fc35ff56a26e922470838 | /organization/dunenetworks | DuneNetworks | |Semiconductors| | Semiconductors | USA | CA | SF Bay Area | Sunnyvale | /organization/aurum-ventures | Aurum Ventures | ... | NaN | Mumbai | Mumbai | venture | B | 2008-01-14 | 2008-01 | 2008-Q1 | 2008 | 12000000 |
/funding-round/5424f8d2aa768a737b010ac1024308a0 | /organization/n-trig | N-Trig | |Hardware + Software| | Hardware + Software | ISR | NaN | Tel Aviv | Kfar Saba | /organization/aurum-ventures | Aurum Ventures | ... | NaN | Mumbai | Mumbai | venture | C | 2008-02-20 | 2008-02 | 2008-Q1 | 2008 | 28000000 |
/funding-round/8ece8f4668dd0bdc869a6eb0a197df84 | /organization/n-trig | N-Trig | |Hardware + Software| | Hardware + Software | ISR | NaN | Tel Aviv | Kfar Saba | /organization/aurum-ventures | Aurum Ventures | ... | NaN | Mumbai | Mumbai | venture | D | 2009-01-12 | 2009-01 | 2009-Q1 | 2009 | 24000000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
/funding-round/fd88bc8d12635d46e25106f6b2c1cbdf | /organization/si2-microsystems | Si2 Microsystems | |Semiconductors| | Semiconductors | USA | CA | SF Bay Area | San Jose | /organization/ventureast | VenturEast | ... | NaN | Chennai | Chennai | venture | NaN | 2009-02-04 | 2009-02 | 2009-Q1 | 2009 | 12700000 |
/funding-round/0391607b40e8d8d1f3e4cf36e06fc6db | /organization/desicrew-solutions | DesiCrew Solutions | |Education| | Education | IND | NaN | Chennai | Chennai | /organization/ventureast | VenturEast | ... | NaN | Chennai | Chennai | venture | B | 2012-01-01 | 2012-01 | 2012-Q1 | 2012 | 1200000 |
/funding-round/441925078ed4b78a3fe72d98744bd19c | /organization/seclore | Seclore | |Software| | Software | IND | NaN | Mumbai | Mumbai | /organization/ventureast | VenturEast | ... | NaN | Chennai | Chennai | venture | NaN | 2013-04-24 | 2013-04 | 2013-Q2 | 2013 | 6000000 |
/funding-round/c03336b81d71755274c48d7a3719a612 | /organization/protea-medical | Protea Medical | |Health Care| | Health Care | USA | AZ | Phoenix | Chandler | /organization/ventureast | VenturEast | ... | NaN | Chennai | Chennai | venture | NaN | 2013-12-01 | 2013-12 | 2013-Q4 | 2013 | 8000000 |
/funding-round/c608fef63d3a5f68227b386c0507a293 | /organization/portea-medical | Portea Medical | |Healthcare Services|Medical|Health Care| | Health Care | IND | NaN | Bangalore | Bangalore | /organization/ventureast | VenturEast | ... | NaN | Chennai | Chennai | private_equity | NaN | 2013-12-02 | 2013-12 | 2013-Q4 | 2013 | 8000000 |
/funding-round/7e61b0c22631b580112180a94c497566 | /organization/pinnacle-engines | Pinnacle Engines | |Clean Technology| | Clean Technology | USA | CA | SF Bay Area | San Carlos | /organization/ventureast | VenturEast | ... | NaN | Chennai | Chennai | venture | C | 2014-05-08 | 2014-05 | 2014-Q2 | 2014 | NaN |
/funding-round/ce3db57eb6e70ab6cd089c09082fbbe7 | /organization/polygenta-technologies | Polygenta Technologies | |Textiles| | Textiles | IND | NaN | Mumbai | Mumbai | /organization/ventureast | VenturEast | ... | NaN | Chennai | Chennai | venture | NaN | 2014-07-21 | 2014-07 | 2014-Q3 | 2014 | 3600000 |
/funding-round/365b84dae9d6e2264f425f91ff9b9454 | /organization/seclore | Seclore | |Software| | Software | IND | NaN | Mumbai | Mumbai | /organization/ventureast-tenet-fund | Ventureast Tenet Fund | ... | NaN | Hyderabad | Hyderabad | seed | NaN | 2009-06-08 | 2009-06 | 2009-Q2 | 2009 | NaN |
/funding-round/61a294d317c2f74cdabc4f8b96527799 | /organization/icrederity | iCrederity | |Security|Identity Management|Nonprofits| | Nonprofits | USA | NY | New York City | New York | /organization/ventureast-tenet-fund | Ventureast Tenet Fund | ... | NaN | Hyderabad | Hyderabad | seed | NaN | 2010-04-01 | 2010-04 | 2010-Q2 | 2010 | NaN |
/funding-round/064b2a75f8f0259805d14d5d151af925 | /organization/inopen | InOpen | |Education| | Education | IND | NaN | Mumbai | Mumbai | /organization/ventureast-tenet-fund | Ventureast Tenet Fund | ... | NaN | Hyderabad | Hyderabad | seed | NaN | 2011-08-01 | 2011-08 | 2011-Q3 | 2011 | NaN |
/funding-round/aeb62fcde22725715d03097e2ea7ff52 | /organization/smartrx | SmartRx | |Health Care Information Technology|Software| | Software | IND | NaN | Bangalore | Bangalore | /organization/ventureast-tenet-fund | Ventureast Tenet Fund | ... | NaN | Hyderabad | Hyderabad | seed | NaN | 2013-04-01 | 2013-04 | 2013-Q2 | 2013 | NaN |
/funding-round/fe12e7a0fb551e0e756b10dabd509649 | /organization/little-eye-labs | Little Eye Labs | |Testing|Android|Mobile| | Testing | IND | NaN | Bangalore | Bangalore | /organization/ventureast-tenet-fund | Ventureast Tenet Fund | ... | NaN | Hyderabad | Hyderabad | seed | NaN | 2013-06-01 | 2013-06 | 2013-Q2 | 2013 | NaN |
/funding-round/c21c550c4c8ad8f7660a1d9de79ce41b | /organization/stylecraze-beauty-care-pvt-ltd | StyleCraze Beauty Care Pvt Ltd | |Health and Wellness|Beauty| | Health and Wellness | IND | NaN | Hyderabad | Hyderabad | /organization/ventureast-tenet-fund | Ventureast Tenet Fund | ... | NaN | Hyderabad | Hyderabad | seed | NaN | 2013-08-03 | 2013-08 | 2013-Q3 | 2013 | NaN |
/funding-round/a9f44c174f683aaa7e14421c0c10f117 | /organization/onebreath | OneBreath | |Consulting| | Consulting | IND | NaN | Bangalore | Bangalore | /organization/ventureast-tenet-fund | Ventureast Tenet Fund | ... | NaN | Hyderabad | Hyderabad | venture | NaN | 2014-01-28 | 2014-01 | 2014-Q1 | 2014 | 3000000 |
/funding-round/563f4fa3e428b5b4738a0d0b39ab085e | /organization/mediangels | MediAngels | |Hospitality| | Hospitality | IND | NaN | Mumbai | Mumbai | /organization/ventureast-tenet-fund | Ventureast Tenet Fund | ... | NaN | Hyderabad | Hyderabad | seed | NaN | 2014-10-10 | 2014-10 | 2014-Q4 | 2014 | 1000000 |
/funding-round/c392231129fde459a31f9b7ee0767735 | /organization/interview-master | Interview Master | |Recruiting|Human Resources|Curated Web| | Human Resources | IND | NaN | Bangalore | Bangalore | /organization/venturenursery | VentureNursery | ... | NaN | Mumbai | Mumbai | seed | NaN | 2013-06-14 | 2013-06 | 2013-Q2 | 2013 | NaN |
/funding-round/f923a3256cbf209805ad900a85876234 | /organization/klip-in | Klip.in | |E-Commerce| | E-Commerce | USA | CA | SF Bay Area | Palo Alto | /organization/venturenursery | VentureNursery | ... | NaN | Mumbai | Mumbai | seed | NaN | 2013-09-06 | 2013-09 | 2013-Q3 | 2013 | NaN |
/funding-round/6953bc5b4d7302797e060c96fc1b1f1b | /organization/hotelsaroundyou | HotelsAroundYou | |Hotels| | Hotels | IND | NaN | Mumbai | Mumbai | /organization/venturenursery | VentureNursery | ... | NaN | Mumbai | Mumbai | seed | NaN | 2014-10-27 | 2014-10 | 2014-Q4 | 2014 | NaN |
/funding-round/1fbdac141bb229304baced175b700a1d | /organization/india-orders | India Orders | |E-Commerce| | E-Commerce | NaN | NaN | NaN | NaN | /organization/vijay-group | Vijay Group | ... | NaN | Thane | Thane | undisclosed | NaN | 2013-07-09 | 2013-07 | 2013-Q3 | 2013 | NaN |
/funding-round/e3c18bc577de12897186ea2668ea3842 | /organization/mobileveda | MobileVeda | |Mobile| | Mobile | IND | NaN | Vellore | Vellore | /organization/vit-technology-business-incubator | VIT Technology Business Incubator | ... | NaN | Vellore | Vellore | seed | NaN | 2007-04-01 | 2007-04 | 2007-Q2 | 2007 | 9000 |
/funding-round/76a232e1ac21fd9912dc6ad7c04b5970 | /organization/abaltat | Tunepresto | |Video Editing|Web Development|Music| | Music | IRL | NaN | Galway | Galway | /organization/wdc | WDC | ... | NaN | Kolkata | Kolkata | venture | A | 2007-01-01 | 2007-01 | 2007-Q1 | 2007 | 263400 |
/funding-round/f9c8271bb4d88eab399485796ac49305 | /organization/nooked | nooked | |Shopping|Web Development|Web Tools|E-Commerce| | Web Development | IRL | NaN | Sligo | Sligo | /organization/wdc | WDC | ... | NaN | Kolkata | Kolkata | seed | NaN | 2008-08-01 | 2008-08 | 2008-Q3 | 2008 | 1557400 |
/funding-round/04f480ddf19f264fbfb2a99343eb3d9c | /organization/crude-area | Crude Area | |Product Design|Art|Design| | Product Design | IND | NaN | Mumbai | Mumbai | /organization/webchutney | Webchutney | ... | NaN | New Delhi | New Delhi | angel | NaN | 2013-07-29 | 2013-07 | 2013-Q3 | 2013 | NaN |
/funding-round/7cb5c9070cdcb535482b209796698fc2 | /organization/opera-solutions | Opera Solutions | |Business Intelligence|Predictive Analytics|An... | Predictive Analytics | USA | NJ | Newark | Jersey City | /organization/wipro | Wipro Technologies | ... | NaN | Bangalore | Bangalore | private_equity | NaN | 2013-05-09 | 2013-05 | 2013-Q2 | 2013 | 30000000 |
/funding-round/463de5ee1fc9c25cf45e88d2754b203a | /organization/opera-solutions | Opera Solutions | |Business Intelligence|Predictive Analytics|An... | Predictive Analytics | USA | NJ | Newark | Jersey City | /organization/wipro | Wipro Technologies | ... | NaN | Bangalore | Bangalore | venture | G | 2014-10-23 | 2014-10 | 2014-Q4 | 2014 | 8199999 |
/funding-round/64e0d2bafe167a51be69ac678fa78752 | /organization/axeda | Axeda | |Enterprise Software|M2M|Software| | Enterprise Software | USA | MA | New Bedford | Foxboro | /organization/wipro-2 | Wipro | ... | NaN | Bangalore | Bangalore | private_equity | NaN | 2013-08-20 | 2013-08 | 2013-Q3 | 2013 | 12000000 |
/funding-round/7c47dd82bc76607ea8cf60cb6a428f38 | /organization/uniphore | Uniphore | |Software| | Software | IND | NaN | IND - Other | Taramani | /organization/yournest-angel-fund | YourNest Angel Fund | ... | NaN | New Delhi | Gurgaon | angel | NaN | 2014-03-13 | 2014-03 | 2014-Q1 | 2014 | NaN |
/funding-round/5abb4d1afe518ab7ff453d362f647580 | /organization/antuit | Antuit | |Analytics| | Analytics | SGP | NaN | Singapore | Singapore | /organization/zodius-capital | Zodius Capital | ... | NaN | Mumbai | Mumbai | venture | NaN | 2013-08-13 | 2013-08 | 2013-Q3 | 2013 | 3859800 |
/funding-round/79d0dc9121121b1147932b1401c6215a | /organization/culture-machine | Culture Machine | |Digital Media|Entertainment|Media| | Entertainment | IND | NaN | Mumbai | Mumbai | /organization/zodius-capital | Zodius Capital | ... | NaN | Mumbai | Mumbai | venture | A | 2014-09-01 | 2014-09 | 2014-Q3 | 2014 | 3500000 |
/funding-round/bc6ec5767c789dec7d704a5a9148db80 | /organization/bigbasket-com | Bigbasket.com | |Online Shopping|Groceries|E-Commerce| | E-Commerce | IND | NaN | Bangalore | Bangalore | /organization/zodius-capital | Zodius Capital | ... | NaN | Mumbai | Mumbai | venture | B | 2014-09-12 | 2014-09 | 2014-Q3 | 2014 | 32799999 |
462 rows × 23 columns
invest_ind = investments[investments.company_country_code=='IND']\
[['investor_permalink', 'raised_amount_usd']]\
.groupby('investor_permalink').sum()
len(invest_ind)
628
invest_us.sort('raised_amount_usd', ascending=False).head(10)
raised_amount_usd | |
---|---|
investor_permalink | |
/organization/tiger-global | 2204850000 |
/organization/naspers | 1560000000 |
/organization/accel-partners | 1540325000 |
/organization/morgan-stanley | 1309500000 |
/organization/sofina | 1243000000 |
/organization/dst-global | 1210000000 |
/organization/digital-sky-technologies-fo | 1210000000 |
/organization/government-of-singapore-investment-corporation-gic | 1000000000 |
/organization/sequoia-capital | 628550000 |
/organization/softbank-internet-and-media-(simi) | 627000000 |
cross_country_investing = investments[(investments.company_country_code != investments.investor_country_code) & pd.notnull(investments.company_country_code) & pd.notnull(investments.investor_country_code)]
cross_country_investing = cross_country_investing[['investor_country_code', 'company_country_code', 'company_permalink']]
cross_country_investing.info()
<class 'pandas.core.frame.DataFrame'> Index: 16376 entries, /funding-round/33fbea0c06971aac9297ff2f9e56512f to /funding-round/655045befcf3c5b2805a0fbd73300e78 Data columns (total 3 columns): investor_country_code 16376 non-null object company_country_code 16376 non-null object company_permalink 16376 non-null object dtypes: object(3) memory usage: 511.8+ KB
invest_count = cross_country_investing.groupby(['investor_country_code','company_country_code']).count()
invest_count.ix['IND']
company_permalink | |
---|---|
company_country_code | |
ARE | 1 |
AUS | 1 |
BEL | 3 |
CAN | 1 |
CYM | 1 |
GBR | 3 |
IRL | 2 |
ISR | 7 |
SGP | 4 |
UGA | 1 |
USA | 97 |
invest_count.ix['USA']
company_permalink | |
---|---|
company_country_code | |
ARE | 14 |
ARG | 40 |
AUS | 65 |
AUT | 21 |
AZE | 1 |
BEL | 22 |
BGD | 14 |
BGR | 4 |
BHR | 1 |
BMU | 3 |
BRA | 113 |
BRN | 1 |
CAN | 634 |
CHE | 61 |
CHL | 6 |
CHN | 803 |
COL | 3 |
CRI | 1 |
CYM | 13 |
CYP | 8 |
CZE | 6 |
DEU | 213 |
DNK | 23 |
ECU | 1 |
EGY | 3 |
ESP | 92 |
EST | 13 |
FIN | 32 |
FRA | 127 |
GBR | 904 |
... | ... |
MAR | 1 |
MEX | 22 |
MKD | 1 |
MOZ | 1 |
MYS | 28 |
NGA | 10 |
NLD | 69 |
NOR | 18 |
NZL | 16 |
PAK | 2 |
PAN | 1 |
PER | 2 |
PHL | 8 |
POL | 1 |
PRT | 5 |
ROU | 1 |
RUS | 73 |
SGP | 131 |
SLV | 1 |
SRB | 3 |
SWE | 78 |
THA | 12 |
TUR | 28 |
TWN | 11 |
TZA | 7 |
UGA | 1 |
UKR | 6 |
URY | 1 |
VNM | 3 |
ZAF | 5 |
77 rows × 1 columns
invest_count.ix[('USA','IND')]
company_permalink 583 Name: (USA, IND), dtype: int64
investments[['raised_amount_usd','company_market']]\
.groupby('company_market')\
.sum()\
.sort('raised_amount_usd',ascending=False)
raised_amount_usd | |
---|---|
company_market | |
Biotechnology | 1.475935e+11 |
Software | 8.722225e+10 |
Clean Technology | 6.958990e+10 |
Health Care | 5.994843e+10 |
E-Commerce | 4.768432e+10 |
Enterprise Software | 4.241772e+10 |
Mobile | 4.203615e+10 |
Internet | 3.739474e+10 |
Advertising | 3.469002e+10 |
Semiconductors | 3.383968e+10 |
Finance | 2.505558e+10 |
Curated Web | 2.448018e+10 |
Hardware + Software | 2.445069e+10 |
Technology | 2.209097e+10 |
Web Hosting | 2.034072e+10 |
Games | 1.906473e+10 |
Analytics | 1.835564e+10 |
Transportation | 1.678087e+10 |
Security | 1.633595e+10 |
Health and Wellness | 1.603438e+10 |
Online Shopping | 1.408079e+10 |
Manufacturing | 1.222042e+10 |
Travel | 1.087952e+10 |
Education | 9.417769e+09 |
Social Media | 8.986376e+09 |
Fashion | 8.501845e+09 |
Video | 7.549672e+09 |
Search | 7.470412e+09 |
Automotive | 7.043365e+09 |
SaaS | 6.975657e+09 |
... | ... |
Carbon | 5.184200e+04 |
Email Newsletters | 5.000000e+04 |
Video Processing | 4.500000e+04 |
Comics | 4.000000e+04 |
Unmanned Air Systems | 4.000000e+04 |
Ventures for Good | 4.000000e+04 |
Intellectual Asset Management | 3.999400e+04 |
Veterinary | 3.462300e+04 |
CAD | 2.800000e+04 |
Soccer | 2.500000e+04 |
Home Owners | 2.200000e+04 |
Self Development | 2.000000e+04 |
Families | 2.000000e+04 |
Direct Sales | 1.800000e+04 |
Advanced Materials | NaN |
BPO Services | NaN |
Consumer Lending | NaN |
Debt Collecting | NaN |
Estimation and Quoting | NaN |
Farmers Market | NaN |
Green Building | NaN |
Independent Music Labels | NaN |
Lasers | NaN |
Lotteries | NaN |
Physical Security | NaN |
Social News | NaN |
Software Compliance | NaN |
Synthetic Biology | NaN |
Technical Continuing Education | NaN |
Weird Hardware | NaN |
700 rows × 1 columns
investments[investments.company_country_code == 'IND']\
[['raised_amount_usd','company_market']]\
.groupby('company_market')\
.sum()\
.sort('raised_amount_usd',ascending=False).head(20)
raised_amount_usd | |
---|---|
company_market | |
Online Shopping | 11305500000 |
E-Commerce | 4018555280 |
Clean Technology | 1772017000 |
Curated Web | 1474882534 |
Mobile | 1258024000 |
Advertising | 897400000 |
Transportation | 724460000 |
Software | 692046500 |
Finance | 656182000 |
Biotechnology | 459790000 |
Home Automation | 364000000 |
Real Estate | 343600000 |
Hardware | 302600000 |
Education | 282740786 |
Security | 217040000 |
Jewelry | 213000000 |
Analytics | 211550000 |
Health Care | 206420000 |
Hardware + Software | 194150000 |
Games | 188040000 |
investments[(investments.company_country_code == 'IND') | (investments.funded_year >=2013)]\
[['raised_amount_usd','company_market']]\
.groupby('company_market')\
.sum()\
.sort('raised_amount_usd',ascending=False).head(20)
raised_amount_usd | |
---|---|
company_market | |
Biotechnology | 45784700319 |
Software | 25600534587 |
E-Commerce | 22865083457 |
Enterprise Software | 16630107468 |
Clean Technology | 16468227505 |
Health Care | 16233216253 |
Finance | 15152577885 |
Transportation | 15053838568 |
Curated Web | 13318647294 |
Online Shopping | 12961270701 |
Mobile | 10001123275 |
Health and Wellness | 9470712226 |
Analytics | 7491131425 |
Education | 6489969851 |
Advertising | 6148014461 |
Hardware + Software | 6102775439 |
Oil and Gas | 6059186366 |
Travel | 5244774574 |
Security | 5056600726 |
Technology | 4848314582 |
investments.ix[1]
company_permalink /organization/advercar company_name AdverCar company_category_list |Cars|Advertising| company_market Advertising company_country_code USA company_state_code CA company_region SF Bay Area company_city San Francisco investor_permalink /organization/1-800-flowers-com investor_name 1-800-FLOWERS.COM investor_category_list |E-Commerce| investor_market E-Commerce investor_country_code USA investor_state_code NY investor_region New York City investor_city New York funding_round_type seed funding_round_code NaN funded_at 2012-10-30 00:00:00 funded_month 2012-10 funded_quarter 2012-Q4 funded_year 2012 raised_amount_usd 2000000 Name: /funding-round/0347630bdf4b26a66ebc20666d3b2069, dtype: object
We have to load yet another excel sheet to answer this question.
acquisitions = pd.io.excel.read_excel('crunchbase_monthly_export_d43b44299ade53.xlsx', sheetname='Acquisitions', index_col='company_permalink')
acquisitions.info()
<class 'pandas.core.frame.DataFrame'> Index: 12851 entries, /organization/waywire to /organization/zytex-group Data columns (total 21 columns): company_name 12851 non-null object company_category_list 9243 non-null object company_market 9242 non-null object company_country_code 9599 non-null object company_state_code 7373 non-null object company_region 9600 non-null object company_city 9401 non-null object acquirer_permalink 12851 non-null object acquirer_name 12851 non-null object acquirer_category_list 11160 non-null object acquirer_market 11160 non-null object acquirer_country_code 11995 non-null object acquirer_state_code 9612 non-null object acquirer_region 11995 non-null object acquirer_city 11878 non-null object acquired_at 12851 non-null datetime64[ns] acquired_month 12850 non-null object acquired_quarter 12850 non-null object acquired_year 12850 non-null float64 price_amount 3642 non-null float64 price_currency_code 12847 non-null object dtypes: datetime64[ns](1), float64(2), object(18) memory usage: 2.2+ MB
acquisitions = acquisitions[acquisitions.acquired_year>=2000]
acquisitions[['acquirer_permalink','company_name']]\
.groupby("acquirer_permalink")\
.count()\
.sort('company_name',ascending=False)
company_name | |
---|---|
acquirer_permalink | |
/organization/google | 164 |
/organization/cisco | 123 |
/organization/ibm | 121 |
/organization/microsoft | 114 |
/organization/yahoo | 101 |
/organization/oracle | 90 |
/organization/emc | 60 |
/organization/hewlett-packard | 60 |
/organization/facebook | 49 |
/organization/aol | 48 |
/organization/apple | 48 |
/organization/intel | 47 |
/organization/ebay | 45 |
/organization/amazon | 41 |
/organization/twitter | 38 |
/organization/groupon | 33 |
/organization/nokia | 30 |
/organization/salesforce | 30 |
/organization/adobe-systems | 28 |
/organization/dell | 28 |
/organization/zayo-group | 27 |
/organization/electronicarts | 26 |
/organization/zynga | 25 |
/organization/autodesk | 25 |
/organization/iac | 24 |
/organization/blackberry | 24 |
/organization/qualcomm | 23 |
/organization/thoma-bravo | 22 |
/organization/intuit | 20 |
/organization/homeaway | 19 |
... | ... |
/organization/hansa-medical | 1 |
/organization/hanon-mckendry | 1 |
/organization/hanlon-investment-management | 1 |
/organization/hangzhou-tigermed-technology-co-ltd | 1 |
/organization/handelsbanken | 1 |
/organization/hanco | 1 |
/organization/halo-healthcare | 1 |
/organization/halma | 1 |
/organization/harrison-gypsum | 1 |
/organization/harvest-power | 1 |
/organization/healthgate-data-corp | 1 |
/organization/harvestmark | 1 |
/organization/healthcare-solutions | 1 |
/organization/healthcare-of-today | 1 |
/organization/healthagen | 1 |
/organization/health-insurance-innovations | 1 |
/organization/health-grades | 1 |
/organization/health-club-media-network | 1 |
/organization/health-care-reit | 1 |
/organization/headstrong | 1 |
/organization/hdfc-bank | 1 |
/organization/hcp-company | 1 |
/organization/hcl | 1 |
/organization/hca | 1 |
/organization/haymarket | 1 |
/organization/hawker-beechcraft | 1 |
/organization/havok | 1 |
/organization/havas-discovery | 1 |
/organization/hasbro-inc | 1 |
/organization/zyraz-technology | 1 |
6485 rows × 1 columns
investments.info()
<class 'pandas.core.frame.DataFrame'> Index: 112590 entries, /funding-round/3b01561dd8c054727c9ddc0705a73f4c to /funding-round/e732b70e579f61fd973820ea9e348d7d Data columns (total 23 columns): company_permalink 112590 non-null object company_name 112587 non-null object company_category_list 108648 non-null object company_market 108647 non-null object company_country_code 105300 non-null object company_state_code 77851 non-null object company_region 105300 non-null object company_city 103940 non-null object investor_permalink 112528 non-null object investor_name 112528 non-null object investor_category_list 28906 non-null object investor_market 28854 non-null object investor_country_code 85082 non-null object investor_state_code 61418 non-null object investor_region 85082 non-null object investor_city 84574 non-null object funding_round_type 112590 non-null object funding_round_code 53847 non-null object funded_at 112590 non-null datetime64[ns] funded_month 112590 non-null object funded_quarter 112590 non-null object funded_year 112590 non-null int64 raised_amount_usd 99563 non-null float64 dtypes: datetime64[ns](1), float64(1), int64(1), object(20) memory usage: 20.6+ MB
yearly_invest = investments[investments.funded_year>=2000][['funded_year','company_market']].groupby('funded_year')
index = []
from collections import defaultdict
plotpoints = defaultdict(dict)
for year, group in yearly_invest:
index.append(year)
rank = 1
for key, value in group.company_market.value_counts().head(10).iteritems():
plotpoints[key][year] = rank
rank = rank + 1
with plt.style.context('fivethirtyeight'):
plt.figure(figsize=(15,8))
plt.gca().invert_yaxis()
for market,points in plotpoints.iteritems():
values =[points.get(year, 11) for year in index]
plt.plot(index, values)
#plt.plot(x, np.sin(x) + x + np.random.randn(50))
#plt.plot(x, np.sin(x) + 0.5 * x + np.random.randn(50))
#plt.plot(x, np.sin(x) + 2 * x + np.random.randn(50))