Columns Needed to import from MongoDB:
file_list_columns = ['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod']
SOX_columns = ['WhistleblowerPolicy', 'WhistleblowerPolicyInd', 'ConflictOfInterestPolicyInd', 'ConflictOfInterestPolicy', 'DocumentRetentionPolicy', 'DocumentRetentionPolicyInd',]
revenue_columns = ['CYTotalRevenueAmt', 'TotalRevenueCurrentYear']
efficiency_columns = ['CYTotalExpensesAmt', 'TotalExpensesCurrentYear' 'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt']
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt', 'MembershipDues', 'MembershipDuesAmt', 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt', 'FundraisingGrossIncomeAmt', 'RelatedOrganizations', 'RelatedOrganizationsAmt', 'GovernmentGrants', 'GovernmentGrantsAmt', 'PYProgramServiceRevenueAmt','CYProgramServiceRevenueAmt', 'ProgramServiceRevenueCY', 'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt', 'CYInvestmentIncomeAmt', 'InvestmentIncomeCurrentYear', 'OtherRevenueMiscGrp', 'OtherRevenueTotalAmt', 'CYOtherRevenueAmt', 'TotalOtherRevenue', ]
cols = file_list_columns+SOX_columns+revenue_columns+efficiency_columns+complexity_columns
Columns Needed to keep to merge with existing data:
file_list_columns = ['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod', 'FYE']
SOX_columns = ['whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']
control_columns = ['tot_rev', 'total_expenses', 'program_expenses', 'complexity']
unneeded_columns = ['CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt', 'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues', 'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt', 'FundraisingGrossIncomeAmt', u'RelatedOrganizations', 'RelatedOrganizationsAmt', u'GovernmentGrants', 'GovernmentGrantsAmt', 'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue', 'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp', u'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp', u'TotalOtherRevenue', 'CYOtherRevenueAmt', 'fundraising', 'program_revenue', 'OtherRevMisc', 'other_revenue_binary']
cols = file_list_columns+SOX_columns+control_columns
import sys
import time
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sqlite3 as lite
from pandas import DataFrame
from pandas import Series
#http://pandas.pydata.org/pandas-docs/stable/options.html
pd.set_option('display.max_columns', None)
pd.set_option('max_colwidth', 250)
plt.rcParams['figure.figsize'] = (15, 5)
%matplotlib inline
Run pip install awscli
See this site for more information: https://aws.amazon.com/public-data-sets/irs-990/
#import awscli
#aws s3 ls s3://irs-form-990/2012 --human-readable --summarize
#aws s3 s3://irs-form-990/index.json
File "<ipython-input-11-59b06a38f6b5>", line 1 aws s3 s3://irs-form-990/index.json ^ SyntaxError: invalid syntax
#https://s3.amazonaws.com/irs-form-990/index.json
aws s3 ls s3://irs-form-990/2012 --human-readable --summarize
import requests
url = 'https://s3.amazonaws.com/irs-form-990/index.json'
f = requests.get(url)
print url, '\n'
print f.text[:100]
print type(f)
data = f.json()
https://s3.amazonaws.com/irs-form-990/index.json {"AllFilings":[{"EIN":"742661023","TaxPeriod":"201412","DLN":"93491315003445","FormType":"990PF","Is <class 'requests.models.Response'>
print len(data)
print data.keys()
print len(data['AllFilings'])
print data['AllFilings'][0]
1 [u'AllFilings'] 3618506 {u'OrganizationName': u'HARRIET AND HARMON KELLEY FOUNDATION FOR THE ARTS', u'ObjectId': u'201543159349100344', u'SubmittedOn': u'2016-02-09', u'DLN': u'93491315003445', u'LastUpdated': u'2016-08-19T01:03:07.6330786Z', u'TaxPeriod': u'201412', u'IsElectronic': True, u'FormType': u'990PF', u'IsAvailable': False, u'EIN': u'742661023'}
https://s3.amazonaws.com/irs-form-990/201543109349200219_public.xml
To get the variable descriptions you need to dig through the 'schemas', such as that here https://www.irs.gov/charities-non-profits/990-990-ez-990-pf-ty2014-v5-0-schema-business-rules-and-release-memo or here https://www.irs.gov/tax-professionals/e-file-providers-partners/modernized-e-file-mef-stylesheets
cd '/Users/gregorysaxton/Google Drive/SOX'
/Users/gregorysaxton/Google Drive/SOX
Depending on the project, I will store the data in SQLite or MongoDB. This time I'll use MongoDB -- it's great for storing JSON data where each observation could have different variables. Before we get to the interesting part the following code blocks set up the MongoDB environment and the new database we'll be using.
Note: In a terminal we'll have to start MongoDB by running the command mongod or sudo mongod. Then we run the following code block here to access MongoDB.
import pymongo
from pymongo import MongoClient
client = MongoClient()
# DEFINE MY mongoDB DATABASE
db = client['irs_990_db']
# DEFINE MY COLLECTION WHERE I'LL INSERT MY SEARCH
file_list = db['990_files']
Check how many observations in the database table.
file_list.count()
0
SET INDEX ON TABLE TO KEEP DUPLICATES FROM BEING INSERTED
#db.users_collection.create_index([('screen_name', pymongo.ASCENDING)], unique=True)
#db.users.create_index([('from_user_id', pymongo.ASCENDING), ('date', pymongo.ASCENDING)], unique=True)
db.file_list.create_index([('ObjectId', pymongo.ASCENDING)], unique=True)
list(db.file_list.index_information())
[]
#drop_index(index_or_name)¶
#SEE DATABASES
client.database_names()
#[u'local', u'twitter_results']
[u'admin', u'crowdflower', u'crowdflower_firm', u'crowdflower_firm_v2', u'CSR_2014_user_info', u'CSR_firm_user_info', u'ELH', u'favorites', u'irs_990_db', u'local', u'nonprofit_db', u'panamagate_user_info_API', u'politician_user_info', u'twitter_results']
As we saw above, the list of filings are in the AllFilings key index.json file. We see there are 3,618,506 filings.
filings = data['AllFilings']
print len(filings)
print filings[:2]
Let's insert all 3,618,506 filing details into the MongoDB collection at once.
#file_list.insert_many(filings)
<pymongo.results.InsertManyResult at 0x34c1399b0>
file_list.count()
3618506
f = open('2016 donor advisory EINs.json', 'r')
advisories_2016 = json.load(f)
print len(advisories_2016)
print advisories_2016[:10]
321 [nan, nan, u'311660730', u'432059591', u'320022295', u'870804117', nan, u'223871535', u'620476294', u'311016441']
print len([str(x) for x in advisories_2016 if str(x) != 'nan'])
advisories_2016 = [str(x) for x in advisories_2016 if str(x) != 'nan']
print len(advisories_2016)
print advisories_2016[:5]
256 256 ['311660730', '432059591', '320022295', '870804117', '223871535']
There are 1,077 filings for those 256 EINs
file_list.find({'EIN': { '$in': advisories_2016}}).count()
1077
# DEFINE MY COLLECTION (DATABASE TABLE) WHERE I'LL INSERT THE MESSAGES
filings = db['filings']
filings.count()
0
db.filings.create_index([('URL', pymongo.ASCENDING)], unique=True)
u'URL_1'
list(db.filings.index_information())
[u'_id_', u'URL_1']
First we'll write a function to turn an ordered dictionary (which is what is returned by xmltodict) into a normal Python dictionary so that we can combine it with the filing data gathered above.
from json import loads, dumps
from collections import OrderedDict
def to_dict(input_ordered_dict):
return loads(dumps(input_ordered_dict))
import xmltodict
import urllib2
import timeit
start_time = timeit.default_timer()
from IPython.display import display, clear_output ##### FOR USE WITH STDOUT (DYNAMIC, SINGLE-LINE PRINTING)
counter = 0
for f in file_list.find({'EIN': { '$in': advisories_2016}})[10:]:
counter += 1
#print f
#print f['_id']
#print f['EIN'] #, f['ObjectId']
print f
#print f['OrganizationName']
if 'URL' in f:
#print f['URL']
url = f['URL']
#url = 'https://s3.amazonaws.com/irs-form-990/201533159349302748_public.xml'
print url, '\n'
#f = requests.get(url)
#data = f.read()
#data = xmltodict.parse(data)
url_data = urllib2.urlopen(url)
f_string = url_data.read()
data = xmltodict.parse(f_string)
try:
if 'IRS990' in data['Return']['ReturnData']:
data = data['Return']['ReturnData']['IRS990']
#data['URL'] = url
#data['OrganizationName'] = f['OrganizationName']
#data['SubmittedOn'] = f['SubmittedOn']
data = to_dict(data)
##### COMBINE THE DICT OF FILING INFO FROM FIRST STEP WITH FILING DATA GATHERED HERE
c = {key: value for (key, value) in (f.items() + data.items())}
#c.pop('ObjectId', None)
c.pop('_id', None) #DROP 'id' (OR IT WILL NOT INSERT)
t = json.dumps(c)
print t
#print type(t) #<type 'str'>
loaded_entry = json.loads(t)
print type(loaded_entry) , loaded_entry #<type 'dict'>
try:
filings.insert_one(loaded_entry)
except pymongo.errors.DuplicateKeyError, e:
print e, '\n'
except KeyError:
continue # I just chose to continue. You can do anything here though
else:
print f['IsAvailable']
#print 'counter: ', counter
clear_output()
print ('counter: ', counter), '\n'
#print (data.keys()), '\n'
print (f), '\n'
print data
sys.stdout.flush()
elapsed = timeit.default_timer() - start_time
print '# of minutes: ', elapsed/60, '\n', '\n'
('counter: ', 1067) {u'OrganizationName': u'POP WARNER LITTLE SCHOLARS INC VISTA POP WARNER FOOTBALL AND CHEER', u'ObjectId': u'201631379349202768', u'SubmittedOn': u'2016-08-09', u'DLN': u'93492137027686', u'LastUpdated': u'2016-08-19T01:05:40.5300786Z', u'TaxPeriod': u'201412', u'IsElectronic': True, u'FormType': u'990EZ', u'_id': ObjectId('57ccb3283ffc5a55fe7f4e77'), u'IsAvailable': False, u'EIN': u'330807324'} {u'Form990ProvidedToGvrnBodyInd': u'1', u'TaxExemptBondsInd': u'0', u'IncludeFIN48FootnoteInd': {u'#text': u'1', u'@referenceDocumentId': u'RetDoc1040000001'}, u'PartialLiquidationInd': u'0', u'DeductibleNonCashContriInd': u'0', u'TotalGrossUBIAmt': u'0', u'SavingsAndTempCashInvstGrp': {u'EOYAmt': u'475000'}, u'ActivitiesConductedPrtshpInd': u'0', u'ElectionOfBoardMembersInd': u'0', u'MethodOfAccountingAccrualInd': u'X', u'WhistleblowerPolicyInd': u'1', u'GrantsToIndividualsInd': u'0', u'ReportProgramRelatedInvstInd': {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'}, u'ForeignActivitiesInd': u'0', u'CompensationProcessOtherInd': u'1', u'InvestmentInJointVentureInd': u'0', u'VotingMembersIndependentCnt': u'7', u'PYBenefitsPaidToMembersAmt': u'0', u'TotalAssetsGrp': {u'BOYAmt': u'8435569', u'EOYAmt': u'8053630'}, u'MortgNotesPyblScrdInvstPropGrp': {u'BOYAmt': u'1302514', u'EOYAmt': u'1878871'}, u'WebsiteAddressTxt': u'WWW.JCHCC.ORG', u'BusinessRlnWithFamMemInd': u'0', u'OperateHospitalInd': u'0', u'ChangeToOrgDocumentsInd': u'0', u'ReconcilationRevenueExpnssAmt': u'759184', u'InformationTechnologyGrp': {u'TotalAmt': u'15677', u'ProgramServicesAmt': u'12592', u'ManagementAndGeneralAmt': u'3085'}, u'MembersOrStockholdersInd': u'0', u'CYOtherExpensesAmt': u'2759125', u'TotalLiabilitiesEOYAmt': u'2510564', u'GoverningBodyVotingMembersCnt': u'7', u'InfoInScheduleOPartIXInd': u'X', u'AnnualDisclosureCoveredPrsnInd': u'1', u'PYProgramServiceRevenueAmt': u'4976539', u'TotalFunctionalExpensesGrp': {u'TotalAmt': u'8169805', u'ProgramServicesAmt': u'6611724', u'ManagementAndGeneralAmt': u'1558081', u'FundraisingAmt': u'0'}, u'IndoorTanningServicesInd': u'0', u'DocumentRetentionPolicyInd': u'1', u'NetUnrelatedBusTxblIncmAmt': u'0', u'PYInvestmentIncomeAmt': u'27', u'NetAssetsOrFundBalancesEOYAmt': u'5543066', u'InvestmentIncomeGrp': {u'TotalRevenueColumnAmt': u'1051', u'ExclusionAmt': u'1051'}, u'MoreThan5000KToIndividualsInd': u'0', u'InfoInScheduleOPartVIIIInd': u'X', u'LobbyingActivitiesInd': u'0', u'DescribedInSection501c3Ind': {u'#text': u'1', u'@referenceDocumentId': u'RetDoc1039100001'}, u'PoliticalCampaignActyInd': u'0', u'FederalGrantAuditRequiredInd': u'1', u'Organization501c3Ind': u'X', u'Form990PartVIISectionAGrp': [{u'PersonNm': u'REV JOSEPH DYSON', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'BOARD MEMBER', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'AUDRY PARKER', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'BOARD MEMBER', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'WILLIE KERNER', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'BOARD MEMBER', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'LINDA BOSTICK', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'BOARD MEMBER', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'THADDEUS VALENTINE', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'VICE CHAIRMAN', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'KAWANA RIPOLL', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'BOARD MEMBER', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'CYNTHIA BUTLER MCLINTRYE', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'BOARD MEMBER', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'BRIAN DAVIS', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'BOARD SECRETARY', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'MICHAEL MCKENNA', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'BOARD CHAIRMAN', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'LLOYD CLARK', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'TREASURER', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'DIEDRE D HAYES', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'BOARD MEMBER', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'GWEN MEREDITH', u'IndividualTrusteeOrDirectorInd': u'X', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'TitleTxt': u'BOARD MEMBER', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'CHAD BROWN', u'OtherCompensationAmt': u'4483', u'AverageHoursPerWeekRt': u'50.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'CHIEF OPERATING OFFICER', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'37363'}, {u'PersonNm': u'DR CYROUS ARDALAN', u'OtherCompensationAmt': u'14948', u'AverageHoursPerWeekRt': u'40.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'CHIEF DENTAL OFFICER', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'86884'}, {u'PersonNm': u'DRMICHELE KAUTZMAN', u'OtherCompensationAmt': u'27803', u'AverageHoursPerWeekRt': u'40.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'DEPUTY MEDICAL DIRECTOR', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'149738'}, {u'PersonNm': u'SHANA MARCELIN', u'OtherCompensationAmt': u'11083', u'AverageHoursPerWeekRt': u'50.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'CHIEF FINANCIAL OFFICER', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'100596'}, {u'PersonNm': u'DR JEREMY DUMAS', u'OtherCompensationAmt': u'9747', u'AverageHoursPerWeekRt': u'45.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'CHIEF MEDICAL OFFICER', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'179872'}, {u'PersonNm': u'ANNETTE YOUNG', u'OtherCompensationAmt': u'12766', u'AverageHoursPerWeekRt': u'50.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'DIRECTOR OF NURSING', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'72943'}, {u'PersonNm': u'DRSHONDRA WILLIAMS', u'OtherCompensationAmt': u'27453', u'AverageHoursPerWeekRt': u'55.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'CHIEF EXECUTIVE OFFICER', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'144880'}, {u'PersonNm': u'DR JUDE HANEY', u'OtherCompensationAmt': u'13110', u'AverageHoursPerWeekRt': u'50.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'DIRECTOR OF HEALTH INFORMA', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'82587'}, {u'PersonNm': u'KIM NGUYEN', u'OtherCompensationAmt': u'9000', u'AverageHoursPerWeekRt': u'45.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'DEPUTY DENTAL DIRECTOR', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'135337'}, {u'PersonNm': u'DR JAYA WARRIER', u'OtherCompensationAmt': u'0', u'AverageHoursPerWeekRt': u'40.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'PHYSICIAN', u'ReportableCompFromOrgAmt': u'162544', u'HighestCompensatedEmployeeInd': u'X'}, {u'PersonNm': u'DR MARIO KAKAZU', u'OtherCompensationAmt': u'0', u'AverageHoursPerWeekRt': u'40.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'PHYSICIAN', u'ReportableCompFromOrgAmt': u'147008', u'HighestCompensatedEmployeeInd': u'X'}, {u'PersonNm': u'DR JEAN MATHIS', u'OtherCompensationAmt': u'0', u'AverageHoursPerWeekRt': u'40.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'PHYSICIAN', u'ReportableCompFromOrgAmt': u'149901', u'HighestCompensatedEmployeeInd': u'X'}, {u'PersonNm': u'KIMBERLY RAYFORD', u'OtherCompensationAmt': u'0', u'AverageHoursPerWeekRt': u'40.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'DENTIST', u'ReportableCompFromOrgAmt': u'111367', u'HighestCompensatedEmployeeInd': u'X'}, {u'PersonNm': u'GENELLE PRICE', u'OtherCompensationAmt': u'0', u'AverageHoursPerWeekRt': u'40.00', u'ReportableCompFromRltdOrgAmt': u'0', u'TitleTxt': u'DENTIST', u'ReportableCompFromOrgAmt': u'129179', u'HighestCompensatedEmployeeInd': u'X'}], u'ExpenseAmt': u'6611724', u'EngagedInExcessBenefitTransInd': u'0', u'OtherAssetsTotalGrp': {u'BOYAmt': u'13193', u'EOYAmt': u'10475'}, u'TaxablePartyNotificationInd': u'0', u'PrincipalOfficerNm': u'DR SHONDRA WILLIAMS', u'PYTotalExpensesAmt': u'8021492', u'ScheduleBRequiredInd': u'0', u'TotalReportableCompFromOrgAmt': u'1690199', u'SubjectToProxyTaxInd': u'0', u'OccupancyGrp': {u'TotalAmt': u'207572', u'ProgramServicesAmt': u'202435', u'ManagementAndGeneralAmt': u'5137'}, u'BusinessRlnWithOfficerEntInd': u'0', u'AuditCommitteeInd': u'1', u'TotalRevenueGrp': {u'UnrelatedBusinessRevenueAmt': u'0', u'TotalRevenueColumnAmt': u'8928989', u'ExclusionAmt': u'1051', u'RelatedOrExemptFuncIncomeAmt': u'5379477'}, u'PYSalariesCompEmpBnftPaidAmt': u'4736818', u'TotReportableCompRltdOrgAmt': u'0', u'TotalLiabilitiesBOYAmt': u'2413232', u'OtherChangesInNetAssetsAmt': u'-1238455', u'ScheduleJRequiredInd': {u'#text': u'1', u'@referenceDocumentId': u'RetDoc1042400001'}, u'CYTotalProfFndrsngExpnsAmt': u'0', u'ConflictOfInterestPolicyInd': u'1', u'TotalVolunteersCnt': u'12', u'TotalProgramServiceExpensesAmt': u'6611724', u'ContractorCompensationGrp': [{u'ContractorName': {u'BusinessName': {u'BusinessNameLine1Txt': u'WILLIAMS SCOTSMAN'}}, u'CompensationAmt': u'444034', u'ContractorAddress': {u'USAddress': {u'CityNm': u'CHICAGO', u'StateAbbreviationCd': u'IL', u'ZIPCd': u'60693', u'AddressLine1Txt': u'PO BOX 91975'}}, u'ServicesDesc': u'CONSTR. CONTRACTORS'}, {u'ContractorName': {u'BusinessName': {u'BusinessNameLine1Txt': u'SUCESS EHS'}}, u'CompensationAmt': u'184414', u'ContractorAddress': {u'USAddress': {u'CityNm': u'BIRMINGHAM', u'StateAbbreviationCd': u'AL', u'ZIPCd': u'35209', u'AddressLine1Txt': u'ONE METROPLEX DRIVE STE 500'}}, u'ServicesDesc': u'RECORDS/BILLING SOFTWARE'}, {u'ContractorName': {u'BusinessName': {u'BusinessNameLine1Txt': u'HENRY SHEIN'}}, u'CompensationAmt': u'137193', u'ContractorAddress': {u'USAddress': {u'CityNm': u'PALANTINE', u'StateAbbreviationCd': u'IL', u'ZIPCd': u'60055', u'AddressLine1Txt': u'DEPT CH 10541'}}, u'ServicesDesc': u'MEDICAL/DENTAL SUPPLIES'}, {u'ContractorName': {u'BusinessName': {u'BusinessNameLine1Txt': u'JANI KING'}}, u'CompensationAmt': u'135694', u'ContractorAddress': {u'USAddress': {u'CityNm': u'PONCHATOULA', u'StateAbbreviationCd': u'LA', u'ZIPCd': u'704543309', u'AddressLine1Txt': u'122 W PINE STREET'}}, u'ServicesDesc': u'JANITORIAL SUPPLIER'}, {u'ContractorName': {u'BusinessName': {u'BusinessNameLine1Txt': u'LABORATORY CORPORATION'}}, u'CompensationAmt': u'116856', u'ContractorAddress': {u'USAddress': {u'CityNm': u'BURLINGTON', u'StateAbbreviationCd': u'NC', u'ZIPCd': u'272162140', u'AddressLine1Txt': u'PO BOX 12140'}}, u'ServicesDesc': u'LABORATORY FEES'}], u'GrantToRelatedPersonInd': u'0', u'ConsolidatedAuditFinclStmtInd': {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'}, u'AdvertisingGrp': {u'TotalAmt': u'176801', u'ProgramServicesAmt': u'123416', u'ManagementAndGeneralAmt': u'53385'}, u'PYOtherExpensesAmt': u'3284674', u'PYContributionsGrantsAmt': u'3500326', u'FormerOfcrEmployeesListedInd': u'0', u'CYBenefitsPaidToMembersAmt': u'0', u'ReportInvestmentsOtherSecInd': {u'#text': u'1', u'@referenceDocumentId': u'RetDoc1040000001'}, u'UnrelatedBusIncmOverLimitInd': u'0', u'ReportLandBuildingEquipmentInd': {u'#text': u'1', u'@referenceDocumentId': u'RetDoc1040000001'}, u'ScheduleORequiredInd': u'1', u'InvestmentsOtherSecuritiesGrp': {u'BOYAmt': u'1476350', u'EOYAmt': u'1502774'}, u'AccountantCompileOrReviewInd': u'0', u'GroupReturnForAffiliatesInd': u'0', u'CollectionsOfArtInd': {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'}, u'TotalAssetsBOYAmt': u'8435569', u'DepreciationDepletionGrp': {u'TotalAmt': u'239217', u'ProgramServicesAmt': u'47843', u'ManagementAndGeneralAmt': u'191374'}, u'@referenceDocumentId': u'RetDoc1044400001', u'IndivRcvdGreaterThan100KCnt': u'10', u'LandBldgEquipBasisNetGrp': {u'BOYAmt': u'3079488', u'EOYAmt': u'3433754'}, u'CYTotalExpensesAmt': u'8169805', u'GrossReceiptsAmt': u'8928989', u'TotalProgramServiceRevenueAmt': u'5357093', u'CYContributionsGrantsAmt': u'3548461', u'SignificantChangeInd': u'0', u'CYInvestmentIncomeAmt': u'1051', u'ForeignFinancialAccountInd': u'0', u'TempOrPermanentEndowmentsInd': {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'}, u'LandBldgEquipCostOrOtherBssAmt': u'4787960', u'CompDisqualPersonsGrp': {u'TotalAmt': u'5135215', u'ProgramServicesAmt': u'4356453', u'ManagementAndGeneralAmt': u'778762'}, u'RelatedEntityInd': u'0', u'TotalOtherCompensationAmt': u'130393', u'ReportOtherAssetsInd': {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'}, u'OrganizationFollowsSFAS117Ind': u'X', u'ProgramServiceRevenueGrp': {u'TotalRevenueColumnAmt': u'5357093', u'BusinessCd': u'621400', u'RelatedOrExemptFuncIncomeAmt': u'5357093', u'Desc': u'SERVICE REVENUES'}, u'USAddress': {u'CityNm': u'MARRERO', u'StateAbbreviationCd': u'LA', u'ZIPCd': u'70072', u'AddressLine1Txt': u'1855 AMES BLVD'}, u'MinutesOfGoverningBodyInd': u'1', u'MaterialDiversionOrMisuseInd': u'0', u'TotalLiabilitiesGrp': {u'BOYAmt': u'2413232', u'EOYAmt': u'2510564'}, u'CYGrantsAndSimilarPaidAmt': u'0', u'EmployeeCnt': u'99', u'PYOtherRevenueAmt': u'16541', u'@documentId': u'RetDoc1038000001', u'OtherRevenueMiscGrp': [{u'TotalRevenueColumnAmt': u'13241', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'13241', u'Desc': u'RETURN ON FCC INVESTME'}, {u'TotalRevenueColumnAmt': u'9143', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'9143', u'Desc': u'MISCELLANEOUS INCOME'}], u'LandBldgEquipAccumDeprecAmt': u'1354206', u'TotalContributionsAmt': u'3548461', u'ActivityOrMissionDesc': u'PROVIDE HEALTHCARE SERVICES INCLUDING MEDICAL, DENTAL, MENTAL HEALTH, PODIATRY, PEDIATRICS, AND OBSTETRICS/GYNECOLOGY.', u'CYSalariesCompEmpBnftPaidAmt': u'5410680', u'SignificantNewProgramSrvcInd': u'0', u'Form8282PropertyDisposedOfInd': u'0', u'TerminateOperationsInd': u'0', u'TrnsfrExmptNonChrtblRltdOrgInd': u'0', u'PrepaidExpensesDefrdChargesGrp': {u'BOYAmt': u'67705', u'EOYAmt': u'80809'}, u'PYRevenuesLessExpensesAmt': u'471941', u'FederalGrantAuditPerformedInd': u'1', u'CntrctRcvdGreaterThan100KCnt': u'5', u'VotingMembersGoverningBodyCnt': u'7', u'GamingActivitiesInd': u'0', u'DisregardedEntityInd': u'0', u'ForeignOfficeInd': u'0', u'IndependentAuditFinclStmtInd': {u'#text': u'1', u'@referenceDocumentId': u'RetDoc1040000001'}, u'GovernmentGrantsAmt': u'3548461', u'TotalNetAssetsFundBalanceGrp': {u'BOYAmt': u'6022337', u'EOYAmt': u'5543066'}, u'DelegationOfMgmtDutiesInd': u'0', u'MinutesOfCommitteesInd': u'1', u'PYExcessBenefitTransInd': u'0', u'CompensationFromOtherSrcsInd': u'0', u'RevenueAmt': u'5379477', u'EmploymentTaxReturnsFiledInd': u'1', u'RelatedOrganizationCtrlEntInd': u'0', u'OtherExpensesGrp': [{u'TotalAmt': u'783622', u'ProgramServicesAmt': u'626478', u'ManagementAndGeneralAmt': u'157144', u'Desc': u'CONTRACT LABOR & SERVIC'}, {u'TotalAmt': u'378306', u'ProgramServicesAmt': u'371001', u'ManagementAndGeneralAmt': u'7305', u'Desc': u'SUPPLIES'}, {u'TotalAmt': u'241682', u'ProgramServicesAmt': u'241682', u'Desc': u'DUES & SUBSCRIPTIONS'}, {u'TotalAmt': u'235938', u'ProgramServicesAmt': u'228309', u'ManagementAndGeneralAmt': u'7629', u'Desc': u'TELEPHONE & UTILITIES'}], u'AccountsPayableAccrExpnssGrp': {u'BOYAmt': u'851967', u'EOYAmt': u'484571'}, u'IndependentVotingMemberCnt': u'7', u'InfoInScheduleOPartXIIInd': u'X', u'SchoolOperatingInd': u'0', u'PYGrantsAndSimilarPaidAmt': u'0', u'PYTotalRevenueAmt': u'8493433', u'ReportOtherLiabilitiesInd': {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'}, u'TravelGrp': {u'TotalAmt': u'108590', u'ProgramServicesAmt': u'16793', u'ManagementAndGeneralAmt': u'91797'}, u'FormationYr': u'2004', u'GrantsToOrganizationsInd': u'0', u'AccountsReceivableGrp': {u'BOYAmt': u'1156367', u'EOYAmt': u'1860238'}, u'CompensationProcessCEOInd': u'1', u'TotalEmployeeCnt': u'99', u'CYRevenuesLessExpensesAmt': u'759184', u'PYTotalProfFndrsngExpnsAmt': u'0', u'InfoInScheduleOPartXIInd': u'X', u'FSAuditedBasisGrp': {u'SeparateBasisFinclStmtInd': u'X'}, u'InterestGrp': {u'TotalAmt': u'74855', u'ProgramServicesAmt': u'41170', u'ManagementAndGeneralAmt': u'33685'}, u'InsuranceGrp': {u'TotalAmt': u'89293', u'ProgramServicesAmt': u'88400', u'ManagementAndGeneralAmt': u'893'}, u'FundraisingActivitiesInd': u'0', u'AllOtherExpensesGrp': {u'TotalAmt': u'207572', u'ProgramServicesAmt': u'172513', u'ManagementAndGeneralAmt': u'35059'}, u'CompCurrentOfcrDirectorsGrp': {u'TotalAmt': u'275465', u'ProgramServicesAmt': u'82639', u'ManagementAndGeneralAmt': u'192826'}, u'ProfessionalFundraisingInd': u'0', u'LegalDomicileStateCd': u'LA', u'TotalCompGreaterThan150KInd': u'1', u'NetAssetsOrFundBalancesBOYAmt': u'6022337', u'DeductibleArtContributionInd': u'0', u'CYTotalFundraisingExpenseAmt': u'0', u'TotLiabNetAssetsFundBalanceGrp': {u'BOYAmt': u'8435569', u'EOYAmt': u'8053630'}, u'UnrestrictedNetAssetsGrp': {u'BOYAmt': u'6022337', u'EOYAmt': u'5543066'}, u'OtherRevenueTotalAmt': u'22384', u'DecisionsSubjectToApprovaInd': u'0', u'TotalAssetsEOYAmt': u'8053630', u'CashNonInterestBearingGrp': {u'BOYAmt': u'2642466', u'EOYAmt': u'690580'}, u'Desc': u'PROVIDED MEDICAL, DENTAL, MENTAL HEALTH, PODIATRY, PEDIATRICS, AND OBSTETRICS/GYNECOLOGY TO APPROXIMATELY 20,000 PATIENTS. APPROXIMATELY 50% OF THE PATIENTS TREATED WERE UNDER INSURED OR ON MEDICAID OR MEDICARE.', u'FSAuditedInd': u'1', u'CYOtherRevenueAmt': u'22384', u'LocalChaptersInd': u'0', u'FamilyOrBusinessRlnInd': u'0', u'CYTotalRevenueAmt': u'8928989', u'IRPDocumentCnt': u'24', u'DonorAdvisedFundInd': {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'}, u'LoanOutstandingInd': u'0', u'NondeductibleContributionsInd': u'0', u'BusinessRlnWithOrgMemInd': u'0', u'MoreThan5000KToOrgInd': u'0', u'TypeOfOrganizationCorpInd': u'X', u'BooksInCareOfDetail': {u'PhoneNum': u'5047628931', u'USAddress': {u'CityNm': u'MARRERO', u'StateAbbreviationCd': u'LA', u'ZIPCd': u'70072', u'AddressLine1Txt': u'1855 AMES'}, u'BusinessName': {u'BusinessNameLine1Txt': u'DR SHONDRA WILLIAMS'}}, u'RegularMonitoringEnfrcInd': u'1', u'CYProgramServiceRevenueAmt': u'5357093', u'MissionDesc': u'PROVIDE HEALTHCARE SERVICES INCLUDING MEDICAL, DENTAL, MENTAL HEALTH, PODIATRY, PEDIATRICS, AND OBSTETRICS/GYNECOLOGY SERVICES TO THE UNDERINSURED & UNINSURED CITIZENS OF JEFFERSON PARISH, LOUISIANA.', u'UponRequestInd': u'X', u'CreditCounselingInd': {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'}, u'OfficerMailingAddressInd': u'0', u'InfoInScheduleOPartVIInd': u'X', u'ProhibitedTaxShelterTransInd': u'0', u'DeferredRevenueGrp': {u'BOYAmt': u'258751', u'EOYAmt': u'147122'}, u'IRPDocumentW2GCnt': u'0', u'ConservationEasementsInd': {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'}, u'QuidProQuoContributionsInd': u'0'} # of minutes: 4.70899313291
'''
import xmltodict
import urllib2
import timeit
start_time = timeit.default_timer()
from IPython.display import display, clear_output ##### FOR USE WITH STDOUT (DYNAMIC, SINGLE-LINE PRINTING)
counter = 0
for f in file_list.find({'EIN': { '$in': advisories_2016}})[:1]:
counter += 1
#print f
#print f['_id']
#print f['EIN'] #, f['ObjectId']
#print f
#print f['OrganizationName']
if 'URL' in f:
#print f['URL']
url = f['URL']
#url = 'https://s3.amazonaws.com/irs-form-990/201533159349302748_public.xml'
print url, '\n'
#f = requests.get(url)
#data = f.read()
#data = xmltodict.parse(data)
f = urllib2.urlopen(url)
f_string = f.read()
#print f
data = xmltodict.parse(f_string)
try:
if 'IRS990' in data['Return']['ReturnData']:
data = data['Return']['ReturnData']['IRS990']
data['URL'] = url
data['OrganizationName'] = f['OrganizationName']
data['SubmittedOn'] = f['SubmittedOn']
t = json.dumps(data)
print t
#print type(t) #<type 'str'>
loaded_entry = json.loads(t)
print type(loaded_entry) , loaded_entry #<type 'dict'>
try:
filings.insert_one(loaded_entry)
except pymongo.errors.DuplicateKeyError, e:
print e, '\n'
except KeyError:
continue # I just chose to continue. You can do anything here though
print data.keys(), '\n'
print data['Return'].keys(), '\n'
print data['Return']['ReturnData'].keys(), '\n'
print len(data['Return']['ReturnData']['IRS990'].keys()), '\n'
print data['Return']['ReturnData']['IRS990'].keys(), '\n'
#print data['Return']['ReturnData']['IRS990']
data = data['Return']['ReturnData']['IRS990']
data['URL'] = url
t = json.dumps(data)
print t
#print type(t) #<type 'str'>
loaded_entry = json.loads(t)
print type(loaded_entry) , loaded_entry #<type 'dict'>
try:
filings.insert_one(loaded_entry)
except pymongo.errors.DuplicateKeyError, e:
print e, '\n'
else:
print f['IsAvailable']
#print 'counter: ', counter
clear_output()
print ('counter: ', counter), '\n'
#print (data.keys()), '\n'
print (f)
sys.stdout.flush()
elapsed = timeit.default_timer() - start_time
print '# of minutes: ', elapsed/60, '\n', '\n'
'''
('counter: ', 1) <addinfourl at 7037063968 whose fp = <socket._fileobject object at 0x1a31b74d0>> # of minutes: 0.0232946316401
df = pd.DataFrame(list(filings.find()))
print '# of columns:', len(df.columns)
print '# of observations:', len(df)
df.head(1)
# of columns: 671 # of observations: 538
@documentId | @referenceDocumentId | @referenceDocumentName | @softwareId | @softwareVersion | @softwareVersionNum | AccountantCompileOrReview | AccountantCompileOrReviewBasis | AccountantCompileOrReviewInd | AccountsPayableAccrExpnssGrp | AccountsPayableAccruedExpenses | AccountsReceivable | AccountsReceivableGrp | AcctCompileOrReviewBasisGrp | ActivitiesConductedPartnership | ActivitiesConductedPrtshpInd | Activity2 | Activity3 | ActivityOrMissionDesc | ActivityOrMissionDescription | ActivityOther | AddressChange | AddressChangeInd | AddressPrincipalOfficerUS | Advertising | AdvertisingGrp | AllAffiliatesIncluded | AllAffiliatesIncludedInd | AllOtherContributions | AllOtherContributionsAmt | AllOtherExpenses | AllOtherExpensesGrp | AmendedReturn | AmendedReturnInd | AnnualDisclosureCoveredPersons | AnnualDisclosureCoveredPrsnInd | AuditCommittee | AuditCommitteeInd | AuditedFinancialStmtAttInd | BackupWthldComplianceInd | BalanceSheetAmountsReported | BenefitsPaidToMembersCY | BenefitsPaidToMembersPriorYear | BenefitsToMembers | BenefitsToMembersGrp | BooksInCareOfDetail | BsnssRltnshpThruFamilyMember | BsnssRltnshpWithOrganization | BusinessRlnWithFamMemInd | BusinessRlnWithOfficerEntInd | BusinessRlnWithOrgMemInd | CYBenefitsPaidToMembersAmt | CYContributionsGrantsAmt | CYGrantsAndSimilarPaidAmt | CYInvestmentIncomeAmt | CYOtherExpensesAmt | CYOtherRevenueAmt | CYProgramServiceRevenueAmt | CYRevenuesLessExpensesAmt | CYSalariesCompEmpBnftPaidAmt | CYTotalExpensesAmt | CYTotalFundraisingExpenseAmt | CYTotalProfFndrsngExpnsAmt | CYTotalRevenueAmt | CapStckTrstPrinCurrentFunds | CapStkTrPrinCurrentFundsGrp | CashNonInterestBearing | CashNonInterestBearingGrp | ChangeToOrgDocumentsInd | ChangesToOrganizingDocs | CntrbtnsRprtdFundraisingEvents | CntrctRcvdGreaterThan100KCnt | CollectionsOfArt | CollectionsOfArtInd | CompCurrentOfcrDirectorsGrp | CompCurrentOfficersDirectors | CompDisqualPersons | CompDisqualPersonsGrp | CompensationFromOtherSources | CompensationFromOtherSrcsInd | CompensationProcessCEO | CompensationProcessCEOInd | CompensationProcessOther | CompensationProcessOtherInd | ComplianceWithBackupWitholding | ConferencesMeetings | ConferencesMeetingsGrp | ConflictOfInterestPolicy | ConflictOfInterestPolicyInd | ConservationEasements | ConservationEasementsInd | ConsolidatedAuditFinancialStmt | ConsolidatedAuditFinclStmtInd | ContractorCompensation | ContractorCompensationGrp | ContriRptFundraisingEventAmt | ContributionsGrantsCurrentYear | ContributionsGrantsPriorYear | CostOfGoodsSold | CostOfGoodsSoldAmt | CreditCounseling | CreditCounselingInd | DAFExcessBusinessHoldingsInd | DLN | DecisionsSubjectToApprovaInd | DecisionsSubjectToApproval | DeductibleArtContributionInd | DeductibleContributionsOfArt | DeductibleNonCashContriInd | DeductibleNonCashContributions | DeferredRevenue | DeferredRevenueGrp | DelegationOfManagementDuties | DelegationOfMgmtDutiesInd | DepreciationDepletion | DepreciationDepletionGrp | Desc | DescribedIn501C3 | DescribedInSection501c3Ind | Description | DisregardedEntity | DisregardedEntityInd | DistributionToDonor | DistributionToDonorInd | DoNotFollowSFAS117 | DocumentRetentionPolicy | DocumentRetentionPolicyInd | DoingBusinessAs | DoingBusinessAsName | DonatedServicesAndUseFcltsAmt | DonorAdvisedFundInd | DonorAdvisedFunds | EIN | ElectionOfBoardMembers | ElectionOfBoardMembersInd | EmployeeCnt | EmploymentTaxReturnsFiled | EmploymentTaxReturnsFiledInd | EngagedInExcessBenefitTransInd | EscrowAccount | EscrowAccountInd | EscrowAccountLiability | EscrowAccountLiabilityGrp | ExcessBenefitTransaction | ExcessBusinessHoldings | ExcessBusinessHoldingsInd | Expense | ExpenseAmt | FSAudited | FSAuditedBasis | FSAuditedBasisGrp | FSAuditedInd | FamilyOrBusinessRelationship | FamilyOrBusinessRlnInd | FederalGrantAuditPerformed | FederalGrantAuditPerformedInd | FederalGrantAuditRequired | FederalGrantAuditRequiredInd | FederatedCampaigns | FederatedCampaignsAmt | FeesForServicesAccounting | FeesForServicesAccountingGrp | FeesForServicesInvstMgmntFees | FeesForServicesLegal | FeesForServicesLegalGrp | FeesForServicesLobbying | FeesForServicesLobbyingGrp | FeesForServicesManagement | FeesForServicesManagementGrp | FeesForServicesOther | FeesForServicesOtherGrp | FeesForServicesProfFundraising | FeesForSrvcInvstMgmntFeesGrp | FiledLieu1041 | FinancialStatementBoth | FinancialStatementConsolidated | FinancialStatementSeparate | FinancialStmtAttached | FollowSFAS117 | ForeignActivities | ForeignActivitiesInd | ForeignAddress | ForeignCountryCd | ForeignFinancialAccount | ForeignFinancialAccountInd | ForeignGrants | ForeignGrantsGrp | ForeignOffice | ForeignOfficeInd | Form1098CFiled | Form1098CFiledInd | Form720Filed | Form8282FiledCnt | Form8282PropertyDisposedOf | Form8282PropertyDisposedOfInd | Form8886TFiled | Form8886TFiledInd | Form8899Filed | Form8899Filedind | Form990-TFiled | Form990PartVIISectionA | Form990PartVIISectionAGrp | Form990ProvidedToGoverningBody | Form990ProvidedToGvrnBodyInd | Form990TFiledInd | FormType | FormationYr | FormerOfcrEmployeesListedInd | FormersListed | FundraisingActivities | FundraisingActivitiesInd | FundraisingAmt | FundraisingDirectExpenses | FundraisingDirectExpensesAmt | FundraisingEvents | FundraisingGrossIncomeAmt | FundsToPayPremiums | GainOrLoss | GainOrLossGrp | Gaming | GamingActivitiesInd | GoverningBodyVotingMembersCnt | GovernmentGrants | GovernmentGrantsAmt | GrantAmt | GrantToRelatedPerson | GrantToRelatedPersonInd | Grants | GrantsAndSimilarAmntsCY | GrantsAndSimilarAmntsPriorYear | GrantsPayable | GrantsPayableGrp | GrantsToDomesticIndividuals | GrantsToDomesticIndividualsGrp | GrantsToDomesticOrgs | GrantsToDomesticOrgsGrp | GrantsToIndividuals | GrantsToIndividualsInd | GrantsToOrganizations | GrantsToOrganizationsInd | GrossAmountSalesAssets | GrossAmountSalesAssetsGrp | GrossIncomeFundraisingEvents | GrossReceipts | GrossReceiptsAmt | GrossRents | GrossRentsGrp | GrossSalesOfInventory | GrossSalesOfInventoryAmt | GroupExemptionNum | GroupExemptionNumber | GroupReturnForAffiliates | GroupReturnForAffiliatesInd | Hospital | IRPDocumentCnt | IRPDocumentW2GCnt | IncludeFIN48FootnoteInd | IncmFromInvestBondProceedsGrp | IncomeFromInvestBondProceeds | IndependentAuditFinancialStmt | IndependentAuditFinclStmtInd | IndependentVotingMemberCnt | IndivRcvdGreaterThan100KCnt | IndoorTanningServices | IndoorTanningServicesInd | InfoInScheduleOPartIII | InfoInScheduleOPartIIIInd | InfoInScheduleOPartIX | InfoInScheduleOPartIXInd | InfoInScheduleOPartV | InfoInScheduleOPartVI | InfoInScheduleOPartVII | InfoInScheduleOPartVIII | InfoInScheduleOPartVIIIInd | InfoInScheduleOPartVIIInd | InfoInScheduleOPartVIInd | InfoInScheduleOPartVInd | InfoInScheduleOPartX | InfoInScheduleOPartXI | InfoInScheduleOPartXII | InfoInScheduleOPartXIIInd | InfoInScheduleOPartXIInd | InfoInScheduleOPartXInd | InformationTechnology | InformationTechnologyGrp | InitialReturn | Insurance | InsuranceGrp | IntangibleAssets | IntangibleAssetsGrp | Interest | InterestGrp | InventoriesForSaleOrUse | InventoriesForSaleOrUseGrp | InvestTaxExemptBonds | InvestTaxExemptBondsInd | InvestmentExpenseAmt | InvestmentInJointVenture | InvestmentInJointVentureInd | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | InvestmentIncomePriorYear | InvestmentsOtherSecurities | InvestmentsOtherSecuritiesGrp | InvestmentsProgramRelated | InvestmentsProgramRelatedGrp | InvestmentsPubTradedSecGrp | InvestmentsPubTradedSecurities | IsAvailable | IsElectronic | JointCosts | JointCostsInd | LandBldgEquipAccumDeprecAmt | LandBldgEquipBasisNetGrp | LandBldgEquipCostOrOtherBssAmt | LandBldgEquipmentAccumDeprec | LandBuildingsEquipmentBasis | LandBuildingsEquipmentBasisNet | LastUpdated | LegalDomicileStateCd | LessCostOthBasisSalesExpenses | LessCostOthBasisSalesExpnssGrp | LessRentalExpenses | LessRentalExpensesGrp | LicensedMoreThanOneState | LicensedMoreThanOneStateInd | LoanOutstandingInd | LoanToOfficerOrDQP | LoansFromOfficersDirectors | LoansFromOfficersDirectorsGrp | LobbyingActivities | LobbyingActivitiesInd | LocalChapters | LocalChaptersInd | MaterialDiversionOrMisuse | MaterialDiversionOrMisuseInd | MembersOrStockholders | MembersOrStockholdersInd | MembershipDues | MembershipDuesAmt | MethodOfAccountingAccrual | MethodOfAccountingAccrualInd | MethodOfAccountingCash | MethodOfAccountingCashInd | MethodOfAccountingOther | MethodOfAccountingOtherInd | MinutesOfCommittees | MinutesOfCommitteesInd | MinutesOfGoverningBody | MinutesOfGoverningBodyInd | MiscellaneousRevenue | MiscellaneousRevenueGrp | MissionDesc | MissionDescription | MoreThan5000KToIndividuals | MoreThan5000KToIndividualsInd | MoreThan5000KToOrgInd | MoreThan5000KToOrganizations | MortNotesPyblSecuredInvestProp | MortgNotesPyblScrdInvstPropGrp | NameOfForeignCountry | NameOfPrincipalOfficerBusiness | NameOfPrincipalOfficerPerson | NbrIndependentVotingMembers | NbrVotingGoverningBodyMembers | NbrVotingMembersGoverningBody | NetAssetsOrFundBalancesBOY | NetAssetsOrFundBalancesBOYAmt | NetAssetsOrFundBalancesEOY | NetAssetsOrFundBalancesEOYAmt | NetGainOrLossInvestments | NetGainOrLossInvestmentsGrp | NetIncmFromFundraisingEvtGrp | NetIncomeFromFundraisingEvents | NetIncomeFromGaming | NetIncomeFromGamingGrp | NetIncomeOrLoss | NetIncomeOrLossGrp | NetRentalIncomeOrLoss | NetRentalIncomeOrLossGrp | NetUnrelatedBusTxblIncmAmt | NetUnrelatedBusinessTxblIncome | NetUnrlzdGainsLossesInvstAmt | NoListedPersonsCompensated | NoListedPersonsCompensatedInd | NonDeductibleContributions | NonDeductibleDisclosure | NoncashContributions | NoncashContributionsAmt | NondeductibleContriDisclInd | NondeductibleContributionsInd | NumberFormsTransmittedWith1096 | NumberIndependentVotingMembers | NumberIndividualsGT100K | NumberOf8282Filed | NumberOfContractorsGT100K | NumberOfEmployees | NumberW2GIncluded | ObjectId | Occupancy | OccupancyGrp | OfficeExpenses | OfficeExpensesGrp | OfficerEntityWithBsnssRltnshp | OfficerMailingAddress | OfficerMailingAddressInd | OnBehalfOfIssuer | OnBehalfOfIssuerInd | OperateHospitalInd | OrgDoesNotFollowSFAS117Ind | OrgFiledInLieuOfForm1041Ind | Organization501c | Organization501c3 | Organization501c3Ind | Organization501cInd | OrganizationFollowsSFAS117Ind | OrganizationName | OthNotesLoansReceivableNetGrp | OtherAssetsTotal | OtherAssetsTotalGrp | OtherChangesInNetAssetsAmt | OtherEmployeeBenefits | OtherEmployeeBenefitsGrp | OtherExpensePriorYear | OtherExpenses | OtherExpensesCurrentYear | OtherExpensesGrp | OtherExplainInSchO | OtherInd | OtherLiabilities | OtherLiabilitiesGrp | OtherNotesLoansReceivableNet | OtherRevenueCurrentYear | OtherRevenueMisc | OtherRevenueMiscGrp | OtherRevenuePriorYear | OtherRevenueTotalAmt | OtherSalariesAndWages | OtherSalariesAndWagesGrp | OtherWebsite | OtherWebsiteInd | OwnWebsite | OwnWebsiteInd | PYBenefitsPaidToMembersAmt | PYContributionsGrantsAmt | PYExcessBenefitTransInd | PYGrantsAndSimilarPaidAmt | PYInvestmentIncomeAmt | PYOtherExpensesAmt | PYOtherRevenueAmt | PYProgramServiceRevenueAmt | PYRevenuesLessExpensesAmt | PYSalariesCompEmpBnftPaidAmt | PYTotalExpensesAmt | PYTotalProfFndrsngExpnsAmt | PYTotalRevenueAmt | PaidInCapSrplsLandBldgEqpFund | PartialLiquidation | PartialLiquidationInd | PayPremiumsPrsnlBnftCntrctInd | PaymentsToAffiliates | PaymentsToAffiliatesGrp | PayrollTaxes | PayrollTaxesGrp | PdInCapSrplsLandBldgEqpFundGrp | PensionPlanContributions | PensionPlanContributionsGrp | PermanentlyRestrictedNetAssets | PermanentlyRstrNetAssetsGrp | PledgesAndGrantsReceivable | PledgesAndGrantsReceivableGrp | PoliciesReferenceChapters | PoliciesReferenceChaptersInd | PoliticalActivities | PoliticalCampaignActyInd | PremiumsPaid | PrepaidExpensesDeferredCharges | PrepaidExpensesDefrdChargesGrp | PrincipalOfficerNm | PriorExcessBenefitTransaction | PriorPeriodAdjustmentsAmt | ProfessionalFundraising | ProfessionalFundraisingInd | ProgSrvcAccomActy2Grp | ProgSrvcAccomActy3Grp | ProgSrvcAccomActyOtherGrp | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | ProgramServiceRevenuePriorYear | ProhibitedTaxShelterTrans | ProhibitedTaxShelterTransInd | PymtTravelEntrtnmntPubOfclGrp | QuidProQuoContriDisclInd | QuidProQuoContributions | QuidProQuoContributionsInd | QuidProQuoDisclosure | RcvFndsToPayPrsnlBnftCntrctInd | RcvblFromDisqualifiedPrsnGrp | ReceivablesFromDisqualPersons | ReceivablesFromOfficersEtc | ReceivablesFromOfficersEtcGrp | ReconcilationDonatedServices | ReconcilationInvestExpenses | ReconcilationOtherChanges | ReconcilationPriorAdjustment | ReconcilationRevenueExpenses | ReconcilationRevenueExpnssAmt | ReconciliationUnrealizedInvest | RegularMonitoringEnforcement | RegularMonitoringEnfrcInd | RelatedEntity | RelatedEntityInd | RelatedOrgControlledEntity | RelatedOrganizationCtrlEntInd | RelatedOrganizations | RelatedOrganizationsAmt | RentalIncomeOrLoss | RentalIncomeOrLossGrp | ReportFin48Footnote | ReportInvestOthSecurities | ReportInvestmentsOtherSecInd | ReportLandBldgEquip | ReportLandBuildingEquipmentInd | ReportOtherAssets | ReportOtherAssetsInd | ReportOtherLiabilities | ReportOtherLiabilitiesInd | ReportProgRelInvest | ReportProgramRelatedInvstInd | RetainedEarningsEndowmentEtc | Revenue | RevenueAmt | RevenuesLessExpensesCY | RevenuesLessExpensesPriorYear | Royalties | RoyaltiesGrp | RoyaltiesRevenue | RoyaltiesRevenueGrp | RtnEarnEndowmentIncmOthFndsGrp | SalariesEtcCurrentYear | SalariesEtcPriorYear | SavingsAndTempCashInvestments | SavingsAndTempCashInvstGrp | ScheduleBRequired | ScheduleBRequiredInd | ScheduleJRequired | ScheduleJRequiredInd | ScheduleORequired | ScheduleORequiredInd | School | SchoolOperatingInd | SignificantChange | SignificantChangeInd | SignificantNewProgramServices | SignificantNewProgramSrvcInd | StateLegalDomicile | StatesWhereCopyOfReturnIsFiled | StatesWhereCopyOfReturnIsFldCd | SubjectToProxyTax | SubjectToProxyTaxInd | SubmittedOn | TaxExemptBondLiabilities | TaxExemptBondLiabilitiesGrp | TaxExemptBonds | TaxExemptBondsInd | TaxPeriod | TaxableDistributions | TaxableDistributionsInd | TaxablePartyNotification | TaxablePartyNotificationInd | TempOrPermanentEndowmentsInd | TemporarilyRestrictedNetAssets | TemporarilyRstrNetAssetsGrp | TermOrPermanentEndowments | TerminateOperationsInd | Terminated | TerminationOrContraction | TheBooksAreInCareOf | TotLiabNetAssetsFundBalanceGrp | TotReportableCompRltdOrgAmt | TotalAssets | TotalAssetsBOY | TotalAssetsBOYAmt | TotalAssetsEOY | TotalAssetsEOYAmt | TotalAssetsGrp | TotalCompGT150K | TotalCompGreaterThan150KInd | TotalContributions | TotalContributionsAmt | TotalEmployeeCnt | TotalExpensesCurrentYear | TotalExpensesPriorYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalFundrsngExpCurrentYear | TotalGrossUBI | TotalGrossUBIAmt | TotalJointCosts | TotalJointCostsGrp | TotalLiabNetAssetsFundBalances | TotalLiabilities | TotalLiabilitiesBOY | TotalLiabilitiesBOYAmt | TotalLiabilitiesEOY | TotalLiabilitiesEOYAmt | TotalLiabilitiesGrp | TotalNbrEmployees | TotalNbrVolunteers | TotalNetAssetsFundBalanceGrp | TotalNetAssetsFundBalances | TotalOfOtherProgramServiceExp | TotalOfOtherProgramServiceGrnt | TotalOfOtherProgramServiceRev | TotalOthProgramServiceRevGrp | TotalOthProgramServiceRevenue | TotalOtherCompensation | TotalOtherCompensationAmt | TotalOtherProgSrvcExpenseAmt | TotalOtherProgSrvcGrantAmt | TotalOtherProgSrvcRevenueAmt | TotalOtherRevenue | TotalProfFundrsngExpCY | TotalProfFundrsngExpPriorYear | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | TotalReportableCompFrmRltdOrgs | TotalReportableCompFromOrg | TotalReportableCompFromOrgAmt | TotalRevenue | TotalRevenueCurrentYear | TotalRevenueGrp | TotalRevenuePriorYear | TotalVolunteersCnt | TransactionRelatedEntity | TransactionWithControlEntInd | TransfersToExemptNonChrtblOrg | Travel | TravelEntrtnmntPublicOfficials | TravelGrp | TrnsfrExmptNonChrtblRltdOrgInd | TypeOfOrganizationCorpInd | TypeOfOrganizationCorporation | TypeOfOrganizationOther | TypeOfOrganizationOtherInd | TypeOfOrganizationTrust | TypeOfOrganizationTrustInd | URL | USAddress | UnrelatedBusIncmOverLimitInd | UnrelatedBusinessIncome | UnrestrictedNetAssets | UnrestrictedNetAssetsGrp | UnsecuredNotesLoansPayable | UnsecuredNotesLoansPayableGrp | UponRequest | UponRequestInd | VotingMembersGoverningBodyCnt | VotingMembersIndependentCnt | WebSite | WebsiteAddressTxt | WhistleblowerPolicy | WhistleblowerPolicyInd | WrittenPolicyOrProcedure | WrittenPolicyOrProcedureInd | YearFormation | _id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RetDoc1 | RetDoc2 | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | RAISE MONIES TO CONSTRUCT A FACILITY FOR YEAR-ROUND RECREATION PROGRAM DEDICATED TO CHILDREN AND ADULTS WITH SPECIAL NEEDS WHO RESIDE IN THE TOWN OF HEMPSTEAD. | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'PhoneNum': u'5164316946', u'USAddress': {u'CityNm': u'LIDO BEACH', u'StateAbbreviationCd': u'NY', u'ZIPCd': u'11561', u'AddressLine1Txt': u'630 LIDO BLVD'}, u'BusinessName': {u'BusinessNameLine1Txt': u'ANCHOR BUILDING FUND INC'}} | NaN | NaN | false | false | false | 0 | 0 | 0 | 0 | 34400 | 325000 | 0 | 290600 | 0 | 34400 | 0 | 0 | 325000 | NaN | NaN | NaN | {u'EOYAmt': u'290600'} | false | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | 93493316047805 | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | false | NaN | NaN | THE ORGANIZATIONS EXEMPT PURPOSE WAS TO RAISE MONIES TO CONSTRUCT A FACILTIY TO OPERATE ITS YEAR-ROUND PROGRAM. | NaN | {u'#text': u'true', u'@referenceDocumentId': u'RetDoc3'} | NaN | NaN | false | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | 271317072 | NaN | false | 0 | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalAmt': u'9400', u'ManagementAndGeneralAmt': u'9400'} | NaN | NaN | {u'TotalAmt': u'25000', u'ManagementAndGeneralAmt': u'25000'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | NaN | [{u'PersonNm': u'JOHN MALLOY', u'TitleTxt': u'PRESIDENT', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'ANNA DRAGO... | NaN | false | NaN | 990 | 2009 | false | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | 2 | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | 325000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | 2 | 0 | false | NaN | NaN | NaN | false | 2 | NaN | NaN | false | NaN | X | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | True | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2016-03-21T17:23:53 | NY | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | false | NaN | false | NaN | true | NaN | false | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | false | NaN | false | NaN | NaN | RAISE MONIES TO CONSTRUCT A FACILITY FOR YEAR-ROUND RECREATION PROGRAM DEDICATED TO CHILDREN AND ADULTS WITH SPECIAL NEEDS WHO RESIDE IN THE TOWN OF HEMPSTEAD. | NaN | NaN | false | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 290600 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 201503169349304780 | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | false | NaN | NaN | NaN | NaN | X | NaN | X | ANCHOR BUILDING FUND INC | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} | NaN | 325000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | JOHN MALLOY | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 290600 | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | true | NaN | false | NaN | true | NaN | false | NaN | NaN | NY | NaN | false | 2016-01-30 | NaN | NaN | NaN | false | 201412 | NaN | NaN | NaN | false | false | NaN | NaN | NaN | false | NaN | NaN | NaN | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | NaN | NaN | NaN | NaN | 290600 | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | false | NaN | NaN | 0 | NaN | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | {u'BOYAmt': u'0', u'EOYAmt': u'0'} | NaN | NaN | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | X | NaN | NaN | NaN | NaN | NaN | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | {u'CityNm': u'LIDO BEACH', u'StateAbbreviationCd': u'NY', u'ZIPCd': u'11561', u'AddressLine1Txt': u'630 LIDO BLVD'} | false | NaN | NaN | {u'EOYAmt': u'290600'} | NaN | NaN | NaN | X | 2 | 2 | NaN | WWW.CAMPANCHOR.ORG | NaN | false | NaN | NaN | NaN | 57cce7823ffc5a55fe7f54b5 |
print df['WhistleblowerPolicy'].value_counts(), '\n'
print df['WhistleblowerPolicyInd'].value_counts(), '\n'
print df['ConflictOfInterestPolicy'].value_counts(), '\n'
print df['ConflictOfInterestPolicyInd'].value_counts(), '\n'
print df['DocumentRetentionPolicy'].value_counts(), '\n'
print df['DocumentRetentionPolicyInd'].value_counts(), '\n'
1 114 true 88 0 72 false 70 Name: WhistleblowerPolicy, dtype: int64 1 65 true 50 false 47 0 32 Name: WhistleblowerPolicyInd, dtype: int64 1 173 true 121 false 37 0 13 Name: ConflictOfInterestPolicy, dtype: int64 1 89 true 71 false 26 0 8 Name: ConflictOfInterestPolicyInd, dtype: int64 1 142 true 102 false 56 0 44 Name: DocumentRetentionPolicy, dtype: int64 1 82 true 57 false 40 0 15 Name: DocumentRetentionPolicyInd, dtype: int64
print df.columns.tolist()
[u'@documentId', u'@referenceDocumentId', u'@referenceDocumentName', u'@softwareId', u'@softwareVersion', u'@softwareVersionNum', u'AccountantCompileOrReview', u'AccountantCompileOrReviewBasis', u'AccountantCompileOrReviewInd', u'AccountsPayableAccrExpnssGrp', u'AccountsPayableAccruedExpenses', u'AccountsReceivable', u'AccountsReceivableGrp', u'AcctCompileOrReviewBasisGrp', u'ActivitiesConductedPartnership', u'ActivitiesConductedPrtshpInd', u'Activity2', u'Activity3', u'ActivityOrMissionDesc', u'ActivityOrMissionDescription', u'ActivityOther', u'AddressChange', u'AddressChangeInd', u'AddressPrincipalOfficerUS', u'Advertising', u'AdvertisingGrp', u'AllAffiliatesIncluded', u'AllAffiliatesIncludedInd', u'AllOtherContributions', u'AllOtherContributionsAmt', u'AllOtherExpenses', u'AllOtherExpensesGrp', u'AmendedReturn', u'AmendedReturnInd', u'AnnualDisclosureCoveredPersons', u'AnnualDisclosureCoveredPrsnInd', u'AuditCommittee', u'AuditCommitteeInd', u'AuditedFinancialStmtAttInd', u'BackupWthldComplianceInd', u'BalanceSheetAmountsReported', u'BenefitsPaidToMembersCY', u'BenefitsPaidToMembersPriorYear', u'BenefitsToMembers', u'BenefitsToMembersGrp', u'BooksInCareOfDetail', u'BsnssRltnshpThruFamilyMember', u'BsnssRltnshpWithOrganization', u'BusinessRlnWithFamMemInd', u'BusinessRlnWithOfficerEntInd', u'BusinessRlnWithOrgMemInd', u'CYBenefitsPaidToMembersAmt', u'CYContributionsGrantsAmt', u'CYGrantsAndSimilarPaidAmt', u'CYInvestmentIncomeAmt', u'CYOtherExpensesAmt', u'CYOtherRevenueAmt', u'CYProgramServiceRevenueAmt', u'CYRevenuesLessExpensesAmt', u'CYSalariesCompEmpBnftPaidAmt', u'CYTotalExpensesAmt', u'CYTotalFundraisingExpenseAmt', u'CYTotalProfFndrsngExpnsAmt', u'CYTotalRevenueAmt', u'CapStckTrstPrinCurrentFunds', u'CapStkTrPrinCurrentFundsGrp', u'CashNonInterestBearing', u'CashNonInterestBearingGrp', u'ChangeToOrgDocumentsInd', u'ChangesToOrganizingDocs', u'CntrbtnsRprtdFundraisingEvents', u'CntrctRcvdGreaterThan100KCnt', u'CollectionsOfArt', u'CollectionsOfArtInd', u'CompCurrentOfcrDirectorsGrp', u'CompCurrentOfficersDirectors', u'CompDisqualPersons', u'CompDisqualPersonsGrp', u'CompensationFromOtherSources', u'CompensationFromOtherSrcsInd', u'CompensationProcessCEO', u'CompensationProcessCEOInd', u'CompensationProcessOther', u'CompensationProcessOtherInd', u'ComplianceWithBackupWitholding', u'ConferencesMeetings', u'ConferencesMeetingsGrp', u'ConflictOfInterestPolicy', u'ConflictOfInterestPolicyInd', u'ConservationEasements', u'ConservationEasementsInd', u'ConsolidatedAuditFinancialStmt', u'ConsolidatedAuditFinclStmtInd', u'ContractorCompensation', u'ContractorCompensationGrp', u'ContriRptFundraisingEventAmt', u'ContributionsGrantsCurrentYear', u'ContributionsGrantsPriorYear', u'CostOfGoodsSold', u'CostOfGoodsSoldAmt', u'CreditCounseling', u'CreditCounselingInd', u'DAFExcessBusinessHoldingsInd', u'DLN', u'DecisionsSubjectToApprovaInd', u'DecisionsSubjectToApproval', u'DeductibleArtContributionInd', u'DeductibleContributionsOfArt', u'DeductibleNonCashContriInd', u'DeductibleNonCashContributions', u'DeferredRevenue', u'DeferredRevenueGrp', u'DelegationOfManagementDuties', u'DelegationOfMgmtDutiesInd', u'DepreciationDepletion', u'DepreciationDepletionGrp', u'Desc', u'DescribedIn501C3', u'DescribedInSection501c3Ind', u'Description', u'DisregardedEntity', u'DisregardedEntityInd', u'DistributionToDonor', u'DistributionToDonorInd', u'DoNotFollowSFAS117', u'DocumentRetentionPolicy', u'DocumentRetentionPolicyInd', u'DoingBusinessAs', u'DoingBusinessAsName', u'DonatedServicesAndUseFcltsAmt', u'DonorAdvisedFundInd', u'DonorAdvisedFunds', u'EIN', u'ElectionOfBoardMembers', u'ElectionOfBoardMembersInd', u'EmployeeCnt', u'EmploymentTaxReturnsFiled', u'EmploymentTaxReturnsFiledInd', u'EngagedInExcessBenefitTransInd', u'EscrowAccount', u'EscrowAccountInd', u'EscrowAccountLiability', u'EscrowAccountLiabilityGrp', u'ExcessBenefitTransaction', u'ExcessBusinessHoldings', u'ExcessBusinessHoldingsInd', u'Expense', u'ExpenseAmt', u'FSAudited', u'FSAuditedBasis', u'FSAuditedBasisGrp', u'FSAuditedInd', u'FamilyOrBusinessRelationship', u'FamilyOrBusinessRlnInd', u'FederalGrantAuditPerformed', u'FederalGrantAuditPerformedInd', u'FederalGrantAuditRequired', u'FederalGrantAuditRequiredInd', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'FeesForServicesAccounting', u'FeesForServicesAccountingGrp', u'FeesForServicesInvstMgmntFees', u'FeesForServicesLegal', u'FeesForServicesLegalGrp', u'FeesForServicesLobbying', u'FeesForServicesLobbyingGrp', u'FeesForServicesManagement', u'FeesForServicesManagementGrp', u'FeesForServicesOther', u'FeesForServicesOtherGrp', u'FeesForServicesProfFundraising', u'FeesForSrvcInvstMgmntFeesGrp', u'FiledLieu1041', u'FinancialStatementBoth', u'FinancialStatementConsolidated', u'FinancialStatementSeparate', u'FinancialStmtAttached', u'FollowSFAS117', u'ForeignActivities', u'ForeignActivitiesInd', u'ForeignAddress', u'ForeignCountryCd', u'ForeignFinancialAccount', u'ForeignFinancialAccountInd', u'ForeignGrants', u'ForeignGrantsGrp', u'ForeignOffice', u'ForeignOfficeInd', u'Form1098CFiled', u'Form1098CFiledInd', u'Form720Filed', u'Form8282FiledCnt', u'Form8282PropertyDisposedOf', u'Form8282PropertyDisposedOfInd', u'Form8886TFiled', u'Form8886TFiledInd', u'Form8899Filed', u'Form8899Filedind', u'Form990-TFiled', u'Form990PartVIISectionA', u'Form990PartVIISectionAGrp', u'Form990ProvidedToGoverningBody', u'Form990ProvidedToGvrnBodyInd', u'Form990TFiledInd', u'FormType', u'FormationYr', u'FormerOfcrEmployeesListedInd', u'FormersListed', u'FundraisingActivities', u'FundraisingActivitiesInd', u'FundraisingAmt', u'FundraisingDirectExpenses', u'FundraisingDirectExpensesAmt', u'FundraisingEvents', u'FundraisingGrossIncomeAmt', u'FundsToPayPremiums', u'GainOrLoss', u'GainOrLossGrp', u'Gaming', u'GamingActivitiesInd', u'GoverningBodyVotingMembersCnt', u'GovernmentGrants', u'GovernmentGrantsAmt', u'GrantAmt', u'GrantToRelatedPerson', u'GrantToRelatedPersonInd', u'Grants', u'GrantsAndSimilarAmntsCY', u'GrantsAndSimilarAmntsPriorYear', u'GrantsPayable', u'GrantsPayableGrp', u'GrantsToDomesticIndividuals', u'GrantsToDomesticIndividualsGrp', u'GrantsToDomesticOrgs', u'GrantsToDomesticOrgsGrp', u'GrantsToIndividuals', u'GrantsToIndividualsInd', u'GrantsToOrganizations', u'GrantsToOrganizationsInd', u'GrossAmountSalesAssets', u'GrossAmountSalesAssetsGrp', u'GrossIncomeFundraisingEvents', u'GrossReceipts', u'GrossReceiptsAmt', u'GrossRents', u'GrossRentsGrp', u'GrossSalesOfInventory', u'GrossSalesOfInventoryAmt', u'GroupExemptionNum', u'GroupExemptionNumber', u'GroupReturnForAffiliates', u'GroupReturnForAffiliatesInd', u'Hospital', u'IRPDocumentCnt', u'IRPDocumentW2GCnt', u'IncludeFIN48FootnoteInd', u'IncmFromInvestBondProceedsGrp', u'IncomeFromInvestBondProceeds', u'IndependentAuditFinancialStmt', u'IndependentAuditFinclStmtInd', u'IndependentVotingMemberCnt', u'IndivRcvdGreaterThan100KCnt', u'IndoorTanningServices', u'IndoorTanningServicesInd', u'InfoInScheduleOPartIII', u'InfoInScheduleOPartIIIInd', u'InfoInScheduleOPartIX', u'InfoInScheduleOPartIXInd', u'InfoInScheduleOPartV', u'InfoInScheduleOPartVI', u'InfoInScheduleOPartVII', u'InfoInScheduleOPartVIII', u'InfoInScheduleOPartVIIIInd', u'InfoInScheduleOPartVIIInd', u'InfoInScheduleOPartVIInd', u'InfoInScheduleOPartVInd', u'InfoInScheduleOPartX', u'InfoInScheduleOPartXI', u'InfoInScheduleOPartXII', u'InfoInScheduleOPartXIIInd', u'InfoInScheduleOPartXIInd', u'InfoInScheduleOPartXInd', u'InformationTechnology', u'InformationTechnologyGrp', u'InitialReturn', u'Insurance', u'InsuranceGrp', u'IntangibleAssets', u'IntangibleAssetsGrp', u'Interest', u'InterestGrp', u'InventoriesForSaleOrUse', u'InventoriesForSaleOrUseGrp', u'InvestTaxExemptBonds', u'InvestTaxExemptBondsInd', u'InvestmentExpenseAmt', u'InvestmentInJointVenture', u'InvestmentInJointVentureInd', u'InvestmentIncome', u'InvestmentIncomeCurrentYear', u'InvestmentIncomeGrp', u'InvestmentIncomePriorYear', u'InvestmentsOtherSecurities', u'InvestmentsOtherSecuritiesGrp', u'InvestmentsProgramRelated', u'InvestmentsProgramRelatedGrp', u'InvestmentsPubTradedSecGrp', u'InvestmentsPubTradedSecurities', u'IsAvailable', u'IsElectronic', u'JointCosts', u'JointCostsInd', u'LandBldgEquipAccumDeprecAmt', u'LandBldgEquipBasisNetGrp', u'LandBldgEquipCostOrOtherBssAmt', u'LandBldgEquipmentAccumDeprec', u'LandBuildingsEquipmentBasis', u'LandBuildingsEquipmentBasisNet', u'LastUpdated', u'LegalDomicileStateCd', u'LessCostOthBasisSalesExpenses', u'LessCostOthBasisSalesExpnssGrp', u'LessRentalExpenses', u'LessRentalExpensesGrp', u'LicensedMoreThanOneState', u'LicensedMoreThanOneStateInd', u'LoanOutstandingInd', u'LoanToOfficerOrDQP', u'LoansFromOfficersDirectors', u'LoansFromOfficersDirectorsGrp', u'LobbyingActivities', u'LobbyingActivitiesInd', u'LocalChapters', u'LocalChaptersInd', u'MaterialDiversionOrMisuse', u'MaterialDiversionOrMisuseInd', u'MembersOrStockholders', u'MembersOrStockholdersInd', u'MembershipDues', u'MembershipDuesAmt', u'MethodOfAccountingAccrual', u'MethodOfAccountingAccrualInd', u'MethodOfAccountingCash', u'MethodOfAccountingCashInd', u'MethodOfAccountingOther', u'MethodOfAccountingOtherInd', u'MinutesOfCommittees', u'MinutesOfCommitteesInd', u'MinutesOfGoverningBody', u'MinutesOfGoverningBodyInd', u'MiscellaneousRevenue', u'MiscellaneousRevenueGrp', u'MissionDesc', u'MissionDescription', u'MoreThan5000KToIndividuals', u'MoreThan5000KToIndividualsInd', u'MoreThan5000KToOrgInd', u'MoreThan5000KToOrganizations', u'MortNotesPyblSecuredInvestProp', u'MortgNotesPyblScrdInvstPropGrp', u'NameOfForeignCountry', u'NameOfPrincipalOfficerBusiness', u'NameOfPrincipalOfficerPerson', u'NbrIndependentVotingMembers', u'NbrVotingGoverningBodyMembers', u'NbrVotingMembersGoverningBody', u'NetAssetsOrFundBalancesBOY', u'NetAssetsOrFundBalancesBOYAmt', u'NetAssetsOrFundBalancesEOY', u'NetAssetsOrFundBalancesEOYAmt', u'NetGainOrLossInvestments', u'NetGainOrLossInvestmentsGrp', u'NetIncmFromFundraisingEvtGrp', u'NetIncomeFromFundraisingEvents', u'NetIncomeFromGaming', u'NetIncomeFromGamingGrp', u'NetIncomeOrLoss', u'NetIncomeOrLossGrp', u'NetRentalIncomeOrLoss', u'NetRentalIncomeOrLossGrp', u'NetUnrelatedBusTxblIncmAmt', u'NetUnrelatedBusinessTxblIncome', u'NetUnrlzdGainsLossesInvstAmt', u'NoListedPersonsCompensated', u'NoListedPersonsCompensatedInd', u'NonDeductibleContributions', u'NonDeductibleDisclosure', u'NoncashContributions', u'NoncashContributionsAmt', u'NondeductibleContriDisclInd', u'NondeductibleContributionsInd', u'NumberFormsTransmittedWith1096', u'NumberIndependentVotingMembers', u'NumberIndividualsGT100K', u'NumberOf8282Filed', u'NumberOfContractorsGT100K', u'NumberOfEmployees', u'NumberW2GIncluded', u'ObjectId', u'Occupancy', u'OccupancyGrp', u'OfficeExpenses', u'OfficeExpensesGrp', u'OfficerEntityWithBsnssRltnshp', u'OfficerMailingAddress', u'OfficerMailingAddressInd', u'OnBehalfOfIssuer', u'OnBehalfOfIssuerInd', u'OperateHospitalInd', u'OrgDoesNotFollowSFAS117Ind', u'OrgFiledInLieuOfForm1041Ind', u'Organization501c', u'Organization501c3', u'Organization501c3Ind', u'Organization501cInd', u'OrganizationFollowsSFAS117Ind', u'OrganizationName', u'OthNotesLoansReceivableNetGrp', u'OtherAssetsTotal', u'OtherAssetsTotalGrp', u'OtherChangesInNetAssetsAmt', u'OtherEmployeeBenefits', u'OtherEmployeeBenefitsGrp', u'OtherExpensePriorYear', u'OtherExpenses', u'OtherExpensesCurrentYear', u'OtherExpensesGrp', u'OtherExplainInSchO', u'OtherInd', u'OtherLiabilities', u'OtherLiabilitiesGrp', u'OtherNotesLoansReceivableNet', u'OtherRevenueCurrentYear', u'OtherRevenueMisc', u'OtherRevenueMiscGrp', u'OtherRevenuePriorYear', u'OtherRevenueTotalAmt', u'OtherSalariesAndWages', u'OtherSalariesAndWagesGrp', u'OtherWebsite', u'OtherWebsiteInd', u'OwnWebsite', u'OwnWebsiteInd', u'PYBenefitsPaidToMembersAmt', u'PYContributionsGrantsAmt', u'PYExcessBenefitTransInd', u'PYGrantsAndSimilarPaidAmt', u'PYInvestmentIncomeAmt', u'PYOtherExpensesAmt', u'PYOtherRevenueAmt', u'PYProgramServiceRevenueAmt', u'PYRevenuesLessExpensesAmt', u'PYSalariesCompEmpBnftPaidAmt', u'PYTotalExpensesAmt', u'PYTotalProfFndrsngExpnsAmt', u'PYTotalRevenueAmt', u'PaidInCapSrplsLandBldgEqpFund', u'PartialLiquidation', u'PartialLiquidationInd', u'PayPremiumsPrsnlBnftCntrctInd', u'PaymentsToAffiliates', u'PaymentsToAffiliatesGrp', u'PayrollTaxes', u'PayrollTaxesGrp', u'PdInCapSrplsLandBldgEqpFundGrp', u'PensionPlanContributions', u'PensionPlanContributionsGrp', u'PermanentlyRestrictedNetAssets', u'PermanentlyRstrNetAssetsGrp', u'PledgesAndGrantsReceivable', u'PledgesAndGrantsReceivableGrp', u'PoliciesReferenceChapters', u'PoliciesReferenceChaptersInd', u'PoliticalActivities', u'PoliticalCampaignActyInd', u'PremiumsPaid', u'PrepaidExpensesDeferredCharges', u'PrepaidExpensesDefrdChargesGrp', u'PrincipalOfficerNm', u'PriorExcessBenefitTransaction', u'PriorPeriodAdjustmentsAmt', u'ProfessionalFundraising', u'ProfessionalFundraisingInd', u'ProgSrvcAccomActy2Grp', u'ProgSrvcAccomActy3Grp', u'ProgSrvcAccomActyOtherGrp', u'ProgramServiceRevenue', u'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'ProgramServiceRevenuePriorYear', u'ProhibitedTaxShelterTrans', u'ProhibitedTaxShelterTransInd', u'PymtTravelEntrtnmntPubOfclGrp', u'QuidProQuoContriDisclInd', u'QuidProQuoContributions', u'QuidProQuoContributionsInd', u'QuidProQuoDisclosure', u'RcvFndsToPayPrsnlBnftCntrctInd', u'RcvblFromDisqualifiedPrsnGrp', u'ReceivablesFromDisqualPersons', u'ReceivablesFromOfficersEtc', u'ReceivablesFromOfficersEtcGrp', u'ReconcilationDonatedServices', u'ReconcilationInvestExpenses', u'ReconcilationOtherChanges', u'ReconcilationPriorAdjustment', u'ReconcilationRevenueExpenses', u'ReconcilationRevenueExpnssAmt', u'ReconciliationUnrealizedInvest', u'RegularMonitoringEnforcement', u'RegularMonitoringEnfrcInd', u'RelatedEntity', u'RelatedEntityInd', u'RelatedOrgControlledEntity', u'RelatedOrganizationCtrlEntInd', u'RelatedOrganizations', u'RelatedOrganizationsAmt', u'RentalIncomeOrLoss', u'RentalIncomeOrLossGrp', u'ReportFin48Footnote', u'ReportInvestOthSecurities', u'ReportInvestmentsOtherSecInd', u'ReportLandBldgEquip', u'ReportLandBuildingEquipmentInd', u'ReportOtherAssets', u'ReportOtherAssetsInd', u'ReportOtherLiabilities', u'ReportOtherLiabilitiesInd', u'ReportProgRelInvest', u'ReportProgramRelatedInvstInd', u'RetainedEarningsEndowmentEtc', u'Revenue', u'RevenueAmt', u'RevenuesLessExpensesCY', u'RevenuesLessExpensesPriorYear', u'Royalties', u'RoyaltiesGrp', u'RoyaltiesRevenue', u'RoyaltiesRevenueGrp', u'RtnEarnEndowmentIncmOthFndsGrp', u'SalariesEtcCurrentYear', u'SalariesEtcPriorYear', u'SavingsAndTempCashInvestments', u'SavingsAndTempCashInvstGrp', u'ScheduleBRequired', u'ScheduleBRequiredInd', u'ScheduleJRequired', u'ScheduleJRequiredInd', u'ScheduleORequired', u'ScheduleORequiredInd', u'School', u'SchoolOperatingInd', u'SignificantChange', u'SignificantChangeInd', u'SignificantNewProgramServices', u'SignificantNewProgramSrvcInd', u'StateLegalDomicile', u'StatesWhereCopyOfReturnIsFiled', u'StatesWhereCopyOfReturnIsFldCd', u'SubjectToProxyTax', u'SubjectToProxyTaxInd', u'SubmittedOn', u'TaxExemptBondLiabilities', u'TaxExemptBondLiabilitiesGrp', u'TaxExemptBonds', u'TaxExemptBondsInd', u'TaxPeriod', u'TaxableDistributions', u'TaxableDistributionsInd', u'TaxablePartyNotification', u'TaxablePartyNotificationInd', u'TempOrPermanentEndowmentsInd', u'TemporarilyRestrictedNetAssets', u'TemporarilyRstrNetAssetsGrp', u'TermOrPermanentEndowments', u'TerminateOperationsInd', u'Terminated', u'TerminationOrContraction', u'TheBooksAreInCareOf', u'TotLiabNetAssetsFundBalanceGrp', u'TotReportableCompRltdOrgAmt', u'TotalAssets', u'TotalAssetsBOY', u'TotalAssetsBOYAmt', u'TotalAssetsEOY', u'TotalAssetsEOYAmt', u'TotalAssetsGrp', u'TotalCompGT150K', u'TotalCompGreaterThan150KInd', u'TotalContributions', u'TotalContributionsAmt', u'TotalEmployeeCnt', u'TotalExpensesCurrentYear', u'TotalExpensesPriorYear', u'TotalFunctionalExpenses', u'TotalFunctionalExpensesGrp', u'TotalFundrsngExpCurrentYear', u'TotalGrossUBI', u'TotalGrossUBIAmt', u'TotalJointCosts', u'TotalJointCostsGrp', u'TotalLiabNetAssetsFundBalances', u'TotalLiabilities', u'TotalLiabilitiesBOY', u'TotalLiabilitiesBOYAmt', u'TotalLiabilitiesEOY', u'TotalLiabilitiesEOYAmt', u'TotalLiabilitiesGrp', u'TotalNbrEmployees', u'TotalNbrVolunteers', u'TotalNetAssetsFundBalanceGrp', u'TotalNetAssetsFundBalances', u'TotalOfOtherProgramServiceExp', u'TotalOfOtherProgramServiceGrnt', u'TotalOfOtherProgramServiceRev', u'TotalOthProgramServiceRevGrp', u'TotalOthProgramServiceRevenue', u'TotalOtherCompensation', u'TotalOtherCompensationAmt', u'TotalOtherProgSrvcExpenseAmt', u'TotalOtherProgSrvcGrantAmt', u'TotalOtherProgSrvcRevenueAmt', u'TotalOtherRevenue', u'TotalProfFundrsngExpCY', u'TotalProfFundrsngExpPriorYear', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt', u'TotalProgramServiceRevenue', u'TotalProgramServiceRevenueAmt', u'TotalReportableCompFrmRltdOrgs', u'TotalReportableCompFromOrg', u'TotalReportableCompFromOrgAmt', u'TotalRevenue', u'TotalRevenueCurrentYear', u'TotalRevenueGrp', u'TotalRevenuePriorYear', u'TotalVolunteersCnt', u'TransactionRelatedEntity', u'TransactionWithControlEntInd', u'TransfersToExemptNonChrtblOrg', u'Travel', u'TravelEntrtnmntPublicOfficials', u'TravelGrp', u'TrnsfrExmptNonChrtblRltdOrgInd', u'TypeOfOrganizationCorpInd', u'TypeOfOrganizationCorporation', u'TypeOfOrganizationOther', u'TypeOfOrganizationOtherInd', u'TypeOfOrganizationTrust', u'TypeOfOrganizationTrustInd', u'URL', u'USAddress', u'UnrelatedBusIncmOverLimitInd', u'UnrelatedBusinessIncome', u'UnrestrictedNetAssets', u'UnrestrictedNetAssetsGrp', u'UnsecuredNotesLoansPayable', u'UnsecuredNotesLoansPayableGrp', u'UponRequest', u'UponRequestInd', u'VotingMembersGoverningBodyCnt', u'VotingMembersIndependentCnt', u'WebSite', u'WebsiteAddressTxt', u'WhistleblowerPolicy', u'WhistleblowerPolicyInd', u'WrittenPolicyOrProcedure', u'WrittenPolicyOrProcedureInd', u'YearFormation', u'_id']
#df2 = pd.DataFrame(list(filings.find()))
#print '# of columns:', len(df2.columns)
#print '# of observations:', len(df2)
#df2.head(1)
'''
df2[['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt',
'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
'FundraisingActivities', 'FundraisingActivitiesInd', 'FundraisingAmt',
'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
'FundraisingGrossIncomeAmt',
]][:5]
'RelatedOrganizations', 'RelatedOrganizationsAmt',
'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
'''
"\ndf2[['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt', \n 'MembershipDues', 'MembershipDuesAmt',\n 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',\n 'FundraisingActivities', 'FundraisingActivitiesInd', 'FundraisingAmt', \n 'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents', \n 'FundraisingGrossIncomeAmt',\n ]][:5]\n\n \n 'RelatedOrganizations', 'RelatedOrganizationsAmt',\n 'GovernmentGrants', 'GovernmentGrantsAmt',\n 'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',\n 'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',\n 'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt', \n 'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp', \n 'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',\n 'TotalOtherRevenue', 'CYOtherRevenueAmt'\n ]\n"
'''
df2[[ 'TaxPeriod',
#'CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt',
#'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities',
#'FundraisingActivitiesInd',
'FundraisingAmt',
'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
'FundraisingGrossIncomeAmt',
]][5:20]
'''
"\ndf2[[ 'TaxPeriod',\n #'CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt', \n #'MembershipDues', 'MembershipDuesAmt',\n 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',\n #'FundraisingActivities', \n #'FundraisingActivitiesInd', \n 'FundraisingAmt', \n 'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents', \n 'FundraisingGrossIncomeAmt',\n ]][5:20]\n"
file_list_columns = ['OrganizationName', 'ObjectId', 'URL', 'SubmittedOn', 'DLN', 'LastUpdated', 'TaxPeriod',
'IsElectronic', 'FormType', '_id', 'IsAvailable', 'EIN']
SOX_columns = ['WhistleblowerPolicy', 'WhistleblowerPolicyInd',
'ConflictOfInterestPolicyInd', 'ConflictOfInterestPolicy',
'DocumentRetentionPolicy', 'DocumentRetentionPolicyInd',]
revenue_columns = ['CYTotalRevenueAmt', 'TotalRevenue', 'TotalRevenueCurrentYear']
efficiency_columns = ['CYTotalExpensesAmt', 'TotalExpensesCurrentYear',
'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp',
'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt',]
#df['program_efficiency'] = df['program_expenses']/df['total_expenses']
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt',
'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
'FundraisingGrossIncomeAmt',
'RelatedOrganizations', 'RelatedOrganizationsAmt',
'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
cols = file_list_columns+SOX_columns+revenue_columns+efficiency_columns+complexity_columns
#total_revenue_logged + program_efficiency + age + complexity_2011
df[cols][:2]
OrganizationName | ObjectId | URL | SubmittedOn | DLN | LastUpdated | TaxPeriod | IsElectronic | FormType | _id | IsAvailable | EIN | WhistleblowerPolicy | WhistleblowerPolicyInd | ConflictOfInterestPolicyInd | ConflictOfInterestPolicy | DocumentRetentionPolicy | DocumentRetentionPolicyInd | CYTotalRevenueAmt | TotalRevenue | TotalRevenueCurrentYear | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ANCHOR BUILDING FUND INC | 201503169349304780 | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | 2016-01-30 | 93493316047805 | 2016-03-21T17:23:53 | 201412 | True | 990 | 57cce7823ffc5a55fe7f54b5 | True | 271317072 | NaN | false | false | NaN | NaN | false | 325000 | NaN | NaN | 34400 | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | 325000 | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} | NaN | 325000 |
1 | INLAND BEHAVIORAL AND HEALTH SERVICES INC | 201501339349302635 | https://s3.amazonaws.com/irs-form-990/201501339349302635_public.xml | 2015-07-15 | 93493133026355 | 2016-03-21T17:23:53 | 201406 | True | 990 | 57cce87d3ffc5a55fe7f568c | True | 953246624 | NaN | true | true | NaN | NaN | true | 8825153 | NaN | NaN | 9016469 | NaN | NaN | {u'TotalAmt': u'9016469', u'ProgramServicesAmt': u'7477089', u'ManagementAndGeneralAmt': u'1539380', u'FundraisingAmt': u'0'} | NaN | 7477089 | 32937 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | 9457980 | 8758017 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'265216', u'BusinessCd': u'624100', u'RelatedOrExemptFuncIncomeAmt': u'265216', u'Desc': u'Drug/Alcohol Treatment'}, {u'TotalRevenueColumnAmt': u'3987366', u'BusinessCd': u'624100', u'RelatedOrExemptFuncIncomeAmt': u'... | NaN | 8758017 | 107 | NaN | NaN | {u'TotalRevenueColumnAmt': u'107', u'ExclusionAmt': u'107'} | 34092 | [{u'TotalRevenueColumnAmt': u'7299', u'ExclusionAmt': u'7299', u'Desc': u'Gain on A/C ins. claim'}, {u'TotalRevenueColumnAmt': u'26793', u'ExclusionAmt': u'26793', u'Desc': u'Other income'}] | NaN | 34092 |
#file_list_columns = ['OrganizationName', 'ObjectId', 'URL', 'SubmittedOn', 'DLN', 'LastUpdated', 'TaxPeriod',
# 'IsElectronic', 'FormType', '_id', 'IsAvailable', 'EIN']
#cols = file_list_columns + ['MaterialDiversionOrMisuse', 'MaterialDiversionOrMisuseInd']
#df[cols][:1]
OrganizationName | ObjectId | URL | SubmittedOn | DLN | LastUpdated | TaxPeriod | IsElectronic | FormType | _id | IsAvailable | EIN | MaterialDiversionOrMisuse | MaterialDiversionOrMisuseInd | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ANCHOR BUILDING FUND INC | 201503169349304780 | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | 2016-01-30 | 93493316047805 | 2016-03-21T17:23:53 | 201412 | True | 990 | 57cce7823ffc5a55fe7f54b5 | True | 271317072 | NaN | true |
#print df[cols[-1]].value_counts(), '\n'
#print df[cols[-2]].value_counts(), '\n'
0 93 false 89 true 8 1 4 Name: MaterialDiversionOrMisuseInd, dtype: int64 0 185 false 157 1 1 true 1 Name: MaterialDiversionOrMisuse, dtype: int64
152 unique EINs in dataframe.
print len(df['EIN'].tolist())
print len(set(df['EIN'].tolist()))
538 152
All except one are 990s.
df['FormType'].value_counts()
990 537 990EZ 1 Name: FormType, dtype: int64
print len(df)
print df['TaxPeriod'].value_counts().sum(), '\n'
print df['IsAvailable'].value_counts(), '\n'
print df['IsElectronic'].value_counts(), '\n'
538 538 True 538 Name: IsAvailable, dtype: int64 True 538 Name: IsElectronic, dtype: int64
df['FYE'] = 'FY' + df['TaxPeriod'].str[:4]
df['FYE'][:2]
0 FY2014 1 FY2014 Name: FYE, dtype: object
df['FYE'].value_counts()
FY2013 120 FY2014 117 FY2012 112 FY2011 97 FY2010 77 FY2015 15 Name: FYE, dtype: int64
file_list_columns = ['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod',
#'ObjectId', '_id', 'IsAvailable', 'DLN', 'LastUpdated', 'IsElectronic', 'FormType',
]
SOX_columns = ['WhistleblowerPolicy', 'WhistleblowerPolicyInd',
'ConflictOfInterestPolicyInd', 'ConflictOfInterestPolicy',
'DocumentRetentionPolicy', 'DocumentRetentionPolicyInd',]
revenue_columns = ['CYTotalRevenueAmt', 'TotalRevenue', 'TotalRevenueCurrentYear']
efficiency_columns = ['CYTotalExpensesAmt', 'TotalExpensesCurrentYear',
'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp',
'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt',]
#df['program_efficiency'] = df['program_expenses']/df['total_expenses']
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt',
'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
'FundraisingGrossIncomeAmt',
'RelatedOrganizations', 'RelatedOrganizationsAmt',
'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
cols = file_list_columns+SOX_columns+revenue_columns+efficiency_columns+complexity_columns
#total_revenue_logged + program_efficiency + age + complexity_2011
df[cols][:2]
cols = ['FYE']+file_list_columns+SOX_columns+revenue_columns+efficiency_columns+complexity_columns
#total_revenue_logged + program_efficiency + age + complexity_2011
df[cols][:2]
FYE | EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | WhistleblowerPolicy | WhistleblowerPolicyInd | ConflictOfInterestPolicyInd | ConflictOfInterestPolicy | DocumentRetentionPolicy | DocumentRetentionPolicyInd | CYTotalRevenueAmt | TotalRevenue | TotalRevenueCurrentYear | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FY2014 | 271317072 | ANCHOR BUILDING FUND INC | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | 2016-01-30 | 201412 | NaN | false | false | NaN | NaN | false | 325000 | NaN | NaN | 34400 | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | 325000 | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} | NaN | 325000 |
1 | FY2014 | 953246624 | INLAND BEHAVIORAL AND HEALTH SERVICES INC | https://s3.amazonaws.com/irs-form-990/201501339349302635_public.xml | 2015-07-15 | 201406 | NaN | true | true | NaN | NaN | true | 8825153 | NaN | NaN | 9016469 | NaN | NaN | {u'TotalAmt': u'9016469', u'ProgramServicesAmt': u'7477089', u'ManagementAndGeneralAmt': u'1539380', u'FundraisingAmt': u'0'} | NaN | 7477089 | 32937 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | 9457980 | 8758017 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'265216', u'BusinessCd': u'624100', u'RelatedOrExemptFuncIncomeAmt': u'265216', u'Desc': u'Drug/Alcohol Treatment'}, {u'TotalRevenueColumnAmt': u'3987366', u'BusinessCd': u'624100', u'RelatedOrExemptFuncIncomeAmt': u'... | NaN | 8758017 | 107 | NaN | NaN | {u'TotalRevenueColumnAmt': u'107', u'ExclusionAmt': u'107'} | 34092 | [{u'TotalRevenueColumnAmt': u'7299', u'ExclusionAmt': u'7299', u'Desc': u'Gain on A/C ins. claim'}, {u'TotalRevenueColumnAmt': u'26793', u'ExclusionAmt': u'26793', u'Desc': u'Other income'}] | NaN | 34092 |
#cols = ['FYE'] + file_list_columns + ['MaterialDiversionOrMisuse', 'MaterialDiversionOrMisuseInd']
#df = df.sort_values(by=['EIN', 'FYE'], ascending=[1, 0])[cols]
df = df.sort_values(by=['EIN', 'FYE'], ascending=[1, 0])[cols]
print len(df)
df[:2]
538
FYE | EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | WhistleblowerPolicy | WhistleblowerPolicyInd | ConflictOfInterestPolicyInd | ConflictOfInterestPolicy | DocumentRetentionPolicy | DocumentRetentionPolicyInd | CYTotalRevenueAmt | TotalRevenue | TotalRevenueCurrentYear | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | FY2012 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | false | NaN | NaN | true | true | NaN | NaN | {u'TotalRevenueColumn': u'2515399', u'ExclusionAmount': u'2063'} | 2515399 | NaN | 2100922 | {u'ManagementAndGeneral': u'362113', u'ProgramServices': u'1598626', u'Fundraising': u'140183', u'Total': u'2100922'} | NaN | 1598626 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | 185408 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'63', u'ExclusionAmount': u'63'} | 63 | NaN | NaN | NaN | 0 | NaN |
433 | FY2011 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | false | NaN | NaN | true | true | NaN | NaN | {u'TotalRevenueColumn': u'2161209'} | 2161209 | NaN | 2481310 | {u'ManagementAndGeneral': u'527949', u'ProgramServices': u'1715793', u'Fundraising': u'237568', u'Total': u'2481310'} | NaN | 1715793 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
#print len(df)
#df[:2]
538
FYE | OrganizationName | ObjectId | URL | SubmittedOn | DLN | LastUpdated | TaxPeriod | IsElectronic | FormType | _id | IsAvailable | EIN | MaterialDiversionOrMisuse | MaterialDiversionOrMisuseInd | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | FY2012 | NEWARK NOW INC | 201303199349303505 | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 93493319035053 | 2016-03-21T17:23:53 | 201212 | True | 990 | 57cce8673ffc5a55fe7f5664 | True | 030498214 | false | NaN |
433 | FY2011 | NEWARK NOW INC | 201320249349300417 | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 93493024004173 | 2016-03-21T17:23:53 | 201112 | True | 990 | 57cce7d63ffc5a55fe7f554d | True | 030498214 | false | NaN |
print df[revenue_columns[0]].value_counts().sum()
print df[revenue_columns[1]].value_counts().sum()
print df[revenue_columns[2]].value_counts().sum()
194 344 344
df[revenue_columns][:5]
CYTotalRevenueAmt | TotalRevenue | TotalRevenueCurrentYear | |
---|---|---|---|
252 | NaN | {u'TotalRevenueColumn': u'2515399', u'ExclusionAmount': u'2063'} | 2515399 |
433 | NaN | {u'TotalRevenueColumn': u'2161209'} | 2161209 |
78 | NaN | {u'TotalRevenueColumn': u'3990564'} | 3990564 |
333 | 22995526 | NaN | NaN |
12 | NaN | {u'TotalRevenueColumn': u'25923449', u'RelatedOrExemptFunctionIncome': u'451157', u'ExclusionAmount': u'1549366'} | 25923449 |
df['tot_rev'] = np.nan
print len(df[df['tot_rev'].notnull()])
df['tot_rev'] = np.where(df['TotalRevenueCurrentYear'].notnull(), df['TotalRevenueCurrentYear'], df['tot_rev'] )
print len(df[df['tot_rev'].notnull()])
df['tot_rev'] = np.where( ((df['tot_rev'].isnull()) & (df['CYTotalRevenueAmt'].notnull())),
df['CYTotalRevenueAmt'], df['tot_rev'] )
print len(df[df['tot_rev'].notnull()])
df['tot_rev'] = df['tot_rev'].astype('int')
print len(df[df['tot_rev'].notnull()])
0 344 538 538
print df.columns.tolist()
['FYE', u'EIN', u'OrganizationName', u'URL', u'SubmittedOn', u'TaxPeriod', u'WhistleblowerPolicy', u'WhistleblowerPolicyInd', u'ConflictOfInterestPolicyInd', u'ConflictOfInterestPolicy', u'DocumentRetentionPolicy', u'DocumentRetentionPolicyInd', u'CYTotalRevenueAmt', u'TotalRevenue', u'TotalRevenueCurrentYear', u'CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses', u'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt', u'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues', u'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp', u'FundraisingAmt', u'FundraisingGrossIncomeAmt', u'RelatedOrganizations', u'RelatedOrganizationsAmt', u'GovernmentGrants', u'GovernmentGrantsAmt', u'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', u'ProgramServiceRevenue', u'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue', u'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome', u'InvestmentIncomeCurrentYear', u'InvestmentIncomeGrp', u'OtherRevenueTotalAmt', u'OtherRevenueMiscGrp', u'TotalOtherRevenue', u'CYOtherRevenueAmt', 'tot_rev']
new_cols = ['FYE', 'EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod',
#'CYTotalRevenueAmt', 'TotalRevenue', 'TotalRevenueCurrentYear',
'tot_rev',
'WhistleblowerPolicy', 'WhistleblowerPolicyInd',
'ConflictOfInterestPolicyInd', 'ConflictOfInterestPolicy',
'DocumentRetentionPolicy', 'DocumentRetentionPolicyInd',
'CYTotalExpensesAmt', 'TotalExpensesCurrentYear',
'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp',
'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt',
'CYContributionsGrantsAmt',
'FederatedCampaigns', 'FederatedCampaignsAmt',
'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
'FundraisingAmt', 'FundraisingGrossIncomeAmt',
'RelatedOrganizations', 'RelatedOrganizationsAmt',
'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue',
'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp', 'TotalProgramServiceRevenue',
'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp', 'TotalOtherRevenue', 'CYOtherRevenueAmt']
df[new_cols][:5]
FYE | EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | tot_rev | WhistleblowerPolicy | WhistleblowerPolicyInd | ConflictOfInterestPolicyInd | ConflictOfInterestPolicy | DocumentRetentionPolicy | DocumentRetentionPolicyInd | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | FY2012 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | 2515399 | false | NaN | NaN | true | true | NaN | NaN | 2100922 | {u'ManagementAndGeneral': u'362113', u'ProgramServices': u'1598626', u'Fundraising': u'140183', u'Total': u'2100922'} | NaN | 1598626 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | 185408 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'63', u'ExclusionAmount': u'63'} | 63 | NaN | NaN | NaN | 0 | NaN |
433 | FY2011 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | 2161209 | false | NaN | NaN | true | true | NaN | NaN | 2481310 | {u'ManagementAndGeneral': u'527949', u'ProgramServices': u'1715793', u'Fundraising': u'237568', u'Total': u'2481310'} | NaN | 1715793 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
78 | FY2010 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201220909349300327_public.xml | 2012-05-25 | 201012 | 3990564 | false | NaN | NaN | true | true | NaN | NaN | 3912643 | {u'ManagementAndGeneral': u'908524', u'ProgramServices': u'2930785', u'Fundraising': u'73334', u'Total': u'3912643'} | NaN | 2930785 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
333 | FY2014 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201502299349301700_public.xml | 2015-09-21 | 201409 | 22995526 | NaN | true | true | NaN | NaN | true | 25626262 | NaN | NaN | {u'TotalAmt': u'25626262', u'ProgramServicesAmt': u'18485740', u'ManagementAndGeneralAmt': u'7140522', u'FundraisingAmt': u'0'} | NaN | 18485740 | 20437991 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | 2053228 | NaN | 16124016 | 0 | 0 | NaN | NaN | NaN | NaN | 0 | 1539746 | NaN | NaN | {u'TotalRevenueColumnAmt': u'362651', u'ExclusionAmt': u'362651'} | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | FY2013 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201442269349304504_public.xml | 2014-10-28 | 201309 | 25923449 | true | NaN | NaN | true | true | NaN | NaN | 26224173 | {u'ManagementAndGeneral': u'6501012', u'ProgramServices': u'19280493', u'Fundraising': u'442668', u'Total': u'26224173'} | NaN | 19280493 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | 2466016 | NaN | 16660756 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'398381', u'ExclusionAmount': u'398381'} | 1195606 | NaN | NaN | NaN | 451157 | NaN |
df.describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
tot_rev | 538.0 | 5.857063e+07 | 3.234413e+08 | -218265025.0 | 1320586.25 | 4261393.5 | 13341627.0 | 3.741635e+09 |
print df['WhistleblowerPolicy'].value_counts().sum()
print df['WhistleblowerPolicyInd'].value_counts().sum(), '\n'
print df['ConflictOfInterestPolicy'].value_counts().sum()
print df['ConflictOfInterestPolicyInd'].value_counts().sum(), '\n'
print df['DocumentRetentionPolicy'].value_counts().sum()
print df['DocumentRetentionPolicyInd'].value_counts().sum(), '\n'
344 194 344 194 344 194
NOTE: THIS VARIABLE JUST ADDED IN AD HOC -- IT WASN'T SAVED IN INITIAL ROUND.
df['material_diversion'] = np.nan
print len(df[df['material_diversion'].notnull()])
df['material_diversion'] = np.where(df['MaterialDiversionOrMisuse'].notnull(), df['MaterialDiversionOrMisuse'],
df['material_diversion'] )
print len(df[df['material_diversion'].notnull()])
df['material_diversion'] = np.where( ((df['material_diversion'].isnull()) &
(df['MaterialDiversionOrMisuseInd'].notnull())),
df['MaterialDiversionOrMisuseInd'], df['material_diversion'] )
print len(df[df['material_diversion'].notnull()])
df['material_diversion'].value_counts()
0 344 538
0 278 false 246 true 9 1 5 Name: material_diversion, dtype: int64
df['material_diversion'] = np.where( df['material_diversion']=='true', 1, df['material_diversion'] )
df['material_diversion'] = np.where( df['material_diversion']=='1', 1, df['material_diversion'] )
df['material_diversion'] = np.where( df['material_diversion']=='false', 0, df['material_diversion'] )
df['material_diversion'] = np.where( df['material_diversion']=='0', 0, df['material_diversion'] )
print len(df[df['material_diversion'].notnull()])
df['material_diversion'].value_counts()
538
0 524 1 14 Name: material_diversion, dtype: int64
df['whistleblower_policy'] = np.nan
print len(df[df['whistleblower_policy'].notnull()])
df['whistleblower_policy'] = np.where(df['WhistleblowerPolicy'].notnull(), df['WhistleblowerPolicy'],
df['whistleblower_policy'] )
print len(df[df['whistleblower_policy'].notnull()])
df['whistleblower_policy'] = np.where( ((df['whistleblower_policy'].isnull()) & (df['WhistleblowerPolicyInd'].notnull())),
df['WhistleblowerPolicyInd'], df['whistleblower_policy'] )
print len(df[df['whistleblower_policy'].notnull()])
df['whistleblower_policy'].value_counts()
0 344 538
1 179 true 138 false 117 0 104 Name: whistleblower_policy, dtype: int64
df['whistleblower_policy'] = np.where( df['whistleblower_policy']=='true', 1, df['whistleblower_policy'] )
df['whistleblower_policy'] = np.where( df['whistleblower_policy']=='1', 1, df['whistleblower_policy'] )
df['whistleblower_policy'] = np.where( df['whistleblower_policy']=='false', 0, df['whistleblower_policy'] )
df['whistleblower_policy'] = np.where( df['whistleblower_policy']=='0', 0, df['whistleblower_policy'] )
print len(df[df['whistleblower_policy'].notnull()])
df['whistleblower_policy'].value_counts()
538
1 317 0 221 Name: whistleblower_policy, dtype: int64
df[['whistleblower_policy', 'WhistleblowerPolicyInd', 'WhistleblowerPolicy' ]][:5]
whistleblower_policy | WhistleblowerPolicyInd | WhistleblowerPolicy | |
---|---|---|---|
252 | 0 | NaN | false |
433 | 0 | NaN | false |
78 | 0 | NaN | false |
333 | 1 | true | NaN |
12 | 1 | NaN | true |
df['conflict_of_interest_policy'] = np.nan
print len(df[df['conflict_of_interest_policy'].notnull()])
df['conflict_of_interest_policy'] = np.where(df['ConflictOfInterestPolicy'].notnull(), df['ConflictOfInterestPolicy'],
df['conflict_of_interest_policy'] )
print len(df[df['conflict_of_interest_policy'].notnull()])
df['conflict_of_interest_policy'] = np.where( ((df['conflict_of_interest_policy'].isnull())
& (df['ConflictOfInterestPolicyInd'].notnull())),
df['ConflictOfInterestPolicyInd'], df['conflict_of_interest_policy'] )
print len(df[df['conflict_of_interest_policy'].notnull()])
df['conflict_of_interest_policy'].value_counts()
0 344 538
1 262 true 192 false 63 0 21 Name: conflict_of_interest_policy, dtype: int64
df['conflict_of_interest_policy'] = np.where( df['conflict_of_interest_policy']=='true', 1, df['conflict_of_interest_policy'] )
df['conflict_of_interest_policy'] = np.where( df['conflict_of_interest_policy']=='1', 1, df['conflict_of_interest_policy'] )
df['conflict_of_interest_policy'] = np.where( df['conflict_of_interest_policy']=='false', 0, df['conflict_of_interest_policy'] )
df['conflict_of_interest_policy'] = np.where( df['conflict_of_interest_policy']=='0', 0, df['conflict_of_interest_policy'] )
print len(df[df['conflict_of_interest_policy'].notnull()])
df['conflict_of_interest_policy'].value_counts()
538
1 454 0 84 Name: conflict_of_interest_policy, dtype: int64
df[['conflict_of_interest_policy', 'ConflictOfInterestPolicyInd', 'ConflictOfInterestPolicy' ]][:5]
conflict_of_interest_policy | ConflictOfInterestPolicyInd | ConflictOfInterestPolicy | |
---|---|---|---|
252 | 1 | NaN | true |
433 | 1 | NaN | true |
78 | 1 | NaN | true |
333 | 1 | true | NaN |
12 | 1 | NaN | true |
df['records_retention_policy'] = np.nan
print len(df[df['records_retention_policy'].notnull()])
df['records_retention_policy'] = np.where(df['DocumentRetentionPolicy'].notnull(), df['DocumentRetentionPolicy'],
df['records_retention_policy'] )
print len(df[df['records_retention_policy'].notnull()])
df['records_retention_policy'] = np.where( ((df['records_retention_policy'].isnull())
& (df['DocumentRetentionPolicyInd'].notnull())),
df['DocumentRetentionPolicyInd'], df['records_retention_policy'] )
print len(df[df['records_retention_policy'].notnull()])
df['records_retention_policy'].value_counts()
0 344 538
1 224 true 159 false 96 0 59 Name: records_retention_policy, dtype: int64
df['records_retention_policy'] = np.where( df['records_retention_policy']=='true', 1, df['records_retention_policy'] )
df['records_retention_policy'] = np.where( df['records_retention_policy']=='1', 1, df['records_retention_policy'] )
df['records_retention_policy'] = np.where( df['records_retention_policy']=='false', 0, df['records_retention_policy'] )
df['records_retention_policy'] = np.where( df['records_retention_policy']=='0', 0, df['records_retention_policy'] )
print len(df[df['records_retention_policy'].notnull()])
df['records_retention_policy'].value_counts()
538
1 383 0 155 Name: records_retention_policy, dtype: int64
df[['records_retention_policy', 'DocumentRetentionPolicyInd', 'DocumentRetentionPolicy' ]][:5]
records_retention_policy | DocumentRetentionPolicyInd | DocumentRetentionPolicy | |
---|---|---|---|
252 | 1 | NaN | true |
433 | 1 | NaN | true |
78 | 1 | NaN | true |
333 | 1 | true | NaN |
12 | 1 | NaN | true |
print df.columns.tolist()
['FYE', u'EIN', u'OrganizationName', u'URL', u'SubmittedOn', u'TaxPeriod', u'WhistleblowerPolicy', u'WhistleblowerPolicyInd', u'ConflictOfInterestPolicyInd', u'ConflictOfInterestPolicy', u'DocumentRetentionPolicy', u'DocumentRetentionPolicyInd', u'CYTotalRevenueAmt', u'TotalRevenue', u'TotalRevenueCurrentYear', u'CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses', u'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt', u'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues', u'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp', u'FundraisingAmt', u'FundraisingGrossIncomeAmt', u'RelatedOrganizations', u'RelatedOrganizationsAmt', u'GovernmentGrants', u'GovernmentGrantsAmt', u'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', u'ProgramServiceRevenue', u'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue', u'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome', u'InvestmentIncomeCurrentYear', u'InvestmentIncomeGrp', u'OtherRevenueTotalAmt', u'OtherRevenueMiscGrp', u'TotalOtherRevenue', u'CYOtherRevenueAmt', 'tot_rev', 'whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy']
new_cols = ['FYE', 'EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod',
'tot_rev', 'whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy',
#'WhistleblowerPolicy', u'WhistleblowerPolicyInd', u'ConflictOfInterestPolicyInd',
#'ConflictOfInterestPolicy', u'DocumentRetentionPolicy', u'DocumentRetentionPolicyInd',
#'CYTotalRevenueAmt', 'TotalRevenue', 'TotalRevenueCurrentYear',
'CYTotalExpensesAmt',
'TotalExpensesCurrentYear', 'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp',
'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt', 'CYContributionsGrantsAmt',
'FederatedCampaigns', 'FederatedCampaignsAmt', u'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
'FundraisingAmt', 'FundraisingGrossIncomeAmt',
'RelatedOrganizations',
'RelatedOrganizationsAmt', 'GovernmentGrants', 'GovernmentGrantsAmt', 'PYProgramServiceRevenueAmt',
'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue', 'ProgramServiceRevenueCY',
'ProgramServiceRevenueGrp', 'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp', 'TotalOtherRevenue', 'CYOtherRevenueAmt',
]
df = df[new_cols]
df[:5]
FYE | EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | tot_rev | whistleblower_policy | conflict_of_interest_policy | records_retention_policy | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | FY2012 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | 2515399 | 0 | 1 | 1 | NaN | 2100922 | {u'ManagementAndGeneral': u'362113', u'ProgramServices': u'1598626', u'Fundraising': u'140183', u'Total': u'2100922'} | NaN | 1598626 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | 185408 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'63', u'ExclusionAmount': u'63'} | 63 | NaN | NaN | NaN | 0 | NaN |
433 | FY2011 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | 2161209 | 0 | 1 | 1 | NaN | 2481310 | {u'ManagementAndGeneral': u'527949', u'ProgramServices': u'1715793', u'Fundraising': u'237568', u'Total': u'2481310'} | NaN | 1715793 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
78 | FY2010 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201220909349300327_public.xml | 2012-05-25 | 201012 | 3990564 | 0 | 1 | 1 | NaN | 3912643 | {u'ManagementAndGeneral': u'908524', u'ProgramServices': u'2930785', u'Fundraising': u'73334', u'Total': u'3912643'} | NaN | 2930785 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
333 | FY2014 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201502299349301700_public.xml | 2015-09-21 | 201409 | 22995526 | 1 | 1 | 1 | 25626262 | NaN | NaN | {u'TotalAmt': u'25626262', u'ProgramServicesAmt': u'18485740', u'ManagementAndGeneralAmt': u'7140522', u'FundraisingAmt': u'0'} | NaN | 18485740 | 20437991 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | 2053228 | NaN | 16124016 | 0 | 0 | NaN | NaN | NaN | NaN | 0 | 1539746 | NaN | NaN | {u'TotalRevenueColumnAmt': u'362651', u'ExclusionAmt': u'362651'} | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | FY2013 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201442269349304504_public.xml | 2014-10-28 | 201309 | 25923449 | 1 | 1 | 1 | NaN | 26224173 | {u'ManagementAndGeneral': u'6501012', u'ProgramServices': u'19280493', u'Fundraising': u'442668', u'Total': u'26224173'} | NaN | 19280493 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | 2466016 | NaN | 16660756 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'398381', u'ExclusionAmount': u'398381'} | 1195606 | NaN | NaN | NaN | 451157 | NaN |
152 unique EINs in this dataset -- roughly half of the 321 '2016' donor advisories.
df.to_pickle('538 e-file 990s for 2016 donor advisory organizations, v2.pkl')
#cols = ['FYE'] + file_list_columns + ['MaterialDiversionOrMisuse', 'MaterialDiversionOrMisuseInd'] + ['material_diversion']
#df[cols].to_pickle('538 e-file 990s for 2016 donor advisory organizations (material diversions only).pkl')
#df['material_diversion'] = df['material_diversion'].astype('int')
#df[cols].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
material_diversion | 538.0 | 0.026022 | 0.15935 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
#cols = ['FYE'] + file_list_columns + ['MaterialDiversionOrMisuse', 'MaterialDiversionOrMisuseInd'] + ['material_diversion']
#df[cols].to_pickle('538 e-file 990s for 2016 donor advisory organizations (material diversions only).pkl')
efficiency_columns = ['CYTotalExpensesAmt', 'TotalExpensesCurrentYear',
'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp',
'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt',]
df[efficiency_columns][:4]
CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | |
---|---|---|---|---|---|---|
252 | NaN | 2100922 | {u'ManagementAndGeneral': u'362113', u'ProgramServices': u'1598626', u'Fundraising': u'140183', u'Total': u'2100922'} | NaN | 1598626 | NaN |
433 | NaN | 2481310 | {u'ManagementAndGeneral': u'527949', u'ProgramServices': u'1715793', u'Fundraising': u'237568', u'Total': u'2481310'} | NaN | 1715793 | NaN |
78 | NaN | 3912643 | {u'ManagementAndGeneral': u'908524', u'ProgramServices': u'2930785', u'Fundraising': u'73334', u'Total': u'3912643'} | NaN | 2930785 | NaN |
333 | 25626262 | NaN | NaN | {u'TotalAmt': u'25626262', u'ProgramServicesAmt': u'18485740', u'ManagementAndGeneralAmt': u'7140522', u'FundraisingAmt': u'0'} | NaN | 18485740 |
df['total_expenses'] = np.nan
print len(df[df['total_expenses'].notnull()])
df['total_expenses'] = np.where(df['CYTotalExpensesAmt'].notnull(), df['CYTotalExpensesAmt'],
df['total_expenses'] )
print len(df[df['total_expenses'].notnull()])
df['total_expenses'] = np.where( ((df['total_expenses'].isnull()) & (df['TotalExpensesCurrentYear'].notnull())),
df['TotalExpensesCurrentYear'], df['total_expenses'] )
print len(df[df['total_expenses'].notnull()])
df['total_expenses'].describe().T
0 194 538
count 538 unique 536 top 271781 freq 2 Name: total_expenses, dtype: object
df['program_expenses'] = np.nan
print len(df[df['program_expenses'].notnull()])
df['program_expenses'] = np.where(df['TotalProgramServiceExpense'].notnull(), df['TotalProgramServiceExpense'],
df['program_expenses'] )
print len(df[df['program_expenses'].notnull()])
df['program_expenses'] = np.where( ((df['program_expenses'].isnull()) & (df['TotalProgramServiceExpensesAmt'].notnull())),
df['TotalProgramServiceExpensesAmt'], df['program_expenses'] )
print len(df[df['program_expenses'].notnull()])
df['program_expenses'].describe().T
0 336 524
count 524 unique 522 top 31689530 freq 2 Name: program_expenses, dtype: object
print len(df[df['program_expenses'].isnull()])
df[df['program_expenses'].isnull()]
14
FYE | EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | tot_rev | whistleblower_policy | conflict_of_interest_policy | records_retention_policy | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | total_expenses | program_expenses | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
172 | FY2014 | 132644507 | WORLD TRADE CENTERS ASSOCIATION INC | https://s3.amazonaws.com/irs-form-990/201543149349303504_public.xml | 2016-02-08 | 201412 | 7913104 | 1 | 1 | 1 | 7763846 | NaN | NaN | {u'TotalAmt': u'7763846'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 8834499 | 6812087 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'3242920', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'3242920', u'Desc': u'MEMBERSHIP DUES'}, {u'TotalRevenueColumnAmt': u'2890000', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'28900... | NaN | 6812087 | 1075687 | NaN | NaN | {u'TotalRevenueColumnAmt': u'334806', u'ExclusionAmt': u'334806'} | 25330 | {u'TotalRevenueColumnAmt': u'25330', u'BusinessCd': u'900099', u'ExclusionAmt': u'25330', u'Desc': u'MISCELLANEOUS'} | NaN | 25330 | 7763846 | NaN |
404 | FY2013 | 132644507 | WORLD TRADE CENTERS ASSOCIATION INC | https://s3.amazonaws.com/irs-form-990/201433149349302383_public.xml | 2014-12-10 | 201312 | 9228395 | 1 | 1 | 1 | 7795021 | NaN | NaN | {u'TotalAmt': u'7795021'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7520245 | 8834499 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'4774000', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'4774000', u'Desc': u'INITIATION FEES'}, {u'TotalRevenueColumnAmt': u'3347499', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'33474... | NaN | 8834499 | 391315 | NaN | NaN | {u'TotalRevenueColumnAmt': u'275100', u'ExclusionAmt': u'275100'} | 2581 | {u'TotalRevenueColumnAmt': u'2581', u'BusinessCd': u'900099', u'ExclusionAmt': u'2581', u'Desc': u'MISCELLANEOUS'} | NaN | 2581 | 7795021 | NaN |
431 | FY2012 | 132644507 | WORLD TRADE CENTERS ASSOCIATION INC | https://s3.amazonaws.com/irs-form-990/201313169349303646_public.xml | 2013-12-24 | 201212 | 7749706 | 1 | 1 | 1 | NaN | 6237509 | {u'Total': u'6237509'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [{u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3333245', u'RelatedOrExemptFunctionIncome': u'3333245', u'Description': u'MEMBERSHIP DUES'}, {u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3278000', u'RelatedOrExemptFunctionIncome': ... | 7520245 | NaN | 7520245 | NaN | NaN | {u'TotalRevenueColumn': u'234964', u'ExclusionAmount': u'234964'} | 211074 | NaN | NaN | NaN | 18387 | NaN | 6237509 | NaN |
106 | FY2011 | 132644507 | WORLD TRADE CENTERS ASSOCIATION INC | https://s3.amazonaws.com/irs-form-990/201222279349301852_public.xml | 2012-11-15 | 201112 | 7072954 | 1 | 1 | 1 | NaN | 5880789 | {u'Total': u'5880789'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [{u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3313485', u'RelatedOrExemptFunctionIncome': u'3313485', u'Description': u'MEMBERSHIP DUES'}, {u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3183000', u'RelatedOrExemptFunctionIncome': ... | 6926485 | NaN | 6926485 | NaN | NaN | {u'TotalRevenueColumn': u'237277', u'ExclusionAmount': u'237277'} | 144021 | NaN | NaN | NaN | 2448 | NaN | 5880789 | NaN |
72 | FY2010 | 132644507 | WORLD TRADE CENTERS ASSOCIATION INC | https://s3.amazonaws.com/irs-form-990/201123199349305807_public.xml | 2012-01-03 | 201012 | 6858206 | 1 | 1 | 1 | NaN | 6024086 | {u'Total': u'6024086'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [{u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3452000', u'RelatedOrExemptFunctionIncome': u'3452000', u'Description': u'INITIATION FEES'}, {u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3134807', u'RelatedOrExemptFunctionIncome': ... | 6586807 | NaN | 6586807 | NaN | NaN | {u'TotalRevenueColumn': u'217696', u'ExclusionAmount': u'217696'} | 192714 | NaN | NaN | NaN | 78685 | NaN | 6024086 | NaN |
52 | FY2014 | 135579302 | UNITED HEBREW CEMETERY INC | https://s3.amazonaws.com/irs-form-990/201531949349300508_public.xml | 2015-07-29 | 201412 | 4223826 | 1 | 1 | 1 | 2023750 | NaN | NaN | {u'TotalAmt': u'2023750'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2572305 | 2466379 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'1235180', u'RelatedOrExemptFuncIncomeAmt': u'1235180', u'Desc': u'SALES OF GRAVES & PLOTS'}, {u'TotalRevenueColumnAmt': u'601571', u'RelatedOrExemptFuncIncomeAmt': u'601571', u'Desc': u'INTERMENTS'}, {u'TotalRevenueC... | NaN | 2466379 | 1299578 | NaN | NaN | {u'TotalRevenueColumnAmt': u'481123', u'RelatedOrExemptFuncIncomeAmt': u'481123'} | 457869 | {u'TotalRevenueColumnAmt': u'457869', u'RelatedOrExemptFuncIncomeAmt': u'457869', u'Desc': u'PERPETUAL CARE ENDOWMENTS'} | NaN | 457869 | 2023750 | NaN |
406 | FY2013 | 135579302 | UNITED HEBREW CEMETERY INC | https://s3.amazonaws.com/irs-form-990/201433039349300503_public.xml | 2014-11-24 | 201312 | 4936178 | 0 | 0 | 0 | 4133761 | NaN | NaN | {u'TotalAmt': u'4133761', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'0', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | 2196546 | 2572305 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'43547', u'RelatedOrExemptFuncIncomeAmt': u'43547', u'Desc': u'CELL TOWER INCOME'}, {u'TotalRevenueColumnAmt': u'371441', u'RelatedOrExemptFuncIncomeAmt': u'371441', u'Desc': u'FOUNDATIONS'}, {u'TotalRevenueColumnAmt'... | NaN | 2572305 | 917828 | NaN | NaN | {u'TotalRevenueColumnAmt': u'584869', u'RelatedOrExemptFuncIncomeAmt': u'584869'} | 1446045 | [{u'TotalRevenueColumnAmt': u'1000000', u'RelatedOrExemptFuncIncomeAmt': u'1000000', u'Desc': u'LEGAL RECOVERY SETTLEMENT'}, {u'TotalRevenueColumnAmt': u'446045', u'RelatedOrExemptFuncIncomeAmt': u'446045', u'Desc': u'PERPETUAL CARE ENDOWMENTS'}] | NaN | 1446045 | 4133761 | NaN |
500 | FY2010 | 201737542 | GREATER CORNERSTONE COMMUNITY DEV | https://s3.amazonaws.com/irs-form-990/201111369349303781_public.xml | 2011-06-08 | 201006 | 548759 | 1 | 1 | 1 | NaN | 177901 | {u'ManagementAndGeneral': u'111025', u'ProgramServices': u'0', u'Fundraising': u'66876', u'Total': u'177901'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 177901 | NaN |
0 | FY2014 | 271317072 | ANCHOR BUILDING FUND INC | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | 2016-01-30 | 201412 | 325000 | 0 | 0 | 0 | 34400 | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | 325000 | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} | NaN | 325000 | 34400 | NaN |
277 | FY2014 | 330837780 | INTERNATIONAL CHRISTIAN BROADCASTING INC | https://s3.amazonaws.com/irs-form-990/201543209349314539_public.xml | 2016-02-22 | 201412 | 9926 | 1 | 1 | 1 | 3330 | NaN | NaN | {u'TotalAmt': u'3330', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'3330', u'FundraisingAmt': u'0'} | NaN | NaN | 9225 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9025 | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 701 | NaN | NaN | {u'TotalRevenueColumnAmt': u'701', u'ExclusionAmt': u'701'} | NaN | NaN | NaN | 0 | 3330 | NaN |
226 | FY2011 | 591951577 | ALLIED VETERANS OF THE WORLD INC | https://s3.amazonaws.com/irs-form-990/201243209349307874_public.xml | 2012-12-31 | 201112 | 1736929 | 1 | 1 | 1 | NaN | 1690623 | {u'Total': u'1690623'} | NaN | NaN | NaN | NaN | NaN | NaN | 26929 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'1710000'} | 1710000 | NaN | NaN | NaN | NaN | NaN | 1690623 | NaN |
213 | FY2010 | 591951577 | ALLIED VETERANS OF THE WORLD INC | https://s3.amazonaws.com/irs-form-990/201132279349301923_public.xml | 2011-09-07 | 201012 | 2496382 | 1 | 1 | 1 | NaN | 1161939 | {u'Total': u'1161939'} | NaN | NaN | NaN | NaN | NaN | NaN | 8056 | NaN | NaN | NaN | NaN | NaN | 2482845 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | 5481 | NaN | 1161939 | NaN |
495 | FY2010 | 591951577 | ALLIED VETERANS OF THE WORLD INC | https://s3.amazonaws.com/irs-form-990/201243219349300739_public.xml | 2012-12-31 | 201012 | 2496382 | 1 | 1 | 1 | NaN | 1282149 | {u'Total': u'1282149'} | NaN | NaN | NaN | NaN | NaN | NaN | 8056 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'2482845'} | 2482845 | NaN | NaN | NaN | 5481 | NaN | 1282149 | NaN |
496 | FY2011 | 953553530 | NATIONAL MINORITY TV INC | https://s3.amazonaws.com/irs-form-990/201243219349300444_public.xml | 2012-12-31 | 201112 | 0 | 1 | 1 | 1 | NaN | 0 | {u'ManagementAndGeneral': u'0', u'ProgramServices': u'0', u'Fundraising': u'0', u'Total': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
df[df['program_expenses'].isnull()].to_csv('14 filings with zero program expenses.csv')
df['program_expenses'] = df['program_expenses'].astype('float')
df['total_expenses'] = df['total_expenses'].astype('float')
df['program_efficiency'] = df['program_expenses']/df['total_expenses']
print df['program_efficiency'].describe()
count 524.000000 mean 0.779227 std 0.191070 min 0.023817 25% NaN 50% NaN 75% NaN max 1.000000 Name: program_efficiency, dtype: float64
df[complexity_columns][:5]
CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | 185408 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'63', u'ExclusionAmount': u'63'} | 63 | NaN | NaN | NaN | 0 | NaN |
433 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
78 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
333 | 20437991 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | 2053228 | NaN | 16124016 | 0 | 0 | NaN | NaN | NaN | NaN | 0 | 1539746 | NaN | NaN | {u'TotalRevenueColumnAmt': u'362651', u'ExclusionAmt': u'362651'} | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | 2466016 | NaN | 16660756 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'398381', u'ExclusionAmount': u'398381'} | 1195606 | NaN | NaN | NaN | 451157 | NaN |
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt',
'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
'FundraisingGrossIncomeAmt',
'RelatedOrganizations', 'RelatedOrganizationsAmt',
'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
print len(df[df['FederatedCampaigns'].notnull()])
df['FederatedCampaigns'] = np.where( ((df['FederatedCampaigns'].isnull()) & (df['FederatedCampaignsAmt'].notnull())),
df['FederatedCampaignsAmt'], df['FederatedCampaigns'] )
print len(df[df['FederatedCampaigns'].notnull()])
df['FederatedCampaigns'].describe().T
21 40
count 40 unique 34 top 0 freq 6 Name: FederatedCampaigns, dtype: object
df[complexity_columns][20:25]
CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
101 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'BusinessCode': u'611600', u'TotalRevenueColumn': u'194006', u'RelatedOrExemptFunctionIncome': u'194006', u'Description': u'TUITION'} | 194006 | NaN | 194006 | NaN | NaN | {u'TotalRevenueColumn': u'7', u'RelatedOrExemptFunctionIncome': u'7'} | 7 | NaN | NaN | NaN | 0 | NaN |
528 | 12294982 | NaN | NaN | NaN | NaN | NaN | NaN | 1817567 | NaN | NaN | NaN | NaN | 646173 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 1549 | NaN | NaN | {u'TotalRevenueColumnAmt': u'1549'} | NaN | NaN | NaN | 0 |
401 | 12741996 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 629939 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 1656 | NaN | NaN | {u'TotalRevenueColumnAmt': u'1656'} | NaN | NaN | NaN | 0 |
361 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 437339 | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'2695'} | 2695 | NaN | NaN | NaN | NaN | NaN |
36 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 461226 | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'2317'} | 2317 | NaN | NaN | NaN | NaN | NaN |
print len(df[df['MembershipDues'].notnull()])
print len(df[df['MembershipDuesAmt'].notnull()])
df['MembershipDues'] = np.where( ((df['MembershipDues'].isnull()) & (df['MembershipDuesAmt'].notnull())),
df['MembershipDuesAmt'], df['MembershipDues'] )
print len(df[df['MembershipDues'].notnull()])
df['MembershipDues'].describe().T
32 12 44
count 44 unique 37 top 0 freq 7 Name: MembershipDues, dtype: object
print complexity_columns
['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt', 'MembershipDues', 'MembershipDuesAmt', 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt', 'FundraisingGrossIncomeAmt', 'RelatedOrganizations', 'RelatedOrganizationsAmt', 'GovernmentGrants', 'GovernmentGrantsAmt', 'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp', 'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt', 'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp', 'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp', 'TotalOtherRevenue', 'CYOtherRevenueAmt']
import ast
df[['NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt',
'FundraisingGrossIncomeAmt']][4:12]
NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|---|
12 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
481 | {u'TotalRevenueColumn': u'221449', u'ExclusionAmount': u'221449'} | NaN | NaN | NaN |
16 | {u'TotalRevenueColumn': u'23265', u'ExclusionAmount': u'23265'} | NaN | NaN | NaN |
130 | {u'TotalRevenueColumn': u'1322', u'RelatedOrExemptFunctionIncome': u'1322', u'UnrelatedBusinessRevenue': u'0', u'ExclusionAmount': u'0'} | NaN | NaN | NaN |
434 | NaN | {u'TotalRevenueColumnAmt': u'9429', u'ExclusionAmt': u'9429'} | 2985 | 28106 |
374 | {u'TotalRevenueColumn': u'2923', u'ExclusionAmount': u'2923'} | NaN | NaN | NaN |
118 | {u'TotalRevenueColumn': u'47210', u'ExclusionAmount': u'47210'} | NaN | NaN | NaN |
124 | {u'TotalRevenueColumn': u'99073', u'RelatedOrExemptFunctionIncome': u'99073'} | NaN | NaN | NaN |
print len(df[df['NetIncomeFromFundraisingEvents'].notnull()])
print len(df[df['NetIncmFromFundraisingEvtGrp'].notnull()])
df['NetIncomeFromFundraisingEvents'] = np.where( ((df['NetIncomeFromFundraisingEvents'].isnull())
& (df['NetIncmFromFundraisingEvtGrp'].notnull())),
df['NetIncmFromFundraisingEvtGrp'], df['NetIncomeFromFundraisingEvents'] )
print len(df[df['NetIncomeFromFundraisingEvents'].notnull()])
df['NetIncomeFromFundraisingEvents'].describe().T
144 84 228
count 228 unique 129 top {u'TotalRevenueColumn': u'0'} freq 66 Name: NetIncomeFromFundraisingEvents, dtype: object
df[['NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt',
'FundraisingGrossIncomeAmt']][4:12]
NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|---|
12 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
481 | {u'TotalRevenueColumn': u'221449', u'ExclusionAmount': u'221449'} | NaN | NaN | NaN |
16 | {u'TotalRevenueColumn': u'23265', u'ExclusionAmount': u'23265'} | NaN | NaN | NaN |
130 | {u'TotalRevenueColumn': u'1322', u'RelatedOrExemptFunctionIncome': u'1322', u'UnrelatedBusinessRevenue': u'0', u'ExclusionAmount': u'0'} | NaN | NaN | NaN |
434 | {u'TotalRevenueColumnAmt': u'9429', u'ExclusionAmt': u'9429'} | {u'TotalRevenueColumnAmt': u'9429', u'ExclusionAmt': u'9429'} | 2985 | 28106 |
374 | {u'TotalRevenueColumn': u'2923', u'ExclusionAmount': u'2923'} | NaN | NaN | NaN |
118 | {u'TotalRevenueColumn': u'47210', u'ExclusionAmount': u'47210'} | NaN | NaN | NaN |
124 | {u'TotalRevenueColumn': u'99073', u'RelatedOrExemptFunctionIncome': u'99073'} | NaN | NaN | NaN |
counter = 1
df['fundraising'] = np.nan
for index, row in df.iterrows():
#print row['NetIncomeFromFundraisingEvents']
#print type(ast.literal_eval(str(row['NetIncomeFromFundraisingEvents'])))
if pd.notnull(row['NetIncomeFromFundraisingEvents']):
try:
fr_revs = ast.literal_eval(str(row['NetIncomeFromFundraisingEvents']))
except:
df.ix[index, 'fundraising'] = np.nan
if 'TotalRevenueColumn' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumn']
df.ix[index, 'fundraising'] = fr_revs['TotalRevenueColumn']
elif 'TotalRevenueColumnAmt' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumnAmt']
df.ix[index, 'fundraising'] = fr_revs['TotalRevenueColumnAmt']
else:
df.ix[index, 'fundraising'] = np.nan
#counter +=1
#print counter
'''
counter = 1
df['fundraising'] = np.nan
for index, row in df.iterrows():
#print row['NetIncomeFromFundraisingEvents']
#print type(ast.literal_eval(str(row['NetIncomeFromFundraisingEvents'])))
if pd.notnull(row['NetIncomeFromFundraisingEvents']):
try:
fr_revs = ast.literal_eval(str(row['NetIncomeFromFundraisingEvents']))
except:
df[index, 'fundraising'] = np.nan
if 'TotalRevenueColumn' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumn']
df.ix[index, 'fundraising'] = fr_revs['TotalRevenueColumn']
elif 'TotalRevenueColumnAmt' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumnAmt']
df.ix[index, 'fundraising'] = fr_revs['TotalRevenueColumnAmt']
else:
df[index, 'fundraising'] = np.nan
#counter +=1
#print counter
'''
df[['fundraising', 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt',
'FundraisingGrossIncomeAmt']][4:8]
fundraising | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|---|---|
12 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
481 | 221449 | {u'TotalRevenueColumn': u'221449', u'ExclusionAmount': u'221449'} | NaN | NaN | NaN |
16 | 23265 | {u'TotalRevenueColumn': u'23265', u'ExclusionAmount': u'23265'} | NaN | NaN | NaN |
130 | 1322 | {u'TotalRevenueColumn': u'1322', u'RelatedOrExemptFunctionIncome': u'1322', u'UnrelatedBusinessRevenue': u'0', u'ExclusionAmount': u'0'} | NaN | NaN | NaN |
df[['fundraising', 'FundraisingAmt', 'FundraisingGrossIncomeAmt']][:12]
fundraising | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|
252 | 0 | NaN | NaN |
433 | 0 | NaN | NaN |
78 | 0 | NaN | NaN |
333 | 0 | NaN | NaN |
12 | 0 | NaN | NaN |
481 | 221449 | NaN | NaN |
16 | 23265 | NaN | NaN |
130 | 1322 | NaN | NaN |
434 | 9429 | 2985 | 28106 |
374 | 2923 | NaN | NaN |
118 | 47210 | NaN | NaN |
124 | 99073 | NaN | NaN |
print len(df[df['fundraising'].notnull()])
print len(df[df['FundraisingAmt'].notnull()])
print len(df[df['FundraisingGrossIncomeAmt'].notnull()])
df['fundraising'] = np.where( ((df['fundraising']=='0') & (df['FundraisingAmt'].notnull())),
df['FundraisingAmt'], df['fundraising'] )
df['fundraising'] = np.where( ((df['fundraising']=='0') & (df['FundraisingGrossIncomeAmt'].notnull())),
df['FundraisingGrossIncomeAmt'], df['fundraising'] )
print len(df[df['fundraising'].notnull()])
df[['fundraising', 'FundraisingAmt', 'FundraisingGrossIncomeAmt']][:12]
228 32 53 228
fundraising | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|
252 | 0 | NaN | NaN |
433 | 0 | NaN | NaN |
78 | 0 | NaN | NaN |
333 | 0 | NaN | NaN |
12 | 0 | NaN | NaN |
481 | 221449 | NaN | NaN |
16 | 23265 | NaN | NaN |
130 | 1322 | NaN | NaN |
434 | 9429 | 2985 | 28106 |
374 | 2923 | NaN | NaN |
118 | 47210 | NaN | NaN |
124 | 99073 | NaN | NaN |
df['fundraising'].describe().T
count 228 unique 132 top 0 freq 96 Name: fundraising, dtype: object
#df[df['fundraising']=='-38198'][['EIN', 'FYE', 'fundraising', 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
# 'FundraisingAmt', 'FundraisingGrossIncomeAmt']]
df[['EIN', 'FYE', 'fundraising', 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
'FundraisingAmt', 'FundraisingGrossIncomeAmt']][50:74]
EIN | FYE | fundraising | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|---|---|---|---|
106 | 132644507 | FY2011 | NaN | NaN | NaN | NaN | NaN |
72 | 132644507 | FY2010 | NaN | NaN | NaN | NaN | NaN |
336 | 132690403 | FY2013 | NaN | NaN | NaN | NaN | NaN |
430 | 132690403 | FY2012 | NaN | NaN | NaN | NaN | NaN |
473 | 132690403 | FY2011 | NaN | NaN | NaN | NaN | NaN |
33 | 132690403 | FY2010 | NaN | NaN | NaN | NaN | NaN |
531 | 132738818 | FY2014 | -18373 | {u'TotalRevenueColumnAmt': u'-18373', u'ExclusionAmt': u'-18373'} | {u'TotalRevenueColumnAmt': u'-18373', u'ExclusionAmt': u'-18373'} | 217434 | 32760 |
162 | 132738818 | FY2013 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
22 | 132738818 | FY2012 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
82 | 132738818 | FY2011 | NaN | NaN | NaN | NaN | NaN |
140 | 132738818 | FY2010 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
442 | 133119118 | FY2012 | 21748 | {u'TotalRevenueColumn': u'21748', u'ExclusionAmount': u'21748'} | NaN | NaN | NaN |
491 | 133119118 | FY2011 | 176541 | {u'TotalRevenueColumn': u'176541', u'ExclusionAmount': u'176541'} | NaN | NaN | NaN |
499 | 133119118 | FY2010 | 236763 | {u'TotalRevenueColumn': u'236763', u'ExclusionAmount': u'236763'} | NaN | NaN | NaN |
70 | 133552154 | FY2011 | 853901 | {u'TotalRevenueColumn': u'853901', u'ExclusionAmount': u'853901'} | NaN | NaN | NaN |
123 | 133552154 | FY2010 | 418298 | {u'TotalRevenueColumn': u'418298', u'ExclusionAmount': u'418298'} | NaN | NaN | NaN |
296 | 133600807 | FY2014 | NaN | NaN | NaN | NaN | NaN |
407 | 133600807 | FY2013 | NaN | NaN | NaN | NaN | NaN |
428 | 133600807 | FY2012 | NaN | NaN | NaN | NaN | NaN |
180 | 133600807 | FY2010 | NaN | NaN | NaN | NaN | NaN |
323 | 134147912 | FY2014 | NaN | NaN | NaN | NaN | NaN |
384 | 134147912 | FY2013 | NaN | NaN | NaN | NaN | NaN |
21 | 134147912 | FY2012 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
246 | 134147912 | FY2011 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
df['fundraising'] = df['fundraising'].astype('float')
print df['fundraising'].dtype, '\n'
print df['fundraising'].describe()
float64 count 2.280000e+02 mean 8.138974e+04 std 5.064406e+05 min -2.839000e+05 25% NaN 50% NaN 75% NaN max 7.283318e+06 Name: fundraising, dtype: float64
print len(df[df['RelatedOrganizations'].notnull()])
print len(df[df['RelatedOrganizationsAmt'].notnull()])
df['RelatedOrganizations'] = np.where( ((df['RelatedOrganizations'].isnull())
& (df['RelatedOrganizationsAmt'].notnull())),
df['RelatedOrganizationsAmt'], df['RelatedOrganizations'] )
print len(df[df['RelatedOrganizations'].notnull()])
df['RelatedOrganizations'].describe().T
33 19 52
count 52 unique 46 top 0 freq 7 Name: RelatedOrganizations, dtype: object
print len(df[df['GovernmentGrants'].notnull()])
print len(df[df['GovernmentGrantsAmt'].notnull()])
df['GovernmentGrants'] = np.where( ((df['GovernmentGrants'].isnull()) & (df['GovernmentGrantsAmt'].notnull())),
df['GovernmentGrantsAmt'], df['GovernmentGrants'] )
print len(df[df['GovernmentGrants'].notnull()])
df['GovernmentGrants'].describe().T
213 70 213
count 213 unique 207 top 548987 freq 3 Name: GovernmentGrants, dtype: object
print len(df[df['PYProgramServiceRevenueAmt'].notnull()])
print len(df[df['CYProgramServiceRevenueAmt'].notnull()])
print len(df[df['ProgramServiceRevenueCY'].notnull()])
print len(df[df['TotalProgramServiceRevenue'].notnull()])
print len(df[df['TotalProgramServiceRevenueAmt'].notnull()]), '\n'
df['program_revenue'] = np.nan
print len(df[df['program_revenue'].notnull()])
df['program_revenue'] = np.where( df['PYProgramServiceRevenueAmt'].notnull(),
df['PYProgramServiceRevenueAmt'], df['program_revenue'] )
print len(df[df['program_revenue'].notnull()])
df['program_revenue'] = np.where( ((df['program_revenue'].isnull()) & (df['CYProgramServiceRevenueAmt'].notnull())),
df['CYProgramServiceRevenueAmt'], df['program_revenue'] )
print len(df[df['program_revenue'].notnull()])
df['program_revenue'] = np.where( ((df['program_revenue'].isnull()) & (df['ProgramServiceRevenueCY'].notnull())),
df['ProgramServiceRevenueCY'], df['program_revenue'] )
print len(df[df['program_revenue'].notnull()])
df['program_revenue'] = np.where( ((df['program_revenue'].isnull()) & (df['TotalProgramServiceRevenue'].notnull())),
df['TotalProgramServiceRevenue'], df['program_revenue'] )
print len(df[df['program_revenue'].notnull()])
df['program_revenue'] = np.where( ((df['program_revenue'].isnull())
& (df['TotalProgramServiceRevenueAmt'].notnull())),
df['TotalProgramServiceRevenueAmt'], df['program_revenue'] )
print len(df[df['program_revenue'].notnull()])
df['program_revenue'].describe().T
151 194 344 233 130 0 151 194 538 538 538
count 538 unique 259 top 0 freq 217 Name: program_revenue, dtype: object
df[['program_revenue', 'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt']][:20]
program_revenue | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | |
---|---|---|---|---|---|---|---|---|
252 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
433 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
78 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
333 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | 0 |
12 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
481 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
16 | 0 | NaN | NaN | NaN | 0 | NaN | NaN | NaN |
130 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
434 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN |
374 | 0 | NaN | NaN | NaN | 0 | NaN | NaN | NaN |
118 | 0 | NaN | NaN | NaN | 0 | NaN | NaN | NaN |
124 | 0 | NaN | NaN | NaN | 0 | NaN | NaN | NaN |
263 | 76449 | 76449 | 0 | NaN | NaN | NaN | NaN | NaN |
196 | 76449 | NaN | NaN | {u'TotalRevenueColumn': u'76449', u'RelatedOrExemptFunctionIncome': u'76449', u'Description': u'RENTAL INCOME'} | 76449 | NaN | 76449 | NaN |
256 | 205125 | NaN | NaN | {u'TotalRevenueColumn': u'205125', u'RelatedOrExemptFunctionIncome': u'205125', u'Description': u'RENTAL INCOME'} | 205125 | NaN | 205125 | NaN |
233 | 256109 | NaN | NaN | [{u'TotalRevenueColumn': u'174799', u'RelatedOrExemptFunctionIncome': u'174799', u'Description': u'RENTAL INCOME'}, {u'TotalRevenueColumn': u'81310', u'RelatedOrExemptFunctionIncome': u'81310', u'Description': u'THRIFT STORE REVENUE'}] | 256109 | NaN | 256109 | NaN |
129 | 107536 | NaN | NaN | NaN | 107536 | NaN | 107536 | NaN |
319 | 224006 | 224006 | 249554 | NaN | NaN | {u'TotalRevenueColumnAmt': u'249554', u'BusinessCd': u'611600', u'RelatedOrExemptFuncIncomeAmt': u'249554', u'Desc': u'TUITION'} | NaN | 249554 |
394 | 206873 | 206873 | 224006 | NaN | NaN | {u'TotalRevenueColumnAmt': u'224006', u'BusinessCd': u'611600', u'RelatedOrExemptFuncIncomeAmt': u'224006', u'Desc': u'TUITION'} | NaN | 224006 |
462 | 206873 | NaN | NaN | {u'BusinessCode': u'611600', u'TotalRevenueColumn': u'206873', u'RelatedOrExemptFunctionIncome': u'206873', u'Description': u'TUITION'} | 206873 | NaN | 206873 | NaN |
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', #'FederatedCampaignsAmt',
'MembershipDues', #'MembershipDuesAmt',
#'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
#'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
#'FundraisingGrossIncomeAmt',
'fundraising',
'RelatedOrganizations', #'RelatedOrganizationsAmt',
'GovernmentGrants', #'GovernmentGrantsAmt',
#'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
#'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
#'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'program_revenue',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
df[complexity_columns][:10]
CYContributionsGrantsAmt | FederatedCampaigns | MembershipDues | fundraising | RelatedOrganizations | GovernmentGrants | program_revenue | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | NaN | NaN | NaN | 0.0 | NaN | 185408 | 0 | NaN | {u'TotalRevenueColumn': u'63', u'ExclusionAmount': u'63'} | 63 | NaN | NaN | NaN | 0 | NaN |
433 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
78 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
333 | 20437991 | NaN | NaN | 0.0 | 2053228 | 16124016 | 0 | 1539746 | NaN | NaN | {u'TotalRevenueColumnAmt': u'362651', u'ExclusionAmt': u'362651'} | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | NaN | NaN | NaN | 0.0 | 2466016 | 16660756 | 0 | NaN | {u'TotalRevenueColumn': u'398381', u'ExclusionAmount': u'398381'} | 1195606 | NaN | NaN | NaN | 451157 | NaN |
481 | NaN | NaN | NaN | 221449.0 | 954610 | 21380666 | 0 | NaN | {u'TotalRevenueColumn': u'494644', u'ExclusionAmount': u'494644'} | 1249229 | NaN | NaN | NaN | 458249 | NaN |
16 | NaN | NaN | NaN | 23265.0 | NaN | 19362978 | 0 | NaN | {u'TotalRevenueColumn': u'646166', u'ExclusionAmount': u'646166'} | 703373 | NaN | NaN | NaN | 58199 | NaN |
130 | NaN | 0 | 0 | 1322.0 | 0 | 20439097 | 0 | NaN | {u'TotalRevenueColumn': u'761762', u'RelatedOrExemptFunctionIncome': u'0', u'UnrelatedBusinessRevenue': u'0', u'ExclusionAmount': u'761762'} | 1195892 | NaN | NaN | NaN | 91122 | NaN |
434 | 556113 | NaN | NaN | 9429.0 | NaN | NaN | 0 | 132915 | NaN | NaN | {u'TotalRevenueColumnAmt': u'132915', u'ExclusionAmt': u'132915'} | 15142 | {u'TotalRevenueColumnAmt': u'15142', u'BusinessCd': u'900099', u'ExclusionAmt': u'15142', u'Desc': u'OTHER INCOME'} | NaN | 75409 |
374 | NaN | NaN | NaN | 2923.0 | NaN | NaN | 0 | NaN | {u'TotalRevenueColumn': u'141660', u'ExclusionAmount': u'141660'} | 141660 | NaN | NaN | NaN | 14151 | NaN |
df[['CYInvestmentIncomeAmt', 'InvestmentIncomeCurrentYear']][10:16]
CYInvestmentIncomeAmt | InvestmentIncomeCurrentYear | |
---|---|---|
118 | NaN | 182952 |
124 | NaN | 92923 |
263 | -141807 | NaN |
196 | NaN | 0 |
256 | NaN | 0 |
233 | NaN | -52721 |
print len(df[df['CYInvestmentIncomeAmt'].notnull()])
print len(df[df['InvestmentIncomeCurrentYear'].notnull()])
df['CYInvestmentIncomeAmt'] = np.where( ((df['CYInvestmentIncomeAmt'].isnull())
& (df['InvestmentIncomeCurrentYear'].notnull())),
df['InvestmentIncomeCurrentYear'], df['CYInvestmentIncomeAmt'] )
print len(df[df['CYInvestmentIncomeAmt'].notnull()])
df['CYInvestmentIncomeAmt'].describe().T
194 344 538
count 538 unique 397 top 0 freq 123 Name: CYInvestmentIncomeAmt, dtype: object
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', #'FederatedCampaignsAmt',
'MembershipDues', #'MembershipDuesAmt',
#'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
#'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
#'FundraisingGrossIncomeAmt',
'fundraising',
'RelatedOrganizations', #'RelatedOrganizationsAmt',
'GovernmentGrants', #'GovernmentGrantsAmt',
#'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
#'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
#'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'program_revenue',
'CYInvestmentIncomeAmt',
#'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
df[complexity_columns][:10]
CYContributionsGrantsAmt | FederatedCampaigns | MembershipDues | fundraising | RelatedOrganizations | GovernmentGrants | program_revenue | CYInvestmentIncomeAmt | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | NaN | NaN | NaN | 0.0 | NaN | 185408 | 0 | 63 | NaN | NaN | 0 | NaN |
433 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | 0 | NaN | NaN | 0 | NaN |
78 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | 0 | NaN | NaN | 0 | NaN |
333 | 20437991 | NaN | NaN | 0.0 | 2053228 | 16124016 | 0 | 1539746 | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | NaN | NaN | NaN | 0.0 | 2466016 | 16660756 | 0 | 1195606 | NaN | NaN | 451157 | NaN |
481 | NaN | NaN | NaN | 221449.0 | 954610 | 21380666 | 0 | 1249229 | NaN | NaN | 458249 | NaN |
16 | NaN | NaN | NaN | 23265.0 | NaN | 19362978 | 0 | 703373 | NaN | NaN | 58199 | NaN |
130 | NaN | 0 | 0 | 1322.0 | 0 | 20439097 | 0 | 1195892 | NaN | NaN | 91122 | NaN |
434 | 556113 | NaN | NaN | 9429.0 | NaN | NaN | 0 | 132915 | 15142 | {u'TotalRevenueColumnAmt': u'15142', u'BusinessCd': u'900099', u'ExclusionAmt': u'15142', u'Desc': u'OTHER INCOME'} | NaN | 75409 |
374 | NaN | NaN | NaN | 2923.0 | NaN | NaN | 0 | 141660 | NaN | NaN | 14151 | NaN |
df[['OtherRevenueTotalAmt', 'OtherRevenueMiscGrp','TotalOtherRevenue', 'CYOtherRevenueAmt']][:10]
OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|
252 | NaN | NaN | 0 | NaN |
433 | NaN | NaN | 0 | NaN |
78 | NaN | NaN | 0 | NaN |
333 | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | NaN | NaN | 451157 | NaN |
481 | NaN | NaN | 458249 | NaN |
16 | NaN | NaN | 58199 | NaN |
130 | NaN | NaN | 91122 | NaN |
434 | 15142 | {u'TotalRevenueColumnAmt': u'15142', u'BusinessCd': u'900099', u'ExclusionAmt': u'15142', u'Desc': u'OTHER INCOME'} | NaN | 75409 |
374 | NaN | NaN | 14151 | NaN |
counter = 1
df['OtherRevMisc'] = np.nan
for index, row in df.iterrows():
#print row['NetIncomeFromFundraisingEvents']
#print type(ast.literal_eval(str(row['NetIncomeFromFundraisingEvents'])))
try:
fr_revs = str(row['OtherRevenueMiscGrp'])
if fr_revs.startswith('['):
#print 'yes'
fr_revs = fr_revs.replace('[', '').replace(']', '')
fr_revs = ast.literal_eval(fr_revs)
fr_revs = fr_revs[0]
else:
fr_revs = ast.literal_eval(fr_revs)
print fr_revs, '\n'
if 'TotalRevenueColumn' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumn']
df.ix[index, 'OtherRevMisc'] = fr_revs['TotalRevenueColumn']
elif 'TotalRevenueColumnAmt' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumnAmt']
df.ix[index, 'OtherRevMisc'] = fr_revs['TotalRevenueColumnAmt']
except:
df.ix[index, 'OtherRevMisc'] = np.nan
else:
df.ix[index, 'OtherRevMisc'] = np.nan
#counter +=1
#print counter
{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'} {u'TotalRevenueColumnAmt': u'15142', u'BusinessCd': u'900099', u'ExclusionAmt': u'15142', u'Desc': u'OTHER INCOME'} {u'TotalRevenueColumnAmt': u'155000', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'155000', u'Desc': u'REFUND OF PRIOR-YEAR GRANT'} {u'TotalRevenueColumnAmt': u'4740', u'BusinessCd': u'624100', u'RelatedOrExemptFuncIncomeAmt': u'4740', u'Desc': u'CONFERENCE REGISTRATION'} {u'TotalRevenueColumnAmt': u'2027768', u'BusinessCd': u'900099', u'ExclusionAmt': u'2027768', u'Desc': u'INSURANCE PROCEEDS'} {u'TotalRevenueColumnAmt': u'25330', u'BusinessCd': u'900099', u'ExclusionAmt': u'25330', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'2581', u'BusinessCd': u'900099', u'ExclusionAmt': u'2581', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'1480447', u'RelatedOrExemptFuncIncomeAmt': u'1480447'} {u'TotalRevenueColumnAmt': u'3837835', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'3837835', u'Desc': u'RESTITUTION'} {u'TotalRevenueColumnAmt': u'458878', u'BusinessCd': u'900099', u'ExclusionAmt': u'458878', u'Desc': u'MEALS REVENUE'} {u'TotalRevenueColumnAmt': u'850981', u'BusinessCd': u'900099', u'ExclusionAmt': u'850981', u'Desc': u'REINSURANCE RECOVERY'} {u'TotalRevenueColumnAmt': u'600', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'600', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'11140', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'11140', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'457869', u'RelatedOrExemptFuncIncomeAmt': u'457869', u'Desc': u'PERPETUAL CARE ENDOWMENTS'} {u'TotalRevenueColumnAmt': u'1000000', u'RelatedOrExemptFuncIncomeAmt': u'1000000', u'Desc': u'LEGAL RECOVERY SETTLEMENT'} {u'TotalRevenueColumnAmt': u'7316917', u'BusinessCd': u'561000', u'RelatedOrExemptFuncIncomeAmt': u'7316917', u'Desc': u'AGENCY FEES'} {u'TotalRevenueColumnAmt': u'6937816', u'BusinessCd': u'561000', u'RelatedOrExemptFuncIncomeAmt': u'6937816', u'Desc': u'AGENCY FEES'} {u'TotalRevenueColumnAmt': u'74698', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'74698', u'Desc': u'OTHER INCOME'} {u'TotalRevenueColumnAmt': u'-2193', u'BusinessCd': u'900099', u'ExclusionAmt': u'-2193', u'Desc': u'MISCELLANEOUS INCOME'} {u'TotalRevenueColumnAmt': u'6750977', u'BusinessCd': u'900099', u'ExclusionAmt': u'6750977', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'749391', u'BusinessCd': u'900099', u'ExclusionAmt': u'749391', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'517', u'BusinessCd': u'713990', u'RelatedOrExemptFuncIncomeAmt': u'517', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'121400', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'121400', u'Desc': u'BAD DEBT RECOVERY'} {u'TotalRevenueColumnAmt': u'292707', u'BusinessCd': u'900099', u'ExclusionAmt': u'292707', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'2653040', u'BusinessCd': u'900099', u'ExclusionAmt': u'2653040', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'5297259', u'BusinessCd': u'900099', u'ExclusionAmt': u'5297259', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'9096', u'BusinessCd': u'624200', u'ExclusionAmt': u'9096', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'1143351', u'ExclusionAmt': u'1143351', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'667098', u'ExclusionAmt': u'667098', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'147143', u'BusinessCd': u'900099', u'ExclusionAmt': u'147143', u'Desc': u'REFUNDED GRANTS'} {u'TotalRevenueColumnAmt': u'81840', u'BusinessCd': u'900099', u'ExclusionAmt': u'81840', u'Desc': u'REFUNDED GRANTS'} {u'TotalRevenueColumnAmt': u'15652', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'15652', u'Desc': u'OTHER REVENUE'} {u'TotalRevenueColumnAmt': u'5100', u'BusinessCd': u'900099', u'ExclusionAmt': u'5100', u'Desc': u'MISCELLANEOUS INCOME'} {u'TotalRevenueColumnAmt': u'2993', u'RelatedOrExemptFuncIncomeAmt': u'2993', u'Desc': u'OTHER'} {u'TotalRevenueColumnAmt': u'2407', u'RelatedOrExemptFuncIncomeAmt': u'2407', u'Desc': u'OTHER'} {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} {u'TotalRevenueColumnAmt': u'1050', u'BusinessCd': u'900099', u'ExclusionAmt': u'1050', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'100', u'BusinessCd': u'900099', u'ExclusionAmt': u'100', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'561', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'561', u'Desc': u'OTHER INCOME'} {u'TotalRevenueColumnAmt': u'629', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'629', u'Desc': u'OTHER INCOME'} {u'TotalRevenueColumnAmt': u'2111004', u'BusinessCd': u'900099', u'ExclusionAmt': u'2111004', u'Desc': u'DEBT FORGIVENESS'} {u'UnrelatedBusinessRevenueAmt': u'336417', u'TotalRevenueColumnAmt': u'336417', u'BusinessCd': u'900004', u'Desc': u'JOB BOARD'} {u'UnrelatedBusinessRevenueAmt': u'247673', u'TotalRevenueColumnAmt': u'247673', u'BusinessCd': u'900004', u'Desc': u'JOB PLACEMENT'} {u'TotalRevenueColumnAmt': u'3990', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'3990', u'Desc': u'CONVENTIONS'} {u'TotalRevenueColumnAmt': u'1057365', u'BusinessCd': u'900099', u'ExclusionAmt': u'1057365', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'27153', u'BusinessCd': u'900099', u'ExclusionAmt': u'27153', u'Desc': u'MISCELLANEOUS REVENUE'} {u'TotalRevenueColumnAmt': u'13039', u'ExclusionAmt': u'13039', u'Desc': u'MISC. REVENUE'} {u'TotalRevenueColumnAmt': u'13039', u'BusinessCd': u'900099', u'ExclusionAmt': u'13039', u'Desc': u'Miscellaneous'} {u'TotalRevenueColumnAmt': u'1262179', u'BusinessCd': u'900099', u'ExclusionAmt': u'1262179', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'4985', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'4985', u'Desc': u'MISCELLANEOUS INCOME'} {u'UnrelatedBusinessRevenueAmt': u'165000', u'TotalRevenueColumnAmt': u'165000', u'BusinessCd': u'541800', u'Desc': u'ADVERTISING'} {u'UnrelatedBusinessRevenueAmt': u'130000', u'TotalRevenueColumnAmt': u'130000', u'BusinessCd': u'541800', u'Desc': u'ADVERTISING'} {u'TotalRevenueColumnAmt': u'165533', u'BusinessCd': u'900099', u'ExclusionAmt': u'165533', u'Desc': u'RECOVERY ON THEFT'} {u'TotalRevenueColumnAmt': u'45246', u'BusinessCd': u'999999', u'RelatedOrExemptFuncIncomeAmt': u'45246', u'Desc': u'MANAGEMENT SERVICE'} {u'TotalRevenueColumnAmt': u'1235', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'1235', u'Desc': u'MISCELLANEOUS INCOME'} {u'TotalRevenueColumnAmt': u'205555', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'205555', u'Desc': u'INCOME FROM FIRE INSURANCE'} {u'TotalRevenueColumnAmt': u'183015', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'183015', u'Desc': u'DEBT WRITE-OFF'} {u'TotalRevenueColumnAmt': u'13241', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'13241', u'Desc': u'RETURN ON FCC INVESTME'} {u'TotalRevenueColumnAmt': u'13481', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'13481', u'Desc': u'MISCELLANEOUS INCOME'} {u'TotalRevenueColumnAmt': u'55430', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'55430', u'Desc': u'MISCELLANEOUS INCOME'} {u'TotalRevenueColumnAmt': u'6436', u'ExclusionAmt': u'6436', u'Desc': u'MISCELLANEOUS'} {u'BusinessCd': u'531190', u'Desc': u'LOSS FROM PARTNERSHIP'} {u'TotalRevenueColumnAmt': u'25911', u'ExclusionAmt': u'25911', u'Desc': u'GAIN ON CSV OF LIFE INSURANCE'} {u'TotalRevenueColumnAmt': u'2737', u'ExclusionAmt': u'2737', u'Desc': u'Other revenue'} {u'TotalRevenueColumnAmt': u'6480', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'6480', u'Desc': u'MISC REFUNDS'} {u'TotalRevenueColumnAmt': u'33', u'ExclusionAmt': u'33', u'Desc': u'Other'} {u'TotalRevenueColumnAmt': u'5171923', u'BusinessCd': u'900099', u'ExclusionAmt': u'5171923', u'Desc': u'MULTIMEDIA CONTRACTUAL INCOME'} {u'UnrelatedBusinessRevenueAmt': u'16379', u'TotalRevenueColumnAmt': u'25900', u'BusinessCd': u'900099', u'ExclusionAmt': u'9521', u'Desc': u'FDOT ENCROACHMENT REMOVAL'} {u'TotalRevenueColumnAmt': u'16893', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'16893', u'Desc': u'SMALL BUSINESS HEALTH'} {u'TotalRevenueColumnAmt': u'11214', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'11214', u'Desc': u'FEDERAL TAX REFUND'} {u'TotalRevenueColumnAmt': u'427458', u'RelatedOrExemptFuncIncomeAmt': u'159837', u'BusinessCd': u'900199', u'ExclusionAmt': u'267621', u'Desc': u'OTHER REVENUE'} {u'TotalRevenueColumnAmt': u'301470', u'BusinessCd': u'621110', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'738003', u'BusinessCd': u'621112', u'RelatedOrExemptFuncIncomeAmt': u'738003', u'Desc': u'Miscellaneous Revenue'} {u'TotalRevenueColumnAmt': u'130433', u'BusinessCd': u'900099', u'ExclusionAmt': u'130433', u'Desc': u'FLOOD INSURANCE PAYOUT'} {u'TotalRevenueColumnAmt': u'772420', u'BusinessCd': u'900099', u'ExclusionAmt': u'772420', u'Desc': u'BABC REGISTRATION FEES'} {u'TotalRevenueColumnAmt': u'1284814', u'BusinessCd': u'900099', u'ExclusionAmt': u'1284814', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'1271392', u'BusinessCd': u'900099', u'ExclusionAmt': u'1271392', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'87550', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'87550', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'93778', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'93778', u'Desc': u'INSURANCE PROCEEDS'} {u'TotalRevenueColumnAmt': u'378405', u'BusinessCd': u'541800', u'RelatedOrExemptFuncIncomeAmt': u'378405', u'Desc': u'ADMINISTRATIVE INCOME'} {u'TotalRevenueColumnAmt': u'134786', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'134786', u'Desc': u'INSURANCE PROCEEDS FRO'} {u'TotalRevenueColumnAmt': u'105205', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'105205', u'Desc': u'MISCELLANEOUS'} {u'UnrelatedBusinessRevenueAmt': u'7929062', u'TotalRevenueColumnAmt': u'7929062', u'BusinessCd': u'812300', u'Desc': u'LAUNDRY SERVICES'} {u'TotalRevenueColumnAmt': u'5706', u'RelatedOrExemptFuncIncomeAmt': u'5706', u'Desc': u'OTHER RECEIPTS'} {u'TotalRevenueColumnAmt': u'2380', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'2380', u'Desc': u'GAIN/LOSS ON CURRENCY'} {u'TotalRevenueColumnAmt': u'-1340', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'-1340', u'Desc': u'GAIN/LOSS ON CURRENCY'} {u'TotalRevenueColumnAmt': u'1613', u'RelatedOrExemptFuncIncomeAmt': u'1613', u'Desc': u'DAF Mgmt Fees'} {u'TotalRevenueColumnAmt': u'11566', u'RelatedOrExemptFuncIncomeAmt': u'11566', u'Desc': u'OTHER REVENUE'} {u'TotalRevenueColumnAmt': u'37665', u'RelatedOrExemptFuncIncomeAmt': u'37665', u'Desc': u'OTHER REVENUE'} {u'TotalRevenueColumnAmt': u'5005', u'ExclusionAmt': u'5005', u'Desc': u'MISC. REFUNDS'} {u'TotalRevenueColumnAmt': u'66924', u'BusinessCd': u'515100', u'ExclusionAmt': u'66924', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'1548749', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'1548749', u'Desc': u'VALUE ADDED TAX REBATE'} {u'TotalRevenueColumnAmt': u'137059', u'BusinessCd': u'900099', u'ExclusionAmt': u'137059', u'Desc': u'OTHER'} {u'TotalRevenueColumnAmt': u'7299', u'ExclusionAmt': u'7299', u'Desc': u'Gain on A/C ins. claim'} {u'TotalRevenueColumnAmt': u'50340', u'BusinessCd': u'900099', u'ExclusionAmt': u'50340', u'Desc': u'List Rental'} {u'TotalRevenueColumnAmt': u'19820777', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'19820777', u'Desc': u'OTHER INCOME- FROM AFFILIATES'} {u'TotalRevenueColumnAmt': u'15369743', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'15369743', u'Desc': u'OTHER INCOME- FROM AFFILIATES'}
df[['OtherRevenueTotalAmt', 'OtherRevMisc', 'OtherRevenueMiscGrp','TotalOtherRevenue', 'CYOtherRevenueAmt']][:10]
OtherRevenueTotalAmt | OtherRevMisc | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|
252 | NaN | NaN | NaN | 0 | NaN |
433 | NaN | NaN | NaN | 0 | NaN |
78 | NaN | NaN | NaN | 0 | NaN |
333 | 428192 | NaN | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | NaN | NaN | NaN | 451157 | NaN |
481 | NaN | NaN | NaN | 458249 | NaN |
16 | NaN | NaN | NaN | 58199 | NaN |
130 | NaN | NaN | NaN | 91122 | NaN |
434 | 15142 | NaN | {u'TotalRevenueColumnAmt': u'15142', u'BusinessCd': u'900099', u'ExclusionAmt': u'15142', u'Desc': u'OTHER INCOME'} | NaN | 75409 |
374 | NaN | NaN | NaN | 14151 | NaN |
print len(df[df['OtherRevenueTotalAmt'].notnull()])
print len(df[df['InvestmentIncomeCurrentYear'].notnull()])
df['OtherRevenueTotalAmt'] = np.where( ((df['OtherRevenueTotalAmt'].isnull()) & (df['CYOtherRevenueAmt'].notnull())),
df['CYOtherRevenueAmt'], df['OtherRevenueTotalAmt'] )
print len(df[df['OtherRevenueTotalAmt'].notnull()])
df['OtherRevenueTotalAmt'] = np.where( ((df['OtherRevenueTotalAmt'].isnull()) & (df['TotalOtherRevenue'].notnull())),
df['TotalOtherRevenue'], df['OtherRevenueTotalAmt'] )
print len(df[df['OtherRevenueTotalAmt'].notnull()])
df['OtherRevenueTotalAmt'] = np.where( ((df['OtherRevenueTotalAmt'].isnull()) & (df['OtherRevMisc'].notnull())),
df['OtherRevMisc'], df['OtherRevenueTotalAmt'] )
print len(df[df['OtherRevenueTotalAmt'].notnull()])
df['OtherRevenueTotalAmt'].describe().T
120 344 194 414 414
count 414 unique 294 top 0 freq 119 Name: OtherRevenueTotalAmt, dtype: object
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', #'FederatedCampaignsAmt',
'MembershipDues', #'MembershipDuesAmt',
#'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
#'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
#'FundraisingGrossIncomeAmt',
'fundraising',
'RelatedOrganizations', #'RelatedOrganizationsAmt',
'GovernmentGrants', #'GovernmentGrantsAmt',
#'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
#'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
#'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'program_revenue',
'CYInvestmentIncomeAmt',
#'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', #'OtherRevenueMiscGrp',
#'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
df[complexity_columns][:10]
CYContributionsGrantsAmt | FederatedCampaigns | MembershipDues | fundraising | RelatedOrganizations | GovernmentGrants | program_revenue | CYInvestmentIncomeAmt | OtherRevenueTotalAmt | |
---|---|---|---|---|---|---|---|---|---|
252 | NaN | NaN | NaN | 0.0 | NaN | 185408 | 0 | 63 | 0 |
433 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | 0 | 0 |
78 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | 0 | 0 |
333 | 20437991 | NaN | NaN | 0.0 | 2053228 | 16124016 | 0 | 1539746 | 428192 |
12 | NaN | NaN | NaN | 0.0 | 2466016 | 16660756 | 0 | 1195606 | 451157 |
481 | NaN | NaN | NaN | 221449.0 | 954610 | 21380666 | 0 | 1249229 | 458249 |
16 | NaN | NaN | NaN | 23265.0 | NaN | 19362978 | 0 | 703373 | 58199 |
130 | NaN | 0 | 0 | 1322.0 | 0 | 20439097 | 0 | 1195892 | 91122 |
434 | 556113 | NaN | NaN | 9429.0 | NaN | NaN | 0 | 132915 | 15142 |
374 | NaN | NaN | NaN | 2923.0 | NaN | NaN | 0 | 141660 | 14151 |
print complexity_columns
['CYContributionsGrantsAmt', 'FederatedCampaigns', 'MembershipDues', 'fundraising', 'RelatedOrganizations', 'GovernmentGrants', 'program_revenue', 'CYInvestmentIncomeAmt', 'OtherRevenueTotalAmt']
df['CYContributionsGrantsAmt'] = df['CYContributionsGrantsAmt'].astype('float')
df['FederatedCampaigns'] = df['FederatedCampaigns'].astype('float')
df['MembershipDues'] = df['MembershipDues'].astype('float')
df['fundraising'] = df['fundraising'].astype('float')
df['RelatedOrganizations'] = df['RelatedOrganizations'].astype('float')
df['GovernmentGrants'] = df['GovernmentGrants'].astype('float')
df['program_revenue'] = df['program_revenue'].astype('float')
df['CYInvestmentIncomeAmt'] = df['CYInvestmentIncomeAmt'].astype('float')
df['OtherRevenueTotalAmt'] = df['OtherRevenueTotalAmt'].astype('float')
df[complexity_columns].dtypes
CYContributionsGrantsAmt float64 FederatedCampaigns float64 MembershipDues float64 fundraising float64 RelatedOrganizations float64 GovernmentGrants float64 program_revenue float64 CYInvestmentIncomeAmt float64 OtherRevenueTotalAmt float64 dtype: object
df[complexity_columns].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
CYContributionsGrantsAmt | 194.0 | 1.462879e+07 | 9.210089e+07 | 0.0 | NaN | NaN | NaN | 9.182532e+08 |
FederatedCampaigns | 40.0 | 8.356717e+05 | 1.710843e+06 | 0.0 | NaN | NaN | NaN | 6.051068e+06 |
MembershipDues | 44.0 | 9.858459e+04 | 1.330675e+05 | 0.0 | NaN | NaN | NaN | 6.000830e+05 |
fundraising | 228.0 | 8.138974e+04 | 5.064406e+05 | -283900.0 | NaN | NaN | NaN | 7.283318e+06 |
RelatedOrganizations | 52.0 | 3.521487e+06 | 5.394679e+06 | 0.0 | NaN | NaN | NaN | 2.086632e+07 |
GovernmentGrants | 213.0 | 1.975853e+07 | 9.133767e+07 | 0.0 | NaN | NaN | NaN | 7.549167e+08 |
program_revenue | 538.0 | 4.042349e+07 | 2.827323e+08 | 0.0 | 0.0 | 172176.0 | 3712362.0 | 3.090399e+09 |
CYInvestmentIncomeAmt | 538.0 | 9.094292e+05 | 1.338435e+07 | -218265025.0 | 0.0 | 1222.0 | 36411.5 | 1.583775e+08 |
OtherRevenueTotalAmt | 414.0 | 2.131831e+06 | 1.163631e+07 | -162277.0 | NaN | NaN | NaN | 1.463759e+08 |
#df = df.drop('other_revenue', 1)
print len(df[df['CYInvestmentIncomeAmt'].notnull()])
print len(df[df['OtherRevenueTotalAmt'].notnull()])
df['other_revenue_binary'] = np.nan
df['other_revenue_binary'] = np.where(df['CYInvestmentIncomeAmt'].notnull(), df['CYInvestmentIncomeAmt'],
df['OtherRevenueTotalAmt'])
print len(df[df['other_revenue_binary'].notnull()])
538 414 538
print complexity_columns
['CYContributionsGrantsAmt', 'FederatedCampaigns', 'MembershipDues', 'fundraising', 'RelatedOrganizations', 'GovernmentGrants', 'program_revenue', 'CYInvestmentIncomeAmt', 'OtherRevenueTotalAmt']
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'MembershipDues', 'fundraising',
'RelatedOrganizations', 'GovernmentGrants', 'program_revenue', 'other_revenue_binary',
'CYInvestmentIncomeAmt', 'OtherRevenueTotalAmt']
df[complexity_columns][10:30]
CYContributionsGrantsAmt | FederatedCampaigns | MembershipDues | fundraising | RelatedOrganizations | GovernmentGrants | program_revenue | other_revenue_binary | CYInvestmentIncomeAmt | OtherRevenueTotalAmt | |
---|---|---|---|---|---|---|---|---|---|---|
118 | NaN | NaN | NaN | 47210.0 | NaN | NaN | 0.0 | 182952.0 | 182952.0 | 124761.0 |
124 | NaN | NaN | NaN | 99073.0 | NaN | NaN | 0.0 | 92923.0 | 92923.0 | 9015.0 |
263 | 994088.0 | NaN | NaN | NaN | NaN | NaN | 76449.0 | -141807.0 | -141807.0 | 0.0 |
196 | NaN | NaN | NaN | NaN | NaN | NaN | 76449.0 | 0.0 | 0.0 | NaN |
256 | NaN | NaN | NaN | NaN | NaN | NaN | 205125.0 | 0.0 | 0.0 | NaN |
233 | NaN | NaN | NaN | NaN | NaN | NaN | 256109.0 | -52721.0 | -52721.0 | NaN |
129 | NaN | NaN | NaN | NaN | NaN | NaN | 107536.0 | 0.0 | 0.0 | NaN |
319 | 104931.0 | NaN | NaN | 0.0 | NaN | NaN | 224006.0 | 23.0 | 23.0 | 0.0 |
394 | 86418.0 | NaN | NaN | 0.0 | NaN | NaN | 206873.0 | 30.0 | 30.0 | 0.0 |
462 | NaN | NaN | NaN | 0.0 | NaN | NaN | 206873.0 | 4.0 | 4.0 | 0.0 |
101 | NaN | NaN | NaN | 0.0 | NaN | NaN | 194006.0 | 7.0 | 7.0 | 0.0 |
528 | 12294982.0 | NaN | NaN | NaN | NaN | 646173.0 | 0.0 | 1549.0 | 1549.0 | 0.0 |
401 | 12741996.0 | NaN | NaN | NaN | NaN | 629939.0 | 0.0 | 1656.0 | 1656.0 | 0.0 |
361 | NaN | NaN | NaN | NaN | NaN | 437339.0 | 0.0 | 2695.0 | 2695.0 | NaN |
36 | NaN | NaN | NaN | NaN | NaN | 461226.0 | 0.0 | 2317.0 | 2317.0 | NaN |
41 | NaN | NaN | NaN | NaN | NaN | 373496.0 | 0.0 | 2984.0 | 2984.0 | NaN |
63 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | NaN |
93 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 642.0 | 642.0 | NaN |
504 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 616.0 | 616.0 | NaN |
31 | 307743.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 5902.0 | 5902.0 | 155000.0 |
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
df[complexity_columns].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
CYContributionsGrantsAmt | 194.0 | 1.462879e+07 | 9.210089e+07 | 0.0 | NaN | NaN | NaN | 9.182532e+08 |
FederatedCampaigns | 40.0 | 8.356717e+05 | 1.710843e+06 | 0.0 | NaN | NaN | NaN | 6.051068e+06 |
MembershipDues | 44.0 | 9.858459e+04 | 1.330675e+05 | 0.0 | NaN | NaN | NaN | 6.000830e+05 |
fundraising | 228.0 | 8.138974e+04 | 5.064406e+05 | -283900.0 | NaN | NaN | NaN | 7.283318e+06 |
RelatedOrganizations | 52.0 | 3.521487e+06 | 5.394679e+06 | 0.0 | NaN | NaN | NaN | 2.086632e+07 |
GovernmentGrants | 213.0 | 1.975853e+07 | 9.133767e+07 | 0.0 | NaN | NaN | NaN | 7.549167e+08 |
program_revenue | 538.0 | 4.042349e+07 | 2.827323e+08 | 0.0 | 0.0 | 172176.0 | 3712362.0 | 3.090399e+09 |
other_revenue_binary | 538.0 | 9.094292e+05 | 1.338435e+07 | -218265025.0 | 0.0 | 1222.0 | 36411.5 | 1.583775e+08 |
CYInvestmentIncomeAmt | 538.0 | 9.094292e+05 | 1.338435e+07 | -218265025.0 | 0.0 | 1222.0 | 36411.5 | 1.583775e+08 |
OtherRevenueTotalAmt | 414.0 | 2.131831e+06 | 1.163631e+07 | -162277.0 | NaN | NaN | NaN | 1.463759e+08 |
print complexity_columns
['CYContributionsGrantsAmt', 'FederatedCampaigns', 'MembershipDues', 'fundraising', 'RelatedOrganizations', 'GovernmentGrants', 'program_revenue', 'other_revenue_binary', 'CYInvestmentIncomeAmt', 'OtherRevenueTotalAmt']
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'MembershipDues', 'fundraising',
'RelatedOrganizations', 'GovernmentGrants', 'program_revenue', 'other_revenue_binary']
df['complexity'] = (df[complexity_columns] > 0).sum(1)
df['complexity'][:3]
252 2 433 0 78 0 Name: complexity, dtype: int64
complexity_columns = ['complexity'] + complexity_columns
df['complexity'].describe()
count 538.000000 mean 2.431227 std 1.278642 min 0.000000 25% 2.000000 50% 2.000000 75% 3.000000 max 7.000000 Name: complexity, dtype: float64
df[complexity_columns].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
complexity | 538.0 | 2.431227e+00 | 1.278642e+00 | 0.0 | 2.0 | 2.0 | 3.0 | 7.000000e+00 |
CYContributionsGrantsAmt | 194.0 | 1.462879e+07 | 9.210089e+07 | 0.0 | NaN | NaN | NaN | 9.182532e+08 |
FederatedCampaigns | 40.0 | 8.356717e+05 | 1.710843e+06 | 0.0 | NaN | NaN | NaN | 6.051068e+06 |
MembershipDues | 44.0 | 9.858459e+04 | 1.330675e+05 | 0.0 | NaN | NaN | NaN | 6.000830e+05 |
fundraising | 228.0 | 8.138974e+04 | 5.064406e+05 | -283900.0 | NaN | NaN | NaN | 7.283318e+06 |
RelatedOrganizations | 52.0 | 3.521487e+06 | 5.394679e+06 | 0.0 | NaN | NaN | NaN | 2.086632e+07 |
GovernmentGrants | 213.0 | 1.975853e+07 | 9.133767e+07 | 0.0 | NaN | NaN | NaN | 7.549167e+08 |
program_revenue | 538.0 | 4.042349e+07 | 2.827323e+08 | 0.0 | 0.0 | 172176.0 | 3712362.0 | 3.090399e+09 |
other_revenue_binary | 538.0 | 9.094292e+05 | 1.338435e+07 | -218265025.0 | 0.0 | 1222.0 | 36411.5 | 1.583775e+08 |
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
SOX_columns = ['conflict_of_interest_policy', 'whistleblower_policy', 'records_retention_policy']
#df['SOX_policies'] = (df[SOX_columns] > 0).sum(1)
df['SOX_policies'] = np.where(df['conflict_of_interest_policy'].notnull(),
(df[SOX_columns] > 0).sum(1), np.nan)
df['SOX_policies'][:3]
252 2.0 433 2.0 78 2.0 Name: SOX_policies, dtype: float64
SOX_columns = ['SOX_policies'] + SOX_columns
df[SOX_columns][:8]
SOX_policies | conflict_of_interest_policy | whistleblower_policy | records_retention_policy | |
---|---|---|---|---|
252 | 2.0 | 1 | 0 | 1 |
433 | 2.0 | 1 | 0 | 1 |
78 | 2.0 | 1 | 0 | 1 |
333 | 3.0 | 1 | 1 | 1 |
12 | 3.0 | 1 | 1 | 1 |
481 | 3.0 | 1 | 1 | 1 |
16 | 3.0 | 1 | 1 | 1 |
130 | 3.0 | 1 | 1 | 1 |
Binary version ('1' equals any SOX policy).
df['SOX_policies_binary'] = df['SOX_policies']
df['SOX_policies_binary'] = np.where(df['SOX_policies_binary']>=1, 1, df['SOX_policies'])
print df['SOX_policies_binary'].value_counts().sum()
df['SOX_policies_binary'].value_counts()
538
1.0 463 0.0 75 Name: SOX_policies_binary, dtype: int64
Create binary version for all three policies
df['SOX_policies_all_binary'] = np.nan
df['SOX_policies_all_binary'] = df['SOX_policies']
df['SOX_policies_all_binary'] = np.where( ((df['SOX_policies_all_binary']==1) | (df['SOX_policies_all_binary']==2)),
0, df['SOX_policies_all_binary'])
df['SOX_policies_all_binary'] = np.where(df['SOX_policies_all_binary']==3, 1, df['SOX_policies_all_binary'])
print df['SOX_policies_all_binary'].value_counts()
1.0 293 0.0 245 Name: SOX_policies_all_binary, dtype: int64
print df['SOX_policies'].value_counts(), '\n'
print df['SOX_policies_binary'].value_counts(), '\n'
print df['SOX_policies_all_binary'].value_counts()
3.0 293 2.0 105 0.0 75 1.0 65 Name: SOX_policies, dtype: int64 1.0 463 0.0 75 Name: SOX_policies_binary, dtype: int64 1.0 293 0.0 245 Name: SOX_policies_all_binary, dtype: int64
print len(df.columns)
print df.columns.tolist()
54 ['FYE', u'EIN', u'OrganizationName', u'URL', u'SubmittedOn', u'TaxPeriod', 'tot_rev', 'whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy', u'CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses', u'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt', u'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues', u'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp', u'FundraisingAmt', u'FundraisingGrossIncomeAmt', u'RelatedOrganizations', u'RelatedOrganizationsAmt', u'GovernmentGrants', u'GovernmentGrantsAmt', u'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', u'ProgramServiceRevenue', u'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue', u'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome', u'InvestmentIncomeCurrentYear', u'InvestmentIncomeGrp', u'OtherRevenueTotalAmt', u'OtherRevenueMiscGrp', u'TotalOtherRevenue', u'CYOtherRevenueAmt', 'total_expenses', 'program_expenses', 'fundraising', 'program_revenue', 'OtherRevMisc', 'other_revenue_binary', 'complexity', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']
df.to_pickle('538 e-file 990s for 2016 donor advisory organizations, v3 (all calc columns).pkl')
file_list_columns = ['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod', 'FYE']
SOX_columns = ['whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy',
'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']
control_columns = ['tot_rev', 'total_expenses', 'program_expenses', 'program_efficiency', 'complexity']
unneeded_columns = ['CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses',
'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt',
'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues',
'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp',
'FundraisingAmt', 'FundraisingGrossIncomeAmt', u'RelatedOrganizations',
'RelatedOrganizationsAmt', u'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue',
'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue',
'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome',
'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp', u'OtherRevenueTotalAmt',
'OtherRevenueMiscGrp', u'TotalOtherRevenue', 'CYOtherRevenueAmt', 'fundraising',
'program_revenue', 'OtherRevMisc', 'other_revenue_binary']
cols = file_list_columns+SOX_columns+control_columns
df = df[cols]
df[cols][:5]
EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | FYE | whistleblower_policy | conflict_of_interest_policy | records_retention_policy | SOX_policies | SOX_policies_binary | SOX_policies_all_binary | tot_rev | total_expenses | program_expenses | program_efficiency | complexity | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | FY2012 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2515399 | 2100922.0 | 1598626.0 | 0.760916 | 2 |
433 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | FY2011 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2161209 | 2481310.0 | 1715793.0 | 0.691487 | 0 |
78 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201220909349300327_public.xml | 2012-05-25 | 201012 | FY2010 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 3990564 | 3912643.0 | 2930785.0 | 0.749055 | 0 |
333 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201502299349301700_public.xml | 2015-09-21 | 201409 | FY2014 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 22995526 | 25626262.0 | 18485740.0 | 0.721359 | 4 |
12 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201442269349304504_public.xml | 2014-10-28 | 201309 | FY2013 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 25923449 | 26224173.0 | 19280493.0 | 0.735218 | 3 |
14 orgs are missing program_expenses. Dan and I agreed to count these as '0' (see email 9/5/16)
cols2 = ['EIN', 'FYE', 'total_expenses', 'program_expenses', 'program_efficiency']
df[df['program_efficiency'].isnull()][cols2]
EIN | FYE | total_expenses | program_expenses | program_efficiency | |
---|---|---|---|---|---|
172 | 132644507 | FY2014 | 7763846.0 | NaN | NaN |
404 | 132644507 | FY2013 | 7795021.0 | NaN | NaN |
431 | 132644507 | FY2012 | 6237509.0 | NaN | NaN |
106 | 132644507 | FY2011 | 5880789.0 | NaN | NaN |
72 | 132644507 | FY2010 | 6024086.0 | NaN | NaN |
52 | 135579302 | FY2014 | 2023750.0 | NaN | NaN |
406 | 135579302 | FY2013 | 4133761.0 | NaN | NaN |
500 | 201737542 | FY2010 | 177901.0 | NaN | NaN |
0 | 271317072 | FY2014 | 34400.0 | NaN | NaN |
277 | 330837780 | FY2014 | 3330.0 | NaN | NaN |
226 | 591951577 | FY2011 | 1690623.0 | NaN | NaN |
213 | 591951577 | FY2010 | 1161939.0 | NaN | NaN |
495 | 591951577 | FY2010 | 1282149.0 | NaN | NaN |
496 | 953553530 | FY2011 | 0.0 | NaN | NaN |
#df2 = pd.DataFrame(list(filings.find()))
#print '# of columns:', len(df2.columns)
#print '# of observations:', len(df2)
#df2.head(1)
# of columns: 671 # of observations: 538
@documentId | @referenceDocumentId | @referenceDocumentName | @softwareId | @softwareVersion | @softwareVersionNum | AccountantCompileOrReview | AccountantCompileOrReviewBasis | AccountantCompileOrReviewInd | AccountsPayableAccrExpnssGrp | AccountsPayableAccruedExpenses | AccountsReceivable | AccountsReceivableGrp | AcctCompileOrReviewBasisGrp | ActivitiesConductedPartnership | ActivitiesConductedPrtshpInd | Activity2 | Activity3 | ActivityOrMissionDesc | ActivityOrMissionDescription | ActivityOther | AddressChange | AddressChangeInd | AddressPrincipalOfficerUS | Advertising | AdvertisingGrp | AllAffiliatesIncluded | AllAffiliatesIncludedInd | AllOtherContributions | AllOtherContributionsAmt | AllOtherExpenses | AllOtherExpensesGrp | AmendedReturn | AmendedReturnInd | AnnualDisclosureCoveredPersons | AnnualDisclosureCoveredPrsnInd | AuditCommittee | AuditCommitteeInd | AuditedFinancialStmtAttInd | BackupWthldComplianceInd | BalanceSheetAmountsReported | BenefitsPaidToMembersCY | BenefitsPaidToMembersPriorYear | BenefitsToMembers | BenefitsToMembersGrp | BooksInCareOfDetail | BsnssRltnshpThruFamilyMember | BsnssRltnshpWithOrganization | BusinessRlnWithFamMemInd | BusinessRlnWithOfficerEntInd | BusinessRlnWithOrgMemInd | CYBenefitsPaidToMembersAmt | CYContributionsGrantsAmt | CYGrantsAndSimilarPaidAmt | CYInvestmentIncomeAmt | CYOtherExpensesAmt | CYOtherRevenueAmt | CYProgramServiceRevenueAmt | CYRevenuesLessExpensesAmt | CYSalariesCompEmpBnftPaidAmt | CYTotalExpensesAmt | CYTotalFundraisingExpenseAmt | CYTotalProfFndrsngExpnsAmt | CYTotalRevenueAmt | CapStckTrstPrinCurrentFunds | CapStkTrPrinCurrentFundsGrp | CashNonInterestBearing | CashNonInterestBearingGrp | ChangeToOrgDocumentsInd | ChangesToOrganizingDocs | CntrbtnsRprtdFundraisingEvents | CntrctRcvdGreaterThan100KCnt | CollectionsOfArt | CollectionsOfArtInd | CompCurrentOfcrDirectorsGrp | CompCurrentOfficersDirectors | CompDisqualPersons | CompDisqualPersonsGrp | CompensationFromOtherSources | CompensationFromOtherSrcsInd | CompensationProcessCEO | CompensationProcessCEOInd | CompensationProcessOther | CompensationProcessOtherInd | ComplianceWithBackupWitholding | ConferencesMeetings | ConferencesMeetingsGrp | ConflictOfInterestPolicy | ConflictOfInterestPolicyInd | ConservationEasements | ConservationEasementsInd | ConsolidatedAuditFinancialStmt | ConsolidatedAuditFinclStmtInd | ContractorCompensation | ContractorCompensationGrp | ContriRptFundraisingEventAmt | ContributionsGrantsCurrentYear | ContributionsGrantsPriorYear | CostOfGoodsSold | CostOfGoodsSoldAmt | CreditCounseling | CreditCounselingInd | DAFExcessBusinessHoldingsInd | DLN | DecisionsSubjectToApprovaInd | DecisionsSubjectToApproval | DeductibleArtContributionInd | DeductibleContributionsOfArt | DeductibleNonCashContriInd | DeductibleNonCashContributions | DeferredRevenue | DeferredRevenueGrp | DelegationOfManagementDuties | DelegationOfMgmtDutiesInd | DepreciationDepletion | DepreciationDepletionGrp | Desc | DescribedIn501C3 | DescribedInSection501c3Ind | Description | DisregardedEntity | DisregardedEntityInd | DistributionToDonor | DistributionToDonorInd | DoNotFollowSFAS117 | DocumentRetentionPolicy | DocumentRetentionPolicyInd | DoingBusinessAs | DoingBusinessAsName | DonatedServicesAndUseFcltsAmt | DonorAdvisedFundInd | DonorAdvisedFunds | EIN | ElectionOfBoardMembers | ElectionOfBoardMembersInd | EmployeeCnt | EmploymentTaxReturnsFiled | EmploymentTaxReturnsFiledInd | EngagedInExcessBenefitTransInd | EscrowAccount | EscrowAccountInd | EscrowAccountLiability | EscrowAccountLiabilityGrp | ExcessBenefitTransaction | ExcessBusinessHoldings | ExcessBusinessHoldingsInd | Expense | ExpenseAmt | FSAudited | FSAuditedBasis | FSAuditedBasisGrp | FSAuditedInd | FamilyOrBusinessRelationship | FamilyOrBusinessRlnInd | FederalGrantAuditPerformed | FederalGrantAuditPerformedInd | FederalGrantAuditRequired | FederalGrantAuditRequiredInd | FederatedCampaigns | FederatedCampaignsAmt | FeesForServicesAccounting | FeesForServicesAccountingGrp | FeesForServicesInvstMgmntFees | FeesForServicesLegal | FeesForServicesLegalGrp | FeesForServicesLobbying | FeesForServicesLobbyingGrp | FeesForServicesManagement | FeesForServicesManagementGrp | FeesForServicesOther | FeesForServicesOtherGrp | FeesForServicesProfFundraising | FeesForSrvcInvstMgmntFeesGrp | FiledLieu1041 | FinancialStatementBoth | FinancialStatementConsolidated | FinancialStatementSeparate | FinancialStmtAttached | FollowSFAS117 | ForeignActivities | ForeignActivitiesInd | ForeignAddress | ForeignCountryCd | ForeignFinancialAccount | ForeignFinancialAccountInd | ForeignGrants | ForeignGrantsGrp | ForeignOffice | ForeignOfficeInd | Form1098CFiled | Form1098CFiledInd | Form720Filed | Form8282FiledCnt | Form8282PropertyDisposedOf | Form8282PropertyDisposedOfInd | Form8886TFiled | Form8886TFiledInd | Form8899Filed | Form8899Filedind | Form990-TFiled | Form990PartVIISectionA | Form990PartVIISectionAGrp | Form990ProvidedToGoverningBody | Form990ProvidedToGvrnBodyInd | Form990TFiledInd | FormType | FormationYr | FormerOfcrEmployeesListedInd | FormersListed | FundraisingActivities | FundraisingActivitiesInd | FundraisingAmt | FundraisingDirectExpenses | FundraisingDirectExpensesAmt | FundraisingEvents | FundraisingGrossIncomeAmt | FundsToPayPremiums | GainOrLoss | GainOrLossGrp | Gaming | GamingActivitiesInd | GoverningBodyVotingMembersCnt | GovernmentGrants | GovernmentGrantsAmt | GrantAmt | GrantToRelatedPerson | GrantToRelatedPersonInd | Grants | GrantsAndSimilarAmntsCY | GrantsAndSimilarAmntsPriorYear | GrantsPayable | GrantsPayableGrp | GrantsToDomesticIndividuals | GrantsToDomesticIndividualsGrp | GrantsToDomesticOrgs | GrantsToDomesticOrgsGrp | GrantsToIndividuals | GrantsToIndividualsInd | GrantsToOrganizations | GrantsToOrganizationsInd | GrossAmountSalesAssets | GrossAmountSalesAssetsGrp | GrossIncomeFundraisingEvents | GrossReceipts | GrossReceiptsAmt | GrossRents | GrossRentsGrp | GrossSalesOfInventory | GrossSalesOfInventoryAmt | GroupExemptionNum | GroupExemptionNumber | GroupReturnForAffiliates | GroupReturnForAffiliatesInd | Hospital | IRPDocumentCnt | IRPDocumentW2GCnt | IncludeFIN48FootnoteInd | IncmFromInvestBondProceedsGrp | IncomeFromInvestBondProceeds | IndependentAuditFinancialStmt | IndependentAuditFinclStmtInd | IndependentVotingMemberCnt | IndivRcvdGreaterThan100KCnt | IndoorTanningServices | IndoorTanningServicesInd | InfoInScheduleOPartIII | InfoInScheduleOPartIIIInd | InfoInScheduleOPartIX | InfoInScheduleOPartIXInd | InfoInScheduleOPartV | InfoInScheduleOPartVI | InfoInScheduleOPartVII | InfoInScheduleOPartVIII | InfoInScheduleOPartVIIIInd | InfoInScheduleOPartVIIInd | InfoInScheduleOPartVIInd | InfoInScheduleOPartVInd | InfoInScheduleOPartX | InfoInScheduleOPartXI | InfoInScheduleOPartXII | InfoInScheduleOPartXIIInd | InfoInScheduleOPartXIInd | InfoInScheduleOPartXInd | InformationTechnology | InformationTechnologyGrp | InitialReturn | Insurance | InsuranceGrp | IntangibleAssets | IntangibleAssetsGrp | Interest | InterestGrp | InventoriesForSaleOrUse | InventoriesForSaleOrUseGrp | InvestTaxExemptBonds | InvestTaxExemptBondsInd | InvestmentExpenseAmt | InvestmentInJointVenture | InvestmentInJointVentureInd | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | InvestmentIncomePriorYear | InvestmentsOtherSecurities | InvestmentsOtherSecuritiesGrp | InvestmentsProgramRelated | InvestmentsProgramRelatedGrp | InvestmentsPubTradedSecGrp | InvestmentsPubTradedSecurities | IsAvailable | IsElectronic | JointCosts | JointCostsInd | LandBldgEquipAccumDeprecAmt | LandBldgEquipBasisNetGrp | LandBldgEquipCostOrOtherBssAmt | LandBldgEquipmentAccumDeprec | LandBuildingsEquipmentBasis | LandBuildingsEquipmentBasisNet | LastUpdated | LegalDomicileStateCd | LessCostOthBasisSalesExpenses | LessCostOthBasisSalesExpnssGrp | LessRentalExpenses | LessRentalExpensesGrp | LicensedMoreThanOneState | LicensedMoreThanOneStateInd | LoanOutstandingInd | LoanToOfficerOrDQP | LoansFromOfficersDirectors | LoansFromOfficersDirectorsGrp | LobbyingActivities | LobbyingActivitiesInd | LocalChapters | LocalChaptersInd | MaterialDiversionOrMisuse | MaterialDiversionOrMisuseInd | MembersOrStockholders | MembersOrStockholdersInd | MembershipDues | MembershipDuesAmt | MethodOfAccountingAccrual | MethodOfAccountingAccrualInd | MethodOfAccountingCash | MethodOfAccountingCashInd | MethodOfAccountingOther | MethodOfAccountingOtherInd | MinutesOfCommittees | MinutesOfCommitteesInd | MinutesOfGoverningBody | MinutesOfGoverningBodyInd | MiscellaneousRevenue | MiscellaneousRevenueGrp | MissionDesc | MissionDescription | MoreThan5000KToIndividuals | MoreThan5000KToIndividualsInd | MoreThan5000KToOrgInd | MoreThan5000KToOrganizations | MortNotesPyblSecuredInvestProp | MortgNotesPyblScrdInvstPropGrp | NameOfForeignCountry | NameOfPrincipalOfficerBusiness | NameOfPrincipalOfficerPerson | NbrIndependentVotingMembers | NbrVotingGoverningBodyMembers | NbrVotingMembersGoverningBody | NetAssetsOrFundBalancesBOY | NetAssetsOrFundBalancesBOYAmt | NetAssetsOrFundBalancesEOY | NetAssetsOrFundBalancesEOYAmt | NetGainOrLossInvestments | NetGainOrLossInvestmentsGrp | NetIncmFromFundraisingEvtGrp | NetIncomeFromFundraisingEvents | NetIncomeFromGaming | NetIncomeFromGamingGrp | NetIncomeOrLoss | NetIncomeOrLossGrp | NetRentalIncomeOrLoss | NetRentalIncomeOrLossGrp | NetUnrelatedBusTxblIncmAmt | NetUnrelatedBusinessTxblIncome | NetUnrlzdGainsLossesInvstAmt | NoListedPersonsCompensated | NoListedPersonsCompensatedInd | NonDeductibleContributions | NonDeductibleDisclosure | NoncashContributions | NoncashContributionsAmt | NondeductibleContriDisclInd | NondeductibleContributionsInd | NumberFormsTransmittedWith1096 | NumberIndependentVotingMembers | NumberIndividualsGT100K | NumberOf8282Filed | NumberOfContractorsGT100K | NumberOfEmployees | NumberW2GIncluded | ObjectId | Occupancy | OccupancyGrp | OfficeExpenses | OfficeExpensesGrp | OfficerEntityWithBsnssRltnshp | OfficerMailingAddress | OfficerMailingAddressInd | OnBehalfOfIssuer | OnBehalfOfIssuerInd | OperateHospitalInd | OrgDoesNotFollowSFAS117Ind | OrgFiledInLieuOfForm1041Ind | Organization501c | Organization501c3 | Organization501c3Ind | Organization501cInd | OrganizationFollowsSFAS117Ind | OrganizationName | OthNotesLoansReceivableNetGrp | OtherAssetsTotal | OtherAssetsTotalGrp | OtherChangesInNetAssetsAmt | OtherEmployeeBenefits | OtherEmployeeBenefitsGrp | OtherExpensePriorYear | OtherExpenses | OtherExpensesCurrentYear | OtherExpensesGrp | OtherExplainInSchO | OtherInd | OtherLiabilities | OtherLiabilitiesGrp | OtherNotesLoansReceivableNet | OtherRevenueCurrentYear | OtherRevenueMisc | OtherRevenueMiscGrp | OtherRevenuePriorYear | OtherRevenueTotalAmt | OtherSalariesAndWages | OtherSalariesAndWagesGrp | OtherWebsite | OtherWebsiteInd | OwnWebsite | OwnWebsiteInd | PYBenefitsPaidToMembersAmt | PYContributionsGrantsAmt | PYExcessBenefitTransInd | PYGrantsAndSimilarPaidAmt | PYInvestmentIncomeAmt | PYOtherExpensesAmt | PYOtherRevenueAmt | PYProgramServiceRevenueAmt | PYRevenuesLessExpensesAmt | PYSalariesCompEmpBnftPaidAmt | PYTotalExpensesAmt | PYTotalProfFndrsngExpnsAmt | PYTotalRevenueAmt | PaidInCapSrplsLandBldgEqpFund | PartialLiquidation | PartialLiquidationInd | PayPremiumsPrsnlBnftCntrctInd | PaymentsToAffiliates | PaymentsToAffiliatesGrp | PayrollTaxes | PayrollTaxesGrp | PdInCapSrplsLandBldgEqpFundGrp | PensionPlanContributions | PensionPlanContributionsGrp | PermanentlyRestrictedNetAssets | PermanentlyRstrNetAssetsGrp | PledgesAndGrantsReceivable | PledgesAndGrantsReceivableGrp | PoliciesReferenceChapters | PoliciesReferenceChaptersInd | PoliticalActivities | PoliticalCampaignActyInd | PremiumsPaid | PrepaidExpensesDeferredCharges | PrepaidExpensesDefrdChargesGrp | PrincipalOfficerNm | PriorExcessBenefitTransaction | PriorPeriodAdjustmentsAmt | ProfessionalFundraising | ProfessionalFundraisingInd | ProgSrvcAccomActy2Grp | ProgSrvcAccomActy3Grp | ProgSrvcAccomActyOtherGrp | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | ProgramServiceRevenuePriorYear | ProhibitedTaxShelterTrans | ProhibitedTaxShelterTransInd | PymtTravelEntrtnmntPubOfclGrp | QuidProQuoContriDisclInd | QuidProQuoContributions | QuidProQuoContributionsInd | QuidProQuoDisclosure | RcvFndsToPayPrsnlBnftCntrctInd | RcvblFromDisqualifiedPrsnGrp | ReceivablesFromDisqualPersons | ReceivablesFromOfficersEtc | ReceivablesFromOfficersEtcGrp | ReconcilationDonatedServices | ReconcilationInvestExpenses | ReconcilationOtherChanges | ReconcilationPriorAdjustment | ReconcilationRevenueExpenses | ReconcilationRevenueExpnssAmt | ReconciliationUnrealizedInvest | RegularMonitoringEnforcement | RegularMonitoringEnfrcInd | RelatedEntity | RelatedEntityInd | RelatedOrgControlledEntity | RelatedOrganizationCtrlEntInd | RelatedOrganizations | RelatedOrganizationsAmt | RentalIncomeOrLoss | RentalIncomeOrLossGrp | ReportFin48Footnote | ReportInvestOthSecurities | ReportInvestmentsOtherSecInd | ReportLandBldgEquip | ReportLandBuildingEquipmentInd | ReportOtherAssets | ReportOtherAssetsInd | ReportOtherLiabilities | ReportOtherLiabilitiesInd | ReportProgRelInvest | ReportProgramRelatedInvstInd | RetainedEarningsEndowmentEtc | Revenue | RevenueAmt | RevenuesLessExpensesCY | RevenuesLessExpensesPriorYear | Royalties | RoyaltiesGrp | RoyaltiesRevenue | RoyaltiesRevenueGrp | RtnEarnEndowmentIncmOthFndsGrp | SalariesEtcCurrentYear | SalariesEtcPriorYear | SavingsAndTempCashInvestments | SavingsAndTempCashInvstGrp | ScheduleBRequired | ScheduleBRequiredInd | ScheduleJRequired | ScheduleJRequiredInd | ScheduleORequired | ScheduleORequiredInd | School | SchoolOperatingInd | SignificantChange | SignificantChangeInd | SignificantNewProgramServices | SignificantNewProgramSrvcInd | StateLegalDomicile | StatesWhereCopyOfReturnIsFiled | StatesWhereCopyOfReturnIsFldCd | SubjectToProxyTax | SubjectToProxyTaxInd | SubmittedOn | TaxExemptBondLiabilities | TaxExemptBondLiabilitiesGrp | TaxExemptBonds | TaxExemptBondsInd | TaxPeriod | TaxableDistributions | TaxableDistributionsInd | TaxablePartyNotification | TaxablePartyNotificationInd | TempOrPermanentEndowmentsInd | TemporarilyRestrictedNetAssets | TemporarilyRstrNetAssetsGrp | TermOrPermanentEndowments | TerminateOperationsInd | Terminated | TerminationOrContraction | TheBooksAreInCareOf | TotLiabNetAssetsFundBalanceGrp | TotReportableCompRltdOrgAmt | TotalAssets | TotalAssetsBOY | TotalAssetsBOYAmt | TotalAssetsEOY | TotalAssetsEOYAmt | TotalAssetsGrp | TotalCompGT150K | TotalCompGreaterThan150KInd | TotalContributions | TotalContributionsAmt | TotalEmployeeCnt | TotalExpensesCurrentYear | TotalExpensesPriorYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalFundrsngExpCurrentYear | TotalGrossUBI | TotalGrossUBIAmt | TotalJointCosts | TotalJointCostsGrp | TotalLiabNetAssetsFundBalances | TotalLiabilities | TotalLiabilitiesBOY | TotalLiabilitiesBOYAmt | TotalLiabilitiesEOY | TotalLiabilitiesEOYAmt | TotalLiabilitiesGrp | TotalNbrEmployees | TotalNbrVolunteers | TotalNetAssetsFundBalanceGrp | TotalNetAssetsFundBalances | TotalOfOtherProgramServiceExp | TotalOfOtherProgramServiceGrnt | TotalOfOtherProgramServiceRev | TotalOthProgramServiceRevGrp | TotalOthProgramServiceRevenue | TotalOtherCompensation | TotalOtherCompensationAmt | TotalOtherProgSrvcExpenseAmt | TotalOtherProgSrvcGrantAmt | TotalOtherProgSrvcRevenueAmt | TotalOtherRevenue | TotalProfFundrsngExpCY | TotalProfFundrsngExpPriorYear | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | TotalReportableCompFrmRltdOrgs | TotalReportableCompFromOrg | TotalReportableCompFromOrgAmt | TotalRevenue | TotalRevenueCurrentYear | TotalRevenueGrp | TotalRevenuePriorYear | TotalVolunteersCnt | TransactionRelatedEntity | TransactionWithControlEntInd | TransfersToExemptNonChrtblOrg | Travel | TravelEntrtnmntPublicOfficials | TravelGrp | TrnsfrExmptNonChrtblRltdOrgInd | TypeOfOrganizationCorpInd | TypeOfOrganizationCorporation | TypeOfOrganizationOther | TypeOfOrganizationOtherInd | TypeOfOrganizationTrust | TypeOfOrganizationTrustInd | URL | USAddress | UnrelatedBusIncmOverLimitInd | UnrelatedBusinessIncome | UnrestrictedNetAssets | UnrestrictedNetAssetsGrp | UnsecuredNotesLoansPayable | UnsecuredNotesLoansPayableGrp | UponRequest | UponRequestInd | VotingMembersGoverningBodyCnt | VotingMembersIndependentCnt | WebSite | WebsiteAddressTxt | WhistleblowerPolicy | WhistleblowerPolicyInd | WrittenPolicyOrProcedure | WrittenPolicyOrProcedureInd | YearFormation | _id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RetDoc1 | RetDoc2 | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | RAISE MONIES TO CONSTRUCT A FACILITY FOR YEAR-ROUND RECREATION PROGRAM DEDICATED TO CHILDREN AND ADULTS WITH SPECIAL NEEDS WHO RESIDE IN THE TOWN OF HEMPSTEAD. | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'PhoneNum': u'5164316946', u'USAddress': {u'CityNm': u'LIDO BEACH', u'StateAbbreviationCd': u'NY', u'ZIPCd': u'11561', u'AddressLine1Txt': u'630 LIDO BLVD'}, u'BusinessName': {u'BusinessNameLine1Txt': u'ANCHOR BUILDING FUND INC'}} | NaN | NaN | false | false | false | 0 | 0 | 0 | 0 | 34400 | 325000 | 0 | 290600 | 0 | 34400 | 0 | 0 | 325000 | NaN | NaN | NaN | {u'EOYAmt': u'290600'} | false | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | 93493316047805 | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | false | NaN | NaN | THE ORGANIZATIONS EXEMPT PURPOSE WAS TO RAISE MONIES TO CONSTRUCT A FACILTIY TO OPERATE ITS YEAR-ROUND PROGRAM. | NaN | {u'#text': u'true', u'@referenceDocumentId': u'RetDoc3'} | NaN | NaN | false | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | 271317072 | NaN | false | 0 | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalAmt': u'9400', u'ManagementAndGeneralAmt': u'9400'} | NaN | NaN | {u'TotalAmt': u'25000', u'ManagementAndGeneralAmt': u'25000'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | NaN | [{u'PersonNm': u'JOHN MALLOY', u'TitleTxt': u'PRESIDENT', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'ANNA DRAGO... | NaN | false | NaN | 990 | 2009 | false | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | 2 | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | 325000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | 2 | 0 | false | NaN | NaN | NaN | false | 2 | NaN | NaN | false | NaN | X | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | True | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2016-03-21T17:23:53 | NY | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | false | NaN | false | NaN | true | NaN | false | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | false | NaN | false | NaN | NaN | RAISE MONIES TO CONSTRUCT A FACILITY FOR YEAR-ROUND RECREATION PROGRAM DEDICATED TO CHILDREN AND ADULTS WITH SPECIAL NEEDS WHO RESIDE IN THE TOWN OF HEMPSTEAD. | NaN | NaN | false | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 290600 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 201503169349304780 | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | false | NaN | NaN | NaN | NaN | X | NaN | X | ANCHOR BUILDING FUND INC | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} | NaN | 325000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | JOHN MALLOY | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 290600 | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | true | NaN | false | NaN | true | NaN | false | NaN | NaN | NY | NaN | false | 2016-01-30 | NaN | NaN | NaN | false | 201412 | NaN | NaN | NaN | false | false | NaN | NaN | NaN | false | NaN | NaN | NaN | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | NaN | NaN | NaN | NaN | 290600 | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | false | NaN | NaN | 0 | NaN | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | {u'BOYAmt': u'0', u'EOYAmt': u'0'} | NaN | NaN | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | X | NaN | NaN | NaN | NaN | NaN | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | {u'CityNm': u'LIDO BEACH', u'StateAbbreviationCd': u'NY', u'ZIPCd': u'11561', u'AddressLine1Txt': u'630 LIDO BLVD'} | false | NaN | NaN | {u'EOYAmt': u'290600'} | NaN | NaN | NaN | X | 2 | 2 | NaN | WWW.CAMPANCHOR.ORG | NaN | false | NaN | NaN | NaN | 57cce7823ffc5a55fe7f54b5 |
#efficiency_columns = ['EIN', 'TaxPeriod', 'CYTotalExpensesAmt', 'TotalExpensesCurrentYear',
# 'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt',
# 'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp']
#df2 = df2.sort_values(by=['EIN'], ascending=[1])
#df2[df2['EIN'].isin(zero_prog_exp)][efficiency_columns]
EIN | TaxPeriod | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | |
---|---|---|---|---|---|---|---|---|
404 | 132644507 | 201312 | 7795021 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'7795021'} |
106 | 132644507 | 201112 | NaN | 5880789 | NaN | NaN | {u'Total': u'5880789'} | NaN |
431 | 132644507 | 201212 | NaN | 6237509 | NaN | NaN | {u'Total': u'6237509'} | NaN |
172 | 132644507 | 201412 | 7763846 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'7763846'} |
72 | 132644507 | 201012 | NaN | 6024086 | NaN | NaN | {u'Total': u'6024086'} | NaN |
52 | 135579302 | 201412 | 2023750 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'2023750'} |
406 | 135579302 | 201312 | 4133761 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'4133761', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'0', u'FundraisingAmt': u'0'} |
500 | 201737542 | 201006 | NaN | 177901 | NaN | NaN | {u'ManagementAndGeneral': u'111025', u'ProgramServices': u'0', u'Fundraising': u'66876', u'Total': u'177901'} | NaN |
0 | 271317072 | 201412 | 34400 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} |
339 | 330837780 | 201312 | 2151741 | NaN | NaN | 2151482 | NaN | {u'TotalAmt': u'2151741', u'ProgramServicesAmt': u'2151482', u'ManagementAndGeneralAmt': u'259', u'FundraisingAmt': u'0'} |
345 | 330837780 | 201212 | NaN | 211124 | 211124 | NaN | {u'ManagementAndGeneral': u'0', u'ProgramServices': u'211124', u'Fundraising': u'0', u'Total': u'211124'} | NaN |
216 | 330837780 | 201112 | NaN | 28538 | 26728 | NaN | {u'ManagementAndGeneral': u'1810', u'ProgramServices': u'26728', u'Fundraising': u'0', u'Total': u'28538'} | NaN |
277 | 330837780 | 201412 | 3330 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'3330', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'3330', u'FundraisingAmt': u'0'} |
213 | 591951577 | 201012 | NaN | 1161939 | NaN | NaN | {u'Total': u'1161939'} | NaN |
226 | 591951577 | 201112 | NaN | 1690623 | NaN | NaN | {u'Total': u'1690623'} | NaN |
495 | 591951577 | 201012 | NaN | 1282149 | NaN | NaN | {u'Total': u'1282149'} | NaN |
496 | 953553530 | 201112 | NaN | 0 | NaN | NaN | {u'ManagementAndGeneral': u'0', u'ProgramServices': u'0', u'Fundraising': u'0', u'Total': u'0'} | NaN |
zero_prog_exp = ['132644507', '135579302', '201737542', '271317072', '330837780', '591951577', '953553530']
df[df['EIN'].isin(zero_prog_exp)][cols2]
EIN | FYE | total_expenses | program_expenses | program_efficiency | |
---|---|---|---|---|---|
172 | 132644507 | FY2014 | 7763846.0 | NaN | NaN |
404 | 132644507 | FY2013 | 7795021.0 | NaN | NaN |
431 | 132644507 | FY2012 | 6237509.0 | NaN | NaN |
106 | 132644507 | FY2011 | 5880789.0 | NaN | NaN |
72 | 132644507 | FY2010 | 6024086.0 | NaN | NaN |
52 | 135579302 | FY2014 | 2023750.0 | NaN | NaN |
406 | 135579302 | FY2013 | 4133761.0 | NaN | NaN |
500 | 201737542 | FY2010 | 177901.0 | NaN | NaN |
0 | 271317072 | FY2014 | 34400.0 | NaN | NaN |
277 | 330837780 | FY2014 | 3330.0 | NaN | NaN |
339 | 330837780 | FY2013 | 2151741.0 | 2151482.0 | 0.999880 |
345 | 330837780 | FY2012 | 211124.0 | 211124.0 | 1.000000 |
216 | 330837780 | FY2011 | 28538.0 | 26728.0 | 0.936576 |
226 | 591951577 | FY2011 | 1690623.0 | NaN | NaN |
213 | 591951577 | FY2010 | 1161939.0 | NaN | NaN |
495 | 591951577 | FY2010 | 1282149.0 | NaN | NaN |
496 | 953553530 | FY2011 | 0.0 | NaN | NaN |
print len(df[df['program_efficiency'].isnull()])
print len(df[df['total_expenses'].isnull()])
print len(df[df['program_expenses'].isnull()])
print len(df[df['program_efficiency'].isnull()])
14 0 14 14
df['program_efficiency'] = np.where(df['program_efficiency'].isnull(), 0, df['program_efficiency'])
print len(df[df['program_efficiency'].isnull()])
0
print len(df[df['tot_rev']==0])
print len(df[df['tot_rev']<0])
df['tot_rev_no_neg'] = df['tot_rev']
df['tot_rev_no_neg'] = np.where(df['tot_rev_no_neg']<=0, 1, df['tot_rev_no_neg'])
print len(df[df['tot_rev_no_neg']==0])
print len(df[df['tot_rev_no_neg']<0])
1 2 0 0
df['total_revenue_logged'] = np.log(df['tot_rev_no_neg'])
file_list_columns = ['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod', 'FYE']
SOX_columns = ['whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy',
'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']
control_columns = ['tot_rev', 'tot_rev_no_neg', 'total_revenue_logged', 'total_expenses', 'program_expenses',
'program_efficiency', 'complexity']
unneeded_columns = ['CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses',
'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt',
'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues',
'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp',
'FundraisingAmt', 'FundraisingGrossIncomeAmt', u'RelatedOrganizations',
'RelatedOrganizationsAmt', u'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue',
'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue',
'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome',
'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp', u'OtherRevenueTotalAmt',
'OtherRevenueMiscGrp', u'TotalOtherRevenue', 'CYOtherRevenueAmt', 'fundraising',
'program_revenue', 'OtherRevMisc', 'other_revenue_binary']
cols = file_list_columns+SOX_columns+control_columns
df = df[cols]
df['whistleblower_policy'] = df['whistleblower_policy'].astype('int')
df['conflict_of_interest_policy'] = df['conflict_of_interest_policy'].astype('int')
df['records_retention_policy'] = df['records_retention_policy'].astype('int')
df[:10]
EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | FYE | whistleblower_policy | conflict_of_interest_policy | records_retention_policy | SOX_policies | SOX_policies_binary | SOX_policies_all_binary | tot_rev | tot_rev_no_neg | total_revenue_logged | total_expenses | program_expenses | program_efficiency | complexity | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | FY2012 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2515399 | 2515399 | 14.737942 | 2100922.0 | 1598626.0 | 0.760916 | 2 |
433 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | FY2011 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2161209 | 2161209 | 14.586178 | 2481310.0 | 1715793.0 | 0.691487 | 0 |
78 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201220909349300327_public.xml | 2012-05-25 | 201012 | FY2010 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 3990564 | 3990564 | 15.199443 | 3912643.0 | 2930785.0 | 0.749055 | 0 |
333 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201502299349301700_public.xml | 2015-09-21 | 201409 | FY2014 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 22995526 | 22995526 | 16.950810 | 25626262.0 | 18485740.0 | 0.721359 | 4 |
12 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201442269349304504_public.xml | 2014-10-28 | 201309 | FY2013 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 25923449 | 25923449 | 17.070658 | 26224173.0 | 19280493.0 | 0.735218 | 3 |
481 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201312269349301491_public.xml | 2013-12-17 | 201209 | FY2012 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 28579592 | 28579592 | 17.168203 | 25481942.0 | 19252739.0 | 0.755544 | 4 |
16 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201212219349301091_public.xml | 2012-08-16 | 201106 | FY2011 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 26410002 | 26410002 | 17.089253 | 28367393.0 | 22102327.0 | 0.779146 | 3 |
130 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201121199349301122_public.xml | 2011-05-04 | 201006 | FY2010 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 28493155 | 28493155 | 17.165174 | 29094188.0 | 24121623.0 | 0.829087 | 3 |
434 | 042701694 | WESTFIELD STATE FOUNDATION INC | https://s3.amazonaws.com/irs-form-990/201403119349300995_public.xml | 2014-12-03 | 201406 | FY2014 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 764437 | 764437 | 13.546895 | 636346.0 | 478219.0 | 0.751508 | 3 |
374 | 042701694 | WESTFIELD STATE FOUNDATION INC | https://s3.amazonaws.com/irs-form-990/201401339349305050_public.xml | 2014-06-27 | 201306 | FY2013 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 745204 | 745204 | 13.521413 | 766452.0 | 446064.0 | 0.581986 | 2 |
df.describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
whistleblower_policy | 538.0 | 5.892193e-01 | 4.924334e-01 | 0.0 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
conflict_of_interest_policy | 538.0 | 8.438662e-01 | 3.633200e-01 | 0.0 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
records_retention_policy | 538.0 | 7.118959e-01 | 4.533013e-01 | 0.0 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
SOX_policies | 538.0 | 2.144981e+00 | 1.097391e+00 | 0.0 | 1.000000e+00 | 3.000000e+00 | 3.000000e+00 | 3.000000e+00 |
SOX_policies_binary | 538.0 | 8.605948e-01 | 3.466912e-01 | 0.0 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
SOX_policies_all_binary | 538.0 | 5.446097e-01 | 4.984695e-01 | 0.0 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
tot_rev | 538.0 | 5.857063e+07 | 3.234413e+08 | -218265025.0 | 1.320586e+06 | 4.261394e+06 | 1.334163e+07 | 3.741635e+09 |
tot_rev_no_neg | 538.0 | 5.897644e+07 | 3.232302e+08 | 1.0 | 1.320586e+06 | 4.261394e+06 | 1.334163e+07 | 3.741635e+09 |
total_revenue_logged | 538.0 | 1.524967e+01 | 2.287752e+00 | 0.0 | 1.409358e+01 | 1.526510e+01 | 1.640638e+01 | 2.204279e+01 |
total_expenses | 538.0 | 5.632274e+07 | 3.003909e+08 | 0.0 | 1.375426e+06 | 4.421644e+06 | 1.285109e+07 | 3.287631e+09 |
program_expenses | 524.0 | 4.994483e+07 | 2.653191e+08 | 3884.0 | NaN | NaN | NaN | 2.884879e+09 |
program_efficiency | 538.0 | 7.589501e-01 | 2.257745e-01 | 0.0 | 6.945523e-01 | 8.398745e-01 | 9.007387e-01 | 1.000000e+00 |
complexity | 538.0 | 2.431227e+00 | 1.278642e+00 | 0.0 | 2.000000e+00 | 2.000000e+00 | 3.000000e+00 | 7.000000e+00 |
df.to_pickle('e-file 990s for 2016 donor advisory organizations, v4 (key columns only).pkl')
df.to_excel('e-file 990s for 2016 donor advisory organizations, v4 (key columns only).xls')
df[:2]
EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | FYE | whistleblower_policy | conflict_of_interest_policy | records_retention_policy | SOX_policies | SOX_policies_binary | SOX_policies_all_binary | tot_rev | tot_rev_no_neg | total_revenue_logged | total_expenses | program_expenses | program_efficiency | complexity | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | FY2012 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2515399 | 2515399 | 14.737942 | 2100922.0 | 1598626.0 | 0.760916 | 2 |
433 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | FY2011 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2161209 | 2161209 | 14.586178 | 2481310.0 | 1715793.0 | 0.691487 | 0 |
missing_47 = pd.read_excel('47 missing SOX_updated.xls')
print len(missing_47)
missing_47.head()
47
CN org_id | EIN | FYE (choose latest available on CN) | donor_advisory | conflict_of_interest | records_retention | whistleblower | complexity | total_revenue | program_efficiency | program_expense | total_expense | Unnamed: 12 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10087 | 581925867 | 2015 | 1 | 1.0 | 0.0 | 0.0 | 5.0 | 445953.0 | NaN | 229316.0 | 370526.0 | NaN |
1 | 10552 | 942719901 | 2012 | 1 | 1.0 | 0.0 | 1.0 | 3.0 | 3935913.0 | NaN | 3842824.0 | 4134682.0 | NaN |
2 | 10902 | 262224994 | 2014 | 1 | 1.0 | 1.0 | 1.0 | 3.0 | 706895.0 | NaN | 356046.0 | 565973.0 | NaN |
3 | 11009 | 953523852 | 2014 | 1 | 1.0 | 1.0 | 1.0 | 2.0 | 3620634.0 | NaN | 490708.0 | 3455917.0 | NaN |
4 | 11327 | 720760857 | 2014 | 1 | 1.0 | 1.0 | 1.0 | 5.0 | 4244456.0 | NaN | 4210946.0 | 4942239.0 | NaN |
missing_47_EINs = missing_47['EIN'].tolist()
missing_47_EINs = [str(x) for x in missing_47_EINs]
print len(missing_47_EINs)
print len(set(missing_47_EINs))
missing_47_EINs[:5]
47 47
['581925867', '942719901', '262224994', '953523852', '720760857']
42 of the 47 EINs are in the e-file database.
print len(df[df['EIN'].isin(missing_47_EINs)])
print len(df[df['EIN'].isin(missing_47_EINs)]['EIN'].tolist())
print len(set(df[df['EIN'].isin(missing_47_EINs)]['EIN'].tolist()))
164 164 42