Another quick look at the 2013 summary data about current PFI contracts, this time focusing on the special purpose vehicles (SPVs) set up to operate the projects.
Not all projects in the spreadsheets have a declared SPV, but some of them do - and where they do, the company number is generally given, which is handy:-)
In this notebook, I'll have a quick look at some of the things we might be able to learn about the SPVs, and perhaps something of the corporate structures they're enmeshed in, from open data retrieved from OpenCorporates.
#Load in some libraries to help us work with the data
import pandas as pd
import networkx as nx
#Data source - download the PFI summary data to the data/ directory
#wget https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/267640/pfi_current_projects_list_march_2013_published_dec_2013.xlsx -P data
#Import the data we've just downloaded - by insprection, the first row is ignorable
df=pd.read_excel('data/pfi_current_projects_list_march_2013_published_dec_2013.xlsx',skiprows=1)
df[:3]
#Look to see if there are any obvious 'not known' style beacons in the SPV company number column
df['SPV company number'].value_counts()
#To make life easier, produce a view over the data that contains things that might have SPV company numbers available
spv=df.dropna(subset=['SPV company number'])
spv=spv[~spv['SPV company number'].isin(['N/K',0,'0','not known'])]
#Preview the first few rows
spv[:3]
The SPV company numbers given for some of the PFI contracts can be used to pull down company information from OpenCorporates.
The heuristic (rule of thumb) I'm going to use in this notebook is that the directors that are appointed in an SPV vehicle may also be appointed en masse to other companies associated with the SPV. (Actually, this is probably testable as a hypothesis!)
What I intend to look for is companies that share all, or a significant number of, directors with the SPV. This is based in part upon an assumption about power and control, in the sense that directors are legally accountable for the activities of a company and are used to assert control over it (erm, is that actually true?! They are surely liable, at least? I need to check this...) If the set of directors of two companies are largely the same, then as a working assumption we might decide that the companies are working to similar ends, either as individual companies, or as part of the operation of a wider corporate grouping, a grouping that in part can be identfied through the directors.
#We're going to grab some JSON data from the OpenCorporates API - so load in some helper libraries
import requests, json
def getOCdata(cid):
''' Get company data from OpenCorporates given OpenCorporates id '''
apiurl='https://api.opencorporates.com/v0.3'
r=requests.get(apiurl+cid)
return json.loads(r.text)['results']
def getOCdataFromCompanyNumber(cid,territory='gb'):
return getOCdata('/'.join(['/companies',territory,cid]))
def companyReport(ocdata):
c=ocdata['company']
txt='{0} (company number: {1}) of {2}, is an {3} company originally incorporated on {4}.'.format(c['name'],c['company_number'], c['registered_address_in_full'],c['current_status'],c['incorporation_date'])
print(txt)
pyramid=getOCdataFromCompanyNumber('5232548')
pyramid
companyReport(pyramid)
As well as obtaining information about registered companies, we can also look up information about directors.
In the company data, a list of officers is available with a separate ID identifying a particular officer in a particular role appointed to a particular company for a particular appointment period (which may be previous/closed or ongoing).
Officers are listed with information about their role (for example, director, or secretary) as well as information (by implication) about whether they are currently active.
Let's create a function that pulls out a listing of the currently active directors of a company from the company data.
#We can tunnel into the OpenCorporates data to get directors information
def getCurrentOfficers(ocd,role=[]):
''' Get current officers from OpenCorporates company data '''
curr=[]
for o in ocd['company']['officers']:
if o['officer']['end_date']==None:
if role!=[]:
if o['officer']['position'] in role:
curr.append(o['officer'])
else: curr.append(o['officer'])
return curr
#getCurrentOfficers(ocd)
pyramiddf=pd.DataFrame.from_dict(getCurrentOfficers(pyramid,'director'))
pyramiddf
def directorsSummary(cdata):
cd=pd.DataFrame.from_dict(getCurrentOfficers(cdata,'director'))
txt= '{0} currently has {1} active directors ({2}).'.format(cdata['company']['name'],len(cd),', '.join(cd['name']))
print(txt)
directorsSummary(pyramid)
Let's start pulling together a few functions to help us work with the OpenCorporates API a little more.
APISTUB='http://api.opencorporates.com/v0.3'
#To make calling the OpenCorporates API easier, we can register an account and get an API key
#I store my key in a key file and load it in as required
def getKey():
json_data=open('data/keys.json')
return json.load(json_data)['ockey']
ockey=getKey()
def deslash(x): return x.strip('/')
def signed(url,sig=True):
if sig: return url+'?api_token='+ockey
return url
def occStrip(ocURL):
return deslash(ocURL.replace('http://opencorporates.com/companies',''))
#Old code - needs revising to use requests library more idiomatically
def buildURL(items):
url=APISTUB
for i in items:
url='/'.join([url,deslash(i)])
return signed(url)
def getOCofficerData(ocid):
''' Function to get data from the OpenCorporates API about a director, by director ID'''
ocurl=buildURL(['officers',ocid])
r=requests.get(ocurl)
ocdata= json.loads(r.text)
return ocdata['results']
The getOCofficerData()
function allows us to get information from OpenCorporates about a director given their OpenCorporates director ID.
getOCofficerData('97067020')
The following fucntion is largely reused from some old code... The idea is that it allows you to look up a director by name on OpenCorporates and pull back records for officer appointments made to an officer with exactly the same name (case insensitively, I think?).
required=['director']
def getOCofficerCompaniesSearch(name,page=1,cidnames=pd.DataFrame(columns=['oname','ocurl','cname'])):
url=APISTUB+'/officers/search'
r=requests.get(url,params={'api_token':getKey(),'q':'"'+name+'"','per_page':100,'page':page})
#print(r.url)
ocdata= json.loads(r.text)['results']
for officer in ocdata['officers']:
if name==officer['officer']['name']:
if required!=[] and officer['officer']['position']in required:
#print 'Possible new company for',name,officer['officer']['company']['name']
#would a nominated secretary be interesting to search on? eg FIRST SECRETARIES LIMITED
#cidnames.append( ( occStrip(officer['officer']['company']['opencorporates_url']), occStrip(officer['officer']['company']['name']) ) )
cidnames=pd.concat([cidnames, pd.DataFrame([(name,occStrip(officer['officer']['company']['opencorporates_url']), occStrip(officer['officer']['company']['name']))],columns=['oname','ocurl','cname'])])
if page < ocdata['total_pages']:
#If there are lots of pages, eg for a nominee, we maybe need to warn the user....?
page=page+1
cidnames=getOCofficerCompaniesSearch(name,page,cidnames)
#http://api.opencorporates.com/v0.2/officers/search?q=john+smith
return cidnames
For the directors pulled back from a particular company, let's look up each director in turn to see what other companies they are current ("active") directors of, and then for each company we identify, count how many of the original set of directors are directors of those companies.
The assumption is, if an original company has N directors, and we find another company with at least the same N directors, maybe those companies share a common goal, or are part of a common project or corporate group.
on=pd.DataFrame()
for name in pyramiddf['name']:
on=pd.concat( [on, getOCofficerCompaniesSearch(name) ])
tmp=on.groupby(['cname','ocurl']).oname.nunique().order(ascending=False)
tmp
The oname
count is a count of the number of directors the company has in common with the original "target" company.
Let's see which companies have the same number of directors as the original company.
tmp[tmp==len(pyramiddf)].reset_index()
Let's see if we can pull all that into a single function, keyed by a company ID.
def voodooMagic(cnum):
cdata=getOCdataFromCompanyNumber(cnum)
companyReport(cdata)
directorsSummary(cdata)
directors=pd.DataFrame.from_dict(getCurrentOfficers(cdata,'director'))
directorCos=pd.DataFrame()
for name in directors['name']:
directorCos=pd.concat( [directorCos, getOCofficerCompaniesSearch(name) ])
sharedCos=directorCos.groupby(['cname','ocurl']).oname.nunique().order(ascending=False)
sharedCos=sharedCos[sharedCos==len(directors)].reset_index()
return directors,sharedCos,directorCos
directors,sharedCos,directoCos=voodooMagic('5232548')
print(directors)
print(sharedCos)
directors,sharedCos,directorCos=voodooMagic('4934390')
print(directors)
print(sharedCos)
directorCos.groupby(['cname','ocurl']).oname.nunique().order(ascending=False)[:15]
directors,sharedCos,directorCos=voodooMagic('07417700')
print(directors)
print(sharedCos)
directorCos.groupby(['cname','ocurl']).oname.nunique().order(ascending=False)[:15]
directors,sharedCos,directorCos=voodooMagic('8169805')
print(directors)
print(sharedCos)
directorCos.groupby(['cname','ocurl']).oname.nunique().order(ascending=False)[:15]