import pandas as pd
import matplotlib.cm
import matplotlib.pyplot as plt
import numpy as np
from string import upper
First some code that reads the data format supplied by Transparent California. These files have salary + benefits for every public employee in every city in California.
We filter this down to employees whose job title contains a department keyword -- here "police" -- and then bin their total compensation package to the nearest $10,000.
We group employees according to these bins so that we can count the number of employees who are paid in a certain range
def bin_pay(city, department, compensation):
df = pd.DataFrame.from_csv('%s-2012.csv' % city)
df['job_title'] = df['job_title'].apply(upper)
dept = df[df['job_title'].str.contains(department.upper())]
dept['binned_pay'] = dept[compensation].map(lambda x: (int(x) / 10000) * 10000)
dept = dept[dept['binned_pay'] > 0]
gb=dept.groupby(by=['binned_pay',])
mean_pay = dept[compensation].mean()
print "Average for %(city)s %(department)s $%(mean_pay)2.02f" % locals()
return gb['binned_pay'].count().apply(lambda n: (100 * ((n*1.0) / len(dept))))
Next, we enabling plotting a barchart with the distribution of pay in each city presented in a bar chart. We'll also print the average compensation by city. Note that the distributions in the chart are expressed in terms of a percentage of the employees who earn in a certain salary range. This enables us to compare larger departments (Oakland) with smaller ones (Albany).
def plot_compare(cities, department, compensation='total_pay_benefits'):
d = dict((city, bin_pay(city.lower().replace(' ', '-'), department, compensation)) for city in cities)
df = pd.DataFrame(data=d)
df.plot(kind='bar', colormap='winter', figsize=(18,12), subplots=True)
return df
Finally, we run the chart and also compute average compensation figures for each municipality.
We can see that Berkeley's pay is, on average, $40,000 greater than the average of another area city.
The charts reveal that this isn't only true of the average: the entire distribution is shifted out to the right (overall higher compensation) in Berkeley.
df = plot_compare(('Berkeley', 'Albany', 'El Cerrito', 'Richmond', 'Piedmont', 'Oakland', 'Emeryville'), 'police')
Average for berkeley police $226967.38 Average for albany police $150451.74 Average for el-cerrito police $148323.92 Average for richmond police $178934.00 Average for piedmont police $158073.18 Average for oakland police $138423.22 Average for emeryville police $173393.48
Average base compensation is higher too and the bell curve is once again further to the right than any other city.
df = plot_compare(('Berkeley', 'Albany', 'El Cerrito', 'Richmond', 'Piedmont', 'Oakland', 'Emeryville'),
'police',
compensation='base_pay')
Average for berkeley police $112893.68 Average for albany police $94725.92 Average for el-cerrito police $85002.56 Average for richmond police $94255.76 Average for piedmont police $88705.79 Average for oakland police $78706.06 Average for emeryville police $97243.20
Total benefits are also higher in Berkeley, both at the mean and for the bell curve as a whole.
df = plot_compare(('Berkeley', 'Albany', 'El Cerrito', 'Richmond', 'Piedmont', 'Oakland', 'Emeryville'),
'police',
compensation='total_benefits')
Average for berkeley police $86600.50 Average for albany police $43868.83 Average for el-cerrito police $53698.88 Average for richmond police $48248.46 Average for piedmont police $58259.04 Average for oakland police $42775.73 Average for emeryville police $53930.69
Conclusion
Berkeley pays more in salary and benefits than any other area city for its police department payroll. The phenomenon isn't limited to a few highly compensated employees; instead, wages are higher across the board.
The evidence suggests that Berkeley is paying its police force considerably more than the market requires. Police in Piedmont, Albany, El Cerrito, and so on are not dramatically less competent than in Berkeley, and yet are willing to be employed by those cities for far less money.