In biochemistry, the Michaelis–Menten equation is one of the best-known models of enzyme kinetics. The model takes the form of an equation describing the rate of enzymatic reactions by relating the reaction rate $v$ to $[S]$, the concentration of a substrate $S$. The velocity $v$ is parameterized by the turnover number $k_{cat}$ and the Michaelis constant $K_M$. The most useful form of the equation for the biochemist relates the quotient of the velocity $v$ and the enzyme concentration $[E]$ to the substrate concentration $[S]$, the catalytic rate $k_{cat}$, and the Michaelis constant $K_M$ by
$$\frac{v}{[E]}=\frac{k_{cat}[S]}{K_M + [S]}$$

This notebook is a tutorial for the experimental biochemist who has collected data on the rate of product formation under different concentrations of substrate, as we frequently do in the Siegel group. Let's begin with the data workup.
Our UV/vis plate assay follows the accumulation of 4-nitrophenol (pNP) and reports rates in units of OD/min. Since what we actually care about is product concentration (not OD), we need to convert these rates to M/min. A standard curve relating OD to product concentration lets us transform the OD/min data into M/min.
In this example, we have experimentally determined that

$$\textrm{pNP}\ \left(\frac{\textrm{M}}{\textrm{min}}\right) = 0.0002 \times \textrm{pNP}\ \left(\frac{\textrm{OD}}{\textrm{min}}\right)$$
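As a quick worked example (the 0.5 OD/min reading here is made up for illustration, not taken from the example data), a well accumulating product at 0.5 OD/min corresponds to

$$0.5\ \frac{\textrm{OD}}{\textrm{min}} \times 0.0002\ \frac{\textrm{M/min}}{\textrm{OD/min}} = 1 \times 10^{-4}\ \frac{\textrm{M}}{\textrm{min}} = 100\ \frac{\mu\textrm{M}}{\textrm{min}}$$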
In this case, the extinction coefficient of BglB is 113,330 M$^{-1}$cm$^{-1}$, and we diluted the enzyme twice: first 100-fold, and then 4-fold in the assay plate.
These unit conversions have already been performed on the example data in example_data.csv. If they hadn't been, we could use Python code like the following to do the conversions for us.
ext_coef = 113330              # extinction coefficient of BglB (M^-1 cm^-1)
protein_yield = 1              # purified protein yield (mg/mL)
# account for the 100-fold and 4-fold dilutions
protein_conc = protein_yield / ext_coef / 100 / 4
# apply the standard curve (OD/min to M/min), then normalize by [E]
df.rate = 0.0002 * df.rate / protein_conc
Let's generate a plot of our data using the Python package matplotlib to see if it will fit the Michaelis–Menten equation. First, let's import the packages we'll need and do some IPython magic to get the plots to display on this page.
import pandas
%matplotlib inline
Next, we'll read the example data into a Pandas DataFrame, a data structure similar to a table that's a whole lot smarter and more useful. You can call the DataFrame whatever you like; I'll call this one df.
df = pandas.read_csv( 'example_data.csv' )
Now that we have our data in a DataFrame, we can use the DataFrame's built-in plot() method to make a quick scatter plot of our data and display it on screen.
df.plot( x='substrate', y='rate', kind='scatter' )
[Output: a scatter plot of rate versus substrate concentration]
That looks like it will fit the Michaelis–Menten equation. Since we've converted all our values to the same units, we can estimate $k_{cat}$ and $K_M$ just by looking at the plot. I estimate $k_{cat}$ to be about 700 min$^{-1}$ and $K_M$, the substrate concentration where the rate is half its maximum, to be about 0.005 M, or 5 mM.
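Both numbers come straight from the limiting behavior of the equation: at saturating substrate the rate plateaus at $k_{cat}$, and at $[S] = K_M$ the rate is exactly half-maximal, since

$$\lim_{[S] \to \infty} \frac{v}{[E]} = k_{cat}, \qquad \frac{k_{cat} K_M}{K_M + K_M} = \frac{k_{cat}}{2}$$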
Next, we'll use the Python module SciPy to perform a nonlinear least-squares optimization and determine the values of $k_{cat}$ and $K_M$ that best fit our data. First, we'll import SciPy's curve_fit and a couple of NumPy functions that we'll need later
from scipy.optimize import curve_fit
from numpy import diag, sqrt
and define the Michaelis–Menten equation in Python code
def v( s, kcat, km ):
    # Michaelis–Menten rate law: v/[E] = kcat * [S] / (KM + [S])
    return ( kcat * s ) / ( km + s )
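As a quick sanity check (this line isn't part of the analysis proper), we can confirm that the function returns half the maximal rate when the substrate concentration equals $K_M$, using our rough plot estimates for the parameters
print( v( 0.005, 700, 0.005 ) )    # prints 350.0, which is 700 / 2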
We need to provide curve_fit with initial guesses at the parameters, which we estimated from the scatter plot above: the value for $k_{cat}$ (the maximum rate observed) appears to be about 700 min$^{-1}$, and the value for $K_M$ looks to be about 0.005 M. Put these into a new tuple called p0.
p0 = ( 700, 0.005 )
Now we're ready to use curve_fit. The curve_fit documentation indicates that the function returns two arrays, called popt (the optimal parameters) and pcov (the parameter covariance). From the documentation:
popt : array
    Optimal values for the parameters so that the sum of the squared error of f(xdata, *popt) - ydata is minimized.
pcov : 2d array
    The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov)).
I recommend setting up your code to have the function return into two variables called popt and pcov.
In this next step, we'll perform the optimization and convert our one standard deviation errors into percent, which I find easier to make sense of for large data sets.
popt, pcov = curve_fit( v, df.substrate, df.rate, p0=p0 )
perr = sqrt( diag( pcov ) )    # one standard deviation errors
# sanity check: discard any parameter whose error exceeds its estimate,
# then convert the remaining errors to percent
for i in range( len( popt ) ):
    if not popt[ i ] or perr[ i ] > popt[ i ]:
        popt[ i ] = perr[ i ] = float( 'nan' )
    else:
        perr[ i ] = perr[ i ] / popt[ i ] * 100
Whew! That's it! Let's print out our results in a nice format, rounding numbers as necessary
results = {
    'kcat': '{:0.1f}'.format( popt[0] ),
    'kcat_std_err': '{:0.1f}%'.format( perr[0] ),
    'km': '{:0.4f}'.format( popt[1] ),
    'km_std_err': '{:0.1f}%'.format( perr[1] )
}
print( pandas.Series( results ) )
kcat             693.4
kcat_std_err      1.8%
km              0.0045
km_std_err        7.2%
dtype: object
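It's also worth overlaying the fitted curve on the data to judge the fit by eye. This plot wasn't part of the analysis above; it's just a minimal sketch using matplotlib's pyplot interface and NumPy's linspace
from numpy import linspace
from matplotlib import pyplot

# evaluate the fitted curve on a fine grid of substrate concentrations
s = linspace( 0, df.substrate.max(), 100 )
pyplot.scatter( df.substrate, df.rate )    # the measured points
pyplot.plot( s, v( s, *popt ) )            # the fitted curve
pyplot.xlabel( 'substrate (M)' )
pyplot.ylabel( 'rate (1/min)' )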
OK, that's a nice simple example, but what if you have collected data on more than one sample and want to fit them all? We can write a slightly more sophisticated program by wrapping our previous code in a function and then using the pandas.DataFrame.groupby() method to apply it to each of our samples.
To write this function, all we need to consider is a Pandas DataFrame object, df, exactly as we already have.
def try_michaelis_menten_fit( df ):
    popt, pcov = curve_fit( v, df.substrate, df.rate, p0=p0 )
    perr = sqrt( diag( pcov ) )    # one standard deviation errors
    # sanity check: discard any parameter whose error exceeds its estimate,
    # then convert the remaining errors to percent
    for i in range( len( popt ) ):
        if not popt[ i ] or perr[ i ] > popt[ i ]:
            popt[ i ] = perr[ i ] = float( 'nan' )
        else:
            perr[ i ] = perr[ i ] / popt[ i ] * 100
    # format results
    results = {
        'kcat': '{:0.1f}'.format( popt[0] ),
        'kcat_std_err': '{:0.1f}%'.format( perr[0] ),
        'km': '{:0.4f}'.format( popt[1] ),
        'km_std_err': '{:0.1f}%'.format( perr[1] )
    }
    return pandas.Series( results )
All we've done here is take our previous code and wrap it in a function called try_michaelis_menten_fit that takes a DataFrame df and returns a Series.
Try running it on the example data and see that it returns the same values.
example_df = pandas.read_csv( 'example_data.csv' )
try_michaelis_menten_fit( example_df )
kcat             693.4
kcat_std_err      1.8%
km              0.0045
km_std_err        7.2%
dtype: object
Wrapping the code in the function try_michaelis_menten_fit() allows us to reuse it on new DataFrames, but it's not very useful until we have some more samples in our data set. You'll find experimental data containing multiple samples in multiple_example_data.csv.
We're also going to need some help looking at plots of a lot more data (about 100-fold more). We'll recruit Seaborn for this.
import seaborn
multiple_example_df = pandas.read_csv( 'multiple_example_data.csv' )
seaborn.lmplot( fit_reg=False, col='sample', col_wrap=3, x='substrate', y='rate', data=multiple_example_df, sharey=False, size=2.3 )
[Output: a grid of scatter plots, one panel per sample]
Whew, that's a lot of data, representing a lot of work by a lot of people! Many of these look like they will fit the Michaelis–Menten equation; some don't. Our sanity-checking code will return NaN (short for "not a number") for samples that don't fit the equation and for samples where the magnitude of the error is greater than the magnitude of the parameter.
Let's try applying our try_michaelis_menten_fit() function to the DataFrame containing our data.
multiple_example_df.groupby( 'sample' ).apply( try_michaelis_menten_fit )
sample | kcat | kcat_std_err | km | km_std_err
---|---|---|---|---
a192s | 945.8 | 1.0% | 0.0051 | 3.6% |
c167a | 478.9 | 3.0% | 0.0146 | 8.7% |
c167n+w120r | 0.4 | 21.9% | 0.0262 | 56.0% |
c167q | 503.7 | 1.1% | 0.0049 | 3.9% |
d403a | 1.2 | 22.7% | 0.0440 | 49.1% |
e154d | 878.1 | 5.3% | 0.0035 | 21.9% |
e164a | 0.2 | 3.6% | 0.0010 | 16.9% |
e177a | 986.2 | 1.0% | 0.0060 | 3.8% |
e177k | 554.9 | 1.3% | 0.0062 | 4.8% |
e177l | 669.5 | 1.5% | 0.0075 | 4.9% |
e17s | 641.2 | 1.5% | 0.0073 | 5.2% |
e222a | 89.9 | 4.7% | 0.0006 | 24.7% |
e222h | 159.7 | 2.0% | 0.0085 | 6.2% |
e222k | 108.0 | 2.9% | 0.0072 | 10.4% |
e222q | 667.6 | 1.7% | 0.0122 | 5.4% |
e222r | 42.3 | 4.4% | 0.0025 | 17.9% |
e222y | 11.7 | 6.6% | 0.0184 | 17.0% |
e353a | nan | nan% | nan | nan% |
e406a | 1.0 | 45.3% | 0.1175 | 70.1% |
e406d | 39.1 | 3.2% | 0.0341 | 7.5% |
e423s | 645.6 | 1.9% | 0.0066 | 6.3% |
f405a | 0.6 | 14.0% | 0.0035 | 43.9% |
f415a | 1.3 | 8.2% | 0.0166 | 24.0% |
f75a | 613.3 | 1.4% | 0.0055 | 5.2% |
g355a | 1.0 | 8.2% | 0.0059 | 27.9% |
h101r | 1059.1 | 1.5% | 0.0106 | 5.0% |
h119a | 143.2 | 7.4% | 0.0151 | 22.2% |
h119e | 0.7 | 22.3% | 0.1768 | 30.3% |
h119n | 1.9 | 3.6% | 0.0232 | 9.5% |
h178a | 112.8 | 3.0% | 0.0077 | 9.5% |
... | ... | ... | ... | ... |
s400a | 530.7 | 1.7% | 0.0032 | 7.0% |
t175r | 801.3 | 1.0% | 0.0036 | 4.2% |
t218a | 464.4 | 4.0% | 0.0065 | 14.4% |
t296a | 109.4 | 2.1% | 0.0110 | 7.0% |
t352a | 59.5 | 4.0% | 0.0143 | 12.3% |
v147s | 4.6 | 2.6% | 0.0065 | 9.6% |
v52g | 687.5 | 1.9% | 0.0082 | 6.6% |
w120a | 66.0 | 2.8% | 0.0667 | 5.3% |
w120f | 471.8 | 4.4% | 0.0161 | 12.9% |
w120h | 84.1 | 2.9% | 0.0892 | 4.8% |
w325a | 29.4 | 3.0% | 0.0016 | 14.0% |
w325c | 10.5 | 3.2% | 0.0042 | 12.7% |
w325h | 34.9 | 3.4% | 0.0031 | 14.1% |
w325l | 108.5 | 1.7% | 0.0057 | 6.1% |
w34a | 0.5 | 52.5% | nan | nan% |
w399a | 0.2 | 5.2% | 0.0166 | 15.2% |
w399c | 2.7 | 4.6% | 0.0703 | 8.4% |
w399g | 0.1 | 9.9% | 0.0001 | 63.7% |
w399s | 0.2 | 29.8% | 0.0258 | 70.4% |
w407a | 11.4 | 4.3% | 0.0076 | 15.2% |
w407g | nan | nan% | nan | nan% |
w407q | 0.5 | 17.6% | 0.0081 | 58.8% |
w407r | 1.0 | 86.3% | nan | nan% |
w409a | 0.4 | 31.7% | 0.0311 | 76.8% |
y166p | 26.5 | 4.5% | 0.0025 | 18.1% |
y18a | 196.5 | 4.7% | 0.0315 | 11.5% |
y294a | 165.6 | 1.0% | 0.0050 | 3.4% |
y294f | 734.9 | 1.5% | 0.0060 | 5.4% |
y295a | 0.2 | 12.3% | 0.0130 | 38.2% |
y295g | 0.1 | 27.5% | nan | nan% |
103 rows × 4 columns
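If you'd like to keep these fits around, the object returned by groupby().apply() is itself a DataFrame, so you can write it straight to disk with to_csv(). The filename here is just a suggestion
fits = multiple_example_df.groupby( 'sample' ).apply( try_michaelis_menten_fit )
fits.to_csv( 'michaelis_menten_fits.csv' )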
Question: Why is estimating the parameters faster than drawing the plots?