Note: This is the instructor copy, for the tutorial use the participant copy (which is based on this copy with most of the code removed).
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.core.display import Image
import MDAnalysis as mda
print(mda.__version__)
0.12.1
Defined datapath
to point to the directory where you downloaded the tutorial trajectories to.
import os.path
datapath = os.path.expanduser("~/Workshops/CECAM/tutorial/sandbox")
datapath
'/Users/oliver/Workshops/CECAM/tutorial/sandbox'
Look again at AdK trajectories...
eqtop = "../../sandbox/equilibrium/adk4AKE.psf"
eqtrj= "../../sandbox/equilibrium/1ake_007-nowater-core-dt240ps.dcd"
top = "../../sandbox/dims/adk4ake.psf"
trj = "../../sandbox/dims/dims_co_001.dcd"
#coDIMS = [os.path.join(datadir, "dims", "dims_co_{0:03n}.dcd".format(i)) for i in range(1,4)]
#ocDIMS = [os.path.join(datadir, "dims", "dims_oc_{0:03n}.dcd".format(i)) for i in range(1,4)]
u = mda.Universe(top, trj)
u
<Universe with 3341 atoms and 3365 bonds>
eq = mda.Universe(eqtop, eqtrj)
eq
<Universe with 3341 atoms and 3365 bonds>
The Universe.trajectory
object is the entry point to all trajectory functionality:
u.trajectory
< DCDReader '../../sandbox/dims/dims_co_001.dcd' with 102 frames of 3341 atoms>
Trajectories have a length in frames and in ps (the base MDAnalysis time unit):
u.trajectory.n_frames
102
len(u.trajectory)
102
u.trajectory.totaltime
101.99999383947338
Timestep
¶They also contain a Timestep
object, which holds the current
[A, B, C, alpha, beta, gamma]
ts = u.trajectory.ts
ts
< Timestep 0 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
ts.frame
0
ts.time
0.0
ts.dimensions
array([ 0., 0., 0., 90., 90., 90.], dtype=float32)
eq.trajectory.ts.dimensions
array([ 85.53471375, 85.53471375, 85.53471375, 90. , 90. , 90. ], dtype=float32)
ts.positions
array([[ 26.87939835, 53.00944901, 40.99979401], [ 26.80416107, 52.05094147, 41.55620956], [ 26.58082199, 53.76752472, 41.59049988], ..., [ 14.87982082, 56.23511505, 39.41582489], [ 14.59319496, 57.44499207, 39.77984619], [ 15.05454922, 55.95846176, 38.18389893]], dtype=float32)
How many coordinates?
ts.positions.shape
(3341, 3)
ts.velocities
--------------------------------------------------------------------------- NoDataError Traceback (most recent call last) <ipython-input-30-d44cc7e6970a> in <module>() ----> 1 ts.velocities /Users/oliver/Library/Python/2.7/lib/python/site-packages/MDAnalysis-0.12.1-py2.7-macosx-10.6-x86_64.egg/MDAnalysis/coordinates/base.pyc in velocities(self) 522 return self._velocities 523 else: --> 524 raise NoDataError("This Timestep has no velocities") 525 526 @velocities.setter NoDataError: This Timestep has no velocities
ts.forces
--------------------------------------------------------------------------- NoDataError Traceback (most recent call last) <ipython-input-31-df2ea954b43c> in <module>() ----> 1 ts.forces /Users/oliver/Library/Python/2.7/lib/python/site-packages/MDAnalysis-0.12.1-py2.7-macosx-10.6-x86_64.egg/MDAnalysis/coordinates/base.pyc in forces(self) 569 return self._forces 570 else: --> 571 raise NoDataError("This Timestep has no forces") 572 573 @forces.setter NoDataError: This Timestep has no forces
What is the time between frames?
ts.dt
0.9999999396026802
Universe.trajectory
behaves a bit like a list of Timesteps:
Timestep
: go to frameTimestep
: go to each frame in sequenceTrajectory indexing and slicing uses 0-based indices (as in standard Python) and MDAnalysis also numbers frames starting with 0. Thus the "tenth frame" in a trajectory has ts.frame == 9
.
Let's move to an arbitrary frame in the trajectory:
u.trajectory[42]
< Timestep 42 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
u.trajectory.ts.frame
42
ts.frame
42
u.trajectory[-23]
< Timestep 79 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
u.trajectory.ts
< Timestep 79 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
Note that you cannot "remember" a Timestep
: The following will not keep a copy of the timeset at frame 20:
ts20 = u.trajectory[20]
ts20.frame
20
u.trajectory[42]
< Timestep 42 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
ts20
< Timestep 42 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
If you really want to, you can make a copy (but this is not used very much):
ts42copy = u.trajectory.ts.copy()
ts42copy.frame
42
u.trajectory[2]
< Timestep 2 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
ts42copy.frame
42
The most common usage pattern in MDAnalysis: Access each frame in a trajectory:
results = []
for ts in u.trajectory:
results.append(ts.frame)
results = np.array(results)
results
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101])
... or arbitrary slices:
results = np.fromiter((ts.frame for ts in u.trajectory[10::20]), dtype=np.int64)
results
array([10, 30, 50, 70, 90])
... and for numpy afficionados, "fancy indexing" also works:
results = np.fromiter((ts.frame for ts in u.trajectory[[10, 20, 10, 0]]), dtype=np.int64)
results
array([10, 20, 10, 0])
Note: It does not make sense to create a list of timesteps like
timesteps = [ts for ts in u.trajectory]
or
timesteps = list(u.trajectory)
because each list item will refer to the same currently active timestep (the first one because at the end of iteration the trajectory rewinds).
Produce an array from ts.frame
that is reversed, ie [101, 100, ..., 1, 0]
results = np.array([ts.frame for ts in reversed(u.trajectory)])
results
array([101, 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
np.array([ts.frame for ts in u.trajectory[::-1]])
array([101, 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
Note: slicing and indexing works for many major trajectory formats (dcd, xtc, trr, Amber netcdf) but not for all of them (although work is in progress...)
Atomgroup
s¶Using the raw Timestep
positions is cumbersome. Moving the time step automatically updates all positions that Atomgroup.positions
sees at this instance:
u.trajectory[0]
< Timestep 0 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
protein = u.select_atoms("protein")
protein
<AtomGroup with 3341 atoms>
Calculate the radius of gyration
\begin{equation*}
R_{\text{gyr}} = \sqrt{\frac{1}{M}\sum_{i=1}^{N}m_{i}(\textbf{r}_{i} - \textbf{R})^{2}}
\end{equation*}
with AtomGroup.radius_of_gyration()
:
protein.radius_of_gyration()
16.686297031597615
What's the current frame?
u.trajectory.ts.frame
0
Move to the last one:
u.trajectory[-1]
< Timestep 101 with unit cell dimensions [ 0. 0. 0. 90. 90. 90.] >
protein.radius_of_gyration()
19.579294280925577
Remember: All coordinate-dependent properties of an AtomGroup
change when the trajectory index is changed. Think of a pointer moving along the trajectory file. AtomGroup
fetches coordinates from the frame that the pointer points to.
Collect (time, radius_of_gyration)
for the protein for frames and plot $R_{\mathrm{gyr}}(t)$, with time in ps.
protein = u.select_atoms("protein")
results = []
for ts in u.trajectory[:500]:
results.append((ts.time, protein.radius_of_gyration()))
time, rgyr = np.transpose(results)
ax = plt.subplot(111)
ax.plot(time, rgyr)
ax.set_xlabel("time (ps)")
ax.set_ylabel(r"$R_{\mathrm{gyr}}$ ($\AA$)")
<matplotlib.text.Text at 0x7f8de8c72910>
peq = eq.select_atoms("protein")
eqrgyr = np.array([(ts.time, peq.radius_of_gyration()) for ts in eq.trajectory]).transpose()
plt.plot(eqrgyr[0], eqrgyr[1])
[<matplotlib.lines.Line2D at 0x7f8de8981410>]
theta_NMP()
and theta_LID()
.import numpy as np
from numpy.linalg import norm
def theta_NMP(ag):
"""Calculate the NMP-CORE angle for E. coli AdK in degrees"""
C = ag.select_atoms("resid 115:125 and backbone").center_of_geometry()
B = ag.select_atoms("resid 90:100 and backbone").center_of_geometry()
A = ag.select_atoms("resid 35:55 and backbone").center_of_geometry()
BA = A - B
BC = C - B
theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
return np.rad2deg(theta)
def theta_LID(ag):
"""Calculate the LID-CORE angle for E. coli AdK in degrees"""
C = ag.select_atoms("resid 179:185 and backbone").center_of_geometry()
B = ag.select_atoms("resid 115:125 and backbone").center_of_geometry()
A = ag.select_atoms("resid 125:153 and backbone").center_of_geometry()
BA = A - B
BC = C - B
theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
return np.rad2deg(theta)
data = np.array([[ts.time, theta_NMP(u), theta_LID(u)] for ts in u.trajectory])
data[:3]
array([[ 0. , 43.29564285, 106.30675507], [ 0.99999994, 44.1594696 , 106.80901337], [ 1.99999988, 44.57506561, 108.23643494]])
time, aNMP, aLID = data.transpose()
ax1 = plt.subplot(121)
ax1.plot(time, aNMP, label="NMP")
ax1.plot(time, aLID, label="LID")
ax1.set_xlabel("time (ps)")
ax1.set_ylabel("angle (degree)")
ax1.legend(loc="best")
ax2 = plt.subplot(122)
ax2.plot(aNMP, aLID)
ax2.set_xlabel(r"$\theta_\mathrm{NMP}$")
ax2.set_ylabel(r"$\theta_\mathrm{LID}$")
ax2.set_aspect(1)
plt.tight_layout()
ax = plt.subplot(111)
ax.plot(rgyr-rgyr[0], aNMP-aNMP[0], label="NMP")
ax.plot(rgyr-rgyr[0], aLID-aLID[0], label="LID")
ax.set_xlabel(r"$\Delta R_\mathrm{gyr}$ ($\AA$)")
ax.set_ylabel(r"$\Delta\theta$ (degrees)")
ax.legend(loc="best")
<matplotlib.legend.Legend at 0x7f8de8401050>
Bonus: note, only use first 200 frames of the equilibrium trajectory.
eqdata = np.array([[ts.time, theta_NMP(eq), theta_LID(eq)] for ts in eq.trajectory[:200]]).transpose()
ax1 = plt.subplot(121)
ax1.plot(eqdata[0]/1000, eqdata[1], label="NMP")
ax1.plot(eqdata[0]/1000, eqdata[2], label="LID")
ax1.set_xlabel("time (ns)")
ax1.set_ylabel("angle (degrees)")
ax2 = plt.subplot(122)
ax2.plot(eqdata[1], eqdata[2])
ax2.plot(aNMP, aLID, color="red")
ax2.set_xlabel(r"$\theta_\mathrm{NMP}$")
ax2.set_ylabel(r"$\theta_\mathrm{LID}$")
ax2.set_aspect(1)
plt.tight_layout()
If you have a quantity that is ceap to compute, you can use it to filter a trajectory:
peq = eq.select_atoms("protein")
closed = [ts.frame for ts in eq.trajectory[:100] if peq.radius_of_gyration() < 19]
u.trajectory[closed]
<generator object listiter at 0x7f8de88b8d70>
plt.plot(eqdata[0, closed], eqdata[1, closed], '+')
plt.plot(eqdata[0, closed], eqdata[2, closed], '+')
[<matplotlib.lines.Line2D at 0x7f8de88544d0>]
One can easily modify the coordinates of the current timestep: Either directly in Timestep.positions
or via AtomGroup
methods such as translate
or rotate
.
In particular, we can optimally superimpose RMSD fitting.
import MDAnalysis.analysis.rms as rms
We want to calculate the RMSD for an optimal superposition of the C-alpha atoms. As a reference, we take the coordinates of the initial frame of the trajectory:
ca = u.select_atoms("name CA")
u.trajectory[0]
ref = ca.positions.copy()
ref_com = ca.center_of_mass()
Remove translations by moving to the center of mass of the reference:
u.trajectory[-1]
ca.translate(-ca.center_of_mass() + ref_com)
array([-0.10115989, 0.64150608, 1.31268827])
Then calculate the RMSD after optimum RMSD superposition with rmsd()
:
rms.rmsd(ca.positions, ref)
7.302982419868816
Actually, rmsd()
has the center keyword that does a superposition for us:
u.trajectory[20]
rms.rmsd(ca.positions, ref, center=True)
2.221923547136809
u.trajectory[-2]
rms.rmsd(ca.positions, ref, center=True)
6.804156136021932
You can manually superimpose using translation and the optimal rotation matrix (from rms.rotation_matrix()
) but for many use cases see the contents of the MDAnalysis.analysis.align module.
# maybe??? alignto()
Calculate $\mathrm{RMSD}(t)$ (relative to the initial frame) and plot it.
MDAnalysis.rms.RMSD
;-)data = np.array([[ts.time, rms.rmsd(ca.positions, ref, center=True)]
for ts in u.trajectory]).transpose()
data.shape
(2, 102)
ax1 = plt.subplot(111)
ax1.plot(data[0], data[1], label="DIMS")
ax1.set_xlabel("time (ps)")
ax1.set_ylabel(r"RMSD (\AA)")
ax1.legend(loc="best")
<matplotlib.legend.Legend at 0x7f8de78a8c90>
Writing out coordinates (David introduced the selection writer).
Basic pattern:
with
and context managern_atoms
(typically from the AtomGroup
that you are going to write)format
keyword argument if extension is not recognizedwrite()
each time step (or AtomGroup
)Converting trajectory formats:
xtc = "new.xtc"
with mda.Writer(xtc, n_atoms=u.atoms.n_atoms) as W:
for ts in u.trajectory:
W.write(ts)
!ls -l *.xtc
-rw-r--r-- 1 oliver oliver 1296904 Oct 13 15:01 new.xtc
uxtc = mda.Universe(psf, xtc)
uxtc
<Universe with 3341 atoms and 3365 bonds>
import os
os.unlink(xtc)
Working with many trajectories is tedious....