This notebook introduces the fundamental objects of MDAnalysis
: the Universe
, AtomGroup
, and Atom
objects. It also introduces the selection language, and the basics of reading and slicing trajectories.
Universes
and AtomGroups
¶"If you wish to make an apple pie from scratch, you must first invent the Universe."
~ Carl Sagan
One of the fundamental objects in the MDAnalysis
data model is the Universe
object. This contains all of a simulations' topology information at the least, but usually also includes trajectory information as well. A Universe
can be thought of as an interface to the raw details of a simulation.
First, we need to import MDAnalysis
, giving us access to all the components in its namespace:
import MDAnalysis as mda
In order to do anything, we do need some actual molecular dynamics data to work with. You should have downloaded a set of datafiles for this workshop; I placed mine in the following location relative to this notebook:
datapath = 'sandbox/'
To make a Universe, we need at the very least a topology file. We'll use one for a simulation system that includes adenylate kinase with the water removed:
import os
top = os.path.join(datapath, 'equilibrium/adk4AKE.psf')
u = mda.Universe(top)
print u
<Universe with 3341 atoms and 3365 bonds>
We now have a Universe
object. Since the PSF file we loaded contained both atom identities and bond information, the Universe
is able to access these details.
We can access all atoms in the Universe
through the Universe.atoms
attribute. This gives access to many details, including basic topology details for each atom.
u.atoms.indices
array([ 0, 1, 2, ..., 3338, 3339, 3340])
u.atoms.names
array(['N', 'HT1', 'HT2', ..., 'CA', 'HA1', 'HA2'], dtype='|S4')
u.atoms.resnames
array(['MET', 'MET', 'MET', ..., 'GLY', 'GLY', 'GLY'], dtype='|S3')
u.atoms.resids
array([ 1, 1, 1, ..., 214, 214, 214])
u.atoms.charges
array([-0.3 , 0.33, 0.33, ..., -0.02, 0.09, 0.09])
u.atoms.masses
array([ 14.007, 1.008, 1.008, ..., 12.011, 1.008, 1.008])
u.atoms.types
array(['NH3', 'HC', 'HC', ..., 'CT2', 'HB', 'HB'], dtype='|S4')
These are all properties of an AtomGroup
, of which Universe.atoms
is one such instance.
type(u.atoms)
MDAnalysis.core.AtomGroup.AtomGroup
Many of the methods of an AtomGroup
return numpy arrays of the same length as the AtomGroup
itself; that is, each element corresponds to each atom in the AtomGroup
, in order. That means that if we want only, say, the masses of the 3rd through the 11th atom in our topology, we could simply slice like so:
u.atoms.masses[2:11]
array([ 1.008, 1.008, 12.011, 1.008, 12.011, 1.008, 1.008, 12.011, 1.008])
Note: AtomGroup
indexes are 0-based, even if the topology file gives the atoms 1-based indices. This is for consistency with the rest of the Python ecosystem, including numpy
.
But AtomGroup
s themselves are actually sliceable, so we could instead do:
u.atoms[2:11].masses
array([ 1.008, 1.008, 12.011, 1.008, 12.011, 1.008, 1.008, 12.011, 1.008])
to achieve the same effect. Slicing of an AtomGroup
returns an AtomGroup
containing the atoms in the slice, so this gives a basic way of accessing a subset of the atoms present in the topology.
Fancy indexing also works. We can, for example, get an AtomGroup
containing the 3rd, 22nd, 3rd, and second-to-last atom in the Universe
with something like:
ag = u.atoms[[2, 21, 2, -2]]
ag.indices
array([ 2, 21, 2, 3339])
We can also use expressions that return boolean arrays to select out atoms of interest. We can, for exmple, get an AtomGroup
of all atoms in the system with mass less than 10 u with:
u.atoms[u.atoms.masses < 10]
<AtomGroup with 1685 atoms>
We can string together boolean expressions to obtain even more granular selections, for example getting all atoms with mass less than 10 u AND within an arginine residue:
u.atoms[(u.atoms.masses < 10) & (u.atoms.resnames == 'ARG')]
<AtomGroup with 169 atoms>
All topology properties of an AtomGroup
can be harnessed in this way to make complex subselections.
We can also work with individual atoms, though in practice this is rare, and in the future will be relatively inefficient when operating on multiple atoms compared to operating on a whole AtomGroup
. At the moment, however, these operations are comparable in speed.
a = u.atoms[0]
a
<Atom 1: N of type NH3 of resname MET, resid 1 and segid ADK>
print "name:", a.name
print "resid:", a.resid
print "resname:", a.resname
name: N resid: 1 resname: MET
import numpy as np
r = u.atoms[u.atoms.resnames == 'GLY'].resids
len(np.unique(r))
20
ResidueGroups
and SegmentGroups
¶The Universe
also gives higher-order topology objects, including ResidueGroups
and SegmentGroups
. We can access all residues in the Universe
with:
u.residues
<ResidueGroup [<Residue MET, 1>, <Residue ARG, 2>, <Residue ILE, 3>, <Residue ILE, 4>, <Residue LEU, 5>, <Residue LEU, 6>, <Residue GLY, 7>, <Residue ALA, 8>, <Residue PRO, 9>, <Residue GLY, 10>, <Residue ALA, 11>, <Residue GLY, 12>, <Residue LYS, 13>, <Residue GLY, 14>, <Residue THR, 15>, <Residue GLN, 16>, <Residue ALA, 17>, <Residue GLN, 18>, <Residue PHE, 19>, <Residue ILE, 20>, <Residue MET, 21>, <Residue GLU, 22>, <Residue LYS, 23>, <Residue TYR, 24>, <Residue GLY, 25>, <Residue ILE, 26>, <Residue PRO, 27>, <Residue GLN, 28>, <Residue ILE, 29>, <Residue SER, 30>, <Residue THR, 31>, <Residue GLY, 32>, <Residue ASP, 33>, <Residue MET, 34>, <Residue LEU, 35>, <Residue ARG, 36>, <Residue ALA, 37>, <Residue ALA, 38>, <Residue VAL, 39>, <Residue LYS, 40>, <Residue SER, 41>, <Residue GLY, 42>, <Residue SER, 43>, <Residue GLU, 44>, <Residue LEU, 45>, <Residue GLY, 46>, <Residue LYS, 47>, <Residue GLN, 48>, <Residue ALA, 49>, <Residue LYS, 50>, <Residue ASP, 51>, <Residue ILE, 52>, <Residue MET, 53>, <Residue ASP, 54>, <Residue ALA, 55>, <Residue GLY, 56>, <Residue LYS, 57>, <Residue LEU, 58>, <Residue VAL, 59>, <Residue THR, 60>, <Residue ASP, 61>, <Residue GLU, 62>, <Residue LEU, 63>, <Residue VAL, 64>, <Residue ILE, 65>, <Residue ALA, 66>, <Residue LEU, 67>, <Residue VAL, 68>, <Residue LYS, 69>, <Residue GLU, 70>, <Residue ARG, 71>, <Residue ILE, 72>, <Residue ALA, 73>, <Residue GLN, 74>, <Residue GLU, 75>, <Residue ASP, 76>, <Residue CYS, 77>, <Residue ARG, 78>, <Residue ASN, 79>, <Residue GLY, 80>, <Residue PHE, 81>, <Residue LEU, 82>, <Residue LEU, 83>, <Residue ASP, 84>, <Residue GLY, 85>, <Residue PHE, 86>, <Residue PRO, 87>, <Residue ARG, 88>, <Residue THR, 89>, <Residue ILE, 90>, <Residue PRO, 91>, <Residue GLN, 92>, <Residue ALA, 93>, <Residue ASP, 94>, <Residue ALA, 95>, <Residue MET, 96>, <Residue LYS, 97>, <Residue GLU, 98>, <Residue ALA, 99>, <Residue GLY, 100>, <Residue ILE, 101>, <Residue ASN, 102>, <Residue VAL, 103>, <Residue ASP, 104>, <Residue TYR, 105>, <Residue VAL, 106>, <Residue LEU, 107>, <Residue GLU, 108>, <Residue PHE, 109>, <Residue ASP, 110>, <Residue VAL, 111>, <Residue PRO, 112>, <Residue ASP, 113>, <Residue GLU, 114>, <Residue LEU, 115>, <Residue ILE, 116>, <Residue VAL, 117>, <Residue ASP, 118>, <Residue ARG, 119>, <Residue ILE, 120>, <Residue VAL, 121>, <Residue GLY, 122>, <Residue ARG, 123>, <Residue ARG, 124>, <Residue VAL, 125>, <Residue HSE, 126>, <Residue ALA, 127>, <Residue PRO, 128>, <Residue SER, 129>, <Residue GLY, 130>, <Residue ARG, 131>, <Residue VAL, 132>, <Residue TYR, 133>, <Residue HSE, 134>, <Residue VAL, 135>, <Residue LYS, 136>, <Residue PHE, 137>, <Residue ASN, 138>, <Residue PRO, 139>, <Residue PRO, 140>, <Residue LYS, 141>, <Residue VAL, 142>, <Residue GLU, 143>, <Residue GLY, 144>, <Residue LYS, 145>, <Residue ASP, 146>, <Residue ASP, 147>, <Residue VAL, 148>, <Residue THR, 149>, <Residue GLY, 150>, <Residue GLU, 151>, <Residue GLU, 152>, <Residue LEU, 153>, <Residue THR, 154>, <Residue THR, 155>, <Residue ARG, 156>, <Residue LYS, 157>, <Residue ASP, 158>, <Residue ASP, 159>, <Residue GLN, 160>, <Residue GLU, 161>, <Residue GLU, 162>, <Residue THR, 163>, <Residue VAL, 164>, <Residue ARG, 165>, <Residue LYS, 166>, <Residue ARG, 167>, <Residue LEU, 168>, <Residue VAL, 169>, <Residue GLU, 170>, <Residue TYR, 171>, <Residue HSE, 172>, <Residue GLN, 173>, <Residue MET, 174>, <Residue THR, 175>, <Residue ALA, 176>, <Residue PRO, 177>, <Residue LEU, 178>, <Residue ILE, 179>, <Residue GLY, 180>, <Residue TYR, 181>, <Residue TYR, 182>, <Residue SER, 183>, <Residue LYS, 184>, <Residue GLU, 185>, <Residue ALA, 186>, <Residue GLU, 187>, <Residue ALA, 188>, <Residue GLY, 189>, <Residue ASN, 190>, <Residue THR, 191>, <Residue LYS, 192>, <Residue TYR, 193>, <Residue ALA, 194>, <Residue LYS, 195>, <Residue VAL, 196>, <Residue ASP, 197>, <Residue GLY, 198>, <Residue THR, 199>, <Residue LYS, 200>, <Residue PRO, 201>, <Residue VAL, 202>, <Residue ALA, 203>, <Residue GLU, 204>, <Residue VAL, 205>, <Residue ARG, 206>, <Residue ALA, 207>, <Residue ASP, 208>, <Residue LEU, 209>, <Residue GLU, 210>, <Residue LYS, 211>, <Residue ILE, 212>, <Residue LEU, 213>, <Residue GLY, 214>]>
And all segments with:
u.segments
<SegmentGroup [<Segment ADK>]>
ResidueGroups
and SegmentGroups
also behave similarly to AtomGroups
, with many of their methods returning numpy
arrays with each element corresponding to a single residue or segment, respectively.
u.residues.resnames
array(['MET', 'ARG', 'ILE', 'ILE', 'LEU', 'LEU', 'GLY', 'ALA', 'PRO', 'GLY', 'ALA', 'GLY', 'LYS', 'GLY', 'THR', 'GLN', 'ALA', 'GLN', 'PHE', 'ILE', 'MET', 'GLU', 'LYS', 'TYR', 'GLY', 'ILE', 'PRO', 'GLN', 'ILE', 'SER', 'THR', 'GLY', 'ASP', 'MET', 'LEU', 'ARG', 'ALA', 'ALA', 'VAL', 'LYS', 'SER', 'GLY', 'SER', 'GLU', 'LEU', 'GLY', 'LYS', 'GLN', 'ALA', 'LYS', 'ASP', 'ILE', 'MET', 'ASP', 'ALA', 'GLY', 'LYS', 'LEU', 'VAL', 'THR', 'ASP', 'GLU', 'LEU', 'VAL', 'ILE', 'ALA', 'LEU', 'VAL', 'LYS', 'GLU', 'ARG', 'ILE', 'ALA', 'GLN', 'GLU', 'ASP', 'CYS', 'ARG', 'ASN', 'GLY', 'PHE', 'LEU', 'LEU', 'ASP', 'GLY', 'PHE', 'PRO', 'ARG', 'THR', 'ILE', 'PRO', 'GLN', 'ALA', 'ASP', 'ALA', 'MET', 'LYS', 'GLU', 'ALA', 'GLY', 'ILE', 'ASN', 'VAL', 'ASP', 'TYR', 'VAL', 'LEU', 'GLU', 'PHE', 'ASP', 'VAL', 'PRO', 'ASP', 'GLU', 'LEU', 'ILE', 'VAL', 'ASP', 'ARG', 'ILE', 'VAL', 'GLY', 'ARG', 'ARG', 'VAL', 'HSE', 'ALA', 'PRO', 'SER', 'GLY', 'ARG', 'VAL', 'TYR', 'HSE', 'VAL', 'LYS', 'PHE', 'ASN', 'PRO', 'PRO', 'LYS', 'VAL', 'GLU', 'GLY', 'LYS', 'ASP', 'ASP', 'VAL', 'THR', 'GLY', 'GLU', 'GLU', 'LEU', 'THR', 'THR', 'ARG', 'LYS', 'ASP', 'ASP', 'GLN', 'GLU', 'GLU', 'THR', 'VAL', 'ARG', 'LYS', 'ARG', 'LEU', 'VAL', 'GLU', 'TYR', 'HSE', 'GLN', 'MET', 'THR', 'ALA', 'PRO', 'LEU', 'ILE', 'GLY', 'TYR', 'TYR', 'SER', 'LYS', 'GLU', 'ALA', 'GLU', 'ALA', 'GLY', 'ASN', 'THR', 'LYS', 'TYR', 'ALA', 'LYS', 'VAL', 'ASP', 'GLY', 'THR', 'LYS', 'PRO', 'VAL', 'ALA', 'GLU', 'VAL', 'ARG', 'ALA', 'ASP', 'LEU', 'GLU', 'LYS', 'ILE', 'LEU', 'GLY'], dtype='|S3')
u.segments.segids
array(['ADK'], dtype='|S3')
We can more easily get the number of glycine residues using a ResidueGroup
.
u.atoms[u.atoms.resnames == 'GLY'].residues.n_residues
20
or perhaps more Pythonic:
len(u.atoms[u.atoms.resnames == 'GLY'].residues)
20
AtomGroups
are more interesting with coordinate data attached to them. Our topology file has none of this, so using
u.atoms.positions
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) <ipython-input-39-878f7dd8f697> in <module>() ----> 1 u.atoms.positions /home/alter/.local/lib/python2.7/site-packages/MDAnalysis/core/AtomGroup.pyc in __getattr__(self, name) 1041 except SelectionError: 1042 raise AttributeError("'{0}' object has no attribute '{1}'".format( -> 1043 self.__class__.__name__, name)) 1044 1045 def _get_named_atom(self, name): AttributeError: 'AtomGroup' object has no attribute 'positions'
yields an AttributeError
. Let's load some coordinates by creating a new Universe, but with a topology + a trajectory.
traj = os.path.join(datapath, 'equilibrium', '1ake_007-nowater-core-dt240ps.dcd')
u = mda.Universe(top, traj)
Now this will work.
u.atoms.positions
array([[ 2.87066364, 10.60445595, 9.75028801], [ 3.10920501, 11.32470894, 9.0389967 ], [ 3.74609542, 10.24300766, 10.17991066], ..., [ -7.49935293, 10.89219856, 12.33476448], [ -6.59658432, 10.83427048, 12.92472839], [ -8.34826946, 10.59926796, 12.93470669]], dtype=float32)
u.atoms[u.atoms.names == 'CA'].positions.mean(axis=0)
array([ 1.04244471, 2.23443413, -5.1523819 ], dtype=float32)
Alternatively, we have a builtin for this:
u.atoms[u.atoms.names == 'CA'].centroid()
array([ 1.04244471, 2.23443413, -5.1523819 ], dtype=float32)
We can also get at connectivity information between atoms, such as bonds, angles, and dihedrals.
u.bonds
<TopologyGroup containing 3365 Bonds>
u.angles
<TopologyGroup containing 6123 Angles>
u.dihedrals
<TopologyGroup containing 8921 Dihedrals>
u.bonds[3]
<Bond between: Atom 1 (N of MET 1 None) and Atom 5 (CA of MET1 None), length 1.49 A>
Want the actual value?
u.bonds[3].value()
1.4905602
import matplotlib.pyplot as plt
%matplotlib inline
plt.hist(u.bonds.values(), bins=30)
(array([ 23., 295., 57., 287., 1022., 0., 0., 31., 105., 126., 41., 71., 114., 88., 81., 101., 129., 186., 255., 221., 102., 15., 2., 0., 0., 0., 4., 2., 3., 4.]), array([ 0.95999745, 0.99054587, 1.02109429, 1.05164271, 1.08219113, 1.11273955, 1.14328797, 1.17383639, 1.20438481, 1.23493323, 1.26548165, 1.29603007, 1.32657849, 1.35712691, 1.38767533, 1.41822375, 1.44877217, 1.47932059, 1.50986901, 1.54041743, 1.57096585, 1.60151427, 1.63206269, 1.66261111, 1.69315953, 1.72370795, 1.75425637, 1.78480479, 1.81535321, 1.84590163, 1.87645005]), <a list of 30 Patch objects>)
These work the same way as AtomGroup
s. They're sliceable, and indexing them works too to give individual bonds, angles, dihedrals.
We've already seen that complex selections can be performed on AtomGroups
through boolean array indexing. However, MDAnalysis
also features a CHARMM-style atom selection mechanism that is often more convenient.
u.select_atoms?
Although boolean selections work well enough for selecting out atoms from AtomGroups, the selection language makes more complex selections possible with probably less effort. For example, we can selections of the
acidic = u.select_atoms("(resname GLU or resname ASP)")
acidic
<AtomGroup with 474 atoms>
acidic_o = acidic.select_atoms('(name OD* or name OE*)')
acidic_o
<AtomGroup with 70 atoms>
basic_n = u.select_atoms( "((resname LYS or resname ARG) and (name NZ or name NH*))")
basic_n
<AtomGroup with 44 atoms>
If we want only residues that are involved in salt bridges, we can use our AtomGroups
as part of additional selections. We can also use the around
selection operator to specify only atoms within 4 angstroms. At the end we can get the full residues back.
acidic_res = u.select_atoms("group acidic and around 4 group basic", acidic=acidic_o, basic=basic_n).residues
basic_res = u.select_atoms("group basic and around 4 group acidic", acidic=acidic_o, basic=basic_n).residues
We can write these selections out into a variety of formats; in particular we can output as a VMD selection.
# addition of AtomGroups yields an AtomGroup
(acidic_res + basic_res).write('saltbridge.vmd', name='saltbridge')
Then load into VMD
source saltbridge.vmd
and use the new named selection "saltbridge" in Representations.
AdK has three domains: CORE, NMP, LID. The NMP and LID move relative to the CORE. Movements can be defined by two angles between the two domains, $\theta_{\text{NMP}}$ and $\theta_{\text{LID}}$.
These angles are defined as:
from IPython.core.display import Image
Image("MDAnalysisTutorial/doc/sphinx/figs/angle_defs.png")
AdK undergoes a conformational transition during which the NMP and LID domain move relative to the CORE domain. The transition can be characterized by the angles between the domains:
Image("MDAnalysisTutorial/doc/sphinx/figs/6b_angle_def_open.png", width=500)
For simplicity, define $\theta_{\text{NMP}}$ and $\theta_{\text{LID}}$ between the centers of mass of backbone atoms of groups of the following atoms A
, B
, C
:
Definition of $\theta_{\text{NMP}}$
A: 115-125, B: 90-100, C: 35-55
Definition of $\theta_{\text{LID}}$
A: 179-185, B: 115-125, C: 125-153
The angle between vectors $\vec{BA}$ and $\vec{BC}$ is \begin{equation*} \theta = \arccos\left(\frac{\vec{BA}\cdot\vec{BC}}{|\vec{BA}||\vec{BC}|}\right) \end{equation*}
With numpy
:
np.arccos(np.cdot(BA, BC))/(np.linalg.norm(BA) * np.linalg.norm(BC))
Write two functions theta_NMP()
and theta_LID()
that calculate these angles, given the universe.
Hint:
A = u.select_atoms("resid 115:125 and backbone").center_of_geometry()
import numpy as np
from numpy.linalg import norm
def theta_NMP(u):
"""Calculate the NMP-CORE angle for E. coli AdK in degrees"""
C = u.select_atoms("resid 115:125 and backbone").center_of_geometry()
B = u.select_atoms("resid 90:100 and backbone").center_of_geometry()
A = u.select_atoms("resid 35:55 and backbone").center_of_geometry()
BA = A - B
BC = C - B
theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
return np.rad2deg(theta)
def theta_LID(u):
"""Calculate the LID-CORE angle for E. coli AdK in degrees"""
C = u.select_atoms("resid 179:185 and backbone").center_of_geometry()
B = u.select_atoms("resid 115:125 and backbone").center_of_geometry()
A = u.select_atoms("resid 125:153 and backbone").center_of_geometry()
BA = A - B
BC = C - B
theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
return np.rad2deg(theta)
And now Oliver.