#!/usr/bin/env python
# coding: utf-8

# This notebook introduces the fundamental objects of `MDAnalysis`: the `Universe`, `AtomGroup`, and `Atom` objects. It also introduces the selection language, and the basics of reading and slicing trajectories.

# # 1. Fundamental objects

# ## `Universes` and `AtomGroups`

# > "If you wish to make an apple pie from scratch, you must first invent the Universe." 
# 
# > ~ Carl Sagan

# One of the fundamental objects in the `MDAnalysis` data model is the `Universe` object. This contains all of a simulations' topology information at the least, but usually also includes trajectory information as well. A `Universe` can be thought of as an interface to the raw details of a simulation.

# First, we need to import `MDAnalysis`, giving us access to all the components in its namespace:

# In[8]:


import MDAnalysis as mda


# In order to do anything, we do need some actual molecular dynamics data to work with. You should have downloaded a set of datafiles for this workshop; I placed mine in the following location relative to this notebook:

# In[9]:


datapath = 'sandbox/'


# To make a Universe, we need at the very least a topology file. We'll use one for a simulation system that includes adenylate kinase with the water removed:

# In[10]:


import os
top = os.path.join(datapath, 'equilibrium/adk4AKE.psf')


# In[11]:


u = mda.Universe(top)


# In[12]:


print u


# We now have a `Universe` object. Since the PSF file we loaded contained both atom identities and bond information, the `Universe` is able to access these details.
# 
# We can access all atoms in the `Universe` through the `Universe.atoms` attribute. This gives access to many details, including basic topology details for each atom.

# In[13]:


u.atoms.indices


# In[14]:


u.atoms.names


# In[15]:


u.atoms.resnames


# In[16]:


u.atoms.resids


# In[17]:


u.atoms.charges


# In[18]:


u.atoms.masses


# In[19]:


u.atoms.types


# These are all properties of an `AtomGroup`, of which `Universe.atoms` is one such instance.

# In[20]:


type(u.atoms)


# Many of the methods of an `AtomGroup` return numpy arrays of the same length as the `AtomGroup` itself; that is, each element corresponds to each atom in the `AtomGroup`, in order. That means that if we want only, say, the masses of the 3rd through the 11th atom in our topology, we could simply slice like so:

# In[21]:


u.atoms.masses[2:11]


# **Note**: `AtomGroup` indexes are 0-based, even if the topology file gives the atoms 1-based indices. This is for consistency with the rest of the Python ecosystem, including `numpy`. 

# But `AtomGroup`s themselves are actually sliceable, so we could instead do:

# In[22]:


u.atoms[2:11].masses


# to achieve the same effect. Slicing of an `AtomGroup` returns an `AtomGroup` containing the atoms in the slice, so this gives a basic way of accessing a subset of the atoms present in the topology.

# Fancy indexing also works. We can, for example, get an `AtomGroup` containing the 3rd, 22nd, 3rd, and second-to-last atom in the `Universe` with something like:

# In[23]:


ag = u.atoms[[2, 21, 2, -2]]


# In[24]:


ag.indices


# We can also use expressions that return boolean arrays to select out atoms of interest. We can, for exmple, get an `AtomGroup` of all atoms in the system with mass less than 10 u with:

# In[25]:


u.atoms[u.atoms.masses < 10]


# We can string together boolean expressions to obtain even more granular selections, for example getting all atoms with mass less than 10 u AND within an arginine residue:

# In[26]:


u.atoms[(u.atoms.masses < 10) & (u.atoms.resnames == 'ARG')]


# All topology properties of an `AtomGroup` can be harnessed in this way to make complex subselections.

# ## Working with individual atoms

# We can also work with individual atoms, though in practice this is rare, and in the future will be relatively inefficient when operating on multiple atoms compared to operating on a whole `AtomGroup`. At the moment, however, these operations are comparable in speed.

# In[27]:


a = u.atoms[0]


# In[28]:


a


# In[29]:


print "name:", a.name
print "resid:", a.resid
print "resname:", a.resname


# ### Challenge: count the number of glycine residues in the protein

# In[30]:


import numpy as np


# In[31]:


r = u.atoms[u.atoms.resnames == 'GLY'].resids


# In[32]:


len(np.unique(r))


# ## `ResidueGroups` and `SegmentGroups`

# The `Universe` also gives higher-order topology objects, including `ResidueGroups` and `SegmentGroups`. We can access all residues in the `Universe` with:

# In[76]:


u.residues


# And all segments with:

# In[34]:


u.segments


# `ResidueGroups` and `SegmentGroups` also behave similarly to `AtomGroups`, with many of their methods returning `numpy` arrays with each element corresponding to a single residue or segment, respectively.

# In[35]:


u.residues.resnames


# In[36]:


u.segments.segids


# We can more easily get the number of glycine residues using a `ResidueGroup`.

# In[37]:


u.atoms[u.atoms.resnames == 'GLY'].residues.n_residues


# or perhaps more Pythonic:

# In[38]:


len(u.atoms[u.atoms.resnames == 'GLY'].residues)


# ## Using coordinates

# `AtomGroups` are more interesting with coordinate data attached to them. Our topology file has none of this, so using

# In[39]:


u.atoms.positions


# yields an `AttributeError`. Let's load some coordinates by creating a new Universe, but with a topology + a trajectory.

# In[41]:


traj = os.path.join(datapath, 'equilibrium', '1ake_007-nowater-core-dt240ps.dcd')


# In[42]:


u = mda.Universe(top, traj)


# Now this will work.

# In[43]:


u.atoms.positions


# ### Challenge: calculate the center of geometry of the C$_\alpha$ atoms

# In[62]:


u.atoms[u.atoms.names == 'CA'].positions.mean(axis=0)


# Alternatively, we have a builtin for this:

# In[63]:


u.atoms[u.atoms.names == 'CA'].centroid()


# ## Bonds, angles, and dihedrals

# We can also get at connectivity information between atoms, such as bonds, angles, and dihedrals.

# In[125]:


u.bonds


# In[126]:


u.angles


# In[127]:


u.dihedrals


# In[131]:


u.bonds[3]


# Want the actual value?

# In[138]:


u.bonds[3].value()


# In[48]:


import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')


# In[51]:


plt.hist(u.bonds.values(), bins=30)


# These work the same way as `AtomGroup`s. They're sliceable, and indexing them works too to give individual bonds, angles, dihedrals.

# # 2. Selecting atoms with selection strings

# We've already seen that complex selections can be performed on `AtomGroups` through boolean array indexing. However, `MDAnalysis` also features a CHARMM-style atom selection mechanism that is often more convenient.

# In[55]:


get_ipython().run_line_magic('pinfo', 'u.select_atoms')


# Although boolean selections work well enough for selecting out atoms from AtomGroups, the selection language makes more complex selections possible with probably less effort. For example, we can selections of the 

# In[71]:


acidic = u.select_atoms("(resname GLU or resname ASP)")
acidic


# In[70]:


acidic_o = acidic.select_atoms('(name OD* or name OE*)')
acidic_o


# In[73]:


basic_n = u.select_atoms( "((resname LYS or resname ARG) and (name NZ or name NH*))")
basic_n


# If we want only residues that are involved in salt bridges, we can use our `AtomGroups` as part of additional selections. We can also use the `around` selection operator to specify only atoms within 4 angstroms. At the end we can get the full residues back.

# In[56]:


acidic_res = u.select_atoms("group acidic and around 4 group basic", acidic=acidic_o, basic=basic_n).residues
basic_res = u.select_atoms("group basic and around 4 group acidic", acidic=acidic_o, basic=basic_n).residues


# We can write these selections out into a variety of formats; in particular we can output as a VMD selection.

# In[75]:


# addition of AtomGroups yields an AtomGroup
(acidic_res + basic_res).write('saltbridge.vmd', name='saltbridge')


# Then load into VMD
# 
#     source saltbridge.vmd
# 
# and use the new named selection "saltbridge" in *Representations*.

# ### Example: domain movement of adenylate kinase (AdK)
# 
# AdK has three domains: CORE, NMP, LID. The NMP and LID move relative to the CORE. Movements can be defined by two angles between the two domains, $\theta_{\text{NMP}}$ and $\theta_{\text{LID}}$.
# 
# These angles are defined as:
# * CORE residues 1-29, 60-121, 160-214 (gray)
# * NMP residues 30-59 (blue)
# * LID residues 122-159 (yellow)

# In[92]:


from IPython.core.display import Image

Image("MDAnalysisTutorial/doc/sphinx/figs/angle_defs.png")


# AdK undergoes a conformational transition during which the NMP and LID domain move relative to the CORE domain. The transition can be characterized by the angles between the domains:

# In[93]:


Image("MDAnalysisTutorial/doc/sphinx/figs/6b_angle_def_open.png", width=500)


# For simplicity, define $\theta_{\text{NMP}}$ and $\theta_{\text{LID}}$ between the centers of mass of *backbone* atoms of groups of the following atoms `A`, `B`, `C`:
# 
# Definition of $\theta_{\text{NMP}}$
# 
# A: 115-125, B: 90-100, C: 35-55
# 
# Definition of $\theta_{\text{LID}}$
# 
# A: 179-185, B: 115-125, C: 125-153
# 
# The angle between vectors $\vec{BA}$ and $\vec{BC}$ is \begin{equation*} \theta = \arccos\left(\frac{\vec{BA}\cdot\vec{BC}}{|\vec{BA}||\vec{BC}|}\right) \end{equation*}
# 
# With `numpy`:
# ```python
# np.arccos(np.cdot(BA, BC))/(np.linalg.norm(BA) * np.linalg.norm(BC))
# ```
# 

# Write two functions `theta_NMP()` and `theta_LID()` that calculate these angles, given the universe.
# 
# Hint:

# In[90]:


A = u.select_atoms("resid 115:125 and backbone").center_of_geometry()


# In[91]:


import numpy as np
from numpy.linalg import norm

def theta_NMP(u):
    """Calculate the NMP-CORE angle for E. coli AdK in degrees"""
    C = u.select_atoms("resid 115:125 and backbone").center_of_geometry()
    B = u.select_atoms("resid 90:100 and backbone").center_of_geometry()
    A = u.select_atoms("resid 35:55 and backbone").center_of_geometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)

def theta_LID(u):
    """Calculate the LID-CORE angle for E. coli AdK in degrees"""
    C = u.select_atoms("resid 179:185 and backbone").center_of_geometry()
    B = u.select_atoms("resid 115:125 and backbone").center_of_geometry()
    A = u.select_atoms("resid 125:153 and backbone").center_of_geometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)


# And now Oliver.