#!/usr/bin/env python
# coding: utf-8

# # Walkthrough

# This notebook follows an example trackpy project from start to finish. We take video of micron-sized particles diffusing in water, track them, and analyze the trajectories to obtain the viscosity of water.
#
# At the bottom of the notebook, we very briefly survey the more advanced features of trackpy. Browse the [rest of the documentation](https://soft-matter.github.io/trackpy/stable) to learn more.

# ## Scientific IPython Setup

# We need Python's plotting library, matplotlib. Your environment might load matplotlib automatically, but for this tutorial I'll load it explicitly using this convention. If you are unfamiliar with matplotlib, do the same as I do here, and everything that follows will work without modification.

# In[1]:

from __future__ import division, unicode_literals, print_function  # for compatibility with Python 2 and 3
import matplotlib as mpl
import matplotlib.pyplot as plt

# change the following to %matplotlib notebook for interactive plotting
get_ipython().run_line_magic('matplotlib', 'inline')

# Optionally, tweak styles.
mpl.rc('figure', figsize=(10, 5))
mpl.rc('image', cmap='gray')

# We also might want to use scientific Python libraries. Finally, we'll import ``trackpy`` itself and its sister project, `pims`.

# In[2]:

import numpy as np
import pandas as pd
from pandas import DataFrame, Series  # for convenience

import pims
import trackpy as tp

# We use the alias ``tp`` for brevity.

# ## Step 1: Read the Data

# ### Opening images or video
#
# To get our data into Python, we use our sister project, [PIMS](https://github.com/soft-matter/pims) (Python Image Sequence). PIMS makes it easy and convenient to load and process video data from many formats with one consistent interface.
#
# Using PIMS, trackpy can read:
#
# * a directory or zipfile of sequential images
# * a multi-frame TIFF file
# * a video (AVI, MOV, etc.)
# * specialty formats used in microscopy and scientific video capture:
#     * `Cine`, NorPix `seq`
#     * `LSM`
#     * Files supported by [Bioformats](https://www.openmicroscopy.org/site/support/bio-formats5.1/supported-formats.html)
#     * `ND2` using [PIMS_ND2](https://github.com/soft-matter/pims_nd2)
#
# (Some of the formats require some extra dependencies. For a complete list, see the [README page](https://github.com/soft-matter/pims) for PIMS, or the installation instructions in the documentation.)
#
# For many formats, using `pims.open` just works. Since these images are in color, we also need to set up a *pipeline* to convert each image to grayscale when it is read.

# In[3]:

@pims.pipeline
def gray(image):
    return image[:, :, 1]  # Take just the green channel

frames = gray(pims.open('../sample_data/bulk_water/*.png'))

# In[4]:

frames

# We can access any frame like `frames[frame_number]`. The image is represented as a numpy array of intensities. If you're using the [Anaconda distribution](https://www.anaconda.com/distribution/) of Python, these should be in the range [0, 255]. If you have a more custom environment they may be in the range [0, 1], in which case you'll have to experiment with the `minmass` parameters below.

# In[5]:

print(frames[0])  # the first frame

# In an IPython notebook, the frame is represented directly by displaying the image.

# In[6]:

frames[0]

# Alternatively, to make a proper plot with axes and control over scaling, use matplotlib's `imshow()` method.

# In[7]:

plt.imshow(frames[0]);

# Frames behave like numpy arrays, but with a few extra properties.
# In[8]:

frames[123].frame_no

# In[9]:

frames[123].metadata  # Scientific formats can pass experiment metadata here.

# ## Step 2: Locate Features
#
# Start with just the first frame. Estimate the size of the features (in pixels). The size must be an odd integer, and it is better to err on the large side, as we'll see below. We estimate 11 pixels.

# In[10]:

f = tp.locate(frames[0], 11, invert=True)

# The algorithm looks for *bright* features; since the features in this set of images are dark, we set ``invert=True``.
#
# ``locate`` returns a spreadsheet-like object called a DataFrame. It lists
#
# * each feature's position,
# * various characterizations of its appearance, which we will use to filter out spurious features,
# * the "signal" strength and an estimate of uncertainty, both derived from [this paper](https://doi.org/10.1529/biophysj.104.042457)
#
# More information about DataFrames may be found in the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame). DataFrames can easily be exported to formats like CSV, Excel, SQL, HDF5, etc.

# In[11]:

f.head()  # shows the first few rows of data

# In[12]:

tp.annotate(f, frames[0]);

# ### Refine parameters to eliminate spurious features

# Many of these circles are clearly wrong; they are fleeting peaks in brightness that aren't actually particles. Rejecting them often improves results and speeds up feature-finding. There are many ways to distinguish real particles from spurious ones. The most important way is to look at total brightness ("mass").

# In[13]:

fig, ax = plt.subplots()
ax.hist(f['mass'], bins=20)

# Optionally, label the axes.
ax.set(xlabel='mass', ylabel='count');

# We can then specify the `minmass` parameter. If your image is especially noisy, you may also find the `threshold` parameter useful.

# In[14]:

f = tp.locate(frames[0], 11, invert=True, minmass=20)

# In[15]:

tp.annotate(f, frames[0]);

# There are more options for controlling and optimizing feature-finding. You can review them in the [documentation](https://soft-matter.github.io/trackpy/stable/), where the most comprehensive description is in the API reference. Or, pull them up as you work by typing ``tp.locate?`` into IPython.

# ### Check for subpixel accuracy

# As Eric Weeks points out in his related tutorial, a quick way to check for subpixel accuracy is to check that the decimal part of the x and/or y positions is evenly distributed. Trackpy provides a convenience plotting function for this called `subpx_bias`:

# In[16]:

tp.subpx_bias(f)

# If we use a mask size that is too small, the histogram often shows a dip in the middle.

# In[17]:

tp.subpx_bias(tp.locate(frames[0], 7, invert=True, minmass=20));

# ### Locate features in all frames

# Or, to start, just explore a subset of the frames.
#
# selection | syntax example
# --------- | --------------
# all the frames | `frames[:]` or simply `frames`
# the first 10 frames | `frames[:10]`
# the last 10 frames | `frames[-10:]`
# a range of frames | `frames[100:200]`
# every 10th frame | `frames[::10]`
# a list of specific frames | `frames[[100, 107, 113]]`
#
# We'll locate features in the first 300 frames from this video. We use ``tp.batch``, which calls ``tp.locate`` on each frame and collects the results.

# In[18]:

f = tp.batch(frames[:300], 11, minmass=20, invert=True);

# As each frame is analyzed, `tp.batch` reports the frame number and how many features were found. If this number runs unexpectedly low or high, you may wish to interrupt it and try different parameters.
#
# If your images are small, you may find that printing this number actually slows down `batch` significantly! In that case you can run `tp.quiet()` to turn it off.
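# If you would rather check the feature counts after the fact than watch the progress output, here is a minimal sketch using ordinary pandas operations on the `frame` column of the result:

# In[ ]:

# Count how many features were found in each frame and summarize.
# Suspiciously low or high counts suggest revisiting the feature size
# or `minmass` parameters above.
counts = f.groupby('frame').size()
print(counts.describe())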
# ## Step 3: Link features into particle trajectories

# We have the locations of the particles in each frame. Next we'll track particles from frame to frame, giving each one a number for identification.
#
# First, we must specify a maximum displacement, the farthest a particle can travel between frames. We should choose the smallest reasonable value, because a large value slows computation considerably. In this case, 5 pixels is reasonable.
#
# Second, we allow for the possibility that a particle might be missed for a few frames and then seen again. (Perhaps its "mass" slipped below our cutoff due to noise in the video.) Memory keeps track of disappeared particles and maintains their ID for up to some number of frames after their last appearance. Here we use 3 frames.

# In[19]:

# tp.quiet()  # Turn off progress reports for best performance
t = tp.link(f, 5, memory=3)

# The result is the features DataFrame `f` with an additional column, `particle`, identifying each feature with a label. We denote this new DataFrame `t`.

# In[20]:

t.head()

# ### Filter spurious trajectories

# We have more filtering to do. Ephemeral trajectories — seen only for a few frames — are usually spurious and never useful. The convenience function `filter_stubs` keeps only trajectories that last for a given number of frames.

# In[21]:

t1 = tp.filter_stubs(t, 25)

# Compare the number of particles in the unfiltered and filtered data.
print('Before:', t['particle'].nunique())
print('After:', t1['particle'].nunique())

# We can also filter trajectories by their particles' appearance. At this stage, with trajectories linked, we can look at a feature's "average appearance" throughout its trajectory, giving a more accurate picture.

# In[22]:

plt.figure()
tp.mass_size(t1.groupby('particle').mean());  # convenience function -- just plots size vs. mass

# Particles with especially low mass are probably out of focus, and particles that are especially large or non-circular (eccentric) are probably aggregates. It is best to experiment by trial and error, filtering out regions of mass-size space and looking at the results using `tp.annotate`. In the end, we need to separate the good particles from the spurious ones, and it doesn't matter how we get it done.

# In[23]:

t2 = t1[((t1['mass'] > 50) & (t1['size'] < 2.6) & (t1['ecc'] < 0.3))]

# In[24]:

plt.figure()
tp.annotate(t2[t2['frame'] == 0], frames[0]);

# Trace the trajectories using `plot_traj()`:

# In[25]:

plt.figure()
tp.plot_traj(t2);

# ### Remove overall drift
#
# Compute the overall drifting motion, which we will subtract away, adopting the reference frame of the particles' average position.

# In[26]:

d = tp.compute_drift(t2)

# In[27]:

d.plot()
plt.show()

# In[28]:

tm = tp.subtract_drift(t2.copy(), d)

# With the overall drifting motion subtracted out, we plot the trajectories again. We observe nice random walks.

# In[29]:

ax = tp.plot_traj(tm)
plt.show()

# ## Step 4: Analyze trajectories
#
# Trackpy includes several functions to help with some common analyses for particle trajectories. (See the "Static Analysis" and "Motion Analysis" sections of the API reference.)
#
# Here, we can show that these data are consistent with colloidal particles undergoing Brownian motion in water.
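# As a reminder, for two-dimensional Brownian motion the mean squared displacement grows linearly with lag time $t$:
#
# $$
# \langle \Delta r^2 \rangle = 4Dt
# $$
#
# where $D$ is the particles' diffusivity. This linear growth (a power law with exponent 1) is what we test for below, and it is why the fitted coefficient $A$ can be read as $4D$.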
# ### Mean Squared Displacement of Individual Probes

# Compute the mean squared displacement (MSD) of each particle using the `imsd` function, and plot MSD vs. lag time.

# In[30]:

im = tp.imsd(tm, 100/285., 24)  # microns per pixel = 100/285., frames per second = 24

# In[31]:

fig, ax = plt.subplots()
ax.plot(im.index, im, 'k-', alpha=0.1)  # black lines, semitransparent
ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]', xlabel='lag time $t$')
ax.set_xscale('log')
ax.set_yscale('log')

# Since we only analyzed 300 frames, the statistics are poor at large lag times. With more frames, we can study larger lag times.
#
# ### Ensemble Mean Squared Displacement
#
# Now use the `emsd` function to compute the ensemble mean squared displacement (EMSD) of all particles:

# In[32]:

em = tp.emsd(tm, 100/285., 24)  # microns per pixel = 100/285., frames per second = 24

# In[33]:

fig, ax = plt.subplots()
ax.plot(em.index, em, 'o')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]', xlabel='lag time $t$')
ax.set(ylim=(1e-2, 10));

# We can easily fit this ensemble mean-squared displacement to a power law, $At^n$, using a convenience function, `fit_powerlaw`, which performs a linear regression in log space.

# In[34]:

plt.figure()
plt.ylabel(r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]')
plt.xlabel('lag time $t$');
tp.utils.fit_powerlaw(em)  # performs linear best fit in log space, plots

# In water, a viscous material, the expected power-law exponent is $n = 1$. The value of $A = 4D$, where $D$ is the particles' diffusivity. $D$ is related to viscosity $\eta$, particle radius $r$, and temperature $T$ through the Stokes-Einstein equation:
#
# $$
# D = \displaystyle\frac{k_B T}{6\pi\eta r}
# $$
#
# where $k_B$ is the Boltzmann constant. For particles with a 1 $\mu\text{m}$ diameter in room-temperature water, $A \approx 1.66$ $\mu\textrm{m}^2 / \textrm{s}$. Our values of $n$ and $A$ above are not far off. (If you'd like to know more about this measurement and its uncertainty, [this paper](https://dx.doi.org/10.1119/1.4803529) is a thorough discussion at the advanced undergraduate level.)

# **This is the end of the walkthrough. Keep reading to review a few more advanced capabilities in trackpy.**

# # Preview of Some Advanced Features
#
# **Check out the other tutorials for in-depth explorations of these topics and more!**

# ## Parallelization

# Feature-finding can easily be parallelized: each frame is an independent task, and tasks can be divided among the multiple CPUs in a modern computer. Starting with trackpy v0.4.2, this capability is built into `batch`. See the [parallelization tutorial](http://nbviewer.ipython.org/github/soft-matter/trackpy-examples/blob/master/notebooks/parallel-locate.ipynb) for details.

# In[35]:

get_ipython().run_line_magic('timeit', "tp.batch(frames[:20], 11, invert=True, minmass=20, processes='auto')")

# ## Optional High Performance Component: Numba
#
# The core, time-consuming steps in particle location and linking are implemented in Python/numpy and also in pure Python optimized for numba. If numba is installed, trackpy will detect it and use it by default. You can switch it on and off to compare performance.

# In[36]:

get_ipython().run_line_magic('timeit', "tp.batch(frames[:20], 11, invert=True, minmass=20, engine='numba')")

# In[37]:

get_ipython().run_line_magic('timeit', "tp.batch(frames[:20], 11, invert=True, minmass=20, engine='python')")

# The linking functions similarly use numba if it's available; you can control this manually using the `link_strategy` option.
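# For example, forcing a particular linker looks like the following minimal sketch. Here `'numba'` requests the numba-accelerated linker, `'recursive'` would select the pure-Python one, and `'auto'` (the default) picks the best available; see `tp.link?` for the full list of accepted values.

# In[ ]:

# Relink the features with an explicitly chosen linking strategy
# (link_strategy='numba' requires numba to be installed).
t_numba = tp.link(f, 5, memory=3, link_strategy='numba')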
# ## Streaming
#
# The feature-finding and trajectory-linking functions `batch` and `link` keep all of their results in memory. This approach is simple, but it isn't necessary. We can process an unlimited number of frames if we save the results as we go.
#
# Trackpy includes a class to manage storing and retrieving data framewise in the widely-used HDF5 format. The general idea is easily extensible to other formats.

# In[38]:

with tp.PandasHDFStore('data.h5') as s:
    tp.batch(frames, 11, invert=True, minmass=200, output=s)

# In[39]:

with tp.PandasHDFStore('data.h5') as s:
    # As before, we use a maximum displacement of 5 pixels and a memory of 3 frames.
    for linked in tp.link_df_iter(s, 5, memory=3):
        s.put(linked)

# You can get results by frame with `s.get(frame_number)` or, when you have sufficient memory, retrieve them all. The results are identical to what you would obtain using the basic functions `batch` and `link`.

# In[40]:

with tp.PandasHDFStore('data.h5') as s:
    trajectories = pd.concat(iter(s))
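# For example, to pull a single frame's results back out of the store by frame number (a minimal sketch; frame 0 is simply the first frame of this movie):

# In[ ]:

with tp.PandasHDFStore('data.h5') as s:
    frame_0 = s.get(0)  # the features stored for frame number 0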