PyBroMo - 2. Generate smFRET data, including mixtures¶

This notebook is part of PyBroMo a python-based single-molecule Brownian motion diffusion simulator that simulates confocal smFRET experiments.

Overview¶

In this notebook we show how to generated smFRET data files from the diffusion trajectories.

Loading the software¶

Import all the relevant libraries:

In [ ]:

%matplotlib inline
from pathlib import Path
import numpy as np
import tables
import matplotlib.pyplot as plt
import seaborn as sns
import pybromo as pbm
print('Numpy version:', np.__version__)
print('PyTables version:', tables.__version__)
print('PyBroMo version:', pbm.__version__)

Create smFRET data-files¶

Create a file for a single FRET efficiency¶

In this section we show how to save a single smFRET data file. In the next section we will perform the same steps in a loop to generate a sequence of smFRET data files.

Here we load a diffusion simulation opening a file to save timstamps in write mode. Use 'a' (i.e. append) to keep previously simulated timestamps for the given diffusion.

In [ ]:

code = 'e30f'  # 1s simulation, NumericPSF
#code = 'e6fe'  # 1s simulation, GaussianPSF
S = pbm.ParticlesSimulation.from_datafile(code, mode='w')

In [ ]:

# Number of populations with distinct diffusion coefficient
S.particles.num_populations

In [ ]:

# Number of particles in each population
S.particles.particles_counts

In [ ]:

# Diffusion coefficient paired with the number particles in each population
S.particles.diffusion_coeff_counts

Note on particles population¶

Population defined in the diffusion simulations are not necessarily equal to the populations used for timestamps simulation.

For example, you may decide to assign the same diffusion coefficient to all particles which requires only one population during the diffusion simulation. When simulating timestamps, however, you can split your particles assigning different "brightness" or FRET, creating many populations from a single diffusion population.

You may also, simulate timestamps for fewer particles than present in the diffusion trajectory file.

To avoid errors, I suggest to follow one of these rule of thumbs:

Use the same particles per population both during diffusion and during timestamps simulations
Use only one population during diffusion and split populations during the timestamps simulation.

Other scenarios are possible, but you should carefully read the code to make sure pybromo is doing exactly what you intend to do.

Simulate timestamps of smFRET¶

Example1: single FRET population¶

Define the simulation parameters with the following syntax:

In [ ]:

params = dict(
    em_rates = (200e3,),    # Peak emission rates (cps) for each population (D+A)
    E_values = (0.75,),     # FRET efficiency for each population
    num_particles = (3,),   # Number of particles in each population
    bg_rate_d = 1500,       # Poisson background rate (cps) Donor channel
    bg_rate_a = 800,        # Poisson background rate (cps) Acceptor channel
    )

Create the object that will run the simulation and print a summary:

In [ ]:

mix_sim = pbm.TimestampSimulation(S, **params)
mix_sim.summarize()

Run the simualtion:

In [ ]:

rs = np.random.RandomState(1234)
mix_sim.run(rs=rs, 
            overwrite=True,      # overwite existing timstamp arrays
            skip_existing=True,  # skip simulation of existing timestamps arrays to save time
            save_pos=True,       # save particle position at emission time
           )

Save simulation to a smFRET Photon-HDF5 file:

In [ ]:

mix_sim.save_photon_hdf5(identity=dict(author='John Doe', 
                                       author_affiliation='Planet Mars'))

Example 2: 2 FRET populations¶

To simulate 2 population we just define the parameters with one value per population, except for the Poisson background rate that is a single value for each channel.

In [ ]:

params = dict(
    em_rates = (200e3, 180e3),   # Peak emission rates (cps) for each population (D+A)
    E_values = (0.75, 0.35),     # FRET efficiency for each population
    num_particles = (3, 1),      # Number of particles in each population
    bg_rate_d = 1500,       # Poisson background rate (cps) Donor channel
    bg_rate_a = 800,        # Poisson background rate (cps) Acceptor channel
    )

In [ ]:

mix_sim = pbm.TimestampSimulation(S, **params)
mix_sim.summarize()

In [ ]:

rs = np.random.RandomState(1234)
mix_sim.run(rs=rs, overwrite=False, save_pos=True)

In [ ]:

mix_sim.save_photon_hdf5()

Burst analysis¶

The generated Photon-HDF5 files can be analyzed by any smFRET burst analysis program. Here we show an example using the opensource FRETBursts program:

In [ ]:

import fretbursts as fb

In [ ]:

filepath = list(Path('./').glob(f'smFRET_{code}*'))
filepath

In [ ]:

d = fb.loader.photon_hdf5(str(filepath[0]))

In [ ]:

d.A_em

In [ ]:

fb.dplot(d, fb.timetrace);

In [ ]:

d.calc_bg(fun=fb.bg.exp_fit, tail_min_us='auto', F_bg=1.7)

In [ ]:

d.bg_dd, d.bg_ad

In [ ]:

d.burst_search(F=7)

In [ ]:

d.num_bursts

In [ ]:

ds = d.select_bursts(fb.select_bursts.size, th1=20)

In [ ]:

ds.num_bursts

In [ ]:

fb.dplot(d, fb.timetrace, bursts=True);

In [ ]:

fb.dplot(ds, fb.hist_fret, pdf=False)
plt.axvline(0.75);

NOTE: Unless you simulated a diffusion of 30s or more the previous histogram will be very poor.

In [ ]:

fb.bext.burst_data(ds)

Play with photon positons¶

The smFRET file name:

In [ ]:

d.fname

Read the positions array:

In [ ]:

import tables
with tables.open_file(d.fname, 'r') as h5file:
    positions = h5file.root.photon_data.user.positions.read()

We also get the timestamps and particles arrays:

In [ ]:

timestamps = d.ph_times_m[0]
particles = d.particles[0]

These are all the particles ID in the file, the last ID is not a real particle but is associated with timestamps from background:

In [ ]:

np.unique(particles)

Positions must have the same number of rows as number of timestamps or particles. Let's test it:

In [ ]:

assert positions.shape[0] == timestamps.size == particles.size

In [ ]:

positions.shape

Print first 5 positions:

In [ ]:

positions[:5]

NOTE: positions are NaN when the timestamp is from background.

Check that we have NaNs if and only if the timestamp is from background:

In [ ]:

bg_timestamp = particles == np.unique(particles)[-1]

assert np.all(bg_timestamp == np.isnan(positions[:, 0]))

Now we take the burst data including index of burst start and stop (i_start and i_stop columns):

In [ ]:

bursts = fb.bext.burst_data(ds, include_ph_index=True)
bursts.head()

In [ ]:

burstph = fb.bext.burst_photons(ds)
burstph['particle'] = np.hstack(
    fb.burstlib.iter_bursts_ph(ds.particles[0], ds.mburst[0]))
burstpos = np.vstack(
    fb.burstlib.iter_bursts_ph(positions, ds.mburst[0]))
burstph['x_um'] = burstpos[:, 0] * 1e6
burstph['y_um'] = burstpos[:, 1] * 1e6
burstph['z_um'] = burstpos[:, 2] * 1e6
burstph['r_um'] = np.linalg.norm(burstpos[:, :2], axis=1) * 1e6
burstph.head()

We can assigning to each burst the particle with most photons:

In [ ]:

bursts['particle'] = np.zeros(bursts.shape[0], dtype='uint8')
for iburst, bph in burstph.groupby('burst'):
    par, counts = np.unique(bph.particle, return_counts=True)
    bursts.loc[iburst, 'particle'] = par[counts.argmax()]    
bursts.head()

Positions where each burst starts:

In [ ]:

positions[bursts.i_start]

In [ ]:

particles[bursts.i_start]

All photon emission positions in one burst:

In [ ]:

burst_idx = 0
pos_burst = burstph.loc[burst_idx]
pos_burst

Photon emission positions in all bursts:

In [ ]:

# Increase the resolution of the figures displayed in the notebook
%config InlineBackend.figure_format = 'retina' # 'png' for default res

In [ ]:

S.psf

In [ ]:

if S.psf.kind == 'numeric':
    PSF = pbm.NumericPSF()
    psf = PSF.hdata
    z_peak = PSF.zi[PSF.zm]  # z position of PSF peak in μm
else:
    x = np.arange(0, 4, 0.01) * 1e-6
    z = np.arange(-6, 6, 0.01) * 1e-6
    X, Z = np.meshgrid(x, z)
    psf = S.psf.eval_xz(X, Z)
cmap = plt.cm.YlGnBu
cmap.set_under(alpha=0)
kwargs = dict(interpolation='bicubic', origin='lower', cmap=cmap, vmin=1e-1, zorder=1)

fig, ax = plt.subplots(figsize=(8, 2))
ax.imshow(psf.T, extent=(-6, 6, 0, 4), **kwargs)
ax.plot(burstph.z_um, burstph.r_um, '.', ms=5, color='C1', alpha=0.3)
ax.set(ylabel='R (μm)', xlabel='Z (μm)', ylim=(0, 1), xlim=(-2, 2),
       title=f'Position of photon emission (PSF {S.psf.kind})');

In [ ]:

fig, ax = plt.subplots(figsize=(8, 2))
ax.imshow(psf.T, extent=(-6, 6, 0, 4), **kwargs)
sns.scatterplot(x="z_um", y="r_um", hue="burst", data=burstph.reset_index(),
                zorder=10, ax=ax, marker='o', linewidth=0,
                palette='Spectral', alpha=0.3, s=20)
if S.psf.kind == 'numeric':
    ax.axvline(PSF.zi[PSF.zm], color='k')
ax.set(ylabel='R (μm)', xlabel='Z (μm)', ylim=(0, 1), xlim=(-2, 2),
       title='Position of photon emission by burst');

In [ ]:

S.store.close()
S.ts_store.close()

In [ ]: