This notebook is part of PyBroMo a python-based single-molecule Brownian motion diffusion simulator that simulates confocal smFRET experiments.
In this notebook we show how to generated smFRET data files from the diffusion trajectories.
Import all the relevant libraries:
%matplotlib inline
from pathlib import Path
import numpy as np
import tables
import matplotlib.pyplot as plt
import seaborn as sns
import pybromo as pbm
print('Numpy version:', np.__version__)
print('PyTables version:', tables.__version__)
print('PyBroMo version:', pbm.__version__)
In this section we show how to save a single smFRET data file. In the next section we will perform the same steps in a loop to generate a sequence of smFRET data files.
Here we load a diffusion simulation opening a file to save
timstamps in write mode. Use 'a'
(i.e. append) to keep
previously simulated timestamps for the given diffusion.
code = 'e30f' # 1s simulation, NumericPSF
#code = 'e6fe' # 1s simulation, GaussianPSF
S = pbm.ParticlesSimulation.from_datafile(code, mode='w')
# Number of populations with distinct diffusion coefficient
S.particles.num_populations
# Number of particles in each population
S.particles.particles_counts
# Diffusion coefficient paired with the number particles in each population
S.particles.diffusion_coeff_counts
Population defined in the diffusion simulations are not necessarily equal to the populations used for timestamps simulation.
For example, you may decide to assign the same diffusion coefficient to all particles which requires only one population during the diffusion simulation. When simulating timestamps, however, you can split your particles assigning different "brightness" or FRET, creating many populations from a single diffusion population.
You may also, simulate timestamps for fewer particles than present in the diffusion trajectory file.
To avoid errors, I suggest to follow one of these rule of thumbs:
Use the same particles per population both during diffusion and during timestamps simulations
Use only one population during diffusion and split populations during the timestamps simulation.
Other scenarios are possible, but you should carefully read the code to make sure pybromo is doing exactly what you intend to do.
params = dict(
em_rates = (200e3,), # Peak emission rates (cps) for each population (D+A)
E_values = (0.75,), # FRET efficiency for each population
num_particles = (3,), # Number of particles in each population
bg_rate_d = 1500, # Poisson background rate (cps) Donor channel
bg_rate_a = 800, # Poisson background rate (cps) Acceptor channel
)
Create the object that will run the simulation and print a summary:
mix_sim = pbm.TimestampSimulation(S, **params)
mix_sim.summarize()
Run the simualtion:
rs = np.random.RandomState(1234)
mix_sim.run(rs=rs,
overwrite=True, # overwite existing timstamp arrays
skip_existing=True, # skip simulation of existing timestamps arrays to save time
save_pos=True, # save particle position at emission time
)
Save simulation to a smFRET Photon-HDF5 file:
mix_sim.save_photon_hdf5(identity=dict(author='John Doe',
author_affiliation='Planet Mars'))
To simulate 2 population we just define the parameters with one value per population, except for the Poisson background rate that is a single value for each channel.
params = dict(
em_rates = (200e3, 180e3), # Peak emission rates (cps) for each population (D+A)
E_values = (0.75, 0.35), # FRET efficiency for each population
num_particles = (3, 1), # Number of particles in each population
bg_rate_d = 1500, # Poisson background rate (cps) Donor channel
bg_rate_a = 800, # Poisson background rate (cps) Acceptor channel
)
mix_sim = pbm.TimestampSimulation(S, **params)
mix_sim.summarize()
rs = np.random.RandomState(1234)
mix_sim.run(rs=rs, overwrite=False, save_pos=True)
mix_sim.save_photon_hdf5()
The generated Photon-HDF5 files can be analyzed by any smFRET burst analysis program. Here we show an example using the opensource FRETBursts program:
import fretbursts as fb
filepath = list(Path('./').glob(f'smFRET_{code}*'))
filepath
d = fb.loader.photon_hdf5(str(filepath[0]))
d
d.A_em
fb.dplot(d, fb.timetrace);
d.calc_bg(fun=fb.bg.exp_fit, tail_min_us='auto', F_bg=1.7)
d.bg_dd, d.bg_ad
d.burst_search(F=7)
d.num_bursts
ds = d.select_bursts(fb.select_bursts.size, th1=20)
ds.num_bursts
fb.dplot(d, fb.timetrace, bursts=True);
fb.dplot(ds, fb.hist_fret, pdf=False)
plt.axvline(0.75);
NOTE: Unless you simulated a diffusion of 30s or more the previous histogram will be very poor.
fb.bext.burst_data(ds)
The smFRET file name:
d.fname
Read the positions array:
import tables
with tables.open_file(d.fname, 'r') as h5file:
positions = h5file.root.photon_data.user.positions.read()
We also get the timestamps and particles arrays:
timestamps = d.ph_times_m[0]
particles = d.particles[0]
These are all the particles ID in the file, the last ID is not a real particle but is associated with timestamps from background:
np.unique(particles)
Positions must have the same number of rows as number of timestamps or particles. Let's test it:
assert positions.shape[0] == timestamps.size == particles.size
positions.shape
Print first 5 positions:
positions[:5]
NOTE: positions are NaN when the timestamp is from background.
Check that we have NaNs if and only if the timestamp is from background:
bg_timestamp = particles == np.unique(particles)[-1]
assert np.all(bg_timestamp == np.isnan(positions[:, 0]))
Now we take the burst data including index of burst start and stop
(i_start
and i_stop
columns):
bursts = fb.bext.burst_data(ds, include_ph_index=True)
bursts.head()
burstph = fb.bext.burst_photons(ds)
burstph['particle'] = np.hstack(
fb.burstlib.iter_bursts_ph(ds.particles[0], ds.mburst[0]))
burstpos = np.vstack(
fb.burstlib.iter_bursts_ph(positions, ds.mburst[0]))
burstph['x_um'] = burstpos[:, 0] * 1e6
burstph['y_um'] = burstpos[:, 1] * 1e6
burstph['z_um'] = burstpos[:, 2] * 1e6
burstph['r_um'] = np.linalg.norm(burstpos[:, :2], axis=1) * 1e6
burstph.head()
We can assigning to each burst the particle with most photons:
bursts['particle'] = np.zeros(bursts.shape[0], dtype='uint8')
for iburst, bph in burstph.groupby('burst'):
par, counts = np.unique(bph.particle, return_counts=True)
bursts.loc[iburst, 'particle'] = par[counts.argmax()]
bursts.head()
Positions where each burst starts:
positions[bursts.i_start]
particles[bursts.i_start]
All photon emission positions in one burst:
burst_idx = 0
pos_burst = burstph.loc[burst_idx]
pos_burst
Photon emission positions in all bursts:
# Increase the resolution of the figures displayed in the notebook
%config InlineBackend.figure_format = 'retina' # 'png' for default res
S.psf
if S.psf.kind == 'numeric':
PSF = pbm.NumericPSF()
psf = PSF.hdata
z_peak = PSF.zi[PSF.zm] # z position of PSF peak in μm
else:
x = np.arange(0, 4, 0.01) * 1e-6
z = np.arange(-6, 6, 0.01) * 1e-6
X, Z = np.meshgrid(x, z)
psf = S.psf.eval_xz(X, Z)
cmap = plt.cm.YlGnBu
cmap.set_under(alpha=0)
kwargs = dict(interpolation='bicubic', origin='lower', cmap=cmap, vmin=1e-1, zorder=1)
fig, ax = plt.subplots(figsize=(8, 2))
ax.imshow(psf.T, extent=(-6, 6, 0, 4), **kwargs)
ax.plot(burstph.z_um, burstph.r_um, '.', ms=5, color='C1', alpha=0.3)
ax.set(ylabel='R (μm)', xlabel='Z (μm)', ylim=(0, 1), xlim=(-2, 2),
title=f'Position of photon emission (PSF {S.psf.kind})');
fig, ax = plt.subplots(figsize=(8, 2))
ax.imshow(psf.T, extent=(-6, 6, 0, 4), **kwargs)
sns.scatterplot(x="z_um", y="r_um", hue="burst", data=burstph.reset_index(),
zorder=10, ax=ax, marker='o', linewidth=0,
palette='Spectral', alpha=0.3, s=20)
if S.psf.kind == 'numeric':
ax.axvline(PSF.zi[PSF.zm], color='k')
ax.set(ylabel='R (μm)', xlabel='Z (μm)', ylim=(0, 1), xlim=(-2, 2),
title='Position of photon emission by burst');
S.store.close()
S.ts_store.close()