This notebook is part of PyBroMo a python-based single-molecule Brownian motion diffusion simulator that simulates confocal smFRET experiments.
Summary. This notebook describes the HDF5 files used to save diffusion trajectories and raw timestamps. PyBroMo can also generate complete smFRET data files in Photon-HDF5 format, but for this format the reader can refer to official Photon-HDF5 specifications.
import numpy as np
import tables
import pybromo as pbm
print('Numpy version:', np.__version__)
print('PyTables version:', tables.__version__)
print('PyBroMo version:', pbm.__version__)
Numpy version: 1.10.1 PyTables version: 3.2.2 PyBroMo version: 0.6
/Users/anto/miniconda3/lib/python3.5/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter. warnings.warn(self.msg_depr % (key, alt_key))
S = pbm.ParticlesSimulation.from_datafile('0168')
- Found matching timestamps.
The simulation is saved in a HDF5 file, one for each running engine. The file has the following content:
/parameters
Numeric parameters (storead as scalar arrays)
D
t_step
t_max
EID
ID
chunksize
: used for emission
and position
arraysnp
: number of particlespMol
: particles concentration in pico-MolarNon-numeric parameters (stored as group attributes)
box
: the Box()
object (stores boundaries and size)particles
: the Particles()
object, a list of Particle()
(containing the initial position positions) and seed./psf
NumericPSF()
object on a simulation reload.default_psf
: hard link to the PSF used in the simualation, used as persistent name/trajectories
emission
: 2D array of emission traces: one row per particle. Shape: (num_particles, time)emission_tot
: 1D array of emission trace: total emission from all the particles: Shape: (time)position
: 3D array of positions. Shape (num_particles, 3, time)/timestamps
rate_max
, bg_rate
and seed
.The HDF5 file handle is in S.store.data_file
after you run S.open_store()
:
S.compact_name_core()
'016898_P20_D1.2e-11_P15_D6e-12_75pM_step0.5us'
S.compact_name_core(t_max=True)
'016898_P20_D1.2e-11_P15_D6e-12_75pM_step0.5us_t_max1.0s'
S.compact_name()
'016898_P20_D1.2e-11_P15_D6e-12_75pM_step0.5us_t_max1.0s_ID0-0'
print ('Main groups:\n')
for node in S.store.h5file.root:
print (node)
for n in node:
print ('\t%s' % n.name)
print ('\t %s' % n.title)
Main groups: /parameters (Group) 'Simulation parameters' D Diffusion coefficient (m^2/s) EID IPython Engine ID (int) ID Simulation ID (int) chunksize Chunksize for arrays np Number of simulated particles pico_mol Particles concentration (pM) t_max Simulation total time (s) t_step Simulation time-step (s) /psf (Group) 'PSFs used in the simulation' default_psf PSF x-z slice (PSFLab array) xz_realistic_z50_150_160_580nm_n1335_HR2 PSF x-z slice (PSFLab array) /trajectories (Group) 'Simulated trajectories' emission Emission trace of each particle emission_tot Summed emission trace of all the particles position X-Y-Z position trace of each particle
h5file = S.store.h5file
group = '/parameters'
print ('Numeric attributes (nodes) in %s:\n' % group)
print (S.store.h5file.get_node(group))
for node in S.store.h5file.get_node(group)._f_list_nodes():
print ('\t%s (%s)' % (node.name, str(node.read())))
print ('\t %s' % node.title)
Numeric attributes (nodes) in /parameters: /parameters (Group) 'Simulation parameters' D (9.428571428571427e-12) Diffusion coefficient (m^2/s) EID (0) IPython Engine ID (int) ID (0) Simulation ID (int) chunksize (524288) Chunksize for arrays np (35) Number of simulated particles pico_mol (75.67560551416292) Particles concentration (pM) t_max (1) Simulation total time (s) t_step (5e-07) Simulation time-step (s)
group = '/parameters'
print ('Attributes in %s:\n' % group)
print (S.store.h5file.get_node(group))
for attr in S.store.h5file.get_node(group)._v_attrs._f_list():
print ('\t%s' % attr)
print ('\t %s' % type(S.store.h5file.get_node(group)._v_attrs[attr]))
Attributes in /parameters: /parameters (Group) 'Simulation parameters' box <class 'pybromo.diffusion.Box'> particles <class 'pybromo.diffusion.Particles'>
particles = S.store.h5file.root.parameters._f_getattr('particles')
len(particles), particles.rs_hash
(35, 'bfb')
group = '/psf'
print ('Nodes in in %s:\n' % group)
print (S.store.h5file.get_node(group))
for node in S.store.h5file.get_node(group)._f_list_nodes():
print ('\t%s %s' % (node.name, node.shape))
print ('\t %s' % node.title)
Nodes in in /psf: /psf (Group) 'PSFs used in the simulation' default_psf (193, 129) PSF x-z slice (PSFLab array) xz_realistic_z50_150_160_580nm_n1335_HR2 (193, 129) PSF x-z slice (PSFLab array)
default_psf
attributes¶node_name = '/psf/default_psf'
node = S.store.h5file.get_node(node_name)
print ("\n%s %s: '%s'" % (node.name, node.shape, node.title))
print ('\n List of attributes:')
for attr in node.attrs._f_list():
print ('\t%s' % attr)
print ("\t %s" % repr(node._v_attrs[attr]))
default_psf (193, 129): 'PSF x-z slice (PSFLab array)' List of attributes: dir_ '/Users/anto/src/PyBroMo/pybromo/psf_data' fname 'xz_realistic_z50_150_160_580nm_n1335_HR2' x_step 0.0625 z_step 0.0625
group = '/trajectories'
print ('Nodes in in %s:\n' % group)
print (S.store.h5file.get_node(group))
for node in S.store.h5file.get_node(group)._f_list_nodes():
print ('\t%s %s' % (node.name, node.shape))
print ('\t %s' % node.title)
Nodes in in /trajectories: /trajectories (Group) 'Simulated trajectories' emission (35, 2000000) Emission trace of each particle emission_tot (0,) Summed emission trace of all the particles position (35, 3, 2000000) X-Y-Z position trace of each particle
group = '/trajectories'
print ('Attributes in %s:\n' % group)
print (S.store.h5file.get_node(group))
for attr in S.store.h5file.get_node(group)._v_attrs._f_list():
print ('\t%s' % attr)
print ('\t %s' % type(S.store.h5file.get_node(group)._v_attrs[attr]))
Attributes in /trajectories: /trajectories (Group) 'Simulated trajectories' init_random_state <class 'tuple'> last_random_state <class 'tuple'> psf_name <class 'numpy.str_'>
emission
attributes¶node_name = '/trajectories/emission'
node = S.store.h5file.get_node(node_name)
print ("\n%s %s: '%s'" % (node.name, node.shape, node.title))
print ('\n List of attributes:')
for attr in node.attrs._f_list():
print ('\t%s' % attr)
attr_data = repr(node._v_attrs[attr])
if len(attr_data) > 300:
attr_data = hash_(node._v_attrs[attr])
print ("\t %s" % attr_data)
emission (35, 2000000): 'Emission trace of each particle' List of attributes: PyBroMo '0.5+43.g13dd56c' creation_time '2015-09-22 19:36:56'
position
attributes¶node_name = '/trajectories/position'
if node_name in S.store.h5file:
node = S.store.h5file.get_node(node_name)
print ("\n%s %s: '%s'" % (node.name, node.shape, node.title))
print ('\n List of attributes:')
for attr in node.attrs._f_list():
print ('\t%s' % attr)
print ("\t %s" % repr(node._v_attrs[attr]))
else:
print ('%s not present.')
position (35, 3, 2000000): 'X-Y-Z position trace of each particle' List of attributes: PyBroMo '0.5+43.g13dd56c' creation_time '2015-09-22 19:36:56'
group = '/timestamps'
print ('Nodes in in %s:\n' % group)
print (S.ts_store.h5file.get_node(group))
for node in S.ts_store.h5file.get_node(group)._f_list_nodes():
print ('\t%s' % node.name)
print ('\t %s' % node.title)
Nodes in in /timestamps: /timestamps (Group) 'Simulated timestamps' Pop1_P20_Pstart0_max_rate150000cps_BG800cps_Pop2_P15_Pstart20_max_rate63000cps_BG800cps_t_1s_rs_0eb3bf Simulated photon timestamps Pop1_P20_Pstart0_max_rate150000cps_BG800cps_Pop2_P15_Pstart20_max_rate63000cps_BG800cps_t_1s_rs_0eb3bf_par Particle number for each timestamp Pop1_P20_Pstart0_max_rate150000cps_BG800cps_t_1s_rs_9b9a03 Simulated photon timestamps Pop1_P20_Pstart0_max_rate150000cps_BG800cps_t_1s_rs_9b9a03_par Particle number for each timestamp Pop1_P20_Pstart0_max_rate50000cps_BG1500cps_Pop2_P15_Pstart20_max_rate117000cps_BG1500cps_t_1s_rs_9832bb Simulated photon timestamps Pop1_P20_Pstart0_max_rate50000cps_BG1500cps_Pop2_P15_Pstart20_max_rate117000cps_BG1500cps_t_1s_rs_9832bb_par Particle number for each timestamp Pop1_P20_Pstart0_max_rate50000cps_BG1500cps_t_1s_rs_9832bb Simulated photon timestamps Pop1_P20_Pstart0_max_rate50000cps_BG1500cps_t_1s_rs_9832bb_par Particle number for each timestamp
group = '/timestamps'
print ('Attributes in %s:\n' % group)
print (S.ts_store.h5file.get_node(group))
for attr in S.ts_store.h5file.get_node(group)._v_attrs._f_list():
print ('\t%s' % attr)
print ('\t %s' % type(S.ts_store.h5file.get_node(group)._v_attrs[attr]))
Attributes in /timestamps: /timestamps (Group) 'Simulated timestamps' init_random_state <class 'tuple'> last_random_state <class 'tuple'>
group = '/timestamps'
print ('>> Nodes in %s' % S.ts_store.h5file.get_node(group))
for node in S.ts_store.h5file.get_node(group)._f_list_nodes():
#print '\t%s' % node.name
#print '\t %s' % node.title
print ("\n%s %s: '%s'" % (node.name, node.shape, node.title))
print ('\n List of attributes:')
for attr in node.attrs._f_list():
print ('\t%s' % attr)
attr_data = repr(node._v_attrs[attr])
if len(attr_data) > 300:
attr_data = pbm.hash_(node._v_attrs[attr])
print ("\t %s" % attr_data)
print ('\n>> Attributes in %s:\n' % group)
print (S.ts_store.h5file.get_node(group))
for attr in S.ts_store.h5file.get_node(group)._v_attrs._f_list():
print ('\t%s' % attr)
print ('\t %s' % type(S.ts_store.h5file.get_node(group)._v_attrs[attr]))
>> Nodes in /timestamps (Group) 'Simulated timestamps' Pop1_P20_Pstart0_max_rate150000cps_BG800cps_Pop2_P15_Pstart20_max_rate63000cps_BG800cps_t_1s_rs_0eb3bf (1528,): 'Simulated photon timestamps' List of attributes: PyBroMo '0.6' bg_rate 800 clk_p 4.9999999999999998e-08 creation_time '2015-11-21 23:54:43' init_random_state 0eb3bf09731db4625a815c574b0eaca0139488e3 last_random_state 4aebc456d3ae5f4b44461c225c50034dd9a196e0 max_rates [150000.0, 62999.999999999993] populations [slice(0, 20, None), slice(20, 35, None)] Pop1_P20_Pstart0_max_rate150000cps_BG800cps_Pop2_P15_Pstart20_max_rate63000cps_BG800cps_t_1s_rs_0eb3bf_par (1528,): 'Particle number for each timestamp' List of attributes: PyBroMo '0.6' bg_particle 35 creation_time '2015-11-21 23:54:43' num_particles 35 Pop1_P20_Pstart0_max_rate150000cps_BG800cps_t_1s_rs_9b9a03 (1065,): 'Simulated photon timestamps' List of attributes: PyBroMo '0.6' bg_rate 800 clk_p 4.9999999999999998e-08 creation_time '2015-11-21 23:54:38' init_random_state 9b9a0390ffb50b993b12ea26dfbbd19c89c3aeb7 last_random_state 0c3ae0c619e18a56783d70aa8e2eef2308d1561e max_rates [150000.0] populations [slice(0, 20, None)] Pop1_P20_Pstart0_max_rate150000cps_BG800cps_t_1s_rs_9b9a03_par (1065,): 'Particle number for each timestamp' List of attributes: PyBroMo '0.6' bg_particle 35 creation_time '2015-11-21 23:54:38' num_particles 35 Pop1_P20_Pstart0_max_rate50000cps_BG1500cps_Pop2_P15_Pstart20_max_rate117000cps_BG1500cps_t_1s_rs_9832bb (2277,): 'Simulated photon timestamps' List of attributes: PyBroMo '0.6' bg_rate 1500 clk_p 4.9999999999999998e-08 creation_time '2015-11-21 23:54:40' init_random_state 9832bb4fc1d8db4ac18ed0c9b6a22ec61fb2fc81 last_random_state 0eb3bf09731db4625a815c574b0eaca0139488e3 max_rates [50000.0, 117000.0] populations [slice(0, 20, None), slice(20, 35, None)] Pop1_P20_Pstart0_max_rate50000cps_BG1500cps_Pop2_P15_Pstart20_max_rate117000cps_BG1500cps_t_1s_rs_9832bb_par (2277,): 'Particle number for each timestamp' List of attributes: PyBroMo '0.6' bg_particle 35 creation_time '2015-11-21 23:54:40' num_particles 35 Pop1_P20_Pstart0_max_rate50000cps_BG1500cps_t_1s_rs_9832bb (1590,): 'Simulated photon timestamps' List of attributes: PyBroMo '0.6' bg_rate 1500 clk_p 4.9999999999999998e-08 creation_time '2015-11-21 23:54:36' init_random_state 9832bb4fc1d8db4ac18ed0c9b6a22ec61fb2fc81 last_random_state 9b9a0390ffb50b993b12ea26dfbbd19c89c3aeb7 max_rates [50000.0] populations [slice(0, 20, None)] Pop1_P20_Pstart0_max_rate50000cps_BG1500cps_t_1s_rs_9832bb_par (1590,): 'Particle number for each timestamp' List of attributes: PyBroMo '0.6' bg_particle 35 creation_time '2015-11-21 23:54:36' num_particles 35 >> Attributes in /timestamps: /timestamps (Group) 'Simulated timestamps' init_random_state <class 'tuple'> last_random_state <class 'tuple'>
S.store.close()
S.ts_store.close()
PyBromo uses Numpy's RandomState
object to track the current state of the random number generator at different
simulation stages.
Tracking the random state has a dual purpose. First, it allows to reproduce any simulation step.
Second, it allows to mantain an high-quality pseudo-random number stream when the simulation is resumed. This point is more subtle. Simulation can be performed in different steps. When resuming a simulation to proceed to the nex step it is important to use the last saved random state. In fact, by resetting the random state using an arbitrary seed we may easily introduce correlation between the previous and current stream of random numbers. For example streams generated with seeds 1 and 2 will be correlated (up to 1e6 samples!) because many bits in the seed are the same. This is a property of the Mersenne twister algorithm (see also this paper). To avoid this well-known problem we need to use a single stream by freezing (saving) and restoring the random state at each step.
Notice that there are 3 steps that require random numbers:
The random state is tracked as follows:
gen_particles
the new Particles
objectwill contain the .init_random_state
attribute (and, as a convenience, a 3 digit
hash in .rs_hash
)
2. Whem performing the Brownian motion simulation with .sim_brownian_motion
,
the '/trajectories' group (shortcut S.traj_group
) will store the initial and
the final state as group attributes: .init_random_state
and .last_random_state
.
The assumption is that we simulate only 1 Browian motion diffusion per object
so makes sense to store these values as group attributes.
3. When generating timestamps with S.sim_timestamps_em_store
, we will generate
timestamps several times for different emission or background rates. Therefore
the '/timestamps' group (shortcutS.ts_group
) contains the last_random_state
attribute and each timestamp array contains the init_random_state
attribute.
NOTE: Since the random-state data structure (a tuple of a string, an array, and 3 numbers) is numpy-specific we save it without any conversion. In fact, using the same random state in another programming language would require a deep understanding of the Mersenne twister implementation. Not to mention that a proprietary language may not provide such level of details to the user. Anyway, if you are motivated to use the random state in another language, an additional conversion from numpy format would be the least of your problems.