N.B. You will need cctbx.xfel installed to run this, and you will need to run it with the command libtbx.ipython.
API documentation can be found at http://cci.lbl.gov/cctbx_docs/xfel/xfel.clustering.html#cluster-cluster Note that this is in active development, so it may be updated frequently or contain new methods or classes that are not yet finished.
This tutorial demonstrates API-level usage of the XFEL data exploration toolkit, using a test data set of only 49 images for simplicity. On my local machine, this is at:
TESTDATA = ['/users/oli/Dropbox/Stanford_Postdoc/CODING/cctbx_testing/PolG_test_data']
import logging
import numpy as np
import matplotlib.pyplot as plt
import brewer2mpl
# Set up logging
reload(logging) # work-around for IPython
FORMAT = '%(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
# pretty colors
cols = brewer2mpl.get_map('BrBG', 'Diverging', 3).mpl_colors
Let's start by creating a cluster object, using the from_directories class method.
from xfel.clustering.cluster import Cluster
t_clus = Cluster.from_directories(TESTDATA)
Next, we'll do some hierarchical clustering on this to get a sense of the unit cells and point groups that make up our data. sub_clusters will be a list of clusters, and we're ignoring the other return value (the axes object we just plotted onto).
sub_clusters, _ = t_clus.ab_cluster(labels=False, write_file_lists=False)
Hierarchical clustering of unit cells
Using Andrews-Bernstein distance from Andrews & Bernstein J Appl Cryst 47:346 (2014)
Distances have been calculated
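To make the idea of cutting a single-linkage hierarchy at a threshold concrete, here is a minimal, self-contained sketch. It is not the cctbx implementation: it uses a plain Euclidean distance on three made-up cell lengths instead of the Andrews-Bernstein distance, and the names single_linkage, euclidean and toy_cells are hypothetical. Cutting a single-linkage tree at height t gives the same groups as joining every pair of points that lie within t of each other, which is what the sketch does.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance (a stand-in, NOT the Andrews-Bernstein distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, threshold, dist=euclidean):
    """Return lists of point indices, smallest group first.

    Two points share a group if they are linked by a chain of neighbours
    that are each no further apart than `threshold`.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len)


# Hypothetical (a, b, c) cell lengths, for illustration only.
toy_cells = [
    (227.4, 227.4, 286.8),
    (227.9, 227.0, 287.5),
    (226.8, 228.1, 286.1),
    (224.6, 286.5, 400.0),
    (225.3, 287.2, 401.4),
    (484.1, 600.6, 689.2),
]
clusters = single_linkage(toy_cells, threshold=5.0)
print("cluster sizes: {}".format([len(c) for c in clusters]))
```

Because the result is sorted by size, the largest groups sit at the end of the list, which mirrors how the biggest clusters are picked out later in this tutorial.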
So let's look at the composition of the sub_clusters list we got. Since this acts on a group of clusters, we will import a tool from the cluster_groups module.
from xfel.clustering.cluster_groups import unit_cell_info
pretty_str = unit_cell_info(sub_clusters)
print pretty_str
12 clusters.
C_id        Num in cluster  Med_a        Med_b        Med_c        Med_alpha     Med_beta      Med_gamma
cluster_10  2               230.7(1.1 )  290.2(3.7 )  784.3(1.4 )  90.00 (0.00)  90.00 (0.00)  90.00 (0.00)
2 in P222.
cluster_11  14              224.6(4.6 )  286.5(1.8 )  400.0(7.0 )  90.00 (0.00)  90.00 (0.00)  90.00 (0.00)
10 in C222, 4 in P222.
cluster_12  24              227.4(1.6 )  227.4(1.6 )  286.8(2.4 )  90.00 (0.00)  90.00 (0.00)  120.00(0.00)
24 in P3.
Standard deviations are in brackets.

9 singletons:
Point group  a      b      c       alpha  beta  gamma
C2           225.7  270.5  1187.4  90.0   90.0  94.0
C222         231.2  286.7  1131.1  90.0   90.0  90.0
P2           225.9  229.4  287.2   90.0   90.0  119.4
C2           228.4  299.2  392.5   93.1   90.0  90.0
C2           23.7   330.1  358.3   97.9   90.0  90.0
C2           226.3  290.9  399.7   91.3   90.0  90.0
P222         226.8  396.7  572.0   90.0   90.0  90.0
C2           223.1  486.0  682.3   92.7   90.0  90.0
P1           484.1  600.6  689.2   99.9   91.1  104.5
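Each cell entry in the summary is a median followed by a standard deviation in brackets. The sketch below shows how one such entry could be computed in plain Python. It is only an illustration: the helper names and the toy values are hypothetical, and the use of a population (rather than sample) standard deviation is an assumption, not something confirmed for unit_cell_info.

```python
import math


def median(values):
    """Middle value of a list (mean of the two middle values if even length)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def std_dev(values):
    """Population standard deviation (assumed convention for this sketch)."""
    mean = sum(values) / float(len(values))
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


# Hypothetical a-axis lengths for the members of one cluster.
toy_a_axes = [224.0, 226.0, 228.0, 230.0]
summary = "{:.1f}({:.1f})".format(median(toy_a_axes), std_dev(toy_a_axes))
print(summary)
```

The median is less sensitive than the mean to a single badly indexed image, which is presumably why it is the statistic reported per cluster.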
We see the two biggest clusters, which were shown in red and green in the plot above. Let's call these clu_a and clu_b. They are simply the last two elements of the list, since it is sorted.
clu_b, clu_a = sub_clusters[-2:]
print "clu_a size: {}\nclu_b size: {}".format(len(clu_a.members), len(clu_b.members))
clu_a size: 24
clu_b size: 14
Let's now pretend we are interested only in the smaller cluster, clu_b. Let us examine the intensity distribution for this, again ignoring the returned axes object:
_ = clu_b.all_frames_intensity_stats()
That's not looking great: a very high B value, and the intensities rise somewhat at higher resolution. These are in fact pretty poor-quality data, so that's consistent. Now let's look at the point group composition and see whether either point group looks better separately:
clu_b.pg_composition
{'C222': 10, 'P222': 4}
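The composition dictionary above is a count of point-group labels over the cluster's members, and filtering by point group keeps only the members carrying one label. A minimal sketch of both operations on toy records follows. The record layout and the function names count_point_groups and filter_by_point_group are hypothetical and are not part of the xfel.clustering API.

```python
from collections import Counter

# Hypothetical member records; real cluster members are richer objects.
toy_members = [
    {"name": "img_01", "pg": "C222"},
    {"name": "img_02", "pg": "P222"},
    {"name": "img_03", "pg": "C222"},
    {"name": "img_04", "pg": "C222"},
    {"name": "img_05", "pg": "P222"},
]


def count_point_groups(members):
    """Map each point-group label to the number of members that carry it."""
    return dict(Counter(m["pg"] for m in members))


def filter_by_point_group(members, point_group):
    """Keep only the members whose point group matches `point_group`."""
    return [m for m in members if m["pg"] == point_group]


composition = count_point_groups(toy_members)
p222_only = filter_by_point_group(toy_members, "P222")
print("{} of {} members are P222".format(len(p222_only), len(toy_members)))
```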
clu_bC = clu_b.point_group_filter('C222')
clu_bP = clu_b.point_group_filter('P222')
_ = clu_bC.all_frames_intensity_stats()
Still looking a bit wonky at high resolution.
_ = clu_bP.all_frames_intensity_stats()
This looks a little more sensible. Now, let's look at the info string:
print clu_bP.info
Made from files in ['/users/oli/Dropbox/Stanford_Postdoc/CODING/cctbx_testing/PolG_test_data']
############################## Next filter ##############################
Made using ab_cluster with t=10000, distance method, and single linkage
14 of 49 images passedon to this cluster
############################## Next filter ##############################
Cluster filtered by for point group P222.
4 of 14 images passedon to this cluster
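The info string above grows by one entry for each filtering step. One way such bookkeeping can be done is for every filter to return a new object whose info is the parent's info plus a description of the step. The sketch below illustrates that pattern only. ToyCluster, its filter method and the separator text are hypothetical and are not how the Cluster class is necessarily written.

```python
SEPARATOR = "#" * 10 + " Next filter " + "#" * 10


class ToyCluster(object):
    """A list of members plus a running description of how it was made."""

    def __init__(self, members, info):
        self.members = members
        self.info = info

    def filter(self, predicate, description):
        """Return a NEW ToyCluster; the parent and its info stay unchanged."""
        kept = [m for m in self.members if predicate(m)]
        info = "{}\n{}\n{}. {} of {} images passed on to this cluster".format(
            self.info, SEPARATOR, description, len(kept), len(self.members))
        return ToyCluster(kept, info)


# Integers stand in for images in this toy example.
root = ToyCluster(list(range(10)), "Made from toy data")
evens = root.filter(lambda m: m % 2 == 0, "Kept even members")
small = evens.filter(lambda m: m < 4, "Kept members below 4")
print(small.info)
```

Returning a new object rather than modifying the parent means clu_b stays usable after clu_bC and clu_bP have been derived from it, as in the cells above.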
Note how this keeps track of the cluster's provenance, so we know where it came from. Let's now write these images of interest out, so that we can, for example, look at them individually using cctbx.image_viewer:
clu_bP.dump_file_list(out_file_name='temp.lst')
cat temp.lst
/users/oli/Dropbox/Stanford_Postdoc/CODING/cctbx_testing/PolG_test_data/int-s00-2013-11-10T17:53Z14.813_00000.pickle
/users/oli/Dropbox/Stanford_Postdoc/CODING/cctbx_testing/PolG_test_data/int-s03-2013-11-10T17:51Z07.396_00000.pickle
/users/oli/Dropbox/Stanford_Postdoc/CODING/cctbx_testing/PolG_test_data/int-s03-2013-11-10T17:54Z36.359_00000.pickle
/users/oli/Dropbox/Stanford_Postdoc/CODING/cctbx_testing/PolG_test_data/int-s04-2013-11-10T18:28Z45.434_00000.pickle
Marvelous!
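If you want to pick the list back up in a later session, a file like temp.lst can be read with a few lines of plain Python. This sketch assumes the file holds one path per line. The helper name read_file_list and the example paths are hypothetical, and the demonstration writes its own small file to a temporary directory so it does not depend on the test data.

```python
import os
import tempfile


def read_file_list(path):
    """Return the non-empty lines of a list file, stripped of whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Hypothetical paths, written to a throwaway file for demonstration.
toy_paths = [
    "/tmp/example/int-0001.pickle",
    "/tmp/example/int-0002.pickle",
]
list_path = os.path.join(tempfile.mkdtemp(), "toy.lst")
with open(list_path, "w") as handle:
    handle.write("\n".join(toy_paths) + "\n")

recovered = read_file_list(list_path)
print("{} paths read back".format(len(recovered)))
```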