This IPython notebook contains the commands for the analysis described in ER Hyde et al., The oral and skin microbiomes of captive Komodo Dragons are significantly shared with their habitat.
This dataset represents the largest captive Komodo dragon microbiome dataset to date. Twelve zoos across the U.S. (Zoo Atlanta, ABQ BioPark, Bronx Zoo, Denver Zoo, Fort Worth Zoo, Gladys Porter Zoo, Houston Zoo, Honolulu Zoo, Jacksonville Zoo and Gardens, Los Angeles Zoo, the Virginia Aquarium, and Woodland Park Zoo) agreed to sample the skin, salivary, and fecal microbiomes of their resident Komodo dragons. Additionally, the Denver and Los Angeles Zoos collected samples from the Komodos' environments (soil, rock, plastic, etc.). Other reptiles were also sampled (Gray's Monitor at the LA Zoo, Varanus rudicollis and Varanus indicus at the Greeley Zoo, and wild-caught rattlesnakes in Colorado); however, due to small sample sizes, samples from these reptiles were not included in the current analysis. The V4 region of the 16S rRNA gene was amplified and sequenced on the Illumina HiSeq platform; samples were barcoded, which allowed for multiplexed sequencing. You can download the raw sequencing files and mapping files used for this study before getting started with this Notebook.
Note: If you are using an Apple computer, use the curl
command as OS X does not come with wget pre-installed.
# Download the data using wget (if you are using OS X do not run this command)
!wget ftp://ftp.microbio.me/pub/Hyde_et_al_captive_komodo_dragon_microbiome.tgz
# Or alternativelly, if you're using OS X
#!curl -OL ftp://ftp.microbio.me/pub/Hyde_et_al_captive_komodo_dragon_microbiome.tgz
# Untar the files
tar xzvf Hyde_et_al_captive_komodo_dragon_microbiome.tgz
We first need to demultiplex and quality filter the Illumina sequences. QIIME demultiplexes and quality filters using a single script: split_libraries_fastq.py. The inputs of this script are the sequencing outputs (raw reads and barcode fastq files) and the mapping file. We changed the maximum unacceptable Phred score from the default value of q=3 to a value of q=19 to ensure high-quality reads; the remaining defaults (r=3, p=75%, and n=0) were used. We also used the --rev_comp_mapping_barcodes option as our mapping file contains the reverse compliment of the barcodes.
!split_libraries_fastq.py -i $PWD/komodo/KomodoDragon_16sV4_L001_R1_001.fastq \
-b $PWD/komodo/KomodoDragon_16sV4_L001_I1_001.fastq \
-m $PWD/Komodo_Mapping_file.txt \
-q 19 --rev_comp_mapping_barcodes \
-o $PWD/sl_out
After preprocessing the raw data, we then cluster the sequences into Operational Taxonomic Units (OTUs), assign taxonomy to each OTU, build a phylogenetic tree, and make an OTU table. There are three approaches to OTU picking: closed-reference, open-reference, and de novo OTU picking. Closed-reference OTU picking is the fastest method. Sequences are aligned against a reference database, and all sequences that do not align against the reference database are discarded. De novo OTU picking is the most time-consuming method, as sequences are aligned against each other and clustered into OTUs de novo. Open-reference OTU picking is a mixed of closed-reference and de novo OTU picking, as it performs an initial closed-reference OTU picking step, followed by a de novo OTU picking step on those reads that did not cluster to the database. We first performed closed-reference OTU picking, using uclust as the clustering algorithm (QIIME default) and the May 2013 release of the Greengenes database as the reference database. A sequence identity of 97% was used for clustering.
!pick_closed_reference_otus.py -i $PWD/sl_out/seqs.fna -o $PWD/closed_ref_otus
Since closed-reference OTU picking discards the sequences that did not match the reference, we will check how many sequences clustered against the reference database:
from biom import load_table
from qiime.util import count_seqs_in_filepaths
closed_ref_table = load_table('closed_ref_otus/otu_table.biom')
clustered_seqs = closed_ref_table.matrix_data.sum()
count_data, _, _ = count_seqs_in_filepaths(['sl_out/seqs.fna'])
num_input_seqs = count_data[0][0][0]
clustered_seqs / num_input_seqs * 100
We can see that only 72.7% of sequences in the dataset aligned to the reference database. To recover more information from the dataset, we can perform an open-reference OTU picking workflow, again using uclust as the underlying clustering algorithm and the May 2013 release of the Greengenes database as the reference.
!pick_open_reference_otus.py -i $PWD/sl_out/seqs.fna -o $PWD/open_ref_otus
We now inspect the resulting OTU table to check how many sequences we kept using the open reference method.
open_ref_table = load_table('open_ref_otus/otu_table_mc2_w_tax.biom')
clustered_seqs = open_ref_table.matrix_data.sum()
clustered_seqs / num_input_seqs * 100
We can see that in the open reference case we recover 95.2% of the sequences in the dataset. Since we kept more information with the open reference approach, this is the table that we are going to use for Downstream Analyses.
Before doing any Downstream Analyses, we are going to refine our data by removing low abundance, spurious OTUs which may be product of PCR or sequencing error by removing those OTUs that each represent less than 0.005% of the total number of sequences in the dataset (as recommended by Navas-Molina et al., 2013, Methods in Enzymology vol 531).
!filter_otus_from_otu_table.py -i $PWD/open_ref_otus/otu_table_mc2_w_tax.biom \
-o $PWD/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \
--min_count_fraction 0.00005
We then perform an additional filtering step to remove samples that do not belong to Komodo dragons or their environments, plust an additional 5 samples with insufficient metadata to determine whether they belong to Komodo dragons or Komodo environments.
!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \
-o $PWD/open_ref_otus/otu_table_komodo.biom \
-m $PWD/Komodo_Mapping_file.txt \
-s 'HOST_COMMON_NAME:Komodo Dragon'
Alpha diversity is defined as the diversity of organisms in one sample or environment, and QIIME implements dozens of the most widely used alpha diversity indices (both phylogenetic and non-phylogenetic). Rarefaction plots, which not only enable visualization of the diversity of samples or environments, are also useful for assessing the sequencing effort sufficient for representing and comparing microbial communities in your dataset. The plot curves reach asymptotes when the sequencing depth does not contribute additional OTUs, indicating that the depth of sequencing was sufficient to sample the majority of diversity present in the sample. It is recommended that input OTU tables are filtered so that any samples containing less reads than the maximum number of reads you will sample are removed from the OTU table; if this step is omitted, some alpha diversity curves will not reach the end of the x-axis on the rarefaction plot. You can do this using the command filter_samples_from_otu_table.py. Because we are only interested in comparing the diversity of Komodo dragon skin, salivary, and fecal microbiomes right now, we will will filter the Komodo-only OTU table so that environmental samples are not present, and then we will filter so that any samples with less than 3000 reads are removed from the OTU table (this number was chosen to maximize the number of reads per sample while minimizing the number of samples discarded from analysis).
# First filter the table to get only the Komodo dragon skin, salivary and fecal microbiomes
!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo.biom \
-o $PWD/open_ref_otus/otu_table_komodo_no_env.biom \
-m $PWD/Komodo_Mapping_file.txt \
-s 'BODY_PRODUCT:*,!NA'
# Then filter to get only the samples with more than 3000 reads
!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo_no_env.biom \
-o $PWD/open_ref_otus/otu_table_komodo_no_env_n3000.biom \
-n 3000
Although QIIME will perform a series of steps for calculating alpha diversity and creating rarefaction plots using the script alpha_rarefaction.py, we will perform each of these steps individually, as we alter some default parameters. The first step is rarefying the OTU table so that all samples have the same number of reads associated with them. This helps eliminate bias caused by different sequencing depths between samples. We will randomly rarefy the OTU table ten times (QIIME default), and later, the alpha diversity metrics calculated on each of these tables will be averaged. The script multiple_rarefactions.py requests the minimum and maximum number of sequences to rarefy to, as well as the size of each sampling step between the minimum and maximum number. Here, our minimum number will be 10, and our maximum will be 3300, with steps of 100. We will first make a directory within our working directory called "alpha_diversity" into which we can point the outputs of all alpha diversity scripts.
!mkdir alpha_diversity
!multiple_rarefactions.py -i $PWD/open_ref_otus/otu_table_komodo_no_env_n3000.biom \
-o $PWD/alpha_diversity/multiple_rarefactions_komodo_no_env/ \
-m 10 -x 3300 -s 100
We then calculate the desired alpha diversity metrics for each of the rarefied OTU tables created in the previous step. We will calculate the number of observed species (number of OTUs) as well as the Shannon Diversity index (a measure of richness and evenness of the community) using the comman alpha_diversity.py.
!alpha_diversity.py -i $PWD/alpha_diversity/multiple_rarefactions_komodo_no_env/ \
-o $PWD/alpha_diversity/adiv_shannon_OS_komodo_no_env/ \
-m shannon,observed_species
We then collate the resulting alpha diversity results so that one file, representing the averages for all OTU tables at each read depth, is created for each alpha diversity metric.
!collate_alpha.py -i $PWD/alpha_diversity/adiv_shannon_OS_komodo_no_env/ \
-o $PWD/alpha_diversity/collated_alpha_komodo_no_env/
Finally, we create rarefaction plots.
!make_rarefaction_plots.py -i $PWD/alpha_diversity/collated_alpha_komodo_no_env/ \
-m $PWD/Komodo_Mapping_file.txt \
-o $PWD/alpha_diversity/rarefaction_plots_komodo_no_env/
We can open the html file to manipulate the rarefaction plots in our browser. Choosing the metadata category BODY_PRODUCT, we see that the number of OTUs and the Shannon Diversity index is lower for fecal samples than for skin or saliva samples. This is interesting, as in humans, the GI tract microbiome is the most diverse microbiome. We can further test whether this difference is significant and create a boxplot of the diversity metric distribution for each group of samples by running the script compare_alpha_diversity.py. This script uses as input the collated alpha diversity metrics files created in the collate_alpha.py step, as well as the mapping file.
# With observed species
!compare_alpha_diversity.py -i $PWD/alpha_diversity/collated_alpha_komodo_no_env/observed_species.txt \
-m $PWD/Komodo_Mapping_file.txt \
-o $PWD/alpha_diversity/OS_komodo_no_env \
-c BODY_PRODUCT
# With Shannon Diversity index
!compare_alpha_diversity.py -i $PWD/alpha_diversity/collated_alpha_komodo_no_env/shannon.txt \
-m $PWD/Komodo_Mapping_file.txt \
-o $PWD/alpha_diversity/Shannon_Komodo_no_env \
-c BODY_PRODUCT
We also want to compare the diversity of environmental samples to Komodo skin, saliva, and fecal microbiomes. We can therefore repeat each of the analyses steps outlined above, using an OTU table that includes Komodo skin, saliva, fecal, and environmental samples.
# Filter the OTU table to get the samples with more than 3000 reads
!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo.biom \
-o $PWD/open_ref_otus/otu_table_komodo_n3000.biom \
-n 3000
# Perform multiple rarefactions
!multiple_rarefactions.py -i $PWD/open_ref_otus/otu_table_komodo_n3000.biom \
-o $PWD/alpha_diversity/multiple_rarefactions_komodo/ \
-m 10 -x 3300 -s 100
# Perform alpha diversity
!alpha_diversity.py -i $PWD/alpha_diversity/multiple_rarefactions_komodo/ \
-o $PWD/alpha_diversity/adiv_shannon_OS_komodo/ \
-m shannon,observed_species
# Collate alpha diversity
!collate_alpha.py -i $PWD/alpha_diversity/adiv_shannon_OS_komodo/ \
-o $PWD/alpha_diversity/collated_alpha_komodo/
# Generate plots
!make_rarefaction_plots.py -i $PWD/alpha_diversity/collated_alpha_komodo/ \
-m $PWD/Komodo_Mapping_file.txt \
-o $PWD/alpha_diversity/rarefaction_plots_komodo/
# Compare alpha diversity
!compare_alpha_diversity.py -i $PWD/alpha_diversity/collated_alpha_komodo/observed_species.txt \
-m $PWD/Komodo_Mapping_file.txt \
-o $PWD/alpha_diversity/OS_Komodo \
-c BODY_PRODUCT
!compare_alpha_diversity.py -i $PWD/alpha_diversity/collated_alpha_komodo/shannon.txt \
-m $PWD/Komodo_Mapping_file.txt \
-o $PWD/alpha_diversity/shannon_Komodo \
-c BODY_PRODUCT
To visualize which bacterial taxa comprise the microbiomes of individual samples or environments, we can create taxa plots. We first rarefy the OTU table to 3323 reads per sample.
!single_rarefaction.py -i $PWD/open_ref_otus/otu_table_komodo.biom \
-o $PWD/open_ref_otus/otu_table_komodo_even3323.biom \
-d 3323
First, we want to characterize the Komodo salivary, skin, and fecal microbiomes. We can collapse the OTU table such that all salivary, skin, and fecal samples are an average of the entire cohort, with one column in the OTU table for each metadata category. Note, because this OTU table also contains environmental samples, a column for environmental samples ("NA") will also be created. We will not focus on environmental samples yet, but can return to this taxa plot later.
!mkdir taxa_summaries
!collapse_samples.py -b $PWD/open_ref_otus/otu_table_komodo_even3323.biom \
-m $PWD/Komodo_Mapping_file.txt \
--output_biom_fp $PWD/taxa_summaries/otu_table_komodo_even3323_by_body_product.biom \
--output_mapping_fp $PWD/taxa_summaries/Komodo_Mapping_file_by_body_product.txt \
--collapse_fields BODY_PRODUCT
We can then run a single script in QIIME which will split the OTU table into 5 different taxonomic-level OTU tables, labelled L2, L3, L4, L5, and L6.txt (L2 being Phylum and L6 being genus) and then create interactive html files (as well as 2D pdfs for each possible plot) for exploring the average relative abundances of individual taxa in each group of samples.
!summarize_taxa_through_plots.py -i $PWD/taxa_summaries/otu_table_komodo_even3323_by_body_product.biom \
-m $PWD/taxa_summaries/Komodo_Mapping_file_by_body_product.txt \
-o $PWD/taxa_summaries/Komodo_by_body_product_summaries/
Beta diversity is defined as the difference in the diversities across samples or environments. QIIME can compute many phylogenetic and non-phylogenetic beta diversity metrics, although UniFrac is the most generally useful metric. Using the script beta_diversity_through_plots.py, QIIME generates a distance matrix by computing the beta diversity between each pair of input sample as well as an Emperor html file for visualizing a PCoA plot interactively in 3-dimensions. We will create two sets of distance matrices and PCoA plots-one that includes Komodo dragon skin, salivary, and fecal samples and one that also includes environmental samples.
# First create a directory for all beta diversity results
!mkdir beta_div
# Komodo skin, saliva and fecal samples plus Komodo environmental samples
!beta_diversity_through_plots.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \
-m $PWD/Komodo_Mapping_file.txt \
-t $PWD/open_ref_otus/rep_set.tre \
-o $PWD/beta_div/bdiv_even3323_komodo
# Filter environment samples from rarefied table
!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \
-o $PWD/open_ref_otus/otu_table_komodo_no_env_even3323.biom \
-m $PWD/Komodo_Mapping_file.txt \
-s 'BODY_PRODUCT:*,!NA'
# Only Komodo skin, saliva and fecal samples
!beta_diversity_through_plots.py -i $PWD/open_ref_otus/otu_table_komodo_no_env_even3323.biom \
-m $PWD/Komodo_Mapping_file.txt \
-t $PWD/open_ref_otus/rep_set.tre \
-o $PWD/beta_div/bdiv_even3323_komodo_no_env
Coloring the dots by the metadata category BODY_PRODUCT, we see that fecal samples cluster separately from skin and saliva samples; additionally, environmental samples cluster among skin and saliva, but not fecal, samples.
We can test the signficance of the clustering and whether body site is a driver of this clustering pattern using the script compare_categories.py. We will use two tests: anosim and permanova. This script requires the UniFrac distance matrix produced during the beta_diversity_through_plots.py step; we will only use the distance matrix containing Komodo body, but not environmental, samples.
# anosim - unweighted unifrac
!compare_categories.py -i $PWD/beta_div/bdiv_even3323_komodo_no_env/unweighted_unifrac_dm.txt \
-m $PWD/Komodo_Mapping_file.txt \
-o $PWD/beta_div/bdiv_even3323_komodo_no_env/cc_body_product_unweighted_unifrac_anosim \
-c BODY_PRODUCT \
--method anosim
# anosim - weighted unifrac
!compare_categories.py -i $PWD/beta_div/bdiv_even3323_komodo_no_env/weighted_unifrac_dm.txt \
-m $PWD/Komodo_Mapping_file.txt \
-o $PWD/beta_div/bdiv_even3323_komodo_no_env/cc_body_product_weighted_unifrac_anosim \
-c BODY_PRODUCT \
--method anosim
# permanova - unweighted unifrac
!compare_categories.py -i $PWD/beta_div/bdiv_even3323_komodo_no_env/unweighted_unifrac_dm.txt \
-m $PWD/Komodo_Mapping_file.txt \
-o $PWD/beta_div/bdiv_even3323_komodo_no_env/cc_body_product_unweighted_unifrac_permanova \
-c BODY_PRODUCT \
--method permanova
# permanova - weighted unifrac
!compare_categories.py -i $PWD/beta_div/bdiv_even3323_komodo_no_env/weighted_unifrac_dm.txt \
-m $PWD/Komodo_Mapping_file.txt \
-o $PWD/beta_div/bdiv_even3323_komodo_no_env/cc_body_product_weighted_unifrac_permanova \
-c BODY_PRODUCT \
--method permanova
To determine whether or not there is a core Komodo dragon skin, salivary, or fecal microbiome, we can run a script in QIIME called compute_core_microbiome.py. This script will determine which OTUs are present in a user defined subset of samples; for example, we will set the minimum number of samples at 50% of the cohort, and the maximum number at 100% of the cohort. We can then indicate that we want 11 steps in between the minimum and maximum, to ensure that a core microbiome is calculated in steps of 5% (55%, 60%, 65%, etc.). We want to determine the core microbiome for skin, saliva, and fecal samples, and will therefore run this script three times, indicating which body site to use each time.
# First create a directory for all the core microbiome results
!mkdir core_microbiome
# Saliva
!compute_core_microbiome.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \
--mapping_fp $PWD/Komodo_Mapping_file.txt \
--valid_states "BODY_PRODUCT:UBERON:saliva" \
--max_fraction_for_core 1.0 \
--min_fraction_for_core 0.5 \
--num_fraction_for_core_steps 11 \
-o $PWD/core_microbiome/cm_saliva
# Skin
!compute_core_microbiome.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \
--mapping_fp $PWD/Komodo_Mapping_file.txt \
--valid_states "BODY_PRODUCT:UBERON:sebum" \
--max_fraction_for_core 1.0 \
--min_fraction_for_core 0.5 \
--num_fraction_for_core_steps 11 \
-o $PWD/core_microbiome/cm_skin
# Feces
!compute_core_microbiome.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \
--mapping_fp $PWD/Komodo_Mapping_file.txt \
--valid_states "BODY_PRODUCT:UBERON:feces" \
--max_fraction_for_core 1.0 \
--min_fraction_for_core 0.5 \
--num_fraction_for_core_steps 11 \
-o $PWD/core_microbiome/cm_feces
We are also interesed in how much of its microbiome the caprive Komodo dragon shares with its environment. Some have suggested that Komodos passively acquire potential pathogens from the environment; others posit that Komodos share pathogens with each other, usually through feeding on the same carrion, though this could be extrapolated to include any environmental object two Komodos may touch. Nevertheless, the Komodo environment (captive or wild) has been virtually ignored; therefore, analysing the extent of Komodo-environment microbiome sharing represents an important knowledge gap. We can use SourceTracker to determine whether and which microbiomes the Komodo shares with its environment.
In this dataset, two zoos provided Komodo and environmental samples. For our primary analysis, we will focus only on the samples from three Denver Zoo dragons: Anika, Kristika and Raja, due to larger sample sizes and matched dragon-environmental samples.
# Filter the table to get Denver samples only
!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \
-o $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver.biom \
-m $PWD/Komodo_Mapping_file.txt \
-s 'DRAGON_NAME:Anika,Kristika,Raja'
Following the recommendations for running SourceTracker, we should filter those OTUs from our OTU table that are present in less than 1% of the samples. In our case, we only have 37 samples, so there is no reason to filter.
SourceTracker does not accept biom formatted files, so we need to transform our biom table to a classic tab delimited text format.
!biom convert -i $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver.biom \
-o $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver.txt \
--table-type "OTU table" --to-tsv
We can now run SourceTracker:
!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver.txt \
-m $PWD/SourceTracker/mappings/Komodo_Mapping_file_ST_denver.txt \
-o $PWD/SourceTracker/st_denver/ \
< $SOURCETRACKER_PATH/sourcetracker_for_qiime.r
We can then perform SourceTracker on all Denver and Honolulu Zoo samples to confirm whether the results observed in three Denver Zoo dragons are recapitulated using a larger dataset representing two geographically distinct zoos.
# Filter the table to contain only samples from Denver and Honolulu Zoo
!filter_samples_from_otu_table.py -i $PWD/open_ref_otus/otu_table_komodo_even3323.biom \
-o $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver_honolulu.biom \
-m $PWD/Komodo_Mapping_file.txt \
-s 'PROVENANCE:Honolulu Zoo,Denver'
# Convert the biom table to classic tab delimited text file
!biom convert -i $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver_honolulu.biom \
-o $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver_honolulu.txt \
--table-type "OTU table" --to-tsv
# Run SourceTracker
!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver_honolulu.txt \
-m $PWD/SourceTracker/mappings/Komodo_Mapping_file_ST_denver_honolulu.txt \
-o $PWD/SourceTracker/st_denver_honolulu/ \
< $SOURCETRACKER_PATH/sourcetracker_for_qiime.r
We can also perform independence tests on our sources (Komodo skin, saliva, and fecal microbiomes) to determine whether each source is independant from each other source. To do this, we simply add "-s" to the SourceTracker command:
!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_komodo_even3323_denver_honolulu.txt \
-m $PWD/SourceTracker/mappings/Komodo_Mapping_file_ST_denver_honolulu.txt \
-o $PWD/SourceTracker/st_denver_honolulu_independence_test/ \
-s < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r
We next want to establish the specificity of the Komodo dragon environment. Therefore, we will combine the Komodo dragon dataset with a published wild amphibian-environmental dataset. We first combine the split_library_fastq.py outputs from both projects into a single .fna file, and then use that file as input for pick_open_reference_otus.py.
# Create a directory to store the Komodo-Amphibians results
!mkdir komodo_amphibians
# Concatenate the split libraries output from both datasets
!cat sl_out/seqs.fna amphibians/seqs.fna >> komodo_amphibians/seqs.fna
# Run open reference OTU picking
!pick_open_reference_otus.py -i $PWD/komodo_amphibians/seqs.fna -o $PWD/komodo_amphibians/open_ref_otus
Once we have the OTU table, we can then perform the same quality filter outlined above to remove spurious OTUs:
!filter_otus_from_otu_table.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_mc2_w_tax.biom \
-o $PWD/komodo_amphibians/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \
--min_count_fraction 0.00005
As outlined above, the Komodo dataset actually includes samples from other Varanids, but due to limited sample size, we are interested only in the Komodo dragon and their environment samples. Additionally, in the Amphibian dataset there are two control samples that should also be filtered out:
# Filter the Amphibian's control samples:
!filter_samples_from_otu_table.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \
-o $PWD/komodo_amphibians/open_ref_otus/otu_table_mc2_w_tax_mcf00005_no_Amph_controls.biom \
-m $PWD/Amphibians_Komodo_Mapping_file.txt \
-s 'host_species_abbrev:*,!Sterile_water_control,!sterile_glove_control'
# Filter the non-Komodo related varanid samples
!filter_samples_from_otu_table.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_mc2_w_tax_mcf00005_no_Amph_controls.biom \
-o $PWD/komodo_amphibians/open_ref_otus/otu_table_AK.biom \
-m $PWD/Amphibians_Komodo_Mapping_file.txt \
-s "host_common_name:*,!Gray's Monitor,!NA,!Prairie Rattlesnake,!Varanus Indicus,!Varanus Rudicollis"
We then rarefy our table prior to any analysis:
!single_rarefaction.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_AK.biom \
-o $PWD/komodo_amphibians/open_ref_otus/otu_table_AK_even5870.biom \
-d 5870
We then perform beta diversity analysis to confirm that Komodo and amphibian microbial communities are different.
!beta_diversity_through_plots.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_AK_even5870.biom \
-m $PWD/Amphibians_Komodo_Mapping_file.txt \
-t $PWD/komodo_amphibians/open_ref_otus/rep_set.tre \
-o $PWD/komodo_amphibians/bdiv_even_5870
We can also make OTU networks, which we can visualize using Cytoscape (please see the Cytoscape tutorial available here), allowing us to determine if there are any shared OTUs between the two datasets.
!make_otu_network.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_AK_even5870.biom \
-m $PWD/Amphibians_Komodo_Mapping_file.txt \
-o $PWD/komodo_amphibians/otu_network
To further explore the specificity of the captive Komodo dragon environment, as well as to further our analysis of host-microbiome sharing in closed, "built" environments vs. open environments, we can combine the Komodo dragon dataset with a published human-pet-house dataset. As we did when we combined the Komodo dataset with the amphibian dataset, we will first combine split_libraries_fastq.py files from both datasets and then run the open reference OTU picking workflow.
# Create a directory to store the Komodo-Humans-Pets results
!mkdir komodo_humans_pets
# Concatenate the split libraries output from both datasets
!cat sl_out/seqs.fna humans_pets/seqs.fna >> komodo_humans_pets/seqs.fna
# Run open reference OTU picking
!pick_open_reference_otus.py -i $PWD/komodo_humans_pets/seqs.fna -o $PWD/komodo_humans_pets/open_ref_otus
As in the previous cases, we applied the recommended filter for spurious OTUs and the filter for getting only Komodo related samples from the Komodo dataset:
# Filter the spurious OTUs
!filter_otus_from_otu_table.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_mc2_w_tax.biom \
-o $PWD/komodo_humans_pets/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \
--min_count_fraction 0.00005
# Filter other varanids data
!filter_samples_from_otu_table.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_mc2_w_tax_mcf00005.biom \
-m $PWD/Humans_Komodo_Mapping_file.txt \
-o $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK.biom \
-s 'HOST_COMMON_NAME:Komodo Dragon,no_data'
We then rarefy our table prior to any analysis:
!single_rarefaction.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK.biom \
-o $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367.biom \
-d 5367
We then perform beta diversity analyses and associated statistical tests.
!beta_diversity_through_plots.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367.biom \
-m $PWD/Humans_Komodo_Mapping_file.txt \
-t $PWD/komodo_humans_pets/open_ref_otus/rep_set.tre \
-o $PWD/komodo_humans_pets/bdiv_even_5367
We can also make OTU networks, as we did above for the Komodo-amphibian combined dataset. In this case, in order to avoid a really complex network that will be hard to see anything, we will filter out those OTUs that are not present in 1% of the total number of samples (in our case 18 samples):
# Filter OTUs present in less than 1% of the samples
!filter_otus_from_otu_table.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367.biom \
-o $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367_msc18.biom \
-s 18
# Create the network
!make_otu_network.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367_msc18.biom \
-m $PWD/Humans_Komodo_Mapping_file.txt \
-o $PWD/komodo_humans_pets/otu_networks
Now that we have established that Komodo environments are specific to Komodo dragons, we will then analyze host environment sharing among humans/pets and their homes as well as amphibians and their environments. In this way we can answer the following two questions:
In order to run the SourceTracker Analyses in the Human/Pet-House data, we first need to filter the Komodo data out of the OTU table containing human/pet/home and Komodo/environment samples. We need to do this in 2 steps. First, we remove the samples that belong to the Komodo dragons and their environment. Second, we filter those OTUs that have zero counts after the first filtering step because they're only present in the Komodo dragon samples.
# Create a directory for only humans/pet-house data
!mkdir $PWD/komodo_humans_pets/humans_pets_only
# Filter Komodo dragon and their environment samples
!filter_samples_from_otu_table.py -i $PWD/komodo_humans_pets/open_ref_otus/otu_table_HPK_even5367.biom \
-m $PWD/Humans_Komodo_Mapping_file.txt \
-o $PWD/komodo_humans_pets/humans_pets_only/otu_table_humans_even5367.biom \
-s 'HOST_COMMON_NAME:no_data'
Since we have more samples in this case, we can now apply the recommended filter to exclude those OTUs from our OTU table that are present in less than 1% of the samples, which is 16:
!filter_otus_from_otu_table.py -i $PWD/komodo_humans_pets/humans_pets_only/otu_table_humans_even5367.biom \
-o $PWD/SourceTracker/tables/otu_table_humans_even5367_msc16.biom \
-s 16
Now we can convert our OTU table from biom to the classic tab delimited format and run SourceTracker, including the test of source independence:
# Convert from BIOM to classic txt table
!biom convert -i $PWD/SourceTracker/tables/otu_table_humans_even5367_msc16.biom \
-o $PWD/SourceTracker/tables/otu_table_humans_even5367_msc16.txt \
--table-type "OTU table" --to-tsv
# Run SourceTracker
!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_humans_even5367_msc16.txt \
-m $PWD/SourceTracker/mappings/Humans_Mapping_file_ST.txt \
-o $PWD/SourceTracker/st_humans/ \
< $SOURCETRACKER_PATH/sourcetracker_for_qiime.r
# Run SourceTracker test of source independence
!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_humans_even5367_msc16.txt \
-m $PWD/SourceTracker/mappings/Humans_Mapping_file_ST.txt \
-o $PWD/SourceTracker/st_humans_source_independence_test/ \
-s < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r
Here we are going to apply the same table filterings as in the case above: removing all Komodo-related samples and then apply the recommended filter to filter out OTUs that are not present in 1% of the samples (in this case 2)
# Create a directory for only amphibians data
!mkdir $PWD/komodo_amphibians/amphibians_only
# Filter Komodo dragon and their environment samples
!filter_samples_from_otu_table.py -i $PWD/komodo_amphibians/open_ref_otus/otu_table_AK_even5870.biom \
-m $PWD/Amphibians_Komodo_Mapping_file.txt \
-o $PWD/komodo_amphibians/amphibians_only/otu_table_amphibians_even5870.biom \
-s 'host_common_name:*,!Komodo Dragon'
# Filter OTUs not present in 1% (2) of the samples
!filter_otus_from_otu_table.py -i $PWD/komodo_amphibians/amphibians_only/otu_table_amphibians_even5870.biom \
-o $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.biom \
-s 2
We can now convert our BIOM table to classic tab delimited text file and run SourceTracker:
# Convert from BIOM to classic txt table
!biom convert -i $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.biom \
-o $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.txt \
--table-type "OTU table" --to-tsv
# Run SourceTracker
!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.txt \
-m $PWD/SourceTracker/mappings/Amphibians_Mapping_file_ST.txt \
-o $PWD/SourceTracker/st_amphibians/ \
< $SOURCETRACKER_PATH/sourcetracker_for_qiime.r
# Run SourceTracker test of source independence
!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.txt \
-m $PWD/SourceTracker/mappings/Amphibians_Mapping_file_ST.txt \
-o $PWD/SourceTracker/st_amphibians_source_independence_test/ \
-s < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r
Because we only see host-evironment sharing in water samples (with minimal to no sharing of the host microbiome with soil and sediment samples), we want to determine whether the environment shares its microbes with the host. Therefore, we perform SourceTracker analyses as before, but adjust our mapping file so that amphibian skin is the sink and water, soil, and sediment are the sources. We also perform independence tests.
# Run SourceTracker
!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.txt \
-m $PWD/SourceTracker/mappings/Amphibians_Mapping_file_ST_reversed.txt \
-o $PWD/SourceTracker/st_amphibians_reversed/ \
< $SOURCETRACKER_PATH/sourcetracker_for_qiime.r
# Run SourceTracker test of source independence
!R --slave --vanilla --args -i $PWD/SourceTracker/tables/otu_table_amphibians_even5870_msc2.txt \
-m $PWD/SourceTracker/mappings/Amphibians_Mapping_file_ST_reversed.txt \
-o $PWD/SourceTracker/st_amphibians_reversed_independence_test/ \
-s < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r
To further support the idea that host-environment microbe sharing is more similar between captive Komodo dragons and humans/pets and their environments than between captive Komodos and wild amphibians and their evironments, we will next compare the Unifrac distance matrices. First, we'll compare Komodo dragons to humans, as we expect the distances between these vertebrate hosts and their environments to be similarly small. Then, we'll compare Komodos to amphibians, as we expect the distances between amphibians and their environments to be larger than those between Komodos and their environments. We first use the make_distance_boxplots commands within Qiime to get an idea of what the pattern is; we will be sure to pass --save_raw_data, which will produce .txt files that we'll later use to create only the boxplot comparisons of interest. This command also performs two-sample t-tests to determine if distance distributions are significantly different.
# First, we filter the distance matrix so that only the Komodo dragons with paired
# environmental samples are included (Denver and Honolulu):
# Komodo vs Humans/Pets
!filter_distance_matrix.py -i $PWD/komodo_humans_pets/bdiv_even_5367/unweighted_unifrac_dm.txt \
-m $PWD/Humans_Komodo_Mapping_file.txt \
-o $PWD/komodo_humans_pets/bdiv_even_5367/unweighted_unifrac_dm_filt.txt \
-s 'PROVENANCE:no_data,Denver,Honolul Zoo'
!make_distance_boxplots.py -d $PWD/komodo_humans_pets/bdiv_even_5367/unweighted_unifrac_dm_filt.txt \
-m $PWD/Humans_Komodo_Mapping_file.txt2 \
-o $PWD/komodo_humans_pets/bdiv_even_5367/uw_distance_boxplots \
-f "Sample_Type" -n 999 --sort median --save_raw_data
# Komodo vs Amphibians
!filter_distance_matrix.py -i $PWD/komodo_amphibians/bdiv_even_5870/unweighted_unifrac_dm.txt \
-m $PWD/Amphibians_Komodo_Mapping_file.txt \
-o $PWD/komodo_amphibians/bdiv_even_5870/unweighted_unifrac_dm_filt.txt \
-s 'PROVENANCE:no_data,Denver,Honolul Zoo'
!make_distance_boxplots.py -d $PWD/komodo_amphibians/bdiv_even_5870/unweighted_unifrac_dm_filt.txt \
-m $PWD/Amphibians_Komodo_Mapping_file.txt \
-o $PWD/komodo_amphibians/bdiv_even_5870/uw_distance_boxplots \
-f "Env" -n 999 --sort median --save_raw_data
Now we are ready to create boxplots containing only the comparisons of interest, as the Qiime-produced boxplots show more comparisons than we are interested in. In order to do that, we are going to perform some custom scripting. We will start by defining the function parse_distances
, which will parse the raw data files that we saved from the make_distance_boxplots.py
QIIME command:
import numpy as np
import matplotlib.pyplot as plt
def parse_distances(fp):
# Parse the input file
data = {}
with open(fp, "U") as f:
for line in f:
line = line.strip()
values = line.split()
data[values[0]] = np.asarray(values[1:], dtype=np.float64)
return data
Now we can use this function to parse the distances:
hk_dists = parse_distances("komodo_humans_pets/bdiv_even_5367/uw_distance_boxplots/Sample_Type_Distances.txt")
ak_dists = parse_distances("komodo_amphibians/bdiv_even_5870/uw_distance_boxplots/Env_Distances.txt")
Now we can take a look at the Komodo vs Human/Pet distances:
plt.boxplot([hk_dists['Human_Environment_vs._Human-Pet'], hk_dists['Komodo_Environment_vs._Komodo']])
plt.savefig("komodo_humans_pets/bdiv_even_5367/uw_distance_boxplots/boxplots.pdf", format="pdf")
plt.show()
And to the Komodo vs Amphibian:
plt.boxplot([ak_dists['Amphibian_Environment_vs._Amphibian_Skin'], ak_dists['Komodo_Environment_vs._Komodo']])
plt.savefig("komodo_amphibians/bdiv_even_5870/uw_distance_boxplots/boxplots.pdf", format="pdf")
plt.show()