This tutorial assumes you have already read the general use pyRAD tutorial.
The double digest library preparation method was developed and described by Peterson et al. 2012. Here I will be talking about __ paired-end ddRAD __ data and describe my recommendations for how to setup a params file to analyze them in pyRAD.
A ddRAD library is prepared by cutting genomic DNA with two different restriction enzymes and selecting the intervening fragments that are within a certain size window. These will contain overhangs from the respective cut sites on either side. One side will have a barcode+illumina adapter ligated, and the other end will have only the reverse Illumina adapter ligated. The first reads may come in one or multiple files with "_R1_" in the name, and the second reads are in a separate file/s with "_R2_". Second read files will contain the corresponding pair from the first read files in the exact same order.
from IPython.display import SVG, Image
SVG("https://dl.dropboxusercontent.com/u/2538935/PYRAD_TUTORIALS/figures/diag_ddradpair.svg")
Your data will likely come to you non-demultiplexed (meaning not sorted by which individual the reads belong to). You can demultiplex the reads according to their barcodes using pyRAD or separate software. For simplicity, I suggest using pyRAD, but for data with combinatorial barcodes you will need to use a more advanced de-multiplexing software.
A number of programs are available to check for overlap of paired end reads, and you can run your data through these programs before being input to pyRAD. However, as of pyRAD v.2.0 I now include a robust built-in overlap detection method utilizing USEARCH. This can be used to separate merged from non-merged reads to be analyzed separately in pyRAD. This method is illustrated in the Advanced tutorial.
SVG("https://dl.dropboxusercontent.com/u/2538935/PYRAD_TUTORIALS/figures/fig_tree4sims.svg")
For this tutorial I simulated paired-end ddRAD loci on a 12 taxon species tree, shown above. You can download the data using the script below and assemble these data by following along with all of the instructions below:
%%bash
wget http://goo.gl/cGUuWd 2&> /dev/null
gunzip cGUuWd
rm cGUuWd
You can simply download the data, but it is also worth describing how the data were generated. I simulated ddRAD-like data using the egglib coalescent simulator with the following options.
indels = 0.005 ## low rate of indels (prob. mutation is indel)
dropout = 0 ## if 1, mutations to restriction site can cause locus dropout.
nloci = 1000 ## Total Nloci simulated (less if locus dropout or merged reads)
ninds = 1 ## sampled individuals per tip taxon
shortfrag = 300 ## smallest size of digested fragments
longfrag = 600 ## largest size of digested fragments
Because I simulate sequences at 20X coverage, and the reads are read from both ends (paired-end), the total number of reads is:
reads = 12*ninds*nloci*20*2
print "%s sequenced reads" % reads
480000 sequenced reads
Here I execute the simulation script (available here) to simulate the data, and write it to a file called simpairddRADs.fastq.gz. If you simply downloaded the data you do not need to do this step, I show it only for those interested.
## you will need the Python package egglib installed to run the simRRLs script.
! python ../simscripts/simRRLs.py 0.005 0 1000 1 300,600 pairddrad simpairddRADs
simulating pairddrad data simulating 1000 loci at 20X coverage across 12 tip taxa with 1 samples per taxon indels arise at frequency of 0.005000 per mutation mutations in restriction site = False sequencing error rate = 0.0005 theta=4Nu= 0.0014 creating new barcode map .
We simulated 1000 loci per sample. Because we simulated no sequence overlap (min frag size = 300), and no mutations to restriction sites we should recover all 1000 loci shared across our entire data set.
We first create an empty params.txt input file for the pyRAD analysis.
The following command will create a template which we will fill in with all relevant parameter settings for the analysis.
! ~/pyrad_v.2.0/pyRAD -n
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ new params.txt file created
Take a look at the default options. Each line designates a parameter, and contains a "##" symbol after which comments can be added, and which includes a description of the parameter. The affected step for each parameter is shown in parentheses. The first 14 parameters are required. Numbers 15-35 are optional.
! cat params.txt
==== parameter inputs for pyRAD version 2.00 ============================ affected step == ./ ## 1. Working directory (all) ./*.fastq.gz ## 2. Loc. of non-demultiplexed files (if not line 18) (s1) ./*.barcodes ## 3. Loc. of barcode file (if not line 18) (s1) usearch7.0.1090_i86linux32 ## 4. command (or path) to call usearch v.7 (s3,s6) muscle ## 5. command (or path) to call muscle (s3,s7) TGCAG ## 6. restriction overhang (e.g., C|TGCAG -> TGCAG) (s1,s2) 2 ## 7. N processors to use in parallel (all) 6 ## 8. Mindepth: minimum dpth of coverage for a cluster (s4,s5) 4 ## 9. NQual: max # sites with qual < minQ (line 20) (s2) .90 ## 10. Wclust: clustering threshold as a decimal (s3,s6) rad ## 11. Datatype: rad,gbs,ddrad,pairgbs,pairddrad,merge (all) 4 ## 12. MinCov: min samples in a final locus (s7) 3 ## 13. MaxSH: max inds with shared hetero site (s7) c90d6m4p3 ## 14. prefix name for final output (no spaces) (s7) ==== optional params below this line =================================== affected step == ## 15.opt.: select subset (list or use prefix*) (s2-s7) ## 16.opt.: add-on (outgroup) taxa (list or prefix*) (s6,s7) ## 17.opt.: exclude taxa (list or prefix*) (s7) ## 18.opt.: Loc. of de-multiplexed data (s2) ## 19.opt.: maxM: N mismatches in barcodes (def. 1) (s1) ## 20.opt.: MinQ: min Quality score after offset (def. 20) (s2) ## 21.opt.: Filter: 0=NQual 1=NQual+adapters. 2=1+strict (s2) ## 22.opt.: a priori E,H (def. 0.001,0.01, if not estimated) (s4) ## 23.opt.: maxN: Ns in a consensus seq (def. 4) (s5) ## 24.opt.: maxH: hetero. sites in consensus seq (def. 4) (s5) ## 25.opt.: diploid: limit 2 alleles in a consensus (def. 1) (s5) ## 26.opt.: maxSNPs: step 7. (def=100). Paired (def=100,100) (s7) ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7) ## 28.opt.: random number seed (def. 112233) (s3,s6) ## 29.opt.: trim overhang left,right on final loci, def(0,0) (s7) ## 30.opt.: add output formats: a,n,s,u (see documentation) (s7) ## 31.opt.: call maj. consens if dpth < stat. limit (def. 0) (s5) ## 32.opt.: merge/remove paired overlap (def 1), 0=nocheck (s2) ## 33.opt.: keep trimmed reads (def=0). Enter min length. (s2) ## 34.opt.: max stack size (int), def= max(500,mean+2*SD) (s3) ## 35.opt.: minDerep: exclude dereps with <= N copies, def=0 (s3) ==== list hierachical cluster groups below this line =====================================
I will use the script below to substitute new values, but you should simply use any text editor to make changes. For this analysis I made the following changes from the defaults:
5. set the two restriction enzymes used to generate the ddRAD data
10. lowered clustering threshold to .85
11. set datatype to pairddrad
14. changed the output name prefix
19. mismatches for demulitiplexing set to 0, exact match.
24. Raised maxH. Lower is better for filtering paralogs.
30. added additional output formats (e.g., nexus and SNPs)
%%bash
sed -i '/## 6. /c\TGCAG,AATT ## 6. cutsites... ' ./params.txt
sed -i '/## 10. /c\.85 ## 10. lowered clust thresh' ./params.txt
sed -i '/## 11. /c\pairddrad ## 11. datatype... ' ./params.txt
sed -i '/## 14. /c\c85d6m4p3 ## 13. prefix name ' ./params.txt
sed -i '/## 19./c\0 ## 19. errbarcode... ' ./params.txt
sed -i '/## 24./c\10 ## 24. maxH... ' ./params.txt
sed -i '/## 30./c\a,s,u,n ## 30. outformats... ' ./params.txt
Let's take a look at the edited params.txt file
! cat params.txt
==== parameter inputs for pyRAD version 2.00 ============================ affected step == ./ ## 1. Working directory (all) ./*.fastq.gz ## 2. Loc. of non-demultiplexed files (if not line 18) (s1) ./*.barcodes ## 3. Loc. of barcode file (if not line 18) (s1) usearch7.0.1090_i86linux32 ## 4. command (or path) to call usearch v.7 (s3,s6) muscle ## 5. command (or path) to call muscle (s3,s7) TGCAG,AATT ## 6. cutsites... 2 ## 7. N processors to use in parallel (all) 6 ## 8. Mindepth: minimum dpth of coverage for a cluster (s4,s5) 4 ## 9. NQual: max # sites with qual < minQ (line 20) (s2) .85 ## 10. lowered clust thresh pairddrad ## 11. datatype... 4 ## 12. MinCov: min samples in a final locus (s7) 3 ## 13. MaxSH: max inds with shared hetero site (s7) c85d6m5p3 ## 13. prefix name ==== optional params below this line =================================== affected step == ## 15.opt.: select subset (list or use prefix*) (s2-s7) ## 16.opt.: add-on (outgroup) taxa (list or prefix*) (s6,s7) ## 17.opt.: exclude taxa (list or prefix*) (s7) ## 18.opt.: Loc. of de-multiplexed data (s2) 0 ## 19. errbarcode... ## 20.opt.: MinQ: min Quality score after offset (def. 20) (s2) ## 21.opt.: Filter: 0=NQual 1=NQual+adapters. 2=1+strict (s2) ## 22.opt.: a priori E,H (def. 0.001,0.01, if not estimated) (s4) ## 23.opt.: maxN: Ns in a consensus seq (def. 4) (s5) 10 ## 24. maxH... ## 25.opt.: diploid: limit 2 alleles in a consensus (def. 1) (s5) ## 26.opt.: maxSNPs: step 7. (def=100). Paired (def=100,100) (s7) ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7) ## 28.opt.: random number seed (def. 112233) (s3,s6) ## 29.opt.: trim overhang left,right on final loci, def(0,0) (s7) a,s,u,n ## 30. outformats... ## 31.opt.: call maj. consens if dpth < stat. limit (def. 0) (s5) ## 32.opt.: merge/remove paired overlap (def 1), 0=nocheck (s2) ## 33.opt.: keep trimmed reads (def=0). Enter min length. (s2) ## 34.opt.: max stack size (int), def= max(500,mean+2*SD) (s3) ## 35.opt.: minDerep: exclude dereps with <= N copies, def=0 (s3) ==== list hierachical cluster groups below this line =====================================
Four examples of acceptable input file name formats for paired-end data:
1. xxxx_R1_001.fastq xxxx_R2_001.fastq
2. xxxx_R1_001.fastq.gz xxxx_R2_001.fastq.gz
3. xxxx_R1_100.fq.gz xxxx_R2_100.fq.gz
4. xxxx_R1_.fq xxxx_R2_.fq
The file ending can be .fastq, .fq, or .gz.
There should be a unique name or number shared by each pair and the characters _R1_ and _R2_.
For every file name with _R1_ there should be a corresponding _R2_ file.
If your data are already demultiplexed skip step 1 and see step 2 below.
! ~/pyrad_v.2.0/pyRAD -p params.txt -s 1
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ step 1: sorting reads by barcode .
Now we can look at the stats output for this below:
! cat stats/s1.sorting.txt
file Nreads cut_found bar_matched simpairddRADs_001.fastq.gz 240000 240000 240000 sample true_bar obs_bars N_obs 1C0 AGATAA AGATAA 20000 2F0 AGTGAT AGTGAT 20000 1B0 ATGTGA ATGTGA 20000 3I0 ATGTTG ATGTTG 20000 3K0 ATTAGG ATTAGG 20000 2E0 GAGTTA GAGTTA 20000 1D0 GGATAT GGATAT 20000 3J0 TAGGAG TAGGAG 20000 2H0 TATAAT TATAAT 20000 3L0 TGAAGG TGAAGG 20000 1A0 TGATGG TGATGG 20000 2G0 TTTTTA TTTTTA 20000 nomatch _ 0
The de-multiplexed reads are written to a new file for each individual in a new directory created within your working directory called fastq/
The reads were sorted into a separate file for the first (R1) and second (R2) reads for each individual.
If your data were previously de-multiplexed you need the following things before step 2:
your sorted file names should be formatted similar to above, but with sample names substituted for 1A0, 1A1, etc.
the files can be zipped (.gz) or not (.fq or .fastq).
the barcode should be removed (not on left side of reads)
the restriction site should not be removed, but if it is, enter a '@' symbol before the location of your sorted data.
Enter on line 18 of the params file the location of your sorted data.
Next we apply the quality filtering.
We left the filter setting (line 21) at its default of 0, meaning it will apply only a filter based on quality scores. Low quality sites are converted to Ns, and any locus with more than X number of Ns is discarded, where X is the number set on line 9 of the params file.
We could also check for overlap of paired-end reads and filter for the presence of Illumina adapters in this step, however we assume here that the data were properly size selected and do not contain any of these problems. To see those filters applied see the Advanced tutorials.
! ~/pyrad_v.2.0/pyRAD -p params.txt -s 2
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ step 2: quality filtering ............
Statistics for the number of reads that passed filtering can be found in the stats/ directory.
! cat stats/s2.rawedit.txt
sample Nreads exclude trimmed passed 2H0 20000 0 0 20000 3I0 20000 0 0 20000 3L0 20000 0 0 20000 3J0 20000 0 0 20000 1C0 20000 0 0 20000 3K0 20000 0 0 20000 1D0 20000 0 0 20000 1B0 20000 0 0 20000 2G0 20000 0 0 20000 1A0 20000 0 0 20000 2F0 20000 0 0 20000 2E0 20000 0 0 20000 Nreads = total number of reads for a sample exclude = reads that were excluded trimmed = reads that had adapter trimmed but were kept passed = total kept reads note: you can set minimum length of kept trimmed reads on line 35 the params file kept trimmed (fragment) reads written to /home/deren/Dropbox/Public/PYRAD_TUTORIALS/tutorial_pairddRAD/mergedreads/
The filtered data files are converted to fasta format and written to a directory called edits/. I show this below:
! ls -l edits/
total 50064 -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 1A0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 1B0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 1C0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 1D0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 2E0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 2F0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 2G0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 2H0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 3I0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 3J0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 3K0.edit -rw-rw-r-- 1 deren deren 4268890 Mar 9 18:57 3L0.edit
Here you can see that the first and second reads have been concatenated into a single read separated by 'xxxx' in these files, and the read quality scores have now been discarded.
! head -n 8 edits/1A0.edit
>1A0_0_pair TGCAGATCTAAGTCTTTTAGGTTACACGAACGAATGAGCGCCCCCCTGTGTGTCGTTCGTTCACTGTCCGTCTTGTACCCTACCTCGAACCTAxxxxGAACCCCATTACCACCTGCGGGAAATGGCCGGAGGTTACATTTCGAAAGTGATTCTTCTTGGCACATGAATGACAAAGGTTCTCCTTCGCTGCTGCAATT >1A0_1_pair TGCAGATCTAAGTCTTTTAGGTTACACGAACGAATGAGCGCCCCCCTGTGTGTCGTTCGTTCACTGTCCCTCTTGTACCCTACCTCGAACCTAxxxxGAACCCCATTACCACCTGCGGGAAATGGCCGGAGGTTACATTTCGAAAGTGATTCTTCTTGGCACATGAATGACAAAGGTTCTCCTTCGCTGCTGCAATT >1A0_2_pair TGCAGATCTAAGTCTTTTAGGTTACACGAACGAATGAGCGCCCCCCTGTGTGTCGTTCGTTCACTGTCCGTCTTGTACCCTACCTCGAACCTAxxxxGAACCCCATTACCACCTGCGGGAAATGGCCGGAGGTTACATTTCGAAAGTGATTCTTCTTGGCACATGAATGACAAAGGTTCTCCTTCGCTGCTGCAATT >1A0_3_pair TGCAGATCTAAGTCTTTTAGGTTACACGAACGAATGAGCGCCCCCCTGTGTGTCGTTCGTTCACTGTCCGTCTTGTACCCTACCTCGAACCTAxxxxGAACCCCATTACCACCTGCGGGAAATGGCCGGAGGTTACATTTCGAAAGTGATTCTTCTTGGCACATGAATGACAAAGGTTCTCCTTCGCTGCTGCAATT
From here you have two options for how to proceed. The first is to treat these concatenated reads like single end ddRAD sequences and cluster them as concatenated reads. To do this change the datatype option in the params folder from "pairddrad" to "ddrad", and proceed with steps 3-7. It will detect and remove the 'xxxx' seperator and cluster the long reads.
The more complex option, and the one I prefer, is to cluster the reads separately, which I call the "split clustering" method. This will do the following:
This is illustrated graphically below:
SVG("https://dl.dropboxusercontent.com/u/2538935/PYRAD_TUTORIALS/figures/pairclustermethod.svg")
Image("https://dl.dropboxusercontent.com/u/2538935/PYRAD_TUTORIALS/figures/goodpairs.png")
Image("https://dl.dropboxusercontent.com/u/2538935/PYRAD_TUTORIALS/figures/badpairs.png")
The benefit of this method over clustering concatenated first+second reads is that second reads can be retained that are either more divergent than the clustering threshold, or which contain many low quality base calls (Ns), thus potentially yielding more SNPs. Also, for very large data sets it is much faster to cluster shorter length reads. The drawbacks are that it discards sequences for which the first and second reads do not appear to be from the same DNA fragment, whereas the concatenated clustering method might cluster them separately.
The split clustering method filters the alignment of second reads based on the number of indels in the alignment. This number can be set on line 27 of the params file. The default values are 3,6,99,99. Meaning for within-sample clusters we allow 3 indels in first read clusters and 6 in second read clusters. For across-sample clusters we allow 99 in first read clusters and 99 in second read clusters. If only two numbers are set (e.g., 3,10) this is equivalent to 3,3,10,10.
! ~/pyrad_v.2.0/pyRAD -p params.txt -s 3
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ de-replicating files for clustering... step 3: within-sample clustering of 12 samples at '.85' similarity using up to 2 processors 1A0.edit finished, 1000loci 1D0.edit finished, 1000loci 2E0.edit finished, 1000loci 2G0.edit finished, 1000loci 2H0.edit finished, 1000loci 3I0.edit finished, 1000loci 2F0.edit finished, 1000loci 3L0.edit finished, 1000loci 1B0.edit finished, 1000loci 3K0.edit finished, 1000loci 3J0.edit finished, 1000loci 1C0.edit finished, 1000loci
! cat stats/s3.clusters.txt
taxa total dpt.me dpt.sd d>5.tot d>5.me d>5.sd badpairs 1A0 1000 20.0 0.0 1000 20.0 0.0 0 1D0 1000 20.0 0.0 1000 20.0 0.0 0 2E0 1000 20.0 0.0 1000 20.0 0.0 0 2G0 1000 20.0 0.0 1000 20.0 0.0 0 2H0 1000 20.0 0.0 1000 20.0 0.0 0 3I0 1000 20.0 0.0 1000 20.0 0.0 0 2F0 1000 20.0 0.0 1000 20.0 0.0 0 3L0 1000 20.0 0.0 1000 20.0 0.0 0 1B0 1000 20.0 0.0 1000 20.0 0.0 0 3K0 1000 20.0 0.0 1000 20.0 0.0 0 3J0 1000 20.0 0.0 1000 20.0 0.0 0 1C0 1000 20.0 0.0 1000 20.0 0.0 0 ## total = total number of clusters, including singletons ## dpt.me = mean depth of clusters ## dpt.sd = standard deviation of cluster depth ## >N.tot = number of clusters with depth greater than N ## >N.me = mean depth of clusters with depth greater than N ## >N.sd = standard deviation of cluster depth for clusters with depth greater than N ## badpairs = mismatched 1st & 2nd reads (only for paired ddRAD data)
We next make consensus base calls for each cluster within each individual. First we estimate the error rate and heterozygosity within each sample:
! ~/pyrad_v.2.0/pyRAD -p params.txt -s 4
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ step 4: estimating error rate and heterozygosity ............
Calling consensus sequences applies a number of filters, as listed in the params file, to remove potential paralogs or highly repetitive markers from the data set. For paired-end data the maxN filter only applies to first reads, since we don't mind allowing low quality base calls in second reads. The maxH filter applies to both reads together. The diploid filter (only allow 2 haplotypes in a consensus sequence) also applies to the two reads together.
! ~/pyrad_v.2.0/pyRAD -p params.txt -s 5
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ step 5: created consensus seqs for 12 samples, using E=0.00051 H=0.00137 ............
! cat stats/s5.consens.txt
taxon nloci f1loci f2loci nsites npoly poly 3I0.clustS.gz 1000 1000 1000 183005 246 0.0013442 2H0.clustS.gz 1000 1000 1000 183002 260 0.0014207 1C0.clustS.gz 1000 1000 1000 183010 265 0.001448 1D0.clustS.gz 1000 1000 1000 183000 256 0.0013989 3L0.clustS.gz 1000 1000 1000 183007 261 0.0014262 2G0.clustS.gz 1000 1000 1000 183002 260 0.0014207 1B0.clustS.gz 1000 1000 1000 183004 265 0.0014481 3J0.clustS.gz 1000 1000 1000 183005 227 0.0012404 1A0.clustS.gz 1000 1000 1000 183010 242 0.0013223 3K0.clustS.gz 1000 1000 1000 183003 243 0.0013278 2E0.clustS.gz 1000 1000 1000 183005 248 0.0013552 2F0.clustS.gz 1000 1000 1000 183002 259 0.0014153 ## nloci = number of loci ## f1loci = number of loci with >N depth coverage ## f2loci = number of loci with >N depth and passed paralog filter ## nsites = number of sites across f loci ## npoly = number of polymorphic sites in nsites ## poly = frequency of polymorphic sites
This step clusters consensus sequences across samples. It can take a very long time for very large data sets. If run normally it will print its progress to the screen so you can judge how long it might take. Try this first. If it seems too slow (many days or weeks) you can cancel the job and try running the two-step clustering method instead (see the Advanced tutorials). Here I show the results for a data set that runs very quickly.
! ~/pyrad_v.2.0/pyRAD -p params.txt -s 6
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ step 6: clustering across 12 samples at '.85' similarity usearch v7.0.1090_i86linux32, 4.0Gb RAM (7.9Gb total), 4 cores (C) Copyright 2013 Robert C. Edgar, all rights reserved. http://drive5.com/usearch Licensed to: daeaton.chicago@gmail.com Seqs 12000 (12.0k) Clusters 1000 Max size 12 Avg size 12.0 Min size 12 Singletons 0, 0.0% of seqs, 0.0% of clusters Max mem 28Mb Time 3.00s Throughput 4000.0 seqs/sec.
Alignment of loci, statistics, and output of various formatted data files.
! ~/pyrad_v.2.0/pyRAD -p params.txt -s 7
------------------------------------------------------------ pyRAD : RADseq for phylogenetics & introgression analyses ------------------------------------------------------------ ingroup 1A0,1B0,1C0,1D0,2E0,2F0,2G0,2H0,3I0,3J0,3K0,3L0 addon exclude ... final stats written to: /home/deren/Dropbox/Public/PYRAD_TUTORIALS/tutorial_pairddRAD/stats/c85d6m5p3.stats output files written to: /home/deren/Dropbox/Public/PYRAD_TUTORIALS/tutorial_pairddRAD/outfiles/ directory
Stats for this run. In this case 1000 loci were shared across all 12 samples.
! cat stats/c85d6m4p3.stats
1000 ## loci with > minsp containing data 1000 ## loci with > minsp containing data & paralogs removed 1000 ## loci with > minsp containing data & paralogs removed & final filtering ## number of loci recovered in final data set for each taxon. taxon nloci 1A0 1000 1B0 1000 1C0 1000 1D0 1000 2E0 1000 2F0 1000 2G0 1000 2H0 1000 3I0 1000 3J0 1000 3K0 1000 3L0 1000 ## nloci = number of loci with data for exactly ntaxa ## ntotal = number of loci for which at least ntaxa have data ntaxa nloci saved ntotal 1 - 2 - - 3 - - 4 0 * 1000 5 0 * 1000 6 0 * 1000 7 0 * 1000 8 0 * 1000 9 0 * 1000 10 0 * 1000 11 0 * 1000 12 1000 * 1000 ## var = number of loci containing n variable sites. ## pis = number of loci containing n parsimony informative sites. n var PIS 0 1 97 1 237 210 2 325 270 3 303 197 4 269 122 5 235 63 6 197 30 7 131 8 8 87 2 9 54 1 10 30 0 11 17 0 12 13 0 13 3 0 14 0 0 15 0 0 16 1 0 total var= 8026 total pis= 2405
Examples of the available output formats can be seen in the general use tutorial.
...