We will explore how to handle RAD data, trying Stacks on iPlant.
Some clam data
cd /Volumes/web/whale/
/Volumes/web/whale
!gunzip clam_RAD_s_1_sequence.txt.txt.gz
^C
!head -100 clam_RAD_s_1_sequence.txt.txt
@HWI-ST700693_0075:1:1101:1225:2121#0/1 GNCTACATGCTGTGTACTCTTGTGGCGAAACTGCTG +HWI-ST700693_0075:1:1101:1225:2121#0/1 ^BV^\bcbbceeeeeeeeeeeeeeeeeeeedeeee_ @HWI-ST700693_0075:1:1101:1224:2148#0/1 AGTAGGTGCAGGCACAAATTGATAGTATTTACCAGT +HWI-ST700693_0075:1:1101:1224:2148#0/1 dffafdffcfedd_eeddeedaedeefefffbdffe @HWI-ST700693_0075:1:1101:1199:2158#0/1 AGGTGTTGCAGGACTGGATACTTAAAATATATTGCA +HWI-ST700693_0075:1:1101:1199:2158#0/1 dddcdeffeffedffeeded_^eb\ddd[d_]`KY_ @HWI-ST700693_0075:1:1101:1114:2184#0/1 AGTTAATGCAGGTTACAAATATAACAACTGACAATG +HWI-ST700693_0075:1:1101:1114:2184#0/1 ggggggggfgggggggggggggggggggggfggggg @HWI-ST700693_0075:1:1101:1218:2203#0/1 AGTTAATGCAGGACATGTTTCAAATATAGGGTCTAA +HWI-ST700693_0075:1:1101:1218:2203#0/1 gggdgggggggggggggggggggggggggggggggf @HWI-ST700693_0075:1:1101:1191:2217#0/1 GCCTTTTGTACATAACTTTTATAAATCATTTTGTAA +HWI-ST700693_0075:1:1101:1191:2217#0/1 ff\ffgggggggggggggggggggeggggggggggg @HWI-ST700693_0075:1:1101:1244:2228#0/1 AGGGTCTGCAGGGCAGCGTTGACGGATGGGAGATAC +HWI-ST700693_0075:1:1101:1244:2228#0/1 ggggggggfgggggggggggfgfgggggggegdgfg @HWI-ST700693_0075:1:1101:1210:2241#0/1 ACTCTTTGCAGGGCGAGTGGTGTGAAAAAGAATTTT +HWI-ST700693_0075:1:1101:1210:2241#0/1 gggggfggegggggggggfgddggggggggggfegf @HWI-ST700693_0075:1:1101:1160:2247#0/1 AGGGTCTGCAGGGCGAGTGGTGTGAAAAAGAATTTT +HWI-ST700693_0075:1:1101:1160:2247#0/1 ggggggggfggggggggggggggggggggggggggg @HWI-ST700693_0075:1:1101:1404:2116#0/1 ANGACCTCGGCCAAGTTCGATAACTAGCCAAATCGG +HWI-ST700693_0075:1:1101:1404:2116#0/1 ^B]\^cccccgggggggggggggfggggggfggggg @HWI-ST700693_0075:1:1101:1467:2124#0/1 ANGGTCTGCAGGACATGAGCATTTTTCCCATAGAAA +HWI-ST700693_0075:1:1101:1467:2124#0/1 bBbbbfdfdfgggggggggggggggggggggggggg @HWI-ST700693_0075:1:1101:1267:2139#0/1 TGCCGCCACATGCAAAGGAATTTCCCTAAATAGTCA +HWI-ST700693_0075:1:1101:1267:2139#0/1 gggggefccfggggggggggggggggggggfgfgfd @HWI-ST700693_0075:1:1101:1393:2140#0/1 AGCCATTGCAGGGATGTGCAGGCTGATCTTGGTCTG +HWI-ST700693_0075:1:1101:1393:2140#0/1 
ggfgggggffggggdfgggffgggf_cdaccgeeef @HWI-ST700693_0075:1:1101:1430:2141#0/1 TGCTGCTAGGATGGTCCTAGATGCCCAAGCACCAAT +HWI-ST700693_0075:1:1101:1430:2141#0/1 ff_fffffefaeefcfffffffffffffffffffff @HWI-ST700693_0075:1:1101:1304:2180#0/1 CGCCTCGACGCAGCTACTATAGAAATCGCATTACAA +HWI-ST700693_0075:1:1101:1304:2180#0/1 ggaggggggagegggggggggeggggggdggggggf @HWI-ST700693_0075:1:1101:1497:2181#0/1 CATGCAGGGTTCACGTTACAGGTCACCGATGCCCAG +HWI-ST700693_0075:1:1101:1497:2181#0/1 gZggggggggggggggggggggeggggggggggggg @HWI-ST700693_0075:1:1101:1279:2186#0/1 AACTTGAGCCAGAACCTGATATAAACGTGTGTATTG +HWI-ST700693_0075:1:1101:1279:2186#0/1 cNYcadcdcccdcddddddddddddcdddddddddd @HWI-ST700693_0075:1:1101:1476:2202#0/1 ACTCTTTGCAGGGCGATGAGATAAAAGGCAGTTTCT +HWI-ST700693_0075:1:1101:1476:2202#0/1 ggegdgggcgggggggggggggdggggggggggggf @HWI-ST700693_0075:1:1101:1479:2223#0/1 GCTCTTTGCAGGGCGATGAGATAAAAGGCAGTTTCT +HWI-ST700693_0075:1:1101:1479:2223#0/1 fgggggggfggggggdggegcggdf_ggbbggggaf @HWI-ST700693_0075:1:1101:1336:2239#0/1 AGGTGTTGCAGGAAGGTCGTTAATTCAATTTTAGTT +HWI-ST700693_0075:1:1101:1336:2239#0/1 ggggggggegdeggefbfffffffffgegggfbgge @HWI-ST700693_0075:1:1101:1272:2241#0/1 AGTTAATGCAGGCGTCAGACTTCATAGGATGGTCGT +HWI-ST700693_0075:1:1101:1272:2241#0/1 ggggggggfgggggggggggggggggggegggcegb @HWI-ST700693_0075:1:1101:1723:2114#0/1 ANCGCATGCAGGTCACCAACTGATCTCTTTCTCTTG +HWI-ST700693_0075:1:1101:1723:2114#0/1 _B___dddddggggggggggggggggggggfggggg @HWI-ST700693_0075:1:1101:1515:2128#0/1 GCCCAAAACCCTTCCACCATATGACCCAGTTTCAAA +HWI-ST700693_0075:1:1101:1515:2128#0/1 gggggggg`ggggggggggggfgggggggggggggg @HWI-ST700693_0075:1:1101:1724:2139#0/1 AGCCATTGCAGGGTTTCATTTAAACGCAATGTCAGT +HWI-ST700693_0075:1:1101:1724:2139#0/1 ggggggggeggggggggggggggggeggeggggefg @HWI-ST700693_0075:1:1101:1621:2144#0/1 GGCTACAAGAATGAAAACTTTGTCCGCTGCCATTTC +HWI-ST700693_0075:1:1101:1621:2144#0/1 eeXecdeecccfeefffffffffffffffffffffe
Some text for perspective - methods:
Restriction site associated DNA (RAD) marker library preparation
Restriction site associated DNA (RAD) marker libraries were constructed to identify diagnostic markers among cohorts. Genomic DNA was isolated separately from the gill tissue of BARN (n=4) and MASH (n=4) clams using DNAzol (Molecular Research Center) as per the manufacturer's recommendations. Libraries were prepared as described by Miller et al. (2007). Briefly, samples (n=8) were digested with SbfI (New England Biolabs), each was then hybridized with a unique barcode, and RAD adapters (P1 and P2) were ligated onto DNA fragments. Size selection of DNA fragments was achieved by running PCR on a 1% EZ gel (Invitrogen) with E-gel 1 kb Plus DNA ladder, followed by purification using the MinElute gel purification protocol. Subsequent library construction and sequencing were carried out by the University of Washington High Throughput Genomics Unit (HTGU) using the Illumina HiSeq2000 system.
Restriction site associated DNA (RAD) marker library analysis
Initial sequence read processing of RAD tags was carried out as previously described by Miller et al. (2012). Quality scores were used to remove raw sequencing reads with a probability of sequencing error greater than 10%. Using custom perl scripts (Miller et al. 2012), we then grouped raw sequence reads by individual and removed barcodes and restriction sites, for a total sequence read length of 24 base pairs.
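The 10% error cutoff corresponds to a Phred score of 10, since P = 10^(-Q/10). Below is a minimal sketch of such a filter. It assumes the older Illumina Phred+64 quality encoding (suggested by the `g` and `B` characters in the reads shown earlier) and assumes a read is rejected when any single base exceeds the cutoff. The actual Miller et al. scripts may apply the rule differently.

```python
PHRED_OFFSET = 64  # assumption: Phred+64 encoding (older Illumina pipelines)
MAX_ERROR = 0.10   # cutoff stated in the methods text


def error_probability(qual_char, offset=PHRED_OFFSET):
    """Convert one quality character to a probability of sequencing error."""
    q = ord(qual_char) - offset
    return 10 ** (-q / 10.0)


def passes_filter(quality_string, max_error=MAX_ERROR):
    """Hypothetical rule: keep a read only if every base is at or below the cutoff."""
    return all(error_probability(c) <= max_error for c in quality_string)


# Quality strings copied from the first and fourth reads in the head output.
low = "^BV^\\bcbbceeeeeeeeeeeeeeeeeeeedeeee_"
high = "ggggggggfgggggggggggggggggggggfggggg"

print(passes_filter(low), passes_filter(high))
```

The first read fails because its second base is encoded as `B` (Q2 under this encoding), while the fourth read passes.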
Two types of SNP analyses were performed: population-specific SNP variation characterization and the identification of SNPs that could potentially distinguish populations. In order to examine population-specific SNP variation, quality-trimmed reads from each cohort (BARN and MASH) were assembled independently using the following parameters: limit = 8, and mismatch cost = 2 (Genomics Workbench 4.0; CLC Bio). SNP detection was carried out using the following parameters: maximum gap and mismatch count = 2, minimum average quality = 15, minimum central quality = 20, minimum coverage = 10, minimum variant frequency = 35% (Genomics Workbench 4.0; CLC Bio).
For the second form of SNP analysis, Novoindex and Novoalign (Novocraft Technologies) were used to assemble RAD-tags to identify RAD-tags within a cohort that were identical (lacked any polymorphisms). These “isotigs” from each cohort were then compared by assembling reads and carrying out SNP detection as described above. Any SNP that was identified indicated that the locus is fixed for the individuals in each cohort examined.
Results
RAD
After quality trimming, 14.5 million reads remained from BARN clams (n=4) and 8.4 million reads from MASH clams (n=4), with a read length of 24 bp. All but one individual (MASH) had between 2.4 and 4.7 million reads. To assess overall genetic diversity between the populations, reads were characterized in silico for each cohort. A de novo assembly of BARN reads resulted in 4,491 contigs containing 543 putative SNPs (Additional File 1). This corresponds to approximately 5.0 SNPs / 1 kb. Assembly of the MASH reads resulted in 9,824 contigs containing 1,372 SNPs (Additional File 2). For this MASH library this corresponds to approximately 5.8 SNPs / 1 kb.
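The SNP densities quoted above can be reproduced with a short calculation. This sketch assumes each contig spans exactly one 24 bp trimmed RAD tag; the text does not state the contig length, so that is an inference.

```python
CONTIG_LENGTH_BP = 24  # assumption: one 24 bp trimmed RAD tag per contig


def snps_per_kb(n_snps, n_contigs, contig_length=CONTIG_LENGTH_BP):
    """SNP density per kilobase of assembled sequence."""
    total_bp = n_contigs * contig_length
    return 1000.0 * n_snps / total_bp


barn = snps_per_kb(543, 4491)   # BARN: 543 SNPs in 4,491 contigs
mash = snps_per_kb(1372, 9824)  # MASH: 1,372 SNPs in 9,824 contigs

print(round(barn, 1), round(mash, 1))
```

Under that assumption the values round to 5.0 and 5.8 SNPs per kb, matching the figures in the text.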
In addition to characterizing putative SNPs within cohorts, we also set out to identify fixed loci that could be used to distinguish between cohorts. The number of identical contigs for each library was 8,606 and 4,845 for MASH and BARN, respectively. Comparing these two sets of isotigs revealed 2,090 analogous sequences across cohorts, with 1,945 identical matches. The remaining 145 putative SNPs thus provide diagnostic markers to distinguish strains based on the samples used here. Of the 145 corresponding contigs, only one mapped back to the transcriptome, and this transcript was not annotated.
via Storer Evernote
Download the sequence files in Illumina FASTQ format. These files are 16-20 GB and will not open in Notepad; use a program such as "Large Text File Viewer" (a link will be on the Catalyst site).
The first letters of each sequence are the barcode. Each record has four lines: the @name line, the sequence, a + line, and the quality scores for that sequence.
EXAMPLE
@name (this line also contains information on the sample's location on the slide)
sequence
+
quality scores
typically first sequences are low quality
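A small sketch of reading FASTQ records and splitting off the barcode follows. The 6-base barcode length is an assumption, based on the reads above where `TGCAGG` follows the first six bases. The function name `read_fastq` is made up for illustration.

```python
from io import StringIO
from itertools import islice

BARCODE_LEN = 6  # assumption: 6-base inline barcode at the start of each read


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        name, seq, _plus, qual = record
        yield name[1:], seq, qual  # drop the leading "@" from the name


# Two records copied from the head output earlier in these notes.
example = StringIO(
    "@HWI-ST700693_0075:1:1101:1114:2184#0/1\n"
    "AGTTAATGCAGGTTACAAATATAACAACTGACAATG\n"
    "+HWI-ST700693_0075:1:1101:1114:2184#0/1\n"
    "ggggggggfgggggggggggggggggggggfggggg\n"
    "@HWI-ST700693_0075:1:1101:1218:2203#0/1\n"
    "AGTTAATGCAGGACATGTTTCAAATATAGGGTCTAA\n"
    "+HWI-ST700693_0075:1:1101:1218:2203#0/1\n"
    "gggdgggggggggggggggggggggggggggggggf\n"
)

records = list(read_fastq(example))
for name, seq, qual in records:
    barcode, insert = seq[:BARCODE_LEN], seq[BARCODE_LEN:]
    print(name, barcode, insert)
```

For a real 16-20 GB file, pass an open file handle instead of the `StringIO` object so the records are streamed rather than loaded into memory.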
General notes:
(carrot symbol) tells perl to write a file
In the Unix terminal, the up-arrow key brings back the previous command so that you can edit it.
Line 9 tells it to read the file line by line. Lines 11-14 tell it to read the code line by line.
x+++ Note:What does it mean?
Why do we want to count the number of sequences? We can filter sequences based on their counts (e.g., too low, or too high, such as 10,000). The cutoff is based on the distribution of read counts.
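Counting and filtering can be sketched as below. This is not the Perl pipeline itself, only an illustration in Python. The lower cutoff of 2 and the toy reads are made up, and the upper cutoff of 10,000 is the example value from the note above.

```python
from collections import Counter

MIN_COUNT = 2      # hypothetical lower cutoff
MAX_COUNT = 10000  # example upper cutoff mentioned in the notes


def filter_by_count(sequences, min_count=MIN_COUNT, max_count=MAX_COUNT):
    """Count identical sequences and keep those whose count is within the cutoffs."""
    counts = Counter(sequences)
    return {seq: n for seq, n in counts.items() if min_count <= n <= max_count}


# Toy reads, made up for illustration.
reads = ["ACGT"] * 3 + ["TTGA"] + ["GGCC"] * 2
kept = filter_by_count(reads)

print(kept)
```

In practice the cutoffs would be chosen after looking at the distribution of counts, for example from `Counter(counts.values())`.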
Novoalign: the longest step (24 hrs). This assembly program is good because it does not try to lengthen sequences; it simply creates stacks.
It can only run on the cluster, because it needs a 64-bit Unix system. It is installed on node 4. Navigate to the Novocraft folder on the desktop, open a terminal, and change directory (cd) into Novocraft.
The next step is to align sequences back to their index.
The higher the scores the worse the alignment. The score is the number of mismatched basepairs.
To get the files on and off the cluster use DENALI
NOVO file interpretation:
R - matched to other sequences
U - only matched to itself
Next is the score and then the sequences it has matched to.
Sometimes a score of 30 indicates a single nucleotide difference.
For next week... rerun scripts and try running the other Novo align script
cd /Volumes/web/cnidarian/s_1_sequence_fastqc
/Volumes/web/cnidarian/s_1_sequence_fastqc
http://eagle.fish.washington.edu/cnidarian/s_1_sequence_fastqc/fastqc_report.html
from IPython.display import HTML
HTML('<iframe src=http://eagle.fish.washington.edu/cnidarian/s_1_sequence_fastqc/fastqc_report.html width=100% height=550></iframe>')
cd /Volumes/Shoelace/Dropbox/Steven/
[Errno 2] No such file or directory: '/Volumes/Shoelace/Dropbox/Steven/' /Volumes/web/cnidarian/s_1_sequence_fastqc
mkdir gt
cd gt
/Volumes/Shoelace/Dropbox/Steven/gt
!git clone https://github.com/sr320/fish546.git
Cloning into fish546... remote: Counting objects: 3, done. remote: Total 3 (delta 0), reused 0 (delta 0) Unpacking objects: 100% (3/3), done.
ls
fish546/
cd fish546/
/Volumes/Shoelace/Dropbox/Steven/gt/fish546
ls
README.md
drag some files in .....
!git status
# On branch master # Untracked files: # (use "git add <file>..." to include in what will be committed) # # .DS_Store # Rad_perl_pipeline/ nothing added to commit but untracked files present (use "git add" to track)
!git add Rad_perl_pipeline/
!git commit -m "adding perl scripts courtesy of Storer"
[master a3ee1d3] adding perl scripts courtesy of Storer
 10 files changed, 360 insertions(+), 0 deletions(-)
 create mode 100644 Rad_perl_pipeline/.DS_Store
 create mode 100644 Rad_perl_pipeline/perlpipelinefiles.zip
 create mode 100644 Rad_perl_pipeline/perlpipelinefiles/.DS_Store
 create mode 100755 Rad_perl_pipeline/perlpipelinefiles/BarcodeSplit.pl
 create mode 100755 Rad_perl_pipeline/perlpipelinefiles/FilterLoci.pl
 create mode 100755 Rad_perl_pipeline/perlpipelinefiles/HashSeqs.pl
 create mode 100755 Rad_perl_pipeline/perlpipelinefiles/IdentifyLoci.pl
 create mode 100755 Rad_perl_pipeline/perlpipelinefiles/QualityFilter.pl
 create mode 100755 Rad_perl_pipeline/perlpipelinefiles/SNPLocation.pl
 create mode 100755 Rad_perl_pipeline/perlpipelinefiles/pipeline.txt
!git status
# On branch master # Your branch is ahead of 'origin/master' by 1 commit. # # Untracked files: # (use "git add <file>..." to include in what will be committed) # # .DS_Store nothing added to commit but untracked files present (use "git add" to track)
!git commit
# On branch master # Your branch is ahead of 'origin/master' by 1 commit. # # Untracked files: # (use "git add <file>..." to include in what will be committed) # # .DS_Store nothing added to commit but untracked files present (use "git add" to track)
!git add IMG_1566.jpg
!git commit -m "adding image"
[master e6ccb50] adding image 1 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 IMG_1566.jpg
cd /Users/sr320/Dropbox/Steven/gt/fish546
/Users/sr320/Dropbox/Steven/gt/fish546
!git status
# On branch master nothing to commit, working directory clean
!git remote
origin
!git log
commit ae5b7b65fa618ea8f0acff5cfdb3298654ad83bd
Author: sr320 <roberts.sbr@gmail.com>
Date:   Tue Feb 25 11:02:22 2014 -0800

    Add DESeq

commit dccf56f83a5e10e68ac0adecce50a17ed0ab6a2a
Author: sr320 <roberts.sbr@gmail.com>
Date:   Tue Feb 25 09:48:44 2014 -0800

    RAD perl scripts via Storer

commit 35567a29285a0e7e59cb92720a4655d326c70d49
Author: sr320 <roberts.sbr@gmail.com>
Date:   Tue Feb 25 05:45:56 2014 -0800

    Initial commit
!git push
warning: push.default is unset; its implicit value is changing in Git 2.0 from 'matching' to 'simple'. To squelch this message and maintain the current behavior after the default changes, use: git config --global push.default matching To squelch this message and adopt the new behavior now, use: git config --global push.default simple See 'git help config' and search for 'push.default' for further information. (the 'simple' mode was introduced in Git 1.7.11. Use the similar mode 'current' instead of 'simple' if you sometimes use older versions of Git) Username for 'https://github.com': ^C
cd /Volumes/web/whale/fish546/
/Volumes/web/whale/fish546
cd rad
/Volumes/web/whale/fish546/rad
!icd /iplant/home/sr320/analyses/radtag_demultiplex_andy2-2014-02-27-10-42-44.518
!iget -r -f samples
cd samples/
/Volumes/web/whale/fish546/rad/samples
ls
process_radtags.log sample_AATATC.fq sample_ACAGCG.fq sample_CCCGGT.fq sample_AAGACG.fq sample_AATGAG.fq sample_CACCTC.fq sample_CCCTAA.fq sample_AAGCTA.fq sample_ACAAGA.fq sample_CAGGCA.fq
!head /Volumes/web/whale/fish546/rad/samples/sample_AAGCTA.fq
@1_1101_13406_2145_1 TGCAGTCTCGCTCGCTCTTTGGCCACAGAGTCCAGGGTCTGGCGGGCGTCGGCCAGCTCGGTCTCGTAGGCGGCCTTCAGGCCCGACAGTTCCCGGCTGACCTCGGACTCCGACTCTGTGATGCGCAGTCGCAGGCCTGCATTCTCCGTCTCCAGAAAGCGCACCTTGTCGATGTAGACGGCCAGCCGGTCGTTGAGGTTGCACAGGTCCTCCTTCTCCCGGGGCCGGGGGATACGGTTGGGGGG + B1B33BAFF1AFF1EFGHHHHHHHHGHHHHHHHHHHGHHHHHHGGGGGGGEGCGGEGCGGGAE?GGFG/EFEGCGFHGHFFHGGGCGCCGGFHHCGCC?CFH0C<C?CGHHF-C@EGFGHHFFF@G-@EGBG-.ABGGEEBFFFFFF?AFFBF9F/;F//9?;@?FFBBB-;ABAF/FBAB;@@FFA=;-@=>BBAB99--:/9;9-/;:FFFBFB/99/9@--@;--@-@--9/9B9E-@?@@- @1_1101_18318_2145_1 TGCAGGAAAGAGAAGCATGCTCAGATTCTTCTTCTCTGACAATTGATGACAGATATTTTGTAGAACGAGGGTTGTGTGGACAGAATTGTATTATTTTATACATTTATTATAATAAAATGATCTGTATCCGTGTACGCAGGGCCTCAGTCTTTCAGTAAGATAACGGGGTCAGAGTTTAACTCGGACTGAACAAATATAATCAGAGGATCGAACCTGTGACCTTCTTTCTAACCCTCAGGTCTCCA + B@BB331111111AEFFHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHGGGGGGHGGGHHGEGHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGGHHHHGGGCGGGGHHHGFHHHHHFHHHHHGHHHGGCGGGHHGHHHHHHGHHGGGCCHGGHHHGHHHFFGFHFBGEC/B?CGE@G:;0;0CGFGGGEBFFFG.A9..FF0;9BF0 @1_1101_12102_2148_1 TGCAGCAGATCGGAGACGAGCTGGATGGAAATATGGAGCTCCAAAGGTGAGCTTTCCAATATCGTTCCTGAAGAATTATTAATGAGTATAATTAGCCTGTCTGTAATCACTAAATCTTGTTTGTTCCCCCCCCACAGAATGATAAACAACTCTTCGCTCAGTCCCACAAAAGACATGTTTATGAGAGTTGCCATTGAGATCTTTTCAGATGGAAAGTTCAACTGGGGCCGGGGGGTCGCACTGTT
!head -30 /Volumes/web/whale/fish546/rad/samples/process_radtags.log
/usr/local2/STACKS/stacks/bin/process_radtags -p andy_dir -b anofimbarcodes_102413.txt -o samples -e pstI --inline_null -c -q -r -i gzfastq
process_radtags executed 2014-02-27 11:05:58

File                                         Retained Reads   Low Quality   Ambiguous Barcodes   Ambiguous RAD-Tag   Total
sablefishPstI1_S1_L001_R1_001.fastq.gz       13754658         5             372693               46987               14174343

Total Sequences      14174343
Ambiguous Barcodes   372693
Low Quality          5
Ambiguous RAD-Tag    46987
Retained Reads       13754658

Barcode   Total     No RadTag   Retained
AAGACG    1544350   4470        1539880
AAGCTA    1412759   3286        1409471
AATATC    1352594   4347        1348246
AATGAG    2186604   8541        2178063
ACAAGA    960638    4366        956272
ACAGCG    1408640   4526        1404113
CACCTC    1185959   7357        1178602
CAGGCA    1376357   3700        1372657
CCCGGT    1110046   2411        1107634
CCCTAA    1263703   3983        1259720

Sequences not recorded
Barcode   Total
CGTATG    6253
CCTCTC    3893
CACACA    2889
ATGAGT    2606
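As a quick check on the demultiplexing, the numbers in the log above can be summarized per barcode. The values below are copied from the log; the variable names are just for illustration.

```python
# (barcode, total, no_radtag, retained) copied from process_radtags.log
rows = [
    ("AAGACG", 1544350, 4470, 1539880),
    ("AAGCTA", 1412759, 3286, 1409471),
    ("AATATC", 1352594, 4347, 1348246),
    ("AATGAG", 2186604, 8541, 2178063),
    ("ACAAGA", 960638, 4366, 956272),
    ("ACAGCG", 1408640, 4526, 1404113),
    ("CACCTC", 1185959, 7357, 1178602),
    ("CAGGCA", 1376357, 3700, 1372657),
    ("CCCGGT", 1110046, 2411, 1107634),
    ("CCCTAA", 1263703, 3983, 1259720),
]
total_sequences = 14174343
retained_reads = 13754658

# Share of all retained reads that went to each barcode.
for barcode, total, no_radtag, retained in rows:
    share = 100.0 * retained / retained_reads
    print(f"{barcode}\t{retained}\t{share:.1f}%")

overall = retained_reads / total_sequences
print(f"overall retained: {100.0 * overall:.2f}%")
```

The per-barcode retained counts sum to the "Retained Reads" total in the log, which is a handy sanity check that no barcode was missed when copying.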
Example code (run on hummingbird remotely)
!ustacks -t fastq -f /Volumes/web/whale/fish546/rad/samples/sample_AATGAG.fq -o /Volumes/web/whale/fish546/rad/samples -i 4 -p 10 -m 2 --model_type 'bounded' --alpha 0.05 --bound_low 0.001 --bound_high 0.01 -r -d
Ran 5 libraries
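The same ustacks call can be generated for each library rather than typed by hand. The sketch below only builds the command strings and reuses exactly the flags from the example above. The list of five barcodes and the numbering of sample IDs in sorted order are assumptions, inferred from the output files present and from AATGAG having been given `-i 4`.

```python
import shlex

SAMPLE_DIR = "/Volumes/web/whale/fish546/rad/samples"

# Assumption: these are the five libraries that were run, with IDs 1-5 in this order.
barcodes = ["AAGACG", "AAGCTA", "AATATC", "AATGAG", "ACAAGA"]


def ustacks_command(barcode, sample_id):
    """Build one ustacks command line using the flags from the example above."""
    args = [
        "ustacks", "-t", "fastq",
        "-f", f"{SAMPLE_DIR}/sample_{barcode}.fq",
        "-o", SAMPLE_DIR,
        "-i", str(sample_id),
        "-p", "10", "-m", "2",
        "--model_type", "bounded",
        "--alpha", "0.05",
        "--bound_low", "0.001",
        "--bound_high", "0.01",
        "-r", "-d",
    ]
    return " ".join(shlex.quote(a) for a in args)


commands = [ustacks_command(b, i) for i, b in enumerate(barcodes, start=1)]
for command in commands:
    print(command)
```

Each printed line can be pasted into a terminal or passed to `subprocess.run(shlex.split(command))` on the machine where Stacks is installed.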
pwd
u'/Volumes/web/whale/fish546/rad/samples'
ls
process_radtags.log sample_AATATC.fq sample_AAGACG.alleles.tsv sample_AATGAG.fq sample_AAGACG.fq sample_ACAAGA.fq sample_AAGACG.snps.tsv sample_ACAGCG.fq sample_AAGACG.tags.tsv sample_CACCTC.fq sample_AAGCTA.alleles.tsv sample_CAGGCA.fq sample_AAGCTA.fq sample_CCCGGT.fq sample_AAGCTA.snps.tsv sample_CCCTAA.fq sample_AAGCTA.tags.tsv
!head sample_AAGACG.alleles.tsv
0 1 2 AGACTTA 60 3
0 1 2 ATTCCAA 20 1
0 1 2 CGATTTT 20 1
0 1 3 ACA 20 1
0 1 3 TTA 60 3
0 1 3 TTG 20 1
0 1 7 AGCGC 20 1
0 1 7 CTGAA 20 1
0 1 7 CTGAC 40 2
0 1 8 CT 66.6667 2
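To see which haplotypes belong to which locus, the alleles rows can be grouped by locus ID. This sketch assumes the six columns are SQL ID, sample ID, locus ID, haplotype, percent, and count. That is my reading of the output, not something stated in these notes.

```python
from collections import defaultdict

# Rows copied from the head of sample_AAGACG.alleles.tsv shown above.
alleles_text = """\
0 1 2 AGACTTA 60 3
0 1 2 ATTCCAA 20 1
0 1 2 CGATTTT 20 1
0 1 3 ACA 20 1
0 1 3 TTA 60 3
0 1 3 TTG 20 1
0 1 7 AGCGC 20 1
0 1 7 CTGAA 20 1
0 1 7 CTGAC 40 2
0 1 8 CT 66.6667 2
"""

haplotypes = defaultdict(list)
for line in alleles_text.splitlines():
    # Assumed column order: sql_id, sample_id, locus_id, haplotype, percent, count
    _sql_id, _sample_id, locus_id, hap, percent, count = line.split()
    haplotypes[int(locus_id)].append((hap, float(percent), int(count)))

for locus, haps in sorted(haplotypes.items()):
    print(locus, len(haps), haps)
```

On the real file, which is tab-separated, `line.split("\t")` would be the safer choice.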
!head sample_AAGACG.snps.tsv
0 1 2 159 -5.10668 A C
0 1 2 213 -5.10668 G T
0 1 2 220 -5.10668 A T
0 1 2 228 -5.10668 C T
0 1 2 231 -5.10668 T C
0 1 2 233 -5.10668 T A
0 1 2 239 -5.10668 A T
0 1 3 206 -5.10668 T A
0 1 3 220 -5.10668 T C
0 1 3 228 -5.10668 A G
!head sample_AAGACG.tags.tsv
0 1 1 0 + consensus TGCAGATGGATTCTGTTGGTGCACCACAAAGCACCTTCAAGTAATCACATCGCTTATAGATAATCTATATATAGACATATATATCTATATATGTCTATATATCTATATATGTATTTTGTATAAAATACAAAAGCCCTGCATATGTTAATTTGGAAGCAATACAGTCTGATTTGGGGGAGTTTTAATTGGAACCATGGGAAAGAAATGCTTATGAAGGCAAGGAGAGAAAACACACTCAGATGCTT 0 0 0 0 1 1 model UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU 0 1 1 primary 0 1_1106_5151_10390_1 TGCAGATGGATTCTGTTGGTGCACCACAAAGCACCTTCAAGTAATCACATCGCTTATAGATAATCTATATATAGACATATATATCTATATATGTCTATATATCTATATATGTATTTTGTATAAAATACAAAAGCCCTGCATATGTTAATTTGGAAGCAATACAGTCTGATTTGGGGGAGTTTTAATTGGAACCATGGGAAAGAAATGCTTATGAAGGCAAGGAGAGAAAACACACTCAGATGCTT 0 1 1 primary 0 1_1110_4587_19590_1 TGCAGATGGATTCTGTTGGTGCACCACAAAGCACCTTCAAGTAATCACATCGCTTATAGATAATCTATATATAGACATATATATCTATATATGTCTATATATCTATATATGTATTTTGTATAAAATACAAAAGCCCTGCATATGTTAATTTGGAAGCAATACAGTCTGATTTGGGGGAGTTTTAATTGGAACCATGGGAAAGAAATGCTTATGAAGGCAAGGAGAGAAAACACACTCAGATGCTT 0 1 2 0 + consensus TGCAGTGCAGCTTTAAACACAGGAGGGTCGGCAGAATGTCAAACAGAACCACCGTCACTGCTGTGACTCTGAAGCCACAGTCAGGATTTTTGGGGAAGAACCGCCAACAATGGAAGCCGTGCTGGCCGATACGGATCACCGATCATGGTAATAAAATCTAGAGATGAGAAAGTATCATGAATTGATAATTCTTTGTTTATGGGCAGCCAACATGTGCAACATATTCCACTCTCTCTCTTAGAGAC 0 0 0 0 1 2 model OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOEOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOEOOOOOOEOOOOOOOEOOEOEOOOOOEOOOOO 0 1 2 primary 0 1_1101_10914_9604_1 TGCAGTGCAGCTTTAAACACAGGAGGGTCGGCAGAATGTCAAACAGAACCACCGTCACTGCTGTGACTCTGAAGCCACAGTCAGGATTTTTGGGGAAGAACCGCCAACAATGGAAGCCGTGCTGGCCGATACGGATCACCGATCATGGTAATAAAATCTAGAGATGAGAAAGTATCATGAATTGATAATTCTTTGTTTATGGGCAGCCAACATGTGCAACATATTCCACTCTCTCTCTTAGAGAC 0 1 2 primary 0 1_1103_12944_26723_1 
TGCAGTGCAGCTTTAAACACAGGAGGGTCGGCAGAATGTCAAACAGAACCACCGTCACTGCTGTGACTCTGAAGCCACAGTCAGGATTTTTGGGGAAGAACCGCCAACAATGGAAGCCGTGCTGGCCGATACGGATCACCGATCATGGTAATAAAATCTAGAGATGAGAAAGTATCATGAATTGATAATTCTTTGTTTATGGGCAGCCAACATGTGCAACATATTCCACTCTCTCTCTTAGAGAC 0 1 2 primary 0 1_1111_24429_15159_1 TGCAGTGCAGCTTTAAACACAGGAGGGTCGGCAGAATGTCAAACAGAACCACCGTCACTGCTGTGACTCTGAAGCCACAGTCAGGATTTTTGGGGAAGAACCGCCAACAATGGAAGCCGTGCTGGCCGATACGGATCACCGATCATGGTAATAAAATCTAGAGATGAGAAAGTATCATGAATTGATAATTCTTTGTTTATGGGCAGCCAACATGTGCAACATATTCCACTCTCTCTCTTAGAGAC 0 1 2 secondary 1_2113_9036_27152_1 TGCAGTGCAGCTTTAAACACAGGAGGGTCGGCAGAATGTCAAACAGAACCACCGTCACTGCTGTGACTCTGAAGCCACAGTCAGGATTTTTGGGGAAGAACCGCCAACAATGGAAGCCGTGCTGGCCGATACGGATCACCGATCATGGTAATAAAATCTCGAGATGAGAAAGTATCATGAATTGATAATTCTTTGTTTATGGGCAGCCAACATGTGCAACATATTCCATTCTCTCTCTTTGAGAC
!head /Volumes/web/whale/fish546/rad/samples/batch_1.catalog.alleles.tsv
0 1 2 AGACTTA 0 0
0 1 2 ATTCCAA 0 0
0 1 2 CGATTTT 0 0
0 1 3 ACA 0 0
0 1 3 TTA 0 0
0 1 3 TTG 0 0
0 1 4 ACCA 0 0
0 1 4 CATT 0 0
0 1 6 A 0 0
0 1 6 C 0 0
!head /Volumes/web/whale/fish546/rad/samples/batch_1.catalog.snps.tsv
0 1 2 159 -5.10668 A C
0 1 2 213 -5.10668 G T
0 1 2 220 -5.10668 A T
0 1 2 228 -5.10668 C T
0 1 2 231 -5.10668 T C
0 1 2 233 -5.10668 T A
0 1 2 239 -5.10668 A T
0 1 3 206 -5.10668 T A
0 1 3 220 -5.10668 T C
0 1 3 228 -5.10668 A G
!head /Volumes/web/whale/fish546/rad/samples/batch_1.catalog.tags.tsv
0 1 1 0 + consensus 0 1_1 TGCAGATGGATTCTGTTGGTGCACCACAAAGCACCTTCAAGTAATCACATCGCTTATAGATAATCTATATATAGACATATATATCTATATATGTCTATATATCTATATATGTATTTTGTATAAAATACAAAAGCCCTGCATATGTTAATTTGGAAGCAATACAGTCTGATTTGGGGGAGTTTTAATTGGAACCATGGGAAAGAAATGCTTATGAAGGCAAGGAGAGAAAACACACTCAGATGCTT 0 0 0 0 1 2 0 + consensus 0 1_2 TGCAGTGCAGCTTTAAACACAGGAGGGTCGGCAGAATGTCAAACAGAACCACCGTCACTGCTGTGACTCTGAAGCCACAGTCAGGATTTTTGGGGAAGAACCGCCAACAATGGAAGCCGTGCTGGCCGATACGGATCACCGATCATGGTAATAAAATCTAGAGATGAGAAAGTATCATGAATTGATAATTCTTTGTTTATGGGCAGCCAACATGTGCAACATATTCCACTCTCTCTCTTAGAGAC 0 0 0 0 1 3 0 + consensus 0 1_3 TGCAGTAGCTAGCGTTAACTCCATGAGTTGGTTTAAAACAACCTCACCAGCTGTCATTGTTGTAGAACCTTGTTAAATACTGTAGCACCCAGCAATGGCCGAAGCTATGCATTGGTACCAACCATGTCATGCTAGCTTGTCGGGATCGGGTATAAAAGCCCTTCACATTTGGTTGAGGGAACATTTGGAATGATCATATTCAAAAATGTCACTTGAACTCTCACTTCAAGATATGTCAATGAAAA 0 0 0 0 1 4 0 + consensus 0 1_4,4_112317 TGCAGGGCCCCTGAGTCTCTCTGTGGCTGTTTGACAGTAATAGGACTGGTTTTATTCCTGTGAGACCTGCCTGCATAGATAACATCTCTCCTGTGCATACATAAATATTAGAACAGTCCATGGGACATTTCAAGTGGAGAGGATGAATTATTTATCGCGCCAAGTTGAGAGAGTTGAGTTTATACAGTGCCGACAGTGGCTCTTAATTGTTAGCTGACGGTTAGAAATCGTTTTTTTAAAAGCTT 0 0 0 0 1 5 0 + consensus 0 1_5 TGCAGTGGAGGACAGAGGACATGGCTATGAACAAAATGGAAGAAGAGACAGAGGCAGATTGTTTCTGTATCTGTAGCCTATTTAAACATTTTTATTTCACACATTTGCCATTTAAATGCTCCTCATCCTCTGAAGGTACATGAGGCTTCTTATGTTTCACCTGCAATGGTAGCTGTGACATCAGACGACTGATTCAACCGGAAGAGAGGAACTGTAAGCTGAGCAGAGGGCCACTGCATGTTCAC 0 0 0 0 1 6 0 + consensus 0 1_6,4_59742 TGCAGCTTAAATAAAGCTTCATTTTCGGATTCGAGGTCTTATTTGGATGGATTTCAGCAGCTATGGCTTTGTGTTTTGTTCCCACGAGTTTTGATTTTGCTACAGCTGTGATTAAAACCGCTGACTGGAATCTCAACAGCGGCTAAAGCTCATGTGTTTTTTGCAATATTTATAGCCCATCCACCACAACAGAAGTTATAGAAAAAATATCAGTAAAAGAAAAGTTTGAAAAAGCAATTGAAAAT 0 0 0 0 1 7 0 + consensus 0 1_7,4_136809 TGCAGGACACACATGAAAGTAAATGCAGCCTATCACAGTATGGCACGATACAATCATATCCTTTCTTTTTCTTGCTGATTATTTATAATTGGAGGTATCCATAGCAACAGCAGCTCTGTATATTTTAATGCAAAAGATACATTTTATGGCTTTAGGTTTAAACCGTGCTTTGTAGAGCTGATGGGTGTTTAACAGAGCAGCCTATCAGTGCATGGCCAGCTTGTGGCTGAGTGGAGCCCCTTTAC 0 0 0 0 1 8 0 + consensus 0 1_8 
TGCAGATTTACTGGGCAGGTAGTATTAAGAGTCACTCCTTTGCTCAAGGTCTGGTTGACATGACTAAATTTGAAGCTTTTTGTTGAGCAGCTTGCAGAAATAGAAATGCTGCGATCTAATTAAACCATAGCTTGCACTTTCATGTCAAGCAAATGCATCCTCAATAGCCAGAAGCTGAATGTTCTTGAAAAATATGGAGGTGAGAGCCAACTAAGTCCAGTGGACCAGATCAAATCCTTAAATTA 0 0 0 0 1 9 0 + consensus 0 1_9 TGCAGCGGGGACTGAAGTAGATACGTGGGCCGAGGACGCATGGCTTCAAACGCCAGGCAGTGGTACAGACCAACGGGTGTCAGACGGTGAGATGAGCTCAGCCGCCGATGGAGTGACTCCTCGCAGCAGTTCTACCAAGTCCTCCCCTGTTTCCATATCTATTATTACAGCACAGGGTAAGACGGCAAATTATAGATTCCACATAGACTTAATGGTTAGGGATTTAACGTGCAATTGCATTTTCT 0 0 0 0 1 10 0 + consensus 0 1_10,3_4904 TGCAGTATTTTTACTAGTTATATTCTCTTTTAAAGTTGTTACTTTCTGGAAGCGATAGACCCTTCGCCTCAGTTTCAACCCTGAAGCGCCGTCTGGCTAATCAAAGCGTAGCATCAGTCAGACCAGGGTTCCTCAACTACTAGTCTCTTTCTCATAGACTCATATGAAACGGTTTCAGATGTTCTTCATCTTCAGGCTGGTTTTGGACTTGGCGCTAGTTATGGTTAAGCGTTGTGTAATAGTGA 0 0 0
!sstacks -b 1 -c /Volumes/web/whale/fish546/rad/samples/batch_1 -s /Volumes/web/whale/fish546/rad/samples/sample_ACAAGA -o /Volumes/web/whale/fish546/rad/samples/ -p 12
output
Parsing /Volumes/web/whale/fish546/rad/samples/batch_1.catalog.tags.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/batch_1.catalog.snps.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/batch_1.catalog.alleles.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_ACAAGA.tags.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_ACAAGA.snps.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_ACAAGA.alleles.tsv
Searching for sequence matches...
40395 stacks compared against the catalog containing 289998 loci.
16987 matching loci, 8346 contained no verified haplotypes.
156 loci matched more than one catalog locus and were excluded.
8029 loci contained SNPs unaccounted for in the catalog and were excluded.
19339 total haplotypes examined from matching loci, 9781 verified.
Outputing to file /Volumes/web/whale/fish546/rad/samples/sample_ACAAGA.matches.tsv
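The sstacks summary above is easier to interpret as proportions. A small sketch that pulls the counts out of the log text with regular expressions (the log lines are copied from the output above):

```python
import re

log = """\
40395 stacks compared against the catalog containing 289998 loci.
16987 matching loci, 8346 contained no verified haplotypes.
156 loci matched more than one catalog locus and were excluded.
8029 loci contained SNPs unaccounted for in the catalog and were excluded.
19339 total haplotypes examined from matching loci, 9781 verified.
"""

stacks, catalog_loci = map(int, re.search(
    r"(\d+) stacks compared against the catalog containing (\d+) loci", log).groups())
matching = int(re.search(r"(\d+) matching loci", log).group(1))
examined, verified = map(int, re.search(
    r"(\d+) total haplotypes examined from matching loci, (\d+) verified", log).groups())

frac_matching = matching / stacks
frac_verified = verified / examined

print(f"stacks matching the catalog: {100 * frac_matching:.1f}%")
print(f"haplotypes verified: {100 * frac_verified:.1f}%")
```

Roughly 42% of the stacks from sample_ACAAGA matched a catalog locus, and about half of the examined haplotypes were verified.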
!head /Volumes/web/whale/fish546/rad/samples/sample_ACAAGA.matches.tsv
0 1 17370 5 1 C 2
0 1 133131 5 9 consensus 3
0 1 13913 5 34 AAGAT 2
0 1 165533 5 50 CA 2
0 1 48330 5 51 T 2
0 1 43034 5 52 TGG 2
0 1 177000 5 68 consensus 3
0 1 20735 5 75 TAA 2
0 1 55930 5 79 CCC 2
0 1 139304 5 92 TCACC 2
Output from the Stacks populations run (the command itself was not recorded):
Fst kernel smoothing: off
Bootstrap resampling: off
Percent samples limit per population: 0
Locus Population limit: 1
Minimum stack depth: 4
Minor allele frequency cutoff: 0
Applying Fst correction: none.
Parsing population map.
Found 4 input file(s).
1 population found
1: sample_AAGACG, sample_AAGCTA, sample_AATATC, sample_AATGAG
1 group of populations found
1: 1
Parsing /Volumes/web/whale/fish546/rad/samples/batch_1.catalog.tags.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/batch_1.catalog.snps.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/batch_1.catalog.alleles.tsv
Catalog is not reference aligned, arbitrarily ordering catalog loci.
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AAGACG.matches.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AAGCTA.matches.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AATATC.matches.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AATGAG.matches.tsv
Populating observed haplotypes for 4 samples, 289998 loci.
Removed 215394 samples from loci that are below the minimum stack depth of 4x
Removing 154311 loci that did not pass sample/population constraints... retained 135687 loci.
Loading model outputs for 4 samples, 135687 loci.
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AAGACG.tags.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AAGCTA.tags.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AATATC.tags.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AATGAG.tags.tsv
Generating nucleotide-level summary statistics for population 1
Population 1 contained 2093 incompatible loci.
Tallying loci across populations...done.
Writing 135687 loci to haplotype statistics file, '/Volumes/web/whale/fish546/rad/samples/batch_1.hapstats.tsv'
Calculating haplotype F statistics... done.
Writing haplotype F statistics... wrote 123147 loci to haplotype Phi_st file, '/Volumes/web/whale/fish546/rad/samples/batch_1.phistats.tsv'
Writing 135687 loci to summary statistics file, '/Volumes/web/whale/fish546/rad/samples/batch_1.sumstats.tsv'
Writing population data to VCF file '/Volumes/web/whale/fish546/rad/samples/batch_1.vcf'
Loading SNP data for 4 samples.
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AAGACG.snps.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AAGCTA.snps.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AATATC.snps.tsv
Parsing /Volumes/web/whale/fish546/rad/samples/sample_AATGAG.snps.tsv
Writing 135687 loci to observed haplotype file, '/Volumes/web/whale/fish546/rad/samples/batch_1.haplotypes.tsv'
!head /Volumes/web/whale/fish546/rad/samples/batch_1.haplotypes.tsv
Catalog ID   Cnt   sample_AAGACG                sample_AAGCTA   sample_AATATC   sample_AATGAG
2            1     AGACTTA/ATTCCAA/CGATTTT      -               -               -
3            1     ACA/TTA/TTG                  -               -               -
4            2     CATT                         -               -               ACCA/CATT
7            2     CAGCGCGC/CCTCGGAA/CCTCGGAC   -               -               CCTAGGAC/CCTCTGAC/TCTCGGAC
9            1     AC/AT/CC                     -               -               -
11           2     CGG/TGG                      -               -               CGG
14           2     GG/TA                        -               TA              -
15           1     consensus                    -               -               -
16           1     AA/GT                        -               -               -
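The haplotypes table can be scanned for loci that were called in more than one sample. The sketch below re-enters the rows shown above with tab separators and assumes that `-` means no call for that sample. That interpretation is mine and should be checked against the Stacks documentation.

```python
import csv
from io import StringIO

# First rows of batch_1.haplotypes.tsv from the output above, tab-separated.
haplotypes_tsv = (
    "Catalog ID\tCnt\tsample_AAGACG\tsample_AAGCTA\tsample_AATATC\tsample_AATGAG\n"
    "2\t1\tAGACTTA/ATTCCAA/CGATTTT\t-\t-\t-\n"
    "3\t1\tACA/TTA/TTG\t-\t-\t-\n"
    "4\t2\tCATT\t-\t-\tACCA/CATT\n"
    "7\t2\tCAGCGCGC/CCTCGGAA/CCTCGGAC\t-\t-\tCCTAGGAC/CCTCTGAC/TCTCGGAC\n"
    "9\t1\tAC/AT/CC\t-\t-\t-\n"
    "11\t2\tCGG/TGG\t-\t-\tCGG\n"
    "14\t2\tGG/TA\t-\tTA\t-\n"
    "15\t1\tconsensus\t-\t-\t-\n"
    "16\t1\tAA/GT\t-\t-\t-\n"
)

reader = csv.DictReader(StringIO(haplotypes_tsv), delimiter="\t")
sample_cols = [c for c in reader.fieldnames if c.startswith("sample_")]

shared = []
for row in reader:
    # Assumption: "-" marks a sample with no haplotype call at this locus.
    called = {s: row[s].split("/") for s in sample_cols if row[s] != "-"}
    if len(called) >= 2:
        shared.append((int(row["Catalog ID"]), called))

for catalog_id, called in shared:
    print(catalog_id, called)
```

To run this on the full file, replace the `StringIO` object with `open(".../batch_1.haplotypes.tsv", newline="")`.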
!ls /Volumes/web/whale/fish546/rad/samples/
batch_1.catalog.alleles.tsv sample_AATATC.alleles.tsv batch_1.catalog.snps.tsv sample_AATATC.fq batch_1.catalog.tags.tsv sample_AATATC.matches.tsv batch_1.haplotypes.tsv sample_AATATC.snps.tsv batch_1.hapstats.tsv sample_AATATC.tags.tsv batch_1.phistats.tsv sample_AATGAG.alleles.tsv batch_1.populations.log sample_AATGAG.fq batch_1.sumstats.tsv sample_AATGAG.matches.tsv batch_1.sumstats_summary.tsv sample_AATGAG.snps.tsv batch_1.vcf sample_AATGAG.tags.tsv process_radtags.log sample_ACAAGA.alleles.tsv sample_AAGACG.alleles.tsv sample_ACAAGA.fq sample_AAGACG.fq sample_ACAAGA.matches.tsv sample_AAGACG.matches.tsv sample_ACAAGA.snps.tsv sample_AAGACG.snps.tsv sample_ACAAGA.tags.tsv sample_AAGACG.tags.tsv sample_ACAGCG.fq sample_AAGCTA.alleles.tsv sample_CACCTC.fq sample_AAGCTA.fq sample_CAGGCA.fq sample_AAGCTA.matches.tsv sample_CCCGGT.fq sample_AAGCTA.snps.tsv sample_CCCTAA.fq sample_AAGCTA.tags.tsv