Blast output from de novo assembly of Carmel Exposed and Control Libraries
#tab-delimited fasta
!date
Wed Mar 5 11:16:17 PST 2014
!head /Volumes/web/cnidarian/BlackAbalone_Contigs_v3.fa
>Roberts_20100712_CC_F3_trimmed_contig_1 Average coverage: 38.44
ATTTAAAGTGTTTGATAAAGACAATAACGGCTTCATCAGTAAATCCGAGCTCCGGCAGGT
CATGGTGTCTTTGGAGGGTCACAAGGTCACCGAGCAGGAAATCAGCGAC
>Roberts_20100712_CC_F3_trimmed_contig_2 Average coverage: 153.77
CTTCAGCACAACTCAGGTGTCTGTCCGGCCGTTACAGCACACCCAGTTTGAGCGGTTCAT
CCCTGCAGCCTACCCATATTACACCAGTGCCTTCTCCATGATGTTTGGAGTCCTTATACT
GAGTATAGTGTTCTCATGCCCTGTCCTTCTTGGATTCC
>Roberts_20100712_CC_F3_trimmed_contig_3 Average coverage: 175.58
TGGAGGTGGGGTGCCTCATAGATGGTTTGGACTTGCCGGTCCTATAGGAGCAGGGGAATG
GGGAAACCCAAGGTCGAGCTACCTGACACACGCCTTGGCCG
!sed 's/Roberts_20100712_CC_F3_trimmed_/BlackAbalone_v3_/g' </Volumes/web/cnidarian/BlackAbalone_Contigs_v3.fa> /Volumes/web/cnidarian/lft_BlackAbalone_v3_fasta.fa
#sed 's/abc/XYZ/g' <infile >outfile
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_fasta.fa
>BlackAbalone_v3_contig_1 Average coverage: 38.44
ATTTAAAGTGTTTGATAAAGACAATAACGGCTTCATCAGTAAATCCGAGCTCCGGCAGGT
CATGGTGTCTTTGGAGGGTCACAAGGTCACCGAGCAGGAAATCAGCGAC
>BlackAbalone_v3_contig_2 Average coverage: 153.77
CTTCAGCACAACTCAGGTGTCTGTCCGGCCGTTACAGCACACCCAGTTTGAGCGGTTCAT
CCCTGCAGCCTACCCATATTACACCAGTGCCTTCTCCATGATGTTTGGAGTCCTTATACT
GAGTATAGTGTTCTCATGCCCTGTCCTTCTTGGATTCC
>BlackAbalone_v3_contig_3 Average coverage: 175.58
TGGAGGTGGGGTGCCTCATAGATGGTTTGGACTTGCCGGTCCTATAGGAGCAGGGGAATG
GGGAAACCCAAGGTCGAGCTACCTGACACACGCCTTGGCCG
!grep -c ">" /Volumes/web/cnidarian/lft_BlackAbalone_v3_fasta.fa
13884
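The rename-and-count steps above (`sed` on the contig prefix, then `grep -c ">"`) can also be sketched in Python. This is a minimal illustration, not the method used in the notebook: the function names `rename_fasta_headers` and `count_records` are hypothetical, and the two sequences are truncated copies of the first contigs shown above. Unlike `grep -c ">"`, the counter here only looks at lines that start with `>`.

```python
def rename_fasta_headers(lines, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix in FASTA header lines only."""
    renamed = []
    for line in lines:
        if line.startswith(">"):
            line = line.replace(old_prefix, new_prefix)
        renamed.append(line)
    return renamed


def count_records(lines):
    """Count FASTA records, i.e. lines that begin with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Toy input: headers copied from the notebook, sequences truncated for brevity.
example = [
    ">Roberts_20100712_CC_F3_trimmed_contig_1 Average coverage: 38.44",
    "ATTTAAAGTGTTTGATAAAGACAATAACGG",
    ">Roberts_20100712_CC_F3_trimmed_contig_2 Average coverage: 153.77",
    "CTTCAGCACAACTCAGGTGTCTGTCCGGCC",
]

renamed = rename_fasta_headers(
    example, "Roberts_20100712_CC_F3_trimmed_", "BlackAbalone_v3_"
)
print(renamed[0])
print(count_records(renamed))
```

Restricting the replacement to header lines means a sequence line can never be altered by accident, which the plain `sed` substitution does not guarantee in general.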
Details of assembly in CLC
code
./blastx -query /Volumes/web-1/cnidarian/BlackAbalone_Contigs_v3.fa -db /Volumes/web-1/whale/fish546/blast/db/swissprot -out /Volumes/web-1/cnidarian/lft_BlackAbalone_v3_swissprot_blastout -outfmt 6 -evalue 1E-5 -max_target_seqs 1 -num_threads 4
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout
Roberts_20100712_CC_F3_trimmed_contig_1 gi|41018621|sp|P60204.2|CALM_EMENI 64.71 34 11 1 2 103 90 122 6e-08 48.5
Roberts_20100712_CC_F3_trimmed_contig_2 gi|218546747|sp|B1H3C9.1|OST48_XENTR 81.82 44 8 0 2 133 384 427 5e-18 79.7
Roberts_20100712_CC_F3_trimmed_contig_6 gi|133940|sp|P02350.2|RS31_XENLA 100.00 57 0 0 1 171 10 66 4e-32 115
Roberts_20100712_CC_F3_trimmed_contig_7 gi|21362398|sp|P70097.1|C560_CRIGR 46.76 139 70 2 7 414 3 140 9e-29 108
Roberts_20100712_CC_F3_trimmed_contig_9 gi|302393789|sp|P62972.2|UBIQP_XENLA 100.00 69 0 0 2 208 8 76 9e-42 139
Roberts_20100712_CC_F3_trimmed_contig_15 gi|74851961|sp|Q54GK6.1|RL222_DICDI 59.46 37 15 0 2 112 45 81 7e-08 48.1
Roberts_20100712_CC_F3_trimmed_contig_19 gi|231498|sp|P30163.1|ACT2_ONCVO 98.08 260 5 0 1 780 32 291 0.0 537
Roberts_20100712_CC_F3_trimmed_contig_22 gi|158706130|sp|Q08CS6.2|MOXD2_DANRE 33.73 83 51 2 1 237 448 530 1e-08 54.7
Roberts_20100712_CC_F3_trimmed_contig_26 gi|6226551|sp|P29957.3|AMY_PSEHA 50.98 51 21 1 1 153 527 573 9e-08 51.2
Roberts_20100712_CC_F3_trimmed_contig_28 gi|133802|sp|P20342.3|RS15_XENLA 100.00 44 0 0 3 134 102 145 5e-25 95.1
!tail /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout
Roberts_20100712_CC_F3_trimmed_contig_13606 gi|48474943|sp|Q99NF1.1|BCDO2_MOUSE 60.00 30 12 0 2 91 226 255 3e-06 45.8
Roberts_20100712_CC_F3_trimmed_contig_13614 gi|82179554|sp|Q5M9A7.1|PGAP2_XENLA 68.29 41 13 0 2 124 60 100 2e-14 67.0
Roberts_20100712_CC_F3_trimmed_contig_13615 gi|158957572|sp|Q0P457.2|KTI12_DANRE 68.57 35 11 0 3 107 239 273 2e-10 56.6
Roberts_20100712_CC_F3_trimmed_contig_13618 gi|46577124|sp|Q9H3K6.1|BOLA2_HUMAN 60.00 35 14 0 3 107 52 86 2e-09 52.8
Roberts_20100712_CC_F3_trimmed_contig_13654 gi|2493067|sp|Q36967.1|ATP6_SALTR 95.12 41 2 0 3 125 2 42 1e-18 76.6
Roberts_20100712_CC_F3_trimmed_contig_13686 gi|66774221|sp|Q9CZR2.2|NALD2_MOUSE 60.61 33 13 0 6 104 670 702 3e-07 49.3
Roberts_20100712_CC_F3_trimmed_contig_13693 gi|71153494|sp|Q9H583.3|HEAT1_HUMAN 47.73 44 23 0 5 136 1991 2034 2e-06 47.4
Roberts_20100712_CC_F3_trimmed_contig_13748 gi|122136056|sp|Q2KIM0.1|FUCO_BOVIN 55.56 36 16 0 6 113 202 237 7e-08 50.4
Roberts_20100712_CC_F3_trimmed_contig_13845 gi|229462816|sp|Q9UKP5.2|ATS6_HUMAN 39.29 84 44 3 46 297 556 632 3e-09 57.4
Roberts_20100712_CC_F3_trimmed_contig_13882 gi|166217735|sp|A1USB4.1|GATB_BARBK 61.29 31 12 0 10 102 12 42 1e-06 47.0
!grep -c "Roberts" /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout
1842
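The `-outfmt 6` rows above carry the twelve default columns (`qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`). As a hedged sketch, they can be read into named fields and the number of distinct query contigs counted, rather than counting lines that contain "Roberts". The helper names `parse_blast_tab` and `count_unique_queries` are hypothetical, and the code assumes the file on disk is tab-separated, as BLAST writes it.

```python
# Default column order for BLAST tabular output (the 'std' keyword).
STD_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast_tab(lines, fields=STD_FIELDS):
    """Turn tab-separated BLAST rows into dicts keyed by column name."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError(
                "expected %d columns, got %d" % (len(fields), len(values))
            )
        hits.append(dict(zip(fields, values)))
    return hits


def count_unique_queries(hits):
    """Number of distinct query IDs that have at least one hit."""
    return len({hit["qseqid"] for hit in hits})


# Two rows copied from the head output above, with tabs restored.
example = [
    "Roberts_20100712_CC_F3_trimmed_contig_1\tgi|41018621|sp|P60204.2|CALM_EMENI"
    "\t64.71\t34\t11\t1\t2\t103\t90\t122\t6e-08\t48.5",
    "Roberts_20100712_CC_F3_trimmed_contig_2\tgi|218546747|sp|B1H3C9.1|OST48_XENTR"
    "\t81.82\t44\t8\t0\t2\t133\t384\t427\t5e-18\t79.7",
]

hits = parse_blast_tab(example)
print(hits[0]["sseqid"], float(hits[0]["evalue"]))
print(count_unique_queries(hits))
```

For a custom format string, the extra specifiers can be appended to the `fields` list in the same order they were given to `-outfmt`.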
#now with BLAST 2.2.28+
!blastx -query /Volumes/web/cnidarian/BlackAbalone_Contigs_v3.fa -db /Volumes/web/whale/fish546/blast/db/swissprot -out /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_tax1 -outfmt "6 std stitle staxids sscinames scomnames sblastnames" -evalue 1E-10 -max_target_seqs 1
Selenocysteine (U) at position 613 replaced by X Selenocysteine (U) at position 613 replaced by X Selenocysteine (U) at position 24 replaced by X Selenocysteine (U) at position 63 replaced by X Selenocysteine (U) at position 25 replaced by X Selenocysteine (U) at position 63 replaced by X Selenocysteine (U) at position 60 replaced by X Selenocysteine (U) at position 73 replaced by X Selenocysteine (U) at position 73 replaced by X Selenocysteine (U) at position 73 replaced by X Selenocysteine (U) at position 73 replaced by X Selenocysteine (U) at position 73 replaced by X Selenocysteine (U) at position 73 replaced by X Selenocysteine (U) at position 73 replaced by X Selenocysteine (U) at position 73 replaced by X Selenocysteine (U) at position 73 replaced by X Selenocysteine (U) at position 64 replaced by X
#Updating July 24, 2013
#Want to make a GO Slim pie chart of the transcriptome based on Swiss-Prot hits
#tr ',' "\t"
!tr '|' "\t" </Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout> /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_b
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_b
Roberts_20100712_CC_F3_trimmed_contig_1 gi 41018621 sp P60204.2 CALM_EMENI 64.71 34 11 1 2 103 90 122 6e-08 48.5
Roberts_20100712_CC_F3_trimmed_contig_2 gi 218546747 sp B1H3C9.1 OST48_XENTR 81.82 44 8 0 2 133 384 427 5e-18 79.7
Roberts_20100712_CC_F3_trimmed_contig_6 gi 133940 sp P02350.2 RS31_XENLA 100.00 57 0 0 1 171 10 66 4e-32 115
Roberts_20100712_CC_F3_trimmed_contig_7 gi 21362398 sp P70097.1 C560_CRIGR 46.76 139 70 2 7 414 3 140 9e-29 108
Roberts_20100712_CC_F3_trimmed_contig_9 gi 302393789 sp P62972.2 UBIQP_XENLA 100.00 69 0 0 2 208 8 76 9e-42 139
Roberts_20100712_CC_F3_trimmed_contig_15 gi 74851961 sp Q54GK6.1 RL222_DICDI 59.46 37 15 0 2 112 45 81 7e-08 48.1
Roberts_20100712_CC_F3_trimmed_contig_19 gi 231498 sp P30163.1 ACT2_ONCVO 98.08 260 5 0 1 780 32 291 0.0 537
Roberts_20100712_CC_F3_trimmed_contig_22 gi 158706130 sp Q08CS6.2 MOXD2_DANRE 33.73 83 51 2 1 237 448 530 1e-08 54.7
Roberts_20100712_CC_F3_trimmed_contig_26 gi 6226551 sp P29957.3 AMY_PSEHA 50.98 51 21 1 1 153 527 573 9e-08 51.2
Roberts_20100712_CC_F3_trimmed_contig_28 gi 133802 sp P20342.3 RS15_XENLA 100.00 44 0 0 3 134 102 145 5e-25 95.1
#also need to get rid of version # on Swiss-Prot ID
#note that splitting on '.' will also break every decimal field (percent identity, evalue 0.0, bitscore)
!tr '.' "\t" </Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_b> /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_c
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_c
Roberts_20100712_CC_F3_trimmed_contig_1 gi 41018621 sp P60204 2 CALM_EMENI 64 71 34 11 1 2 103 90 122 6e-08 48 5
Roberts_20100712_CC_F3_trimmed_contig_2 gi 218546747 sp B1H3C9 1 OST48_XENTR 81 82 44 8 0 2 133 384 427 5e-18 79 7
Roberts_20100712_CC_F3_trimmed_contig_6 gi 133940 sp P02350 2 RS31_XENLA 100 00 57 0 0 1 171 10 66 4e-32 115
Roberts_20100712_CC_F3_trimmed_contig_7 gi 21362398 sp P70097 1 C560_CRIGR 46 76 139 70 2 7 414 3 140 9e-29 108
Roberts_20100712_CC_F3_trimmed_contig_9 gi 302393789 sp P62972 2 UBIQP_XENLA 100 00 69 0 0 2 208 8 76 9e-42 139
Roberts_20100712_CC_F3_trimmed_contig_15 gi 74851961 sp Q54GK6 1 RL222_DICDI 59 46 37 15 0 2 112 45 81 7e-08 48 1
Roberts_20100712_CC_F3_trimmed_contig_19 gi 231498 sp P30163 1 ACT2_ONCVO 98 08 260 5 0 1 780 32 291 0 0 537
Roberts_20100712_CC_F3_trimmed_contig_22 gi 158706130 sp Q08CS6 2 MOXD2_DANRE 33 73 83 51 2 1 237 448 530 1e-08 54 7
Roberts_20100712_CC_F3_trimmed_contig_26 gi 6226551 sp P29957 3 AMY_PSEHA 50 98 51 21 1 1 153 527 573 9e-08 51 2
Roberts_20100712_CC_F3_trimmed_contig_28 gi 133802 sp P20342 3 RS15_XENLA 100 00 44 0 0 3 134 102 145 5e-25 95 1
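The output above shows the side effect of `tr '.' "\t"`: the version number comes off the Swiss-Prot ID, but percent identity, evalue and bitscore are split as well. One possible alternative, sketched here under the assumption that the subject ID always has the `gi|number|db|accession.version|name` layout seen in this file, is to split only the second column. The function names `split_sseqid` and `reformat_row` are hypothetical.

```python
def split_sseqid(sseqid):
    """Split 'gi|41018621|sp|P60204.2|CALM_EMENI' into its parts.

    Only the accession field is split on '.', so numeric columns
    elsewhere in the row are never touched.
    """
    _, gi, _, acc_ver, name = sseqid.split("|")
    accession, _, version = acc_ver.partition(".")
    return {"gi": gi, "SPID": accession, "version": version, "name": name}


def reformat_row(line):
    """Return the row with sseqid replaced by SPID and entry name columns."""
    cols = line.rstrip("\n").split("\t")
    ids = split_sseqid(cols[1])
    return "\t".join([cols[0], ids["SPID"], ids["name"]] + cols[2:])


# Row copied from the blastx output above, with tabs restored.
row = (
    "Roberts_20100712_CC_F3_trimmed_contig_1\tgi|41018621|sp|P60204.2|CALM_EMENI"
    "\t64.71\t34\t11\t1\t2\t103\t90\t122\t6e-08\t48.5"
)

print(split_sseqid("gi|41018621|sp|P60204.2|CALM_EMENI")["SPID"])
print(reformat_row(row))
```

The resulting table keeps a fixed number of columns per row, which makes the later join on `SPID` less fragile than a table whose column count depends on how many decimal points each row contained.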
!sed 's/Roberts_20100712_CC_F3_trimmed/Haliotis_cra_v3/g' </Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_c> /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_d
#sed 's/abc/XYZ/g' <infile >outfile
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_d
Haliotis_cra_v3_contig_1 gi 41018621 sp P60204 2 CALM_EMENI 64 71 34 11 1 2 103 90 122 6e-08 48 5
Haliotis_cra_v3_contig_2 gi 218546747 sp B1H3C9 1 OST48_XENTR 81 82 44 8 0 2 133 384 427 5e-18 79 7
Haliotis_cra_v3_contig_6 gi 133940 sp P02350 2 RS31_XENLA 100 00 57 0 0 1 171 10 66 4e-32 115
Haliotis_cra_v3_contig_7 gi 21362398 sp P70097 1 C560_CRIGR 46 76 139 70 2 7 414 3 140 9e-29 108
Haliotis_cra_v3_contig_9 gi 302393789 sp P62972 2 UBIQP_XENLA 100 00 69 0 0 2 208 8 76 9e-42 139
Haliotis_cra_v3_contig_15 gi 74851961 sp Q54GK6 1 RL222_DICDI 59 46 37 15 0 2 112 45 81 7e-08 48 1
Haliotis_cra_v3_contig_19 gi 231498 sp P30163 1 ACT2_ONCVO 98 08 260 5 0 1 780 32 291 0 0 537
Haliotis_cra_v3_contig_22 gi 158706130 sp Q08CS6 2 MOXD2_DANRE 33 73 83 51 2 1 237 448 530 1e-08 54 7
Haliotis_cra_v3_contig_26 gi 6226551 sp P29957 3 AMY_PSEHA 50 98 51 21 1 1 153 527 573 9e-08 51 2
Haliotis_cra_v3_contig_28 gi 133802 sp P20342 3 RS15_XENLA 100 00 44 0 0 3 134 102 145 5e-25 95 1
!wc /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_d
1842 34752 186913 /Volumes/web/cnidarian/lft_BlackAbalone_v3_swissprot_blastout_d
#let's try again to see if we can upload to SQLShare from the command line
%cd /Users/sr320/pythonclient/tools
#cp samtools /usr/local/bin
!cp uploadone.py /usr/local/bin
cp: /usr/local/bin: Permission denied
#SQLShare direct.
!python /Users/sr320/sqlshare-pythonclient/tools/fetchdata.py -d "[sr320@washington.edu].[lft_BlackAbalone_v3_SP_GO_pathway]" -f tsv -o /Volumes/web/cnidarian/lft_BlackAbalone_sp_go_path.txt
!head -2 /Volumes/web/cnidarian/lft_BlackAbalone_sp_go_path.txt
!wc /Volumes/web/cnidarian/lft_BlackAbalone_sp_go_path.txt
1843 96683 938139 /Volumes/web/cnidarian/lft_BlackAbalone_sp_go_path.txt
SELECT distinct *
FROM [sr320@washington.edu].[lft_BlackAbalone_v3_swissprot_blastout_d]d
left join
[sr320@washington.edu].[SPID and GO Numbers]go
on
d.SPID = go.SPID
adding Slim info...
SELECT distinct *
FROM [sr320@washington.edu].[lft_BlackAbalone_v3_swissprot_blastout_d]d
left join
[sr320@washington.edu].[SPID and GO Numbers]go
on
d.SPID = go.SPID
left join
[sr320@washington.edu].[GO_to_GOslim]slim
on
go.GOID = slim.GO_id
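The chained left joins above can be tried locally before running them on SQLShare. The following is only a sketch using Python's built-in `sqlite3` module. The table names `blast`, `spid_go` and `go_slim` are stand-ins for the SQLShare datasets, the column layout is assumed from the queries in this notebook, and the GO identifier and bin name are placeholders, not real annotations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE blast (ContigID TEXT, SPID TEXT, evalue REAL);
    CREATE TABLE spid_go (SPID TEXT, GOID TEXT);
    CREATE TABLE go_slim (GO_id TEXT, GOSlim_bin TEXT, aspect TEXT);
    """
)

# Contig IDs, SPIDs and evalues come from the blastx output above.
conn.executemany(
    "INSERT INTO blast VALUES (?, ?, ?)",
    [
        ("Haliotis_cra_v3_contig_1", "P60204", 6e-08),
        ("Haliotis_cra_v3_contig_2", "B1H3C9", 5e-18),
    ],
)
# Placeholder GO data: only one of the two SPIDs has an annotation here.
conn.executemany("INSERT INTO spid_go VALUES (?, ?)", [("P60204", "GO:toy1")])
conn.executemany(
    "INSERT INTO go_slim VALUES (?, ?, ?)", [("GO:toy1", "toy_bin", "P")]
)

rows = conn.execute(
    """
    SELECT DISTINCT d.ContigID, d.SPID, g.GOID, s.GOSlim_bin
    FROM blast d
    LEFT JOIN spid_go g ON d.SPID = g.SPID
    LEFT JOIN go_slim s ON g.GOID = s.GO_id
    ORDER BY d.ContigID
    """
).fetchall()

for row in rows:
    print(row)
```

Because both joins are left joins, a contig whose Swiss-Prot hit has no GO annotation still appears in the result, with empty GO and GO Slim columns.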
To get GO Slim info
SELECT Distinct
ContigID,
GOSlim_bin,
evalue
FROM [sr320@washington.edu].[lft_BlackAbalone_v3_GO]
Where aspect like 'P'
Should probably consider thresholding on evalue
SELECT Distinct
ContigID,
GOSlim_bin,
evalue
FROM [sr320@washington.edu].[lft_BlackAbalone_v3_GO]
Where aspect like 'P'
and
evalue < 1E-10
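The thresholded query above keeps distinct `(ContigID, GOSlim_bin, evalue)` rows for the biological process aspect with an evalue below 1E-10. A rough Python equivalent is sketched below. The function name `filter_hits` is hypothetical, the contig IDs and evalues are taken from the blastx output earlier in this notebook, and the bin names are placeholders.

```python
def filter_hits(rows, max_evalue=1e-10, aspect="P"):
    """Keep distinct (contig, bin, evalue) rows that pass both filters.

    rows: iterable of (ContigID, GOSlim_bin, evalue, aspect) tuples.
    """
    seen = set()
    kept = []
    for contig, slim_bin, evalue, row_aspect in rows:
        key = (contig, slim_bin, evalue)
        if row_aspect == aspect and evalue < max_evalue and key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


# Placeholder bins; contig IDs and evalues from the output above.
toy_rows = [
    ("Haliotis_cra_v3_contig_1", "bin_a", 6e-08, "P"),   # fails evalue cutoff
    ("Haliotis_cra_v3_contig_2", "bin_a", 5e-18, "P"),   # kept
    ("Haliotis_cra_v3_contig_2", "bin_a", 5e-18, "P"),   # duplicate, dropped
    ("Haliotis_cra_v3_contig_6", "bin_b", 4e-32, "F"),   # wrong aspect
    ("Haliotis_cra_v3_contig_6", "bin_c", 4e-32, "P"),   # kept
]

kept = filter_hits(toy_rows)
for row in kept:
    print(row)
```

Passing `max_evalue=float("inf")` gives the unthresholded set, which is a convenient way to compare the two row counts.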
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3GOslim.csv
!wc /Volumes/web/cnidarian/lft_BlackAbalone_v3GOslim.csv
2352 5731 133085 /Volumes/web/cnidarian/lft_BlackAbalone_v3GOslim.csv
#query with no evalue limit to verify accuracy
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3GOslim_jp.csv
!wc /Volumes/web/cnidarian/lft_BlackAbalone_v3GOslim_jp.csv
3430 8417 194682 /Volumes/web/cnidarian/lft_BlackAbalone_v3GOslim_jp.csv
#will try R
%pylab inline
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline]. For more information, type 'help(pylab)'.
import numpy as np
import matplotlib.pyplot as plt
import rpy2
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-7-2dcec3d9bd52> in <module>()
----> 1 import rpy2

ImportError: No module named rpy2
Gave up and used Excel
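For reference, the tally behind a GO Slim pie chart does not need R or rpy2. Below is a minimal standard-library sketch, not the procedure actually used here: the function name `slim_bin_fractions` is hypothetical and the contig and bin names are placeholders. It counts each contig at most once per bin and returns the fraction of contig-bin pairs in each bin.

```python
from collections import Counter


def slim_bin_fractions(pairs):
    """Return {GOSlim_bin: fraction} from (ContigID, GOSlim_bin) pairs.

    Duplicate pairs are collapsed first, so a contig is counted
    only once per bin.
    """
    counts = Counter(bin_name for _, bin_name in set(pairs))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Placeholder data: three distinct contig-bin pairs, one repeated.
toy_pairs = [
    ("contig_1", "bin_a"),
    ("contig_1", "bin_a"),
    ("contig_2", "bin_a"),
    ("contig_3", "bin_b"),
]

fractions = slim_bin_fractions(toy_pairs)
for name in sorted(fractions):
    print(name, round(fractions[name], 3))
```

The resulting dictionary could be handed to a plotting library, or written out as a two-column table for a spreadsheet.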
getting data into format
SELECT Distinct
ContigID,
SPID
FROM [sr320@washington.edu].[lft_BlackAbalone_v3_GO]
Where
evalue < 1E-10
code
./blastx -query /Volumes/web-1/cnidarian/BlackAbalone_Contigs_v3.fa -db /Volumes/web-1/whale/fish546/blast/db/nr -out /Volumes/web-1/cnidarian/lft_BlackAbalone_v3_nr_blastout -outfmt 6 -evalue 1E-5 -max_target_seqs 1 -num_threads 4
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_nr_blastout
Roberts_20100712_CC_F3_trimmed_contig_1 gi|166714376|gb|ABY87953.1| 67.65 34 10 1 2 103 90 122 7e-07 49.7 Roberts_20100712_CC_F3_trimmed_contig_2 gi|109020229|ref|XP_001111903.1| 81.82 44 8 0 2 133 72 115 1e-17 79.3 Roberts_20100712_CC_F3_trimmed_contig_6 gi|149068841|gb|EDM18393.1| 100.00 57 0 0 1 171 10 66 4e-32 116 Roberts_20100712_CC_F3_trimmed_contig_7 gi|169153945|emb|CAQ14310.1| 48.94 141 67 2 4 414 1 140 4e-34 127 Roberts_20100712_CC_F3_trimmed_contig_9 gi|342326368|gb|AEL23099.1| 100.00 69 0 0 2 208 15 83 3e-41 140 Roberts_20100712_CC_F3_trimmed_contig_10 gi|166406892|gb|ABY87409.1| 75.76 33 8 0 6 104 43 75 4e-09 56.2 Roberts_20100712_CC_F3_trimmed_contig_14 gi|91992330|gb|ABE72920.1| 96.43 56 2 0 43 210 371 426 2e-30 120 Roberts_20100712_CC_F3_trimmed_contig_15 gi|340382506|ref|XP_003389760.1| 72.97 37 10 0 2 112 56 92 6e-09 55.5 Roberts_20100712_CC_F3_trimmed_contig_17 gi|340370586|ref|XP_003383827.1| 60.53 38 15 0 1 114 218 255 5e-12 65.5 Roberts_20100712_CC_F3_trimmed_contig_19 gi|37528876|gb|AAQ92368.1| 99.62 260 1 0 1 780 32 291 0.0 542
!tail /Volumes/web/cnidarian/lft_BlackAbalone_v3_nr_blastout
Roberts_20100712_CC_F3_trimmed_contig_13618 gi|348544575|ref|XP_003459756.1| 71.43 35 10 0 3 107 73 107 8e-10 58.2 Roberts_20100712_CC_F3_trimmed_contig_13654 gi|62240178|gb|AAX77257.1| 100.00 41 0 0 3 125 120 160 8e-19 83.6 Roberts_20100712_CC_F3_trimmed_contig_13686 gi|340384210|ref|XP_003390607.1| 66.67 33 11 0 6 104 700 732 1e-07 55.1 Roberts_20100712_CC_F3_trimmed_contig_13693 gi|115767231|ref|XP_782598.2| 59.09 44 18 0 5 136 1316 1359 8e-09 58.9 Roberts_20100712_CC_F3_trimmed_contig_13748 gi|328793281|ref|XP_395852.4| 75.00 36 9 0 6 113 209 244 7e-10 60.8 Roberts_20100712_CC_F3_trimmed_contig_13760 gi|32407325|gb|AAP41556.1| 93.33 30 2 0 2 91 299 328 3e-07 53.1 Roberts_20100712_CC_F3_trimmed_contig_13804 gi|166406789|gb|ABY87358.1| 70.45 44 13 0 2 133 239 282 3e-14 73.6 Roberts_20100712_CC_F3_trimmed_contig_13845 gi|156378102|ref|XP_001630983.1| 54.69 64 22 3 49 240 58 114 6e-09 57.8 Roberts_20100712_CC_F3_trimmed_contig_13855 gi|261754259|ref|ZP_05997968.1| 80.49 41 8 0 123 1 14 54 2e-15 72.0 Roberts_20100712_CC_F3_trimmed_contig_13882 gi|144899095|emb|CAM75959.1| 66.67 30 10 0 13 102 1 30 1e-05 48.9
!grep -c "Roberts" /Volumes/web/cnidarian/lft_BlackAbalone_v3_nr_blastout
2525
code
./blastn -query /Volumes/web-1/cnidarian/BlackAbalone_Contigs_v3.fa -db /Volumes/CLC_blastdatabases/nt -out /Volumes/web-1/cnidarian/lft_BlackAbalone_v3_nt_blastout -outfmt 6 -evalue 1E-5 -max_target_seqs 1 -num_threads 6 -task blastn
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_nt_blastout
Roberts_20100712_CC_F3_trimmed_contig_2 gi|115947556|ref|XM_796732.2| 84.21 114 18 0 2 115 605 718 7e-26 125 Roberts_20100712_CC_F3_trimmed_contig_4 gi|177667010|gb|EU595789.1| 85.97 221 31 0 3 223 9228 9008 8e-66 259 Roberts_20100712_CC_F3_trimmed_contig_6 gi|345329379|ref|XM_001506155.2| 86.96 161 21 0 1 161 60 220 5e-47 196 Roberts_20100712_CC_F3_trimmed_contig_9 gi|306922143|dbj|AB490993.1| 88.67 203 23 0 5 207 364 566 6e-67 262 Roberts_20100712_CC_F3_trimmed_contig_10 gi|166406891|gb|EU244393.1| 80.85 94 18 0 12 105 133 226 4e-15 89.7 Roberts_20100712_CC_F3_trimmed_contig_14 gi|91992331|gb|DQ453716.1| 97.65 213 5 0 1 213 955 1167 1e-96 361 Roberts_20100712_CC_F3_trimmed_contig_15 gi|345492297|ref|XM_001600219.2| 75.89 112 27 0 1 112 225 336 2e-12 80.6 Roberts_20100712_CC_F3_trimmed_contig_17 gi|260831083|ref|XM_002610443.1| 86.27 51 7 0 37 87 762 812 2e-06 60.8 Roberts_20100712_CC_F3_trimmed_contig_19 gi|270313645|gb|GU263793.1| 99.10 782 7 0 1 782 164 945 0.0 1379 Roberts_20100712_CC_F3_trimmed_contig_20 gi|89331166|dbj|AB234872.1| 95.93 369 15 0 1 369 544 912 1e-167 598
!tail /Volumes/web/cnidarian/lft_BlackAbalone_v3_nt_blastout
Roberts_20100712_CC_F3_trimmed_contig_13802 gi|300386193|gb|GU995619.1| 81.36 118 21 1 7 124 422 538 1e-21 111 Roberts_20100712_CC_F3_trimmed_contig_13804 gi|166406788|gb|EU244341.1| 81.95 133 24 0 1 133 716 848 4e-28 132 Roberts_20100712_CC_F3_trimmed_contig_13813 gi|300385867|gb|GU995293.1| 87.30 63 5 2 3 63 229 290 9e-10 71.6 Roberts_20100712_CC_F3_trimmed_contig_13820 gi|356473005|gb|HQ650445.1| 88.89 54 6 0 1 54 54 107 8e-10 71.6 Roberts_20100712_CC_F3_trimmed_contig_13828 gi|60219427|emb|CR388147.13| 83.61 61 6 2 95 155 89790 89734 2e-06 60.8 Roberts_20100712_CC_F3_trimmed_contig_13855 gi|3930574|gb|AF069062.1|AF069062 99.23 130 1 0 1 130 289 418 2e-57 230 Roberts_20100712_CC_F3_trimmed_contig_13858 gi|13195722|gb|AF133090.2|AF133090 100.00 115 0 0 1 115 73 187 5e-51 208 Roberts_20100712_CC_F3_trimmed_contig_13860 gi|190356750|emb|AM999887.1| 84.09 132 19 2 1 131 1236811 1236941 3e-29 136 Roberts_20100712_CC_F3_trimmed_contig_13863 gi|82541884|gb|DQ291132.1| 87.76 49 6 0 13 61 102093 102045 5e-07 62.6 Roberts_20100712_CC_F3_trimmed_contig_13871 gi|13195722|gb|AF133090.2|AF133090 100.00 91 0 0 1 91 547 637 5e-38 165
!grep -c "Roberts" /Volumes/web/cnidarian/lft_BlackAbalone_v3_nt_blastout
2492
./blastn -query /Volumes/web-1/cnidarian/BlackAbalone_Contigs_v3.fa -db /Volumes/CLC_blastdatabases/nt -out /Volumes/web-1/cnidarian/lft_BlackAbalone_v3_nt_blastout_taxa1 -outfmt "6 qseqid sseqid sallseqid pident length evalue bitscore staxids sscinames scomnames sblastnames qcovs" -evalue 1E-5 -max_target_seqs 1 -num_threads 10 -task blastn
Without luck: staxids is filled in, but sscinames, scomnames and sblastnames all come back as N/A
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_nt_blastout_taxa1
Roberts_20100712_CC_F3_trimmed_contig_2 gi|115947556|ref|XM_796732.2| gi|115947556|ref|XM_796732.2| 84.21 114 7e-26 125 7668 N/A N/A N/A 72 Roberts_20100712_CC_F3_trimmed_contig_4 gi|177667010|gb|EU595789.1| gi|177667010|gb|EU595789.1| 85.97 221 8e-66 259 42344 N/A N/A N/A 93 Roberts_20100712_CC_F3_trimmed_contig_6 gi|345329379|ref|XM_001506155.2| gi|345329379|ref|XM_001506155.2| 86.96 161 5e-47 196 9258 N/A N/A N/A 93 Roberts_20100712_CC_F3_trimmed_contig_9 gi|306922143|dbj|AB490993.1| gi|306922143|dbj|AB490993.1| 88.67 203 6e-67 262 214486 N/A N/A N/A 89 Roberts_20100712_CC_F3_trimmed_contig_9 gi|306922143|dbj|AB490993.1| gi|306922143|dbj|AB490993.1| 87.25 204 1e-62 248 214486 N/A N/A N/A 89 Roberts_20100712_CC_F3_trimmed_contig_9 gi|306922143|dbj|AB490993.1| gi|306922143|dbj|AB490993.1| 85.71 203 9e-59 235 214486 N/A N/A N/A 89 Roberts_20100712_CC_F3_trimmed_contig_9 gi|306922143|dbj|AB490993.1| gi|306922143|dbj|AB490993.1| 90.00 110 1e-32 149 214486 N/A N/A N/A 89 Roberts_20100712_CC_F3_trimmed_contig_10 gi|166406891|gb|EU244393.1| gi|166406891|gb|EU244393.1| 80.85 94 4e-15 89.7 36095 N/A N/A N/A 75 Roberts_20100712_CC_F3_trimmed_contig_14 gi|91992331|gb|DQ453716.1| gi|91992331|gb|DQ453716.1| 97.65 213 1e-96 361 6454 N/A N/A N/A 76 Roberts_20100712_CC_F3_trimmed_contig_15 gi|345492297|ref|XM_001600219.2| gi|345492297|ref|XM_001600219.2| 75.89 112 2e-12 80.6 7425 N/A N/A N/A 100
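The output above lists contig_9 four times against the same subject, because `-max_target_seqs 1` limits the number of subject sequences, not the number of alignments (HSPs) reported for that subject. If one row per contig is wanted, a possible approach is to keep the row with the highest bitscore. This is a sketch under that assumption: the function name `best_hsp_per_query` is hypothetical, and the bitscore position (index 6) follows the custom `-outfmt` string used for this file.

```python
def best_hsp_per_query(rows, bitscore_index=6):
    """Keep the highest-bitscore row for each query ID.

    rows: iterable of column lists, with the query ID in column 0.
    Ties keep the row seen first.
    """
    best = {}
    for cols in rows:
        query = cols[0]
        score = float(cols[bitscore_index])
        if query not in best or score > float(best[query][bitscore_index]):
            best[query] = cols
    return best


# Rows copied from the head output above (two of the contig_9 HSPs).
sid9 = "gi|306922143|dbj|AB490993.1|"
sid10 = "gi|166406891|gb|EU244393.1|"
toy_rows = [
    ["Roberts_20100712_CC_F3_trimmed_contig_9", sid9, sid9,
     "88.67", "203", "6e-67", "262", "214486", "N/A", "N/A", "N/A", "89"],
    ["Roberts_20100712_CC_F3_trimmed_contig_9", sid9, sid9,
     "87.25", "204", "1e-62", "248", "214486", "N/A", "N/A", "N/A", "89"],
    ["Roberts_20100712_CC_F3_trimmed_contig_10", sid10, sid10,
     "80.85", "94", "4e-15", "89.7", "36095", "N/A", "N/A", "N/A", "75"],
]

best = best_hsp_per_query(toy_rows)
for query, cols in best.items():
    print(query, cols[6])
```

Deduplicating this way before a join on the contig ID avoids multiplying rows in the joined table.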
!blastn -help
USAGE blastn [-h] [-help] [-import_search_strategy filename] [-export_search_strategy filename] [-task task_name] [-db database_name] [-dbsize num_letters] [-gilist filename] [-seqidlist filename] [-negative_gilist filename] [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm] [-subject subject_input_file] [-subject_loc range] [-query input_file] [-out output_file] [-evalue evalue] [-word_size int_value] [-gapopen open_penalty] [-gapextend extend_penalty] [-perc_identity float_value] [-xdrop_ungap float_value] [-xdrop_gap float_value] [-xdrop_gap_final float_value] [-searchsp int_value] [-max_hsps_per_subject int_value] [-penalty penalty] [-reward reward] [-no_greedy] [-min_raw_gapped_score int_value] [-template_type type] [-template_length int_value] [-dust DUST_options] [-filtering_db filtering_database] [-window_masker_taxid window_masker_taxid] [-window_masker_db window_masker_db] [-soft_masking soft_masking] [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value] [-best_hit_score_edge float_value] [-window_size int_value] [-off_diagonal_range int_value] [-use_index boolean] [-index_name string] [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines] [-outfmt format] [-show_gis] [-num_descriptions int_value] [-num_alignments int_value] [-html] [-max_target_seqs num_sequences] [-num_threads int_value] [-remote] [-version] DESCRIPTION Nucleotide-Nucleotide BLAST 2.2.28+ OPTIONAL ARGUMENTS -h Print USAGE and DESCRIPTION; ignore all other parameters -help Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters -version Print version number; ignore other arguments *** Input query options -query <File_In> Input file name Default = `-' -query_loc <String> Location on the query sequence in 1-based offsets (Format: start-stop) -strand <String, `both', `minus', `plus'> Query strand(s) to search against database/subject Default = `both' *** General search options -task <String, 
Permissible values: 'blastn' 'blastn-short' 'dc-megablast' 'megablast' 'rmblastn' > Task to execute Default = `megablast' -db <String> BLAST database name * Incompatible with: subject, subject_loc -out <File_Out> Output file name Default = `-' -evalue <Real> Expectation value (E) threshold for saving hits Default = `10' -word_size <Integer, >=4> Word size for wordfinder algorithm (length of best perfect match) -gapopen <Integer> Cost to open a gap -gapextend <Integer> Cost to extend a gap -penalty <Integer, <=0> Penalty for a nucleotide mismatch -reward <Integer, >=0> Reward for a nucleotide match -use_index <Boolean> Use MegaBLAST database index -index_name <String> MegaBLAST database index name *** BLAST-2-Sequences options -subject <File_In> Subject sequence(s) to search * Incompatible with: db, gilist, seqidlist, negative_gilist, db_soft_mask, db_hard_mask -subject_loc <String> Location on the subject sequence in 1-based offsets (Format: start-stop) * Incompatible with: db, gilist, seqidlist, negative_gilist, db_soft_mask, db_hard_mask, remote *** Formatting options -outfmt <String> alignment view options: 0 = pairwise, 1 = query-anchored showing identities, 2 = query-anchored no identities, 3 = flat query-anchored, show identities, 4 = flat query-anchored, no identities, 5 = XML Blast output, 6 = tabular, 7 = tabular with comment lines, 8 = Text ASN.1, 9 = Binary ASN.1, 10 = Comma-separated values, 11 = BLAST archive format (ASN.1) Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers. 
The supported format specifiers are: qseqid means Query Seq-id qgi means Query GI qacc means Query accesion qaccver means Query accesion.version qlen means Query sequence length sseqid means Subject Seq-id sallseqid means All subject Seq-id(s), separated by a ';' sgi means Subject GI sallgi means All subject GIs sacc means Subject accession saccver means Subject accession.version sallacc means All subject accessions slen means Subject sequence length qstart means Start of alignment in query qend means End of alignment in query sstart means Start of alignment in subject send means End of alignment in subject qseq means Aligned part of query sequence sseq means Aligned part of subject sequence evalue means Expect value bitscore means Bit score score means Raw score length means Alignment length pident means Percentage of identical matches nident means Number of identical matches mismatch means Number of mismatches positive means Number of positive-scoring matches gapopen means Number of gap openings gaps means Total number of gaps ppos means Percentage of positive-scoring matches frames means Query and subject frames separated by a '/' qframe means Query frame sframe means Subject frame btop means Blast traceback operations (BTOP) staxids means Subject Taxonomy ID(s), separated by a ';' sscinames means Subject Scientific Name(s), separated by a ';' scomnames means Subject Common Name(s), separated by a ';' sblastnames means Subject Blast Name(s), separated by a ';' (in alphabetical order) sskingdoms means Subject Super Kingdom(s), separated by a ';' (in alphabetical order) stitle means Subject Title salltitles means All Subject Title(s), separated by a '<>' sstrand means Subject Strand qcovs means Query Coverage Per Subject qcovhsp means Query Coverage Per HSP When not provided, the default value is: 'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std' Default = `0' -show_gis Show NCBI GIs in 
deflines? -num_descriptions <Integer, >=0> Number of database sequences to show one-line descriptions for Not applicable for outfmt > 4 Default = `500' * Incompatible with: max_target_seqs -num_alignments <Integer, >=0> Number of database sequences to show alignments for Default = `250' * Incompatible with: max_target_seqs -html Produce HTML output? *** Query filtering options -dust <String> Filter query sequence with DUST (Format: 'yes', 'level window linker', or 'no' to disable) Default = `20 64 1' -filtering_db <String> BLAST database containing filtering elements (i.e.: repeats) -window_masker_taxid <Integer> Enable WindowMasker filtering using a Taxonomic ID -window_masker_db <String> Enable WindowMasker filtering using this repeats database. -soft_masking <Boolean> Apply filtering locations as soft masks Default = `true' -lcase_masking Use lower case filtering in query and subject sequence(s)? *** Restrict search or results -gilist <String> Restrict search of database to list of GI's * Incompatible with: negative_gilist, seqidlist, remote, subject, subject_loc -seqidlist <String> Restrict search of database to list of SeqId's * Incompatible with: gilist, negative_gilist, remote, subject, subject_loc -negative_gilist <String> Restrict search of database to everything except the listed GIs * Incompatible with: gilist, seqidlist, remote, subject, subject_loc -entrez_query <String> Restrict search with the given Entrez query * Requires: remote -db_soft_mask <String> Filtering algorithm ID to apply to the BLAST database as soft masking * Incompatible with: db_hard_mask, subject, subject_loc -db_hard_mask <String> Filtering algorithm ID to apply to the BLAST database as hard masking * Incompatible with: db_soft_mask, subject, subject_loc -perc_identity <Real, 0..100> Percent identity -culling_limit <Integer, >=0> If the query range of a hit is enveloped by that of at least this many higher-scoring hits, delete the hit * Incompatible with: best_hit_overhang, 
best_hit_score_edge -best_hit_overhang <Real, (>=0 and =<0.5)> Best Hit algorithm overhang value (recommended value: 0.1) * Incompatible with: culling_limit -best_hit_score_edge <Real, (>=0 and =<0.5)> Best Hit algorithm score edge value (recommended value: 0.1) * Incompatible with: culling_limit -max_target_seqs <Integer, >=1> Maximum number of aligned sequences to keep Not applicable for outfmt <= 4 Default = `500' * Incompatible with: num_descriptions, num_alignments *** Discontiguous MegaBLAST options -template_type <String, `coding', `coding_and_optimal', `optimal'> Discontiguous MegaBLAST template type * Requires: template_length -template_length <Integer, Permissible values: '16' '18' '21' > Discontiguous MegaBLAST template length * Requires: template_type *** Statistical options -dbsize <Int8> Effective length of the database -searchsp <Int8, >=0> Effective length of the search space -max_hsps_per_subject <Integer, >=0> Override maximum number of HSPs per subject to save for ungapped searches (0 means do not override) Default = `0' *** Search strategy options -import_search_strategy <File_In> Search strategy to use * Incompatible with: export_search_strategy -export_search_strategy <File_Out> File name to record the search strategy used * Incompatible with: import_search_strategy *** Extension options -xdrop_ungap <Real> X-dropoff value (in bits) for ungapped extensions -xdrop_gap <Real> X-dropoff value (in bits) for preliminary gapped extensions -xdrop_gap_final <Real> X-dropoff value (in bits) for final gapped alignment -no_greedy Use non-greedy dynamic programming extension -min_raw_gapped_score <Integer> Minimum raw gapped score to keep an alignment in the preliminary gapped and traceback stages -ungapped Perform ungapped alignment only? 
-window_size <Integer, >=0> Multiple hits window size, use 0 to specify 1-hit algorithm -off_diagonal_range <Integer, >=0> Number of off-diagonals to search for the 2nd hit, use 0 to turn off Default = `0' *** Miscellaneous options -parse_deflines Should the query and subject defline(s) be parsed? -num_threads <Integer, >=1> Number of threads (CPUs) to use in the BLAST search Default = `1' * Incompatible with: remote -remote Execute search remotely? * Incompatible with: gilist, seqidlist, negative_gilist, subject_loc, num_threads
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_nt_blastout_taxa2
Roberts_20100712_CC_F3_trimmed_contig_2 gi|115947556|ref|XM_796732.2| 84.21 114 18 0 2 115 605 718 7e-26 125 PREDICTED: Strongylocentrotus purpuratus similar to MGC80921 protein, transcript variant 2 (LOC578948), partial mRNA 7668 N/A N/A N/A Roberts_20100712_CC_F3_trimmed_contig_4 gi|177667010|gb|EU595789.1| 85.97 221 31 0 3 223 9228 9008 8e-66 259 Haliotis discus hannai mitochondrion, partial genome 42344 N/A N/A N/A Roberts_20100712_CC_F3_trimmed_contig_6 gi|345329379|ref|XM_001506155.2| 86.96 161 21 0 1 161 60 220 5e-47 196 PREDICTED: Ornithorhynchus anatinus 40S ribosomal protein S3-A-like (LOC100074614), partial mRNA 9258 N/A N/A N/A Roberts_20100712_CC_F3_trimmed_contig_9 gi|306922143|dbj|AB490993.1| 88.67 203 23 0 5 207 364 566 6e-67 262 Sebastes schlegelii mRNA, clone: BRF 39-G6, induced by treatment of LPS 214486 N/A N/A N/A Roberts_20100712_CC_F3_trimmed_contig_9 gi|306922143|dbj|AB490993.1| 87.25 204 25 1 5 207 592 795 1e-62 248 Sebastes schlegelii mRNA, clone: BRF 39-G6, induced by treatment of LPS 214486 N/A N/A N/A Roberts_20100712_CC_F3_trimmed_contig_9 gi|306922143|dbj|AB490993.1| 85.71 203 29 0 5 207 136 338 9e-59 235 Sebastes schlegelii mRNA, clone: BRF 39-G6, induced by treatment of LPS 214486 N/A N/A N/A Roberts_20100712_CC_F3_trimmed_contig_9 gi|306922143|dbj|AB490993.1| 90.00 110 11 0 98 207 1 110 1e-32 149 Sebastes schlegelii mRNA, clone: BRF 39-G6, induced by treatment of LPS 214486 N/A N/A N/A Roberts_20100712_CC_F3_trimmed_contig_10 gi|166406891|gb|EU244393.1| 80.85 94 18 0 12 105 133 226 4e-15 89.7 Haliotis diversicolor clone HDr4CJ446 CD63 antigen-like protein mRNA, partial cds 36095 N/A N/A N/A Roberts_20100712_CC_F3_trimmed_contig_14 gi|91992331|gb|DQ453716.1| 97.65 213 5 0 1 213 955 1167 1e-96 361 Haliotis rufescens vitelline envelope zona pellucida domain 3 (VEZP3) mRNA, complete cds 6454 N/A N/A N/A Roberts_20100712_CC_F3_trimmed_contig_15 gi|345492297|ref|XM_001600219.2| 75.89 112 27 0 1 112 225 336 2e-12 80.6 PREDICTED: Nasonia 
vitripennis 60S ribosomal protein L22-like (LOC100115956), mRNA 7425 N/A N/A N/A
!sed 's/Roberts_20100712_CC_F3_trimmed_/BlackAbalone_v3_/g' </Volumes/web/cnidarian/lft_BlackAbalone_v3_nt_blastout_taxa2> /Volumes/web/cnidarian/lft_BlackAbalone_v3_nt_blastout_taxa3
#sed 's/abc/XYZ/g' <infile> outfile
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_nt_blastout_taxa3
BlackAbalone_v3_contig_2 gi|115947556|ref|XM_796732.2| 84.21 114 18 0 2 115 605 718 7e-26 125 PREDICTED: Strongylocentrotus purpuratus similar to MGC80921 protein, transcript variant 2 (LOC578948), partial mRNA 7668 N/A N/A N/A BlackAbalone_v3_contig_4 gi|177667010|gb|EU595789.1| 85.97 221 31 0 3 223 9228 9008 8e-66 259 Haliotis discus hannai mitochondrion, partial genome 42344 N/A N/A N/A BlackAbalone_v3_contig_6 gi|345329379|ref|XM_001506155.2| 86.96 161 21 0 1 161 60 220 5e-47 196 PREDICTED: Ornithorhynchus anatinus 40S ribosomal protein S3-A-like (LOC100074614), partial mRNA 9258 N/A N/A N/A BlackAbalone_v3_contig_9 gi|306922143|dbj|AB490993.1| 88.67 203 23 0 5 207 364 566 6e-67 262 Sebastes schlegelii mRNA, clone: BRF 39-G6, induced by treatment of LPS 214486 N/A N/A N/A BlackAbalone_v3_contig_9 gi|306922143|dbj|AB490993.1| 87.25 204 25 1 5 207 592 795 1e-62 248 Sebastes schlegelii mRNA, clone: BRF 39-G6, induced by treatment of LPS 214486 N/A N/A N/A BlackAbalone_v3_contig_9 gi|306922143|dbj|AB490993.1| 85.71 203 29 0 5 207 136 338 9e-59 235 Sebastes schlegelii mRNA, clone: BRF 39-G6, induced by treatment of LPS 214486 N/A N/A N/A BlackAbalone_v3_contig_9 gi|306922143|dbj|AB490993.1| 90.00 110 11 0 98 207 1 110 1e-32 149 Sebastes schlegelii mRNA, clone: BRF 39-G6, induced by treatment of LPS 214486 N/A N/A N/A BlackAbalone_v3_contig_10 gi|166406891|gb|EU244393.1| 80.85 94 18 0 12 105 133 226 4e-15 89.7 Haliotis diversicolor clone HDr4CJ446 CD63 antigen-like protein mRNA, partial cds 36095 N/A N/A N/A BlackAbalone_v3_contig_14 gi|91992331|gb|DQ453716.1| 97.65 213 5 0 1 213 955 1167 1e-96 361 Haliotis rufescens vitelline envelope zona pellucida domain 3 (VEZP3) mRNA, complete cds 6454 N/A N/A N/A BlackAbalone_v3_contig_15 gi|345492297|ref|XM_001600219.2| 75.89 112 27 0 1 112 225 336 2e-12 80.6 PREDICTED: Nasonia vitripennis 60S ribosomal protein L22-like (LOC100115956), mRNA 7425 N/A N/A N/A
SELECT
id,
foldchange,
pval,
Column13 as NCBI_nt_Des
FROM [lisa418@washington.edu].[BlackAB_DESeq.txt] de
left join [sr320@washington.edu].[lft_BlackAbalone_v3_nt_blastout_taxa3] lft
on
de.id = lft.Column1
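For working outside the SQL environment, the left join above can be sketched with the Python standard library. This is a minimal illustration, not the query engine's behavior: the helper name `left_join` is hypothetical, the DESeq values in the toy input are made up for the example, and it assumes the DESeq table has a header row with `id`, `foldchange` and `pval` and that the BLAST table is tab-delimited with the description in the 13th field.

```python
import csv
import io


def left_join(deseq_rows, blast_rows):
    """Left join DESeq rows to BLAST rows on contig ID (BLAST column 1).

    As in a SQL LEFT JOIN, a contig with several BLAST rows yields several
    output rows, and a contig with no BLAST row yields one row with None.
    """
    by_contig = {}
    for row in blast_rows:
        by_contig.setdefault(row[0], []).append(row)
    joined = []
    for de in deseq_rows:
        matches = by_contig.get(de["id"])
        if not matches:
            joined.append((de["id"], de["foldchange"], de["pval"], None))
            continue
        for m in matches:
            # Column13 in the SQL is the 13th field (index 12): the hit description.
            joined.append((de["id"], de["foldchange"], de["pval"], m[12]))
    return joined


# Toy DESeq input: the fold changes and p-values are hypothetical.
deseq_text = (
    "id\tfoldchange\tpval\n"
    "BlackAbalone_v3_contig_14\t2.0\t0.01\n"
    "BlackAbalone_v3_contig_99\t0.5\t0.20\n"
)
# One BLAST row taken from the nt output shown above.
blast_text = (
    "BlackAbalone_v3_contig_14\tgi|91992331|gb|DQ453716.1|\t97.65\t213\t5\t0\t1\t213"
    "\t955\t1167\t1e-96\t361\tHaliotis rufescens vitelline envelope zona pellucida "
    "domain 3 (VEZP3) mRNA, complete cds\t6454\n"
)

deseq = list(csv.DictReader(io.StringIO(deseq_text), delimiter="\t", quoting=csv.QUOTE_NONE))
blast = list(csv.reader(io.StringIO(blast_text), delimiter="\t", quoting=csv.QUOTE_NONE))
result = left_join(deseq, blast)
for row in result:
    print(row)
```

Contigs without a BLAST hit are kept with an empty description, which is the point of using a left join rather than an inner join here.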
code
./blastn -query /Volumes/web-1/cnidarian/BlackAbalone_Contigs_v3.fa -db /Volumes/web-1/whale/fish546/blast/db/Haliotis_kam_transcriptome -out /Volumes/web-1/cnidarian/lft_BlackAbalone_v3_Hal_kam_blastout -outfmt 6 -evalue 1E-5 -max_target_seqs 1 -num_threads 2 -task blastn
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_Hal_kam_blastout
Roberts_20100712_CC_F3_trimmed_contig_2 Haliotis_kam_contig3505 98.50 133 2 0 1 133 70 202 6e-62 232
Roberts_20100712_CC_F3_trimmed_contig_4 Haliotis_kam_contig17 90.13 223 22 0 1 223 1408 1186 2e-83 304
Roberts_20100712_CC_F3_trimmed_contig_5 Haliotis_kam_contig4402 92.68 355 9 2 1 338 388 34 4e-151 529
Roberts_20100712_CC_F3_trimmed_contig_6 Haliotis_kam_contig272 100.00 173 0 0 1 173 38 210 2e-86 313
Roberts_20100712_CC_F3_trimmed_contig_9 Haliotis_kam_contig1153 98.45 129 2 0 1 129 129 1 1e-59 224
Roberts_20100712_CC_F3_trimmed_contig_10 Haliotis_kam_contig59 85.42 96 14 0 10 105 1230 1135 1e-25 111
Roberts_20100712_CC_F3_trimmed_contig_15 Haliotis_kam_contig455 100.00 112 0 0 1 112 300 189 2e-53 203
Roberts_20100712_CC_F3_trimmed_contig_19 Haliotis_kam_contig3534 85.14 222 33 0 412 633 223 2 4e-67 251
Roberts_20100712_CC_F3_trimmed_contig_28 Haliotis_kam_contig329 96.36 165 6 0 1 165 337 501 8e-74 271
Roberts_20100712_CC_F3_trimmed_contig_30 Haliotis_kam_contig854 98.82 85 1 0 1 85 557 641 3e-37 149
!tail /Volumes/web/cnidarian/lft_BlackAbalone_v3_Hal_kam_blastout
Roberts_20100712_CC_F3_trimmed_contig_13830 Haliotis_kam_contig2270 97.06 136 3 1 1 135 240 105 8e-60 224
Roberts_20100712_CC_F3_trimmed_contig_13832 Haliotis_kam_contig2911 97.06 102 3 0 9 110 185 84 1e-43 170
Roberts_20100712_CC_F3_trimmed_contig_13838 Haliotis_kam_contig6166 94.19 86 5 0 6 91 135 220 3e-32 132
Roberts_20100712_CC_F3_trimmed_contig_13843 Haliotis_kam_contig4288 96.15 130 5 0 45 174 10 139 6e-56 212
Roberts_20100712_CC_F3_trimmed_contig_13850 Haliotis_kam_contig4288 83.53 85 14 0 2 86 138 222 1e-19 91.5
Roberts_20100712_CC_F3_trimmed_contig_13851 Haliotis_kam_contig413 96.67 60 2 0 1 60 510 569 2e-22 100
Roberts_20100712_CC_F3_trimmed_contig_13855 Haliotis_kam_contig6826 84.62 130 19 1 2 130 195 66 7e-35 141
Roberts_20100712_CC_F3_trimmed_contig_13860 Haliotis_kam_contig911 71.90 121 28 2 13 129 121 3 6e-11 62.6
Roberts_20100712_CC_F3_trimmed_contig_13870 Haliotis_kam_contig2899 100.00 51 0 0 1 51 51 1 2e-20 93.3
Roberts_20100712_CC_F3_trimmed_contig_13873 Haliotis_kam_contig1191 97.20 107 3 0 1 107 821 715 2e-46 179
!grep -c "Roberts" /Volumes/web/cnidarian/lft_BlackAbalone_v3_Hal_kam_blastout
5471
code
./blastn -query /Volumes/web-1/cnidarian/BlackAbalone_Contigs_v3.fa -db /Volumes/web-1/whale/fish546/blast/db/Haliotis_midae_franchini -out /Volumes/web-1/cnidarian/lft_BlackAbalone_v3_Hal_midae_blastout -outfmt 6 -evalue 1E-5 -max_target_seqs 1 -num_threads 2 -task blastn
!head /Volumes/web/cnidarian/lft_BlackAbalone_v3_Hal_midae_blastout
Roberts_20100712_CC_F3_trimmed_contig_1 Contig_3052_Coverage_84.98 90.65 107 10 0 2 108 201 95 6e-37 149
Roberts_20100712_CC_F3_trimmed_contig_2 Contig_6000_Coverage_67.17 95.33 107 5 0 1 107 109 3 3e-43 170
Roberts_20100712_CC_F3_trimmed_contig_3 Contig_18668_Coverage_137.10 85.45 55 8 0 34 88 622 568 2e-11 64.4
Roberts_20100712_CC_F3_trimmed_contig_4 Contig_22684_Coverage_350.78 86.36 66 9 0 21 86 553 618 2e-15 78.8
Roberts_20100712_CC_F3_trimmed_contig_5 Contig_5029_Coverage_76.79 81.44 194 32 3 48 240 282 92 4e-45 178
Roberts_20100712_CC_F3_trimmed_contig_7 Contig_1511_Coverage_54.42 90.30 237 23 0 181 417 297 61 6e-89 324
Roberts_20100712_CC_F3_trimmed_contig_8 Contig_789_Coverage_53.73 87.00 100 13 0 1 100 634 535 7e-29 122
Roberts_20100712_CC_F3_trimmed_contig_9 Contig_21030_Coverage_485.53 91.75 206 17 0 2 207 205 410 1e-80 295
Roberts_20100712_CC_F3_trimmed_contig_10 Contig_290_Coverage_208.28 82.98 94 16 0 12 105 610 703 1e-21 98.7
Roberts_20100712_CC_F3_trimmed_contig_11 Contig_4226_Coverage_62.51 85.00 80 12 0 66 145 477 556 2e-19 91.5
!tail /Volumes/web/cnidarian/lft_BlackAbalone_v3_Hal_midae_blastout
Roberts_20100712_CC_F3_trimmed_contig_13839 Contig_13235_Coverage_19.54 87.39 111 14 0 4 114 41 151 1e-33 138
Roberts_20100712_CC_F3_trimmed_contig_13847 Contig_94_Coverage_67.50 83.65 104 17 0 1 104 726 623 2e-25 111
Roberts_20100712_CC_F3_trimmed_contig_13853 Contig_16227_Coverage_53.40 82.14 56 8 1 3 58 56 109 5e-08 53.6
Roberts_20100712_CC_F3_trimmed_contig_13855 Contig_6084_Coverage_35.44 86.67 120 16 0 1 120 350 231 9e-36 145
Roberts_20100712_CC_F3_trimmed_contig_13857 Contig_94_Coverage_67.50 89.17 157 17 0 1 157 190 34 4e-54 206
Roberts_20100712_CC_F3_trimmed_contig_13859 Contig_17733_Coverage_12.63 91.18 34 3 0 47 80 18 51 2e-06 48.2
Roberts_20100712_CC_F3_trimmed_contig_13860 Contig_2605_Coverage_135.73 71.32 129 36 1 13 140 917 1045 9e-11 62.6
Roberts_20100712_CC_F3_trimmed_contig_13861 Contig_21412_Coverage_92.26 74.53 106 22 2 16 116 227 122 6e-12 66.2
Roberts_20100712_CC_F3_trimmed_contig_13879 Contig_10459_Coverage_16.77 74.47 94 24 0 8 101 44 137 7e-11 62.6
Roberts_20100712_CC_F3_trimmed_contig_13883 Contig_3412_Coverage_65.56 76.83 82 16 2 1 81 471 550 3e-09 57.2
!grep -c "Roberts" /Volumes/web/cnidarian/lft_BlackAbalone_v3_Hal_midae_blastout
6072
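Note that `grep -c` counts matching lines, not contigs. Even with `-max_target_seqs 1`, a query can appear on several lines when it has more than one alignment to the same subject (contig_9 does in the nt output above). A small sketch for counting both, assuming whitespace-separated outfmt 6 rows with the query ID in the first column; `count_hits` is a hypothetical helper name.

```python
def count_hits(blast_lines):
    """Return (number of hit lines, number of distinct query IDs)."""
    n_lines = 0
    queries = set()
    for line in blast_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        n_lines += 1
        queries.add(line.split()[0])  # column 1 is the query ID
    return n_lines, len(queries)


# Three rows from the nt output above, truncated to the 12 standard columns.
rows = [
    "BlackAbalone_v3_contig_9 gi|306922143|dbj|AB490993.1| 88.67 203 23 0 5 207 364 566 6e-67 262",
    "BlackAbalone_v3_contig_9 gi|306922143|dbj|AB490993.1| 87.25 204 25 1 5 207 592 795 1e-62 248",
    "BlackAbalone_v3_contig_10 gi|166406891|gb|EU244393.1| 80.85 94 18 0 12 105 133 226 4e-15 89.7",
]
n_lines, n_queries = count_hits(rows)
print(n_lines, n_queries)
```

For the file counts above, the distinct-query number is the one to report as "contigs with a hit"; it can only be equal to or smaller than the `grep -c` value.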
#running black abalone contigs against the other abalone transcriptomes at an e-value cutoff of 1E-20
!blastn -query /Volumes/web/cnidarian/BlackAbalone_Contigs_v3.fa -db /Volumes/web/whale/fish546/blast/db/Haliotis_rufescens_transcriptome -out /Volumes/web/cnidarian/lft_BlackAbalone_v3_Hal_ruf_blastout -outfmt 6 -evalue 1E-20 -max_target_seqs 1 -num_threads 2 -task blastn
!blastn -query /Volumes/web/cnidarian/BlackAbalone_Contigs_v3.fa -db /Volumes/web/whale/fish546/blast/db/Haliotis_midae_franchini -out /Volumes/web/cnidarian/lft_BlackAbalone_v3_Hal_midae_blastout_b -outfmt 6 -evalue 1E-20 -max_target_seqs 1 -num_threads 2 -task blastn
!blastn -query /Volumes/web/cnidarian/BlackAbalone_Contigs_v3.fa -db /Volumes/web/whale/fish546/blast/db/Haliotis_kam_transcriptome -out /Volumes/web/cnidarian/lft_BlackAbalone_v3_Hal_kam_blastout_b -outfmt 6 -evalue 1E-20 -max_target_seqs 1 -num_threads 2 -task blastn
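As a quick check before (or instead of) rerunning, the existing 1E-5 outputs can be filtered on the e-value column. This is only a sketch: `filter_by_evalue` is a hypothetical helper, it assumes the e-value is the 11th whitespace-separated field of an outfmt 6 row, and it keeps rows at or below the cutoff, which may not match BLAST's own reporting exactly.

```python
def filter_by_evalue(blast_lines, max_evalue=1e-20):
    """Keep outfmt 6 rows whose e-value (field 11) is at or below max_evalue."""
    kept = []
    for line in blast_lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # not a complete 12-column row
        if float(fields[10]) <= max_evalue:
            kept.append(line)
    return kept


# Three rows from the Haliotis_kam output above.
rows = [
    "Roberts_20100712_CC_F3_trimmed_contig_2 Haliotis_kam_contig3505 98.50 133 2 0 1 133 70 202 6e-62 232",
    "Roberts_20100712_CC_F3_trimmed_contig_13860 Haliotis_kam_contig911 71.90 121 28 2 13 129 121 3 6e-11 62.6",
    "Roberts_20100712_CC_F3_trimmed_contig_13851 Haliotis_kam_contig413 96.67 60 2 0 1 60 510 569 2e-22 100",
]
kept = filter_by_evalue(rows)
print(len(kept))
```

The row with e-value 6e-11 passes the 1E-5 cutoff but is dropped at 1E-20.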
Modified from Emma's paper:
To quantify the completeness of the transcriptome, contigs were assessed to determine whether they contained orthologs of proteins found in all Metazoa. Specifically, OrthoDB (Waterhouse et al. 2011, http://cegg.unige.ch/orthodb6) was used to obtain a set of single-copy proteins from Lottia gigantea (the giant owl limpet) that have orthologs in all other metazoans in OrthoDB. These proteins were compared against the contigs (tBLASTn; Altschul et al. 1997) to find matching contigs, using an e-value threshold of 1.0E-10.
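One way to turn this comparison into a single completeness number is the fraction of query proteins with at least one tBLASTn hit. The sketch below is an assumption about how that could be computed, not the method from the paper: `completeness` is a hypothetical helper, the query ID is taken as the first column of each outfmt 6 row, and the toy sequences are placeholders.

```python
def completeness(fasta_lines, tblastn_lines):
    """Return (queries with a hit, total queries, fraction with a hit)."""
    # Each FASTA record starts with a '>' header line.
    n_queries = sum(1 for line in fasta_lines if line.startswith(">"))
    # Column 1 of each outfmt 6 row is the query protein ID.
    with_hit = {line.split()[0] for line in tblastn_lines if line.strip()}
    fraction = len(with_hit) / n_queries if n_queries else 0.0
    return len(with_hit), n_queries, fraction


# Toy query file: IDs appear in the OrthoDB results, sequences are placeholders.
fasta = [
    ">AAEL010318", "MKTAYIAK",
    ">ENSACAG00000011841", "MSLLTEVE",
    ">ACEP20785", "MAHHHHHH",
    ">g8984", "MGSSHHHH",
]
# Toy hit table: two of the four proteins have a hit.
hits = [
    "AAEL010318 Roberts_20100712_CC_F3_trimmed_contig_3682 90.00 60 5 1 73 132 1 177 5e-32 116",
    "g8984 Roberts_20100712_CC_F3_trimmed_contig_3682 90.00 60 5 1 82 141 1 177 1e-32 117",
]
found, total, frac = completeness(fasta, hits)
print(found, total, frac)
```

Counting distinct query IDs rather than output lines avoids inflating the number when a protein has more than one alignment.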
!head /Volumes/web/cnidarian/OrthDB_meta_Lotgi1.fa
>AAEL010815 AAEL010815-PA Q0IED2 Putative uncharacterized protein IPR001060,IPR018808 EOG62FZF8 AAEGY
GEKNNGYEVLYQNMKYGLSATKELAEYFRERSNLEEYNSKLLTKLANKAGSGGGGTFSPLWIILKSTTERLSELHAAKVQ
KLTELVKNINKYAEELHKKHKSVKEEESSTQDAVHAMKESTTAVAKAKDVYNTRLQELEKARKDNSAKEIEKSEAKLRKQ
QDDYKALVEKHNIIKQEFEKKMTITCKRFQEIEEAHLKQMKEFLTSYMEIVQNNFDLVGQVHSDLKRQFLELTVDKLLEQ
FVLNKYTGLEKPEFIELDLVKLGSRSLGTTATAATSNNQLLINTSMPNATSGGSVTTVAEGSVTDSPALSSAAVPTNSPV
NLSTSPPASGGRGSLLDALGGSTDRPMSPAAAGDSSASSSAQSTAKTRSRESRDSTTSGGADSVSTSTAGGAGGGGGGSA
ISAPTSPNDVHNSNQHGGSNNSNGLASTFIGRNALLRGSKCKCSIDFFSPNSVYLSIWSRREKAKSKKTKKKKDSTENCK
TFRCVSICNRTSHNRCIEVYSNFYKTVALKFSILIVFSFENITSFESSSFISEDKDETTKASDAASSNLQTTSAVSTGNV
APTATPEVDEDGYSIQPRETTWDSTTLTEKSNNFYSSSDSDSEDERGERKIHVEIKPLNNGAAPISASVDELRATVENLS
LSPIGALSSRSQSVSQQLGDRPSNGNDPPNASNASTPTTVHPYAPLQSPTLSMSTTSNNRYADLGDIFSEVGDISASAPA
!fgrep -c ">" /Volumes/web/cnidarian/OrthDB_meta_Lotgi1.fa
5684
!makeblastdb -in /Volumes/web/cnidarian/BlackAbalone_Contigs_v3.fa -dbtype nucl -out /Volumes/Bay3/Software/ncbi-blast-2.2.27\+/db/BlackAbalone_Contigs_v3
Building a new DB, current time: 07/17/2013 08:36:07
New DB name: /Volumes/Bay3/Software/ncbi-blast-2.2.27+/db/BlackAbalone_Contigs_v3
New DB title: /Volumes/web/cnidarian/BlackAbalone_Contigs_v3.fa
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 13884 sequences in 0.828346 seconds.
!tblastn -query /Volumes/Bay3/Software/ncbi-blast-2.2.26+/query/OrthDB_meta_Lotgi1.fa -db /Volumes/Bay3/Software/ncbi-blast-2.2.27\+/db/BlackAbalone_Contigs_v3 -outfmt 6 -out /Volumes/web/cnidarian/OrthoDB_Lotgi1_BlackAb_v3_tblastn.txt -max_target_seqs 1 -num_threads 2 -evalue 1E-10
!tblastn -query /Volumes/Bay3/Software/ncbi-blast-2.2.26+/query/OrthDB_meta_Lotgi1.fa -db /Volumes/Bay3/Software/ncbi-blast-2.2.27\+/db/BlackAbalone_Contigs_v3 -outfmt 6 -out /Volumes/web/cnidarian/OrthoDB_Lotgi1_BlackAb_v3_tblastn_b.txt -max_target_seqs 1 -num_threads 3 -evalue 1E-20
!head /Volumes/web/cnidarian/OrthoDB_Lotgi1_BlackAb_v3_tblastn_b.txt
AAEL010318 Roberts_20100712_CC_F3_trimmed_contig_3682 90.00 60 5 1 73 132 1 177 5e-32 116
ENSACAG00000011841 Roberts_20100712_CC_F3_trimmed_contig_3682 88.14 59 7 0 82 140 1 177 6e-33 119
ENSACAG00000002544 Roberts_20100712_CC_F3_trimmed_contig_3682 94.92 59 3 0 50 108 1 177 1e-34 123
ENSACAG00000016470 Roberts_20100712_CC_F3_trimmed_contig_3682 91.53 59 5 0 82 140 1 177 1e-34 123
ACEP20785 Roberts_20100712_CC_F3_trimmed_contig_3682 88.33 60 6 1 53 112 1 177 1e-31 114
ACEP20789 Roberts_20100712_CC_F3_trimmed_contig_3682 90.00 60 5 1 82 141 1 177 2e-32 117
ADAR004412 Roberts_20100712_CC_F3_trimmed_contig_3682 90.00 60 5 1 82 141 1 177 8e-32 115
AECH10224 Roberts_20100712_CC_F3_trimmed_contig_3682 88.33 60 6 1 104 163 1 177 2e-31 115
AECH10223 Roberts_20100712_CC_F3_trimmed_contig_3682 90.00 60 5 1 125 184 1 177 2e-32 117
g8984 Roberts_20100712_CC_F3_trimmed_contig_3682 90.00 60 5 1 82 141 1 177 1e-32 117
#wc prints line, word, and byte counts (in that order)
!wc /Volumes/web/cnidarian/OrthoDB_Lotgi1_BlackAb_v3_tblastn.txt
880 10560 87475 /Volumes/web/cnidarian/OrthoDB_Lotgi1_BlackAb_v3_tblastn.txt
!fgrep -c "Roberts_20100712_CC" /Volumes/web/cnidarian/OrthoDB_Lotgi1_BlackAb_v3_tblastn.txt
880
!fgrep -c "Roberts_20100712_CC" /Volumes/web/cnidarian/OrthoDB_Lotgi1_BlackAb_v3_tblastn_b.txt
828
!wc /Volumes/web/cnidarian/OrthoDB_Lotgi1_BlackAb_v3_tblastn_b.txt
828 9936 82229 /Volumes/web/cnidarian/OrthoDB_Lotgi1_BlackAb_v3_tblastn_b.txt
Concerned that so few black abalone contigs are represented here: only 880 hit lines at 1E-10 (828 at 1E-20) for 5684 query proteins.
http://eagle.fish.washington.edu/cnidarian/OrthoDB_Lotgi1_BlackAb_v3_tblastn.txt
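To see how few contigs are actually represented, the subject column can be tallied. In the first rows shown above, ten different query proteins all hit the same contig (contig_3682), so the number of distinct contigs may be well below the line count. A minimal sketch, assuming whitespace-separated outfmt 6 rows with the subject ID in the second column; `subject_counts` is a hypothetical helper.

```python
from collections import Counter


def subject_counts(blast_lines):
    """Count hit lines per subject contig (column 2 of outfmt 6)."""
    counts = Counter()
    for line in blast_lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


# Three rows from the tblastn output above; all share one subject contig.
rows = [
    "AAEL010318 Roberts_20100712_CC_F3_trimmed_contig_3682 90.00 60 5 1 73 132 1 177 5e-32 116",
    "ENSACAG00000011841 Roberts_20100712_CC_F3_trimmed_contig_3682 88.14 59 7 0 82 140 1 177 6e-33 119",
    "ACEP20785 Roberts_20100712_CC_F3_trimmed_contig_3682 88.33 60 6 1 53 112 1 177 1e-31 114",
]
counts = subject_counts(rows)
print(len(counts), counts.most_common(1))
```

`len(counts)` gives the number of distinct contigs hit, and `most_common()` shows which contigs attract many orthologous proteins.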
#-A N prints N lines of trailing context after each matching line
!fgrep -A 10 "Roberts_20100712_CC_F3_trimmed_contig_3682" /Volumes/web/cnidarian/BlackAbalone_Contigs_v3.fa
>Roberts_20100712_CC_F3_trimmed_contig_3682 Average coverage: 80.91
ATCAGAATAATGTGGTCTCAACGAGACCCTTCCTTGAGAAAGTCTGGAGTGGGCAATGTG
TTCATCAAGAATTTGGACAAGAGCATCGACAACAAAGCTCTGTATGACACATTCTCTGCT
TTTGGCAACATCCTGTCTTGTAAGATAGCTTCTGATGAAAATGGCTCCAAGGGTTATGG
>Roberts_20100712_CC_F3_trimmed_contig_3683 Average coverage: 18.90
TTGCAATCTAGAAATACGTCCGCTCTTGTACTGTAGCAGTTTTTACAATTACGCCATTGC
ATCGAACCCATTAAGACCAGATCATTTACTCCTAGACGAG
>Roberts_20100712_CC_F3_trimmed_contig_3684 Average coverage: 97.66
GTCTCCTGCCTCTGGTTGGAATTAACAGAAGTGATGAGTTTGTGAAGGAAGTGTGTGATC
AGTGCAGCTTCGCCTCCATGCAGAAAAGCAAGTCACCCATGTCCAAGGTCATGTATAGAA
AAGGTGAGGTT
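The fixed context window in the lookup above also returns the neighboring records, and a substring match could pick up longer IDs that begin with the same digits. A sketch of pulling exactly one record by ID in Python; `get_record` is a hypothetical helper, and it assumes the ID is the first whitespace-separated token after `>`.

```python
def get_record(fasta_lines, seq_id):
    """Return (header, sequence) for the record whose ID equals seq_id, else None."""
    header = None
    chunks = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                # Reached the next record: the wanted one is complete.
                return header, "".join(chunks)
            tokens = line[1:].split()
            if tokens and tokens[0] == seq_id:
                header = line
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        return header, "".join(chunks)  # wanted record was the last in the file
    return None


# Two records copied from the contig file output above (second one truncated).
l1 = "TTGCAATCTAGAAATACGTCCGCTCTTGTACTGTAGCAGTTTTTACAATTACGCCATTGC"
l2 = "ATCGAACCCATTAAGACCAGATCATTTACTCCTAGACGAG"
fasta = [
    ">Roberts_20100712_CC_F3_trimmed_contig_3683 Average coverage: 18.90",
    l1,
    l2,
    ">Roberts_20100712_CC_F3_trimmed_contig_3684 Average coverage: 97.66",
    "GTCTCCTGCCTCTGGTTGGAATTAACAGAAGTGATGAGTTTGTGAAGGAAGTGTGTGATC",
]
record = get_record(fasta, "Roberts_20100712_CC_F3_trimmed_contig_3683")
print(record[0], len(record[1]))
```

Because the sequence lines are joined, the result also gives the contig length directly, which is useful when checking how much of a contig an alignment covers.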
#in the meantime, will blast the OrthoDB proteins against the other Haliotis databases
!tblastn -query /Volumes/Bay3/Software/ncbi-blast-2.2.26+/query/OrthDB_meta_Lotgi1.fa -db /Volumes/web/whale/fish546/blast/db/Haliotis_kam_transcriptome -outfmt 6 -out /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_kam_tblastn.txt -max_target_seqs 1 -num_threads 1 -evalue 1E-20
!tblastn -query /Volumes/Bay3/Software/ncbi-blast-2.2.26+/query/OrthDB_meta_Lotgi1.fa -db /Volumes/web/whale/fish546/blast/db/Haliotis_midae_franchini -outfmt 6 -out /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_midae_franchini_tblastn.txt -max_target_seqs 1 -num_threads 1 -evalue 1E-20
!makeblastdb -in /Volumes/web/cnidarian/H.rufescens_contig.fa -dbtype nucl -out /Volumes/web/whale/fish546/blast/db/Haliotis_rufescens_transcriptome
Building a new DB, current time: 07/18/2013 15:46:33
New DB name: /Volumes/web/whale/fish546/blast/db/Haliotis_rufescens_transcriptome
New DB title: /Volumes/web/cnidarian/H.rufescens_contig.fa
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 162928 sequences in 19.2224 seconds.
!tblastn -query /Volumes/Bay3/Software/ncbi-blast-2.2.26+/query/OrthDB_meta_Lotgi1.fa -db /Volumes/web/whale/fish546/blast/db/Haliotis_rufescens_transcriptome -outfmt 6 -out /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_rufescens_tblastn.txt -max_target_seqs 1 -num_threads 1 -evalue 1E-20
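The three tblastn runs above differ only in the database and output names, so the commands could be generated in a loop. The sketch below only builds and prints the argument lists; it does not run BLAST. The helper `tblastn_args` and the `RUNS` mapping are hypothetical, and the paths and flags are copied from the commands above.

```python
import shlex

QUERY = "/Volumes/Bay3/Software/ncbi-blast-2.2.26+/query/OrthDB_meta_Lotgi1.fa"
DB_DIR = "/Volumes/web/whale/fish546/blast/db/"
OUT_DIR = "/Volumes/web/cnidarian/"

# Database name -> label used in the output file name (as in the runs above).
RUNS = {
    "Haliotis_kam_transcriptome": "Haliotis_kam",
    "Haliotis_midae_franchini": "Haliotis_midae_franchini",
    "Haliotis_rufescens_transcriptome": "Haliotis_rufescens",
}


def tblastn_args(db_name, label, threads=1, evalue="1E-20"):
    """Build the tblastn argument list for one database (hypothetical helper)."""
    return [
        "tblastn",
        "-query", QUERY,
        "-db", DB_DIR + db_name,
        "-outfmt", "6",
        "-out", OUT_DIR + "OrthoDB_Lotgi1_" + label + "_tblastn.txt",
        "-max_target_seqs", "1",
        "-num_threads", str(threads),
        "-evalue", evalue,
    ]


commands = [tblastn_args(db, label) for db, label in RUNS.items()]
for args in commands:
    # shlex.quote protects any argument that would need quoting in a shell.
    print(" ".join(shlex.quote(a) for a in args))
```

Each list could be handed to `subprocess.run(args, check=True)` to execute it; passing a list rather than a shell string means no manual escaping of characters in the paths is needed.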
!head /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_kam_tblastn.txt
AAEL010318 Haliotis_kam_contig350 76.46 429 93 5 1 423 235 1515 0.0 679
ENSACAG00000011841 Haliotis_kam_contig350 60.55 659 229 11 1 651 208 2115 0.0 723
ENSACAG00000002544 Haliotis_kam_contig350 65.60 593 189 6 4 585 313 2079 0.0 740
ENSACAG00000016470 Haliotis_kam_contig350 66.67 627 193 8 1 614 208 2079 0.0 800
ACEP20785 Haliotis_kam_contig350 66.04 583 165 12 36 595 400 2118 0.0 690
ACEP20789 Haliotis_kam_contig350 69.40 647 175 10 1 634 208 2118 0.0 843
ADAR004412 Haliotis_kam_contig350 62.10 657 205 11 1 640 208 2097 0.0 734
AECH10224 Haliotis_kam_contig350 67.41 669 173 12 1 656 208 2118 0.0 826
AECH10223 Haliotis_kam_contig350 65.36 690 173 12 1 677 208 2118 0.0 826
g8984 Haliotis_kam_contig350 70.59 646 164 10 1 629 208 2118 0.0 891
!wc /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_kam_tblastn.txt
438 5256 35941 /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_kam_tblastn.txt
!head /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_midae_franchini_tblastn.txt
gi|221125234|ref|XP_002165797.1| Contig_3019_Coverage_17.03 55.00 80 36 0 1 80 274 35 5e-25 97.4
g8694 Contig_1661_Coverage_123.10 37.29 177 97 2 1714 1877 679 152 1e-34 135
AAEL000339 Contig_308_Coverage_33.26 38.55 166 98 2 255 416 1 498 1e-35 131
ENSACAG00000006114 Contig_308_Coverage_33.26 37.95 166 99 2 361 522 1 498 5e-34 127
ENSACAG00000006855 Contig_308_Coverage_33.26 39.16 166 97 2 145 306 1 498 2e-37 135
ENSACAG00000003680 Contig_308_Coverage_33.26 37.35 166 100 2 326 487 1 498 5e-32 121
ACEP15907 Contig_308_Coverage_33.26 38.79 165 97 2 230 390 1 495 4e-36 132
ADAR002823 Contig_308_Coverage_33.26 39.88 163 94 2 118 276 1 489 5e-37 131
AECH18169 Contig_308_Coverage_33.26 38.79 165 96 3 347 506 1 495 9e-34 127
g1442 Contig_308_Coverage_33.26 37.95 166 99 2 319 480 1 498 9e-36 132
!wc /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_midae_franchini_tblastn.txt
1432 17184 122489 /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_midae_franchini_tblastn.txt
!head /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_rufescens_tblastn.txt
AAEL010815 contig145 46.44 267 138 3 1 262 90 890 1e-61 216
ENSACAG00000009572 contig145 46.13 271 142 2 4 274 72 872 1e-76 255
ENSACAG00000017803 contig145 46.59 264 141 0 1 264 63 854 1e-73 232
ACEP19080 contig145 48.79 248 126 1 1 247 129 872 4e-74 249
ADAR004115 contig145 48.25 228 116 2 5 230 72 755 8e-65 224
AECH14598 contig145 51.31 267 129 1 5 270 72 872 5e-88 289
g3073 contig145 46.40 250 133 1 1 250 129 875 6e-72 244
AGAP012683 contig145 46.67 255 134 2 10 262 90 854 6e-66 212
AGAP002024 contig145 46.30 270 140 3 5 269 72 881 9e-66 229
ENSAMEG00000010758 contig145 46.27 268 144 0 1 268 63 866 9e-73 245
!wc /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_rufescens_tblastn.txt
5311 63732 377427 /Volumes/web/cnidarian/OrthoDB_Lotgi1_Haliotis_rufescens_tblastn.txt