This workflow assumes that you are working in a directory containing a FASTA file named query.fa and that the BLAST executables are in your PATH.
#Setting Working Directory
wd="/Volumes/web/whale/fish546/pipeline_test_dir2"
#Setting directory of Blast Databases
dbd="/Volumes/Bay3/Software/ncbi-blast-2.2.29\+/db/"
#Database name
dbn="uniprot_sprot_r2013_12"
#Blast algorithm
ba="blastx"
#Location of SQLShare python tools: you can leave this empty ("") if the tools are in PATH
spd="/Users/sr320/sqlshare-pythonclient/tools/"
cd {wd}
/Volumes/web/whale/fish546/pipeline_test_dir2
!{ba} -query query.fa -db {dbd}{dbn} -out {dbn}_{ba}_out.tab -evalue 1E-50 -num_threads 4 -max_hsps_per_subject 1 -max_target_seqs 1 -outfmt 6
Selenocysteine (U) at position 52 replaced by X
Selenocysteine (U) at position 49 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
^C
!head -1 {dbn}_{ba}_out.tab
ConsensusfromContig5 sp|Q9JHQ5|LZTL1_MOUSE 74.40 125 31 1 7 378 24 148 1e-59 192
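The twelve fields in the hit above follow the default column order of BLAST tabular output (`-outfmt 6`). As a minimal sketch, assuming the file is tab-delimited as BLAST writes it, one line can be turned into a labeled record like this. The helper name `parse_outfmt6_line` is made up for illustration and is not part of BLAST or the SQLShare tools.

```python
# Default column order for BLAST tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_outfmt6_line(line):
    """Turn one tab-delimited BLAST hit into a dict keyed by column name.

    Hypothetical helper for illustration only.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(OUTFMT6_COLUMNS):
        raise ValueError("expected 12 tab-separated fields, got %d" % len(fields))
    return dict(zip(OUTFMT6_COLUMNS, fields))


# The hit shown above, with the tabs restored between fields.
example = "\t".join([
    "ConsensusfromContig5", "sp|Q9JHQ5|LZTL1_MOUSE", "74.40", "125", "31",
    "1", "7", "378", "24", "148", "1e-59", "192",
])
hit = parse_outfmt6_line(example)
print(hit["sseqid"], hit["evalue"])
```

Values are kept as strings here; convert with `float()` or `int()` where numeric comparisons are needed.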
#Translate pipes to tab so SPID is in separate column for Joining
!tr '|' "\t" < {dbn}_{ba}_out.tab > {dbn}_{ba}_out2.tab
!head -1 {dbn}_{ba}_out2.tab
ConsensusfromContig5 sp Q9JHQ5 LZTL1_MOUSE 74.40 125 31 1 7 378 24 148 1e-59 192
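The `tr` step replaces every pipe in the file, which works here because only the subject ID contains pipes. If a query name ever contained a pipe, the columns would shift. A hedged alternative, sketched below, splits only the subject ID field into its three parts. The function name `split_subject_id` is an assumption for illustration, and the sketch assumes Swiss-Prot style IDs of the form `sp|ACCESSION|ENTRY_NAME`.

```python
def split_subject_id(sseqid):
    """Split 'sp|Q9JHQ5|LZTL1_MOUSE' into (database, accession, entry name).

    Hypothetical helper; assumes exactly three pipe-separated parts.
    """
    parts = sseqid.split("|")
    if len(parts) != 3:
        raise ValueError("unexpected subject ID format: %r" % sseqid)
    return tuple(parts)


# The accession (SPID) is the middle part and is what the later join uses.
db, spid, entry_name = split_subject_id("sp|Q9JHQ5|LZTL1_MOUSE")
print(spid)
```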
#Uploads the formatted blast table to SQLShare; the dataset currently has a generic name and is meant to be temporary. Warning: this will overwrite any existing dataset of the same name.
!python {spd}singleupload.py -d scratchblast_out {dbn}_{ba}_out2.tab
processing chunk line 0 to 26 (0.000550031661987 s elapsed) pushing uniprot_sprot_r2013_12_blastx_out2.tab... parsing 910A8183... finished scratchblast_out
!python {spd}fetchdata.py -s "SELECT * FROM [sr320@washington.edu].[scratchblast_out]blast Left Join [sr320@washington.edu].[uniprot-reviewed_wGO_010714]unp ON blast.Column3 = unp.Entry Left Join [sr320@washington.edu].[SPID and GO Numbers]go ON unp.Entry = go.SPID Left Join [sr320@washington.edu].[GO_to_GOslim]slim ON slim.GO_id = go.GOID" -f tsv -o {dbn}_join2goslim.txt
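The query above chains three left joins: BLAST hits to UniProt entries, entries to GO IDs, and GO IDs to GO Slim bins. A left join keeps every BLAST hit even when no annotation is found. The sketch below illustrates that behavior locally with plain Python dicts. It is not how SQLShare runs the query, the helper `left_join` is hypothetical, and all row values are placeholders rather than real accessions or GO terms. Only the column names come from the query.

```python
def left_join(left_rows, right_rows, left_key, right_key):
    """Left join two lists of dicts; unmatched left rows are kept unchanged."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for left in left_rows:
        matches = index.get(left.get(left_key), [])
        if not matches:
            joined.append(dict(left))  # no annotation found, keep the hit
        for right in matches:
            merged = dict(left)
            merged.update(right)
            joined.append(merged)
    return joined


# Placeholder rows (not real data); column names follow the SQL above.
blast = [{"Column1": "contigA", "Column3": "ACC1"},
         {"Column1": "contigB", "Column3": "ACC2"}]
unp = [{"Entry": "ACC1"}]
go = [{"SPID": "ACC1", "GOID": "GO:placeholder1"},
      {"SPID": "ACC1", "GOID": "GO:placeholder2"}]
slim = [{"GO_id": "GO:placeholder1", "GOSlim_bin": "example bin"}]

rows = left_join(blast, unp, "Column3", "Entry")
rows = left_join(rows, go, "Entry", "SPID")
rows = left_join(rows, slim, "GOID", "GO_id")
for row in rows:
    print(row.get("Column1"), row.get("GOID"), row.get("GOSlim_bin"))
```

One hit can produce several output rows, one per GO term, which is why the later query uses `SELECT Distinct`.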
!head -2 {dbn}_join2goslim.txt
!python {spd}singleupload.py -d scratchjoin_slim {dbn}_join2goslim.txt
processing chunk line 0 to 328 (0.00167202949524 s elapsed) pushing uniprot_sprot_r2013_12_join2goslim.txt... parsing 8CCBA49D... finished scratchjoin_slim
!python {spd}fetchdata.py -s "SELECT Distinct Column1 as query, Column3 as SPID, GOSlim_bin FROM [sr320@washington.edu].[scratchjoin_slim] Where aspect = 'P'" -f tsv -o justslim.txt
!head justslim.txt
!python {spd}singleupload.py -d scratchpie justslim.txt
processing chunk line 0 to 57 (0.000425100326538 s elapsed) pushing justslim.txt... parsing 9A53B55D... finished scratchpie
!python {spd}fetchdata.py -s "SELECT GOSlim_bin, COUNT(GOSlim_bin) as termcount from [sr320@washington.edu].[scratchpie] Group by GOSlim_bin" -f tsv -o justpie.txt
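The `GROUP BY` query above counts how many rows fall into each GO Slim bin. As a rough local equivalent, assuming the rows have already been read into dicts, the same tally can be sketched with `collections.Counter`. The function name `count_terms` and the sample rows are made up for illustration; rows with an empty bin are skipped, in the same way that `COUNT(GOSlim_bin)` does not count NULL values.

```python
from collections import Counter


def count_terms(rows, column="GOSlim_bin"):
    """Count non-empty values of `column` across a list of dict rows."""
    return Counter(row[column] for row in rows if row.get(column))


# Toy rows for illustration; not taken from the real dataset.
sample_rows = [
    {"query": "contigA", "GOSlim_bin": "transport"},
    {"query": "contigB", "GOSlim_bin": "transport"},
    {"query": "contigB", "GOSlim_bin": "death"},
    {"query": "contigC", "GOSlim_bin": None},
]
termcount = count_terms(sample_rows)
for slim_bin, n in sorted(termcount.items()):
    print(slim_bin, n)
```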
#For now, the file name has to be added manually
import pandas as pd
# read data from the data file into a pandas DataFrame
slimpie = pd.read_table("justpie.txt", # name of the data file
    #sep=",", # column separator; not needed because read_table defaults to tabs
    na_values=["", " "]) # values that should be treated as missing
slimpie
 | GOSlim_bin | termcount |
---|---|---|
0 | cell cycle and proliferation | 1 |
1 | cell organization and biogenesis | 5 |
2 | cell-cell signaling | 1 |
3 | death | 1 |
4 | developmental processes | 5 |
5 | other biological processes | 7 |
6 | other metabolic processes | 11 |
7 | protein metabolism | 7 |
8 | RNA metabolism | 5 |
9 | signal transduction | 4 |
10 | stress response | 5 |
11 | transport | 4 |
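
A pie chart of these counts shows each bin as a share of the total. The sketch below computes those shares directly from the table above, as one possible way to check the chart by hand. The variable names are chosen for illustration and the counts are copied from the table.

```python
# Term counts copied from the table above.
termcounts = {
    "cell cycle and proliferation": 1,
    "cell organization and biogenesis": 5,
    "cell-cell signaling": 1,
    "death": 1,
    "developmental processes": 5,
    "other biological processes": 7,
    "other metabolic processes": 11,
    "protein metabolism": 7,
    "RNA metabolism": 5,
    "signal transduction": 4,
    "stress response": 5,
    "transport": 4,
}

total = sum(termcounts.values())

# Percentage of all terms that falls into each GO Slim bin.
shares = {name: 100.0 * count / total for name, count in termcounts.items()}

# Print the bins from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print("%-35s %5.1f%%" % (name, share))
```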
Fun Stuff from SQLShare graph