Setting up Variables

This notebook assumes that you are working in a directory containing a FASTA file named query.fa, and that the BLAST programs are in your PATH.

In [1]:

#Setting Working Directory
wd="/Volumes/web/whale/fish546/pipeline_test_dir2"
#Setting directory of Blast Databases
dbd="/Volumes/Bay3/Software/ncbi-blast-2.2.29\+/db/"
#Database name
dbn="uniprot_sprot_r2013_12"
#Blast algorithm
ba="blastx"
#Location of SQLShare python tools: can be set to empty ("") if the tools are in PATH
spd="/Users/sr320/sqlshare-pythonclient/tools/"
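
The two assumptions stated at the top (a query.fa file in the working directory and a BLAST program on PATH) can be checked before anything is run. The following is a minimal sketch, assuming Python 3.3 or later for `shutil.which`; the helper name `check_prerequisites` is hypothetical and not part of the original notebook.

```python
import os
import shutil


def check_prerequisites(query_path="query.fa", blast_program="blastx"):
    """Return a list of human-readable problems; an empty list means all clear.

    check_prerequisites is a hypothetical helper, not part of the notebook.
    """
    problems = []
    # The notebook expects the query FASTA file in the working directory.
    if not os.path.isfile(query_path):
        problems.append("query file not found: " + query_path)
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(blast_program) is None:
        problems.append(blast_program + " is not on PATH")
    return problems


for problem in check_prerequisites():
    print("WARNING:", problem)
```

Running this right after changing into the working directory gives an early warning instead of a failure partway through the BLAST step.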

In [2]:

cd {wd}

/Volumes/web/whale/fish546/pipeline_test_dir2

In [3]:

!{ba} -query query.fa -db {dbd}{dbn} -out {dbn}_{ba}_out.tab -evalue 1E-50 -num_threads 4 -max_hsps_per_subject 1 -max_target_seqs 1 -outfmt 6

Selenocysteine (U) at position 52 replaced by X
Selenocysteine (U) at position 49 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
^C

In [4]:

!head -1 {dbn}_{ba}_out.tab

ConsensusfromContig5	sp|Q9JHQ5|LZTL1_MOUSE	74.40	125	31	1	7	378	24	148	1e-59	 192
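
The line above has no header because `-outfmt 6` writes plain tab-separated values. With no custom format specifiers, BLAST+ emits its 12 default fields in a fixed order, so the hit can be labeled as in the sketch below. The function name `parse_outfmt6_line` is hypothetical; the field names are the standard BLAST+ ones.

```python
# Default field order for BLAST+ tabular output (-outfmt 6).
BLAST_OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_outfmt6_line(line):
    """Map one tab-separated BLAST hit onto the default field names."""
    values = [v.strip() for v in line.rstrip("\n").split("\t")]
    if len(values) != len(BLAST_OUTFMT6_COLUMNS):
        raise ValueError("expected 12 fields, got %d" % len(values))
    return dict(zip(BLAST_OUTFMT6_COLUMNS, values))


# The hit shown by head -1 above.
hit = parse_outfmt6_line(
    "ConsensusfromContig5\tsp|Q9JHQ5|LZTL1_MOUSE\t74.40\t125\t31\t1"
    "\t7\t378\t24\t148\t1e-59\t 192"
)
print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["evalue"])
```

Values are kept as strings here; convert `pident`, `evalue`, and `bitscore` with `float()` if they are needed for filtering.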

In [5]:

#Translate pipes to tabs so the SPID is in a separate column for joining
!tr '|' "\t" < {dbn}_{ba}_out.tab > {dbn}_{ba}_out2.tab

In [6]:

!head -1 {dbn}_{ba}_out2.tab

ConsensusfromContig5	sp	Q9JHQ5	LZTL1_MOUSE	74.40	125	31	1	7	378	24	148	1e-59	 192
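
Splitting the subject ID on the pipe character turns the 12 BLAST columns into 14 and places the Swiss-Prot accession (Q9JHQ5 here) in the third column, which is the column the later SQL join matches against. The same transformation can be done without `tr`; the following is a minimal sketch, and the function name `pipes_to_tabs` is hypothetical.

```python
def pipes_to_tabs(in_path, out_path):
    """Copy in_path to out_path, replacing every '|' with a tab."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(line.replace("|", "\t"))


# Effect on the start of the hit shown above (no files needed for this check).
line = "ConsensusfromContig5\tsp|Q9JHQ5|LZTL1_MOUSE\t74.40"
fields = line.replace("|", "\t").split("\t")
print(fields[2])  # the accession, now in the third column
```

Like the `tr` command, this replaces every pipe on the line, so it assumes that pipes occur only inside the subject ID.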

In [7]:

#Uploads the formatted BLAST table to SQLShare. The dataset name is generic and meant to be temporary. Warning: an existing dataset with this name will be overwritten.
!python {spd}singleupload.py -d scratchblast_out {dbn}_{ba}_out2.tab

processing chunk line 0 to 26 (0.000550031661987 s elapsed)
pushing uniprot_sprot_r2013_12_blastx_out2.tab...
parsing 910A8183...
finished scratchblast_out

In [8]:

!python {spd}fetchdata.py -s "SELECT * FROM [sr320@washington.edu].[scratchblast_out]blast Left Join [sr320@washington.edu].[uniprot-reviewed_wGO_010714]unp ON blast.Column3 = unp.Entry Left Join [sr320@washington.edu].[SPID and GO Numbers]go ON unp.Entry = go.SPID Left Join [sr320@washington.edu].[GO_to_GOslim]slim ON slim.GO_id = go.GOID" -f tsv -o {dbn}_join2goslim.txt
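
The query above chains three left joins: BLAST hits to UniProt entries, entries to GO IDs, and GO IDs to GO Slim bins. For readers more familiar with pandas, the same join logic can be sketched locally as below. The key column names come from the query; the table contents are made-up placeholder rows (including the GO IDs) used only to show the shape of the result, and the real SQLShare tables contain more columns.

```python
import pandas as pd

# Placeholder stand-ins for the four SQLShare tables (illustrative rows only).
blast = pd.DataFrame({"Column1": ["ConsensusfromContig5"],
                      "Column3": ["Q9JHQ5"]})
unp = pd.DataFrame({"Entry": ["Q9JHQ5"]})
go = pd.DataFrame({"SPID": ["Q9JHQ5", "Q9JHQ5"],
                   "GOID": ["GO:PLACEHOLDER1", "GO:PLACEHOLDER2"]})
slim = pd.DataFrame({"GO_id": ["GO:PLACEHOLDER1"],
                     "GOSlim_bin": ["other biological processes"]})

# Each merge mirrors one "Left Join ... ON" clause of the SQL statement.
joined = (
    blast
    .merge(unp, how="left", left_on="Column3", right_on="Entry")
    .merge(go, how="left", left_on="Entry", right_on="SPID")
    .merge(slim, how="left", left_on="GOID", right_on="GO_id")
)
print(joined[["Column1", "Column3", "GOID", "GOSlim_bin"]])
```

Because every join is a left join, a hit is kept even when it has no GO annotation or its GO ID has no Slim bin; the unmatched columns are simply empty (NaN in pandas, NULL in SQL).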

In [10]:

!head -2 {dbn}_join2goslim.txt

In [11]:

!python {spd}singleupload.py -d scratchjoin_slim {dbn}_join2goslim.txt

processing chunk line 0 to 328 (0.00167202949524 s elapsed)
pushing uniprot_sprot_r2013_12_join2goslim.txt...
parsing 8CCBA49D...
finished scratchjoin_slim

In [12]:

!python {spd}fetchdata.py -s "SELECT Distinct Column1 as query, Column3 as SPID, GOSlim_bin FROM [sr320@washington.edu].[scratchjoin_slim] Where aspect = 'P'" -f tsv -o justslim.txt

In [13]:

!head justslim.txt

In [14]:

!python {spd}singleupload.py -d scratchpie justslim.txt

processing chunk line 0 to 57 (0.000425100326538 s elapsed)
pushing justslim.txt...
parsing 9A53B55D...
finished scratchpie

In [15]:

!python {spd}fetchdata.py -s "SELECT GOSlim_bin, COUNT(GOSlim_bin) as termcount from [sr320@washington.edu].[scratchpie] Group by GOSlim_bin" -f tsv -o justpie.txt
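
The GROUP BY query counts how many distinct query/SPID rows fall into each GO Slim bin. A rough pandas equivalent is sketched below; the column names follow the preceding SELECT, while the rows, contig names, and accessions are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical rows shaped like justslim.txt (query, SPID, GOSlim_bin).
justslim = pd.DataFrame({
    "query": ["contigA", "contigA", "contigB", "contigC"],
    "SPID": ["P00001", "P00001", "P00002", "P00003"],
    "GOSlim_bin": ["transport", "stress response", "transport", "transport"],
})

# Equivalent of: SELECT GOSlim_bin, COUNT(GOSlim_bin) AS termcount ... GROUP BY GOSlim_bin
justpie = (
    justslim
    .groupby("GOSlim_bin")
    .size()
    .reset_index(name="termcount")
)
print(justpie)
```

As with `COUNT(GOSlim_bin)` in SQL, rows with a missing bin do not contribute to any count, because `groupby` drops missing keys by default.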

In [16]:

#For now, the file name has to be entered by hand
import pandas as pd

# read data from the data file into a pandas DataFrame
slimpie = pd.read_table("justpie.txt", # name of the data file
            #sep=",", # column separator; read_table defaults to tab
            na_values=["", " "]) # values that should be treated as missing

In [18]:

slimpie

Out[18]:

	GOSlim_bin	termcount
0	cell cycle and proliferation	1
1	cell organization and biogenesis	5
2	cell-cell signaling	1
3	death	1
4	developmental processes	5
5	other biological processes	7
6	other metabolic processes	11
7	protein metabolism	7
8	RNA metabolism	5
9	signal transduction	4
10	stress response	5
11	transport	4
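
Since the file names (scratchpie, justpie) suggest that these counts are meant for a pie chart, it can help to see each bin as a share of the total. The sketch below rebuilds the table shown above by hand so that it runs without justpie.txt; the `percent` column is an addition for illustration and is not part of the original notebook.

```python
import pandas as pd

# Counts copied from the Out[18] table above.
slimpie = pd.DataFrame({
    "GOSlim_bin": [
        "cell cycle and proliferation", "cell organization and biogenesis",
        "cell-cell signaling", "death", "developmental processes",
        "other biological processes", "other metabolic processes",
        "protein metabolism", "RNA metabolism", "signal transduction",
        "stress response", "transport",
    ],
    "termcount": [1, 5, 1, 1, 5, 7, 11, 7, 5, 4, 5, 4],
})

# Share of each bin relative to all counted terms.
total = slimpie["termcount"].sum()
slimpie["percent"] = (100 * slimpie["termcount"] / total).round(1)

print("total terms:", total)
print(slimpie.sort_values("termcount", ascending=False).to_string(index=False))
```

If matplotlib is installed, `slimpie.set_index("GOSlim_bin")["termcount"].plot.pie()` should draw the corresponding chart directly from the same DataFrame.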

Fun Stuff from SQLShare graph