source: FIND SOURCE (history Oct. 10 computer in meeting room)
!head /Volumes/web/scaphapoda/Grace/Transcriptomes/mercenaria/query.fa
>Mmercenaria_Contig_1
AAGAGTGACTGGTACCACCTGTGTACTACAATGGTTATTTGATACAACTAAATGTAAGCGGTACCACCATGTATTACAATGTGAAATTAGTATCAATAAGTGTGGCTGGTACCTTTATATATTACAGGTGCTGTTATGTTTGACAGGAATACTGATGTGAGATAGTTACTTCCATACTATGTGTAACCTACGGTCCGGCACGTTGAATGGTGGGGTG
>Mmercenaria_Contig_2
ACAGCTGTCTGATTACTTATACAAAGAACACGGGTTTAAAGCAGAAATGATTGATACTCTGTACAACTATGCCAAGTTTCAGTATGAATGTGGTAATTATTCTGCAGCAGCTGAATATCTCTACTTTGTTAGAATCCTGCTACCACCAAATGACAGAAATTACTTGAATGCATTATGGGGGAAGTTAGCTTCAGAGATTCTCATGCAAACGACCAGTG
>Mmercenaria_Contig_3
TGACGAGACTCTCAAGTTCATTGCAAGAAAGTTTACTGATGCAAAAATGTAATTTATCTCAGTGAAGGTCTATAGGAGTATCCCAGCTTCTTTTGAGGAGTCAACAATTTTCATAGCTGTAGTTAGATGCCAGTCTTCTGTAGAAACTACCCAGGATTCCATTATTTCTTCTGATTGATCAGTGGTTGCCTAGCAATGAAGTGTTTCACAAAAAGCT
>Mmercenaria_Contig_4
TATTTTGAGCATAACTTATAACCCGTTCAACGTTCAAGGTATTGACTTCTGACTGGGAATATAAGTAGGTGGCAATGAAACCATGTGCAGATCGGTAGGTCAAATGTTAATGTAGTCAGATCTAACTGTCATATTTCATGGTCCGTACTCGACCTCCTTTAATCTAAAACTTTTGACGTATATTGCACCGCTTTGCGGAGATCTTGTTTGATTATAATTTGACTTTGTTATGGCTTTCACTAGTTT
>Mmercenaria_Contig_5
TAAAAGAACGCATACACCCATCAGTTTTGAAACTATTTTAGTTAATTTCATTATACAATTCAGAGTAGGTGTCAAAATTTCATGATAAACTGTCAGAACTGGTAGAAGTCTCCGTGATCGACCATTTTACATTTATTTCCCTAACAGATTGTTGTTTATCTCTACGACATCGTTAAATTAAATAGCAACATTTTAAGAACATCTCCGACAATCA
!fgrep -c ">" /Volumes/web/scaphapoda/Grace/Transcriptomes/mercenaria/query.fa
8482
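The same record count can be reproduced in Python, which is handy when the check needs to live inside a script rather than a shell cell. This is a minimal sketch: `count_fasta_records` is a hypothetical helper name, and the in-memory example stands in for `query.fa` with made-up sequences. Unlike `fgrep -c ">"`, it only counts lines that begin with `>`, which is the stricter definition of a FASTA header.

```python
import io

def count_fasta_records(handle):
    """Count FASTA records by counting lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))

# Tiny in-memory stand-in for query.fa (made-up sequences, not real data)
example = io.StringIO(">contig_1\nACGT\n>contig_2\nGGCC\nTTAA\n")
n_records = count_fasta_records(example)
print(n_records)
```

For the real file, pass an open file handle, for example `count_fasta_records(open("query.fa"))`.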
wd="/Volumes/web/scaphapoda/Grace/Transcriptomes/mercenaria"
dircode="me"
cd {wd}
/Volumes/web/scaphapoda/Grace/Transcriptomes/mercenaria
!blastx \
-query query.fa \
-db /Volumes/Data/blast_db/uniprot_sprot \
-max_target_seqs 1 \
-max_hsps 1 \
-outfmt 6 \
-num_threads 8 \
-out blast_sprot.tab
Selenocysteine (U) at position 144 replaced by X
!wc -l blast_sprot.tab
7174 blast_sprot.tab
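The two counts so far give a rough annotation rate. The sketch below assumes that each line of `blast_sprot.tab` corresponds to a distinct contig, which is what `-max_target_seqs 1` together with `-max_hsps 1` is expected to produce; that assumption has not been checked against the file. The variable names are illustrative only.

```python
# Counts copied from the fgrep and wc outputs in this notebook
n_contigs = 8482    # FASTA records in query.fa
n_hit_lines = 7174  # lines in blast_sprot.tab

# Assumes one output line per contig with a hit (unverified assumption)
hit_fraction = n_hit_lines / float(n_contigs)
print("{:.1%} of contigs have a Swiss-Prot hit".format(hit_fraction))
```

No e-value cutoff was passed to blastx, so this fraction includes weak hits as well as strong ones.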
!tr '|' "\t" < blast_sprot.tab > blast_sprot_sql.tab
!head blast_sprot_sql.tab
Mmercenaria_Contig_1 sp P06538 DPOL_ADE12 26.09 46 34 0 141 4 100 145 6.2 28.5
Mmercenaria_Contig_2 sp Q6DRI1 EI3EA_DANRE 75.00 68 17 0 5 208 114 181 2e-29 112
Mmercenaria_Contig_3 sp O94823 AT10B_HUMAN 61.11 18 7 0 162 215 99 116 2.2 29.6
Mmercenaria_Contig_5 sp P0A5H8 EFPP_MYCTU 63.16 19 7 0 117 61 20 38 0.64 31.2
Mmercenaria_Contig_6 sp Q9WU60 ATRN_MOUSE 28.85 52 33 1 168 13 808 855 0.12 33.9
Mmercenaria_Contig_8 sp P18547 VNCS_PAVPN 50.00 22 11 0 111 176 362 383 0.85 30.8
Mmercenaria_Contig_9 sp A8WGF4 IF122_XENTR 67.16 67 22 0 1 201 894 960 6e-24 99.4
Mmercenaria_Contig_10 sp Q4QK86 MUKB_HAEI8 29.79 47 33 0 16 156 262 308 1.6 30.0
Mmercenaria_Contig_11 sp Q0AQ76 THIG_MARMM 34.09 44 27 1 210 79 84 125 3.4 28.9
Mmercenaria_Contig_12 sp P15106 GLNA_STRCO 39.29 28 17 0 40 123 124 151 0.84 30.8
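After the `tr` step each row has 14 tab-separated fields: the 12 standard `-outfmt 6` columns, with the subject ID split into database, accession and entry name. The sketch below names those columns and filters on e-value. The column labels `db`, `accession` and `entry_name`, the helper `read_hits`, and the `1e-5` cutoff are all assumptions introduced for illustration; the original run applied no cutoff, which is why several rows in the head output have e-values above 0.1.

```python
import csv
import io

# The 12 standard -outfmt 6 fields, with the subject ID split in three
# by the tr step. The three split-field names are illustrative labels.
COLUMNS = ["qseqid", "db", "accession", "entry_name", "pident", "length",
           "mismatch", "gapopen", "qstart", "qend", "sstart", "send",
           "evalue", "bitscore"]

def read_hits(handle, max_evalue=1e-5):
    """Return hits as dicts, keeping rows with evalue <= max_evalue.

    The default cutoff is an assumption, not a value from the notebook.
    """
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits

# Two rows copied from the head output of blast_sprot_sql.tab
sample = io.StringIO(
    "Mmercenaria_Contig_1\tsp\tP06538\tDPOL_ADE12\t26.09\t46\t34\t0"
    "\t141\t4\t100\t145\t6.2\t28.5\n"
    "Mmercenaria_Contig_2\tsp\tQ6DRI1\tEI3EA_DANRE\t75.00\t68\t17\t0"
    "\t5\t208\t114\t181\t2e-29\t112\n"
)
strong_hits = read_hits(sample)
print([h["qseqid"] for h in strong_hits])
```

Applying a filter like this before the upload would change every downstream count, so it is shown here only as an option to consider.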
!python /Applications/sqlshare-pythonclient-master/tools/singleupload.py \
-d {dircode}_uniprot \
blast_sprot_sql.tab
processing chunk line 0 to 7174 (0.00320911407471 s elapsed)
pushing blast_sprot_sql.tab...
parsing DDA8388F...
finished me_uniprot
!python /Applications/sqlshare-pythonclient-master/tools/fetchdata.py \
-s "SELECT Column1, term, GOSlim_bin, aspect, ProteinName FROM [graceac9@washington.edu].[me_uniprot]me left join [samwhite@washington.edu].[UniprotProtNamesReviewed_yes20130610]sp on me.Column3=sp.SPID left join [sr320@washington.edu].[SPID and GO Numbers]go on me.Column3=go.SPID left join [sr320@washington.edu].[GO_to_GOslim]slim on go.GOID=slim.GO_id where aspect like 'P'" \
-f tsv \
-o {dircode}_descriptions.txt
!head {dircode}_descriptions.txt
%pylab inline
Populating the interactive namespace from numpy and matplotlib
from pandas import *
gs = read_table('me_descriptions.txt')
gs.groupby('GOSlim_bin').Column1.count().plot(kind='barh', color=list('y'))
<matplotlib.axes.AxesSubplot at 0x10e13b7d0>
!egrep --color "male|female|genitalia|gonad|ovarian|reproduction|estrogen|testosterone|gametogenesis|germination|ovulation|penile|prostate|vulval" {dircode}_descriptions.txt > {dircode}_reprot.txt
!head -2 {dircode}_reprot.txt
# Count how many times each associated GO term appears
!cut -f 2 {dircode}_reprot.txt | sort | uniq -c
2 "dorsal/ventral axis specification, ovarian follicular epithelium"
2 "male courtship behavior, veined wing generated song production"
2 "maternal determination of dorsal/ventral axis, ovarian follicular epithelium, germ-line encoded"
1 "negative regulation of transcription, DNA-dependent"
1 "regulation of transcription, DNA-dependent"
1 "transcription, DNA-dependent"
1 ATP catabolic process
1 B cell receptor signaling pathway
2 G-protein coupled receptor protein signaling pathway
1 RNA splicing
1 actin cytoskeleton organization
1 actin filament organization
1 antigen receptor-mediated signaling pathway
2 biological_process
1 cell adhesion
1 cell migration
1 chloride transport
1 development of primary female sexual characteristics
1 embryonic process involved in female pregnancy
1 epidermal growth factor receptor signaling pathway
1 estrogen biosynthetic process
1 estrogen catabolic process
1 exocytosis
1 external genitalia morphogenesis
1 fat cell differentiation
1 female genitalia morphogenesis
2 female germ-line stem cell division
4 female gonad development
1 female meiosis I
8 female pregnancy
2 female pronucleus formation
2 germarium-derived female germ-line cyst encapsulation
3 germarium-derived female germ-line cyst formation
7 gonad development
1 gonad morphogenesis
3 gonadal mesoderm development
18 hermaphrodite genitalia development
1 insulin receptor signaling pathway
1 integrin-mediated signaling pathway
4 inter-male aggressive behavior
2 ion transport
1 iron ion homeostasis
1 mRNA processing
1 male courtship behavior
7 male gonad development
1 male mating behavior
3 male meiosis
2 male meiosis I
1 male meiosis chromosome segregation
2 male pronucleus formation
4 male sex determination
2 mating plug formation
1 multicellular organismal development
1 negative regulation of estrogen receptor signaling pathway
1 negative regulation of seed germination
2 negative regulation of vulval development
1 nerve growth factor receptor signaling pathway
7 ovarian follicle cell development
3 ovarian follicle cell migration
1 ovarian follicle cell stalk formation
5 ovarian follicle development
3 ovarian fusome organization
8 ovarian ring canal formation
3 ovulation
1 ovulation cycle
2 ovulation from ovarian follicle
1 oxidation reduction
2 peptide cross-linking
1 plasma membrane fusion
1 platelet-derived growth factor receptor signaling pathway
4 pollen germination
1 positive regulation of cell migration
2 positive regulation of estrogen receptor signaling pathway
1 positive regulation of ovulation
2 positive regulation of receptor recycling
2 positive regulation of vulval development
1 potassium ion transport
1 prostate gland development
1 prostate gland growth
2 protein transport
1 regulation of estrogen receptor signaling pathway
1 regulation of exocytosis
1 regulation of vulval development
48 reproduction
7 response to estrogen stimulus
1 response to gonadotropin stimulus
4 response to testosterone stimulus
8 sexual reproduction
1 signal transduction
2 spindle assembly involved in female meiosis I
2 spore germination
1 sporulation resulting in formation of a cellular spore
1 squamous basal epithelial stem cell differentiation involved in prostate gland acinus development
1 synaptic transmission
1 terminal region determination
4 transport
1 vesicle-mediated transport
27 viral reproduction
!wc -l {dircode}_reprot.txt
288 me_reprot.txt
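The keyword filter and the per-term tally can also be written in Python, which makes it easier to see why terms such as "ion transport" show up in the counts. The pattern is applied to the whole line, so a keyword in any column (for example the protein name) pulls the row in, and `male` also matches inside `female`. This is a minimal sketch: `count_terms` is a hypothetical helper, and the four rows are made up to mimic the five-column layout of the SQL query output.

```python
import re
from collections import Counter

# Same keyword alternation as the egrep pattern used above
KEYWORDS = re.compile(
    "male|female|genitalia|gonad|ovarian|reproduction|estrogen|testosterone|"
    "gametogenesis|germination|ovulation|penile|prostate|vulval")

def count_terms(lines, column=1):
    """Tally one tab-separated column over lines that match KEYWORDS."""
    counts = Counter()
    for line in lines:
        if KEYWORDS.search(line):  # matches anywhere on the line
            fields = line.rstrip("\n").split("\t")
            if len(fields) > column:
                counts[fields[column]] += 1
    return counts

# Made-up rows: contig, term, GOSlim_bin, aspect, ProteinName
rows = [
    "c1\tgonad development\tdevelopmental processes\tP\tProtein A\n",
    "c2\tgonad development\tdevelopmental processes\tP\tProtein B\n",
    "c3\tion transport\ttransport\tP\tProtein C\n",
    "c4\tion transport\ttransport\tP\tFemale sterile protein\n",
]
term_counts = count_terms(rows)
for term, n in sorted(term_counts.items()):
    print(n, term)
```

Row `c4` is counted even though its GO term has nothing to do with reproduction, because the keyword appears in the protein name. Restricting the search to the term column would be one way to tighten the filter, at the cost of changing the 288-line result.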