BLAST, the Basic Local Alignment Search Tool, allows a user to search with either a protein (amino acid based) or nucleotide (DNA or RNA based) sequence and find statistically significant similar sequences within a BLAST sequence database. The finer details of how BLAST works can be reviewed at the NCBI website or on Wikipedia.
Yes. The website allows a user to run a BLAST search against the protein target sequences or the biotherapeutic sequences. Users can also download all of the 'sequence databases' from the FTP site (look for the .fa.gz downloads) and run BLAST searches locally. Currently the ChEMBL Web Services do not provide this functionality, but this will be added in the future.
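As a rough illustration of the local route, a downloaded .fa.gz file could be decompressed and then formatted into a protein BLAST database with the `makeblastdb` tool that ships with BLAST+. This is only a sketch under assumptions: it assumes Python 3, the function names are hypothetical helpers, and the file name in the usage note is a placeholder rather than the actual name on the FTP site.

```python
import gzip
import shutil
from pathlib import Path


def decompress_fasta(gz_path, fasta_path):
    """Decompress a gzipped FASTA download into a plain FASTA file."""
    with gzip.open(gz_path, "rb") as src, open(fasta_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(fasta_path)


def makeblastdb_command(fasta_path, makeblastdb_exe="makeblastdb"):
    """Build (but do not run) a makeblastdb command for a protein FASTA file.

    Without an explicit -out argument the database takes the name of the
    input file, so it can later be passed to blastp via -db.
    """
    return [makeblastdb_exe, "-in", str(fasta_path), "-dbtype", "prot"]
```

The command list could then be handed to `subprocess.run(makeblastdb_command(fasta), check=True)`, where `fasta` is the path returned by `decompress_fasta("chembl_21.fa.gz", "chembl_21.fa")` (placeholder file names).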
If you want to search ChEMBL, you could use the ChEMBL web interface. Alternatively, you could run the search locally by working through the following steps:
As you see, it is not too complicated, and steps 1-3 have already been carried out on this version of myChEMBL. Before we run a search, we will first set up some parameters:
import re
# Input parameters
blast_exe = '/home/chembl/blast/ncbi-blast-2.2.29+/bin/blastp'
query_file = '/tmp/test.fa'
eval_threshold = 0.001
num_descriptions = 5
num_alignments = 5
database = '/home/chembl/blast/chembl/chembl_21.fa'
# Output parameters
results_txt = '/tmp/test.out'
results_xml = '/tmp/test.xml'
results_csv = '/tmp/test.csv'
# Query sequence used throughout this tutorial
# ** Feel free to edit the protein sequence below **
# ** DO NOT INCLUDE WHITESPACES IN SEQUENCE HEADER LINE **
# **
query_sequence = '''
>Q96P68_OXGR1_HUMAN
MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIF
KMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSS
ILFLTCFSIFRYCVIIHPMSCFSIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNR
SACLDLTSSDELNTIKWYNLILTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKAR
RLTILLLLAFYVCFLPFHILRVIRIESRLLSISCSIENQIHEAYIVSRPLAALNTFGNLL
LYVVVSDNFQQAVCSTVRCKVSGNLEQAKKISYSNNP
>Q86XF0_DHFRL1_HUMAN
MFLLLNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFS
IPEKNRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSS
VYKEAMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKF
EVCEKDD
>Q9UKX5_ITGA11_HUMAN
MDLPRGLVVAWALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGA
PLETNGYQKTGDVYKCPVIHGNCTKLNLGRVTLSNVSERKDNMRLGLSLATNPKDNSFLA
CSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWVE
VQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTET
RTAFGIEFARSEAFQKGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAVAVL
GYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGTNKNET
SFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPEELKN
HGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQIGSYF
GSEITSVDIDGDGVTDVLLVGAPMYFNEGRERGKVYVYELRQNLFVYNGTLKDSHSYQNA
RFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRITASELATG
LQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKINIFHRDCK
RSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATMDERRYTPRAHLDEGGDRFTNRAV
LLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDPDHGPMLDDGWPTTLRVSVPFWNG
CNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFIIESTRQRVAV
EATLENRGENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQVCNVSYPFF
RAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHLKYEADVLFT
RSSSLSHYEVKPNSSLERYDGIGPPFSCIFRIQNLGLFPIHGMMMKITIPIATRSGNRLL
KLRDFLTDEANTSCNIWGNSTEYRPTPVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEI
NFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIVFEISKQEDWQV
PIWIIVGSTLGGLLLLALLVLALWKLGFFRSARRRREPGLDPTPKVLE
>P06804_TNFA_MOUSE
MSTESMIRDVELAEEALPQKMGGFQNSRRCLCLSLFSFLLVAGATTLFCLLNFGVIGPQR
DEKFPNGLPLISSMAQTLTLRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGM
DLKDNQLVVPADGLYLVYSQVLFKGQGCPDYVLLTHTVSRFAISYQEKVNLLSAVKSPCP
KDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL
>P48050_KCNJ4_HUMAN
MHGHSRNGQAHVPRRKRRNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLM
IFSAAFLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNG
FLGAFLFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKK
RAQTLLFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQR
DLNVGYDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMT
TQARSSYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITV
LPAPPPPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFG
SHLDLERMQASLPLDNISYRRESAI
>Q80Z70_SE1L1_RAT
MQVRVRLLLLLCAVLLGSAAASSDEETNQDESLDSKGALPTDGSVKDHTTGKVVLLARDL
LILKDSEVESLLQDEEESSKSQEEVSVTEDISFLDSPNPSSKTYEELKRVRKPVLTAIEG
TAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEEDAAKRRQM
QEAEAIYQSGMKILNGSTRKNQKREAYRYLQKAAGMNHTKALERVSYALLFGDYLTQNIQ
AAKEMFEKLTEEGSPKGQTGLGFLYASGLGVNSSQAKALVYYTFGALGGNLIAHMVLGYR
YWAGIGVLQSCESALTHYRLVANHVASDISLTGGSVVQRIRLPDEVENPGMNSGMLEEDL
IQYYQFLAEKGDVQAQVGLGQLHLHGGRGVEQNHQRAFDYFNLAANAGNSHAMAFLGKMY
SEGSDIVPQSNETALHYFKKAADMGNPVGQSGLGMAYLYGRGVQVNYDLALKYFQKAAEQ
GWVDGQLQLGSMYYNGIGVKRDYKQALKYFNLASQGGHILAFYNLAQMHASGTGVMRSCH
TAVELFKNVCERGRWSERLMTAYNSYKDDDYNAAVVQYLLLAEQGYEVAQSNAAFILDQR
EATIVGENETYPRALLHWNRAASQGYTVARIKLGDYHFYGFGTDVDYETAFIHYRLASEQ
QHSAQAMFNLGYMHEKGLGIKQDIHLAKRFYDMAAEASPDAQVPVFLALCKLGVVYFLQY
IREANIRDLFTQLDMDQLLGPEWDLYLMTIIALLLGTVIAYRQRQHQDIPVPRPPGPRPA
PPQQEGPPEQQPPQ
>P33277_GAP1_SCHPO
MTKRHSGTLSSSVLPQTNRLSLLRNRESTSVLYTIDLDMESDVEDAFFHLDRELHDLKQQ
ISSQSKQNFVLERDVRYLDSKIALLIQNRMAQEEQHEFAKRLNDNYNAVKGSFPDDRKLQ
LYGALFFLLQSEPAYIASLVRRVKLFNMDALLQIVMFNIYGNQYESREEHLLLSLFQMVL
TTEFEATSDVLSLLRANTPVSRMLTTYTRRGPGQAYLRSILYQCINDVAIHPDLQLDIHP
LSVYRYLVNTGQLSPSEDDNLLTNEEVSEFPAVKNAIQERSAQLLLLTKRFLDAVLNSID
EIPYGIRWVCKLIRNLTNRLFPSISDSTICSLIGGFFFLRFVNPAIISPQTSMLLDSCPS
DNVRKTLATIAKIIQSVANGTSSTKTHLDVSFQPMLKEYEEKVHNLLRKLGNVGDFFEAL
ELDQYIALSKKSLALEMTVNEIYLTHEIILENLDNLYDPDSHVHLILQELGEPCKSVPQE
DNCLVTLPLYNRWDSSIPDLKQNLKVTREDILYVDAKTLFIQLLRLLPSGHPATRVPLDL
PLIADSVSSLKSMSLMKKGIRAIELLDELSTLRLVDKENRYEPLTSEVEKEFIDLDALYE
RIRAERDALQDVHRAICDHNEYLQTQLQIYGSYLNNARSQIKPSHSDSKGFSRGVGVVGI
KPKNIKSSNTVKLSSQQLKKESVLLNCTIPEFNVSNTYFTFSSPSTDNFVIAVYQRGHSK
VLVEVCICLDDVLQRRYASNPVVDLGFLTFEANKLYHLFEQLFLRK
>Q96PD4_IL17F_HUMAN
MTVKTLHGPAMVKYLLLSILGLAFLSEAAARKIPKVGHTFFQKPESCPPVPGGSMKLDIG
IINENQRVSMSRNIESRSTSPWNYTVTWDPNRYPSEVVQAQCRNLGCINAQGKEDISMNS
VPIQQETLVVRRKHQGCSVSFQLEKVLVTVGCTCVTPVIHHVQ
>P10144_GRAB_HUMAN
MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL
TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR
TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY
YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI
KKTMKRY
'''
# We will use the query sequence lengths later - just store these for now
query_sequence_details = {}
query_sequence_order = []
seq_counter = 0
for seq in query_sequence.split('>'):
    seq = seq.strip(' \n\t')
    # Skip the empty chunk before the first '>' character
    if len(seq) == 0:
        continue
    # First line of each chunk is the header, the rest is the sequence
    seq_header = seq.split('\n')[0].strip()
    seq_length = len(''.join(seq.split('\n')[1:]))
    seq_counter = seq_counter + 1
    query_sequence_details[seq_header] = {}
    query_sequence_details[seq_header]['seq_length'] = seq_length
    query_sequence_order.append(seq_header)
# Write test query sequence above to query_file location
with open(query_file, "w") as text_file:
    text_file.write(query_sequence)
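Because the comments in the cell above ask that header lines contain no whitespace, an edited query can be checked before it is written to disk. The helper below is a minimal sketch of such a check; `headers_with_whitespace` is a hypothetical name and is not part of the original notebook.

```python
def headers_with_whitespace(fasta_text):
    """Return the FASTA headers (without the leading '>') that contain whitespace."""
    bad_headers = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Ignore surrounding whitespace, flag only whitespace inside the header
            header = line[1:].strip()
            if any(ch.isspace() for ch in header):
                bad_headers.append(header)
    return bad_headers
```

Calling `headers_with_whitespace(query_sequence)` on the unedited tutorial sequences should return an empty list, since none of the headers shown above contain spaces.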
Now that we have defined some query parameters, we can run a BLAST search. The query below will execute the BLAST search and the raw BLAST output will be printed afterwards (note that the -num_descriptions and -num_alignments arguments are used to limit the size of the output in this online notebook).
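The cell that follows relies on the IPython `!` shell escape, which is not available in a plain Python script. A rough equivalent using the standard `subprocess` module might look like the sketch below; it assumes Python 3.7 or later, and `build_blastp_command` and `run_blastp` are hypothetical helper names. The command-line flags are the same ones used in the notebook cell.

```python
import subprocess


def build_blastp_command(blast_exe, query_file, database, evalue,
                         num_descriptions, num_alignments):
    """Assemble the blastp argument list from the tutorial parameters."""
    return [
        blast_exe,
        "-query", query_file,
        "-db", database,
        "-evalue", str(evalue),
        "-num_descriptions", str(num_descriptions),
        "-num_alignments", str(num_alignments),
    ]


def run_blastp(command):
    """Run the command and return its stdout; raises if blastp exits with an error."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

With the parameters defined earlier, `print(run_blastp(build_blastp_command(blast_exe, query_file, database, eval_threshold, num_descriptions, num_alignments)))` would print the same raw report, provided the blastp executable and database exist at those paths.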
# So let's try and run the 'raw' command-line version
!$blast_exe -query $query_file -db $database -evalue $eval_threshold -num_descriptions $num_descriptions -num_alignments $num_alignments
# Stdout should be printed below:
BLASTP 2.2.29+ Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Reference for composition-based statistics: Alejandro A. Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005. Database: chembl_21.fa 8,834 sequences; 5,161,060 total letters Query= Q96P68_OXGR1_HUMAN Length=337 Score E Sequences producing significant alignments: (Bits) Value CHEMBL2150840 [Q96P68] 2-oxoglutarate receptor 1 (Homo sapiens) 688 0.0 CHEMBL2325 [Q6Y1R5] 2-oxoglutarate receptor 1 (Rattus norvegicus) 579 0.0 CHEMBL4315 [P47900] P2Y purinoceptor 1 (Homo sapiens) 216 9e-67 CHEMBL5720 [P49652] P2Y purinoceptor 1 (Meleagris gallopavo) 215 1e-66 CHEMBL2497 [P49651] P2Y purinoceptor 1 (Rattus norvegicus) 215 3e-66 > CHEMBL2150840 [Q96P68] 2-oxoglutarate receptor 1 (Homo sapiens) Length=337 Score = 688 bits (1775), Expect = 0.0, Method: Compositional matrix adjust. 
Identities = 337/337 (100%), Positives = 337/337 (100%), Gaps = 0/337 (0%) Query 1 MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIF 60 MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIF Sbjct 1 MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIF 60 Query 61 KMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSS 120 KMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSS Sbjct 61 KMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSS 120 Query 121 ILFLTCFSIFRYCVIIHPMSCFSIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNR 180 ILFLTCFSIFRYCVIIHPMSCFSIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNR Sbjct 121 ILFLTCFSIFRYCVIIHPMSCFSIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNR 180 Query 181 SACLDLTSSDELNTIKWYNLILTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKAR 240 SACLDLTSSDELNTIKWYNLILTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKAR Sbjct 181 SACLDLTSSDELNTIKWYNLILTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKAR 240 Query 241 RLTILLLLAFYVCFLPFHILRVIRIESRLLSISCSIENQIHEAYIVSRPLAALNTFGNLL 300 RLTILLLLAFYVCFLPFHILRVIRIESRLLSISCSIENQIHEAYIVSRPLAALNTFGNLL Sbjct 241 RLTILLLLAFYVCFLPFHILRVIRIESRLLSISCSIENQIHEAYIVSRPLAALNTFGNLL 300 Query 301 LYVVVSDNFQQAVCSTVRCKVSGNLEQAKKISYSNNP 337 LYVVVSDNFQQAVCSTVRCKVSGNLEQAKKISYSNNP Sbjct 301 LYVVVSDNFQQAVCSTVRCKVSGNLEQAKKISYSNNP 337 > CHEMBL2325 [Q6Y1R5] 2-oxoglutarate receptor 1 (Rattus norvegicus) Length=337 Score = 579 bits (1492), Expect = 0.0, Method: Compositional matrix adjust. 
Identities = 289/337 (86%), Positives = 299/337 (89%), Gaps = 0/337 (0%) Query 1 MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIF 60 M E LD AN SDF DY A NCTDE I KM YLPVIY IIFLVGFPGN V IS Y+F Sbjct 1 MIETLDSPANDSDFLDYITALENCTDEQISFKMQYLPVIYSIIFLVGFPGNTVAISIYVF 60 Query 61 KMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSS 120 KMRPWKSSTIIMLNLA TDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRF FHFNLYSS Sbjct 61 KMRPWKSSTIIMLNLALTDLLYLTSLPFLIHYYASGENWIFGDFMCKFIRFGFHFNLYSS 120 Query 121 ILFLTCFSIFRYCVIIHPMSCFSIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNR 180 ILFLTCFS+FRY VIIHPMSCFSI KTR AVVACA VW+ISLVAV+PMTFLITST RTNR Sbjct 121 ILFLTCFSLFRYIVIIHPMSCFSIQKTRWAVVACAGVWVISLVAVMPMTFLITSTTRTNR 180 Query 181 SACLDLTSSDELNTIKWYNLILTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKAR 240 SACLDLTSSD+L TIKWYNLILTATTFCLPL+IVTLCYTTII TLTHG +T SC KQKAR Sbjct 181 SACLDLTSSDDLTTIKWYNLILTATTFCLPLLIVTLCYTTIISTLTHGPRTHSCFKQKAR 240 Query 241 RLTILLLLAFYVCFLPFHILRVIRIESRLLSISCSIENQIHEAYIVSRPLAALNTFGNLL 300 RLTILLLL FYVCFLPFHILRVIRIESRLLSISCSIE+ IHEAYIVSRPLAALNTFGNLL Sbjct 241 RLTILLLLVFYVCFLPFHILRVIRIESRLLSISCSIESHIHEAYIVSRPLAALNTFGNLL 300 Query 301 LYVVVSDNFQQAVCSTVRCKVSGNLEQAKKISYSNNP 337 LYVVVS+NFQQA CS VRCK G+LEQAKK S SNNP Sbjct 301 LYVVVSNNFQQAFCSAVRCKAIGDLEQAKKDSCSNNP 337 > CHEMBL4315 [P47900] P2Y purinoceptor 1 (Homo sapiens) Length=373 Score = 216 bits (550), Expect = 9e-67, Method: Compositional matrix adjust. 
Identities = 108/300 (36%), Positives = 176/300 (59%), Gaps = 4/300 (1%) Query 23 NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLY 82 C + +YLP +Y ++F++GF GN+V I ++F M+PW ++ M NLA D LY Sbjct 41 KCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLY 100 Query 83 LTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSSILFLTCFSIFRYCVIIHPMSCF 142 + +LP LI YY + +WIFGD MCK RF FH NLY SILFLTC S RY +++P+ Sbjct 101 VLTLPALIFYYFNKTDWIFGDAMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLKSL 160 Query 143 SIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNRS-ACLDLTSSDELNTIKWYNLI 201 K + A+ +VW+I +VA+ P+ F + R N++ C D TS + L + Y++ Sbjct 161 GRLKKKNAICISVLVWLIVVVAISPILFYSGTGVRKNKTITCYDTTSDEYLRSYFIYSMC 220 Query 202 LTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKARRLTILLLLAFYVCFLPFHILR 261 T FC+PLV++ CY I+ L + +S L++K+ L I++L F V ++PFH+++ Sbjct 221 TTVAMFCVPLVLILGCYGLIVRALIYKDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMK 280 Query 262 VIRIESRL---LSISCSIENQIHEAYIVSRPLAALNTFGNLLLYVVVSDNFQQAVCSTVR 318 + + +RL C+ ++++ Y V+R LA+LN+ + +LY + D F++ + R Sbjct 281 TMNLRARLDFQTPAMCAFNDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATR 340 > CHEMBL5720 [P49652] P2Y purinoceptor 1 (Meleagris gallopavo) Length=362 Score = 215 bits (548), Expect = 1e-66, Method: Compositional matrix adjust. 
Identities = 106/300 (35%), Positives = 174/300 (58%), Gaps = 4/300 (1%) Query 23 NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLY 82 C+ + +YLP +Y ++F+ GF GN+V I ++F MRPW ++ M NLA D LY Sbjct 30 KCSLTKTGFQFYYLPTVYILVFITGFLGNSVAIWMFVFHMRPWSGISVYMFNLALADFLY 89 Query 83 LTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSSILFLTCFSIFRYCVIIHPMSCF 142 + +LP LI YY + +WIFGD MCK RF FH NLY SILFLTC S+ RY ++HP+ Sbjct 90 VLTLPALIFYYFNKTDWIFGDVMCKLQRFIFHVNLYGSILFLTCISVHRYTGVVHPLKSL 149 Query 143 SIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNRS-ACLDLTSSDELNTIKWYNLI 201 K + AV ++VW + + + P+ F + R N++ C D T+ + L + Y++ Sbjct 150 GRLKKKNAVYVSSLVWALVVAVIAPILFYSGTGVRRNKTITCYDTTADEYLRSYFVYSMC 209 Query 202 LTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKARRLTILLLLAFYVCFLPFHILR 261 T FC+P +++ CY I+ L + +S L++K+ L I++L F V +LPFH+++ Sbjct 210 TTVFMFCIPFIVILGCYGLIVKALIYKDLDNSPLRRKSIYLVIIVLTVFAVSYLPFHVMK 269 Query 262 VIRIESRL---LSISCSIENQIHEAYIVSRPLAALNTFGNLLLYVVVSDNFQQAVCSTVR 318 + + +RL C+ ++++ Y V+R LA+LN+ + +LY + D F++ + R Sbjct 270 TLNLRARLDFQTPQMCAFNDKVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATR 329 > CHEMBL2497 [P49651] P2Y purinoceptor 1 (Rattus norvegicus) Length=373 Score = 215 bits (548), Expect = 3e-66, Method: Compositional matrix adjust. 
Identities = 108/300 (36%), Positives = 175/300 (58%), Gaps = 4/300 (1%) Query 23 NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLY 82 C + +YLP +Y ++F++GF GN+V I ++F M+PW ++ M NLA D LY Sbjct 41 RCALIKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLY 100 Query 83 LTSLPFLIHYYASGENWIFGDFMCKFIRFSFHFNLYSSILFLTCFSIFRYCVIIHPMSCF 142 + +LP LI YY + +WIFGD MCK RF FH NLY SILFLTC S RY +++P+ Sbjct 101 VLTLPALIFYYFNKTDWIFGDVMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLKSL 160 Query 143 SIHKTRCAVVACAVVWIISLVAVIPMTFLITSTNRTNRS-ACLDLTSSDELNTIKWYNLI 201 K + A+ +VW+I +VA+ P+ F + R N++ C D TS + L + Y++ Sbjct 161 GRLKKKNAIYVSVLVWLIVVVAISPILFYSGTGIRKNKTVTCYDSTSDEYLRSYFIYSMC 220 Query 202 LTATTFCLPLVIVTLCYTTIIHTLTHGLQTDSCLKQKARRLTILLLLAFYVCFLPFHILR 261 T FC+PLV++ CY I+ L + +S L++K+ L I++L F V ++PFH+++ Sbjct 221 TTVAMFCIPLVLILGCYGLIVRALIYKDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMK 280 Query 262 VIRIESRL---LSISCSIENQIHEAYIVSRPLAALNTFGNLLLYVVVSDNFQQAVCSTVR 318 + + +RL C ++++ Y V+R LA+LN+ + +LY + D F++ + R Sbjct 281 TMNLRARLDFQTPEMCDFNDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATR 340 Lambda K H a alpha 0.331 0.141 0.446 0.792 4.96 Gapped Lambda K H a alpha sigma 0.267 0.0410 0.140 1.90 42.6 43.6 Effective search space used: 1045882860 Query= Q86XF0_DHFRL1_HUMAN Length=187 Score E Sequences producing significant alignments: (Bits) Value CHEMBL202 [P00374] Dihydrofolate reductase (Homo sapiens) 352 9e-125 CHEMBL2363 [Q920D2] Dihydrofolate reductase (Rattus norvegicus) 336 2e-118 CHEMBL2097172 [Q920D2] Dihydrofolate reductase (Rattus norvegicus) 336 2e-118 CHEMBL2097173 [Q920D2] Dihydrofolate reductase (Rattus norvegicus) 336 2e-118 CHEMBL4564 [P00375] Dihydrofolate reductase (Mus musculus) 335 5e-118 > CHEMBL202 [P00374] Dihydrofolate reductase (Homo sapiens) Length=187 Score = 352 bits (904), Expect = 9e-125, Method: Compositional matrix adjust. 
Identities = 171/183 (93%), Positives = 176/183 (96%), Gaps = 0/183 (0%) Query 5 LNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64 LNCIVAVSQNMGIGKNGDLP PPLRNEFRYFQRMTTTSSVEGKQNLVIMG+KTWFSIPEK Sbjct 5 LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQNLVIMGKKTWFSIPEK 64 Query 65 NRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSSVYKE 124 NRPLK RINLVLSRELKEPPQGAHFL+RSLDDALKLTE+PELANKVDM+WIVGGSSVYKE Sbjct 65 NRPLKGRINLVLSRELKEPPQGAHFLSRSLDDALKLTEQPELANKVDMVWIVGGSSVYKE 124 Query 125 AMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKFEVCE 184 AMNH GHLKLFVTRIMQDFESDTFF EIDLEKYKLLPEYPGVLSDVQE K IKYKFEV E Sbjct 125 AMNHPGHLKLFVTRIMQDFESDTFFPEIDLEKYKLLPEYPGVLSDVQEEKGIKYKFEVYE 184 Query 185 KDD 187 K+D Sbjct 185 KND 187 > CHEMBL2363 [Q920D2] Dihydrofolate reductase (Rattus norvegicus) Length=187 Score = 336 bits (862), Expect = 2e-118, Method: Compositional matrix adjust. Identities = 161/183 (88%), Positives = 173/183 (95%), Gaps = 0/183 (0%) Query 5 LNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64 LNCIVAVSQNMGIGKNGDLP P LRNEF+YFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK Sbjct 5 LNCIVAVSQNMGIGKNGDLPWPLLRNEFKYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64 Query 65 NRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSSVYKE 124 NRPLKDRIN+VLSRELKEPPQGAHFLA+SLDDALKL E+PELA+KVDM+W+VGGSSVY+E Sbjct 65 NRPLKDRINIVLSRELKEPPQGAHFLAKSLDDALKLIEQPELASKVDMVWVVGGSSVYQE 124 Query 125 AMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKFEVCE 184 AMN GHL+LFVTRIMQ+FESDTFF EIDLEKYKLLPEYPGVLS++QE K IKYKFEV E Sbjct 125 AMNQPGHLRLFVTRIMQEFESDTFFPEIDLEKYKLLPEYPGVLSEIQEEKGIKYKFEVYE 184 Query 185 KDD 187 K D Sbjct 185 KKD 187 > CHEMBL2097172 [Q920D2] Dihydrofolate reductase (Rattus norvegicus) Length=187 Score = 336 bits (862), Expect = 2e-118, Method: Compositional matrix adjust. 
Identities = 161/183 (88%), Positives = 173/183 (95%), Gaps = 0/183 (0%) Query 5 LNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64 LNCIVAVSQNMGIGKNGDLP P LRNEF+YFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK Sbjct 5 LNCIVAVSQNMGIGKNGDLPWPLLRNEFKYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64 Query 65 NRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSSVYKE 124 NRPLKDRIN+VLSRELKEPPQGAHFLA+SLDDALKL E+PELA+KVDM+W+VGGSSVY+E Sbjct 65 NRPLKDRINIVLSRELKEPPQGAHFLAKSLDDALKLIEQPELASKVDMVWVVGGSSVYQE 124 Query 125 AMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKFEVCE 184 AMN GHL+LFVTRIMQ+FESDTFF EIDLEKYKLLPEYPGVLS++QE K IKYKFEV E Sbjct 125 AMNQPGHLRLFVTRIMQEFESDTFFPEIDLEKYKLLPEYPGVLSEIQEEKGIKYKFEVYE 184 Query 185 KDD 187 K D Sbjct 185 KKD 187 > CHEMBL2097173 [Q920D2] Dihydrofolate reductase (Rattus norvegicus) Length=187 Score = 336 bits (862), Expect = 2e-118, Method: Compositional matrix adjust. Identities = 161/183 (88%), Positives = 173/183 (95%), Gaps = 0/183 (0%) Query 5 LNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64 LNCIVAVSQNMGIGKNGDLP P LRNEF+YFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK Sbjct 5 LNCIVAVSQNMGIGKNGDLPWPLLRNEFKYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64 Query 65 NRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSSVYKE 124 NRPLKDRIN+VLSRELKEPPQGAHFLA+SLDDALKL E+PELA+KVDM+W+VGGSSVY+E Sbjct 65 NRPLKDRINIVLSRELKEPPQGAHFLAKSLDDALKLIEQPELASKVDMVWVVGGSSVYQE 124 Query 125 AMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKFEVCE 184 AMN GHL+LFVTRIMQ+FESDTFF EIDLEKYKLLPEYPGVLS++QE K IKYKFEV E Sbjct 125 AMNQPGHLRLFVTRIMQEFESDTFFPEIDLEKYKLLPEYPGVLSEIQEEKGIKYKFEVYE 184 Query 185 KDD 187 K D Sbjct 185 KKD 187 > CHEMBL4564 [P00375] Dihydrofolate reductase (Mus musculus) Length=187 Score = 335 bits (859), Expect = 5e-118, Method: Compositional matrix adjust. 
Identities = 161/183 (88%), Positives = 173/183 (95%), Gaps = 0/183 (0%) Query 5 LNCIVAVSQNMGIGKNGDLPRPPLRNEFRYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64 LNCIVAVSQNMGIGKNGDLP PPLRNEF+YFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK Sbjct 5 LNCIVAVSQNMGIGKNGDLPWPPLRNEFKYFQRMTTTSSVEGKQNLVIMGRKTWFSIPEK 64 Query 65 NRPLKDRINLVLSRELKEPPQGAHFLARSLDDALKLTERPELANKVDMIWIVGGSSVYKE 124 NRPLKDRIN+VLSRELKEPP+GAHFLA+SLDDAL+L E+PELA+KVDM+WIVGGSSVY+E Sbjct 65 NRPLKDRINIVLSRELKEPPRGAHFLAKSLDDALRLIEQPELASKVDMVWIVGGSSVYQE 124 Query 125 AMNHLGHLKLFVTRIMQDFESDTFFSEIDLEKYKLLPEYPGVLSDVQEGKHIKYKFEVCE 184 AMN GHL+LFVTRIMQ+FESDTFF EIDL KYKLLPEYPGVLS+VQE K IKYKFEV E Sbjct 125 AMNQPGHLRLFVTRIMQEFESDTFFPEIDLGKYKLLPEYPGVLSEVQEEKGIKYKFEVYE 184 Query 185 KDD 187 K D Sbjct 185 KKD 187 Lambda K H a alpha 0.320 0.138 0.409 0.792 4.96 Gapped Lambda K H a alpha sigma 0.267 0.0410 0.140 1.90 42.6 43.6 Effective search space used: 433983132 Query= Q9UKX5_ITGA11_HUMAN Length=1188 Score E Sequences producing significant alignments: (Bits) Value CHEMBL5883 [Q9UKX5] Integrin alpha-11 (Homo sapiens) 2473 0.0 CHEMBL5882 [O75578] Integrin alpha-10 (Homo sapiens) 919 0.0 CHEMBL3682 [P56199] Integrin alpha-1 (Homo sapiens) 825 0.0 CHEMBL3137278 [P56199] Integrin alpha-1 (Homo sapiens) 825 0.0 CHEMBL4998 [P17301] Integrin alpha-2 (Homo sapiens) 710 0.0 > CHEMBL5883 [Q9UKX5] Integrin alpha-11 (Homo sapiens) Length=1188 Score = 2473 bits (6409), Expect = 0.0, Method: Compositional matrix adjust. 
Identities = 1188/1188 (100%), Positives = 1188/1188 (100%), Gaps = 0/1188 (0%) Query 1 MDLPRGLVVAWALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGA 60 MDLPRGLVVAWALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGA Sbjct 1 MDLPRGLVVAWALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGA 60 Query 61 PLETNGYQKTGDVYKCPVIHGNCTKLNLGRVTLSNVSERKDNMRLGLSLATNPKDNSFLA 120 PLETNGYQKTGDVYKCPVIHGNCTKLNLGRVTLSNVSERKDNMRLGLSLATNPKDNSFLA Sbjct 61 PLETNGYQKTGDVYKCPVIHGNCTKLNLGRVTLSNVSERKDNMRLGLSLATNPKDNSFLA 120 Query 121 CSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWVE 180 CSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWVE Sbjct 121 CSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWVE 180 Query 181 VQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTET 240 VQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTET Sbjct 181 VQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTET 240 Query 241 RTAFGIEFARSEAFQKGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAVAVL 300 RTAFGIEFARSEAFQKGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAVAVL Sbjct 241 RTAFGIEFARSEAFQKGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAVAVL 300 Query 301 GYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGTNKNET 360 GYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGTNKNET Sbjct 301 GYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGTNKNET 360 Query 361 SFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPEELKN 420 SFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPEELKN Sbjct 361 SFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPEELKN 420 Query 421 HGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQIGSYF 480 HGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQIGSYF Sbjct 421 HGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQIGSYF 480 Query 481 GSEITSVDIDGDGVTDVLLVGAPMYFNEGRERGKVYVYELRQNLFVYNGTLKDSHSYQNA 540 GSEITSVDIDGDGVTDVLLVGAPMYFNEGRERGKVYVYELRQNLFVYNGTLKDSHSYQNA Sbjct 481 GSEITSVDIDGDGVTDVLLVGAPMYFNEGRERGKVYVYELRQNLFVYNGTLKDSHSYQNA 540 Query 541 
RFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRITASELATG 600 RFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRITASELATG Sbjct 541 RFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRITASELATG 600 Query 601 LQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKINIFHRDCK 660 LQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKINIFHRDCK Sbjct 601 LQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKINIFHRDCK 660 Query 661 RSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATMDERRYTPRAHLDEGGDRFTNRAV 720 RSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATMDERRYTPRAHLDEGGDRFTNRAV Sbjct 661 RSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATMDERRYTPRAHLDEGGDRFTNRAV 720 Query 721 LLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDPDHGPMLDDGWPTTLRVSVPFWNG 780 LLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDPDHGPMLDDGWPTTLRVSVPFWNG Sbjct 721 LLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDPDHGPMLDDGWPTTLRVSVPFWNG 780 Query 781 CNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFIIESTRQRVAV 840 CNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFIIESTRQRVAV Sbjct 781 CNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFIIESTRQRVAV 840 Query 841 EATLENRGENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQVCNVSYPFF 900 EATLENRGENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQVCNVSYPFF Sbjct 841 EATLENRGENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQVCNVSYPFF 900 Query 901 RAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHLKYEADVLFT 960 RAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHLKYEADVLFT Sbjct 901 RAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHLKYEADVLFT 960 Query 961 RSSSLSHYEVKPNSSLERYDGIGPPFSCIFRIQNLGLFPIHGMMMKITIPIATRSGNRLL 1020 RSSSLSHYEVKPNSSLERYDGIGPPFSCIFRIQNLGLFPIHGMMMKITIPIATRSGNRLL Sbjct 961 RSSSLSHYEVKPNSSLERYDGIGPPFSCIFRIQNLGLFPIHGMMMKITIPIATRSGNRLL 1020 Query 1021 KLRDFLTDEANTSCNIWGNSTEYRPTPVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEI 1080 KLRDFLTDEANTSCNIWGNSTEYRPTPVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEI Sbjct 1021 KLRDFLTDEANTSCNIWGNSTEYRPTPVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEI 1080 Query 1081 NFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIVFEISKQEDWQV 1140 
NFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIVFEISKQEDWQV Sbjct 1081 NFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIVFEISKQEDWQV 1140 Query 1141 PIWIIVGSTLGGLLLLALLVLALWKLGFFRSARRRREPGLDPTPKVLE 1188 PIWIIVGSTLGGLLLLALLVLALWKLGFFRSARRRREPGLDPTPKVLE Sbjct 1141 PIWIIVGSTLGGLLLLALLVLALWKLGFFRSARRRREPGLDPTPKVLE 1188 > CHEMBL5882 [O75578] Integrin alpha-10 (Homo sapiens) Length=1167 Score = 919 bits (2374), Expect = 0.0, Method: Compositional matrix adjust. Identities = 515/1180 (44%), Positives = 726/1180 (62%), Gaps = 41/1180 (3%) Query 1 MDLPRGLVVAWALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGA 60 M+LP + L G FN+D PR+ PG A FGY+V QH G +W++VGA Sbjct 1 MELPFVTHLFLPLVFLTGLCSPFNLDEHHPRLFPGPPEAEFGYSVLQHVGGGQRWMLVGA 60 Query 61 PLETNGYQKTGDVYKCPVIHGN---CTKLNLGRVTLSNVSERKDNMRLGLSLATNPKDNS 117 P + + GDVY+CPV + C K +LG L N S NM LG+SL D Sbjct 61 PWDGPSGDRRGDVYRCPVGGAHNAPCAKGHLGDYQLGNSSHPAVNMHLGMSLLETDGDGG 120 Query 118 FLACSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYP 177 F+AC+PLWS CGSS +++G+C+RV+++F+ ++AP QRC TYMD+VIVLDGSNSIYP Sbjct 121 FMACAPLWSRACGSSVFSSGICARVDASFQPQGSLAPTAQRCPTYMDVVIVLDGSNSIYP 180 Query 178 WVEVQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGG 237 W EVQ FL ++ K +I P QIQVG+VQYGE VHE+ L D+R+ ++VV AA ++ +R G Sbjct 181 WSEVQTFLRRLVGKLFIDPEQIQVGLVQYGESPVHEWSLGDFRTKEEVVRAAKNLSRREG 240 Query 238 TETRTAFGIEFARSEAFQK--GGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRY 295 ET+TA I A +E F + GGR A ++++V+TDGESHD +L ++ E VTRY Sbjct 241 RETKTAQAIMVACTEGFSQSHGGRPEAARLLVVVTDGESHDGEELPAALKACEAGRVTRY 300 Query 296 AVAVLGYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGT 355 +AVLG+Y RR +P +FL EI+ IASDPD++ FFNVTDEAAL DIVDALGDRIF LEG+ Sbjct 301 GIAVLGHYLRRQRDPSSFLREIRTIASDPDERFFFNVTDEAALTDIVDALGDRIFGLEGS 360 Query 356 N-KNETSFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEF 414 + +NE+SFGLEMSQ GFS+H ++DG+L G VGAYDW G+VL ++ P R + EF Sbjct 361 HAENESSFGLEMSQIGFSTHRLKDGILFGMVGAYDWGGSVLWLEGGHRLFPPRMALEDEF 420 Query 415 
PEELKNHGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQ 474 P L+NH AYLGY+V+S++ R++++GAPRF H GKVI F + + ++ + Q+++G+ Sbjct 421 PPALQNHAAYLGYSVSSMLLRGGRRLFLSGAPRFRHRGKVIAFQLKKDGAVRVAQSLQGE 480 Query 475 QIGSYFGSEITSVDIDGDGVTDVLLVGAPMYFN-EGRERGKVYVYEL-RQNLFVYNGTLK 532 QIGSYFGSE+ +D D DG TDVLLV APM+ + +E G+VYVY + +Q+L GTL+ Sbjct 481 QIGSYFGSELCPLDTDRDGTTDVLLVAAPMFLGPQNKETGRVYVYLVGQQSLLTLQGTLQ 540 Query 533 DSHSYQNARFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRI 592 Q+ARFG ++ ++ DLNQD + DV VGAPLED H GA+Y++HG + + P QRI Sbjct 541 PEPP-QDARFGFAMGALPDLNQDGFADVAVGAPLEDGHQGALYLYHGTQSGVRPHPAQRI 599 Query 593 TASELATGLQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKI 652 A+ + L YFG S+ G+LDL+ D L+D+AVGA G A++L SRP+V + SL P I Sbjct 600 AAASMPHALSYFGRSVDGRLDLDGDDLVDVAVGAQGAAILLSSRPIVHLTPSLEVTPQAI 659 Query 653 NIFHRDCKRSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATMDERRYTPRAHLDEGG 712 ++ RDC+R G++A CL A LCF P +R+ A++DE RA D G Sbjct 660 SVVQRDCRRRGQEAVCLTAALCFQVTSRTPGRWDHQFYMRFTASLDEWTAGARAAFDGSG 719 Query 713 DRFTNRAVLLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDPDH-GPMLDDGWPTTL 771 R + R + LS G CE+++FHVLDT+DY++PV +V ++L++ GP+L++G PT++ Sbjct 720 QRLSPRRLRLSVGNVTCEQLHFHVLDTSDYLRPVALTVTFALDNTTKPGPVLNEGSPTSI 779 Query 772 RVSVPFWNGCNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFII 831 + VPF C D CV DLVL D+ R RK F++ Sbjct 780 QKLVPFSKDCGPDNECVTDLVLQVNMDI--------RGSRK--------------APFVV 817 Query 832 ESTRQRVAVEATLENRGENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQ 891 R++V V TLENR ENAY+T L++ S NL ASL + +S +EC + Sbjct 818 RGGRRKVLVSTTLENRKENAYNTSLSLIFSRNLHLASLTPQRESPIKVECAAPSA--HAR 875 Query 892 VCNVSYPFFRAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHL 951 +C+V +P F+ AKV F L+FEFS S L + ++L A SDS ER+ T +DN A ++ Sbjct 876 LCSVGHPVFQTGAKVTFLLEFEFSCSSLLSQVFVKLTASSDSLERNGTLQDNTAQTSAYI 935 Query 952 KYEADVLFTRSSSLSHYEVKPNSSLERYDGIGPPFSCIFRIQNLGLFPIHGMMMKITIPI 1011 +YE +LF+ S+L YEV P +L G GP F R+QNLG + + G+++ +P Sbjct 936 QYEPHLLFSSESTLHRYEVHPYGTLPV--GPGPEFKTTLRVQNLGCYVVSGLIISALLPA 993 Query 1012 
ATRSGNRLLKLRDFLTDEANTSCNIWGNSTEYRPTPVE-EDLRRAPQLNHSNSDVVSINC 1070 GN L L +T+ N SC I N TE PV E+L+ +LN SN+ + C Sbjct 994 VAHGGNYFLSLSQVITN--NASC-IVQNLTEPPGPPVHPEELQHTNRLNGSNTQCQVVRC 1050 Query 1071 NI-RLVPNQEINFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIV 1129 ++ +L E++ LL + + K+KS+ ++ L + S E + + Sbjct 1051 HLGQLAKGTEVSVGLLRLVHNEFFRRAKFKSLTVVSTFELGTEEGSVLQLTEASRWSESL 1110 Query 1130 FEISKQEDWQVPIWIIVGSTLGGLLLLALLVLALWKLGFF 1169 E+ + + +WI++GS LGGLLLLALLV LWKLGFF Sbjct 1111 LEVVQTRPILISLWILIGSVLGGLLLLALLVFCLWKLGFF 1150 > CHEMBL3682 [P56199] Integrin alpha-1 (Homo sapiens) Length=1179 Score = 825 bits (2131), Expect = 0.0, Method: Compositional matrix adjust. Identities = 464/1211 (38%), Positives = 704/1211 (58%), Gaps = 84/1211 (7%) Query 6 GLVVA--WALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGAPLE 63 G+ VA W L++ +FN+D + G FGYTVQQ++ KW+++G+PL Sbjct 10 GVAVACCWLLTVVLRCCVSFNVDVKNSMTFSGPVEDMFGYTVQQYENEEGKWVLIGSPLV 69 Query 64 TNGYQKTGDVYKCPVIHGN---CTKLNLG-RVTLSNVSERKDNMRLGLSLATNPKDNSFL 119 +TGDVYKCPV G C KL+L ++ NV+E K+NM G +L TNP + FL Sbjct 70 GQPKNRTGDVYKCPVGRGESLPCVKLDLPVNTSIPNVTEVKENMTFGSTLVTNP-NGGFL 128 Query 120 ACSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWV 179 AC PL+++ CG +YTTG+CS V+ F+ ++AP +Q C T +DIVIVLDGSNSIYPW Sbjct 129 ACGPLYAYRCGHLHYTTGICSDVSPTFQVVNSIAP-VQECSTQLDIVIVLDGSNSIYPWD 187 Query 180 EVQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTE 239 V FL ++L++ IGP Q QVG+VQYGE+V HEF+LN Y S ++V+ AA I QRGG + Sbjct 188 SVTAFLNDLLERMDIGPKQTQVGIVQYGENVTHEFNLNKYSSTEEVLVAAKKIVQRGGRQ 247 Query 240 TRTAFGIEFARSEAFQ--KGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAV 297 T TA GI+ AR EAF +G R+G KKVM+++TDGESHD+ L+KVIQ E +N+ R+++ Sbjct 248 TMTALGIDTARKEAFTEARGARRGVKKVMVIVTDGESHDNHRLKKVIQDCEDENIQRFSI 307 Query 298 AVLGYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGT-N 356 A+LG YNR ++ E F+ EIK IAS+P +KHFFNV+DE AL IV LG+RIF+LE T + Sbjct 308 AILGSYNRGNLSTEKFVEEIKSIASEPTEKHFFNVSDELALVTIVKTLGERIFALEATAD 367 Query 357 
KNETSFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPE 416 ++ SF +EMSQTGFS+H +D V+LGAVGAYDWNG V+ + ++ +IP ++ E + Sbjct 368 QSAASFEMEMSQTGFSAHYSQDWVMLGAVGAYDWNGTVVMQKASQIIIPRNTTFNVESTK 427 Query 417 ELKNHGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQI 476 + + +YLGYTV S +S +Y+AG PR+NHTG+VI++ M + ++ I Q + G+QI Sbjct 428 KNEPLASYLGYTVNSATASSGDVLYIAGQPRYNHTGQVIIYRMEDG-NIKILQTLSGEQI 486 Query 477 GSYFGSEITSVDIDGDGVTDVLLVGAPMYF-NEGRERGKVYVYELRQNLFVYNGTLK--- 532 GSYFGS +T+ DID D TD+LLVGAPMY E E+GKVYVY L Q F Y +L+ Sbjct 487 GSYFGSILTTTDIDKDSNTDILLVGAPMYMGTEKEEQGKVYVYALNQTRFEYQMSLEPIK 546 Query 533 ---------DSHSYQN------ARFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIF 577 +S + +N ARFG++IA+V+DLN D +ND+V+GAPLED+H GA+YI+ Sbjct 547 QTCCSSRQHNSCTTENKNEPCGARFGTAIAAVKDLNLDGFNDIVIGAPLEDDHGGAVYIY 606 Query 578 HGFRGSILKTPKQRITASELATGLQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRP 637 HG +I K QRI + L++FG SIHG++DLN DGL D+ +G LG A + WSR Sbjct 607 HGSGKTIRKEYAQRIPSGGDGKTLKFFGQSIHGEMDLNGDGLTDVTIGGLGGAALFWSRD 666 Query 638 VVQINASLHFEPSKINIFHRDCKRSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATM 697 V + +++FEP+K+NI ++C G++ C+ A +CF + ++Y T+ Sbjct 667 VAVVKVTMNFEPNKVNIQKKNCHMEGKETVCINATVCFDVKLKSKEDTIYEADLQYRVTL 726 Query 698 DERRYTPRAHLDEGGDRFTNRAVLLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDP 757 D R R+ +R R + + + C + +F++LD D+ V +++++L DP Sbjct 727 DSLRQISRSFFSGTQERKVQRNITVRKSE--CTKHSFYMLDKHDFQDSVRITLDFNLTDP 784 Query 758 DHGPMLDDGWPTTLRVSVPFWNGCNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDC 817 ++GP+LDD P ++ +PF C E C+ DL L Sbjct 785 ENGPVLDDSLPNSVHEYIPFAKDCGNKEKCISDLSL------------------------ 820 Query 818 SAYTLSFDTTVFIIESTRQRVAVEATLENRGENAYSTVLNISQSANLQFASL--IQKEDS 875 + + + + I+ S + V T++N ++AY+T + S NL F+ + IQK+ Sbjct 821 --HVATTEKDLLIVRSQNDKFNVSLTVKNTKDSAYNTRTIVHYSPNLVFSGIEAIQKDSC 878 Query 876 DGSIECVNEERRLQKQVCNVSYPFFRAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNE 935 + + C V YPF R V F++ F+F+ S + ++ I L+A SDS E Sbjct 879 ESN----------HNITCKVGYPFLRRGEMVTFKILFQFNTSYLMENVTIYLSATSDSEE 928 Query 936 
RDSTKEDNVAPLRFHLKYEADVLFTRSSSLSHYEVKPNSS----LERYDGIGPPFSCIFR 991 T DNV + +KYE + F S+S H + N + + + IG + + Sbjct 929 PPETLSDNVVNISIPVKYEVGLQFYSSASEYHISIAANETVPEVINSTEDIGNEINIFYL 988 Query 992 IQNLGLFPIHGMMMKITIPIATRSGNRLLKLRDFLTDE-ANTSCNIWGN----STEYRPT 1046 I+ G FP+ + + I+ P T +G +L + E AN +I+ + ++ + T Sbjct 989 IRKSGSFPMPELKLSISFPNMTSNGYPVLYPTGLSSSENANCRPHIFEDPFSINSGKKMT 1048 Query 1047 PVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEINFHLLGNLWLRSLKALKYKSMKIMVN 1106 + L+R L+ + +I CN+ ++N L+ LW + + S+ + + Sbjct 1049 TSTDHLKRGTILDCNTCKFATITCNLTSSDISQVNVSLI--LWKPTFIKSYFSSLNLTIR 1106 Query 1107 AALQRQFHSPFIFREEDPSRQIVFEISKQE-DWQVPIWIIVGSTLGGLLLLALLVLALWK 1165 L R ++ + + R++ +ISK +VP+W+I+ S GLLLL LL+LALWK Sbjct 1107 GEL-RSENASLVLSSSNQKRELAIQISKDGLPGRVPLWVILLSAFAGLLLLMLLILALWK 1165 Query 1166 LGFFRSARRRR 1176 +GFF+ +++ Sbjct 1166 IGFFKRPLKKK 1176 > CHEMBL3137278 [P56199] Integrin alpha-1 (Homo sapiens) Length=1179 Score = 825 bits (2131), Expect = 0.0, Method: Compositional matrix adjust. Identities = 464/1211 (38%), Positives = 704/1211 (58%), Gaps = 84/1211 (7%) Query 6 GLVVA--WALSLWPGFTDTFNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGAPLE 63 G+ VA W L++ +FN+D + G FGYTVQQ++ KW+++G+PL Sbjct 10 GVAVACCWLLTVVLRCCVSFNVDVKNSMTFSGPVEDMFGYTVQQYENEEGKWVLIGSPLV 69 Query 64 TNGYQKTGDVYKCPVIHGN---CTKLNLG-RVTLSNVSERKDNMRLGLSLATNPKDNSFL 119 +TGDVYKCPV G C KL+L ++ NV+E K+NM G +L TNP + FL Sbjct 70 GQPKNRTGDVYKCPVGRGESLPCVKLDLPVNTSIPNVTEVKENMTFGSTLVTNP-NGGFL 128 Query 120 ACSPLWSHECGSSYYTTGMCSRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWV 179 AC PL+++ CG +YTTG+CS V+ F+ ++AP +Q C T +DIVIVLDGSNSIYPW Sbjct 129 ACGPLYAYRCGHLHYTTGICSDVSPTFQVVNSIAP-VQECSTQLDIVIVLDGSNSIYPWD 187 Query 180 EVQHFLINILKKFYIGPGQIQVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTE 239 V FL ++L++ IGP Q QVG+VQYGE+V HEF+LN Y S ++V+ AA I QRGG + Sbjct 188 SVTAFLNDLLERMDIGPKQTQVGIVQYGENVTHEFNLNKYSSTEEVLVAAKKIVQRGGRQ 247 Query 240 TRTAFGIEFARSEAFQ--KGGRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAV 297 T TA GI+ AR EAF +G R+G KKVM+++TDGESHD+ L+KVIQ E +N+ R+++ Sbjct 248 
TMTALGIDTARKEAFTEARGARRGVKKVMVIVTDGESHDNHRLKKVIQDCEDENIQRFSI 307 Query 298 AVLGYYNRRGINPETFLNEIKYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGT-N 356 A+LG YNR ++ E F+ EIK IAS+P +KHFFNV+DE AL IV LG+RIF+LE T + Sbjct 308 AILGSYNRGNLSTEKFVEEIKSIASEPTEKHFFNVSDELALVTIVKTLGERIFALEATAD 367 Query 357 KNETSFGLEMSQTGFSSHVVEDGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPE 416 ++ SF +EMSQTGFS+H +D V+LGAVGAYDWNG V+ + ++ +IP ++ E + Sbjct 368 QSAASFEMEMSQTGFSAHYSQDWVMLGAVGAYDWNGTVVMQKASQIIIPRNTTFNVESTK 427 Query 417 ELKNHGAYLGYTVTSVVSSRQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQI 476 + + +YLGYTV S +S +Y+AG PR+NHTG+VI++ M + ++ I Q + G+QI Sbjct 428 KNEPLASYLGYTVNSATASSGDVLYIAGQPRYNHTGQVIIYRMEDG-NIKILQTLSGEQI 486 Query 477 GSYFGSEITSVDIDGDGVTDVLLVGAPMYF-NEGRERGKVYVYELRQNLFVYNGTLK--- 532 GSYFGS +T+ DID D TD+LLVGAPMY E E+GKVYVY L Q F Y +L+ Sbjct 487 GSYFGSILTTTDIDKDSNTDILLVGAPMYMGTEKEEQGKVYVYALNQTRFEYQMSLEPIK 546 Query 533 ---------DSHSYQN------ARFGSSIASVRDLNQDSYNDVVVGAPLEDNHAGAIYIF 577 +S + +N ARFG++IA+V+DLN D +ND+V+GAPLED+H GA+YI+ Sbjct 547 QTCCSSRQHNSCTTENKNEPCGARFGTAIAAVKDLNLDGFNDIVIGAPLEDDHGGAVYIY 606 Query 578 HGFRGSILKTPKQRITASELATGLQYFGCSIHGQLDLNEDGLIDLAVGALGNAVILWSRP 637 HG +I K QRI + L++FG SIHG++DLN DGL D+ +G LG A + WSR Sbjct 607 HGSGKTIRKEYAQRIPSGGDGKTLKFFGQSIHGEMDLNGDGLTDVTIGGLGGAALFWSRD 666 Query 638 VVQINASLHFEPSKINIFHRDCKRSGRDATCLAAFLCFTPIFLAPHFQTTTVGIRYNATM 697 V + +++FEP+K+NI ++C G++ C+ A +CF + ++Y T+ Sbjct 667 VAVVKVTMNFEPNKVNIQKKNCHMEGKETVCINATVCFDVKLKSKEDTIYEADLQYRVTL 726 Query 698 DERRYTPRAHLDEGGDRFTNRAVLLSSGQELCERINFHVLDTADYVKPVTFSVEYSLEDP 757 D R R+ +R R + + + C + +F++LD D+ V +++++L DP Sbjct 727 DSLRQISRSFFSGTQERKVQRNITVRKSE--CTKHSFYMLDKHDFQDSVRITLDFNLTDP 784 Query 758 DHGPMLDDGWPTTLRVSVPFWNGCNEDEHCVPDLVLDARSDLPTAMEYCQRVLRKPAQDC 817 ++GP+LDD P ++ +PF C E C+ DL L Sbjct 785 ENGPVLDDSLPNSVHEYIPFAKDCGNKEKCISDLSL------------------------ 820 Query 818 SAYTLSFDTTVFIIESTRQRVAVEATLENRGENAYSTVLNISQSANLQFASL--IQKEDS 875 + + + + I+ S + V T++N ++AY+T + S NL F+ + IQK+ Sbjct 821 
--HVATTEKDLLIVRSQNDKFNVSLTVKNTKDSAYNTRTIVHYSPNLVFSGIEAIQKDSC 878 Query 876 DGSIECVNEERRLQKQVCNVSYPFFRAKAKVAFRLDFEFSKSIFLHHLEIELAAGSDSNE 935 + + C V YPF R V F++ F+F+ S + ++ I L+A SDS E Sbjct 879 ESN----------HNITCKVGYPFLRRGEMVTFKILFQFNTSYLMENVTIYLSATSDSEE 928 Query 936 RDSTKEDNVAPLRFHLKYEADVLFTRSSSLSHYEVKPNSS----LERYDGIGPPFSCIFR 991 T DNV + +KYE + F S+S H + N + + + IG + + Sbjct 929 PPETLSDNVVNISIPVKYEVGLQFYSSASEYHISIAANETVPEVINSTEDIGNEINIFYL 988 Query 992 IQNLGLFPIHGMMMKITIPIATRSGNRLLKLRDFLTDE-ANTSCNIWGN----STEYRPT 1046 I+ G FP+ + + I+ P T +G +L + E AN +I+ + ++ + T Sbjct 989 IRKSGSFPMPELKLSISFPNMTSNGYPVLYPTGLSSSENANCRPHIFEDPFSINSGKKMT 1048 Query 1047 PVEEDLRRAPQLNHSNSDVVSINCNIRLVPNQEINFHLLGNLWLRSLKALKYKSMKIMVN 1106 + L+R L+ + +I CN+ ++N L+ LW + + S+ + + Sbjct 1049 TSTDHLKRGTILDCNTCKFATITCNLTSSDISQVNVSLI--LWKPTFIKSYFSSLNLTIR 1106 Query 1107 AALQRQFHSPFIFREEDPSRQIVFEISKQE-DWQVPIWIIVGSTLGGLLLLALLVLALWK 1165 L R ++ + + R++ +ISK +VP+W+I+ S GLLLL LL+LALWK Sbjct 1107 GEL-RSENASLVLSSSNQKRELAIQISKDGLPGRVPLWVILLSAFAGLLLLMLLILALWK 1165 Query 1166 LGFFRSARRRR 1176 +GFF+ +++ Sbjct 1166 IGFFKRPLKKK 1176 > CHEMBL4998 [P17301] Integrin alpha-2 (Homo sapiens) Length=1181 Score = 710 bits (1833), Expect = 0.0, Method: Compositional matrix adjust. 
Identities = 428/1194 (36%), Positives = 675/1194 (57%), Gaps = 73/1194 (6%) Query 23 FNMDTRKPRVIPGSRTAFFGYTVQQHDISGNKWLVVGAPLETNGYQKTGDVYKCPV--IH 80 +N+ + ++ G + FGY VQQ WL+VG+P + GDVYKCPV Sbjct 30 YNVGLPEAKIFSGPSSEQFGYAVQQFINPKGNWLLVGSPWSGFPENRMGDVYKCPVDLST 89 Query 81 GNCTKLNLGRVT-LSNVSERKDNMRLGLSLATNPKDNSFLACSPLWSHECGSSYYTTGMC 139 C KLNL T + NV+E K NM LGL L N FL C PLW+ +CG+ YYTTG+C Sbjct 90 ATCEKLNLQTSTSIPNVTEMKTNMSLGLILTRNMGTGGFLTCGPLWAQQCGNQYYTTGVC 149 Query 140 SRVNSNFRFSKTVAPALQRCQTYMDIVIVLDGSNSIYPWVEVQHFLINILKKFYIGPGQI 199 S ++ +F+ S + +PA Q C + +D+V+V D SNSIYPW V++FL ++ IGP + Sbjct 150 SDISPDFQLSASFSPATQPCPSLIDVVVVCDESNSIYPWDAVKNFLEKFVQGLDIGPTKT 209 Query 200 QVGVVQYGEDVVHEFHLNDYRSVKDVVEAASHIEQRGGTETRTAFGIEFARSEAFQ--KG 257 QVG++QY + F+LN Y++ ++++ A S Q GG T T I++AR A+ G Sbjct 210 QVGLIQYANNPRVVFNLNTYKTKEEMIVATSQTSQYGGDLTNTFGAIQYARKYAYSAASG 269 Query 258 GRKGAKKVMIVITDGESHDSPDLEKVIQQSERDNVTRYAVAVLGYYNRRGINPETFLNEI 317 GR+ A KVM+V+TDGESHD L+ VI Q DN+ R+ +AVLGY NR ++ + + EI Sbjct 270 GRRSATKVMVVVTDGESHDGSMLKAVIDQCNHDNILRFGIAVLGYLNRNALDTKNLIKEI 329 Query 318 KYIASDPDDKHFFNVTDEAALKDIVDALGDRIFSLEGTNKNETSFGLEMSQTGFSSHVVE 377 K IAS P +++FFNV+DEAAL + LG++IFS+EGT + +F +EMSQ GFS+ Sbjct 330 KAIASIPTERYFFNVSDEAALLEKAGTLGEQIFSIEGTVQGGDNFQMEMSQVGFSADYSS 389 Query 378 --DGVLLGAVGAYDWNGAVLKETSAGKVIPLRESYLKEFPEELKNHGAYLGYTVTSVVSS 435 D ++LGAVGA+ W+G ++++TS G +I ++++ + + +NH +YLGY+V + +S+ Sbjct 390 QNDILMLGAVGAFGWSGTIVQKTSHGHLIFPKQAFDQILQD--RNHSSYLGYSV-AAIST 446 Query 436 RQGRVYVAGAPRFNHTGKVILFTMHNNRSLTIHQAMRGQQIGSYFGSEITSVDIDGDGVT 495 + +VAGAPR N+TG+++L++++ N ++T+ QA RG QIGSYFGS + SVD+D D +T Sbjct 447 GESTHFVAGAPRANYTGQIVLYSVNENGNITVIQAHRGDQIGSYFGSVLCSVDVDKDTIT 506 Query 496 DVLLVGAPMYFNE-GRERGKVYVYELRQNLFVYNGTLKDSHSYQNARFGSSIASVRDLNQ 554 DVLLVGAPMY ++ +E G+VY++ +++ + + L+ +N RFGS+IA++ D+N Sbjct 507 DVLLVGAPMYMSDLKKEEGRVYLFTIKKGILGQHQFLEGPEGIENTRFGSAIAALSDINM 566 Query 555 DSYNDVVVGAPLEDNHAGAIYIFHGFRGSILKTPKQRITASELA--TGLQYFGCSIHGQL 612 D +NDV+VG+PLE+ ++GA+YI++G +G+I Q+I 
S+ A + LQYFG S+ G Sbjct 567 DGFNDVIVGSPLENQNSGAVYIYNGHQGTIRTKYSQKILGSDGAFRSHLQYFGRSLDGYG 626 Query 613 DLNEDGLIDLAVGALGNAVILWSRPVVQINASLHFEPSKINIFHRDCKRSGRDATCLAAF 672 DLN D + D+++GA G V LWS+ + + F P KI + +++ + + Sbjct 627 DLNGDSITDVSIGAFGQVVQLWSQSIADVAIEASFTPEKITLVNKNAQ--------IILK 678 Query 673 LCFTPIFLAPHFQTTTVGIRYNATMD----ERRYTPRAHLDEGGDRFTNRAVLLSSGQEL 728 LCF+ F P Q V I YN T+D R T R E +R + ++++ Q Sbjct 679 LCFSAKF-RPTKQNNQVAIVYNITLDADGFSSRVTSRGLFKENNERCLQKNMVVNQAQSC 737 Query 729 CERINFHVLDTADYVKPVTFSVEYSLEDPDHGPMLDDGWPTTLRVSVPFWNGCNEDEHCV 788 E I ++ + +D V + V+ SLE+P P L+ T S+PF C ED C+ Sbjct 738 PEHI-IYIQEPSDVVNSLDLRVDISLENPGTSPALEAYSETAKVFSIPFHKDCGEDGLCI 796 Query 789 PDLVLDARSDLPTAMEYCQRVLRKPAQDCSAYTLSFDTTVFIIESTRQRVAVEATLENRG 848 DLVLD R +P A E +P FI+ + +R+ TL+N+ Sbjct 797 SDLVLDVR-QIPAAQE-------QP---------------FIVSNQNKRLTFSVTLKNKR 833 Query 849 ENAYSTVLNISQSANLQFASLIQKEDSDGSIECVNEERRLQKQV-CNVSYPFFRAKAKVA 907 E+AY+T + + S NL FAS D E + QK V C+V YP + + +V Sbjct 834 ESAYNTGIVVDFSENLFFASFSLPVD---GTEVTCQVAASQKSVACDVGYPALKREQQVT 890 Query 908 FRLDFEFSKSIFLHHLEIELAAGSDSNERDSTKEDNVAPLRFHLKYEADVLFTRSSSLSH 967 F ++F+F+ + + A S+S E + K DN+ L+ L Y+A++ TRS++++ Sbjct 891 FTINFDFNLQNLQNQASLSFQALSESQEEN--KADNLVNLKIPLLYDAEIHLTRSTNINF 948 Query 968 YEVKPN----SSLERYDGIGPPFSCIFRIQ-NLGLFPIHGMMMKITIPIATRSGNRLLKL 1022 YE+ + S + ++ +GP F IF ++ G P+ + I IP T+ N L+ L Sbjct 949 YEISSDGNVPSIVHSFEDVGPKF--IFSLKVTTGSVPVSMATVIIHIPQYTKEKNPLMYL 1006 Query 1023 RDFLTDEA-NTSCNIWGNSTEYRPTPV-----EEDLRRAPQLNHSNSDVVSINCNIRLVP 1076 TD+A + SCN N + T E+ R +LN + ++ C ++ V Sbjct 1007 TGVQTDKAGDISCNADINPLKIGQTSSSVSFKSENFRHTKELNCRTASCSNVTCWLKDVH 1066 Query 1077 NQ-EINFHLLGNLWLRSLKALKYKSMKIMVNAALQRQFHSPFIFREEDPSRQIVFEISK- 1134 + E ++ +W + + ++++++ AA + ++P I+ ED + I I K Sbjct 1067 MKGEYFVNVTTRIWNGTFASSTFQTVQL--TAAAEINTYNPEIYVIEDNTVTIPLMIMKP 1124 Query 1135 QEDWQVPIWIIVGSTLGGLLLLALLVLALWKLGFFRSARRRREPGLDPTPKVLE 1188 E +VP +I+GS + G+LLL LV LWKLGFF+ + D + E Sbjct 1125 
DEKAEVPTGVIIGSIIAGILLLLALVAILWKLGFFKRKYEKMTKNPDEIDETTE 1178

Lambda      K        H        a         alpha
   0.320    0.136    0.410    0.792     4.96

Gapped
Lambda      K        H        a         alpha    sigma
   0.267   0.0410    0.140     1.90     42.6     43.6

Effective search space used: 4584869670

Query= P06804_TNFA_MOUSE

Length=235
                                                                    Score   E
Sequences producing significant alignments:                         (Bits)  Value

CHEMBL4984 [P06804] Tumor necrosis factor (Mus musculus)            483     6e-175
CHEMBL1825 [P01375] Tumor necrosis factor (Homo sapiens)            379     9e-134
CHEMBL2059 [P01374] Lymphotoxin-alpha (Homo sapiens)                88.6    2e-21
CHEMBL5714 [P48023] Tumor necrosis factor ligand superfamily me...  59.3    8e-11
CHEMBL2364162 [O14788] Tumor necrosis factor ligand superfamily...  52.8    1e-08

> CHEMBL4984 [P06804] Tumor necrosis factor (Mus musculus)
Length=235

 Score = 483 bits (1244), Expect = 6e-175, Method: Compositional matrix adjust.
 Identities = 235/235 (100%), Positives = 235/235 (100%), Gaps = 0/235 (0%)

Query  1    MSTESMIRDVELAEEALPQKMGGFQNSRRCLCLSLFSFLLVAGATTLFCLLNFGVIGPQR  60
            MSTESMIRDVELAEEALPQKMGGFQNSRRCLCLSLFSFLLVAGATTLFCLLNFGVIGPQR
Sbjct  1    MSTESMIRDVELAEEALPQKMGGFQNSRRCLCLSLFSFLLVAGATTLFCLLNFGVIGPQR  60

Query  61   DEKFPNGLPLISSMAQTLTLRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGM  120
            DEKFPNGLPLISSMAQTLTLRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGM
Sbjct  61   DEKFPNGLPLISSMAQTLTLRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGM  120

Query  121  DLKDNQLVVPADGLYLVYSQVLFKGQGCPDYVLLTHTVSRFAISYQEKVNLLSAVKSPCP  180
            DLKDNQLVVPADGLYLVYSQVLFKGQGCPDYVLLTHTVSRFAISYQEKVNLLSAVKSPCP
Sbjct  121  DLKDNQLVVPADGLYLVYSQVLFKGQGCPDYVLLTHTVSRFAISYQEKVNLLSAVKSPCP  180

Query  181  KDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL  235
            KDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL
Sbjct  181  KDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL  235

> CHEMBL1825 [P01375] Tumor necrosis factor (Homo sapiens)
Length=233

 Score = 379 bits (973), Expect = 9e-134, Method: Compositional matrix adjust.
Identities = 186/236 (79%), Positives = 211/236 (89%), Gaps = 4/236 (2%) Query 1 MSTESMIRDVELAEEALPQKMGGFQNSRRCLCLSLFSFLLVAGATTLFCLLNFGVIGPQR 60 MSTESMIRDVELAEEALP+K GG Q SRRCL LSLFSFL+VAGATTLFCLL+FGVIGPQR Sbjct 1 MSTESMIRDVELAEEALPKKTGGPQGSRRCLFLSLFSFLIVAGATTLFCLLHFGVIGPQR 60 Query 61 DEKFPNGLPLISSMAQTLTLRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGM 120 +E FP L LIS +AQ + RSSS+ SDKPVAHVVAN Q E QL+WL++RANALLANG+ Sbjct 61 EE-FPRDLSLISPLAQAV--RSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGV 117 Query 121 DLKDNQLVVPADGLYLVYSQVLFKGQGCPD-YVLLTHTVSRFAISYQEKVNLLSAVKSPC 179 +L+DNQLVVP++GLYL+YSQVLFKGQGCP +VLLTHT+SR A+SYQ KVNLLSA+KSPC Sbjct 118 ELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPC 177 Query 180 PKDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL 235 ++TPEGAE KPWYEPIYLGGVFQLEKGD+LSAE+N P YLDFAESGQVYFG+IAL Sbjct 178 QRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSAEINRPDYLDFAESGQVYFGIIAL 233 > CHEMBL2059 [P01374] Lymphotoxin-alpha (Homo sapiens) Length=205 Score = 88.6 bits (218), Expect = 2e-21, Method: Compositional matrix adjust. Identities = 63/178 (35%), Positives = 87/178 (49%), Gaps = 18/178 (10%) Query 67 GLPLISSMAQTLTLRSSSQ--NSSDKPVAHVVANHQVEEQLEWLSQRANALLANGMDLKD 124 G+ L S AQT +S+ KP AH++ + + L W + A L +G L + Sbjct 37 GVGLTPSAAQTARQHPKMHLAHSTLKPAAHLIGDPSKQNSLLWRANTDRAFLQDGFSLSN 96 Query 125 NQLVVPADGLYLVYSQVLFKGQG-------CPDYVLLTHTVSRFAISYQEKVNLLSAVKS 177 N L+VP G+Y VYSQV+F G+ P Y L H V F+ Y V LLS+ K Sbjct 97 NSLLVPTSGIYFVYSQVVFSGKAYSPKATSSPLY--LAHEVQLFSSQYPFHVPLLSSQKM 154 Query 178 PCPKDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL 235 P G + +PW +Y G FQL +GDQLS + +L + S V+FG AL Sbjct 155 VYP-----GLQ-EPWLHSMYHGAAFQLTQGDQLSTHTDGIPHLVLSPS-TVFFGAFAL 205 > CHEMBL5714 [P48023] Tumor necrosis factor ligand superfamily member 6 (Homo sapiens) Length=281 Score = 59.3 bits (142), Expect = 8e-11, Method: Compositional matrix adjust. 
Identities = 46/147 (31%), Positives = 64/147 (44%), Gaps = 10/147 (7%) Query 90 KPVAHVVANHQVEEQ-LEWLSQRANALLANGMDLKDNQLVVPADGLYLVYSQVLFKGQGC 148 + VAH+ LEW LL+ G+ K LV+ GLY VYS+V F+GQ C Sbjct 144 RKVAHLTGKSNSRSMPLEWEDTYGIVLLS-GVKYKKGGLVINETGLYFVYSKVYFRGQSC 202 Query 149 PDYVLLTHTVSRFAISYQEKVNLLSAVKSPCPKDTPEGAELKPWYEPIYLGGVFQLEKGD 208 + L R + Q+ V + + S C + W YLG VF L D Sbjct 203 NNLPLSHKVYMRNSKYPQDLVMMEGKMMSYCTTG-------QMWARSSYLGAVFNLTSAD 255 Query 209 QLSAEVNLPKYLDFAESGQVYFGVIAL 235 L V+ ++F ES Q +FG+ L Sbjct 256 HLYVNVSELSLVNFEES-QTFFGLYKL 281 > CHEMBL2364162 [O14788] Tumor necrosis factor ligand superfamily member 11 (Homo sapiens) Length=317 Score = 52.8 bits (125), Expect = 1e-08, Method: Compositional matrix adjust. Identities = 42/166 (25%), Positives = 71/166 (43%), Gaps = 35/166 (21%) Query 90 KPVAHVVAN--------HQVEEQLEWLSQRANALLANGMDLKDNQLVVPADGLYLVYSQV 141 +P AH+ N H+V W R A ++N M + +L+V DG Y +Y+ + Sbjct 163 QPFAHLTINATDIPSGSHKVSLS-SWYHDRGWAKISN-MTFSNGKLIVNQDGFYYLYANI 220 Query 142 LFK-----GQGCPDYVLLTHTVSRFAISYQEKVNLLSAVKSPCPKDTPEGAELKPW---- 192 F+ G +Y+ L V++ +++K P +G K W Sbjct 221 CFRHHETSGDLATEYLQLMVYVTK------------TSIKIPSSHTLMKGGSTKYWSGNS 268 Query 193 ---YEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQVYFGVIAL 235 + I +GG F+L G+++S EV+ P LD + YFG + Sbjct 269 EFHFYSINVGGFFKLRSGEEISIEVSNPSLLD-PDQDATYFGAFKV 313 Lambda K H a alpha 0.319 0.135 0.396 0.792 4.96 Gapped Lambda K H a alpha sigma 0.267 0.0410 0.140 1.90 42.6 43.6 Effective search space used: 627431904 Query= P48050_KCNJ4_HUMAN Length=445 Score E Sequences producing significant alignments: (Bits) Value CHEMBL2146347 [P48050] Inward rectifier potassium channel 4 (Ho... 925 0.0 CHEMBL1293290 [P35561] Inward rectifier potassium channel 2 (Mu... 544 0.0 CHEMBL1914276 [P63252] Inward rectifier potassium channel 2 (Ho... 540 0.0 CHEMBL3038488 [P48544] G protein-activated inward rectifier pot... 402 6e-137 CHEMBL2406895 [P48051] G protein-activated inward rectifier pot... 
395 9e-134

> CHEMBL2146347 [P48050] Inward rectifier potassium channel 4 (Homo sapiens)
Length=445

 Score = 925 bits (2391), Expect = 0.0, Method: Compositional matrix adjust.
 Identities = 445/445 (100%), Positives = 445/445 (100%), Gaps = 0/445 (0%)

Query  1    MHGHSRNGQAHVPRRKRRNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLM  60
            MHGHSRNGQAHVPRRKRRNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLM
Sbjct  1    MHGHSRNGQAHVPRRKRRNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLM  60

Query  61   IFSAAFLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNG  120
            IFSAAFLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNG
Sbjct  61   IFSAAFLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNG  120

Query  121  FLGAFLFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKK  180
            FLGAFLFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKK
Sbjct  121  FLGAFLFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKK  180

Query  181  RAQTLLFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQR  240
            RAQTLLFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQR
Sbjct  181  RAQTLLFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQR  240

Query  241  DLNVGYDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMT  300
            DLNVGYDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMT
Sbjct  241  DLNVGYDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMT  300

Query  301  TQARSSYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITV  360
            TQARSSYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITV
Sbjct  301  TQARSSYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITV  360

Query  361  LPAPPPPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFG  420
            LPAPPPPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFG
Sbjct  361  LPAPPPPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFG  420

Query  421  SHLDLERMQASLPLDNISYRRESAI  445
            SHLDLERMQASLPLDNISYRRESAI
Sbjct  421  SHLDLERMQASLPLDNISYRRESAI  445

> CHEMBL1293290 [P35561] Inward rectifier potassium channel 2 (Mus musculus)
Length=428

 Score = 544 bits (1401), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 270/440 (61%), Positives = 325/440 (74%), Gaps = 44/440 (10%) Query 7 NGQAHVPRRKR-RNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLMIFSAA 65 NG++ V R++ R+RFVKK+G CNV F N+ K QRY+ADIFTTCVD RWR+ML+IF A Sbjct 32 NGKSKVHTRQQCRSRFVKKDGHCNVQFINVGEKGQRYLADIFTTCVDIRWRWMLVIFCLA 91 Query 66 FLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNGFLGAF 125 F++SWLFFG +FW IA HGDL+ S K C+ VN F AF Sbjct 92 FVLSWLFFGCVFWLIALLHGDLDTSK------------------VSKACVSEVNSFTAAF 133 Query 126 LFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKKRAQTL 185 LFS+ETQTTIGYGFRCVT+ECP+AV VV QSIVGC+ID+F+IG +MAKMA+PKKR +TL Sbjct 134 LFSIETQTTIGYGFRCVTDECPIAVFMVVFQSIVGCIIDAFIIGAVMAKMAKPKKRNETL 193 Query 186 LFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQRDLNVG 245 +FSH+AVI++RDGKLCLMWRVGNLRKSH+VEAHVRAQL+K +T EGEY+PLDQ D+NVG Sbjct 194 VFSHNAVIAMRDGKLCLMWRVGNLRKSHLVEAHVRAQLLKSRITSEGEYIPLDQIDINVG 253 Query 246 YDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMTTQARS 305 +D G+DRIFLVSPI IVHEIDEDSPLY + K+++++ DFEIVVILEGMVEATAMTTQ RS Sbjct 254 FDSGIDRIFLVSPITIVHEIDEDSPLYDLSKQDIDNADFEIVVILEGMVEATAMTTQCRS 313 Query 306 SYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITVLPAPP 365 SYLA+EILWGHR+EPV+FEEK +YKVDYSRFHKTYEV TP CSAR+L E K + A Sbjct 314 SYLANEILWGHRYEPVLFEEKHYYKVDYSRFHKTYEVPNTPLCSARDLAEKKYILSNA-- 371 Query 366 PPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFGSHLDL 425 ++FCYENE+AL S+EEEE E G+ + GI DL Sbjct 372 ---NSFCYENEVALTSKEEEEDSEN---------GVPESTSTDSPPGI----------DL 409 Query 426 ERMQASLPLDNISYRRESAI 445 QAS+PL+ RRES I Sbjct 410 HN-QASVPLEPRPLRRESEI 428 > CHEMBL1914276 [P63252] Inward rectifier potassium channel 2 (Homo sapiens) Length=427 Score = 540 bits (1390), Expect = 0.0, Method: Compositional matrix adjust. 
Identities = 266/440 (60%), Positives = 325/440 (74%), Gaps = 45/440 (10%) Query 7 NGQAHVPRRKR-RNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLMIFSAA 65 NG++ V R++ R+RFVKK+G CNV F N+ K QRY+ADIFTTCVD RWR+ML+IF A Sbjct 32 NGKSKVHTRQQCRSRFVKKDGHCNVQFINVGEKGQRYLADIFTTCVDIRWRWMLVIFCLA 91 Query 66 FLVSWLFFGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNGFLGAF 125 F++SWLFFG +FW IA HGDL+AS K C+ VN F AF Sbjct 92 FVLSWLFFGCVFWLIALLHGDLDASK------------------EGKACVSEVNSFTAAF 133 Query 126 LFSVETQTTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKKRAQTL 185 LFS+ETQTTIGYGFRCVT+ECP+AV VV QSIVGC+ID+F+IG +MAKMA+PKKR +TL Sbjct 134 LFSIETQTTIGYGFRCVTDECPIAVFMVVFQSIVGCIIDAFIIGAVMAKMAKPKKRNETL 193 Query 186 LFSHHAVISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQRDLNVG 245 +FSH+AVI++RDGKLCLMWRVGNLRKSH+VEAHVRAQL+K +T EGEY+PLDQ D+NVG Sbjct 194 VFSHNAVIAMRDGKLCLMWRVGNLRKSHLVEAHVRAQLLKSRITSEGEYIPLDQIDINVG 253 Query 246 YDIGLDRIFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMTTQARS 305 +D G+DRIFLVSPI IVHEIDEDSPLY + K+++++ DFEIVVILEGMVEATAMTTQ RS Sbjct 254 FDSGIDRIFLVSPITIVHEIDEDSPLYDLSKQDIDNADFEIVVILEGMVEATAMTTQCRS 313 Query 306 SYLASEILWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESKITVLPAPP 365 SYLA+EILWGHR+EPV+FEEK +YKVDYSRFHKTYEV TP CSAR+L E K + A Sbjct 314 SYLANEILWGHRYEPVLFEEKHYYKVDYSRFHKTYEVPNTPLCSARDLAEKKYILSNA-- 371 Query 366 PPPSAFCYENELALMSQEEEEMEEEAAAAAAVAAGLGLEAGSKEEAGIIRMLEFGSHLDL 425 ++FCYENE+AL S+EE++ E + + + +DL Sbjct 372 ---NSFCYENEVALTSKEEDDSENGVPESTST--------------------DTPPDIDL 408 Query 426 ERMQASLPLDNISYRRESAI 445 QAS+PL+ RRES I Sbjct 409 HN-QASVPLEPRPLRRESEI 427 > CHEMBL3038488 [P48544] G protein-activated inward rectifier potassium channel 4 (Homo sapiens) Length=419 Score = 402 bits (1034), Expect = 6e-137, Method: Compositional matrix adjust. 
Identities = 193/391 (49%), Positives = 279/391 (71%), Gaps = 25/391 (6%) Query 15 RKRRNRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLMIFSAAFLVSWLFFG 74 +K R R+++K+G+CNV+ N+ ++ RY++D+FTT VD +WR+ L++F+ + V+WLFFG Sbjct 47 KKPRQRYMEKSGKCNVHHGNV-QETYRYLSDLFTTLVDLKWRFNLLVFTMVYTVTWLFFG 105 Query 75 LLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNGFLGAFLFSVETQTT 134 ++W IA+ GDL+ G + PC+ +++GF+ AFLFS+ET+TT Sbjct 106 FIWWLIAYIRGDLDHV-------------GDQEWI---PCVENLSGFVSAFLFSIETETT 149 Query 135 IGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKKRAQTLLFSHHAVIS 194 IGYGFR +TE+CP +I ++VQ+I+G ++++FM+G + K+++PKKRA+TL+FS++AVIS Sbjct 150 IGYGFRVITEKCPEGIILLLVQAILGSIVNAFMVGCMFVKISQPKKRAETLMFSNNAVIS 209 Query 195 VRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQRDLNVGYDIGLDRIF 254 +RD KLCLM+RVG+LR SHIVEA +RA+LIK T+EGE++PL+Q D+NVG+D G DR+F Sbjct 210 MRDEKLCLMFRVGDLRNSHIVEASIRAKLIKSRQTKEGEFIPLNQTDINVGFDTGDDRLF 269 Query 255 LVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMTTQARSSYLASEILW 314 LVSP+II HEI++ SP + M + +L E+FE+VVILEGMVEAT MT QARSSY+ +E+LW Sbjct 270 LVSPLIISHEINQKSPFWEMSQAQLHQEEFEVVVILEGMVEATGMTCQARSSYMDTEVLW 329 Query 315 GHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQESK-----ITVLPAPPPPPS 369 GHRF PV+ EK Y+VDY+ FH TYE TP C A+EL E K + LP+PP Sbjct 330 GHRFTPVLTLEKGFYEVDYNTFHDTYE-TNTPSCCAKELAEMKREGRLLQYLPSPPLLGG 388 Query 370 AFCYENELALMSQEEEEMEEEAAAAAAVAAG 400 C E L +++ EE E + + A G Sbjct 389 --CAEAGLDAEAEQNEEDEPKGLGGSREARG 417 > CHEMBL2406895 [P48051] G protein-activated inward rectifier potassium channel 2 (Homo sapiens) Length=423 Score = 395 bits (1014), Expect = 9e-134, Method: Compositional matrix adjust. 
Identities = 183/343 (53%), Positives = 257/343 (75%), Gaps = 19/343 (6%) Query 14 RRKRR-NRFVKKNGQCNVYFANLSNKSQRYMADIFTTCVDTRWRYMLMIFSAAFLVSWLF 72 R KR+ R+V+K+G+CNV+ N+ ++ RY+ DIFTT VD +WR+ L+IF + V+WLF Sbjct 48 RTKRKIQRYVRKDGKCNVHHGNV-RETYRYLTDIFTTLVDLKWRFNLLIFVMVYTVTWLF 106 Query 73 FGLLFWCIAFFHGDLEASPGVPAAGGPAAGGGGAAPVAPKPCIMHVNGFLGAFLFSVETQ 132 FG+++W IA+ GD++ P+ PC+ ++NGF+ AFLFS+ET+ Sbjct 107 FGMIWWLIAYIRGDMDH------IEDPSW----------TPCVTNLNGFVSAFLFSIETE 150 Query 133 TTIGYGFRCVTEECPLAVIAVVVQSIVGCVIDSFMIGTIMAKMARPKKRAQTLLFSHHAV 192 TTIGYG+R +T++CP +I +++QS++G ++++FM+G + K+++PKKRA+TL+FS HAV Sbjct 151 TTIGYGYRVITDKCPEGIILLLIQSVLGSIVNAFMVGCMFVKISQPKKRAETLVFSTHAV 210 Query 193 ISVRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPYMTQEGEYLPLDQRDLNVGYDIGLDR 252 IS+RDGKLCLM+RVG+LR SHIVEA +RA+LIK T EGE++PL+Q D+NVGY G DR Sbjct 211 ISMRDGKLCLMFRVGDLRNSHIVEASIRAKLIKSKQTSEGEFIPLNQTDINVGYYTGDDR 270 Query 253 IFLVSPIIIVHEIDEDSPLYGMGKEELESEDFEIVVILEGMVEATAMTTQARSSYLASEI 312 +FLVSP+II HEI++ SP + + K +L E+ EIVVILEGMVEAT MT QARSSY+ SEI Sbjct 271 LFLVSPLIISHEINQQSPFWEISKAQLPKEELEIVVILEGMVEATGMTCQARSSYITSEI 330 Query 313 LWGHRFEPVVFEEKSHYKVDYSRFHKTYEVAGTPCCSARELQE 355 LWG+RF PV+ E Y+VDY+ FH+TYE + TP SA+EL E Sbjct 331 LWGYRFTPVLTLEDGFYEVDYNSFHETYETS-TPSLSAKELAE 372 Lambda K H a alpha 0.322 0.137 0.415 0.792 4.96 Gapped Lambda K H a alpha sigma 0.267 0.0410 0.140 1.90 42.6 43.6 Effective search space used: 1497848376 Query= Q80Z70_SE1L1_RAT Length=794 Score E Sequences producing significant alignments: (Bits) Value CHEMBL2214 [P41245] Matrix metalloproteinase-9 (Mus musculus) 84.7 4e-17 CHEMBL3870 [P50282] Matrix metalloproteinase-9 (Rattus norvegicus) 80.1 1e-15 CHEMBL321 [P14780] Matrix metalloproteinase-9 (Homo sapiens) 79.7 1e-15 CHEMBL2095216 [P14780] Matrix metalloproteinase-9 (Homo sapiens) 79.7 1e-15 CHEMBL333 [P08253] 72 kDa type IV collagenase (Homo sapiens) 77.0 8e-15 > CHEMBL2214 [P41245] Matrix metalloproteinase-9 (Mus musculus) Length=730 Score = 84.7 bits (208), Expect = 4e-17, Method: 
Compositional matrix adjust. Identities = 34/56 (61%), Positives = 42/56 (75%), Gaps = 0/56 (0%) Query 116 TAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETE 171 T + G + GE C FPF+FL K+Y CTSDGR DGRLWCATT ++ TD+KWGFC + Sbjct 336 TVVGGNSAGELCVFPFVFLGKQYSSCTSDGRRDGRLWCATTSNFDTDKKWGFCPDQ 391 Score = 76.3 bits (186), Expect = 2e-14, Method: Compositional matrix adjust. Identities = 36/108 (33%), Positives = 54/108 (50%), Gaps = 1/108 (1%) Query 114 VLTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEED 173 V+ G ++G PCHFPF F + Y CT+DGR DG WC+TT DY D K+GFC +E Sbjct 217 VIPTYYGNSNGAPCHFPFTFEGRSYSACTTDGRNDGTPWCSTTADYDKDGKFGFCPSERL 276 Query 174 AAKRRQMQEAEAIYQSGMKILNGSTRKNQKR-EAYRYLQKAAGMNHTK 220 + + ++ + + S + R + YR+ A + K Sbjct 277 YTEHGNGEGKPCVFPFIFEGRSYSACTTKGRSDGYRWCATTANYDQDK 324 Score = 68.6 bits (166), Expect = 4e-12, Method: Compositional matrix adjust. Identities = 28/56 (50%), Positives = 35/56 (63%), Gaps = 0/56 (0%) Query 120 GTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEEDAA 175 G G+PC FPF+F + Y CT+ GR DG WCATT +Y D+ +GFC T DA Sbjct 281 GNGEGKPCVFPFIFEGRSYSACTTKGRSDGYRWCATTANYDQDKLYGFCPTRVDAT 336 > CHEMBL3870 [P50282] Matrix metalloproteinase-9 (Rattus norvegicus) Length=708 Score = 80.1 bits (196), Expect = 1e-15, Method: Compositional matrix adjust. Identities = 39/109 (36%), Positives = 55/109 (50%), Gaps = 1/109 (1%) Query 114 VLTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEED 173 V+ G A+G PCHFPF F + Y CT+DGR DG+ WC TT DY TD K+GFC +E Sbjct 218 VVPTYFGNANGAPCHFPFTFEGRSYLSCTTDGRNDGKPWCGTTADYDTDRKYGFCPSENL 277 Query 174 AAKRRQMQEAEAIYQSGMKILNGSTRKNQKR-EAYRYLQKAAGMNHTKA 221 + ++ + + S + R + YR+ A + KA Sbjct 278 YTEHGNGDGKPCVFPFIFEGHSYSACTTKGRSDGYRWCATTANYDQDKA 326 Score = 79.7 bits (195), Expect = 1e-15, Method: Compositional matrix adjust. 
Identities = 32/57 (56%), Positives = 41/57 (72%), Gaps = 0/57 (0%) Query 115 LTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETE 171 +T G + GE C FPF+FL K+Y CTS+GR DGRLWCATT ++ D+KWGFC + Sbjct 336 VTVTGGNSAGEMCVFPFVFLGKQYSTCTSEGRSDGRLWCATTSNFDADKKWGFCPDQ 392 Score = 63.9 bits (154), Expect = 1e-10, Method: Compositional matrix adjust. Identities = 27/54 (50%), Positives = 32/54 (59%), Gaps = 0/54 (0%) Query 120 GTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEED 173 G G+PC FPF+F Y CT+ GR DG WCATT +Y D+ GFC T D Sbjct 282 GNGDGKPCVFPFIFEGHSYSACTTKGRSDGYRWCATTANYDQDKADGFCPTRAD 335 > CHEMBL321 [P14780] Matrix metalloproteinase-9 (Homo sapiens) Length=707 Score = 79.7 bits (195), Expect = 1e-15, Method: Compositional matrix adjust. Identities = 33/56 (59%), Positives = 41/56 (73%), Gaps = 0/56 (0%) Query 116 TAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETE 171 T + G + GE C FPF FL KEY CTS+GR DGRLWCATT ++ +D+KWGFC + Sbjct 336 TVMGGNSAGELCVFPFTFLGKEYSTCTSEGRGDGRLWCATTSNFDSDKKWGFCPDQ 391 Score = 73.6 bits (179), Expect = 1e-13, Method: Compositional matrix adjust. Identities = 32/69 (46%), Positives = 44/69 (64%), Gaps = 1/69 (1%) Query 105 EELKRVRKPVLTAIE-GTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDE 163 +EL + K V+ G A G CHFPF+F + Y CT+DGR DG WC+TT +Y TD+ Sbjct 207 DELWSLGKGVVVPTRFGNADGAACHFPFIFEGRSYSACTTDGRSDGLPWCSTTANYDTDD 266 Query 164 KWGFCETEE 172 ++GFC +E Sbjct 267 RFGFCPSER 275 Score = 72.8 bits (177), Expect = 2e-13, Method: Compositional matrix adjust. Identities = 29/57 (51%), Positives = 38/57 (67%), Gaps = 0/57 (0%) Query 119 EGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEEDAA 175 +G A G+PC FPF+F + Y CT+DGR DG WCATT +Y D+ +GFC T D+ Sbjct 280 DGNADGKPCQFPFIFQGQSYSACTTDGRSDGYRWCATTANYDRDKLFGFCPTRADST 336 > CHEMBL2095216 [P14780] Matrix metalloproteinase-9 (Homo sapiens) Length=707 Score = 79.7 bits (195), Expect = 1e-15, Method: Compositional matrix adjust. 
Identities = 33/56 (59%), Positives = 41/56 (73%), Gaps = 0/56 (0%) Query 116 TAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETE 171 T + G + GE C FPF FL KEY CTS+GR DGRLWCATT ++ +D+KWGFC + Sbjct 336 TVMGGNSAGELCVFPFTFLGKEYSTCTSEGRGDGRLWCATTSNFDSDKKWGFCPDQ 391 Score = 73.6 bits (179), Expect = 1e-13, Method: Compositional matrix adjust. Identities = 32/69 (46%), Positives = 44/69 (64%), Gaps = 1/69 (1%) Query 105 EELKRVRKPVLTAIE-GTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDE 163 +EL + K V+ G A G CHFPF+F + Y CT+DGR DG WC+TT +Y TD+ Sbjct 207 DELWSLGKGVVVPTRFGNADGAACHFPFIFEGRSYSACTTDGRSDGLPWCSTTANYDTDD 266 Query 164 KWGFCETEE 172 ++GFC +E Sbjct 267 RFGFCPSER 275 Score = 72.8 bits (177), Expect = 2e-13, Method: Compositional matrix adjust. Identities = 29/57 (51%), Positives = 38/57 (67%), Gaps = 0/57 (0%) Query 119 EGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETEEDAA 175 +G A G+PC FPF+F + Y CT+DGR DG WCATT +Y D+ +GFC T D+ Sbjct 280 DGNADGKPCQFPFIFQGQSYSACTTDGRSDGYRWCATTANYDRDKLFGFCPTRADST 336 > CHEMBL333 [P08253] 72 kDa type IV collagenase (Homo sapiens) Length=660 Score = 77.0 bits (188), Expect = 8e-15, Method: Compositional matrix adjust. Identities = 29/55 (53%), Positives = 38/55 (69%), Gaps = 0/55 (0%) Query 114 VLTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFC 168 ++ + G + G PC FPF FL +Y+ CTS GR DG++WCATT +Y D KWGFC Sbjct 336 AMSTVGGNSEGAPCVFPFTFLGNKYESCTSAGRSDGKMWCATTANYDDDRKWGFC 390 Score = 72.8 bits (177), Expect = 2e-13, Method: Compositional matrix adjust. Identities = 32/58 (55%), Positives = 39/58 (67%), Gaps = 0/58 (0%) Query 114 VLTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFCETE 171 V+ G A GE C FPFLF KEY+ CT GR DG LWC+TTY+++ D K+GFC E Sbjct 220 VVRVKYGNADGEYCKFPFLFNGKEYNSCTDTGRSDGFLWCSTTYNFEKDGKYGFCPHE 277 Score = 70.5 bits (171), Expect = 9e-13, Method: Compositional matrix adjust. 
Identities = 29/55 (53%), Positives = 35/55 (64%), Gaps = 0/55 (0%) Query 114 VLTAIEGTAHGEPCHFPFLFLDKEYDECTSDGREDGRLWCATTYDYKTDEKWGFC 168 L + G A G+PC FPF F YD CT++GR DG WC TT DY D+K+GFC Sbjct 278 ALFTMGGNAEGQPCKFPFRFQGTSYDSCTTEGRTDGYRWCGTTEDYDRDKKYGFC 332 Lambda K H a alpha 0.317 0.134 0.394 0.792 4.96 Gapped Lambda K H a alpha sigma 0.267 0.0410 0.140 1.90 42.6 43.6 Effective search space used: 2947914464 Query= P33277_GAP1_SCHPO Length=766 Score E Sequences producing significant alignments: (Bits) Value CHEMBL2176807 [Q04690] Neurofibromin (Mus musculus) 75.1 5e-14 CHEMBL2176804 [Q9QUH6] Ras/Rap GTPase-activating protein SynGAP... 51.2 9e-07 > CHEMBL2176807 [Q04690] Neurofibromin (Mus musculus) Length=2841 Score = 75.1 bits (183), Expect = 5e-14, Method: Compositional matrix adjust. Identities = 62/243 (26%), Positives = 109/243 (45%), Gaps = 53/243 (22%) Query 164 YESREEHLLLSLFQMVLTTEFEATSDVLSLLRANTPVSRMLTTYTRRGPGQAYLRSILYQ 223 ++SR HLL L + + E E + +L R N+ S+++T + + G YL+ +L Sbjct 1249 FDSR--HLLYQLLWNMFSKEVELADSMQTLFRGNSLASKIMT-FCFKVYGATYLQKLLDP 1305 Query 224 CINDVAIHPDLQLDIHPLSVYRYLVNTGQLSPSEDDNLLTNEEVSEFPAVKNAIQERSAQ 283 + + D Q H + V+ +L PSE +++E Sbjct 1306 LLRVIITSSDWQ---H----VSFEVDPTRLEPSE------------------SLEENQRN 1340 Query 284 LLLLTKRFLDAVLNSIDEIPYGIRWVCKLI---------------------RNLTNRLFP 322 LL +T++F A+++S E P +R VC + +++ ++ FP Sbjct 1341 LLQMTEKFFHAIISSSSEFPSQLRSVCHCLYQATCHSLLNKATVKERKENKKSVVSQRFP 1400 Query 323 SISDSTICSLIGGFFFLRFVNPAIISPQTSMLLDSCPSDNVRKTLATIAKIIQSVANGTS 382 S +G FLRF+NPAI+SP + +LD P + + L ++K++QS+AN Sbjct 1401 QNS----IGAVGSAMFLRFINPAIVSPYEAGILDKKPPPRIERGLKLMSKVLQSIANHVL 1456 Query 383 STK 385 TK Sbjct 1457 FTK 1459 > CHEMBL2176804 [Q9QUH6] Ras/Rap GTPase-activating protein SynGAP (Rattus norvegicus) Length=1308 Score = 51.2 bits (121), Expect = 9e-07, Method: Compositional matrix adjust. 
Identities = 52/192 (27%), Positives = 84/192 (44%), Gaps = 8/192 (4%) Query 264 NEEVSEFPAVKNAIQERSAQLLLLTKRFLDAVLNSIDEIPYGIRWVCKLIR-NLTNRLFP 322 N EV +++ E A L + + L V+NS P ++ V R R Sbjct 523 NCEVDPIKCTASSLAEHQANLRMCCELALCKVVNSHCVFPRELKEVFASWRLRCAERGRE 582 Query 323 SISDSTICSLIGGFFFLRFVNPAIISPQTSMLLDSCPSDNVRKTLATIAKIIQSVANGTS 382 I+D LI FLRF+ PAI+SP L+ P + +TL IAK+IQ++AN + Sbjct 583 DIADR----LISASLFLRFLCPAIMSPSLFGLMQEYPDEQTSRTLTLIAKVIQNLANFSK 638 Query 383 STKTHLDVSFQPMLKEYE-EKVHNLLRKLGNVGDFFEALELDQYIALSKKSLALEMTVNE 441 T + F E E + L ++ N+ + + YI L ++ L + E Sbjct 639 FTSKEDFLGFMNEFLELEWGSMQQFLYEISNLDTLTNSSSFEGYIDLGRELSTLHALLWE 698 Query 442 I--YLTHEIILE 451 + L+ E +L+ Sbjct 699 VLPQLSKEALLK 710 Lambda K H a alpha 0.320 0.135 0.381 0.792 4.96 Gapped Lambda K H a alpha sigma 0.267 0.0410 0.140 1.90 42.6 43.6 Effective search space used: 2828634688 Query= Q96PD4_IL17F_HUMAN Length=163 Score E Sequences producing significant alignments: (Bits) Value CHEMBL3390822 [Q16552] Interleukin-17A (Homo sapiens) 125 9e-37 > CHEMBL3390822 [Q16552] Interleukin-17A (Homo sapiens) Length=155 Score = 125 bits (315), Expect = 9e-37, Method: Compositional matrix adjust. 
Identities = 61/108 (56%), Positives = 76/108 (70%), Gaps = 0/108 (0%) Query 55 MKLDIGIINENQRVSMSRNIESRSTSPWNYTVTWDPNRYPSEVVQAQCRNLGCINAQGKE 114 + L+I N N S + +RSTSPWN DP RYPS + +A+CR+LGCINA G Sbjct 47 VNLNIHNRNTNTNPKRSSDYYNRSTSPWNLHRNEDPERYPSVIWEAKCRHLGCINADGNV 106 Query 115 DISMNSVPIQQETLVVRRKHQGCSVSFQLEKVLVTVGCTCVTPVIHHV 162 D MNSVPIQQE LV+RR+ C SF+LEK+LV+VGCTCVTP++HHV Sbjct 107 DYHMNSVPIQQEILVLRREPPHCPNSFRLEKILVSVGCTCVTPIVHHV 154 Lambda K H a alpha 0.320 0.133 0.406 0.792 4.96 Gapped Lambda K H a alpha sigma 0.267 0.0410 0.140 1.90 42.6 43.6 Effective search space used: 338902872 Query= P10144_GRAB_HUMAN Length=247 Score E Sequences producing significant alignments: (Bits) Value CHEMBL2316 [P10144] Granzyme B (Homo sapiens) 519 0.0 CHEMBL5622 [P28293] Cathepsin G (Mus musculus) 270 2e-90 CHEMBL4071 [P08311] Cathepsin G (Homo sapiens) 266 8e-89 CHEMBL4068 [P23946] Chymase (Homo sapiens) 238 5e-78 CHEMBL2132 [O35164] Mast cell protease 9 (Mus musculus) 209 1e-66 > CHEMBL2316 [P10144] Granzyme B (Homo sapiens) Length=247 Score = 519 bits (1337), Expect = 0.0, Method: Compositional matrix adjust. 
Identities = 247/247 (100%), Positives = 247/247 (100%), Gaps = 0/247 (0%) Query 1 MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL 60 MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL Sbjct 1 MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL 60 Query 61 TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR 120 TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR Sbjct 61 TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR 120 Query 121 TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY 180 TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY Sbjct 121 TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY 180 Query 181 YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI 240 YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI Sbjct 181 YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI 240 Query 241 KKTMKRY 247 KKTMKRY Sbjct 241 KKTMKRY 247 > CHEMBL5622 [P28293] Cathepsin G (Mus musculus) Length=261 Score = 270 bits (691), Expect = 2e-90, Method: Compositional matrix adjust. 
Identities = 142/247 (57%), Positives = 189/247 (77%), Gaps = 2/247 (1%) Query 1 MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL 60 MQP+LLLL F+LL +AG+IIGG EA+PHS PYMA+L+I + L CGGFL+R+DFVL Sbjct 1 MQPLLLLLTFILLQGDEAGKIIGGREARPHSYPYMAFLLIQSPEGLSACGGFLVREDFVL 60 Query 61 TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR 120 TAAHC GSSINVTLGAHNI+ +E TQQ I V R I HP YNP+N NDIMLLQL R+A+R Sbjct 61 TAAHCLGSSINVTLGAHNIQMRERTQQLITVLRAIRHPDYNPQNIRNDIMLLQLRRRARR 120 Query 121 TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY 180 + +V+P+ LP +++PG C+VAGWG+ + + ++ LQEV++ VQ D+ C + + Sbjct 121 SGSVKPVALPQASKKLQPGDLCTVAGWGRVSQ-SRGTNVLQEVQLRVQMDQMCANRF-QF 178 Query 181 YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI 240 Y+S ++CVG+P +K++F+GDSGGPLVC+ VAQGIVSYG NNG PP TK+ SF+ WI Sbjct 179 YNSQTQICVGNPRERKSAFRGDSGGPLVCSNVAQGIVSYGSNNGNPPAVFTKIQSFMPWI 238 Query 241 KKTMKRY 247 K+TM+R+ Sbjct 239 KRTMRRF 245 > CHEMBL4071 [P08311] Cathepsin G (Homo sapiens) Length=255 Score = 266 bits (679), Expect = 8e-89, Method: Compositional matrix adjust. 
Identities = 143/249 (57%), Positives = 185/249 (74%), Gaps = 6/249 (2%) Query 1 MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL 60 MQP+LLLLAFLL A+AGEIIGG E++PHSRPYMAYL I RCGGFL+R+DFVL Sbjct 1 MQPLLLLLAFLLPTGAEAGEIIGGRESRPHSRPYMAYLQIQSPAGQSRCGGFLVREDFVL 60 Query 61 TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR 120 TAAHCWGS+INVTLGAHNI+ +E TQQ I +R I HP YN + NDIMLLQL R+ +R Sbjct 61 TAAHCWGSNINVTLGAHNIQRRENTQQHITARRAIRHPQYNQRTIQNDIMLLQLSRRVRR 120 Query 121 TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY 180 R V P+ LP + ++PG C+VAGWG+ + + + + TL+EV++ VQ DR+C LR + Sbjct 121 NRNVNPVALPRAQEGLRPGTLCTVAGWGRVS-MRRGTDTLREVQLRVQRDRQC---LRIF 176 Query 181 --YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVH 238 YD ++CVGD +K +FKGDSGGPL+CN VA GIVSYG+++G+PP T+VSSF+ Sbjct 177 GSYDPRRQICVGDRRERKAAFKGDSGGPLLCNNVAHGIVSYGKSSGVPPEVFTRVSSFLP 236 Query 239 WIKKTMKRY 247 WI+ TM+ + Sbjct 237 WIRTTMRSF 245 > CHEMBL4068 [P23946] Chymase (Homo sapiens) Length=247 Score = 238 bits (607), Expect = 5e-78, Method: Compositional matrix adjust. 
Identities = 125/232 (54%), Positives = 156/232 (67%), Gaps = 3/232 (1%) Query 15 RADAGEIIGGHEAKPHSRPYMAYL-MIWDQKSLKRCGGFLIRDDFVLTAAHCWGSSINVT 73 RA+AGEIIGG E KPHSRPYMAYL ++ K CGGFLIR +FVLTAAHC G SI VT Sbjct 16 RAEAGEIIGGTECKPHSRPYMAYLEIVTSNGPSKFCGGFLIRRNFVLTAAHCAGRSITVT 75 Query 74 LGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKRTRAVQPLRLPSNK 133 LGAHNI E+E T Q + V + HP YN +DIMLL+L+ KA T AV L PS Sbjct 76 LGAHNITEEEDTWQKLEVIKQFRHPKYNTSTLHHDIMLLKLKEKASLTLAVGTLPFPSQF 135 Query 134 AQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHYYDSTIELCVGDPE 193 V PG+ C VAGWG+T L S TLQEVK+ + + + C S R +D ++LCVG+P Sbjct 136 NFVPPGRMCRVAGWGRTGVLKPGSDTLQEVKLRLMDPQAC-SHFRD-FDHNLQLCVGNPR 193 Query 194 IKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWIKKTMK 245 K++FKGDSGGPL+C VAQGIVSYGR++ PP T++S + WI + ++ Sbjct 194 KTKSAFKGDSGGPLLCAGVAQGIVSYGRSDAKPPAVFTRISHYRPWINQILQ 245 > CHEMBL2132 [O35164] Mast cell protease 9 (Mus musculus) Length=246 Score = 209 bits (531), Expect = 1e-66, Method: Compositional matrix adjust. Identities = 111/232 (48%), Positives = 150/232 (65%), Gaps = 3/232 (1%) Query 15 RADAGEIIGGHEAKPHSRPYMAYLMIWDQKS-LKRCGGFLIRDDFVLTAAHCWGSSINVT 73 RA A EIIGG E++PHSRPYMAY+ + +K + CGGFLI FV+TAAHC G + VT Sbjct 15 RAGAEEIIGGVESEPHSRPYMAYVNTFSKKGYVAICGGFLIAPQFVMTAAHCSGRRMTVT 74 Query 74 LGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKRTRAVQPLRLPSNK 133 LGAHN++++E TQQ I V++ I P YN + NDI+LL+L+++A T AV + LP Sbjct 75 LGAHNVRKRECTQQKIKVEKYILPPNYNVSSKFNDIVLLKLKKQANLTSAVDVVPLPGPS 134 Query 134 AQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHYYDSTIELCVGDPE 193 KPG C AGWG+T SHTL+EV++ + ++ C+ RHY DS +++CVG Sbjct 135 DFAKPGTMCWAAGWGRTGVKKSISHTLREVELKIVGEKACK-IFRHYKDS-LQICVGSST 192 Query 194 IKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWIKKTMK 245 + + GDSGGPL+C VA GIVS GR N PP T++S V WI + +K Sbjct 193 KVASVYMGDSGGPLLCAGVAHGIVSSGRGNAKPPAIFTRISPHVPWINRVIK 244 Lambda K H a alpha 0.320 0.136 0.425 0.792 4.96 Gapped Lambda K H a alpha sigma 0.267 0.0410 0.140 1.90 42.6 43.6 Effective search space used: 
679717896
Database: chembl_21.fa
Posted date: Jun 14, 2016 2:00 PM
Number of letters in database: 5,161,060
Number of sequences in database: 8,834
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Neighboring words threshold: 11
Window for multiple hits: 40
Yes.
The output above can be described as 'classic' BLAST output; although it is fairly easy to read, it can be quite tricky to parse. To help, many language-specific libraries can run BLAST searches and parse the output, for example BioPerl, BioJava and BioRuby. As we are working in a Python environment, we have the Biopython library at our disposal, so let's get started. We can wrap a command-line BLAST search with the NcbiblastpCommandline class:
from Bio.Blast.Applications import NcbiblastpCommandline
# The outfmt=5 value creates an XML formatted file
blastp_cmd = NcbiblastpCommandline(cmd=blast_exe, query=query_file, db=database, outfmt=5, out=results_xml, evalue=eval_threshold)
stdout, stderr = blastp_cmd()
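If the wrapper classes are not available in your Biopython installation, the same search can be launched with the standard library. The following is a minimal sketch, written for Python 3 and not part of the original tutorial: `build_blastp_args` and `run_blastp` are hypothetical helper names, and the only blastp flags used are the ones that already appear in this tutorial (`-query`, `-db`, `-outfmt`, `-out`, `-evalue`). The file names in the example call are placeholders.

```python
import subprocess


def build_blastp_args(blast_exe, query_file, database, out_file, evalue, outfmt=5):
    """Assemble the blastp argument list (hypothetical helper, not a Biopython API)."""
    return [
        blast_exe,
        '-query', query_file,
        '-db', database,
        '-outfmt', str(outfmt),   # 5 = XML, 10 = CSV
        '-out', out_file,
        '-evalue', str(evalue),
    ]


def run_blastp(args):
    """Run blastp and raise an error if it exits with a non-zero status."""
    process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    if process.returncode != 0:
        raise RuntimeError('blastp failed: {}'.format(stderr))
    return stdout, stderr


# Placeholder paths; building the list does not require BLAST to be installed
args = build_blastp_args('blastp', 'query.fa', 'targets.fa', 'results.xml', 0.001)
print(' '.join(args))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces. Calling `run_blastp(args)` with the real paths defined earlier would then write the XML file that is parsed in the next step.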
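from Bio.Blast import NCBIXML

# BLAST has already filtered hits at eval_threshold; this is a second, looser cut-off
E_VALUE_THRESH = 0.04
result_counter = 0

# Parse the XML file written by the blastp command above
with open(results_xml) as result_handle:
    for blast_record in NCBIXML.parse(result_handle):
        for alignment in blast_record.alignments:
            result_counter += 1
            for hsp in alignment.hsps:
                # Only print details for the first 5 alignments
                if result_counter <= 5 and hsp.expect < E_VALUE_THRESH:
                    print('\n# Result {} #'.format(result_counter))
                    print('Sequence: ' + alignment.title)
                    print('Length: {}'.format(alignment.length))
                    print('E-Value: {}'.format(hsp.expect))
                    print('Score: {}'.format(hsp.score))
                    print('Identities: {}'.format(hsp.identities))
                    # Show the first 75 positions of the query, match and subject lines
                    print(hsp.query[0:75] + '...')
                    print(hsp.match[0:75] + '...')
                    print(hsp.sbjct[0:75] + '...')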
# Result 1 #
Sequence: gnl|BL_ORD_ID|7974 CHEMBL2150840 [Q96P68] 2-oxoglutarate receptor 1 (Homo sapiens)
Length: 337
E-Value: 0.0
Score: 1775.0
Identities: 337
MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNL...
MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNL...
MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNL...

# Result 2 #
Sequence: gnl|BL_ORD_ID|1266 CHEMBL2325 [Q6Y1R5] 2-oxoglutarate receptor 1 (Rattus norvegicus)
Length: 337
E-Value: 0.0
Score: 1492.0
Identities: 289
MNEPLDYLANASDFPDYAAAFGNCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNL...
M E LD AN SDF DY A NCTDE I KM YLPVIY IIFLVGFPGN V IS Y+FKMRPWKSSTIIMLNL...
MIETLDSPANDSDFLDYITALENCTDEQISFKMQYLPVIYSIIFLVGFPGNTVAISIYVFKMRPWKSSTIIMLNL...

# Result 3 #
Sequence: gnl|BL_ORD_ID|3367 CHEMBL4315 [P47900] P2Y purinoceptor 1 (Homo sapiens)
Length: 373
E-Value: 9.48885e-67
Score: 550.0
Identities: 108
NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGE...
C + +YLP +Y ++F++GF GN+V I ++F M+PW ++ M NLA D LY+ +LP LI YY + ...
KCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKT...

# Result 4 #
Sequence: gnl|BL_ORD_ID|5348 CHEMBL5720 [P49652] P2Y purinoceptor 1 (Meleagris gallopavo)
Length: 362
E-Value: 1.43544e-66
Score: 548.0
Identities: 106
NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGE...
C+ + +YLP +Y ++F+ GF GN+V I ++F MRPW ++ M NLA D LY+ +LP LI YY + ...
KCSLTKTGFQFYYLPTVYILVFITGFLGNSVAIWMFVFHMRPWSGISVYMFNLALADFLYVLTLPALIFYYFNKT...

# Result 5 #
Sequence: gnl|BL_ORD_ID|1748 CHEMBL2497 [P49651] P2Y purinoceptor 1 (Rattus norvegicus)
Length: 373
E-Value: 2.52191e-66
Score: 548.0
Identities: 108
NCTDENIPLKMHYLPVIYGIIFLVGFPGNAVVISTYIFKMRPWKSSTIIMLNLACTDLLYLTSLPFLIHYYASGE...
C + +YLP +Y ++F++GF GN+V I ++F M+PW ++ M NLA D LY+ +LP LI YY + ...
RCALIKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKT...
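The alignment titles in the output above pack several pieces of information into one string. As a hedged illustration, the sketch below pulls the ChEMBL ID, UniProt accession, target name and organism out of a title with a regular expression. The pattern is inferred only from the titles printed in this tutorial, so it is an assumption about the FASTA header layout and may not hold for other databases; `parse_title` is a hypothetical helper name.

```python
import re

# Pattern inferred from titles such as:
#   gnl|BL_ORD_ID|7974 CHEMBL2150840 [Q96P68] 2-oxoglutarate receptor 1 (Homo sapiens)
# It assumes the organism is the final parenthesised group and contains no parentheses.
TITLE_PATTERN = re.compile(
    r'(?P<chembl_id>CHEMBL\d+)\s+'
    r'\[(?P<accession>[^\]]+)\]\s+'
    r'(?P<name>.+)\s+'
    r'\((?P<organism>[^()]+)\)$'
)


def parse_title(title):
    """Return a dict of title fields, or None if the title does not match."""
    match = TITLE_PATTERN.search(title)
    return match.groupdict() if match else None


example_title = ('gnl|BL_ORD_ID|7974 CHEMBL2150840 [Q96P68] '
                 '2-oxoglutarate receptor 1 (Homo sapiens)')
print(parse_title(example_title))
```

Returning `None` for titles that do not match keeps the caller in control of how to treat unexpected headers, rather than failing part-way through a large result file.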
Biopython is great and provides you with lots of additional functionality, but for the purpose of this tutorial we will now turn our attention to processing BLAST data using pandas. To get started we need to turn our BLAST output into a 'tabular' format. Fortunately we can create a CSV BLAST results file, so let's create this now (note that the BLAST CSV output does not include the alignments):
# Create a blast output file in csv format so that it can easily be loaded by pandas
# The outfmt=10 value creates a CSV formatted file
!$blast_exe -query $query_file -db $database -outfmt 10 -out $results_csv -evalue $eval_threshold
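The CSV file written by `-outfmt 10` has no header row. With the default settings each row carries twelve fields, which BLAST names qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below, written for Python 3, reads such rows with the standard `csv` module. The sample line is reconstructed from the values of the second hit reported in this tutorial, so its exact number formatting is an assumption, and `read_hits` is a hypothetical helper name.

```python
import csv
import io

# Default column order of BLAST tabular/CSV output (outfmt 6 and 10)
BLAST_TABULAR_FIELDS = [
    'qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
    'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore',
]


def read_hits(handle):
    """Yield one dict per CSV row, keyed by the BLAST field names."""
    for row in csv.reader(handle):
        yield dict(zip(BLAST_TABULAR_FIELDS, row))


# Reconstructed sample row (formatting of the numbers is assumed)
sample_csv = 'Q96P68_OXGR1_HUMAN,CHEMBL2325,85.76,337,48,0,1,337,1,337,0.0,579\n'
sample_hits = list(read_hits(io.StringIO(sample_csv)))
print(sample_hits[0]['sseqid'], sample_hits[0]['pident'])
```

All values come back as strings here; pandas converts them to numbers automatically, which is one reason the tutorial loads the file with `read_csv` instead.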
We can now load the BLAST results into a pandas dataframe. You should be able to map the result values (e.g. length, identity, e-value) in the table below to the earlier 'classic' and Biopython BLAST results:
# Now load BLAST information into pandas dataframe
import pandas
from pandas import DataFrame, read_csv, merge
from pandas.io import sql
from pandas.io.sql import read_sql
# Limit the default pandas dataframe size
pandas.set_option('display.max_rows', 10)
# Setup database connection to local ChEMBL instance
import psycopg2
con = psycopg2.connect(port=5432, user='chembl', dbname='chembl_21')
Location = results_csv
blast_df = read_csv(Location, names=['query', 'chembl_target_id', 'identity', 'length', 'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore'])
blast_df
| | query | chembl_target_id | identity | length | mismatch | gapopen | qstart | qend | sstart | send | evalue | bitscore |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Q96P68_OXGR1_HUMAN | CHEMBL2150840 | 100.00 | 337 | 0 | 0 | 1 | 337 | 1 | 337 | 0.000000e+00 | 688.0 |
1 | Q96P68_OXGR1_HUMAN | CHEMBL2325 | 85.76 | 337 | 48 | 0 | 1 | 337 | 1 | 337 | 0.000000e+00 | 579.0 |
2 | Q96P68_OXGR1_HUMAN | CHEMBL4315 | 36.00 | 300 | 188 | 2 | 23 | 318 | 41 | 340 | 9.000000e-67 | 216.0 |
3 | Q96P68_OXGR1_HUMAN | CHEMBL5720 | 35.33 | 300 | 190 | 2 | 23 | 318 | 30 | 329 | 1.000000e-66 | 215.0 |
4 | Q96P68_OXGR1_HUMAN | CHEMBL2497 | 36.00 | 300 | 188 | 2 | 23 | 318 | 41 | 340 | 3.000000e-66 | 215.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
775 | P10144_GRAB_HUMAN | CHEMBL1075308 | 27.61 | 268 | 160 | 13 | 12 | 247 | 352 | 617 | 1.000000e-12 | 65.9 |
776 | P10144_GRAB_HUMAN | CHEMBL3078 | 28.03 | 239 | 140 | 11 | 12 | 220 | 351 | 587 | 1.000000e-11 | 62.8 |
777 | P10144_GRAB_HUMAN | CHEMBL2040703 | 27.71 | 231 | 134 | 12 | 49 | 247 | 5 | 234 | 1.000000e-11 | 61.2 |
778 | P10144_GRAB_HUMAN | CHEMBL5731 | 25.94 | 239 | 120 | 13 | 30 | 220 | 491 | 720 | 1.000000e-09 | 57.0 |
779 | P10144_GRAB_HUMAN | CHEMBL3929 | 24.38 | 160 | 113 | 4 | 53 | 206 | 1 | 158 | 4.000000e-07 | 47.4 |
780 rows × 12 columns
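Two of the columns are not obviously the same numbers as in the Biopython output, so a short worked example may help. The identity column is a percentage, the number of identical positions divided by the alignment length, whereas Biopython reported the raw count. Note too that the `Length:` line printed by the Biopython loop is the full length of the hit sequence, while the length column here is the alignment length. The bitscore column is the raw score rescaled with the Karlin-Altschul parameters, S' = (lambda * S - ln K) / ln 2. The sketch below uses the gapped Lambda (0.267) and K (0.0410) values printed in the classic output; these are rounded, so the results are approximate, and the function names are hypothetical.

```python
import math


def percent_identity(identities, alignment_length):
    """Percentage of aligned positions that are identical."""
    return round(100.0 * identities / alignment_length, 2)


def bit_score(raw_score, lam=0.267, k=0.0410):
    """Convert a raw alignment score to a bit score: (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Second hit (CHEMBL2325): Biopython reported Identities 289 and Score 1492
print(percent_identity(289, 337))   # 85.76, as in the identity column
print(bit_score(1492))              # about 579.3; the table shows 579.0
```

For the hits checked here (raw scores 1775, 1492, 550 and 548) the bitscore column matches the integer part of this calculation, which suggests that large bit scores are reported without decimals in this output format.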
We have not done anything new yet, just presented the BLAST results in yet another format, so what next?
The benefit of using a pandas dataframe is that it makes it very easy for us to join the BLAST resultset to another pandas dataframe resultset, in a similar way to how you might join two or more tables in an SQL query. So let's create some additional dataframes.
First, let's get some additional information about the ChEMBL targets, such as name and organism:
# Select additional target information from the target_dictionary table
sql1 = """
select td.chembl_id as chembl_target_id,
td.pref_name,
td.organism,
td.tax_id,
td.tid,
td.target_type
from target_dictionary td
"""
chembl_target_df = read_sql(sql1, con)
chembl_target_df
| | chembl_target_id | pref_name | organism | tax_id | tid | target_type |
---|---|---|---|---|---|---|
0 | CHEMBL2074 | Maltase-glucoamylase | Homo sapiens | 9606.0 | 1 | SINGLE PROTEIN |
1 | CHEMBL1971 | Sulfonylurea receptor 2 | Homo sapiens | 9606.0 | 2 | SINGLE PROTEIN |
2 | CHEMBL1827 | Phosphodiesterase 5A | Homo sapiens | 9606.0 | 3 | SINGLE PROTEIN |
3 | CHEMBL1859 | Voltage-gated T-type calcium channel alpha-1H ... | Homo sapiens | 9606.0 | 4 | SINGLE PROTEIN |
4 | CHEMBL1884 | Nicotinic acetylcholine receptor alpha subunit | Ascaris suum | 6253.0 | 5 | SINGLE PROTEIN |
... | ... | ... | ... | ... | ... | ... |
11014 | CHEMBL3559688 | Frizzled-7 | Homo sapiens | 9606.0 | 109743 | SINGLE PROTEIN |
11015 | CHEMBL3559689 | Frizzled-8 | Homo sapiens | 9606.0 | 109744 | SINGLE PROTEIN |
11016 | CHEMBL3559691 | Cyclin-dependent kinase | Homo sapiens | 9606.0 | 109746 | PROTEIN FAMILY |
11017 | CHEMBL3559701 | Proto-oncogene Mas | Homo sapiens | 9606.0 | 109748 | SINGLE PROTEIN |
11018 | CHEMBL3559703 | PI3-kinase class I | Homo sapiens | 9606.0 | 109750 | PROTEIN COMPLEX GROUP |
11019 rows × 6 columns
Next, let's use the ChEMBL database to get a count of the FDA-approved drugs that bind each of the targets in the database:
# We can traverse the ChEMBL activities table to get the count of FDA approved molecules,
# which bind ChEMBL targets with high affinity
sql2 = """
select t.chembl_id as chembl_target_id,
count(m.chembl_id) as drug_count
from activities a,
assays s,
target_dictionary t,
molecule_dictionary m
where a.assay_id=s.assay_id
and s.tid=t.tid
and m.molregno=a.molregno
and a.pchembl_value >= 6
and s.confidence_score >= 8
and m.max_phase = 4
and m.therapeutic_flag=1
group by t.chembl_id
"""
chembl_drug_df = read_sql(sql2, con)
chembl_drug_df
| | chembl_target_id | drug_count |
---|---|---|
0 | CHEMBL3004 | 7 |
1 | CHEMBL3829 | 8 |
2 | CHEMBL2658 | 1 |
3 | CHEMBL5464 | 6 |
4 | CHEMBL3563 | 1 |
... | ... | ... |
1145 | CHEMBL3350222 | 2 |
1146 | CHEMBL1697657 | 2 |
1147 | CHEMBL281 | 99 |
1148 | CHEMBL1667701 | 1 |
1149 | CHEMBL3994 | 2 |
1150 rows × 2 columns
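One detail of the query above is worth keeping in mind when reading the counts: `count(m.chembl_id)` counts one per matching activity row, so a drug that has been measured several times against the same target may be counted more than once, whereas `count(distinct ...)` would count each molecule once. The sketch below shows the difference on an in-memory SQLite database. The `toy_activities` table and its contents are invented for illustration and are not part of the ChEMBL schema.

```python
import sqlite3

# Toy data: molecule M1 has two activity rows against target T1
toy_con = sqlite3.connect(':memory:')
toy_con.executescript("""
create table toy_activities (target_id text, molecule_id text);
insert into toy_activities values ('T1', 'M1');
insert into toy_activities values ('T1', 'M1');
insert into toy_activities values ('T1', 'M2');
insert into toy_activities values ('T2', 'M3');
""")

toy_rows = toy_con.execute("""
select target_id,
       count(molecule_id) as activity_rows,
       count(distinct molecule_id) as molecules
from toy_activities
group by target_id
order by target_id
""").fetchall()
toy_con.close()

print(toy_rows)  # [('T1', 3, 2), ('T2', 1, 1)]
```

For the simple score built later in this tutorial only the presence of at least one drug matters, so either form of the count leads to the same score.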
We can use the ChEMBL database again to get a count of 'drug-like' molecules that bind each of the targets in the database:
# Similar to the previous query, but this time get the count of 'drug-like' compounds (defined by
# having no rule-of-5 violations), which bind ChEMBL targets with high affinity
sql3 = """
select t.chembl_id as chembl_target_id,
count(m.chembl_id) as druglike_count
from activities a,
assays s,
target_dictionary t,
molecule_dictionary m,
compound_properties p
where a.assay_id=s.assay_id
and s.tid=t.tid
and m.molregno=a.molregno
and m.molregno=p.molregno
and a.pchembl_value >= 6
and s.confidence_score >= 8
and p.num_ro5_violations=0
group by t.chembl_id
"""
chembl_druglike_df = read_sql(sql3, con)
chembl_druglike_df
| | chembl_target_id | druglike_count |
---|---|---|
0 | CHEMBL2658 | 8 |
1 | CHEMBL5464 | 70 |
2 | CHEMBL1293246 | 24 |
3 | CHEMBL1075140 | 89 |
4 | CHEMBL4722 | 1051 |
... | ... | ... |
3048 | CHEMBL5023 | 76 |
3049 | CHEMBL2803 | 85 |
3050 | CHEMBL5533 | 15 |
3051 | CHEMBL5859 | 1 |
3052 | CHEMBL5982 | 21 |
3053 rows × 2 columns
Finally, let's get the list of known drug-target interactions from the ChEMBL Mechanism of Action tables, as not all interactions will be captured in the activities table:
# Get the count of molecules associated with a ChEMBL target through a known Mechanism of Action
sql4 = """
select td.chembl_id as chembl_target_id,
count(distinct dm.molregno) as moa_count
from drug_mechanism dm,
target_dictionary td
where dm.tid=td.tid
group by td.chembl_id
"""
chembl_moa_df = read_sql(sql4, con)
chembl_moa_df
| | chembl_target_id | moa_count |
---|---|---|
0 | CHEMBL1075319 | 1 |
1 | CHEMBL1169596 | 1 |
2 | CHEMBL1169600 | 1 |
3 | CHEMBL1250417 | 2 |
4 | CHEMBL1293316 | 1 |
... | ... | ... |
714 | CHEMBL5936 | 2 |
715 | CHEMBL5971 | 8 |
716 | CHEMBL6007 | 1 |
717 | CHEMBL6120 | 4 |
718 | CHEMBL613897 | 10 |
719 rows × 2 columns
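We now have 5 resultsets: the BLAST hits (blast_df), the target details (chembl_target_df), the FDA-approved drug counts (chembl_drug_df), the 'drug-like' molecule counts (chembl_druglike_df) and the mechanism of action counts (chembl_moa_df).
So we can now think about merging the resultsets together. By planned good fortune each of the resultsets shares the attribute 'chembl_target_id', so let's use that to merge: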
# Carry out the merge and also only return columns we are interested in
rs_merge_df = merge(blast_df,
chembl_target_df, how='left', on='chembl_target_id' ).merge(
chembl_drug_df, how='left', on='chembl_target_id' ).merge(
chembl_druglike_df, how='left', on='chembl_target_id' ).merge(
chembl_moa_df, how='left', on='chembl_target_id')[[
'query', 'chembl_target_id','pref_name', 'organism', 'length', 'evalue', 'identity', 'bitscore', 'moa_count', 'drug_count', 'druglike_count'
]].fillna(0)
rs_merge_df
| | query | chembl_target_id | pref_name | organism | length | evalue | identity | bitscore | moa_count | drug_count | druglike_count |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Q96P68_OXGR1_HUMAN | CHEMBL2150840 | 2-oxoglutarate receptor 1 | Homo sapiens | 337 | 0.000000e+00 | 100.00 | 688.0 | 0.0 | 0.0 | 0.0 |
1 | Q96P68_OXGR1_HUMAN | CHEMBL2325 | G protein-coupled receptor 80 | Rattus norvegicus | 337 | 0.000000e+00 | 85.76 | 579.0 | 0.0 | 0.0 | 37.0 |
2 | Q96P68_OXGR1_HUMAN | CHEMBL4315 | Purinergic receptor P2Y1 | Homo sapiens | 300 | 9.000000e-67 | 36.00 | 216.0 | 0.0 | 0.0 | 30.0 |
3 | Q96P68_OXGR1_HUMAN | CHEMBL5720 | P2Y purinoceptor 1 | Meleagris gallopavo | 300 | 1.000000e-66 | 35.33 | 215.0 | 0.0 | 0.0 | 0.0 |
4 | Q96P68_OXGR1_HUMAN | CHEMBL2497 | Purinergic receptor P2Y1 | Rattus norvegicus | 300 | 3.000000e-66 | 36.00 | 215.0 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
775 | P10144_GRAB_HUMAN | CHEMBL1075308 | Thrombin | Mus musculus | 268 | 1.000000e-12 | 27.61 | 65.9 | 0.0 | 0.0 | 0.0 |
776 | P10144_GRAB_HUMAN | CHEMBL3078 | Thrombin | Rattus norvegicus | 239 | 1.000000e-11 | 28.03 | 62.8 | 0.0 | 0.0 | 8.0 |
777 | P10144_GRAB_HUMAN | CHEMBL2040703 | Thrombin | Oryctolagus cuniculus | 231 | 1.000000e-11 | 27.71 | 61.2 | 0.0 | 0.0 | 0.0 |
778 | P10144_GRAB_HUMAN | CHEMBL5731 | Complement factor B | Homo sapiens | 239 | 1.000000e-09 | 25.94 | 57.0 | 0.0 | 0.0 | 0.0 |
779 | P10144_GRAB_HUMAN | CHEMBL3929 | Coagulation factor X | Canis lupus familiaris | 160 | 4.000000e-07 | 24.38 | 47.4 | 0.0 | 0.0 | 0.0 |
780 rows × 11 columns
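The count columns in the merged table are shown as 0.0 and 37.0 rather than 0 and 37. This is a side effect of the left merge: every BLAST hit is kept, targets with no matching row in a count dataframe receive a missing value (NaN), the integer column is converted to floating point to hold it, and `fillna(0)` then replaces the NaN with 0. The toy example below reproduces this behaviour; the dataframes and target IDs are invented for illustration.

```python
import pandas as pd

# Two toy BLAST hits, only one of which has an entry in the count table
toy_hits = pd.DataFrame({'chembl_target_id': ['TARGET_A', 'TARGET_B'],
                         'identity': [100.0, 36.0]})
toy_counts = pd.DataFrame({'chembl_target_id': ['TARGET_A'],
                           'drug_count': [7]})

# how='left' keeps both hits; TARGET_B gets NaN for drug_count
toy_merged = toy_hits.merge(toy_counts, how='left', on='chembl_target_id')

# Replace the missing count with 0; the column stays floating point
toy_filled = toy_merged.fillna(0)
print(toy_filled)
```

An inner merge would have dropped TARGET_B altogether, which is why the tutorial uses `how='left'`: hits without any drug or mechanism data still need to appear in the results, with counts of zero.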
So let's create a really simple score, based on the information we have, to predict whether a target is likely to be druggable:
def druggability_score(query_sequence_length, align_length, identity, moa_count, drug_count, druglike_count):
    # Cast to float so the divisions below are not truncated by Python 2 integer division
    align_length = float(align_length)
    identity = float(identity)
    # Each term is (fraction of the query aligned) * (fractional identity) * (evidence present?) * weight
    moa_score = (align_length/query_sequence_length) * (identity/100) * (1 if (moa_count > 0) else 0)
    drug_score = (align_length/query_sequence_length) * (identity/100) * (1 if (drug_count > 0) else 0) * 0.8
    druglike_score = (align_length/query_sequence_length) * (identity/100) * (1 if (druglike_count > 0) else 0) * 0.5
    total_score = round((moa_score + drug_score + druglike_score), 2)
    # Cap the score at 1
    return (1 if (total_score > 1) else total_score)
The cryptic 0.8 and 0.5 values are there to down-weight the contribution of the drug_count and druglike_count values (I said it was simple). The mechanism of action information is assumed to be fact, i.e. it is known that this target binds 1 or more drugs, so no down-weighting is applied.
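To make the arithmetic concrete, the sketch below re-implements the score under a different, hypothetical name (`druggability_score_sketch`) so that it can be run on its own, and applies it to three hits from the results in this tutorial. It assumes query lengths of 337 residues for Q96P68_OXGR1_HUMAN and 247 residues for P10144_GRAB_HUMAN.

```python
def druggability_score_sketch(query_length, align_length, identity,
                              moa_count, drug_count, druglike_count):
    """Stand-alone copy of the tutorial's scoring logic (hypothetical name)."""
    # Fraction of the query covered by the alignment times fractional identity
    similarity = (float(align_length) / query_length) * (float(identity) / 100)
    moa_score = similarity * (1 if moa_count > 0 else 0)
    drug_score = similarity * (1 if drug_count > 0 else 0) * 0.8
    druglike_score = similarity * (1 if druglike_count > 0 else 0) * 0.5
    total_score = round(moa_score + drug_score + druglike_score, 2)
    return 1 if total_score > 1 else total_score


# CHEMBL2325: full-length alignment, 85.76% identity, drug-like compounds only
print(druggability_score_sketch(337, 337, 85.76, 0, 0, 37))   # 0.43
# CHEMBL4315: 300 of 337 residues aligned, 36% identity, drug-like compounds only
print(druggability_score_sketch(337, 300, 36.00, 0, 0, 30))   # 0.16
# CHEMBL3078: 239 of 247 residues aligned, 28.03% identity, drug-like compounds only
print(druggability_score_sketch(247, 239, 28.03, 0, 0, 8))    # 0.14
# A perfect match with all three kinds of evidence would give 2.3, capped to 1
print(druggability_score_sketch(100, 100, 100.0, 1, 1, 1))    # 1
```

Because only drug-like compounds are known for these three targets, each score is the coverage-weighted identity multiplied by 0.5, e.g. 1.0 * 0.8576 * 0.5 = 0.4288, which rounds to 0.43.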
So let's add this new druggability score column to the results table:
rs_merge_df['druggability_score'] = rs_merge_df.apply(lambda row: druggability_score(query_sequence_details[row['query']]['seq_length'],
row['length'],
row['identity'],
row['moa_count'],
row['drug_count'],
row['druglike_count']),axis=1)
rs_merge_df
| | query | chembl_target_id | pref_name | organism | length | evalue | identity | bitscore | moa_count | drug_count | druglike_count | druggability_score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Q96P68_OXGR1_HUMAN | CHEMBL2150840 | 2-oxoglutarate receptor 1 | Homo sapiens | 337 | 0.000000e+00 | 100.00 | 688.0 | 0.0 | 0.0 | 0.0 | 0.00 |
1 | Q96P68_OXGR1_HUMAN | CHEMBL2325 | G protein-coupled receptor 80 | Rattus norvegicus | 337 | 0.000000e+00 | 85.76 | 579.0 | 0.0 | 0.0 | 37.0 | 0.43 |
2 | Q96P68_OXGR1_HUMAN | CHEMBL4315 | Purinergic receptor P2Y1 | Homo sapiens | 300 | 9.000000e-67 | 36.00 | 216.0 | 0.0 | 0.0 | 30.0 | 0.16 |
3 | Q96P68_OXGR1_HUMAN | CHEMBL5720 | P2Y purinoceptor 1 | Meleagris gallopavo | 300 | 1.000000e-66 | 35.33 | 215.0 | 0.0 | 0.0 | 0.0 | 0.00 |
4 | Q96P68_OXGR1_HUMAN | CHEMBL2497 | Purinergic receptor P2Y1 | Rattus norvegicus | 300 | 3.000000e-66 | 36.00 | 215.0 | 0.0 | 0.0 | 0.0 | 0.00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
775 | P10144_GRAB_HUMAN | CHEMBL1075308 | Thrombin | Mus musculus | 268 | 1.000000e-12 | 27.61 | 65.9 | 0.0 | 0.0 | 0.0 | 0.00 |
776 | P10144_GRAB_HUMAN | CHEMBL3078 | Thrombin | Rattus norvegicus | 239 | 1.000000e-11 | 28.03 | 62.8 | 0.0 | 0.0 | 8.0 | 0.14 |
777 | P10144_GRAB_HUMAN | CHEMBL2040703 | Thrombin | Oryctolagus cuniculus | 231 | 1.000000e-11 | 27.71 | 61.2 | 0.0 | 0.0 | 0.0 | 0.00 |
778 | P10144_GRAB_HUMAN | CHEMBL5731 | Complement factor B | Homo sapiens | 239 | 1.000000e-09 | 25.94 | 57.0 | 0.0 | 0.0 | 0.0 | 0.00 |
779 | P10144_GRAB_HUMAN | CHEMBL3929 | Coagulation factor X | Canis lupus familiaris | 160 | 4.000000e-07 | 24.38 | 47.4 | 0.0 | 0.0 | 0.0 | 0.00 |
780 rows × 12 columns
Great, we have an extra column in the data frame, which contains the Druggability Score. As sequence identity contributes significantly to the score, we can simply take the maximum value of the druggability_score column for each query and call that the Druggability Score for that protein. Note that groupby sorts the query names alphabetically, so the first row of the grouped results is not necessarily the first sequence defined in query_sequence. The predicted druggability_score for the first row of the grouped results is:
grouped_df = rs_merge_df.groupby('query')['druggability_score'].max().reset_index()
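# Use .iloc for positional access; the older .ix indexer has been removed from pandas
print('{}: {}'.format(grouped_df.iloc[0]['query'], grouped_df.iloc[0]['druggability_score']))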
P06804_TNFA_MOUSE: 1.0
We can wrap up this tutorial by presenting the Druggability Score for all sequences defined in query_sequence in a friendly pandas data frame:
# Show all results in final table
pandas.set_option('display.max_rows', 500)
druggability_results_df = DataFrame({'query':query_sequence_order}).merge(
grouped_df,
how='left',
on='query').fillna('No BLAST hits')
druggability_results_df
| | query | druggability_score |
---|---|---|
0 | Q96P68_OXGR1_HUMAN | 0.66 |
1 | Q86XF0_DHFRL1_HUMAN | 1.00 |
2 | Q9UKX5_ITGA11_HUMAN | 0.26 |
3 | P06804_TNFA_MOUSE | 1.00 |
4 | P48050_KCNJ4_HUMAN | 0.60 |
5 | Q80Z70_SE1L1_RAT | 0.05 |
6 | P33277_GAP1_SCHPO | 0.00 |
7 | Q96PD4_IL17F_HUMAN | 0.37 |
8 | P10144_GRAB_HUMAN | 0.75 |
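The final table is built in two steps: take the best score per query, then left-merge onto the original query order so that the input order is restored and queries without any hits are labelled rather than dropped. As a hedged illustration of that logic, here is the same idea in plain Python without pandas. The function name, sequence names and scores are invented for this example.

```python
def best_score_per_query(scored_hits, query_order, missing='No BLAST hits'):
    """Return (query, best score) pairs in the original query order."""
    best = {}
    for query, score in scored_hits:
        # Keep the highest score seen so far for each query
        if query not in best or score > best[query]:
            best[query] = score
    # Walk the queries in input order; label those that had no hits at all
    return [(query, best.get(query, missing)) for query in query_order]


toy_scored_hits = [('seq_B', 0.10), ('seq_A', 0.43), ('seq_A', 0.16)]
toy_query_order = ['seq_A', 'seq_B', 'seq_C']
print(best_score_per_query(toy_scored_hits, toy_query_order))
# [('seq_A', 0.43), ('seq_B', 0.1), ('seq_C', 'No BLAST hits')]
```

In the tutorial's results every query sequence produced at least one hit, so the 'No BLAST hits' label does not appear, but it would for a sequence with no match below the E-value threshold.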