This is my attempt to derive fundamental genomic tracks for the oyster genome that can be easily visualized.
Will use the full genome as scaffold (should be in cnidaria)
Derived from
ftp://climb.genomics.cn/pub/10.5524/100001_101000/100030/gene_v9/
gene:
gene_v9/oyster.v9.glean.final.rename.gff.gz gene feature of pacific oyster in gff format
gene_v9/oyster.v9.glean.final.rename.gff.cds.gz coding sequence of pacific oyster in fasta format
gene_v9/oyster.v9.glean.final.rename.gff.pep.gz protein sequence of pacific oyster in fasta format
!head /Volumes/Bay4\ scratch/oyster.v9.glean.final.rename.gff
C16582 GLEAN mRNA 35 385 0.555898 - . ID=CGI_10000001;
C16582 GLEAN CDS 35 385 . - 0 Parent=CGI_10000001;
C17212 GLEAN mRNA 31 363 0.999572 + . ID=CGI_10000002;
C17212 GLEAN CDS 31 363 . + 0 Parent=CGI_10000002;
C17316 GLEAN mRNA 30 257 0.555898 + . ID=CGI_10000003;
C17316 GLEAN CDS 30 257 . + 0 Parent=CGI_10000003;
C17476 GLEAN mRNA 34 257 0.998947 - . ID=CGI_10000004;
C17476 GLEAN CDS 104 257 . - 0 Parent=CGI_10000004;
C17476 GLEAN CDS 34 74 . - 2 Parent=CGI_10000004;
C17998 GLEAN mRNA 196 387 1 - . ID=CGI_10000005;
!wc /Volumes/Bay4\ scratch/oyster.v9.glean.final.rename.gff
224718 2022462 14179523 /Volumes/Bay4 scratch/oyster.v9.glean.final.rename.gff
#not quite a GFF!
!head /Volumes/Bay4\ scratch/oyster.v9.glean.final.rename.gff.pep
>CGI_10000780
MERYGARRLRMTIWETTRNGQLQTTHLGSILFILVMMYACVFCRVSLKNG
EEITQLREKGCNTVNRTSQTRNNTIVTTPGQKVHQKCRRDYINANSIKNY
MREKDVSITEPTRDLRSSTPDFEFQKNCLFCGYFAKFSECKRGIDVFPVR
TTDFSNTLRNICKKRNDEWSEIVLRRLNIAPSDLHAADAIYHQTCSVNFR
TGQQIPVSKQANKMVEKGIKTKHADADADVLIALTAIESAKTKPTVLLGE
DTDLLVLLLHHADVTSNSLIFKSGNVSKVNTHIKIWDILKTKVLLGEELC
TLLPLIHAISGCDTTSRMFGVSKAATLKKFAEHDFLKTRQLLCNANAKDD
VISAGENIISSLYNGAPYEELNVLRYRKFAARVLTNKTCVQIHTLPPTSN
AASFHSQRAYLQMKMWMNEDNLNPCEWGWKVANGNLVPVKCTVKLPLNC
#not quite a GFF!
!head /Volumes/Bay4\ scratch/oyster.v9.glean.final.rename.gff.CDS
>CGI_10000780
ATGGAAAGATATGGCGCCCGTAGATTAAGAATGACGATATGGGAGACAAC
TCGTAATGGTCAACTGCAGACGACGCATCTAGGTTCCATCCTTTTCATTC
TGGTAATGATGTATGCTTGTGTTTTTTGTCGGGTGTCTCTAAAAAATGGT
GAAGAAATAACACAACTAAGAGAAAAAGGATGTAACACAGTTAATAGGAC
CAGCCAAACCAGAAATAATACAATCGTCACAACTCCAGGACAAAAAGTTC
ATCAGAAATGTCGACGTGATTACATTAATGCTAACTCAATCAAGAATTAC
ATGCGAGAAAAGGATGTATCGATAACCGAGCCAACTCGTGACTTACGATC
TTCTACTCCTGATTTTGAGTTCCAGAAGAACTGTTTATTTTGTGGATATT
TTGCAAAATTTTCAGAATGCAAAAGGGGAATCGACGTGTTTCCTGTCAGG
Specifically, /Volumes/Bay4\ scratch/oyster.v9.glean.final.rename.gff was split into exon (CDS) and full-gene (mRNA) records.
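The notebook does not show the command used for this split. Below is a minimal Python sketch of one way to do it, assuming the source file is tab-delimited with the feature type in column 3 (as in the head output above). The function names `split_gff_lines` and `split_gff_file` are hypothetical.

```python
from collections import defaultdict


def split_gff_lines(lines, keep=("CDS", "mRNA")):
    """Group 9-column GFF records by feature type (column 3).

    Comment lines, blank lines, and feature types not in `keep` are dropped.
    The original lines (including their newlines) are kept unchanged.
    """
    groups = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] in keep:
            groups[fields[2]].append(line)
    return groups


def split_gff_file(gff_path, out_paths):
    """Write the records of gff_path to one file per feature type.

    out_paths maps feature type -> output path. Returns line counts per type.
    """
    with open(gff_path) as gff:
        groups = split_gff_lines(gff, keep=tuple(out_paths))
    for feature, path in out_paths.items():
        with open(path, "w") as out:
            out.writelines(groups.get(feature, []))
    return {feature: len(groups.get(feature, [])) for feature in out_paths}


# Example, using the paths from this notebook:
# split_gff_file("/Volumes/Bay4 scratch/oyster.v9.glean.final.rename.gff",
#                {"CDS": "/Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff",
#                 "mRNA": "/Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff"})
```

Because every line of the source file is either an mRNA or a CDS record, the two returned counts should sum to the source line count. The wc comparison that follows checks exactly this.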
!head /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff
C16582 GLEAN CDS 35 385 . - 0 Parent=CGI_10000001;
C17212 GLEAN CDS 31 363 . + 0 Parent=CGI_10000002;
C17316 GLEAN CDS 30 257 . + 0 Parent=CGI_10000003;
C17476 GLEAN CDS 104 257 . - 0 Parent=CGI_10000004;
C17476 GLEAN CDS 34 74 . - 2 Parent=CGI_10000004;
C17998 GLEAN CDS 196 387 . - 0 Parent=CGI_10000005;
C18346 GLEAN CDS 174 551 . + 0 Parent=CGI_10000009;
C18428 GLEAN CDS 286 546 . - 0 Parent=CGI_10000010;
C18964 GLEAN CDS 203 658 . - 0 Parent=CGI_10000011;
C18980 GLEAN CDS 30 674 . + 0 Parent=CGI_10000012;
!head /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff
C16582 GLEAN mRNA 35 385 0.555898 - . ID=CGI_10000001;
C17212 GLEAN mRNA 31 363 0.999572 + . ID=CGI_10000002;
C17316 GLEAN mRNA 30 257 0.555898 + . ID=CGI_10000003;
C17476 GLEAN mRNA 34 257 0.998947 - . ID=CGI_10000004;
C17998 GLEAN mRNA 196 387 1 - . ID=CGI_10000005;
C18346 GLEAN mRNA 174 551 1 + . ID=CGI_10000009;
C18428 GLEAN mRNA 286 546 0.555898 - . ID=CGI_10000010;
C18964 GLEAN mRNA 203 658 0.999572 - . ID=CGI_10000011;
C18980 GLEAN mRNA 30 674 0.555898 + . ID=CGI_10000012;
C19100 GLEAN mRNA 160 681 0.999955 - . ID=CGI_10000013;
!wc /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff
196691 1770219 12359791 /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff
!wc /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff
28027 252243 1819732 /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff
#check to make sure files add up.
196691 + 28027
224718
cp /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff
C16582 GLEAN CDS 35 385 . - 0 Parent=CGI_10000001;
C17212 GLEAN CDS 31 363 . + 0 Parent=CGI_10000002;
C17316 GLEAN CDS 30 257 . + 0 Parent=CGI_10000003;
C17476 GLEAN CDS 104 257 . - 0 Parent=CGI_10000004;
C17476 GLEAN CDS 34 74 . - 2 Parent=CGI_10000004;
C17998 GLEAN CDS 196 387 . - 0 Parent=CGI_10000005;
C18346 GLEAN CDS 174 551 . + 0 Parent=CGI_10000009;
C18428 GLEAN CDS 286 546 . - 0 Parent=CGI_10000010;
C18964 GLEAN CDS 203 658 . - 0 Parent=CGI_10000011;
C18980 GLEAN CDS 30 674 . + 0 Parent=CGI_10000012;
cp /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff
C16582 GLEAN mRNA 35 385 0.555898 - . ID=CGI_10000001;
C17212 GLEAN mRNA 31 363 0.999572 + . ID=CGI_10000002;
C17316 GLEAN mRNA 30 257 0.555898 + . ID=CGI_10000003;
C17476 GLEAN mRNA 34 257 0.998947 - . ID=CGI_10000004;
C17998 GLEAN mRNA 196 387 1 - . ID=CGI_10000005;
C18346 GLEAN mRNA 174 551 1 + . ID=CGI_10000009;
C18428 GLEAN mRNA 286 546 0.555898 - . ID=CGI_10000010;
C18964 GLEAN mRNA 203 658 0.999572 - . ID=CGI_10000011;
C18980 GLEAN mRNA 30 674 0.555898 + . ID=CGI_10000012;
C19100 GLEAN mRNA 160 681 0.999955 - . ID=CGI_10000013;
!wc /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff
10035701 99934100 977314599 /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff
!fgrep -c "fuzznuc nucleotide_motif" /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff
9978551
!head /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff
##gff-version 3
##sequence-region scaffold360 1 280
#!Date 2013-04-23
#!Type DNA
#!Source-version EMBOSS 6.5.7.0
scaffold360 fuzznuc nucleotide_motif 60 61 2 + . ID=scaffold360.1;note=*pat pattern:CG
scaffold360 fuzznuc nucleotide_motif 96 97 2 + . ID=scaffold360.2;note=*pat pattern:CG
scaffold360 fuzznuc nucleotide_motif 120 121 2 + . ID=scaffold360.3;note=*pat pattern:CG
scaffold360 fuzznuc nucleotide_motif 187 188 2 + . ID=scaffold360.4;note=*pat pattern:CG
##gff-version 3
cp /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff
!sortBed -i /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff > /Volumes/web/cnidarian/TJGR_oyster_v9_CG_sorted.gff
!wc /Volumes/web/cnidarian/TJGR_oyster_v9_CG_sorted.gff
9978551 99785510 976050492 /Volumes/web/cnidarian/TJGR_oyster_v9_CG_sorted.gff
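The raw CG file has 10,035,701 lines, but fgrep finds 9,978,551 fuzznuc features, and the sorted file has exactly 9,978,551 lines. The head output shows fuzznuc writing a block of `##`/`#!` header lines per scaffold, so the difference is presumably those headers, which did not make it through sorting. A small sketch to confirm this, assuming every header line starts with `#` (`count_gff_lines` is a hypothetical helper):

```python
def count_gff_lines(lines):
    """Return (header_lines, feature_lines) for an iterable of GFF lines.

    Lines starting with '#' (## directives and #! metadata) count as headers;
    blank lines are ignored; everything else counts as a feature line.
    """
    headers = features = 0
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            headers += 1
        else:
            features += 1
    return headers, features


# Example, using the path from this notebook:
# with open("/Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff") as gff:
#     print(count_gff_lines(gff))
```

If the explanation holds, the feature count should equal the fgrep result, and (barring blank lines) the two counts should sum to the wc line count.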
!head /Volumes/web/cnidarian/qDOD_scaffold_length.csv
!tr ',' "\t" </Volumes/web/cnidarian/qDOD_scaffold_length.csv> /Volumes/web/cnidarian/qDOD_scaffold_length.txt
!head /Volumes/web/cnidarian/qDOD_scaffold_length.txt
!flankBed -s -l 1000 -r 0 -i /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff -g /Volumes/web/cnidarian/qDOD_scaffold_length.txt > /Volumes/web/cnidarian/TJGR_Promoter_1k5p.gff
!head /Volumes/web/cnidarian/TJGR_Promoter_1k5p.gff
C16582 GLEAN mRNA 386 395 0.555898 - . ID=CGI_10000001;
C17212 GLEAN mRNA 1 30 0.999572 + . ID=CGI_10000002;
C17316 GLEAN mRNA 1 29 0.555898 + . ID=CGI_10000003;
C17476 GLEAN mRNA 258 491 0.998947 - . ID=CGI_10000004;
C17998 GLEAN mRNA 388 559 1 - . ID=CGI_10000005;
C18346 GLEAN mRNA 1 173 1 + . ID=CGI_10000009;
C18428 GLEAN mRNA 547 611 0.555898 - . ID=CGI_10000010;
C18964 GLEAN mRNA 659 714 0.999572 - . ID=CGI_10000011;
C18980 GLEAN mRNA 1 29 0.555898 + . ID=CGI_10000012;
C19100 GLEAN mRNA 682 743 0.999955 - . ID=CGI_10000013;
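With `-s -l 1000 -r 0`, flankBed takes up to 1,000 bp on the 5' side of each gene and uses the strand to decide which side that is. It also stops at the scaffold ends, which is why short regions such as 1 to 30 on C17212 appear above. Below is a minimal sketch of that arithmetic in 1-based GFF coordinates. `upstream_flank` is a hypothetical helper, not part of bedtools.

```python
def upstream_flank(start, end, strand, scaffold_len, size=1000):
    """Return the 1-based, inclusive region of up to `size` bp immediately
    5' of a feature, clipped to the scaffold, or None if that region is empty.

    Plus strand: the region just before `start`.
    Minus strand: the region just after `end` (5' on the reverse strand).
    """
    if strand == "+":
        flank_start, flank_end = max(1, start - size), start - 1
    elif strand == "-":
        flank_start, flank_end = end + 1, min(scaffold_len, end + size)
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if flank_start > flank_end:
        return None  # feature touches the scaffold edge; no upstream sequence
    return flank_start, flank_end
```

For CGI_10000002 (C17212, 31 to 363, + strand), `upstream_flank(31, 363, "+", scaffold_len)` gives (1, 30) for any scaffold length. This matches the first + strand row above.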
!sed 's/mRNA/promoter/g' </Volumes/web/cnidarian/TJGR_Promoter_1k5p.gff> /Volumes/web/cnidarian/TJGR_Promoter_1k5p_b.gff
!head /Volumes/web/cnidarian/TJGR_Promoter_1k5p_b.gff
C16582 GLEAN promoter 386 395 0.555898 - . ID=CGI_10000001;
C17212 GLEAN promoter 1 30 0.999572 + . ID=CGI_10000002;
C17316 GLEAN promoter 1 29 0.555898 + . ID=CGI_10000003;
C17476 GLEAN promoter 258 491 0.998947 - . ID=CGI_10000004;
C17998 GLEAN promoter 388 559 1 - . ID=CGI_10000005;
C18346 GLEAN promoter 1 173 1 + . ID=CGI_10000009;
C18428 GLEAN promoter 547 611 0.555898 - . ID=CGI_10000010;
C18964 GLEAN promoter 659 714 0.999572 - . ID=CGI_10000011;
C18980 GLEAN promoter 1 29 0.555898 + . ID=CGI_10000012;
C19100 GLEAN promoter 682 743 0.999955 - . ID=CGI_10000013;
cp /Volumes/web/cnidarian/TJGR_Promoter_1k5p_b.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff
#clean up in SQLShare
!head /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2.gff
!tail -n +2 /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2.gff > /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2b.gff
!tr ',' "\t" </Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2b.gff> /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2c.gff
!head /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2c.gff
!cp /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2c.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff
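The `tail -n +2` and `tr` pair drops the header row of the SQLShare CSV download and swaps commas for tabs. This works as long as no field contains a comma. Below is a sketch of the same conversion with Python's csv module, which also handles quoted fields. It assumes a single header row, and `sqlshare_csv_to_tsv` is a hypothetical name.

```python
import csv


def sqlshare_csv_to_tsv(csv_lines, skip_header=True):
    """Convert CSV lines (e.g. a SQLShare download) to tab-delimited lines.

    Unlike a plain character-for-character comma-to-tab swap, the csv module
    respects quoting, so a comma inside a quoted field stays in that field.
    """
    reader = csv.reader(csv_lines)
    if skip_header:
        next(reader, None)  # drop the column-name row, like `tail -n +2`
    return ["\t".join(row) + "\n" for row in reader]


# Example, using the paths from this notebook:
# with open("/Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2.gff", newline="") as src:
#     lines = sqlshare_csv_to_tsv(src)
# with open("/Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2c.gff", "w") as dst:
#     dst.writelines(lines)
```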
!sed 's/Parent=/#Parent=/g' </Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff> /Volumes/web/cnidarian/TJGR_Cgigas_v9_exon_b.gff
!sed 's/ID=/#ID=/g' </Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff> /Volumes/web/cnidarian/TJGR_Cgigas_v9_gene_b.gff
!head /Volumes/web/cnidarian/TJGR_Cgigas_v9_exon_b.gff
C16582 GLEAN CDS 35 385 . - 0 #Parent=CGI_10000001;
C17212 GLEAN CDS 31 363 . + 0 #Parent=CGI_10000002;
C17316 GLEAN CDS 30 257 . + 0 #Parent=CGI_10000003;
C17476 GLEAN CDS 104 257 . - 0 #Parent=CGI_10000004;
C17476 GLEAN CDS 34 74 . - 2 #Parent=CGI_10000004;
C17998 GLEAN CDS 196 387 . - 0 #Parent=CGI_10000005;
C18346 GLEAN CDS 174 551 . + 0 #Parent=CGI_10000009;
C18428 GLEAN CDS 286 546 . - 0 #Parent=CGI_10000010;
C18964 GLEAN CDS 203 658 . - 0 #Parent=CGI_10000011;
C18980 GLEAN CDS 30 674 . + 0 #Parent=CGI_10000012;
!subtractBed -a /Volumes/web/cnidarian/TJGR_Cgigas_v9_gene_b.gff -b /Volumes/web/cnidarian/TJGR_Cgigas_v9_exon_b.gff > /Volumes/web/cnidarian/Cgigas_v9_intron.gff
!head /Volumes/web/cnidarian/Cgigas_v9_intron.gff
C17476 GLEAN mRNA 75 103 0.998947 - . #ID=CGI_10000004;
C19392 GLEAN mRNA 184 451 1 + . #ID=CGI_10000015;
C20262 GLEAN mRNA 539 641 1 - . #ID=CGI_10000025;
C20262 GLEAN mRNA 650 871 1 - . #ID=CGI_10000025;
C20334 GLEAN mRNA 524 867 1 - . #ID=CGI_10000028;
C20412 GLEAN mRNA 215 409 1 - . #ID=CGI_10000029;
C20412 GLEAN mRNA 464 705 1 - . #ID=CGI_10000029;
C20462 GLEAN mRNA 50 271 1 + . #ID=CGI_10000030;
C20462 GLEAN mRNA 360 481 1 + . #ID=CGI_10000030;
C20462 GLEAN mRNA 577 822 1 + . #ID=CGI_10000030;
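Conceptually, subtractBed here removes every exon (CDS) interval from each gene span, and whatever remains is called intron. Below is a minimal pure-Python sketch of that interval subtraction for a single gene, in 1-based inclusive GFF coordinates. `subtract_intervals` is a hypothetical helper. Like subtractBed without `-s`, it subtracts whatever intervals it is given, so CDS from an overlapping gene on the same scaffold would also be removed if passed in.

```python
def subtract_intervals(region, cuts):
    """Return the parts of `region` not covered by any interval in `cuts`.

    All intervals are (start, end), 1-based and inclusive (GFF style).
    """
    start, end = region
    pieces = []
    cursor = start  # first base of the region not yet accounted for
    for cut_start, cut_end in sorted(cuts):
        if cut_end < cursor or cut_start > end:
            continue  # cut lies entirely outside the remaining region
        if cut_start > cursor:
            pieces.append((cursor, cut_start - 1))  # uncovered stretch before this cut
        cursor = max(cursor, cut_end + 1)
        if cursor > end:
            break
    if cursor <= end:
        pieces.append((cursor, end))  # uncovered tail of the region
    return pieces
```

For CGI_10000004 (C17476, gene 34 to 257, CDS 34 to 74 and 104 to 257), `subtract_intervals((34, 257), [(104, 257), (34, 74)])` returns `[(75, 103)]`, the first intron row above.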
!sed 's/#ID=/Parent=/g' </Volumes/web/cnidarian/Cgigas_v9_intron.gff> /Volumes/web/cnidarian/Cgigas_v9_intron_b.gff
!sed 's/GLEAN/subtractBed/g' </Volumes/web/cnidarian/Cgigas_v9_intron_b.gff> /Volumes/web/cnidarian/Cgigas_v9_intron_c.gff
!sed 's/mRNA/_intron/g' </Volumes/web/cnidarian/Cgigas_v9_intron_c.gff> /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff
!head /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff
C17476 subtractBed _intron 75 103 0.998947 - . Parent=CGI_10000004;
C19392 subtractBed _intron 184 451 1 + . Parent=CGI_10000015;
C20262 subtractBed _intron 539 641 1 - . Parent=CGI_10000025;
C20262 subtractBed _intron 650 871 1 - . Parent=CGI_10000025;
C20334 subtractBed _intron 524 867 1 - . Parent=CGI_10000028;
C20412 subtractBed _intron 215 409 1 - . Parent=CGI_10000029;
C20412 subtractBed _intron 464 705 1 - . Parent=CGI_10000029;
C20462 subtractBed _intron 50 271 1 + . Parent=CGI_10000030;
C20462 subtractBed _intron 360 481 1 + . Parent=CGI_10000030;
C20462 subtractBed _intron 577 822 1 + . Parent=CGI_10000030;
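The three `sed` commands apply global substitutions to whole lines. For example, `s/mRNA/_intron/g` would also rewrite any `mRNA` that happened to occur in another column. Below is a column-aware alternative, sketched under the assumption that each line has 9 tab-separated fields. `relabel_gff_line` is a hypothetical helper.

```python
def relabel_gff_line(line, source, feature, attr_from="#ID=", attr_to="Parent="):
    """Set the source (column 2) and feature (column 3) of a 9-column GFF line
    and rename one attribute key in column 9; other columns are untouched."""
    fields = line.rstrip("\n").split("\t")
    fields[1] = source
    fields[2] = feature
    fields[8] = fields[8].replace(attr_from, attr_to)
    return "\t".join(fields) + "\n"


# Example, using the paths from this notebook:
# with open("/Volumes/web/cnidarian/Cgigas_v9_intron.gff") as src:
#     lines = [relabel_gff_line(l, "subtractBed", "_intron") for l in src if l.strip()]
```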
http://eagle.fish.washington.edu/cnidarian/Cgigas_v9_intron_d.gff
cp /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
C17476 subtractBed _intron 75 103 0.998947 - . Parent=CGI_10000004;
C19392 subtractBed _intron 184 451 1 + . Parent=CGI_10000015;
C20262 subtractBed _intron 539 641 1 - . Parent=CGI_10000025;
C20262 subtractBed _intron 650 871 1 - . Parent=CGI_10000025;
C20334 subtractBed _intron 524 867 1 - . Parent=CGI_10000028;
C20412 subtractBed _intron 215 409 1 - . Parent=CGI_10000029;
C20412 subtractBed _intron 464 705 1 - . Parent=CGI_10000029;
C20462 subtractBed _intron 50 271 1 + . Parent=CGI_10000030;
C20462 subtractBed _intron 360 481 1 + . Parent=CGI_10000030;
C20462 subtractBed _intron 577 822 1 + . Parent=CGI_10000030;
#will clean up in SQLShare
!head /Volumes/web/cnidarian/Cgigas_v9_intron_v2d.gff
#!tail -n +2 /Volumes/web/cnidarian/Cgigas_v9_intron_v2b.gff > /Volumes/web/cnidarian/Cgigas_v9_intron_v2c.gff
!sed 's/intron/intrn/g' </Volumes/web/cnidarian/Cgigas_v9_intron_v2c.gff> /Volumes/web/cnidarian/Cgigas_v9_intron_v2d.gff
cp /Volumes/web/cnidarian/Cgigas_v9_intron_v2d.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
!wc /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
176049 1584441 12654996 /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
!wc /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff
176049 1584441 13834641 /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff
!complementBed -i /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff -g /Volumes/web/cnidarian/qDOD_scaffold_length.txt > /Volumes/web/cnidarian/TJGR_complement_exon.bed
!head /Volumes/web/cnidarian/TJGR_complement_exon.bed
C1 0 100
C10003 0 156
C10005 0 156
C10007 0 156
C10009 0 156
C1001 0 103
C10011 0 156
C10013 0 157
C10015 0 157
C10021 0 157
!intersectBed -a /Volumes/web/cnidarian/TJGR_complement_exon.bed -b /Volumes/web/cnidarian/TJGR_Cgigas_v9_gene_b.gff > /Volumes/web/cnidarian/TJGR_intron2.bed
!head /Volumes/web/cnidarian/TJGR_intron2.bed
C17476 74 103
C19392 183 451
C20262 538 641
C20262 649 871
C20334 523 867
C20412 214 409
C20412 463 705
C20462 49 271
C20462 359 481
C20462 576 822
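The intersectBed route produces BED (0-based start, half-open), while the subtractBed route kept GFF (1-based, inclusive). As a result, the same intron appears as `C17476 74 103` here and as `C17476 75 103` in the GFF. Below is a sketch for checking that the two derivations agree after shifting GFF starts down by one. The helper names are hypothetical, and the comparison uses sets, so duplicate rows are ignored.

```python
def gff_to_bed_interval(start, end):
    """Convert a 1-based inclusive GFF interval to a 0-based half-open BED one."""
    return start - 1, end


def same_intervals(gff_records, bed_records):
    """True if GFF (seqname, start, end) records and BED (chrom, start, end)
    records describe the same set of intervals after coordinate conversion."""
    from_gff = {(name,) + gff_to_bed_interval(s, e) for name, s, e in gff_records}
    return from_gff == set(bed_records)
```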
Generating TE canonical GFF from RepeatProteinMask oyster v9
The starting file for this is the output of RepeatProteinMask performed by SR
(look towards the bottom of this entry): https://www.evernote.com/shard/s10/sh/7dea995c-17ac-4bcf-bc38-963220e9e7c9/b28dacbbdbfe123960b88e42fa45a34a
The txt file (http://eagle.fish.washington.edu/cnidarian/qDOD_RepeatProteinMask_v9.txt) was uploaded into SQLShare.
Then a GFF was derived using the following query:
SELECT
SeqID as seqname,
Method as source,
Type as feature,
[Begin] as [start],
[End] as [end],
Score as score,
sym as strand,
'.' as frame,
'.' as attribute
FROM [mgavery@washington.edu].[qDOD_RepeatProteinMask_v9.txt]
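For reference, the same column mapping as the query above, sketched in Python. The raw RepeatProteinMask .txt layout is not shown here. The column names come from the SQLShare table used in the query, and the assumption that the raw file is tab-delimited with a header row using those names is unverified. Both function names are hypothetical.

```python
import csv


def repeat_rows_to_gff(rows):
    """Map RepeatProteinMask rows (dicts keyed by SeqID, Method, Type, Begin,
    End, Score, sym) to 9-column GFF lines, mirroring the SELECT above:
    frame and attribute are both '.'."""
    out = []
    for row in rows:
        fields = [row["SeqID"], row["Method"], row["Type"], row["Begin"],
                  row["End"], row["Score"], row["sym"], ".", "."]
        out.append("\t".join(str(f) for f in fields) + "\n")
    return out


def repeat_txt_to_gff(txt_path, gff_path):
    """Hypothetical file-level wrapper (assumes a tab-delimited input with a
    header row that uses the column names above)."""
    with open(txt_path, newline="") as src:
        lines = repeat_rows_to_gff(csv.DictReader(src, delimiter="\t"))
    with open(gff_path, "w") as dst:
        dst.writelines(lines)
```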
The derived SQLdataset is shared publicly here: https://sqlshare.escience.washington.edu/sqlshare#s=query/mgavery%40washington.edu/qDOD_RepeatProteinMask_v9_asgff
The file was downloaded and saved as a .gff here: http://eagle.fish.washington.edu/bivalvia/wholegenomefiles_MBDbsSeq_gill/gffs/qDOD_RepeatProteinMask_v9_asgff.gff
!head /Volumes/web/bivalvia/wholegenomefiles_MBDbsSeq_gill/gffs/qDOD_RepeatProteinMask_v9_asgff.gff
C21242 TRF Tandem_Repeat 38 100 72 + . .
C21306 TRF Tandem_Repeat 35 143 112 + . .
C21306 TRF Tandem_Repeat 574 947 208 + . .
C21306 TRF Tandem_Repeat 574 901 313 + . .
C21372 TRF Tandem_Repeat 643 671 58 + . .
C22542 TRF Tandem_Repeat 1727 1774 96 + . .
C22728 TRF Tandem_Repeat 426 491 105 + . .
C23428 TRF Tandem_Repeat 130 415 202 + . .
C23796 TRF Tandem_Repeat 547 608 97 + . .
C24440 TRF Tandem_Repeat 1059 1089 62 + . .
cp /Volumes/web/bivalvia/wholegenomefiles_MBDbsSeq_gill/gffs/qDOD_RepeatProteinMask_v9_asgff.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff
!cat /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff > /Volumes/web/cnidarian/TJGR_gene_TE_promoter.gff
!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter.gff
C16582 GLEAN promoter 386 395 0.555898 - . ID=CGI_10000001;
C17212 GLEAN promoter 1 30 0.999572 + . ID=CGI_10000002;
C17316 GLEAN promoter 1 29 0.555898 + . ID=CGI_10000003;
C17476 GLEAN promoter 258 491 0.998947 - . ID=CGI_10000004;
C17998 GLEAN promoter 388 559 1 - . ID=CGI_10000005;
C18346 GLEAN promoter 1 173 1 + . ID=CGI_10000009;
C18428 GLEAN promoter 547 611 0.555898 - . ID=CGI_10000010;
C18964 GLEAN promoter 659 714 0.999572 - . ID=CGI_10000011;
C18980 GLEAN promoter 1 29 0.555898 + . ID=CGI_10000012;
C19100 GLEAN promoter 682 743 0.999955 - . ID=CGI_10000013;
!sortBed -i /Volumes/web/cnidarian/TJGR_gene_TE_promoter.gff > /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s.gff
!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s.gff
C10153 WUBlastX LTR_Pao 3 158 109 - . .
C10177 WUBlastX LINE_L2 2 157 97 - . .
C10191 WUBlastX LTR_Copia 2 157 174 - . .
C10245 WUBlastX LINE_Penelope 5 154 59 - . .
C10291 WUBlastX LTR_Copia 2 160 85 - . .
C10475 WUBlastX LINE_L1-Tx1 3 149 50 - . .
C10673 WUBlastX LTR_DIRS 37 162 59 + . .
C10675 WUBlastX LINE_L2 1 165 132 + . .
C10805 WUBlastX LINE_I 1 168 100 - . .
C10973 WUBlastX LTR_Gypsy 3 167 186 + . .
!mergeBed -i /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s.gff > /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique.bed
!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique.bed
C10153 2 158
C10177 1 157
C10191 1 157
C10245 4 154
C10291 1 160
C10475 2 149
C10673 36 162
C10675 0 165
C10805 0 168
C10973 2 167
!complementBed -i /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique.bed -g /Volumes/web/cnidarian/qDOD_scaffold_length.txt > /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed
!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed
C1 0 100
C10003 0 156
C10005 0 156
C10007 0 156
C10009 0 156
C1001 0 103
C10011 0 156
C10013 0 157
C10015 0 157
C10021 0 157
http://eagle.fish.washington.edu/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed
cp /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed
C1 0 100
C10003 0 156
C10005 0 156
C10007 0 156
C10009 0 156
C1001 0 103
C10011 0 156
C10013 0 157
C10015 0 157
C10021 0 157
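mergeBed collapses overlapping (and book-ended) gene, promoter, and TE features into non-redundant BED intervals. complementBed then reports every stretch of each scaffold in the genome file that none of them covers, including whole scaffolds with no features (such as `C1 0 100`). Below is a pure-Python sketch of both steps on 0-based, half-open intervals. The function names are hypothetical. GFF starts need shifting down by one first, as mergeBed's output reflects (for example, `C10153 2 158` from GFF 3 to 158).

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (chrom, start, end) intervals.

    Coordinates are 0-based, half-open (BED style). Returns a list sorted
    by chrom, then start.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def complement_intervals(merged, scaffold_lengths):
    """Return uncovered (chrom, start, end) stretches for every scaffold in
    scaffold_lengths (dict of name -> length), in the dict's order.
    Scaffolds with no intervals are returned whole."""
    by_chrom = {}
    for chrom, start, end in merged:
        by_chrom.setdefault(chrom, []).append((start, end))
    gaps = []
    for chrom, length in scaffold_lengths.items():
        cursor = 0
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                gaps.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < length:
            gaps.append((chrom, cursor, length))
    return gaps


def read_genome_file(path):
    """Hypothetical loader for a two-column name<TAB>length file such as
    qDOD_scaffold_length.txt (assumes no header row)."""
    lengths = {}
    with open(path) as handle:
        for line in handle:
            if line.strip():
                name, length = line.split("\t")[:2]
                lengths[name] = int(length)
    return lengths
```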
TEST: verify that everything is covered (no feature should overlap the complement track)
!cat /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff > /Volumes/web/cnidarian/TJGR_CanTest
!head /Volumes/web/cnidarian/TJGR_CanTest
C16582 GLEAN CDS 35 385 . - 0 Parent=CGI_10000001;
C17212 GLEAN CDS 31 363 . + 0 Parent=CGI_10000002;
C17316 GLEAN CDS 30 257 . + 0 Parent=CGI_10000003;
C17476 GLEAN CDS 104 257 . - 0 Parent=CGI_10000004;
C17476 GLEAN CDS 34 74 . - 2 Parent=CGI_10000004;
C17998 GLEAN CDS 196 387 . - 0 Parent=CGI_10000005;
C18346 GLEAN CDS 174 551 . + 0 Parent=CGI_10000009;
C18428 GLEAN CDS 286 546 . - 0 Parent=CGI_10000010;
C18964 GLEAN CDS 203 658 . - 0 Parent=CGI_10000011;
C18980 GLEAN CDS 30 674 . + 0 Parent=CGI_10000012;
!sortBed -i /Volumes/web/cnidarian/TJGR_CanTest > /Volumes/web/cnidarian/TJGR_CanTest_s
!mergeBed -i /Volumes/web/cnidarian/TJGR_CanTest_s > /Volumes/web/cnidarian/TJGR_CanTest_s_unique.bed
!head /Volumes/web/cnidarian/TJGR_CanTest_s_unique.bed
C10153 2 158
C10177 1 157
C10191 1 157
C10245 4 154
C10291 1 160
C10475 2 149
C10673 36 162
C10675 0 165
C10805 0 168
C10973 2 167
http://eagle.fish.washington.edu/cnidarian/TJGR_CanTest_s_unique.bed
!intersectBed -a /Volumes/web/cnidarian/TJGR_CanTest_s_unique.bed -b /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed > /Volumes/web/cnidarian/TJGR_CanTest_s_unique_inter_COMP.bed
!wc /Volumes/web/cnidarian/TJGR_CanTest_s_unique_inter_COMP.bed
0 0 0 /Volumes/web/cnidarian/TJGR_CanTest_s_unique_inter_COMP.bed
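The empty intersection shows that the merged feature tracks and the complement track do not overlap. On its own, however, it does not show that every base is covered. Below is a sketch of a check for both properties, assuming 0-based, half-open intervals, an already-merged feature track, and a dict of scaffold lengths (for example, from the genome file). `check_partition` is a hypothetical helper.

```python
def check_partition(feature_merged, complement, scaffold_lengths):
    """Check that merged feature intervals and complement intervals are
    disjoint and together cover every base of every scaffold.

    Returns (True, "ok") or (False, description of the first problem found).
    """
    per_chrom = {}
    for chrom, start, end in list(feature_merged) + list(complement):
        per_chrom.setdefault(chrom, []).append((start, end))
    for chrom, length in scaffold_lengths.items():
        cursor = 0  # end of the contiguous covered prefix so far
        for start, end in sorted(per_chrom.get(chrom, [])):
            if start < cursor:
                return False, f"overlap on {chrom} at {start}"
            if start > cursor:
                return False, f"gap on {chrom} at {cursor}-{start}"
            cursor = end
        if cursor != length:
            return False, f"{chrom} covered to {cursor}, length {length}"
    return True, "ok"
```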
Gene
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff
Exons
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff
Intron
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
Promoter (= 1kbp 5' of genes)
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff
Transposable Elements
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff
Complement to Gene, Promoter, and TE tracks
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed
All CGs
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff
Import all tracks
http://eagle.fish.washington.edu/cnidarian/igv_session_073013.xml
previews
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff
C16582 GLEAN mRNA 35 385 0.555898 - . ID=CGI_10000001;
C17212 GLEAN mRNA 31 363 0.999572 + . ID=CGI_10000002;
C17316 GLEAN mRNA 30 257 0.555898 + . ID=CGI_10000003;
C17476 GLEAN mRNA 34 257 0.998947 - . ID=CGI_10000004;
C17998 GLEAN mRNA 196 387 1 - . ID=CGI_10000005;
C18346 GLEAN mRNA 174 551 1 + . ID=CGI_10000009;
C18428 GLEAN mRNA 286 546 0.555898 - . ID=CGI_10000010;
C18964 GLEAN mRNA 203 658 0.999572 - . ID=CGI_10000011;
C18980 GLEAN mRNA 30 674 0.555898 + . ID=CGI_10000012;
C19100 GLEAN mRNA 160 681 0.999955 - . ID=CGI_10000013;
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff
C16582 GLEAN CDS 35 385 . - 0 Parent=CGI_10000001;
C17212 GLEAN CDS 31 363 . + 0 Parent=CGI_10000002;
C17316 GLEAN CDS 30 257 . + 0 Parent=CGI_10000003;
C17476 GLEAN CDS 104 257 . - 0 Parent=CGI_10000004;
C17476 GLEAN CDS 34 74 . - 2 Parent=CGI_10000004;
C17998 GLEAN CDS 196 387 . - 0 Parent=CGI_10000005;
C18346 GLEAN CDS 174 551 . + 0 Parent=CGI_10000009;
C18428 GLEAN CDS 286 546 . - 0 Parent=CGI_10000010;
C18964 GLEAN CDS 203 658 . - 0 Parent=CGI_10000011;
C18980 GLEAN CDS 30 674 . + 0 Parent=CGI_10000012;
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff
C21242 TRF Tandem_Repeat 38 100 72 + . .
C21306 TRF Tandem_Repeat 35 143 112 + . .
C21306 TRF Tandem_Repeat 574 947 208 + . .
C21306 TRF Tandem_Repeat 574 901 313 + . .
C21372 TRF Tandem_Repeat 643 671 58 + . .
C22542 TRF Tandem_Repeat 1727 1774 96 + . .
C22728 TRF Tandem_Repeat 426 491 105 + . .
C23428 TRF Tandem_Repeat 130 415 202 + . .
C23796 TRF Tandem_Repeat 547 608 97 + . .
C24440 TRF Tandem_Repeat 1059 1089 62 + . .
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed
C1 0 100
C10003 0 156
C10005 0 156
C10007 0 156
C10009 0 156
C1001 0 103
C10011 0 156
C10013 0 157
C10015 0 157
C10021 0 157
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff
##gff-version 3
##sequence-region scaffold360 1 280
#!Date 2013-04-23
#!Type DNA
#!Source-version EMBOSS 6.5.7.0
scaffold360 fuzznuc nucleotide_motif 60 61 2 + . ID=scaffold360.1;note=*pat pattern:CG
scaffold360 fuzznuc nucleotide_motif 96 97 2 + . ID=scaffold360.2;note=*pat pattern:CG
scaffold360 fuzznuc nucleotide_motif 120 121 2 + . ID=scaffold360.3;note=*pat pattern:CG
scaffold360 fuzznuc nucleotide_motif 187 188 2 + . ID=scaffold360.4;note=*pat pattern:CG
##gff-version 3
methratio produced output, and this was uploaded to SQLShare.
Want to convert it to IGV format.
SELECT
chr as seqname,
pos - 1 as start, -- compensating for going to zero-based?
pos + 1 as [end],
'CG' as feature,
ratio as score
FROM [sr320@washington.edu].
[BiGill_methratio_v9_A.txt] yel
where
context like '__CG_' --_=single character wildcard
and
CT_Count >= 5
python fetchdata.py -d "[sr320@washington.edu].[BiGill_methratio_v9_IGV]" -f tsv -o /Volumes/web/cnidarian/BiGill_meth_v9_5x.igv
!head /Volumes/web/cnidarian/BiGill_meth_v9_5x.igv
Imported in IGV and looks like coordinates are ok
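For reference, here is the same filter and projection as the SQL above, sketched in Python. It assumes a tab-delimited methratio file with a header row containing `chr`, `pos`, `context`, `ratio`, and `CT_count` (the names the query uses). The function names and the output header row (taken from the SQL aliases) are hypothetical.

```python
import csv


def methratio_to_igv(rows, min_ct=5):
    """Return IGV-style (seqname, start, end, feature, score) tuples.

    Mirrors the SQL: keep rows whose 5-character context has CG at
    positions 3-4 (LIKE '__CG_') and whose CT_count is at least min_ct,
    then span the CG as pos - 1 .. pos + 1.
    """
    records = []
    for row in rows:
        context = row["context"]
        if len(context) != 5 or context[2:4] != "CG":
            continue
        if int(row["CT_count"]) < min_ct:
            continue
        pos = int(row["pos"])
        records.append((row["chr"], pos - 1, pos + 1, "CG", float(row["ratio"])))
    return records


def convert_file(methratio_path, igv_path, min_ct=5):
    """Read a tab-delimited methratio file and write a tab-delimited IGV file."""
    with open(methratio_path, newline="") as src:
        records = methratio_to_igv(csv.DictReader(src, delimiter="\t"), min_ct)
    with open(igv_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(["seqname", "start", "end", "feature", "score"])
        writer.writerows(records)
    return len(records)
```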
SELECT
chr as seqname,
pos - 1 as start, -- compensating for going to zero-based?
pos + 1 as [end],
'CG' as feature,
ratio as score
FROM [sr320@washington.edu].
[BiGO_betty_plain_methratio_v1.txt] yel
where
context like '__CG_' --_=single character wildcard
and
CT_Count >= 5
python fetchdata.py -d "[sr320@washington.edu].[BiGO_betty_methratio_v1_IGV]" -f tsv -o /Volumes/web/cnidarian/BiGO_betty_methratio_v1.igv
IGV Session resaved http://eagle.fish.washington.edu/cnidarian/oyster_v9_igv_session.xml
Details on sperm exon level expression available here
Gene-level expression is in SQLShare, originally derived from CLC RNA-Seq.
Gill Expression data
SQLShare Query
SELECT
Chromosome,
"Chromosome region start" - 1 as start,
"Chromosome region end" as [end],
'gene' as feature,
RPKM
FROM [sr320@washington.edu].[qDOD_Zhang_Gil_gene_RNA-seq]
Resulting file https://sqlshare.escience.washington.edu/sqlshare#s=query/sr320%40washington.edu/Zhang_Gil_gene_RNA-seq_IGV
Downloading
python fetchdata.py -d "[sr320@washington.edu].[Zhang_Gil_gene_RNA-seq_IGV]" -f tsv -o /Volumes/web/cnidarian/Zhang_Gil_gene_RNA-seq.igv
Needs to be sorted in IGV
http://eagle.fish.washington.edu/cnidarian/Zhang_Gil_gene_RNA-seq.sorted.igv
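The download was sorted in IGV (igvtools also provides a sort command). Below is a plain-Python alternative, sketched under the assumption that the file has exactly one header line followed by tab-delimited rows. `sort_igv_lines` is a hypothetical name. Chromosome names are ordered lexicographically here, which may not match igvtools' ordering.

```python
def sort_igv_lines(lines):
    """Sort IGV-format data lines by chromosome, then numeric start,
    keeping the first (header) line in place and dropping blank lines."""
    header, body = lines[0], [line for line in lines[1:] if line.strip()]

    def key(line):
        fields = line.split("\t")
        return fields[0], int(fields[1])

    return [header] + sorted(body, key=key)


# Example, using the paths from this notebook:
# with open("/Volumes/web/cnidarian/Zhang_Gil_gene_RNA-seq.igv") as src:
#     lines = sort_igv_lines(src.readlines())
# with open("/Volumes/web/cnidarian/Zhang_Gil_gene_RNA-seq.sorted.igv", "w") as dst:
#     dst.writelines(lines)
```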
Sperm Gene level expression
File in SQLShare https://sqlshare.escience.washington.edu/sqlshare#s=query/sr320%40washington.edu/qDOD_Zhang_Mgo_gene_RNA-seq
SQLShare Query
SELECT
Chromosome,
"Chromosome region start" - 1 as start,
"Chromosome region end" as [end],
'gene' as feature,
RPKM as Mgo_RPKM
FROM [sr320@washington.edu].[qDOD_Zhang_Mgo_gene_RNA-seq]
New Dataset https://sqlshare.escience.washington.edu/sqlshare#s=query/sr320%40washington.edu/Zhang_Mgo_gene_RNA-seq_IGV
Downloading
python fetchdata.py -d "[sr320@washington.edu].[Zhang_Mgo_gene_RNA-seq_IGV]" -f tsv -o /Volumes/web/cnidarian/Zhang_Mgo_gene_RNA-seq.igv
Sorted
http://eagle.fish.washington.edu/cnidarian/Zhang_Mgo_gene_RNA-seq.sorted.igv