Summary: using BSMAP to look at TE-specific methylation
Updated: June 21, 2013
Previously, transposable elements (TEs) were identified - evernote link
Mac has taken http://eagle.fish.washington.edu/cnidarian/qDOD_RepeatProteinMask_v9.txt and run fastaFromBed; however, the sequence names were uninformative.
./fastaFromBed -fi /Volumes/web-1/bivalvia/wholegenomefiles_MBDbsSeq_gill/oyster.v9.fa -bed /Volumes/web-1/bivalvia/wholegenomefiles_MBDbsSeq_gill/qDOD_RepeatProteinMask_v9_gffformat.gff -fo /Volumes/web-1/bivalvia/wholegenomefiles_MBDbsSeq_gill/RepeatProteinMask_v9.fa -s
Re-ran using -name so that the FASTA headers use the name field of each feature:
!fastaFromBed -s -name -fi /Volumes/web/cnidarian/oyster.v9.fa -bed /Volumes/web/bivalvia/wholegenomefiles_MBDbsSeq_gill/qDOD_RepeatProteinMask_v9_gffformat.gff -fo /Volumes/web/cnidarian/TJGR_RepProMask_TE.fa
!head /Volumes/web/cnidarian/TJGR_RepProMask_TE.fa
>DNA/hAT-Tip100 AGCTCGACGATGGACAAGTTTGTAACAAAAACGAATAGAAAAAGTACAGATGATCAGACAGCTGATAAAAATAACAACGAAACTACATTAGCGAAACATGGTAAATACGAAAAGACCAAACGGAAAAGATGCTATTTGAAAACGTGGGAGGAAACCTGGCCCTAGGTACGACATGACGATATAAATGACACAATGTATTACACAATTTGCAGGGAATTTTCTCCTGCGGTTGGCAACGTAAAAAACAATTCTTTCTACGAAGGATGCAAACTCTTTCATGTGGATTCTTTTAAGGCCCATCAAACTAGCGATGTTCATGTGATATGCACCAGTTTCTATCGCCAGAAAAACGGAACTCCCGTGACTAAGGGTGGAAATCAAGCACATATAAGTGACAATGTCAATGTTAGAGCAACACCTATTATGACAGCATTGCTGAAGTTAGATGAAAACCAAATGAAAAGGCTTGATGCTCTTTTCAAAATAGCATATAAAATTGCAAAACATGGTAAACCACACAGTGATTTTGAAATCGATTGTAAATTGATTCAGAAACTTGGTGTTGATCTTGGGAACAATTATTTTAATACAAATAGATGTAAAGATTTTATAAAATCTATCGCTGATGCTATGACTGAAGCTGTGGCTGATGATTTGAAAGGAGCAAATTTTGTTTTTGTTCTCTCTGATGGTAGCACTGACTGTAGTAACCTTGAACAAGAGAATGTCCCTGTACGATATGTGTCTCCTAAGACTCGGAATCCTGTAACAATATTTTGTGGCATTGTAAATTTGGAACATGGTCATGAAGATGGCGTTTAAGATGGCATTTTTAGAGCCTTGAGTTTAGTTGGCTTAACGAAGGAGAATTTGAAAGCTAATGAAGCAGGTCCAACATTAATTTGCGCGAACTTTGACGATGGAAATGTCATGCAAGGTAAGAAAAATGGAGTTGTGGGAAAACTCGTAAAGGAATACAATCATGTTTTGGGAATGTGGTGAATTGCACACAAACTTCAACTGGCAGTGATGGATTCAGTCAAGGATGTTCAAAGTCTTCAGGAA >DNA/hAT-Tip100 GTAGAAACAAACCCAGAACAAATATCTAGTCAGTGTACACCTGATACTTCCATAATTAAATTGCACCTCGTTCATTCAGATCCAAAGTTAGATTGGGACGTCTCTATCATTGGCAATGAACCAAACCAACCACAGAAGAGTTTTCCTGTAACAATAATTGCAAAACGAAAGAGATCATTTGTTTACAGTTGGTTTAGTACCTGGACATGGCTTCATTATGAAGAATATTCGGACAAAGCTTTCTGTTTCTATTGTATTAAAGCTTACAAGGAGCAAAAGTTGGCAAACATGTGCAAAGAAAAGGCTTTCATATCTGATGGATTCAAAAATTGGAAAAAAGCCACATTAAAATTCAGAGAACATGATAACTCTGATTGTCACAGGGAAGCAGTCGAACGC >DNA/hAT-Tip100 
CAAACATTGAGTGATGGAAATTCTTATACATGTCCAAAAAGTGTTAGTGAATTCCAAAGTGTGTGTGCTAAGGTTGTTTTAATCAATGTCATTGAAAAAGTTTTGAATGCAGGAGTATTTTCTCTAATGCTTGATGAAAGCACAGACAGAGGAAACCGTAAACGGCTGTTGGTGTACATCCAATATCTACATGAGAGAAAACTACAGACCAGCCTCCTTAGCAACATTGAAATTTTAACAGCAAAAGCGGATGCTGAAACTATAACAAACCATGTTTTGACAGAGCTGCGGATGAAAGGCTTAAGTGTTGCAGGACTTGTTGGCATTGGAACTGACGGTGCGAATGTTATGACAGGGAAAAAGAGTGGAGTTGTTGTGAGGCTTCGGGAGCACAGTCCCACATTGATAGGAACACATTGTGCCGCTCATAGATGTGCCCTGGCTGCTTCACAAGCATCAAAATCAATTCCAGAGCTAAAGAAGTATTCTGACACAGTGTCAAATATATTTTTCTATTTTAGTGGCTCTGCTCTCAGATCAAATAAACTGAGGGAGATTCAAATGCTACTGAACCTCCCCATGTTAAAATTTGCCCAGCTCCACTCAGTCCGGTGGTTAAGTCTTCAGAAATCTGTGGAAATTCTGTACAGGACTTACCCAGCACTTCTTGTGGCATTAGAACATGAGGGAACAGTGAACCCCGCTGCAAGAGGAATATATGCAGAAGTATCGCAGTACAGTTTCATTGCAATCACACATATGCTTATGGACATCCTCCCATTTCTTGGAAAACTTAGTCGAATTTTTCAGACAAAGGACTTGGATCTAAGTAAAGTAACACCCATAGTCAAAAGCACCTGTGATGCACTTCTAGATTTCAAAGAGGCAGAGGGAATGTATGTTGATAAACTAGATGAGTTCATAATTAAAGATGGGGACAGTGTTTTGTACAGGAGACCTGACACTGAAAGTAGAAAATCAGCTGTCCAGGGAGCCATAGAAGTAAACATGGATGGATTCAGTGGTTTTGAAAAAGAGCATGAAGAGTGTAACTTTGAAGAAGAAGACAATGGGGTTCAAATTAGGTATTATCAACAACAACAAAACGTCCTCAGAAGTATCATGCCCAAGTACTTGGATACTATTGTTGCTAACTTAGAAAACAGATTTCAGGAGAATGGTTTTATTGAAAAAATGCAAGTGCTTCAGCCTATTCACATTACAGCTGCCCACAAGAAAGGAGAGCTAGCAAAGTATGGAAGTGAAGAAATTGCTGTTATGGCCAACCACTTTGCTATCCATGGAATTAATCCAGAGGAAACTCCAATGGAATACAAACAGTATAAACGTCTTGTAGTTGGATCATATCAAACTTCTTCCCTCAGTGATATGTGCTTTCACCTTGGATCCGAGTATGAAGATATCATGCCCAATATTGTGAGACTGATTCACTGTTGTACTGTGATACCAGTCAGCAGTGCAACATGTGAAAGAGGGTTCTCAACCCAAAACAGAATTAAATCTAGACTGAGGACCAATTTGAACAACATTTCATTAAATGATTTAATGAGAATTTCTGAAGATGGTCCTTCCATGGCACTTTTTGACTTTCCAGAGGCTCTGAAAGTATGGAAGGAAGAAAAGAAGCGA >DNA/hAT-Tip100 AGGAAGGGAAAATATTCTGTCAACTGGATTCTGCAACTGGAAAGATGCTACCTCAAAGTTCCAAAACATCAAGACTCCGACTGTCACAAAGAGGCCGTGGAAAGAAAGTTGAAACTTCCTACAGAAACCAAGGACATCGGTGAAGTTCTCTCAGCCGCCCACTCAGAGGAGAAAACCCCCAACAGACAGCAAATGTTGACCATCCTACGCAACATTAGATTTCGTGCTAGACAGGGACTACCAATCCGTGGCCATGACGATAACCAGAGCAACTTCATCCAACTC >DNA/hAT-Tip100 
CCCAATATTGTGAGACTGATTCACTGTTGTACTGTGATACCAGTCAGCAGTGCAACATGTGAAAGAGGGTTCTCAACCCAAAACAGAATTAAATCTAGACTGAGGACCAACTTGAACAACAGTTCATTAAATGATTTAATGAGAATTTCTGAAGATGGTCCTTTTTCATGGCACACTTTGACTTTCCAGAGGCTCTGAAAATATGGAAGGAAGGAAAGA
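The head output above shows several records sharing the header DNA/hAT-Tip100. A quick way to see how many records share each name is to tally the headers. The following is a minimal sketch, not part of the original analysis; the function name and the miniature input (including the `LINE/L2` label and the shortened sequences) are made up for illustration.

```python
from collections import Counter


def count_fasta_names(lines):
    """Count how many FASTA records share each header name."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the name
            fields = line[1:].split()
            counts[fields[0] if fields else ""] += 1
    return counts


# Hypothetical miniature input; real use would iterate over the open FASTA file
example = [
    ">DNA/hAT-Tip100", "AGCTCGACGATGG",
    ">DNA/hAT-Tip100", "GTAGAAACAAACC",
    ">LINE/L2", "CCCAATATTGTGA",
]
name_counts = count_fasta_names(example)
print(name_counts.most_common())
```

On the real file, the same function could be called as `count_fasta_names(open(path))`, with `path` pointing at TJGR_RepProMask_TE.fa.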
Also have a Pearl file; the descriptions are not informative, but they can be parsed afterwards.
!head /Volumes/web/bivalvia/wholegenomefiles_MBDbsSeq_gill/oyster_v9_pearlTE.fa
>scaffold42662:175722-176097(+) TTAGCTCACCTGAGCCAAAGGCTCAAGTGAGCTTTTCTGATCACAATTTGTCCGTTGTCTGTCGTTGTCGTTGTCGTTGTCGTCGTCGTTGATGTCGTCGTCGTTGTAAACTTTTCACATTTTCATCTTCTTCTCAAGAACCACAGGGCCAATTTCAATCCAACTTGCCATAAAGTATCCTTAGGTAAAGGGGTTTCAGGTTTGTTCAAATGAAGATCCTTGCCCTCTTTCAAGGGGAGATAATTAGGAATTAGTGAAAAAGTAATGGTGTATTTTAAAAATCTTCTTCTCAAGAACCACTTGGCCAAAAAAGATGAAACTCATAGTGAAGTATCCTCAGGTGGTGTAGATTCAAGTTTGTTCAAATCATGACCC >scaffold48:322270-322642(+) TTAGCTCACCTGAGCCAAAGGCTCAAGTGAGCTTTTCTGATCACAATTTGTCCGTTGTCTGTCGTTGTCGTTGTCGTTGTCGTCGTTGTCGTCGTCGTTGTTGTAAACTTTTCACATTTTCATCTTCTTCTCAAGAACCACTGGGCAGATTCCAACCGAATTTGGCACAAATCCCCACTAGGTGAAGGGGATCCAGGTTTGTTCAAATAAAGTGCCACGCCCTCTTTAAAGGGGAGATAATTGAGAATTATTGAAAATTTGTTGGTATTTTTCAAAAATCTTCTTCTCAAAAACTATTCGGCCTGAAAAGCTTGAACTTGTGTGGAGGCATCCTCAGGTAGTGTAGATTCAAGTTTGTTCAAATCATGATCC >scaffold544:193731-194102(+) TTAGCTCACCTGAGCTGAAAGCTCAAGTGAGCTATTCTGATCACATTTTGTCTGTCATCCATCTGTCCATTTGTCCATCTGTCCATGTCTGTACGTCTGTCTGTAAACTTTTACATTTTCAACATCTTCTCTAGAACCACTGGGCCAACTTTAACCAAATTTAGCACAAAGCATCTCTAAGCAAAGGGCATTCAAAGTTGTGAAAATTAAGGACCACACTGTTTTACATGTAGAGATAATTAGGAGTTATTGAAAATTTAAAAAAAAAGCCAGAAATCTTCTTCTCCAGAACTGTTTGGCCAGGAAAGCTGAAACTTATGTGGAAGCATCCTCAGGTAATGTAGATTCAAAGTTGTGAAAATCATGAACCA >scaffold1715:409762-410131(+) TTAGCTCACCTGAGCTGAAAGCTCAAGTGAGCTTTTCTGATCAAAATTTGTCCGTTGTCTGTCGTCGGCGTTGGCGTTGTCATTGGCAGCGTTGTCGTTGTAAACTTTTCACATTTTCTTCTTCTTCTCCAGAACCACACTGCAGATTTCAACCAAATTTGGCACAAAGCATCACTTGGTGAAGGGGATTCAAGTTTGTTCAAATGAAGGGCAACGCCCTCTTTAAAGGGGAAATAATTGAGAATTATTGAAAATTTATTAGTATTTTTCAAAAATCTTCTTCTCAAAAACTATTTGGCCAGAAAAGCTTTAACCTGTGTGGAGGCATCCTCAGGTATTGTTGATTCAAGTTTGTTCAAATCATGGACC >scaffold1834:443990-444359(+) 
TTAGCTCACCTGAGCTGAAAGCTGAAAGCTCAAGTGAGGTTTTTTCTGATCAAAATTTGTCCGTTGTCTGTTGTCTTTGGCGTTGGCGTTGTCATTGTTATAAACTTTTCACATTTTTATCTACTTCTCCAGAACCACTGGGTCGATTTCAACCAAACTTGGCACAAAGCATCACTGGGTGAAAGGGATTCAAGATTGTTCAAATGAAGGGCCCACCCATGTTAAAGGGGAGATAATTTAAAATCATTAAATAATTTGTTGGTATTTTTCGAAAAACTTCTTCTCAAAAAGTATTGGGCCAAAAAAGCTTTAACTTGTGTTGAAGCATTCTCAGGTTGTATAGATTTAAGTTTGTTCAAATCATGATCC
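The headers in this file follow the pattern scaffold:start-end(strand), so they can be parsed after the fact. The following is a minimal sketch assuming every header matches that pattern exactly; the regular expression and the function name are illustrative, not taken from the original workflow.

```python
import re

# Assumed header layout: >scaffold42662:175722-176097(+)
HEADER_RE = re.compile(
    r"^>?(?P<scaffold>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)$"
)


def parse_header(header):
    """Split a 'scaffold:start-end(strand)' FASTA header into its parts."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("unrecognized header: %r" % header)
    return {
        "scaffold": match.group("scaffold"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "strand": match.group("strand"),
    }


rec = parse_header(">scaffold42662:175722-176097(+)")
print(rec)
```

Headers that do not match raise a ValueError rather than being silently skipped, which makes unexpected formats easy to spot.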
!bsmap -w 1000 -a /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz -d /Volumes/web/cnidarian/TJGR_RepProMask_TE.fa -o /Volumes/web/cnidarian/BiGill_BSMAP_TEonly_v1.sam -p 1
aborted
Running BSMAP with -w 1000 on the genome
Previous analysis was as follows:
./bsmap -a //Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz -d /Volumes/web/cnidarian/oyster.v9.fa -o /Volumes/web/cnidarian/BiGill_BSMAP_GillMBD_genome_v9_v1.sam -p 16
Total number of aligned reads: 120734940 (86%)
Done.
Finished at Wed Apr 17 09:09:59 2013
Total time consumed: 8235 secs
http://eagle.fish.washington.edu/cnidarian/BiGill_BSMAP_GillMBD_genome_v9_v1.sam
redux..
!bsmap -w 1000 -a /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz -d /Volumes/web/cnidarian/cnidarian/oyster.v9.fa -o /Volumes/web/cnidarian/BiGill_BSMAP_genome_v9_v1000.sam -p 2
aborted
d-128-95-149-219:bsmap-2.74 sr320$ ./bsmap -w 1000 -a /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz -d /Volumes/web/cnidarian/TJGR_RepProMask_TE.fa -o /Volumes/web/cnidarian/BiGill_BSMAP_TEonly_v2.sam -p 16
BSMAP v2.74
Start at: Fri Jun 14 13:19:06 2013
Input reference file: /Volumes/web/cnidarian/TJGR_RepProMask_TE.fa (format: FASTA)
Load in 119787 db seqs, total size 39414569 bp. 2 secs passed
total_kmers: 43046721
Create seed table. 5 secs passed
max number of mismatches: read_length * 8% max gap size: 0
kmer cut-off ratio: 5e-07
max multi-hits: 1000 max Ns: 5 seed size: 16 index interval: 4
quality cutoff: 0 base quality char: '!'
min fragment size:28 max fragemt size:500
start from read #1 end at read #4294967295
additional alignment: T in reads => C in reference
mapping strand: ++,-+
Single-end alignment(16 threads)
Input read file: /Volumes/NGS Drive/NGS Raw Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz (format: gzipped FASTQ)
Output file: /Volumes/web/cnidarian/BiGill_BSMAP_TEonly_v2.sam (format: SAM)
BSMAP complete on Hummingbird
Total number of aligned reads: 2545683 (1.8%)
Done.
Finished at Sat Jun 15 20:45:07 2013
Total time consumed: 113161 secs
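To compare runs, the BSMAP summary can be turned into numbers. The following is a minimal sketch that pulls the aligned-read count, percentage, and run time out of the summary text and derives the run time in hours and an approximate input read count. The function and key names are made up for illustration, and the read total is only approximate because the reported percentage is rounded.

```python
import re

# Matches the BSMAP summary whether it is on one line or split across lines
SUMMARY_RE = re.compile(
    r"Total number of aligned reads:\s*(\d+)\s*\(([\d.]+)%\)"
    r".*?Total time consumed:\s*(\d+)\s*secs",
    re.S,
)


def parse_bsmap_summary(text):
    """Extract aligned reads, percent aligned, and run time from a BSMAP summary."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no BSMAP summary found")
    aligned = int(match.group(1))
    percent = float(match.group(2))
    seconds = int(match.group(3))
    return {
        "aligned": aligned,
        "percent": percent,
        "seconds": seconds,
        "hours": seconds / 3600.0,
        # Rough estimate only: the reported percentage is rounded
        "approx_total_reads": round(aligned / (percent / 100.0)),
    }


te_run = parse_bsmap_summary(
    "Total number of aligned reads: 2545683 (1.8%) Done. "
    "Finished at Sat Jun 15 20:45:07 2013 Total time consumed: 113161 secs"
)
print(te_run)
```

For the TE-only run this gives about 31.4 hours and roughly 141 million input reads, in line with the roughly 140 million implied by the earlier genome run (120734940 reads at 86%).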
@ Mon Jun 17 08:25:48 2013: reading reference /Volumes/web/cnidarian/TJGR_RepProMask_TE.fa ...
@ Mon Jun 17 08:25:50 2013: reading /Volumes/web/cnidarian/BiGill_BSMAP_TEonly_v2.sam ...
[samopen] SAM header is present: 119787 sequences.
@ Mon Jun 17 08:26:57 2013: combining CpG methylation from both strands ...
@ Mon Jun 17 08:26:57 2013: writing /Volumes/web/cnidarian/BiGill_methratio_TEonly_A.txt ...
@ Mon Jun 17 08:26:58 2013: done.
total 1486700 valid mappings, 3267 covered cytosines, average coverage: 536.48 fold.
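A possible next step is to summarize methylation per TE sequence from the methylation ratio file. The following is a minimal sketch that assumes the file is tab-delimited with a header row containing the columns `chr`, `C_count`, and `CT_count`; those column names, the `min_depth` threshold, and the sample rows (`TE_A`, `TE_B`) are assumptions for illustration and should be checked against the actual BiGill_methratio_TEonly_A.txt.

```python
import csv
import io
from collections import defaultdict


def methylation_by_sequence(handle, min_depth=5):
    """Pool C and C+T counts per reference sequence and return C / (C+T)."""
    totals = defaultdict(lambda: [0.0, 0.0])  # name -> [C_count, CT_count]
    for row in csv.DictReader(handle, delimiter="\t"):
        ct_count = float(row["CT_count"])
        if ct_count < min_depth:
            continue  # skip low-coverage cytosines
        totals[row["chr"]][0] += float(row["C_count"])
        totals[row["chr"]][1] += ct_count
    return {name: c / ct for name, (c, ct) in totals.items()}


# Hypothetical rows, not taken from the real output
rows = [
    ["chr", "pos", "strand", "context", "ratio", "C_count", "CT_count"],
    ["TE_A", "10", "+", "CG", "0.800", "8", "10"],
    ["TE_A", "25", "+", "CG", "0.500", "5", "10"],
    ["TE_B", "7", "-", "CG", "0.000", "0", "3"],
    ["TE_B", "9", "-", "CG", "0.250", "5", "20"],
]
sample = io.StringIO("\n".join("\t".join(r) for r in rows) + "\n")
per_te = methylation_by_sequence(sample)
print(per_te)
```

Pooling counts before dividing weights each cytosine by its coverage; averaging the per-site ratios instead would be an equally reasonable choice with different behavior at low depth.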