monoseq
monoseq is a Python library for pretty-printing DNA and protein sequences using a monospace font. It also provides a simple command line interface.
Sequences are pretty-printed in the traditional way using blocks of letters where each line is prefixed with the sequence position. User-specified regions are highlighted and the output format can be HTML or plaintext with optional styling using ANSI escape codes for use in a terminal.
Here we show how monoseq can be used in the IPython Notebook environment. See the monoseq documentation for more.
Note: Some applications (e.g., GitHub) will not show the annotation styling in this notebook. View this notebook on nbviewer to see all styling.
If you haven't already done so, install monoseq using pip.
pip install monoseq
The monoseq.ipynb module provides Seq, a convenience wrapper around monoseq.pprint_sequence that makes it easy to print sequence strings in an IPython Notebook.
from monoseq.ipynb import Seq
s = ('cgcactcaaaacaaaggaagaccgtcctcgactgcagaggaagcaggaagctgtc'
     'ggcccagctctgagcccagctgctggagccccgagcagcggcatggagtccgtgg'
     'ccctgtacagctttcaggctacagagagcgacgagctggccttcaacaagggaga'
     'cacactcaagatcctgaacatggaggatgaccagaactggtacaaggccgagctc'
     'cggggtgtcgagggatttattcccaagaactacatccgcgtcaag')
Seq(s)
  1  cgcactcaaa acaaaggaag accgtcctcg actgcagagg aagcaggaag ctgtcggccc
 61  agctctgagc ccagctgctg gagccccgag cagcggcatg gagtccgtgg ccctgtacag
121  ctttcaggct acagagagcg acgagctggc cttcaacaag ggagacacac tcaagatcct
181  gaacatggag gatgaccaga actggtacaa ggccgagctc cggggtgtcg agggatttat
241  tcccaagaac tacatccgcg tcaag
We can change the number of characters per block and the number of blocks per line.
Seq(s, block_length=8, blocks_per_line=8)
  1  cgcactca aaacaaag gaagaccg tcctcgac tgcagagg aagcagga agctgtcg gcccagct
 65  ctgagccc agctgctg gagccccg agcagcgg catggagt ccgtggcc ctgtacag ctttcagg
129  ctacagag agcgacga gctggcct tcaacaag ggagacac actcaaga tcctgaac atggagga
193  tgaccaga actggtac aaggccga gctccggg gtgtcgag ggatttat tcccaaga actacatc
257  cgcgtcaa g
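The position shown at the start of each line follows directly from the two parameters: each line holds block_length × blocks_per_line characters, so with 8 × 8 the lines start at 1, 65, 129 and so on. The following is a minimal sketch of that layout in plain Python. The `layout` helper and the toy sequence are made up for illustration; this is not monoseq's actual implementation.

```python
def layout(sequence, block_length=10, blocks_per_line=6):
    """Return one (one-based position, list of blocks) pair per output line.

    Hypothetical helper for illustration only, not part of monoseq.
    """
    line_length = block_length * blocks_per_line
    lines = []
    for start in range(0, len(sequence), line_length):
        chunk = sequence[start:start + line_length]
        # Cut the line into blocks; the last block may be shorter.
        blocks = [chunk[i:i + block_length]
                  for i in range(0, len(chunk), block_length)]
        lines.append((start + 1, blocks))
    return lines


demo = 'acgt' * 20  # 80-character toy sequence
for position, blocks in layout(demo, block_length=8, blocks_per_line=8):
    print('{:>3}  {}'.format(position, ' '.join(blocks)))
```

With the defaults assumed here (10 and 6), the same helper gives line starts of 1, 61, 121, ..., matching the first example.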
Let's say we want to highlight two subsequences because they are conserved between species. We define each region as a tuple (start, stop), zero-based with stop not included, and pass the list of regions in the annotations argument.
conserved = [(11, 37), (222, 247)]
Seq(s, annotations=[conserved])
  1  cgcactcaaa acaaaggaag accgtcctcg actgcagagg aagcaggaag ctgtcggccc
 61  agctctgagc ccagctgctg gagccccgag cagcggcatg gagtccgtgg ccctgtacag
121  ctttcaggct acagagagcg acgagctggc cttcaacaag ggagacacac tcaagatcct
181  gaacatggag gatgaccaga actggtacaa ggccgagctc cggggtgtcg agggatttat
241  tcccaagaac tacatccgcg tcaag
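Zero-based coordinates with the stop excluded are the same convention as Python slicing, so a region can be checked by slicing the sequence string. A small sketch, using only the first 40 characters of s (the name `s_start` is introduced here for illustration):

```python
s_start = 'cgcactcaaaacaaaggaagaccgtcctcgactgcagagg'  # first 40 characters of s

start, stop = 11, 37          # first conserved region from the example
region = s_start[start:stop]  # same convention as the annotation tuple

print(region)
print('length:', len(region))
# In the one-based positions shown in the margin, the region runs
# from start + 1 up to and including stop.
print('one-based, inclusive: {}-{}'.format(start + 1, stop))
```

The region (11, 37) therefore covers 26 characters, positions 12 through 37 as counted in the printed output.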
As a contrived example to show several levels of annotation, let's also annotate every 12th character and the middle third of the sequence.
twelves = [(p, p + 1) for p in range(11, len(s), 12)]
middle = [(len(s) // 3, len(s) // 3 * 2)]  # integer division keeps the coordinates integral
Seq(s, annotations=[conserved, twelves, middle])
  1  cgcactcaaa acaaaggaag accgtcctcg actgcagagg aagcaggaag ctgtcggccc
 61  agctctgagc ccagctgctg gagccccgag cagcggcatg gagtccgtgg ccctgtacag
121  ctttcaggct acagagagcg acgagctggc cttcaacaag ggagacacac tcaagatcct
181  gaacatggag gatgaccaga actggtacaa ggccgagctc cggggtgtcg agggatttat
241  tcccaagaac tacatccgcg tcaag
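To see what these two annotation lists contain, they can be built from the sequence length alone. The sketch below hard-codes 265, the length of s in this notebook, and uses integer division (//) so that the coordinates are integers under Python 3 as well.

```python
length = 265  # len(s) for the sequence used in this notebook

# Every 12th character: one-character regions at zero-based 11, 23, 35, ...
twelves = [(p, p + 1) for p in range(11, length, 12)]

# Middle third: a single region from one third to two thirds of the length.
middle = [(length // 3, length // 3 * 2)]

print(twelves[:3])   # [(11, 12), (23, 24), (35, 36)]
print(len(twelves))  # 22
print(middle)        # [(88, 176)]
```

Each list is one annotation level, so passing three lists gives three independently styled levels.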
The default CSS that is applied can be overridden with the style argument.
style = """
{selector} {{ background: beige; color: gray }}
{selector} .monoseq-margin {{ font-style: italic; color: green }}
{selector} .monoseq-annotation-0 {{ color: blue; font-weight: bold }}
"""
Seq(s, style=style, annotations=[conserved])
  1  cgcactcaaa acaaaggaag accgtcctcg actgcagagg aagcaggaag ctgtcggccc
 61  agctctgagc ccagctgctg gagccccgag cagcggcatg gagtccgtgg ccctgtacag
121  ctttcaggct acagagagcg acgagctggc cttcaacaag ggagacacac tcaagatcct
181  gaacatggag gatgaccaga actggtacaa ggccgagctc cggggtgtcg agggatttat
241  tcccaagaac tacatccgcg tcaag
See the string in monoseq.ipynb.DEFAULT_STYLE for a longer example.
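The {selector} placeholder and the doubled braces in the style string suggest that it is treated as a Python format string, where {{ and }} stand for literal braces. Assuming that is how the string is filled in (an assumption, not confirmed here), the resulting CSS can be previewed with str.format. The selector value '#monoseq-demo' is made up for this sketch.

```python
style = """
{selector} {{ background: beige; color: gray }}
{selector} .monoseq-margin {{ font-style: italic; color: green }}
{selector} .monoseq-annotation-0 {{ color: blue; font-weight: bold }}
"""

# '#monoseq-demo' is a hypothetical selector; the real one is chosen
# by the library when the sequence is rendered.
css = style.format(selector='#monoseq-demo')
print(css)
```

Forgetting to double the braces in a custom style would make str.format raise an error or misread the CSS body as a placeholder, so it is worth previewing a style this way before passing it to Seq.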