This post was written with a slightly modified version of nbconvert, to fix a few bugs and to reflect what should happen rather than what currently happens with nbconvert.
In this post I will introduce you to the programmatic API of nbconvert and show you how to use it in various contexts.
For this I will use one of @jakevdp's great blog posts. I've explicitly chosen a post with no JavaScript tricks, which Jake seems to be fond of right now, because the future of embedding JavaScript in nbviewer, which is based on nbconvert, is not fully decided yet.
This post will not focus on using the command-line tool to convert files. The attentive reader will notice that no data is read from, or written to, disk during the conversion process. Indeed, nbconvert has been designed to avoid IO operations as much as possible, so that it works just as well in a database or web-based environment.
The main principle of nbconvert is to instantiate an Exporter that controls a pipeline through which each notebook you want to export will pass.
Let's start by importing what we need from the API, and downloading @jakevdp's notebook.
import requests
response = requests.get('http://jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb')
response.content[0:60]+'...'
'{\n "metadata": {\n "name": "XKCD_plots"\n },\n "nbformat": 3,\n...'
We read the response into a slightly more convenient format which represents an IPython notebook. There is no real advantage for now, except some convenient methods, but with time this structure should be able to guarantee that the notebook structure is valid.
from IPython.nbformat import current as nbformat
jake_notebook = nbformat.reads_json(response.content)
jake_notebook.worksheets[0].cells[0]
{u'cell_type': u'heading', u'level': 1, u'metadata': {}, u'source': u'XKCD plots in Matplotlib'}
So we have here Jake's notebook in a convenient form, which is mainly a super-powered nesting of dicts and lists. You don't need to worry about the exact structure.
cd ~/nbconvert/
/Users/bussonniermatthias/nbconvert
The nbconvert API exposes some basic exporters for common formats, with default options. We will start by using one of them. First we import it, instantiate it with all the default parameters, and feed it the downloaded notebook.
import nbconvert
from nbconvert import BasicHtmlExporter
exportHtml = BasicHtmlExporter()
(body,resources) = exportHtml.from_notebook_node(jake_notebook)
The exporter returns a tuple containing the body of the converted notebook, here raw HTML, as well as a resources dict. The resources dict contains (among many things) the PNG, JPG, etc. extracted from the notebook when applicable. The basic HTML exporter keeps them embedded as base64 in the document, but you can ask for the figures to be extracted (see the advanced use). So for now the resources dict should be mostly empty, except for one key containing some CSS.
Exporters are stateless: you won't be able to extract any useful information (except their configuration) from them.
You can directly re-use the instance to convert another notebook. For convenience, each exporter also exposes from_file and from_filename methods if you need them.
resources.keys()
['inlining']
# Part of the body, here the first Heading
print body[:200]+'...'
<div class="text_cell_render border-box-sizing rendered_html"> <h1> <a class="heading-anchor" id="XKCD_plots_in_Matplotlib" href="#XKCD_plots_in_Matplotlib"> XKCD plots in Matplotlib </a> </h1> </...
You can directly write the body into an HTML file if you wish. As you can see, it does not contain any body tag or style declaration, but those are included by the FullHtmlExporter. In the advanced use you will see how to customise each of the exporters.
When exporting, one might want to extract the base64-encoded figures to separate files. This is what the RstExporter does by default; let's see how to use it.
from nbconvert import RstExporter
rst_export = RstExporter()
(body,resources) = rst_export.from_notebook_node(jake_notebook)
print body[:570]+'...'
print '[.....]'
print body[900:1100]+'...'
XKCD plots in Matplotlib ======================== This notebook originally appeared as a blog post at `Pythonic Perambulations <http://jakevdp.github.com/blog/2012/10/07/xkcd-style-plots-in-matplotlib/>`_ by Jake Vanderplas. One of the problems I've had with typical matplotlib figures is that everything in them is so precise, so perfect. For an example of what I mean, take a look at this figure: In[1]: .. code:: python from IPython.display import Image Image('http://jakevdp.github.com/figures/xkcd_version.png') .. image:: _fig_01.png Sometimes wh... [.....] om/figures/mpl_version.png') .. image:: _fig_03.png It just doesn't have the same effect. Matplotlib is great for scientific plots, but sometimes you don't want to be so precise. This subject has...
Here we see that the base64 images are not embedded, but we get what look like file names. Actually those are (configurable) keys to get back the binary data from the resources dict, which we haven't inspected yet.
So when writing an RST plugin for any blog engine, Sphinx or anything else, you will be responsible for writing all that data to disk, in the right place. Of course, to help you in this task, all those names are configurable.
Let's try to see how to get one of these images.
resources['figures']['binary'].keys()
[u'_fig_07.png', u'_fig_09.png', u'_fig_03.png', u'_fig_12.png', u'_fig_01.png']
We have extracted 5 binary figures, here pngs, but they could have been svg, in which case they wouldn't appear in the binary sub-dict.
Keep in mind that an object with multiple reprs will store all of its reprs in the notebook.
Hence if you provide _repr_javascript_, _repr_latex_ and _repr_png_ on an object, you will be able to determine at conversion time which representation is the most appropriate. You could even decide to show all the representations of an object; it's up to you. But this requires being a little more involved and writing a few lines of Jinja template. This will probably be the subject of another tutorial.
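To make the idea of choosing a representation at conversion time more concrete, here is a minimal sketch in plain Python. Nothing in it is nbconvert code: the Circle class and the available_reprs and pick_repr helpers are hypothetical, and the first priority list simply mirrors the display_data_priority value that shows up in the LaTeX exporter's config later in this post.

```python
# Hypothetical sketch: not nbconvert code, just the selection idea.
class Circle(object):
    """Toy object exposing several representations of itself."""

    def _repr_latex_(self):
        return r"$x^2 + y^2 = 1$"

    def _repr_png_(self):
        return b"\x89PNG..."  # placeholder bytes, not a real image

    def __repr__(self):
        return "Circle()"


def available_reprs(obj):
    """Collect every rich representation the object provides."""
    reprs = {"text": repr(obj)}
    for fmt in ("javascript", "latex", "svg", "png", "jpeg"):
        method = getattr(obj, "_repr_{0}_".format(fmt), None)
        if method is not None:
            reprs[fmt] = method()
    return reprs


def pick_repr(reprs, priority):
    """Return the first (format, data) pair found in priority order."""
    for fmt in priority:
        if fmt in reprs:
            return fmt, reprs[fmt]
    raise KeyError("no representation matches the priority list")


reprs = available_reprs(Circle())
# A LaTeX-oriented converter would prefer the latex repr...
latex_choice = pick_repr(reprs, ["latex", "svg", "png", "jpg", "jpeg", "text"])
# ...while an HTML-oriented one could prefer the image.
html_choice = pick_repr(reprs, ["png", "jpeg", "text"])
```

The same stored data gives a different result depending only on the priority list, which is the kind of decision a converter could make per output format.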
Back to our images,
from IPython.display import Image
Image(data=resources['figures']['binary']['_fig_07.png'],format='png')
Yep, this is indeed the image we were expecting, and I was able to see it without ever writing it to or reading it from disk. I don't think I need to show you what to do with this data: if you are here, you are most probably familiar with IO.
Use case:
I write an awesome blog in HTML, and I want everything except base64-embedded images.
Having one HTML file with everything inside is nice to send to a coworker, but I definitely want resources to be cached! So I need an HTML exporter, and I want it to extract the figures!
The process of converting a notebook to another format with the nbconvert Exporters happens in a few steps, one of which passes the notebook through the Transformers. Transformers only act on the structure of the notebook.
Here we'll be interested in the Transformers. Each Transformer is applied successively and in order on the notebook before going through the conversion process.
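The "apply each transformer in order, then convert" idea can be sketched with plain functions and dicts. This is only an illustration under my own assumptions: the notebook is a simplified dict, and strip_empty_cells, count_cells, render and export are hypothetical names, not part of nbconvert. The one thing taken from the real API is the shape of a transformer, which receives (nb, resources) and returns the modified pair.

```python
# Hypothetical sketch of the exporter pipeline; names are illustrative only.
def strip_empty_cells(nb, resources):
    """Transformer: drop cells whose source is empty or whitespace."""
    nb = dict(nb, cells=[c for c in nb["cells"] if c["source"].strip()])
    return nb, resources


def count_cells(nb, resources):
    """Transformer: record a value that the template step could use."""
    resources = dict(resources, cell_count=len(nb["cells"]))
    return nb, resources


def render(nb, resources):
    """Stand-in for the templating step: join the cell sources."""
    return "\n\n".join(cell["source"] for cell in nb["cells"])


def export(nb, transformers):
    """Apply each transformer in order, then render the result."""
    resources = {}
    for transform in transformers:
        nb, resources = transform(nb, resources)
    return render(nb, resources), resources


notebook = {"cells": [{"source": "# Title"}, {"source": "  "}, {"source": "x = 1"}]}
body, resources = export(notebook, [strip_empty_cells, count_cells])
```

Order matters here: count_cells sees the notebook after strip_empty_cells has run, so it counts two cells rather than three.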
We provide some transformers that modify the notebook structure by default.
One of them, the ExtractFigureTransformer, is responsible for crawling the notebook, finding all the figures, and putting them into the resources dict, as well as choosing the key (_fig_xx.png) that can replace the figure in the template.
The ExtractFigureTransformer is special in that it should be available on all Exporters, but is simply inactive by default.
# second transformer should be an instance of ExtractFigureTransformer
print exportHtml.transformers
[<function wrappedfunc at 0x10bc5a050>, <nbconvert.transformers.extractfigure.ExtractFigureTransformer object at 0x10be25050>, <nbconvert.transformers.csshtmlheader.CSSHtmlHeaderTransformer object at 0x10be25110>]
print rst_export.transformers
[<function wrappedfunc at 0x10bc5a050>, <nbconvert.transformers.extractfigure.ExtractFigureTransformer object at 0x10be93310>]
To enable it we will use the IPython configuration/traitlets system. If you have used it before, this will look pretty familiar to you. Configuration options are always of the form:
ClassName.attribute_name = value
A few ways exist to create such a config, like reading a config file in your profile, but you can also do it programmatically using a dictionary. Let's create such a config object, and see the difference when we pass it to our HtmlExporter.
from IPython.config import Config
c = Config({
'ExtractFigureTransformer':{'enabled':True}
})
exportHtml = BasicHtmlExporter()
exportHtml_and_figs = BasicHtmlExporter(config=c)
(_, resources) = exportHtml.from_notebook_node(jake_notebook)
(_, resources_with_fig) = exportHtml_and_figs.from_notebook_node(jake_notebook)
print 'resources without the "figures" key :'
print resources.keys()
print ''
print 'resources with the "figures" key, subkey "binary" with each figures as subkeys and values :'
print resources_with_fig['figures']['binary'].keys()
resources without the "figures" key : ['inlining'] resources with the "figures" key, subkey "binary" with each figures as subkeys and values : [u'_fig_07.png', u'_fig_09.png', u'_fig_03.png', u'_fig_12.png', u'_fig_01.png']
So now you can loop through the dict and write all those figures to disk in the right place... well, at least you've got the basics. This won't be enough for your blog; I'll let you figure out why by yourself. But more on that later.
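For the record, the loop itself could look like the sketch below. It uses placeholder bytes instead of the real resources['figures']['binary'] dict so that it runs on its own, and write_figures is a hypothetical helper of mine, not something nbconvert provides.

```python
import os
import tempfile


def write_figures(binary_figures, directory):
    """Write each {key: bytes} entry to `directory`; return the paths written."""
    if not os.path.isdir(directory):
        os.makedirs(directory)
    paths = []
    for key, data in sorted(binary_figures.items()):
        path = os.path.join(directory, key)
        with open(path, "wb") as handle:  # binary mode: the values are raw bytes
            handle.write(data)
        paths.append(path)
    return paths


# Placeholder bytes standing in for resources['figures']['binary'].
fake_resources = {"figures": {"binary": {"_fig_01.png": b"one", "_fig_03.png": b"three"}}}
out_dir = os.path.join(tempfile.mkdtemp(), "figs")
written = write_figures(fake_resources["figures"]["binary"], out_dir)
```

Note that the keys are used directly as file names, which only works as long as the keys are plain names like _fig_01.png.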
Of course you can imagine many transformations that you would like to apply to a notebook. This is one of the reasons we provide a way to register your own transformers, which will be applied to the notebook after the default ones.
To do so you'll have to pass an ordered list of Transformers to the Exporter constructor.
But what is a transformer? A transformer can be either a decorated function, for dead-simple Transformers that apply independently to each cell, or, for more advanced transformations that support configurability, a class that inherits from Transformer and defines a call method, as we'll see below.
Here, in particular, I'll inherit from ActivatableTransformer, which gives the transformer a magic attribute that allows it to be activated/deactivated from the config dict.
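Before looking at the real classes, the two flavours can be sketched in plain Python. Everything below is a hypothetical stand-in: cell_transformer, upper_headings and KeepRange are names I made up, the notebook is a bare dict, and I use Python's __call__ where the nbconvert base class expects a call method.

```python
# Hypothetical sketch: the decorator and class below are illustrative
# stand-ins, not the ones shipped with nbconvert.
def cell_transformer(function):
    """Turn a per-cell function into a whole-notebook transformer."""
    def wrappedfunc(nb, resources):
        for worksheet in nb["worksheets"]:
            for index, cell in enumerate(worksheet["cells"]):
                worksheet["cells"][index], resources = function(cell, resources, index)
        return nb, resources
    return wrappedfunc


@cell_transformer
def upper_headings(cell, resources, index):
    """Dead-simple transformer: upper-case every heading cell."""
    if cell["cell_type"] == "heading":
        cell = dict(cell, source=cell["source"].upper())
    return cell, resources


class KeepRange(object):
    """Class-based transformer with its own settings: keep cells in [start, end)."""

    def __init__(self, start=0, end=None):
        self.start = start
        self.end = end

    def __call__(self, nb, resources):
        for worksheet in nb["worksheets"]:
            worksheet["cells"] = worksheet["cells"][self.start:self.end]
        return nb, resources


nb = {"worksheets": [{"cells": [
    {"cell_type": "heading", "source": "Intro"},
    {"cell_type": "code", "source": "x = 1"},
    {"cell_type": "code", "source": "y = 2"},
]}]}
resources = {}
for transform in (upper_headings, KeepRange(start=0, end=2)):
    nb, resources = transform(nb, resources)
```

The function flavour never sees the whole notebook, which keeps it simple; the class flavour can carry settings and restructure the notebook as a whole.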
from nbconvert.transformers.activatable import ActivatableTransformer, ConfigurableTransformer
import IPython.config
print "Four relevant docstring"
print '============================='
print ActivatableTransformer.__doc__
print '============================='
print ConfigurableTransformer.__doc__
print '============================='
print ConfigurableTransformer.call.__doc__
print '============================='
print ConfigurableTransformer.cell_transform.__doc__
print '============================='
Four relevant docstring ============================= ConfigurableTransformer that has an enabled flag Inherit from this if you just want to have a transformer which is disable by default and can be enabled via the config by 'c.YourTransformerName.enabled = True' ============================= A configurable transformer Inherit from this class if you wish to have configurability for your transformer. Any configurable traitlets this class exposed will be configurable in profiles using c.SubClassName.atribute=value you can overwrite cell_transform to apply a transformation independently on each cell or __call__ if you prefer your own logic. See orresponding docstring for informations. ============================= Transformation to apply on each notebook. You should return modified nb, resources. If you wish to apply your transform on each cell, you might want to overwrite cell_transform method instead. Parameters ---------- nb : NotebookNode Notebook being converted resources : dictionary Additional resources used in the conversion process. Allows transformers to pass variables into the Jinja engine. ============================= Overwrite if you want to apply a transformation on each cell. You should return modified cell and resource dictionary. Parameters ---------- cell : NotebookNode cell Notebook cell being processed resources : dictionary Additional resources used in the conversion process. Allows transformers to pass variables into the Jinja engine. index : int Index of the cell being processed =============================
We don't provide a convenient method to be applied on each worksheet, as the data structure for worksheets will be removed (not the worksheet functionality, which is still on its way).
I'll now demonstrate a specific example requested while nbconvert 2 was being developed: the ability to exclude cells from the conversion process based on their index.
I'll let you imagine how to inject cells. If what you want is just to append static content at the beginning/end of a notebook, please refer to the templating section; it will be much easier and cleaner.
from IPython.utils.traitlets import Integer, Bool
from copy import deepcopy
class PelicanSubCell(ActivatableTransformer):
    """A Pelican specific transformer to remove some of the cells of a notebook"""

    # I could also read the cells from nbc.metadata.pelican if someone wrote a JS extension
    # But I'll stay with configurable values.
    start = Integer(0, config=True, help="first cell of notebook to be converted")
    end = Integer(-1, config=True, help="last cell of notebook to be converted")
    verbose = Bool(False, config=True, help="Should I speak too much")

    def call(self, nb, resources):
        #nbc = deepcopy(nb)
        nbc = nb
        # don't print in a real transformer !!!
        if self.verbose:
            print "I'll keep only cells from ", self.start, "to ", self.end, "\n"
        for worksheet in nbc.worksheets:
            cells = worksheet.cells[:]
            worksheet.cells = cells[self.start:self.end]
        return nbc, resources
# I create this on the fly, but this could be loaded from a DB, and config object support merging...
c = Config({
'PelicanSubCell':{
'enabled':True,
'start':4,
'end':6,
}
})
# additionally, I'll make it verbose
c.PelicanSubCell.verbose = True
I'm creating a pelican exporter that takes PelicanSubCell as an extra transformer and a config object as parameters. This might seem redundant, but with the configuration system you'll see that one can register an inactive transformer on all exporters and activate it at will from config files and the command line.
pelican = RstExporter(transformers=[PelicanSubCell], config=c)
print pelican.from_notebook_node(jake_notebook)[0]
I'll keep only cells from 4 to 6 Sometimes when showing schematic plots, this is the type of figure I want to display. But drawing it by hand is a pain: I'd rather just use matplotlib. The problem is, matplotlib is a bit too precise. Attempting to duplicate this figure in matplotlib leads to something like this: In[2]: .. code:: python Image('http://jakevdp.github.com/figures/mpl_version.png') .. image:: _fig_03.png
This paragraph is more a specific example than something general, but it has enough specificities to be worth explaining fully. In most conversion processes, you will probably want the extracted figures/graphs not to be in the same folder as the generated html or tex file.
I'll take the example of the LaTeX converter where, most of the time, you will want the figures to be saved under the figs/ folder.
And as you can't directly include the svg, you will need another name to be included in the .tex file.
What does part of the LaTeX Exporter's output look like without customisation?
from nbconvert.exporters.latex import LatexExporter
# same config as before
c = Config({
'PelicanSubCell':{
'enabled':True,
'start':4,
'end':6,
},
})
lpelican = LatexExporter(transformers=[PelicanSubCell], config=c)
print lpelican.from_notebook_node(jake_notebook)[0][4027:]
\begin{document} Sometimes when showing schematic plots, this is the type of figure I want to display. But drawing it by hand is a pain: I'd rather just use matplotlib. The problem is, matplotlib is a bit too precise. Attempting to duplicate this figure in matplotlib leads to something like this: \begin{codecell} \begin{codeinput} \begin{lstlisting} Image('http://jakevdp.github.com/figures/mpl_version.png') \end{lstlisting} \end{codeinput} \begin{codeoutput} \begin{center} \includegraphics[width=0.7\textwidth, height=0.9\textheight, keepaspectratio]{_fig_03.png} \par \end{center} \end{codeoutput} \end{codecell} \end{document}
Clearly, \includegraphics[...]{_fig_03.png} is probably not what you want. To help us with that we will use two configurable values of the ExtractFigureTransformer.
I'm not particularly happy with this part of the configurability; the exact naming may change, and feedback and ideas are welcome.
The properties are the following:
- key_format_map
- figure_name_format_map
They respectively control the template of the keys in the returned resources dict, and the template of the figure names as they appear in the converted document.
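To give a feel for what such a template does, here is a small sketch that assumes the templates are ordinary str.format strings, as the {index:04d} and {ext} fields used further down in this post suggest. The expand helper is hypothetical; the real transformer may pass more (or different) fields.

```python
# Sketch of how the two templates could be expanded; the helper is hypothetical.
def expand(template, index, ext):
    """Fill a template using str.format with `index` and `ext` fields."""
    return template.format(index=index, ext=ext)


key_template = "png-tiff.{index}.{ext}"            # key in the resources dict
name_template = "figs/tiffs/fig_{index:04d}.tiff"  # name written in the document

key = expand(key_template, index=3, ext="png")     # 'png-tiff.3.png'
name = expand(name_template, index=3, ext="png")   # 'figs/tiffs/fig_0003.tiff'
```

The two templates are independent, which is what lets the name in the document differ from the key under which the data is stored.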
The careful reader will discover in the above cell a nice way to modify a config object, and will recognize something which is really close to the IPython config file syntax. One might foresee how users will be able to create their own converters through config files in the near future.
The curious reader can create multiple config objects and look at the possibilities that the <Config>.update(<Config>) method allows.
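The idea of combining several configuration sources can be sketched with plain dictionaries. The merge_config helper below is hypothetical and only illustrates a recursive merge; whether Config.update merges nested sections this way or replaces them wholesale is something to check against your IPython version.

```python
import copy


def merge_config(base, extra):
    """Return a new dict where `extra` is merged recursively into `base`."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides have a section with this name: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # New key, or a plain value: the later source wins.
            merged[key] = copy.deepcopy(value)
    return merged


site_config = {"PelicanSubCell": {"enabled": True, "start": 4, "end": 6}}
user_config = {
    "PelicanSubCell": {"end": 8},
    "ExtractFigureTransformer": {"enabled": True},
}
combined = merge_config(site_config, user_config)
```

With a recursive merge, the user config only overrides the end value and leaves enabled and start untouched, whereas a shallow update would replace the whole PelicanSubCell section.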
A practical example would be to extract svgs into a figs/svgs/ folder, then convert them to .ps in figs/ps/.
As the notebook I'm working with does not contain svgs, let's say that I will extract pngs into the figs/pngs/ folder, and convert them to tiffs in the figs/tiffs folder. So I need the .tex file to have the correct includegraphics...
# use a different template for png links in the tex file:
c.ExtractFigureTransformer.update({'figure_name_format_map':{'png':'figs/tiffs/fig_{index:04d}.tiff'}})
c
{'ExtractFigureTransformer': {'figure_name_format_map': {'png': 'figs/tiffs/fig_{index:04d}.tiff'}}, 'PelicanSubCell': {'enabled': True, 'end': 6, 'start': 4}}
lpelican = LatexExporter(transformers=[PelicanSubCell], config=c)
body,resources = lpelican.from_notebook_node(jake_notebook)
print body[4509:-70]
includegraphics[width=0.7\textwidth, height=0.9\textheight, keepaspectratio]{figs/tiffs/fig_0003.tiff}
Great, so now the values in the converted document have changed, but not the keys in the resources dict.
lpelican.config
{'ExtractFigureTransformer': {'enabled': True, 'extra_ext_map': {'svg': 'pdf'}, 'figure_name_format_map': {'png': 'figs/tiffs/fig_{index:04d}.tiff'}}, 'GlobalConfigurable': {'display_data_priority': ['latex', 'svg', 'png', 'jpg', 'jpeg', 'text']}, 'PelicanSubCell': {'enabled': True, 'end': 6, 'start': 4}}
resources['figures']['binary'].keys()
[u'_fig_07.png', u'_fig_09.png', u'_fig_03.png', u'_fig_12.png', u'_fig_01.png']
I can fix that by setting the key_format_map
:
c.ExtractFigureTransformer.key_format_map = {'png':'png-tiff.{index}.{ext}'}
c
{'ExtractFigureTransformer': {'figure_name_format_map': {'png': 'figs/tiffs/fig_{index:04d}.tiff'}, 'key_format_map': {'png': 'png-tiff.{index}.{ext}'}}, 'PelicanSubCell': {'enabled': True, 'end': 6, 'start': 4}}
lpelican = LatexExporter(transformers=[PelicanSubCell], config=c)
(body,resources)= lpelican.from_notebook_node(jake_notebook)
resources['figures']['binary'].keys()
['png-tiff.9.png', 'png-tiff.1.png', 'png-tiff.12.png', 'png-tiff.3.png', 'png-tiff.7.png']
The current version of the ipynb format stores the data for each display format under its file extension (png, json, jpeg); you can use {ext} to access it in the previous templates.
In a following version of the ipynb format, the data will most likely be organized by mimetype, which will require some changes here.
I think this is enough for now. As you have seen, there are a few bugs here and there that I need to fix before continuing. Next time I'll show you how to modify templates:
{%- extends 'fullhtml.tpl' -%}
{% block input_group -%}
{% endblock input_group %}
... and you have just removed all the code cell inputs, keeping the outputs and the markdown cells. Isn't that wonderful? Want to wrap each code cell in your own div?
{%- extends 'fullhtml.tpl' -%}
{% block codecell %}
<div class="myclass">
{{ super() }}
</div>
{%- endblock codecell %}
Try to look at what Jinja can do, then learn about Jinja filters and imagine they can magically read your config file.
For example, we provide a filter that highlights code by presupposing it is Python, and one that wraps text at a default length of 80 chars... Want a rot13 filter on some code cells when writing exercises for students? See you next time!
from nbconvert.exporters.reveal import RevealExporter
r = RevealExporter()
r.config
{'CSSHtmlHeaderTransformer': {'enabled': True}}
%config IPCompleter.greedy = True
cd ~/nbconvert
/Users/bussonniermatthias/nbconvert
from nbconvert.exporters import LatexExporter
s = LatexExporter()
print s.transformers[1].config
print s.transformers[1].display_data_priority
{'ExtractFigureTransformer': {'enabled': True, 'extra_ext_map': {'svg': 'pdf'}}, 'GlobalConfigurable': {'display_data_priority': ['latex', 'svg', 'png', 'jpg', 'jpeg', 'text']}} ['latex', 'svg', 'png', 'jpg', 'jpeg', 'text']