#!/usr/bin/env python
# coding: utf-8

# > This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.
#

# # 3.2. Converting an IPython notebook to other formats with nbconvert

# You need pandoc, a LaTeX distribution, and the Notebook dataset from the book's website. On Windows, you also need pywin32 (`conda install pywin32` if you use Anaconda).

# 1. Let's open the test notebook in the `data` folder. A notebook is just a plain-text JSON file, so we open it in text mode (`r` mode).

# In[ ]:


# Notebook files are UTF-8 encoded JSON; passing the encoding explicitly
# avoids decoding errors on platforms with a different default encoding.
with open('data/test.ipynb', 'r', encoding='utf-8') as f:
    contents = f.read()
print(len(contents))


# In[ ]:


print(contents[:345] + '...' + contents[-33:])


# 2. Now that we have loaded the notebook as a string, let's parse it with the `json` module.

# In[ ]:


import json
nb = json.loads(contents)


# 3. Let's have a look at the keys in the notebook dictionary.

# In[ ]:


print(nb.keys())
print('nbformat {}.{}'.format(nb['nbformat'], nb['nbformat_minor']))


# The version of the notebook format is indicated in `nbformat` and `nbformat_minor`.

# The main field is `worksheets` (this field was removed in nbformat 4). There is only one worksheet by default. A worksheet contains a list of cells and some metadata.

# In[ ]:


nb['worksheets'][0].keys()


# 4. Each cell has a type, optional metadata, some contents (text or code), possibly one or several outputs, and other information. Let's look at a Markdown cell and a code cell.

# In[ ]:


nb['worksheets'][0]['cells'][1]


# In[ ]:


nb['worksheets'][0]['cells'][2]


# 5. Once parsed, the notebook is represented as a Python dictionary, so it is easy to manipulate with plain Python. Here, we count the number of Markdown and code cells.
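# Aside: the paths used in this recipe, such as `nb['worksheets'][0]['cells']`,
# follow the layout of nbformat versions before 4. In nbformat 4, which current
# versions of Jupyter write, worksheets were removed and the cells are stored
# directly in `nb['cells']`. The cell below is a minimal sketch, not taken from
# the recipe, of counting cell types in either layout with `collections.Counter`.
# The helper name `get_cells` and the two tiny in-memory notebooks are
# illustrative assumptions. They let the cell run without `data/test.ipynb`
# and leave `nb` and `cells` untouched.

# In[ ]:


import json
from collections import Counter


def get_cells(notebook):
    """Return the list of cells of an nbformat 3 or nbformat 4 notebook dict."""
    if notebook.get('nbformat', 4) >= 4:
        # nbformat 4: the cells form a top-level list.
        return notebook.get('cells', [])
    # Older formats: the cells live inside worksheets (usually just one).
    return [cell
            for worksheet in notebook.get('worksheets', [])
            for cell in worksheet.get('cells', [])]


# Hypothetical nbformat 3 notebook, built directly as a dictionary.
demo_v3 = {
    'nbformat': 3, 'nbformat_minor': 0, 'metadata': {},
    'worksheets': [{'metadata': {}, 'cells': [
        {'cell_type': 'markdown', 'metadata': {}, 'source': ['# Title']},
        {'cell_type': 'code', 'metadata': {}, 'input': ['1 + 1'],
         'outputs': []},
    ]}],
}

# Hypothetical nbformat 4 notebook, parsed from JSON text as in step 2.
demo_v4 = json.loads("""
{"nbformat": 4, "nbformat_minor": 5, "metadata": {},
 "cells": [
   {"cell_type": "markdown", "metadata": {}, "source": ["Intro"]},
   {"cell_type": "code", "metadata": {}, "source": ["x = 1"],
    "execution_count": null, "outputs": []},
   {"cell_type": "code", "metadata": {}, "source": ["x + 1"],
    "execution_count": null, "outputs": []}
 ]}
""")

# Counter tallies every cell type (including raw cells) in a single pass.
for demo in (demo_v3, demo_v4):
    counts = Counter(cell['cell_type'] for cell in get_cells(demo))
    print('nbformat {}: {} Markdown, {} code'.format(
        demo['nbformat'], counts['markdown'], counts['code']))
# prints:
# nbformat 3: 1 Markdown, 1 code
# nbformat 4: 1 Markdown, 2 code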
# In[ ]:


cells = nb['worksheets'][0]['cells']
nm = len([cell for cell in cells if cell['cell_type'] == 'markdown'])
nc = len([cell for cell in cells if cell['cell_type'] == 'code'])
print(("There are {nm} Markdown cells and "
       "{nc} code cells.").format(nm=nm, nc=nc))


# 6. Let's have a closer look at the image output of the cell with the matplotlib figure.

# In[ ]:


png = cells[2]['outputs'][0]['png']
# Shorten the long base64 string so that the displayed output stays readable.
# This only modifies the dictionary in memory, not the file on disk.
cells[2]['outputs'][0]['png'] = png[:20] + '...' + png[-20:]
cells[2]['outputs'][0]


# In general, a code cell can have zero, one, or multiple outputs. Moreover, each output can have multiple representations. Here, the matplotlib figure has a PNG representation (the base64-encoded image) and a plain-text representation of the figure object.

# 7. Now, we use nbconvert to convert our notebook to other formats. This tool can be used from the command line (with IPython versions earlier than 4.0, replace the `jupyter` command with `ipython`). Here, we convert the notebook to an HTML document.

# In[ ]:


# `get_ipython().system(...)` is how an IPython shell command
# (`!jupyter nbconvert ...`) appears in an exported script; it only runs
# inside IPython.
get_ipython().system('jupyter nbconvert --to html data/test.ipynb')


# 8. Let's display this document in an `