Excel-like data grid editor for Pandas in the IPython notebook

This proof-of-concept brings an interactive Excel-like data grid editor in the IPython notebook, compatible with Pandas' DataFrame. It means that whenever you have a DataFrame in the IPython notebook, you can edit it within an integrated GUI in the notebook, and the corresponding Python object will be automatically updated in real-time.

This proof-of-concept uses the new widget machinery of IPython 2.0. You need the latest development version of IPython (or v2.0beta, or v2.0 when it's released within the next few weeks). You also need the Handsontable Javascript library. Other data grid editors could probably be used as well.

Getting started

There are multiple steps to make this example work (assuming you have the latest IPython).

  1. Go here.
  2. Download jquery.handsontable.full.css and jquery.handsontable.full.js, and put these two files in ~\.ipython\profile_default\static\custom\.
  3. In this folder, add the following line in custom.js:
    require(['/static/custom/jquery.handsontable.full.js']);
    
  4. In this folder, add the following line in custom.css:
    @import "/static/custom/jquery.handsontable.full.css"
    
  5. Execute this notebook.

How to do it...

  • Let's import a few functions and classes.
In [1]:
from __future__ import print_function # For py 2.7 compat

from IPython.html import widgets # Widget definitions
from IPython.display import display # Used to display widgets in the notebook
from IPython.utils.traitlets import Unicode # Used to declare attributes of our widget
  • We create a new widget. The value trait will contain the JSON representation of the entire table. This trait will be synchronized between Python and Javascript by IPython 2.0's widget machinery.
In [2]:
class HandsonTableWidget(widgets.DOMWidget):
    _view_name = Unicode('HandsonTableView', sync=True)
    value = Unicode(sync=True)
  • Now we write the Javascript code for the widget. There is a tiny bit of boilerplate code, but have a look at the three important functions that are responsible for the synchronization:

    • render for the widget initialization
    • update for Python --> JS update
    • handle_table_change for JS --> Python update

This is a bit oversimplified, of course. You will find more information on this tutorial.

In [3]:
%%javascript
var table_id = 0;
require(["widgets/js/widget"], function(WidgetManager){    
    // Define the HandsonTableView
    var HandsonTableView = IPython.DOMWidgetView.extend({
        
        render: function(){
            // CREATION OF THE WIDGET IN THE NOTEBOOK.
            
            // Add a <div> in the widget area.
            this.$table = $('<div />')
                .attr('id', 'table_' + (table_id++))
                .appendTo(this.$el);
            // Create the Handsontable table.
            this.$table.handsontable({
            });
            
        },
        
        update: function() {
            // PYTHON --> JS UPDATE.
            
            // Get the model's JSON string, and parse it.
            var data = $.parseJSON(this.model.get('value'));
            // Give it to the Handsontable widget.
            this.$table.handsontable({data: data});
            
            // Don't touch this...
            return HandsonTableView.__super__.update.apply(this);
        },
        
        // Tell Backbone to listen to the change event of input controls.
        events: {"change": "handle_table_change"},
        
        handle_table_change: function(event) {
            // JS --> PYTHON UPDATE.
            
            // Get the table instance.
            var ht = this.$table.handsontable('getInstance');
            // Get the data, and serialize it in JSON.
            var json = JSON.stringify(ht.getData());
            // Update the model with the JSON string.
            this.model.set('value', json);
            
            // Don't touch this...
            this.touch();
        },
    });
    
    // Register the HandsonTableView with the widget manager.
    WidgetManager.register_widget_view('HandsonTableView', HandsonTableView);
});
  • Now, we have a synchronized table widget that we can already use. But we'd like to integrate it with Pandas. To do this, we create a light wrapper around a DataFrame instance. We create two callback functions for synchronizing the Pandas object with the IPython widget. Changes in the GUI will automatically trigger a change in the DataFrame, but the converse is not true. You'll need to re-display the widget if you change the DataFrame in Python.
In [4]:
import StringIO
import numpy as np
import pandas as pd

class HandsonDataFrame(object):
    def __init__(self, df):
        self._df = df
        self._widget = HandsonTableWidget()
        self._widget.on_trait_change(self._on_data_changed, 'value')
        self._widget.on_displayed(self._on_displayed)
        
    def _on_displayed(self, e):
        # DataFrame ==> Widget (upon initialization only)
        json = self._df.to_json(orient='values')
        self._widget.value = json
        
    def _on_data_changed(self, e, val):
        # Widget ==> DataFrame (called every time the user
        # changes a value in the graphical widget)
        buf = StringIO.StringIO(val)
        self._df = pd.read_json(buf, orient='values')
        
    def to_dataframe(self):
        return self._df
        
    def show(self):
        display(self._widget)
  • Now, let's test all that! We first create a random DataFrame.
In [5]:
data = np.random.randint(size=(3, 5), low=100, high=900)
df = pd.DataFrame(data)
df
Out[5]:
0 1 2 3 4
0 661 435 308 244 778
1 330 463 687 149 417
2 317 154 155 504 549
  • We wrap it in a HandsonDataFrame and show it.
In [6]:
ht = HandsonDataFrame(df)
ht.show()

We can now change the values interactively, and they will be changed in Python automaticall.

In [7]:
ht.to_dataframe()
Out[7]:
0 1 2 3 4
0 1 2 3 244 778
1 330 463 687 149 417
2 317 154 155 4 549

There are many ways this proof-of-concept could be improved.

  • Synchronize only deltas instead of synchronizing the whole array every time (i.e. the method here would be slow on large tables).
  • Also, avoid recreating a new DataFrame at very change, but update the same DataFrame instance in-place.
  • Support for named columns.
  • Hide the wrapper, i.e. make it so that the default rich representation of the DataFrame in the notebook is the HandsonDataFrame.
  • Implement everything in an easy-to-use extension.
  • etc.