By @carnby.
This notebook showcases the basic matta visualizations and how to use them.
Note that the init_javascript call is not needed when running on a local server, provided you have added the JavaScript code to your IPython profile.
import pandas as pd
import networkx as nx
import matta
import json
import requests
from networkx.readwrite import json_graph
# we do this to load the required libraries when viewing on NBViewer
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs')
Wordclouds are implemented using the d3.layout.cloud layout by Jason Davies. They work with bags of words. The Python Counter class is perfect for this purpose.
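As a quick illustration of the bag-of-words idea before applying it to a full text, here is a minimal sketch on a single sentence. The sample text and the variable names are made up for this example; only re and collections.Counter from the standard library are used.

```python
import re
from collections import Counter

# Hypothetical sample text, standing in for a full document.
text = "To be, or not to be: that is the question."

# Lowercase the text and keep runs of word characters. re.findall avoids
# the empty strings that re.split can leave at the edges of the text.
tokens = re.findall(r"\w+", text.lower())

# The bag of words: each distinct token mapped to its number of occurrences.
bag = Counter(tokens)

# most_common returns (word, count) pairs, highest counts first.
top_two = bag.most_common(2)
print(top_two)
```

The resulting (word, count) pairs have the same shape as the records loaded into the DataFrame that feeds the wordcloud.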
hamlet = requests.get('http://www.gutenberg.org/cache/epub/2265/pg2265.txt').text
hamlet[0:100]
import re
from collections import Counter
words = re.split(r'[\W]+', hamlet.lower())
counts = Counter(words)
df = pd.DataFrame.from_records(list(counts.items()), columns=['word', 'frequency'])
df.sort_values(['frequency'], ascending=False, inplace=True)
df.head()
matta.wordcloud(dataframe=df.head(500), text='word', font_size='frequency',
                typeface='Helvetica', font_weight='bold',
                font_color={'value': 'frequency', 'palette': 'cubehelix', 'scale': 'threshold'})
Treemaps use the Treemap Layout from d3.js. They work with trees, which we construct through networkx.DiGraph.
flare_data = requests.get('https://gist.githubusercontent.com/mbostock/4063582/raw/a05a94858375bd0ae023f6950a2b13fac5127637/flare.json').json()
flare_data['name']
tree = nx.DiGraph()
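def add_node(node):
    # ids follow visiting order; the node is added before its children,
    # so number_of_nodes() + 1 is always unused
    node_id = tree.number_of_nodes() + 1
    attrs = {'name': node['name']}
    # 'size' is optional in the input data
    if 'size' in node:
        attrs['size'] = node['size']
    tree.add_node(node_id, **attrs)
    # recurse into the children, linking each one to its parent
    for child in node.get('children', []):
        child_id = add_node(child)
        tree.add_edge(node_id, child_id)
    return node_id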
root = add_node(flare_data)
# treemap requires this attribute
tree.graph['root'] = root
nx.is_arborescence(tree)
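The recursive walk in add_node can be tried without networkx on a small hand-written tree. The sketch below is only an illustration: the nested dictionary and the flatten helper are hypothetical, and they assume the same 'name' / 'size' / 'children' keys used by the flare data.

```python
# Hypothetical miniature of the nested format: 'name', optional 'size',
# optional 'children'.
nested = {
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1", "size": 3},
                                   {"name": "a2", "size": 5}]},
        {"name": "b", "size": 7},
    ],
}


def flatten(node, nodes, edges):
    """Pre-order walk: record the node, then recurse into its children."""
    node_id = len(nodes) + 1  # next unused id, starting at 1
    attrs = {"name": node["name"]}
    if "size" in node:
        attrs["size"] = node["size"]
    nodes[node_id] = attrs
    for child in node.get("children", []):
        child_id = flatten(child, nodes, edges)
        edges.append((node_id, child_id))
    return node_id


nodes, edges = {}, []
root_id = flatten(nested, nodes, edges)

# A tree with n nodes has exactly n - 1 edges, all pointing away from the root.
print(root_id, len(nodes), len(edges))
print(sorted(edges))
```

Because every node is registered before its children are visited, the root always receives id 1 and each parent id is smaller than the ids of its descendants.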
import seaborn as sns
matta.treemap(tree=tree, node_value='size', node_label='name',
              node_color={'value': 'parent.name', 'scale': 'ordinal',
                          'palette': sns.husl_palette(15, l=.4, s=.9)})
Sankey or flow diagrams use the Sankey plugin by Mike Bostock. They work with digraphs, just like treemaps. Note that graphs with loops are not supported.
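Since graphs with loops are not supported, it can help to check the data before plotting. The following is a minimal sketch, assuming that "loops" means directed cycles (self-loops included); has_cycle and the sample edge list are hypothetical and not part of matta. If networkx is available, nx.is_directed_acyclic_graph performs an equivalent check.

```python
def has_cycle(edges):
    """Return True if the directed edge list contains a cycle."""
    # Adjacency list; every endpoint gets an entry, even pure sinks.
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
        successors.setdefault(target, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = dict.fromkeys(successors, WHITE)

    def visit(node):
        state[node] = GREY
        for nxt in successors[node]:
            if state[nxt] == GREY:  # edge back into the current path
                return True
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in successors)


# Hypothetical flows, loosely in the spirit of an energy diagram.
flows = [("coal", "electricity"), ("electricity", "industry"), ("coal", "industry")]
print(has_cycle(flows))
print(has_cycle(flows + [("industry", "coal")]))
```

The depth-first search is recursive, so very deep graphs may need an iterative variant; for data the size of a typical Sankey diagram this should not matter.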
sankey_data = requests.get('http://bost.ocks.org/mike/sankey/energy.json')
sankey_graph = json_graph.node_link_graph(json.loads(sankey_data.text), directed=True)
next(iter(sankey_graph.nodes(data=True))), next(iter(sankey_graph.edges(data=True)))
matta.flow(graph=sankey_graph, node_label='name', link_weight='value', node_color='indigo',
           node_width=12, node_padding=13,
           link_color={'value': 'value', 'palette': 'Greys', 'scale': 'threshold'}, link_opacity=0.8)
Parallel Coordinates are based on the code by Jason Davies. They work with pandas.DataFrame.
df = pd.read_csv('http://bl.ocks.org/jasondavies/raw/1341281/cars.csv', index_col='name')
df.head()
matta.parcoords(dataframe=df)
df = pd.read_csv('https://www.jasondavies.com/parallel-sets/titanic.csv')
df.head()
matta.parsets(dataframe=df, columns=['Survived', 'Sex', 'Age', 'Class'])
Graphs from networkx.DiGraph are visualized using the Force Layout in d3.js.
graph = nx.davis_southern_women_graph()
for node, data in graph.nodes(data=True):
    # color by bipartite set, size by degree
    data['color'] = 'purple' if data['bipartite'] else 'green'
    data['size'] = graph.degree(node)
matta.force(graph=graph, link_distance=100, height=600,
            node_ratio='size',
            node_color={'value': 'bipartite', 'scale': 'ordinal', 'palette': 'Set2'})