Here we illustrate various interactive computing features of IPython by loading the social data set with the NetworkX module.
import networkx as nx
Tab completion is highly useful in an interactive session to explore an object's attributes and methods. Here, we look for ways to open our files from the Facebook data.
nx.read
cd fbdata
(bookmark:fbdata) -> chapter2\data\facebook
chapter2\data\facebook
The read_edgelist function looks well suited here, since the .edges files contain lists of person identifiers. This function returns a graph.
g = nx.read_edgelist('0.edges')
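To make the edge-list format concrete, here is a minimal pure-Python sketch of what reading such a file amounts to. It assumes each line holds two whitespace-separated identifiers; parse_edgelist is a hypothetical helper written for illustration, not NetworkX's implementation, and the sample lines are made up.

```python
from collections import defaultdict


def parse_edgelist(lines):
    """Build an undirected adjacency map from 'u v' lines (hypothetical helper)."""
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comment lines
        u, v = line.split()[:2]
        # Store the edge in both directions because the graph is undirected.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Made-up sample data standing in for the contents of an .edges file.
sample = ["0 1", "1 2", "# a comment", "3 4"]
graph = parse_edgelist(sample)

n_nodes = len(graph)
# Each undirected edge is stored twice, once per endpoint.
n_edges = sum(len(neighbors) for neighbors in graph.values()) // 2
print(n_nodes, n_edges)
```

Identifiers stay strings in this sketch, which matches the string node labels that appear in the outputs of this notebook.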
Let's display the number of nodes and edges in the graph.
len(g.nodes()), len(g.edges())
(333, 2519)
Let's try to compute the radius of the graph.
nx.radius(g)
---------------------------------------------------------------------------
NetworkXError                             Traceback (most recent call last)
<ipython-input-6-73e1ad16c3de> in <module>()
----> 1 nx.radius(g)

C:\Python27\lib\site-packages\networkx\algorithms\distance_measures.pyc in radius(G, e)
    140     """
    141     if e is None:
--> 142         e=eccentricity(G)
    143     return min(e.values())
    144 

C:\Python27\lib\site-packages\networkx\algorithms\distance_measures.pyc in eccentricity(G, v, sp)
     60         if L != order:
     61             msg = "Graph not connected: infinite path length"
---> 62             raise networkx.NetworkXError(msg)
     63 
     64         e[v]=max(length.values())

NetworkXError: Graph not connected: infinite path length
The error occurs because the graph is not connected, so the radius is infinite. We can work on a connected component instead. Tab completion can help us find the right function for that.
nx.connected
sg = nx.connected_component_subgraphs(g)
[len(s) for s in sg]
[324, 3, 2, 2, 2]
We take the largest connected component.
sg = sg[0]
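For readers who want to see what extracting the largest connected component involves, here is a small standard-library sketch using breadth-first search. The function name connected_components and the toy graph are assumptions made for illustration; this is not how NetworkX implements it.

```python
from collections import deque


def connected_components(adjacency):
    """Return the components as sets of nodes, largest first (illustrative helper)."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy graph with two components: a-b-c and x-y.
toy = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}, 'x': {'y'}, 'y': {'x'}}
components = connected_components(toy)
sizes = [len(c) for c in components]
largest = components[0]
print(sizes)
```

Note that more recent NetworkX releases no longer provide connected_component_subgraphs; there, nx.connected_components yields sets of nodes and g.subgraph(nodes) builds the corresponding subgraph.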
Now we can compute the radius and diameter of the graph.
nx.radius(sg), nx.diameter(sg)
(6, 11)
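The definitions behind these two numbers can be sketched in a few lines. The eccentricity of a node is its distance to the farthest node, the radius is the smallest eccentricity, and the diameter is the largest. The code below is a plain breadth-first-search illustration on a made-up path graph, not NetworkX's code; it raises an error on a disconnected graph in the same spirit as the traceback above.

```python
from collections import deque


def eccentricity(adjacency, source):
    """Distance from `source` to the farthest node (illustrative helper)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    if len(dist) != len(adjacency):
        # Some node is unreachable, so its distance would be infinite.
        raise ValueError("graph not connected: infinite path length")
    return max(dist.values())


# Made-up path graph 1 - 2 - 3 - 4 - 5.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
ecc = {node: eccentricity(path, node) for node in path}
radius = min(ecc.values())
diameter = max(ecc.values())
print(radius, diameter)
```

On this path the middle node has eccentricity 2 and the end nodes have eccentricity 4, so the radius is 2 and the diameter is 4.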
Appending ? to any object in IPython gives information about it.
nx.eccentricity?
The %pdef, %pdoc and %psource magic commands give different pieces of information about objects: the definition, the docstring, and the source code.
%pdef nx.eccentricity
nx.eccentricity(G, v=None, sp=None)
%pdoc nx.eccentricity
%psource nx.eccentricity
We can use the %timeit magic command to evaluate the time an instruction takes.
%timeit nx.center(sg)
1 loops, best of 3: 723 ms per loop
nx.center(sg)
[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']
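Outside IPython, a comparable measurement can be made with the standard timeit module. The sketch below times a stand-in function named work, which is a hypothetical placeholder for the call we want to measure, and reports the best of three repetitions in the same spirit as %timeit.

```python
import timeit


def work():
    """Stand-in for the instruction being timed (hypothetical workload)."""
    return sum(i * i for i in range(10000))


loops = 10
# Run the function `loops` times per measurement, and take three measurements.
timings = timeit.repeat(work, number=loops, repeat=3)
# The minimum is the least disturbed by other activity on the machine.
best = min(timings) / loops
print("%d loops, best of 3: %.3f ms per loop" % (loops, best * 1e3))
```

Unlike %timeit, this sketch does not choose the number of loops automatically, so the value of loops has to be picked by hand.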
Now we write our own, unoptimized function that computes a graph's center. Here is the code contained in center.py:
import networkx as nx
g = nx.read_edgelist('0.edges')
sg = nx.connected_component_subgraphs(g)[0]
center = [node for node in sg.nodes() if nx.eccentricity(sg, node) == nx.radius(sg)]
print(center)
We can benchmark and profile it to find hotspots that should be optimized.
cd ../..
chapter2
run -t center.py
[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']

IPython CPU timings (estimated):
  User   :     267.02 s.
  System :       0.00 s.
Wall time:     267.02 s.
run -p center.py
[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']
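A similar profile can be collected in a plain Python script with the standard cProfile and pstats modules. The functions naive and slow_inner below are made-up stand-ins for a script with a hotspot; the point is only to show how the report reveals which function is called repeatedly.

```python
import cProfile
import io
import pstats


def slow_inner(n):
    """Made-up inner function that gets called many times."""
    return sum(range(n))


def naive(n):
    """Made-up outer function that repeats the same computation."""
    return [slow_inner(n) for _ in range(200)]


profiler = cProfile.Profile()
profiler.enable()
naive(1000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(10)
report = stream.getvalue()
print(report)
```

In the report, the ncalls column shows how many times each function ran, which is usually the quickest way to spot redundant work.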
We repeatedly call the exact same functions with the same arguments, which explains why this code is so slow. Let's optimize it by caching the outputs of these functions. Here is the code contained in the file center2.py:
import networkx as nx
g = nx.read_edgelist('0.edges')
sg = nx.connected_component_subgraphs(g)[0]
# we compute the eccentricity once, for all nodes
ecc = nx.eccentricity(sg)
# we compute the radius once
r = nx.radius(sg)
center = [node for node in sg.nodes() if ecc[node] == r]
print(center)
run -t center2.py
[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']

IPython CPU timings (estimated):
  User   :       1.77 s.
  System :       0.00 s.
Wall time:       1.77 s.
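The source of the speedup can be illustrated by counting how many shortest-path searches each version triggers. The sketch below uses pure-Python stand-ins for eccentricity and radius on a made-up five-node path graph; the counter calls and both helper functions are assumptions made for illustration, not NetworkX code.

```python
from collections import deque

calls = {'bfs': 0}  # counts how many breadth-first searches are run


def eccentricity(adjacency, source):
    """Distance from `source` to the farthest node, via one BFS."""
    calls['bfs'] += 1
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return max(dist.values())


def radius(adjacency):
    """Smallest eccentricity, recomputed from scratch on every call."""
    return min(eccentricity(adjacency, node) for node in adjacency)


# Made-up path graph 1 - 2 - 3 - 4 - 5.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}

# Naive version: eccentricity and radius are recomputed for every node.
calls['bfs'] = 0
naive_center = [n for n in path if eccentricity(path, n) == radius(path)]
naive_runs = calls['bfs']

# Cached version: every eccentricity is computed once and reused.
calls['bfs'] = 0
ecc = {n: eccentricity(path, n) for n in path}
r = min(ecc.values())
cached_center = [n for n in path if ecc[n] == r]
cached_runs = calls['bfs']

print(naive_center, naive_runs, cached_center, cached_runs)
```

With n nodes, the naive version runs n * (n + 1) searches while the cached version runs n, so the gap widens quickly as the graph grows.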
NetworkX can plot graphs with the help of Matplotlib. We use several options to make the graph look nicer. The full list of options is given in the NetworkX documentation for draw_networkx.
nx.draw_networkx(sg, node_size=15, edge_color='y', with_labels=False, alpha=.4, linewidths=0)