Chapter 2, example 3¶

Here we illustrate various interactive computing features of IPython by loading the social data set with the NetworkX module.

In [1]:

import networkx as nx

Tab completion is highly useful in an interactive session to explore an object's attributes and methods. Here, we look for ways to open our files from the Facebook data.

In [2]:

nx.read

In [3]:

cd fbdata

(bookmark:fbdata) -> chapter2\data\facebook
chapter2\data\facebook

The read_edgelist method looks adapted here as the .edges files contain list of person identifiers. This method returns a graph.

In [4]:

g = nx.read_edgelist('0.edges')

Let's display the number of nodes and edges in the graph.

In [5]:

len(g.nodes()), len(g.edges())

Out[5]:

(333, 2519)

Let's try to compute the radius of the graph.

In [6]:

nx.radius(g)

---------------------------------------------------------------------------
NetworkXError                             Traceback (most recent call last)
<ipython-input-6-73e1ad16c3de> in <module>()
----> 1 nx.radius(g)

C:\Python27\lib\site-packages\networkx\algorithms\distance_measures.pyc in radius(G, e)
    140     """
    141     if e is None:
--> 142         e=eccentricity(G)
    143     return min(e.values())
    144 

C:\Python27\lib\site-packages\networkx\algorithms\distance_measures.pyc in eccentricity(G, v, sp)
     60         if L != order:
     61             msg = "Graph not connected: infinite path length"
---> 62             raise networkx.NetworkXError(msg)
     63 
     64         e[v]=max(length.values())

NetworkXError: Graph not connected: infinite path length

The error comes from the fact that the graph is not connected, so that the radius is infinite. We can try to obtain a connected component instead. Tab completion can help us finding the right function for that.

In [7]:

nx.connected

In [8]:

sg = nx.connected_component_subgraphs(g)

In [9]:

[len(s) for s in sg]

Out[9]:

[324, 3, 2, 2, 2]

We take the largest connected component.

In [10]:

sg = sg[0]

Now we can compute the radius and diameter of the graph.

In [11]:

nx.radius(sg), nx.diameter(sg)

Out[11]:

(6, 11)

Appendind ? to any object in IPython gives information about it.

In [12]:

nx.eccentricity?

The %pdef, %pdoc and %psource magic commands give different pieces of information about objects: the definition, the docstring, and the source code.

In [13]:

%pdef nx.eccentricity

 nx.eccentricity(G, v=None, sp=None)

In [14]:

%pdoc nx.eccentricity

In [15]:

%psource nx.eccentricity

We can use the %timeit magic command to evaluate the time an instruction takes.

In [16]:

%timeit nx.center(sg)

1 loops, best of 3: 723 ms per loop

In [17]:

nx.center(sg)

Out[17]:

[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']

Now we write our own, unoptimized function that computes a graph's center. Here is the code contained in center.py:

import networkx as nx
g = nx.read_edgelist('0.edges')
sg = nx.connected_component_subgraphs(g)[0]
center = [node for node in sg.nodes() if nx.eccentricity(sg, node) == nx.radius(sg)]
print(center)

We can benchmark and profile it to find hotspots that should be optimized.

In [18]:

cd ../..

chapter2

In [19]:

run -t center.py

[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']

IPython CPU timings (estimated):
  User   :     267.02 s.
  System :       0.00 s.
Wall time:     267.02 s.

In [21]:

run -p center.py

[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']

We repeatedly call the exact same functions which explain why this function is so slow. Let's optimize it by caching the outputs of these functions. Here is the code contained in the file center2.py:

import networkx as nx
g = nx.read_edgelist('0.edges')
sg = nx.connected_component_subgraphs(g)[0]
# we compute the eccentricity once, for all nodes
ecc = nx.eccentricity(sg)
# we compute the radius once
r = nx.radius(sg)
center = [node for node in sg.nodes() if ecc[node] == r]
print(center)

In [22]:

run -t center2.py

[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']

IPython CPU timings (estimated):
  User   :       1.77 s.
  System :       0.00 s.
Wall time:       1.77 s.

NetworkX allows to plot graphs with the help of Matplotlib. We use several options to make the graph nicer. You can find the full documentation here.

In [23]:

nx.draw_networkx(sg, node_size=15, edge_color='y', with_labels=False, alpha=.4, linewidths=0)