Title: Random Graphs Author: Thomas Breuel Institution: UniKL
from collections import Counter
import os,string,re
import networkx as nx
--------------------------------------------------------------------------- ImportError Traceback (most recent call last) <ipython-input-1-0cfdb82f8551> in <module>() 2 from collections import Counter 3 import os,string,re ----> 4 import networkx as nx ImportError: No module named networkx
Real-world graphs are often generated in some random fashion:
That is, many graphs are created from underlying random processes.
(Random Binomial Graph)
Simplest form of random graph.
Given a set of $n$ vertices, pick each edge with probability $p$.
Also the Erdös-Renyi random graph.
figsize(12,4)
for i,p in enumerate([0.1,0.3,0.5]):
subplot(1,3,i+1)
nx.draw(nx.erdos_renyi_graph(10,p,33))
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-2-2cd3ba3d21f5> in <module>() 2 for i,p in enumerate([0.1,0.3,0.5]): 3 subplot(1,3,i+1) ----> 4 nx.draw(nx.erdos_renyi_graph(10,p,33)) NameError: name 'nx' is not defined
(global properties of random graphs)
Random graphs have interesting global properties and transitions.
Question: what is the probability that a random Erdös-Renyi graph is connected?
Let's try.
ps = []; vs = []
for p in linspace(0.0,0.4,9):
ps.append(p); ns = []
for trial in range(1000):
G = nx.erdos_renyi_graph(20,p,trial)
n = nx.number_connected_components(G)
ns.append(n)
vs.append(array(ns))
xticks(range(1,9),ps); _=boxplot(vs)
ps = []; ms = []
for p in linspace(0.0,0.4,9):
connected = 0
for trial in range(200):
G = nx.erdos_renyi_graph(20,p,trial)
n = nx.number_connected_components(G)
connected += (n<=1)
ps.append(p); ms.append(connected/200.0)
plot(ps,ms)
ps = []; ms = []
for p in linspace(0.0,0.1,21):
connected = 0
for trial in range(100):
G = nx.erdos_renyi_graph(200,p,trial)
n = nx.number_connected_components(G)
connected += (n<=1)
ps.append(p)
ms.append(connected/100.0)
plot(ps,ms)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-3-07a8884b4e40> in <module>() 3 connected = 0 4 for trial in range(100): ----> 5 G = nx.erdos_renyi_graph(200,p,trial) 6 n = nx.number_connected_components(G) 7 connected += (n<=1) NameError: name 'nx' is not defined
(thresholds in random graphs)
Many interesting random graphs properties have thresholds and "phase transitions".
One the probability of connectivity crosses some threshold, a global property (like connectedness) becomes true with very high probability.
(model-based random graphs)
The Erdös-Renyi model seems kind of ad hoc (although it actually describes some forms of physical situations).
We can and often should derive a random graph model from an actual functional model.
This is similar to the kinds of simulations we discussed before.
(randomly citing scientists)
Generative model:
This is a kind of branching process.
TODO: Implement this (as an exercise).
(Barabasi-Albert Model)
Important property: generates scale free models (see later).
Uses preferential attachment: the more connected a node is, the more likely it is to receive new connections.
Algorithm:
figsize(8,8)
G = nx.barabasi_albert_graph(50,3)
nx.draw_circular(G)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-4-428847a5317f> in <module>() 1 figsize(8,8) ----> 2 G = nx.barabasi_albert_graph(50,3) 3 nx.draw_circular(G) NameError: name 'nx' is not defined
# node degree distribution for Barabasi Albert graph
G = nx.barabasi_albert_graph(100000,3)
counts = Counter([len(G[i]) for i in G])
ll = array([(log(n*1.0),log(c*1.0)) for n,c in counts.items()])
plot(ll[:,0],ll[:,1],'+')
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-5-81cddf853416> in <module>() 1 # node degree distribution for Barabasi Albert graph ----> 2 G = nx.barabasi_albert_graph(100000,3) 3 counts = Counter([len(G[i]) for i in G]) 4 ll = array([(log(n*1.0),log(c*1.0)) for n,c in counts.items()]) 5 plot(ll[:,0],ll[:,1],'+') NameError: name 'nx' is not defined
(power law)
If we plot the frequency of the number of nodes of degree $k$, we get a power law:
$$ P(k) \propto k^{-3} $$This is a power law.
Graphs obeying such laws are called scale free (in some sense, they look "the same" at all scales).
(scale free networks)
In general, scale-free networks have the form:
$$ P(k) \propto k^{-\gamma}, \gamma \in [2,3] $$Many real-world networks have been claimed to be scale free (web, citation networks, social networks, biochemical networks), but it's not clear that this is true (it's statistically difficult to test).
First observed with citations: the number of citations to papers is a Pareto distribution.
# node degree distribution for Erdos Renyi graphs
from collections import Counter
G = nx.erdos_renyi_graph(1000,0.3)
counts = Counter([len(G[i]) for i in G])
ll = array([(log(n*1.0),log(c*1.0)) for n,c in counts.items()])
plot(ll[:,0],ll[:,1],'+')
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-6-4c702cddef82> in <module>() 1 # node degree distribution for Erdos Renyi graphs 2 from collections import Counter ----> 3 G = nx.erdos_renyi_graph(1000,0.3) 4 counts = Counter([len(G[i]) for i in G]) 5 ll = array([(log(n*1.0),log(c*1.0)) for n,c in counts.items()]) NameError: name 'nx' is not defined
# average shortest path for Barabasi Albert graphs
result = []
for N in linspace(5,405,10):
ds = []
for trial in range(10):
G = nx.barabasi_albert_graph(N,3)
d = nx.average_shortest_path_length(G)
ds.append(d)
result.append((N,mean(ds)))
result = array(result); plot(log(result[:,0]),result[:,1])
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-7-1e8495cebefe> in <module>() 4 ds = [] 5 for trial in range(10): ----> 6 G = nx.barabasi_albert_graph(N,3) 7 d = nx.average_shortest_path_length(G) 8 ds.append(d) NameError: name 'nx' is not defined
(small world network)
The Barabasi-Albert network is sort-of an example of a kind of small-world network:
The average distance between nodes is a logarithmic function of the size of the network.
True small world networks also have a high clustering value.
(Watts-Strogatz Graph)
A process that generates a small world graph:
(Different variants: add a new edge, replace only when the graph remains connected, ...)
# average shortest path for Watts Strogatz
result = []
for N in range(20,405,20):
ds = []
for trial in range(20):
G = nx.connected_watts_strogatz_graph(N,5,0.1)
d = nx.average_shortest_path_length(G)
ds.append(d)
result.append((N,mean(ds)))
result = array(result); plot(log(result[:,0]),result[:,1])
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-8-37100b88134d> in <module>() 4 ds = [] 5 for trial in range(20): ----> 6 G = nx.connected_watts_strogatz_graph(N,5,0.1) 7 d = nx.average_shortest_path_length(G) 8 ds.append(d) NameError: name 'nx' is not defined
# average shortest path for Erdos Renyi
result = []
for N in range(20,405,20):
ds = []
for trial in range(20):
G = nx.erdos_renyi_graph(N,0.2)
if not nx.is_connected(G): continue
d = nx.average_shortest_path_length(G)
ds.append(d)
result.append((N,mean(ds)))
result = array(result); plot(log(result[:,0]),result[:,1])
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-9-e2ac3057a004> in <module>() 4 ds = [] 5 for trial in range(20): ----> 6 G = nx.erdos_renyi_graph(N,0.2) 7 if not nx.is_connected(G): continue 8 d = nx.average_shortest_path_length(G) NameError: name 'nx' is not defined
(polynomial growth of number of neighbors)
There is another "power law" that ocurs frequently in graphs: the growth in the number of neighbors.
Graphs of this type are often generated by $k$-nearest neighbor relations in a $d$-dimensional space. For these, the number of neighbors grows as $r^d$ both for the vector distance and for the geodesic distance.