# Structural Similarity¶

During this seminar we will:

1. Consider some node similarity measures, particularly: Euclidean Distance, Correlation Coefficient and Cosine Distance
2. Take a look at Cuthill-McKee node reordering procedure
3. Calculate Assortative mixing coefficient for some Game Of Thrones network
In [1]:
import numpy as np
import pandas as pd
import scipy.spatial as spt
import matplotlib.pyplot as plt
plt.xkcd()
import networkx as nx
%matplotlib inline


### Task 1 - Similarities Calculation¶

1. Calculate Euclidean Distance, Correlation Coefficient and Cosine Distance for some toy-network (Zachary again?) and for Les Miserables dataset
2. Visualize them

HINT: For correlation coeficient you can use np.corrcoef(), for the distances you may implement them on your own or use scipy.spatial.distance.pdist()

In [13]:
# Put your code here..
#
#
G = nx.karate_club_graph()
A = nx.to_numpy_matrix(G, dtype=int)
A = np.asarray(A)

def plotDist(A):

f, ax = plt.subplots(2, 2, figsize=(10,10))
ax[0, 0].imshow(A, cmap = 'Greys', interpolation = 'None')

D = np.corrcoef(A)
ax[1, 0].imshow(D, cmap = 'Greys', interpolation = 'None')
ax[1, 0].set_title('Correlation coeff.')

dVec = spt.distance.pdist(A, metric = 'euclidean')
D = spt.distance.squareform(dVec)
ax[0, 1].imshow(D, cmap = 'Greys', interpolation = 'None')
ax[0, 1].set_title('Euclidean Dist.')

dVec = spt.distance.pdist(A, metric = 'cosine')
D = spt.distance.squareform(dVec)
ax[1, 1].imshow(D, cmap = 'Greys', interpolation = 'None')
ax[1, 1].set_title('Cosine Dist.')

plotDist(A)


In [14]:
G = nx.read_gml('lesmis.gml')
A = nx.to_numpy_matrix(G, dtype=int)
A = np.asarray(A)

plotDist(A)


### Task 2 - Node Reordering¶

Without special preprocess procidures graph adjacency matrix can look very noisy and hide network's structure (just look at the matrices above). Offcourse usually you don't know the structure itself (communities, groups of closelly connected nodes, etc.) unless it is given, however there are some procedures of node reordering that provides a better view of the network's adjacency matrix.

Reverse Cuthill-McKee finds permutation of the nodes that minimizes the bandwidth of the matrix, which is calculated as: $$\theta = \max_{a_{ij} > 0}|i-j|$$ Unformally, this algorithm puts some mass on the diagonal of adjacency matrix.

Run this reordering with nx.utils.reverse_cuthill_mckee_ordering(G) and compare with the results above

In [19]:
# Put your code here
#
#
G = nx.karate_club_graph()

# run procedure
cm = nx.utils.reverse_cuthill_mckee_ordering(G)
# get permutation
l = [n for n in cm]
l

A = nx.to_numpy_matrix(G)
A = np.asarray(A)
# apply reordering
A = A[np.ix_(l,l)]

plotDist(A)


### Task 3 - Assortative Mixing¶

The first dataset provides information on characters of the Game Of Thrones universe. The second one -- describes some events that have occured with them during the story. We are interested in killing events since they can be considered as binary relations and consequently -- graphs. The attribute wrt which we are going to compute assortative mixing is called "Team".

We will explore datasets with pandas module. The list of usefull functions:

• dropna
• set_index('characterID')['Team'].to_dict()
• events[events['event'] == 'killed']
In [20]:
# Put your code here
#
#

In [36]:
characters.head()

Out[36]:
characterID Name Team isDead isCaptured isHurt title prefix
0 2 Addam Marbrand Lannister 0 0 0 knight Ser
1 1894 Adrack Humble Greyjoy 0 0 0 NaN NaN
2 7 Aegon Frey (Jinglebell) Frey (North) 0 0 0 NaN NaN
3 8 Aegon I Targaryen Targaryen 1 0 0 King of the Seven Kingdoms King
4 12 Aegon Targaryen Targaryen 1 0 0 prince Prince

5 rows × 8 columns

In [29]:
kill_events = events[events['event'] == 'killed']
kill_events = pd.DataFrame(kill_events, index = None, columns=['characterID', 'event', 'withID'])
kill_events = kill_events.dropna()

In [31]:
kill_events.head()

Out[31]:
characterID event withID
7 1808 killed 2068
9 1825 killed 1808
25 557 killed 456
289 1186 killed 1528
518 755 killed 629

5 rows × 3 columns

In [55]:
G = nx.DiGraph()

for row, data in kill_events.iterrows():
killer = data[2]
killed = data[0]

In [56]:
ch_team = characters.set_index('characterID')['Team'].to_dict()
for k in ch_team.keys():
if k not in G.nodes():
del ch_team[k]

In [57]:
nx.set_node_attributes(G, 'Team', ch_team)

In [63]:
nx.assortativity.attribute_assortativity_coefficient(G, 'Team')
as_dict = nx.assortativity.attribute_mixing_dict(G, 'Team')
as_dict

Out[63]:
{'Bolton (Lannister)': {},
'Bolton (North)': {'Bolton (Lannister)': 1,
'Greyjoy': 1,
'Stark': 3,
'none': 1},
'Brave Companions': {'Lannister': 2, 'none': 1},
'Drogo': {'Drogo': 1, 'Targaryen': 1},
'Essos': {'Targaryen': 2},
'Faceless Men': {'Citadel': 1, 'Lannister': 2},
'Frey (Lannister)': {},
'Frey (North)': {'Stark': 3},
'Greyjoy': {'Greyjoy': 4,
'Stark': 4,
'Tyrell': 1,
'Wildlings (north of wall)': 1},
'Lannister': {'Frey (Lannister)': 2,
'Lannister': 1,
'Robert': 2,
'Stark': 4,
'none': 1},
'Littlefinger': {'Littlefinger': 1, 'Robert': 1},
'Martell': {'Robert': 1},
'Night Watch': {'Night Watch': 4,
'Tyrell': 1,
'Wildlings (north of wall)': 5},
'Red God': {'Renly': 2, 'Stannis': 4},
'Renly': {'Brave Companions': 3, 'Stark': 1, 'none': 1},
'Robert': {'Brave Companions': 2,
'Essos': 1,
'Martell': 1,
'Robert': 5,
'Second Sons': 1,
'Stark': 4},
'Second Sons': {},
'Stannis': {},
'Stark': {'Bolton (Lannister)': 1,
'Brave Companions': 1,
'Drogo': 1,
'Frey (Lannister)': 2,
'Frey (North)': 6,
'Lannister': 4,
'Night Watch': 4,
'Robert': 1,
'Stark': 1},
'Stormcrows': {'Stormcrows': 2},
'Targaryen': {'Drogo': 1, 'Essos': 2},
'Tyrell': {'Lannister': 1, 'Renly': 3},
'Tyrion': {'Lannister': 1, 'Tyrion': 1},
'Wildling refugees': {},
'Wildlings (north of wall)': {'Bolton (Lannister)': 1,
'Night Watch': 5,
'Stark': 1},
'none': {'Lannister': 3, 'Night Watch': 2, 'Stark': 2, 'none': 1},
'the Others': {'Night Watch': 5, 'Tyrell': 1, 'Wildling refugees': 1}}