This week we'll talk about advanced network measures (beyond the degree distribution), and communities.
Reading: This week, the reading is mostly for reference. It's for you to have a place to go, if you want more detailed information about the topics that I cover in the video lectures. Thus, I recommend you check out Chapter 9 of the network science book. In particular, we'll delve into section 9.4 in the exercises below. We will also talk a little bit about degree correlations - you can read about those in Chapter 7.
We will use the philosopher age more below.
from IPython.display import YouTubeVideo
YouTubeVideo("IOWXZFOyk9Y",width=800, height=450)
We begin with a preliminary analysis of the network
Exercise: Begin to analyze the philosopher network.
- Why do you think I want you guys to use a directed graph? Could have we used an undirected graph instead?
- What is the total number of nodes in the network? What is the total number of links? What's the average degree? What's the median degree?
- List the top 10 philosophers by in-degree and out-degree. What is the interpretation of in-degree and out-degree in this case? Have you heard about these philosophers before?
- Plot the distribution of in-degrees and out-degrees, binned using
numpy.histogram
. What is the most common degree?- Plot the distribution of in-degrees and out-degrees one more time, this time plotted loglog scale.
- Describe the distributions (as you would in a well written figure caption). Do they look Poissonian? Power-law? Something in between?
- Plot a scatter plot of the the in- versus out- degree for all philosophers using a loglog scale. Comment on the relation between the two.
Exercise: Going in depth with the structure of the the philosopher network
Above, we found the most connected philosophers (using degree centrality). Now let's dig in and try to understand more about the philosopher network using more advanced network features. If your network has more than one component, just work on the giant connected component (GCC) in the exercises below (in a directed graph use the weakly connected component).
- Not all of the measures we'll be considering below are defined for directed graphs, thus begin by creating an undirected version of the philosopher graph, that we can use whenever needed. Only use the undirected graph when explicitly stated in the exercise.
- Find the 5 most central philosophers according to betweenness centrality. What role do you imagine philosophers with high wikipedia graph betweenness centrality play in the history of philosophy?
- Find the 5 most central philosophers according to eigenvector centrality. Calculate centrality corresponding to both in- and out-edges (see NetworkX documentation for details). How is eigenvector centrality difference from degree centrality? Compare your results for eigenvector centrality to the results for betweenness centrality - does the difference make sense when you read the philosopher's wikipedia pages?
- Is the undirected version of the graph assortative with respect do degree? (e.g. do high-degree philosophers tend to link to other high-degree philosophers, and low-degree philosophers to other low-degree philosophers?). Provide an interpretation of your answer?
Exercises: Age and structure in the philosopher network
Age of the philosophers plays a large role. Socrates/Plato and Aristotle are massive influences on philosophy in millennia subsequent to their birth. Now, let us explore how the birth-year of the philosophers plays a role in shaping the network. We've created a file (
.json
format) which provides the birth year for most of the philosophers. Get it here. In the following, you may use that to get the birth-year for the philosopher. Note: It's possible that the list of names in the list of birth-years is not identical to the set of nodes in your network; thus, for the exercises including age, simply work on the subgraph of nodes for which you have age-info.
- Create a histogram of the number of philosophers born in every century, starting 500 BC. Describe the plot. Does philosophy seem to have developed at a steady pace - or in a more bursty manner? Are we living in a golden age of philosophy right now? (Use the data & common sense to present an argument for your answer).
- Is the undirected version of the graph assortative with respect to age? Once again, please provide a discussion of your findings.
- Note 1: The NetworkX assortativity coefficient works on categorical data, thus I recommend you represent each philosopher's age by their century of birth rather than birth year.
- Note 2: Alternatively you can write your own algorithm to calculate age-assortativity, use the strategy behind degree correlations - see this paper's Equation (21).
- A reasonable hypothesis is that old philosophers are more famous. Create a scatter-plot of age versus in-degree. Do you see a correlation between the two? Justify your answer (for example by calculating the correlation coefficient.)
Before we begin coding, let's learn about communities. If you want to learn more details, I recommend you take a look at Chapter 9 in the Network Science book ... but the lecture should be enough to get you started. For this and the next exercise, work on the undirected version of the network.
Video Lecture: Communities in networks.
You can watch the 2015 video here.
YouTubeVideo("FSRoqXw28RI",width=800, height=450)
Exercises: We will use the concept of modularity to explore how community-like the six branches of philosophy are.
- Explain the concept of modularity in your own words.
- Now we will calculate the modularity of the branches reported by the Wikipedia editors. But we need to do a bit of work to make this happen.
- Firstly, modularity does not work when the communities are overlapping. Thus, we need to do something about the philosophers that are part of multiple branches. We will handle it by creating a set of six new branches, where we take all of the philosophers that belong to more than one branch and assign them to the branch that they have the most connections to. The procedure is the following:
- Start with creating six new branches (e.g. represented as
set
s in Python) containing all of the philosopher that can be unambiguously assigned to a branch.- Then, take the list of all nodes that are part of more than one branch. For each member i of this list, find those of its neighbours that you just unambigously assigned to each branch.
- Add i to to the branch that it has most such connections to.
- Now that we have a new set of non-overlapping branches, we can calculate the modularity, described in the Network Science book, section 9.4). Use equation 9.12 in the book to calculate the modularity M of the branches-partitioning.
- Comment on the value of M. Are the branches good communities? (We will explore this question in depth below.)
Exercises: As a final exercise, we will now run community detection on the full philosopher network.
- Use the Python Louvain-algorithm implementation to find communities in the full philosopher network. Report the value of modularity found by the algorithm. Is it higher or lower than what you found above for the branches as communities? What does this comparison reveal about the branches?
> * \[**Note**: This implementation is now available as Anaconda package. Install with `conda` as expained [here](https://anaconda.org/auto/python-louvain)\].
> * You can also try the *Infomap* algorithm instead if you're curious. Go to [this page]. (http://www.mapequation.org/code.html) and search for 'python'. It's harder to install, but a better community detection algorithm.
- Compare the communities found by your algorithm with the branches of philosophy by creating a matrix D with dimension (B times C), where B is the number of branches and C is the number of communities. We set entry D(i,j) to be the number of nodes that branch i has in common with community j. The matrix D is what we call a confusion matrix. Use the confusion matrix to explain how well the communities you've detected correspond to the labeled branches of philosophy.
This part is optional! But I recommend that you work through it anyway - it is probably the exercise that will teach you the most about philosophy.
Exercises: Understanding the branches of philosophy. You already know a lot about network analysis, so in this assignment I'm providing fewer details on what I want you to do: Figuring out what things to calculate in order to provide an awesome quantitative answer to my questions is part of the assignment. That way you can show that you're able to use the skills you've acquired up till now.
- Perform an centrality analysis of each of the six branches of philosophy knowing what you've just learned about centrality in complex networks. Use your analysis to reveal the identity of the most important figures in each branch of philosophy.
- Each of the six branches has its own unique history. Use histograms (and/or whatever else you can think of) to reveal the temporal differences and similarities between the six branches of philosophy.