#!/usr/bin/env python # coding: utf-8 # > This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python. # # # 14.3. Resolving dependencies in a Directed Acyclic Graph with a topological sort # You need the `python-apt` package in order to build the package dependency graph. (https://pypi.python.org/pypi/python-apt/) # # We also assume that this notebook is executed on a Debian system (like Ubuntu). If you don't have such a system, you can download the data *Debian* directly on the book's website. Extract it in the current directory, and start this notebook directly at step 7. (http://ipython-books.github.io) # 1. We import the `apt` module and we build the list of packages. # In[ ]: import json import apt cache = apt.Cache() # 2. The `graph` dictionary will contain the adjacency list of a small portion of the dependency graph. # In[ ]: graph = {} # 3. We define a function that returns the list of dependencies of a package. # In[ ]: def get_dependencies(package): if package not in cache: return [] pack = cache[package] ver = pack.candidate or pack.versions[0] # We flatten the list of dependencies, # and we remove the duplicates. return sorted(set([item.name for sublist in ver.dependencies for item in sublist])) # 4. We now define a *recursive* function that builds the dependency graph for a particular package. This function updates the `graph` variable. # In[ ]: def get_dep_recursive(package): if package not in cache: return [] if package not in graph: dep = get_dependencies(package) graph[package] = dep for dep in graph[package]: if dep not in graph: graph[dep] = get_dep_recursive(dep) return graph[package] # 5. Let's build the dependency graph for IPython. # In[ ]: get_dep_recursive('ipython'); # 6. Finally, let's save the adjacency list in JSON. # In[ ]: with open('data/apt.json', 'w') as f: json.dump(graph, f, indent=1) # 7. We import a few packages. # In[ ]: import json import numpy as np import networkx as nx import matplotlib.pyplot as plt get_ipython().run_line_magic('matplotlib', 'inline') # 8. Let's load the adjacency list from the JSON file. # In[ ]: with open('data/apt.json', 'r') as f: graph = json.load(f) # 9. Now, we create a directed graph (`DiGraph` in NetworkX) from our adjacency list. We reverse the graph to get a more natural ordering. # In[ ]: g = nx.DiGraph(graph).reverse() # 10. A topological sort only exists when the graph is a **directed acyclic graph** (DAG). This means that there is no cycle in the graph, i.e. no circular dependency here. Is our graph a DAG? # In[ ]: nx.is_directed_acyclic_graph(g) # 11. What are the packages responsible for the cycles? We can find it out with the `simple_cycles` function. # In[ ]: set([cycle[0] for cycle in nx.simple_cycles(g)]) # 12. Here, we can try to remove these packages. In an actual package manager, these cycles need to be carefully taken into account. # In[ ]: g.remove_nodes_from(_) # In[ ]: nx.is_directed_acyclic_graph(g) # 13. The graph is now a DAG. Let's display it first. # In[ ]: ug = g.to_undirected() deg = ug.degree() # In[ ]: plt.figure(figsize=(4,4)); # The size of the nodes depends on the number of dependencies. nx.draw(ug, font_size=6, node_size=[20*deg[k] for k in ug.nodes()]); # 14. Finally, we can perform the topological sort, thereby obtaining a linear installation order satisfying all dependencies. # In[ ]: nx.topological_sort(g) # > You'll find all the explanations, figures, references, and much more in the book (to be released later this summer). # # > [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages).