Let's consider the *Traveling Salesperson Problem*:

Given a set of cities, and the distances between each pair of cities, find a *tour* of the cities with the minimum total distance. A *tour* means you start at one city, visit every other city exactly once, and then return to the starting city.

This is a well-known *intractable* problem, meaning that there are no efficient solutions that work for a large number of cities. But we can create an inefficient algorithm that works fine for a small number of cities (about a dozen). We can also find a *nearly*-shortest tour over thousands of cities. Actually, the fact that there is no efficient algorithm is liberating: it means that we can use a very simple, inefficient algorithm and not feel too bad about it. In this exercise we will look at some algorithms for solving the problem. The most important lesson is how to *think about solving a problem* like this; the details of this particular problem are less important.

Let's think first about the *vocabulary* of this problem: what are the important concepts, and are we sure we understand them well enough to implement the concepts (as data or functions)? We will start to consider possible implementations, in the Python programming language, but won't make definitive choices yet:

- **City**: For the purpose of this exercise, a city is "atomic" in the sense that we don't have to know anything about the components or attributes of a city; we don't need to know its name, population, or anything else, except that we do have to know how far it is from other cities.
- **Cities**: We will need to represent a set of cities; Python's `set` datatype might be appropriate for that.
- **Distance**: We will need the distance between two cities. If `A` and `B` are cities, this could be done with a function, `distance(A, B)`, or with a dict, `distance[A][B]` or `distance[A, B]`, or with an array if `A` and `B` are integer indexes. The resulting distance will be a real number (which Python calls a `float`).
- **Tour**: A tour is an ordered list of cities; Python's `list` or `tuple` datatypes would work.
- **Total distance**: The total distance for a tour is the sum of the distances of adjacent cities in the tour. We will probably have a function, `total_distance(tour)`.
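To make the distance choices concrete, here is a tiny side-by-side sketch of the function and dict options (the city coordinates here are made up purely for illustration; the notebook will make its real choice later):

```python
from math import hypot

# Two hypothetical cities as (x, y) tuples, just for this sketch.
A, B = (3, 0), (0, 4)

# Option 1: distance as a function of two cities.
def distance_fn(p, q):
    "Straight-line distance between two (x, y) points."
    return hypot(p[0] - q[0], p[1] - q[1])

# Option 2: distance as a dict keyed by pairs of cities.
distance_table = {(A, B): 5.0, (B, A): 5.0}

print(distance_fn(A, B))     # → 5.0
print(distance_table[A, B])  # → 5.0
```

Either form answers the same question; the function computes distances on demand, while the dict requires precomputing (or storing) every pair.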

In [12]:

```
import matplotlib
import matplotlib.pyplot as plt
import random
import time
import itertools
```

Here is our first algorithm, which finds the tour with shortest total distance with a straightforward approach:

Generate all the possible tours of the cities, and choose the shortest one (the tour with the minimum total distance).

We can implement this as the Python function `exact_TSP` (TSP is the standard abbreviation for Traveling Salesperson Problem, and "exact" means that it finds the shortest tour, exactly, not just an approximation to the shortest tour). Here's the design philosophy we will use:

*Write Python code that closely mirrors the English description of the algorithm. This will probably require some auxiliary functions and data structures; just assume we will be able to define them as well, using the same design philosophy.*

In [13]:

```
def exact_TSP(cities):
    "Generate all possible tours of the cities and choose the shortest one."
    return shortest(alltours(cities))

def shortest(tours):
    "Return the tour with the minimum total distance."
    return min(tours, key=total_distance)
```

*Note 1*: We have not yet defined the function `total_distance`, nor `alltours`.

*Note 2*: In Python `min(`*collection*`, key=`*function*`)` means to find the element *x* that is a member of *collection* such that *function(x)* is minimized. So `shortest` finds the tour whose `total_distance` is minimal among the tours. So our Python code implements (and closely mimics) our English description of the algorithm. Now we need to define what a tour is, and how to measure total distance.

*Meta-note*: I would prefer for these notes to be asides; maybe implemented as mouse-overs, or as thought bubbles from little character icons off to the side. For now they are implemented as plain text and are inline; feel free to skim or skip over them.

A tour starts in one city, and then visits each of the other cities in order, before finally returning to the start. A natural representation of the set of available cities is a Python `set`, and a natural representation of a tour is a sequence that is a *permutation* of the set. Below we list the 6 possible tours/permutations of a set of three cities. The tuple `(1, 2, 3)`, for example, represents a tour that starts in city 1, moves to 2, then 3, and then returns to 1 to finish the tour. (I debated using `(1, 2, 3, 1)` as the representation of a tour that starts and ends at city 1, but decided that `(1, 2, 3)` is better because it is not redundant.)

In [14]:

```
alltours = itertools.permutations # The permutation function is already defined in the itertools module
cities = {1, 2, 3}
list(alltours(cities))
```

Out[14]:

Now for the notion of *distance*. We define `total_distance(tour)` as the sum of the distances between consecutive cities in the tour; that part is shown below and is easy (with one Python-specific trick: when `i` is 0, then `distance(tour[0], tour[-1])` gives us the wrap-around distance between the first and last cities, because `tour[-1]` is the last element of `tour`).

In [15]:

```
def total_distance(tour):
    "The total distance between each pair of consecutive cities in the tour."
    return sum(distance(tour[i], tour[i-1]) for i in range(len(tour)))
```
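The wrap-around trick is worth seeing in isolation. In this toy example of mine the "cities" are plain numbers, so we can read off exactly which pairs get measured:

```python
tour = [10, 20, 30]
# The pattern (tour[i], tour[i-1]) pairs each city with its predecessor;
# at i == 0, tour[-1] is the last city, giving the wrap-around edge.
pairs = [(tour[i], tour[i-1]) for i in range(len(tour))]
print(pairs)  # → [(10, 30), (20, 10), (30, 20)]
```

Every city appears exactly once as the first element, so every edge of the closed tour is counted exactly once.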

Before we can define `distance(A, B)`, the distance between two cities, we have to make a choice. In the fully general version of the TSP problem, the distance between two cities could be anything: it could be the amount of time it takes to travel between cities, the number of dollars it costs, or anything else. So the distances could be represented by a matrix `distance[A][B]`, where any entry in the matrix could be any numeric value.

But we will make the choice not to allow that full generality, and instead restrict ourselves to Euclidean distance, the straight-line distance between points in a two-dimensional plane. So each city will be represented by a two-dimensional point: a pair of *x* and *y* coordinates. We will use the constructor function `City`, so that `City(300, 0)` creates a city with x-coordinate of 300 and y-coordinate of 0. Then `distance(A, B)` will be a function that uses the *x* and *y* coordinates to compute the distance between `A` and `B`.

How will we represent a two-dimensional point? Here are some choices, with their pros and cons:

- **Tuple:** A point (or city) is a two-tuple of (*x*, *y*) coordinates, for example, `(300, 0)`. **Pro:** Very simple; easy to break a point down into components. Reasonably efficient. **Con:** doesn't distinguish points from other two-tuples. If `p` is a point, can't do `p.x` or `p.y`.
- **class:** Define `City` as a custom class with *x* and *y* fields. **Pro:** explicit, gives us `p.x` accessors. **Con:** less efficient because of the overhead of creating user-defined objects.
- **complex:** Python already has the two-dimensional point as a built-in numeric data type, but in a non-obvious way: as *complex numbers*, which inhabit the two-dimensional (real × imaginary) plane. We can make this use more explicit by defining "`City = complex`", meaning that we can construct the representation of a city using the same constructor that makes complex numbers. **Pro:** most efficient, because it uses a builtin type that is already a pair of numbers. The distance between two points is simple: the absolute value of their difference. **Con:** it may seem confusing to bring complex numbers into play; can't say `p.x`.
- **subclass:** Define "`class Point(complex): pass`", meaning that points are a subclass of complex numbers. **Pro:** All the pros of using `complex` directly, with the added protection of making it more explicit that these are treated as points, not as complex numbers. **Con:** less efficient than using `complex` directly; still can't do `p.x` or `p.y`.
- **subclass with properties:** Define "`class Point(complex): x, y = property(lambda p: p.real), property(lambda p: p.imag)`". **Pro:** All the pros of the previous approach, and we can finally say `p.x`. **Con:** less efficient than using `complex` directly.
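As a quick sanity check, the "subclass with properties" option really does behave as advertised (a small sketch of mine; the notebook itself will use plain `complex`):

```python
class Point(complex):
    "A 2-D point: a complex number with x and y accessors."
    x = property(lambda p: p.real)
    y = property(lambda p: p.imag)

p, q = Point(3, 4), Point(0, 0)
print(p.x, p.y)    # → 3.0 4.0
print(abs(p - q))  # → 5.0  (the 3-4-5 right triangle)
```

Note that arithmetic on `Point`s returns plain `complex` values, which is fine here since we only ever take `abs` of the result.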

Any of these choices would work perfectly well; I chose to go with complex numbers:

In [16]:

```
City = complex  # Constructor for new cities, e.g. City(300, 400)

def distance(A, B):
    "The distance between two points."
    return abs(A - B)
```

Here's an example of computing the distance between two cities:

In [17]:

```
A = City(300, 0)
B = City(0, 400)
distance(A, B)
```

Out[17]:

Here's how to make a set of random cities:

In [18]:

```
def Cities(n):
    "Make a set of n cities, each with random coordinates."
    return set(City(random.randrange(10, 890), random.randrange(10, 590))
               for c in range(n))

# Let's make some standard sets of cities of various sizes.
# We'll set the random seed so that these sets are the same every time we run this notebook.
random.seed('seed')
cities8, cities10, cities100, cities1000 = Cities(8), Cities(10), Cities(100), Cities(1000)
cities8
```

Out[18]:

Now we can use the `exact_TSP` function to find the minimal tour through the set of cities `cities8`:

In [19]:

```
tour = exact_TSP(cities8)
print(tour)
print(total_distance(tour))
```

The permutation `(1, 2, 3)` represents the tour that goes from 1 to 2 to 3 and back to 1. You may have noticed that there aren't really six different tours of three cities: the cities 1, 2, and 3 form a triangle; any tour must connect the three points of the triangle; and there are really only two ways to do this: clockwise or counterclockwise. In general, with $n$ cities, there are $n!$ (that is, $n$ factorial) permutations, but only $(n-1)!$ tours that are *distinct*: the tours `123`, `231`, and `312` are three ways of representing the *same* tour.
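We can check the counting argument directly with `math.factorial` (a small aside of mine, not part of the original code):

```python
from math import factorial

n = 3
print(factorial(n))      # → 6   permutations of 3 cities
print(factorial(n - 1))  # → 2   distinct tours: clockwise and counterclockwise
```

Fixing the starting city collapses the $n$ rotations of each tour into one representative, which is exactly the $n!$-to-$(n-1)!$ reduction.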

So we can make our `exact_TSP` program $n$ times faster by never considering redundant tours. Arbitrarily, we will say that all tours must start with the "first" city in the set of cities. We don't have to change the definition of `exact_TSP`; just by making `alltours` return only nonredundant tours, the whole program gets faster.

(While we're at it, we'll make tours be represented as lists, rather than the tuples that are returned by `permutations`. It doesn't matter now, but later on we will want to represent *partial* tours, to which we will want to append cities one by one; that can only be done to lists, not tuples.)

In [20]:

```
def alltours(cities):
    "Return a list of tours, each a permutation of cities, but each one starting with the same city."
    start = first(cities)
    return [[start] + list(tour)
            for tour in itertools.permutations(cities - {start})]

def first(collection):
    "Start iterating over collection, and return the first element."
    for x in collection: return x

In [21]:

```
alltours({1, 2, 3})
```

Out[21]:

In [22]:

```
alltours({1, 2, 3, 4})
```

Out[22]:

`exact_TSP(cities8)` still works and gives the same tour with the same total distance. (But it is now about 8 times faster.)

In [23]:

```
tour = exact_TSP(cities8)
print(tour)
print(total_distance(tour))
```

If we could *plot* a tour we'd understand it better. I define `plot_tour(algorithm, cities)` to apply a TSP algorithm to a set of cities, plot the resulting tour, and print information about the total distance of the tour and the time it took to find it. The plot will have each city as a blue circle, except the starting point will be a red square:

In [60]:

```
def plot_tour(algorithm, cities):
    "Apply a TSP algorithm to cities, and plot the resulting tour."
    # Find the solution and time how long it takes.
    # (time.clock was removed in Python 3.8; perf_counter is the modern equivalent.)
    t0 = time.perf_counter()
    tour = algorithm(cities)
    t1 = time.perf_counter()
    # Plot the tour as blue lines between blue circles, and the starting city as a red square.
    plotline(list(tour) + [tour[0]])
    plotline([tour[0]], 'rs')
    plt.show()
    print("{} city tour; total distance = {:.1f}; time = {:.3f} secs for {}".format(
        len(tour), total_distance(tour), t1 - t0, algorithm.__name__))

def plotline(points, style='bo-'):
    "Plot a list of points (complex numbers) in the 2-D plane."
    X, Y = XY(points)
    plt.plot(X, Y, style)

def XY(points):
    "Given a list of points, return two lists: X coordinates, and Y coordinates."
    return [p.real for p in points], [p.imag for p in points]

plot_tour(exact_TSP, cities8)
```

The plot looks much better than a list of coordinates! Now let's try a much harder 10-city tour:

In [61]:

```
plot_tour(exact_TSP, cities10)
```

It takes about 3 seconds on my machine to solve this problem. In general, the `exact_TSP` function looks at $(n-1)!$ tours for an *n*-city problem, and each tour has $n$ cities, so the time for $n$ cities should be roughly proportional to $n!$. This means that the time grows rapidly with the number of cities; we'd need longer than the age of the Universe to run `exact_TSP` on just 24 cities:

| n cities | time |
|---|---|
| 10 | 3 secs |
| 12 | 3 secs × 12 × 11 = 6.6 mins |
| 14 | 6.6 mins × 13 × 14 = 20 hours |
| 24 | 3 secs × 24! / 10! = 16 billion years |
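The 24-city entry can be reproduced with a couple of lines of arithmetic (assuming, as the table does, that run time scales as $n!$):

```python
from math import factorial

secs_for_10 = 3  # measured: about 3 seconds for 10 cities
secs_for_24 = secs_for_10 * factorial(24) / factorial(10)
years = secs_for_24 / (60 * 60 * 24 * 365)
print('{:.0f} billion years'.format(years / 1e9))  # → 16 billion years
```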

There must be a better way ...

The *general, exact* Traveling Salesperson Problem is intractable; there is no efficient algorithm to find the tour with minimum total distance. But if we restrict ourselves to Euclidean distance and if we are willing to settle for a tour that is *reasonably* short but not the shortest, then the news is much better. We will consider several *approximate* algorithms, which find tours that are usually within 10 or 20% of the shortest possible and can handle thousands of cities in a few seconds.

Here is our first approximate algorithm:

Start at any city; at each step extend the tour by moving from the previous city to its nearest neighbor that has not yet been visited.

This is called a *greedy algorithm*, because it greedily takes what looks best in the short term (the nearest neighbor) even when that won't always be the best in the long term. To implement the algorithm I need to represent all the noun phrases in the English description: "start" (a city; arbitrarily the first city); "the tour" (a list of cities, initially just the start city); "previous city" (the last element of the tour, that is, `tour[-1]`); "nearest neighbor" (a function that, when given a city, A, and a list of other cities, finds the one with minimal distance from A); and "not yet visited" (we will keep a set of unvisited cities; initially all cities but the start city are unvisited).

Once these are initialized, we repeatedly find the nearest unvisited neighbor, `C`, and add it to the tour and remove it from `unvisited`:

In [26]:

```
def greedy_TSP(cities):
    "At each step, visit the nearest neighbor that is still unvisited."
    start = first(cities)
    tour = [start]
    unvisited = cities - {start}
    while unvisited:
        C = nearest_neighbor(tour[-1], unvisited)
        tour.append(C)
        unvisited.remove(C)
    return tour

def nearest_neighbor(A, cities):
    "Find the city in cities that is nearest to city A."
    return min(cities, key=lambda x: distance(x, A))
```

(In Python, as in the formal mathematical theory of computability, `lambda` is the symbol for *function*, so "`lambda x: distance(x, A)`" means the function of `x` that computes the distance from `x` to the city `A`. The name `lambda` comes from the Greek letter λ.)
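Here is `min` with a `lambda` key on its own, in a toy example of mine:

```python
# Find the number closest to 10: the key function maps each
# candidate to its distance from 10, and min picks the smallest.
closest = min([3, 14, 27, 8], key=lambda x: abs(x - 10))
print(closest)  # → 8
```

This is exactly the shape of `nearest_neighbor`, with numbers standing in for cities.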

We can compare the fast approximate `greedy_TSP` algorithm to the slow `exact_TSP` algorithm on a small map, as shown below. (If you have this page in an IPython notebook you can repeatedly run the cell, and see how the algorithms compare; `Cities(9)` will return a different set of cities each time. I ran it 20 times, and only once did the greedy algorithm find the optimal solution, but half the time it was within 10% of optimal, and it was never more than 25% worse than optimal.)

In [66]:

```
cities = Cities(9)
plot_tour(exact_TSP, cities)
plot_tour(greedy_TSP, cities)
```

In [28]:

```
plot_tour(greedy_TSP, cities100)
plot_tour(greedy_TSP, cities1000)
```

Can we do better? Can we find shorter tours without taking an exponentially increasing amount of time? Our challenge is to combine the speed of the greedy algorithm with the precision of the exact algorithm.

We need an idea for how to get there ...

The `exact_TSP` algorithm is too slow for large *n*, because it considers all $n!$ tours. But the `greedy_TSP` algorithm considers exactly one tour: the tour that starts at an arbitrary city and then repeatedly transitions to the nearest unvisited neighbor. Could we get a good compromise between execution speed and total tour distance if we considered more than 1 but fewer than $n!$ tours? What's a number between 1 and $n!$, and how could we use that to create a new algorithm?

Let's think about how to improve the greedy algorithm. In the plot of the 100-city tour with `greedy_TSP`, we see that there are a few very-long edges between cities. These come about because we get to a certain point in the greedy construction of the tour where there are no close-by cities, and we have to jump far away. In a way, this just seems like bad luck: the way we flow from neighbor to neighbor just happens to leave a few very-long edges. Just as with buying lottery tickets, we can improve our luck by trying more often:

For each city, run the greedy algorithm with that city as the starting point, and choose the resulting tour with the shortest total distance.

So, with $n$ cities we run the `greedy_TSP` algorithm $n$ times, meaning the run time will be $n$ times longer, but the resulting tour will (in most cases) be shorter (and always at least as short as `greedy_TSP`). To implement `all_greedy_TSP` we just take the shortest tour over all starting cities. That requires a modification of `greedy_TSP` so that the starting city can be specified as an optional argument. Previously, `greedy_TSP` always used the first city as the starting point. We will make `start` an optional parameter; if it is unspecified, we default to the old behavior of using the first city.

In [68]:

```
def all_greedy_TSP(cities):
    "Try the greedy algorithm from each of the starting cities; return the shortest tour."
    return shortest(greedy_TSP(cities, start=c) for c in cities)

# We will modify greedy_TSP to take an optional start city; otherwise it is unchanged.
def greedy_TSP(cities, start=None):
    "At each step, visit the nearest neighbor that is still unvisited."
    if start is None: start = first(cities)
    tour = [start]
    unvisited = cities - {start}
    while unvisited:
        C = nearest_neighbor(tour[-1], unvisited)
        tour.append(C)
        unvisited.remove(C)
    return tour

# Compare greedy_TSP to all_greedy_TSP
plot_tour(greedy_TSP, cities100)
plot_tour(all_greedy_TSP, cities100)

`all_greedy_TSP` does indeed take about $n=100$ times longer to run, but it yields a tour that is 8% shorter than `greedy_TSP` (for this particular 100-city map). For the 1000-city problem it would take about 1000 times longer. I didn't want to wait that long, so I didn't try. Instead let's look for other algorithms that better balance speed and shorter tours.

Of the several very-long edges between cities in the greedy tour, most appear near the *end* of the tour. That makes sense: when there are only a few cities left, it is not surprising that the nearest one might be far away. So here's an idea:

Use the greedy algorithm for all but the last few cities (about 8), then use the exact algorithm over all possible ways to complete the tour.

This is another example of the "pick a number from 1 to $n!$" idea, this time picking $8!$ as the number of tours to consider.
We'll define the function `greedy_exact_end_TSP` to take a parameter `end_size` to say how many cities at the end we will consider an exact solution for. The default is `end_size=8`. Suppose we are solving a 1000-city problem and have constructed a greedy tour through 992 cities, and now want to complete the final 8 cities. We will consider all 8! = 40,320 permutations of those cities. The simple way to do that is: generate all permutations of the 8 cities, and one at a time, append the 8 onto the 992 to yield a 1000-city tour, and calculate the total distance for each such tour. Choose the shortest.

But that means we're computing `total_distance` of a 1000-city tour 40,320 times. It would be 100 times faster if we could just compute the `total_distance` of 10-city tours instead of 1000-city tours. Fortunately, we can do that. Since all 992 cities of the greedy part of the tour stay the same, their contribution to the total distance is the same for every permutation, *except* for the contribution of the first and last city out of the 992. So we can omit the 990 cities in the middle of the greedy part, and just evaluate the `total_distance` over a tour consisting of the first and last cities of the 992, followed by a permutation of the 8 remaining cities. Again, try them all and choose the shortest, and once we have the shortest, put the omitted cities back in.

In [51]:

```
def greedy_exact_end_TSP(cities, start=None, end_size=8):
    """At each step, visit the nearest neighbor that is still unvisited until
    there are end_size cities left; then choose the best of all possible endings."""
    if start is None: start = first(cities)
    tour = [start]
    unvisited = cities - {start}
    # Use greedy algorithm for all but the last end_size cities
    while len(unvisited) > end_size:
        C = nearest_neighbor(tour[-1], unvisited)
        tour.append(C)
        unvisited.remove(C)
    # Consider all permutations of possible ends to the tour, and choose the best one.
    # (But to make things faster, omit the middle of the tour.)
    ends = map(list, itertools.permutations(unvisited))
    best = shortest([tour[0], tour[-1]] + end for end in ends)
    return tour + best[2:]

plot_tour(greedy_exact_end_TSP, cities100)
plot_tour(greedy_exact_end_TSP, cities1000)
```

`greedy_exact_end_TSP` is a bit disappointing on the 100-city map; it gives a longer tour than `all_greedy_TSP` and is no faster. But on the 1000-city map, it is faster than the expected run time of `all_greedy_TSP`, and yields a slightly shorter tour than `greedy_TSP`.

Should we spend our computation time on different cities at the start of the tour (as in `all_greedy_TSP`) or on different permutations at the end of the tour (as in `greedy_exact_end_TSP`)? *Why not do both?*

In the function `greedy_bi_TSP` we combine an `end_size` parameter to say how many of the final cities to do exact search over, and a `start_size` parameter to say how many different starting cities to try. We won't necessarily try *all* starting cities, and the ones we do try will be sampled at random.

We try the algorithm with a default of 12 starting cities and 6 cities at the end; the results on both the 100-city and 1000-city maps are shorter than any other algorithm we have seen so far. That's encouraging, but with other choices of random starting points the results might be different, and on other maps the results might be different. I would want more convincing evidence before declaring this algorithm the champion.

In [31]:

```
def greedy_bi_TSP(cities, start_size=12, end_size=6):
    "Run greedy_exact_end_TSP from start_size random starting cities; return the shortest tour found."
    starts = random.sample(list(cities), min(len(cities), start_size))
    return shortest(greedy_exact_end_TSP(cities, start, end_size)
                    for start in starts)

random.seed('bi')
plot_tour(greedy_bi_TSP, cities100)
plot_tour(greedy_bi_TSP, cities1000)
```

If we only consider one or two maps we can't be confident which algorithm is best. It would be better to average over, say, 50 or 100 random maps at a time and see a summary of the total distances produced by different algorithms on these maps. We'll define the `compare_algorithms` function to take a list of TSP algorithms and a list of maps, and apply each algorithm to all the maps. For each algorithm we collect all the total distances on all the maps, and then sort the distances to get a monotonically increasing curve. We can then compare the curves: the lower the curve, the better. The chart legend also gives the total time in seconds and the average tour length for each algorithm.

Here is the code and a comparison plot:

In [32]:

```
def compare_algorithms(algorithms, maps):
    "Apply each algorithm to each map and plot results."
    for algorithm in algorithms:
        t0 = time.perf_counter()
        results = [total_distance(algorithm(m)) for m in maps]
        t1 = time.perf_counter()
        avg = sum(results) / len(results)
        label = '{:.0f}; {:.1f}s: {}'.format(avg, t1 - t0, algorithm.__name__)
        plt.plot(sorted(results), label=label)
    plt.legend(loc=2)
    plt.show()
    print('{} x {}-city maps'.format(len(maps), len(maps[0])))

def Maps(M, N):
    "Return a list of M maps, each consisting of a set of N cities."
    return [Cities(N) for m in range(M)]

compare_algorithms([greedy_TSP, greedy_exact_end_TSP, all_greedy_TSP], Maps(100, 50))
```

`greedy_TSP` (the blue line) is the fastest of the three algorithms, but gives the longest tours (as expected). `greedy_exact_end_TSP` is the slowest, by far, and only a little bit shorter than `greedy_TSP`. The winner looks like `all_greedy_TSP`: significantly shorter tours with not-too-long runtimes. This suggests that the multiple starting points are helping more than the permutation of ending cities, but let's explore the tradeoff by considering multiple points along the way:

In [33]:

```
def bi_10_6(cities): return greedy_bi_TSP(cities, 10, 6)
def bi_20_5(cities): return greedy_bi_TSP(cities, 20, 5)
def bi_40_4(cities): return greedy_bi_TSP(cities, 40, 4)
def bi_80_2(cities): return greedy_bi_TSP(cities, 80, 2)
def bi_160_1(cities): return greedy_bi_TSP(cities, 160, 1)
algorithms = [bi_10_6, bi_20_5, bi_40_4, bi_80_2, bi_160_1]
```

In [34]:

```
compare_algorithms(algorithms, Maps(100, 50))
```

In [35]:

```
compare_algorithms(algorithms, Maps(50, 100))
```

Again, there is no clear winner. Let's try 160-city maps:

In [52]:

```
compare_algorithms(algorithms, Maps(25, 160))
```

We're starting to see a bit of separation: `bi_160_1`, which tries all starting cities and only one permutation at the end, is the best, but still not by a wide margin.

Now we could gather more statistics, and we could do a *significance test* to quantify the probability that the winning algorithm really is the best, or just benefited from lucky random variation. Doing that kind of test is a useful exercise, but I feel like we should explore other completely different algorithms before spending time splitting hairs over the best parameters to `greedy_bi_TSP`.

We can do a *postmortem analysis* where we examine what went wrong with the greedy algorithm and try to come up with ways to improve.

Consider the map below: the greedy algorithm will traverse the cities of the inner square while *ignoring the outlying points*, and then at the end we will be left with long paths to the outliers.

In [53]:

```
outliers_list = [City(2, 2), City(2, 3), City(2, 4), City(2, 5), City(2, 6),
                 City(3, 6), City(4, 6), City(5, 6), City(6, 6),
                 City(6, 5), City(6, 4), City(6, 3), City(6, 2),
                 City(5, 2), City(4, 2), City(3, 2),
                 City(1, 6.8), City(7.8, 6.4), City(7, 1.2), City(0.2, 2.8)]

plotline(outliers_list, 'bo')
```

The `greedy_TSP` tour starts at the red point, moves clockwise around the inner square, and then picks up the four outlying points in counter-clockwise order, with four very long edges.

In [55]:

```
outliers = set(outliers_list)
plot_tour(greedy_TSP, outliers)
```

`all_greedy_TSP` has the same problem:

In [58]:

```
plot_tour(all_greedy_TSP, outliers)
```

The algorithms that consider permutations of the final cities (`greedy_exact_end_TSP` and `greedy_bi_TSP`) do somewhat better: an end-size of 8 is enough to pick up two of the four outliers in a reasonable position. But not all four outliers.

In [59]:

```
plot_tour(greedy_exact_end_TSP, outliers)
plot_tour(greedy_bi_TSP, outliers)
```

Let's try to understand what went wrong. First we'll create some more tools to better draw diagrams.

In [63]:

```
def plot_lines(points, labels, *args):
    """Takes a list of points and a (possibly empty) list of labels for them,
    followed by any number of args, each either a string indicating a new line style, like 'bo-',
    or a list of indexes into points, like [0, 1, 2, 3], indicating a line through those points."""
    plot_points(points, labels)
    style = 'bo-'
    for arg in args:
        if isinstance(arg, str):
            style = arg
        else:  # arg is a list of indexes into points, forming a line
            line = [points[i] for i in arg]
            X, Y = XY(line)
            plt.plot(X, Y, style)
    plt.show()

def plot_points(points, labels=(), style='bo'):
    "Plot individual points (and maybe labels for them) without any lines."
    plot_labels(points, labels)
    for p in points:
        plt.plot([p.real], [p.imag], style)

def plot_labels(cities, labels):
    "If any labels are supplied, plot them."
    for (c, L) in zip(cities, labels):
        plt.text(c.real, c.imag, ' ' + str(L))
```

In [64]:

```
plot_lines(outliers_list, range(20), range(0, 5), 'ro--', [4, 16], 'bo--', [4, 5])
```

Even if a city *C* (like city 16 in the diagram above) does not happen to be the nearest neighbor to the tour's endpoint (city 4 in the diagram above), if *C* is not near to any other city, then probably now is a good time to add it to the tour. That is, I should add *C* to the tour if either *C* is the nearest neighbor of the tour's endpoint, or if *C*'s nearest neighbor is the tour's endpoint. We call this algorithm `double_greedy_TSP`. For efficiency, it precomputes the nearest neighbor of each city and stores it in the dict `NN`. It tracks the tour and the unvisited cities, just like `greedy_TSP`. On each iteration, it does this: call the last city in the tour `A`, and its nearest neighbor `B`. Then form the set `Ds` of all cities for whom `A` is the nearest neighbor. If there are any, make the chosen city, `C`, be the closest member of `Ds`; if there aren't any, make `C` be the nearest neighbor, `B`. Add `C` to the tour and remove it from `unvisited`.

In [40]:

```
def double_greedy_TSP(cities, start=None):
    """At each step, call the last city in the tour A, and consider the
    nearest neighbor B, and also any city D that has A as nearest neighbor."""
    NN = {C: nearest_neighbor(C, cities - {C}) for C in cities}
    if start is None: start = first(cities)
    tour = [start]
    unvisited = cities - {start}
    while unvisited:
        A = tour[-1]
        B = NN[A] if NN[A] in unvisited else nearest_neighbor(A, unvisited)
        Ds = [D for D in unvisited if NN[D] is A and D is not B]
        C = min(Ds, key=lambda D: distance(D, A)) if Ds else B
        tour.append(C)
        unvisited.remove(C)
    return tour

plot_tour(double_greedy_TSP, outliers)
compare_algorithms([greedy_TSP, double_greedy_TSP, all_greedy_TSP], Maps(100, 50))
```

We see that `double_greedy_TSP` completely solves the `outliers` map. Unfortunately, it fails to turn in a distinguished performance on a collection of 50-city maps, performing roughly the same as the simple `greedy_TSP` (but slower). It seems that it is not *always* a good idea to pick up an outlier. So what can we do? Let's take a moment to reflect on what we have done so far, and what we might yet do.

In general, in situations like this, there are three choices:

- **Better choices**: figure out when it is a good idea to extend the tour to a non-nearest-neighbor, and when it is not.
- **More choices**: consider both including and not including a non-nearest neighbor. We will have multiple possible partial tours under consideration at any time.
- **Recovery from bad choices**: make one choice, but allow for the possibility of making changes later. We will quickly build one single complete tour, and then consider alterations to the tour.

So far we've seen 6 algorithms, but only two basic strategies:

- Exact algorithms: *Try all tours and choose the best.*
- Greedy algorithms: *Start with a partial tour (initially one city) and keep extending it by adding a near neighbor.*

Now we'll introduce a third basic strategy:

- Improvement algorithms: *Use an old algorithm to produce a tour, then improve the tour by trying many small alterations, and keeping those that make the tour shorter.*

That's the basic idea, but there are many details to work out. First we'll look at some ways tours could be improved.

We define a *segment* as a subsequence of a tour: a sequence of consecutive cities within the tour. One way we could try to improve a tour is by *reversing* a segment. We'll look at an example, but first we'll need a tool for drawing segments. We'll call it `plot_lines`, and it takes the following arguments: first, a list of cities; second, a sequence of labels for the cities (or `None`); and then any number of pairs of arguments that describe how to draw segments: first a list of indexes into the list of cities, and second a line style (in the format expected by the `plot` function).

In [41]:

```
cross = [9+3j, 3+10j, 2+16j, 3+21j, 9+28j, 26+3j, 32+10j, 33+16j, 32+21j, 26+28j]
plot_lines(cross, range(10), range(-1,10), 'bo-')
```

Consider the tour `[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]` shown above. If we reverse the segment `[5, 6, 7, 8, 9]`, we get the tour `[0, 1, 2, 3, 4, 9, 8, 7, 6, 5]`, which is the optimal tour. In the diagram below, reversing `[5, 6, 7, 8, 9]` is equivalent to deleting the dotted red lines and adding the dotted blue lines. We know that the reversal is in fact an improvement if the sum of the blue dotted lines is less than the sum of the red dotted lines.

In [42]:

```
plot_lines(cross, range(10), range(5), 'bo-', range(5, 10), 'bo-',
           (4, 9), 'bo--', (0, 5), 'bo--', (4, 5), 'r--', (0, 9), 'r--')
plot_points(outliers_list, range(20))
plot_lines(outliers_list, (), range(16), 'bo-')
```

Note that if distances are Euclidean, then any time there is a pair of crossed lines, the tour can be improved by uncrossing them.
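The improvement test just described can be sketched as a self-contained check. In this sketch, cities are complex numbers and distances are Euclidean, as in the rest of this document; `reversal_gain` is an illustrative helper, not a function defined elsewhere in the notebook:

```python
def distance(A, B):
    "Euclidean distance between two cities represented as complex numbers."
    return abs(A - B)

def reversal_gain(tour, i, j):
    """How much shorter the tour gets if tour[i:j] is reversed.
    A positive result means the reversal is an improvement."""
    A, B, C, D = tour[i-1], tour[i], tour[j-1], tour[j % len(tour)]
    old = distance(A, B) + distance(C, D)  # the red dotted lines: edges deleted
    new = distance(A, C) + distance(B, D)  # the blue dotted lines: edges added
    return old - new

cross = [9+3j, 3+10j, 2+16j, 3+21j, 9+28j, 26+3j, 32+10j, 33+16j, 32+21j, 26+28j]
print(reversal_gain(cross, 5, 10))  # positive: reversing segment [5, 6, 7, 8, 9] helps
```

On the `cross` map, only the two edges at the boundary of the reversed segment change, which is why the test needs just four distance computations no matter how long the segment is.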

Also note the general idea of an improvement algorithm: to allow a base algorithm to make mistakes, but to avoid suffering from those mistakes. We can use a greedy algorithm as a base algorithm, and it can make mistakes (like putting an outlier in the wrong place, or crossing lines) and we can still recover from those mistakes. Why arrange things with two parts to the algorithm, a greedy part followed by an improvement part? Because some decisions can be difficult to make when a partial tour is in the process of being created, but can be easy to make after the tour is completed. It is always wrong to cross lines, but when we are constructing a tour and laying down the first line, we don't know that it will eventually become part of a crossed pair. But after the tour is completed, we do know.

That leaves a lot of details to work out. We will:

- Try multiple starting cities, find a tour for each one, and improve each tour.
- Base the improvements on the result of `greedy_exact_end_TSP`.
- Try reversing each 2- or 3-city-long segment.
- Try moving each city to a better position in the tour.
- Randomly try moving or reversing segments of different sizes.
- When a large number of tries in a row fails to improve things, stop trying.

In [43]:

```
def improved_greedy(cities, start_size=10, end_size=4, num_fails=None):
    "Improve the result of greedy_TSP from each starting point."
    starts = cities if start_size >= len(cities) else random.sample(list(cities), start_size)
    return shortest(improve(greedy_exact_end_TSP(cities, start, end_size), num_fails)
                    for start in starts)

RNG = random.randrange

def improve(tour, num_fails=None):
    "Improve the tour by moving or reversing segments (tour[i:j])."
    N = len(tour)
    num_fails = num_fails or 4 * N
    # Try moving each individual city to a better spot
    maybe_move_cities(tour)
    # Try reversing each 2- and 3-long segment
    for i in range(N - 2):
        maybe_reverse_segment(tour, i, i + 2)
    for i in range(N - 3):
        maybe_reverse_segment(tour, i, i + 3)
    # Randomly pick a segment tour[i:j]; try to reverse or move it.
    # If num_fails tries in a row fail to improve, then give up.
    consecutive_fails = 0
    while consecutive_fails < num_fails:
        i = RNG(N - 2)
        j = RNG(i + 2, N)
        better = maybe_reverse_segment(tour, i, j) or maybe_move_segment(tour, i, j)
        consecutive_fails = 0 if better else consecutive_fails + 1
    return tour

def maybe_reverse_segment(tour, i, j):
    "If reversing tour[i:j] helps, then do it. Return True if it helps."
    A, B, C, D = tour[i-1], tour[i], tour[j-1], tour[j]
    # Are the edges added shorter than the edges taken away?
    if distance(A, C) + distance(B, D) < distance(A, B) + distance(C, D):
        tour[i:j] = reversed(tour[i:j])
        return True

def maybe_move_segment(tour, i, j):
    "If moving tour[i:j] to its cheapest other position helps, then do it. Return True if it helps."
    A, B, C, D = tour[i-1], tour[i], tour[j-1], tour[j]
    segment, rest = tour[i:j], tour[:i] + tour[j:]
    if len(rest) < 2: return None
    gain = distance(A, B) + distance(C, D) - distance(A, D)  # saved by removing the segment
    def cost(k):  # added by re-inserting the segment between rest[k-1] and rest[k]
        return distance(rest[k-1], B) + distance(C, rest[k]) - distance(rest[k-1], rest[k])
    k = min(range(1, len(rest)), key=cost)
    if cost(k) < gain:
        tour[:] = rest[:k] + segment + rest[k:]
        return True

def maybe_move_cities(tour):
    "Try to relocate each city to the best spot in the tour (closest to its neighbors)."
    N = len(tour)
    positions = range(N)
    dummy = object()
    for i in positions:
        A, B, C = tour[i-1], tour[i], tour[(i+1) % N]
        # Try to move city B to the best place it could go: the minimum detour distance
        # (excluding the two insertion points that would leave B where it already is).
        others = [p for p in positions if tour[p] is not B and tour[p-1] is not B]
        j = min(others, key=lambda p: detour(B, tour[p-1], tour[p]))
        D, E = tour[j-1], tour[j]
        if detour(B, D, E) < detour(B, A, C):
            tour[i], tour[j:j] = dummy, [B]
            tour.remove(dummy)

def detour(B, A, C):
    "In going A-B-C, how much of a detour is this over just A-C?"
    return distance(A, B) + distance(B, C) - distance(A, C)

#plot_tour(all_greedy_TSP, cities100)
plot_tour(improved_greedy, cities100)
#plot_tour(improved_greedy, tricky)
```

Working with the greedy nearest neighbor algorithm may have given you some ideas for other algorithms (I know it did for me, and we'll get to them in a bit). But if you don't have an idea, you can always Google it, in which case you'll quickly find this algorithm:

Construct a Minimum Spanning Tree, then walk the tree in pre-order, omitting repeated nodes. That will give you a tour that is guaranteed to be no more than twice as long as the minimal tour.

What does all this jargon mean? We will first explain what a minimum spanning tree is, and then how to walk it in pre-order, then why you might want to do that. First a definition:

Given a set of vertexes (e.g. cities), a *spanning tree* is a tree that connects all the vertexes together. A *tree* is a collection of vertexes and edges such that there is exactly one path between any two vertexes: there are no loops, and no disconnected parts. A *minimum spanning tree* is a spanning tree such that the sum of the edge distances is as small as or smaller than that of any other spanning tree.

Now an algorithm for finding a minimum spanning tree:

Consider all the edges (we'll represent an edge as a pair (A, B) of cities) and sort them, shortest first. Initialize a *tree* to be a single city (we'll arbitrarily choose the first) with no edges coming out of it. Now repeat the following until the tree contains all the cities: find the shortest edge that links a city A that is in the tree to a city B that is not yet in the tree, and add B to the list of A's links in the tree.

(Note that we are using "link" and "edge" as synonyms, whichever sounds better at the time.) This is called *Prim's algorithm*. Here's the code:

In [ ]:

```
def MST(cities):
    """Given a set of cities, build a minimum spanning tree: a dict of the form {parent: [child...]},
    where parent and children are cities, and the root of the tree is first(cities)."""
    N = len(cities)
    edges = shortest_first([(A, B) for A in cities for B in cities if A is not B])
    tree = {first(cities): []}  # the first city is the root of the tree.
    while len(tree) < N:
        (A, B) = first((A, B) for (A, B) in edges if (A in tree) and (B not in tree))
        tree[A].append(B)
        tree[B] = []
    return tree

def shortest_first(edges):
    "Sort a list of edges so that the shortest come first."
    edges.sort(key=lambda edge: distance(*edge))
    return edges

from collections import defaultdict

def MSTk(cities):
    """Build a minimum spanning tree Kruskal-style: go through the edges, shortest
    first, joining two separate subgraphs whenever an edge connects them."""
    edges = shortest_first([(A, B) for A in cities for B in cities if abs(A) < abs(B)])
    tree = defaultdict(list)
    subgraphs = {c: [c] for c in cities}
    for (A, B) in edges:
        if subgraphs[A] is not subgraphs[B]:
            join_subgraphs(subgraphs, A, B)
            tree[A].append(B)
            tree[B].append(A)
            if len(subgraphs[A]) == len(cities):
                return tree

def join_subgraphs(subgraphs, A, B):
    "Copy B's subgraph into A's, and update."
    As, Bs = subgraphs[A], subgraphs[B]
    As.extend(Bs)
    for c in Bs:
        subgraphs[c] = As

plot_tree(MST(cities100))
plot_tree(MSTk(cities100))
```

Why does this work? First, the result is a **tree** because all the cities in it are connected (we start with just one, and every time we add a city, it is connected to one in the tree) and there are no loops (we add only one link into a city; after that, it is in the tree, so we never add another link into it). Second, it is a **spanning tree** because it contains all the cities. Third, it is a **minimum spanning tree** because each city was added with the shortest possible edge. Suppose this algorithm produces the tree T. For another putative spanning tree to be shorter, it would have to contain at least one city C with an edge into C that is shorter than any edge into C in T.
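As a quick mechanical check on the structural claims above, here is an illustrative helper (not part of the notebook's code) that tests the two properties that are easy to verify: a `{parent: [child...]}` dict that spans N cities must reach every city and contain exactly N-1 parent-child edges. These are necessary conditions, not a full proof of connectivity or minimality:

```python
def looks_like_spanning_tree(tree, cities):
    """Necessary (not sufficient) checks that tree, a {parent: [child...]} dict,
    spans the given cities: every city appears, and there are exactly N-1 edges."""
    edges = sum(len(children) for children in tree.values())
    reached = set(tree) | {c for children in tree.values() for c in children}
    return reached == set(cities) and edges == len(cities) - 1

# A tiny hand-built example: 1 is the root, with children 2 and 3; 2 has child 4.
tree = {1: [2, 3], 2: [4], 3: [], 4: []}
print(looks_like_spanning_tree(tree, {1, 2, 3, 4}))  # True
```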

In [ ]:

```
def plot_tree(tree):
    "Given a tree of {parent: [child...]}, plot the lines between the points."
    for P in tree:
        for C in tree[P]:
            plotline([P, C], 'ko-')
    plt.show()
    print('{} node Minimum Spanning Tree of length: {:.1f}'.format(
        len(tree), sum(distance(P, C) for P in tree for C in tree[P])))

mst = MST(cities100)
plot_tree(mst)

R = random.randrange

def Grid(w, h):
    return set(City(x + R(2, 7)/10., y + R(2, 7)/10.) for x in range(w) for y in range(h))

grid = Grid(15, 10)
plot_tree(MST(cities10))
plot_tree(MST(grid))
plot_tour(greedy_TSP, grid)
plot_tour(MST_TSP, grid)
plot_tour(improved_greedy, grid)
```

In [ ]:

```
def MST_TSP(cities):
    "Create a minimum spanning tree and walk it in pre-order, omitting duplicates."
    return preorder_uniq(MST(cities), first(cities), [])

def preorder_uniq(tree, node, result):
    "Traverse tree in pre-order, starting at node, omitting repeated nodes."
    # Accumulate results in the 'result' parameter, which should start as an empty list.
    if node not in result:
        result.append(node)
        for child in tree[node]:
            preorder_uniq(tree, child, result)
    return result

plot_tour(MST_TSP, cities100)
plot_tour(greedy_TSP, cities100)
```

Can we do better than `greedy_TSP` without paying the time penalty of `all_greedy_TSP`? One idea is that instead of keeping one partial tour and always adding on to the end, we can keep multiple smaller *tour segments*: a segment is a list of 1 or more cities, in order, that will eventually fit into the final tour. Then the algorithm is:

Start with each city defining its own 1-city segment. At each step, join together the two segments that are closest to each other. Repeat until only one segment is left.

By *closest*, we mean the distance from one of the two ends of one segment to one of the ends of the other. That means that segments can join at either end, so a segment does not have a direction, but it does have an ordering. For example, if we have the two segments `ABC` and `DE`, we can join them 4 ways: `ABC+DE`, `ABC+ED`, `CBA+DE`, or `CBA+ED`. Note that `ABCDE` and `EDCBA` are considered the same segment. We also keep a dictionary of segments: `segments[C]` gives the segment that contains city `C`.

The way we implement this is to keep a list of *edges* of the form `(A, B)`, where `A` and `B` are cities, and where the list is sorted so that shorter edges come first. Then we work through the list of edges, seeing if an edge can be used to join two segments.

Suppose we have the segments `ABC` and `DE`. If the next shortest edge is `(A, C)`, then `join_segments` does nothing, because `A` and `C` are already in the same segment. If the next edge is `(B, E)`, then again `join_segments` does nothing, because `B` is in the middle of a segment; it already has an incoming and an outgoing edge, and can't be joined to anything else. But if `(A, D)` is next, then we can join the segments: we reverse `ABC` to get `CBA` and then add `DE` to the end of `CBA` to form `CBADE`. (And we update `segments[D]` and `segments[E]` to be this joined segment.) We continue the joining process until there is one segment that covers all the cities.
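The joining step just described can be sketched in isolation. This toy version uses one-letter city names; `join` is an illustrative helper, simpler than the notebook's code below, which also keeps the segments dictionary up to date:

```python
def join(As, Bs, A, B):
    """Join segment Bs onto segment As using edge (A, B), provided A and B are
    endpoints of their segments; return the joined segment, else None."""
    if A in (As[0], As[-1]) and B in (Bs[0], Bs[-1]):
        if As[0] == A: As.reverse()   # make As end with A
        if Bs[-1] == B: Bs.reverse()  # make Bs start with B
        return As + Bs

print(join(list('ABC'), list('DE'), 'A', 'D'))  # ['C', 'B', 'A', 'D', 'E']
print(join(list('ABC'), list('DE'), 'B', 'D'))  # None: B is mid-segment
```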

Below we see the code, and the tour for the same 100 cities as shown above. Notice that on this particular map, `greedy_join_TSP` is 20 times faster than `all_greedy_TSP` and produces a tour that is within 1% of its length.

In [ ]:

```
def greedy_join_TSP(cities):
    """Go through the edges, shortest first. Always join the two closest segments."""
    edges = shortest_first([(A, B) for A in cities for B in cities if abs(A) < abs(B)])
    segments = {c: [c] for c in cities}
    for (A, B) in edges:
        join_segments_with(segments, A, B)
        if len(segments[A]) == len(cities):
            return segments[A]

def join_segments_with(segments, A, B):
    "If the A-B edge can be used to join B's segment onto A's, then do it."
    As, Bs = segments[A], segments[B]
    # If they are not already joined, and if A and B are at the ends of their segments
    if As is not Bs and A in (As[0], As[-1]) and B in (Bs[0], Bs[-1]):
        # Make As == [..., A] and Bs == [B, ...]
        if As[0] is A: As.reverse()
        if Bs[-1] is B: Bs.reverse()
        # Move all the cities in Bs over into As
        As += Bs
        for c in Bs:
            segments[c] = As
        return As

plot_tour(greedy_join_TSP, cities100)
```

In [ ]:

```
def greedy_join_TSP(cities, plot_sizes=()):
    """Go through the edges, shortest first. Always join the two closest segments."""
    edges = shortest_first([(A, B) for A in cities for B in cities if abs(A) < abs(B)])
    segments = {c: [c] for c in cities}
    for (A, B) in edges:
        if join_segments_with(segments, A, B):
            S = distinct_segments(segments)
            if len(S) in plot_sizes:
                for seg in S: plotline(seg)
                plt.show()
                print('{} cities with {} segments'.format(len(cities), len(S)))
            if len(segments[A]) == len(cities):
                return segments[A]
    print(list(map(len, segments.values())))  # debugging: segment sizes if no tour was found

def distinct_segments(segments):
    return set(map(tuple, segments.values()))

greedy_join_TSP(cities100, {20, 10, 5})
plot_tour(greedy_join_TSP, cities100)

One more idea: combine *all* the other ideas! The function `ensemble_TSP` combines the other algorithms we have defined in such a way that it is guaranteed to be the best of them (except that it will be slower than running a single algorithm). If there are 10 or fewer cities, it finds the optimal solution with `TSP`. Otherwise, it tries each of `greedy_join_TSP`, `double_greedy_TSP`, and `improved_greedy`, then applies `improve` to each of the results, and returns the shortest tour. (It doesn't try `greedy_TSP` nor `all_greedy_TSP`, because they are subsumed by `improved_greedy`.) When compared to `improved_greedy`, we see `ensemble_TSP` is usually (but not always) better (and can never be worse).

In [ ]:

```
def ensemble_TSP(cities, algorithms={greedy_join_TSP, double_greedy_TSP}):
    "Apply all the algorithms and pick the shortest resulting tour."
    # Candidates to consider adding: MST_TSP, improved_greedy
    if len(cities) <= 10:
        return TSP(cities)
    else:
        return shortest(improve(tsp(cities)) for tsp in algorithms)

#plot_tour(improved_greedy, cities100)
plot_tour(ensemble_TSP, cities100)
```

We see that of the fast algorithms, `greedy_TSP` is twice as fast as the others, but `greedy_join_TSP` produces about 5% shorter tours.

Among the slower algorithms, `ensemble_TSP` and `improved_greedy` are similar, both producing tours about 10% to 15% shorter (but 80 times slower) than `greedy_join_TSP`. Meanwhile, `all_greedy_TSP` is intermediate in tour length and speed. (Note: unfortunately, `greedy_join_TSP` is red in the top graph and blue in the bottom; my apologies if this causes confusion.)

We don't know what the optimal solutions are for those 50-city maps (we can't run `TSP` on them; it would take forever), but we can do a comparison on smaller (9-city) maps. It looks like `ensemble_TSP` produces tours identical to the optimal `TSP`, and it runs at a similar speed. `greedy_join_TSP` runs over 1000 times faster, but produces tours that are 5% to 10% longer.

In [ ]:

```
compare_algorithms([greedy_join_TSP, ensemble_TSP, TSP], Maps(100, 9))
```

In [ ]:

```
points = [(3+10j), (2+16j), (3+21j), (9+28j), (12+6j), (16+28j), (22+21j), (23+16j), (22+10j), (16+3j), (9+3j)]
plot_lines(points, None, list(range(11)) + [0], 'bo-')
```

In [ ]:

```
plot_lines(points, None, (10, 0, 1, 2, 3), 'bo-', (5, 6, 7, 8, 9), 'bo-',
(3, 4, 5), 'r--', (9, 10), 'r--',
(10, 4, 9), 'bo--', (3, 5), 'bo--')
```

In [1]:

```
def parse_city(line):
    "Parse a line of data, as in http://www.realestate3d.com/gps/latlong.htm , and return a (City, cityname, state) triplet."
    code, lat, long, name = line.split(None, 3)
    cityname, state = name.split(',')
    return City(-float(long), float(lat)), cityname, state

UScityData = map(parse_city, """
[TCL] 33.23 87.62 Tuscaloosa,AL
[PHX] 33.43 112.02 Phoenix,AZ
[PGA] 36.93 111.45 Page,AZ
[TUS] 32.12 110.93 Tucson,AZ
[LIT] 35.22 92.38 Little Rock,AR
[SFO] 37.62 122.38 San Francisco,CA
[LAX] 33.93 118.40 Los Angeles,CA
[SAC] 38.52 121.50 Sacramento,CA
[SAN] 32.73 117.17 San Diego,CA
[SBP] 35.23 120.65 San Luis Obi,CA
[EKA] 41.33 124.28 Eureka,CA
[SJC] 37.37 121.92 San Jose,CA
[DEN] 39.75 104.87 Denver,CO
[DRO] 37.15 107.75 Durango,CO
[HVN] 41.27 72.88 New Haven,CT
[DOV] 39.13 75.47 Dover,DE
[DCA] 38.85 77.04 Washington/Natl,DC
[MIA] 25.82 80.28 Miami Intl,FL
[TPA] 27.97 82.53 Tampa Intl,FL
[JAX] 30.50 81.70 Jacksonville,FL
[TLH] 30.38 84.37 Tallahassee,FL
[ATL] 33.65 84.42 Atlanta,GA
[BOI] 43.57 116.22 Boise,ID
[CHI] 41.90 87.65 Chicago,IL
[IND] 39.73 86.27 Indianapolis,IN
[DSM] 41.53 93.65 Des Moines,IA
[ICT] 37.65 97.43 Wichita,KS
[LEX] 38.05 85.00 Lexington,KY
[NEW] 30.03 90.03 New Orleans,LA
[BOS] 42.37 71.03 Boston,MA
[PWM] 43.65 70.32 Portland,ME
[BGR] 44.80 68.82 Bangor,ME
[DET] 42.42 83.02 Detroit,MI
[STC] 45.55 94.07 St Cloud,MN
[DLH] 46.83 92.18 Duluth,MN
[STL] 38.75 90.37 St Louis,MO
[JAN] 32.32 90.08 Jackson,MS
[BIL] 45.80 108.53 Billings,MT
[BTM] 45.95 112.50 Butte,MT
[RDU] 35.87 78.78 Raleigh-Durh,NC
[INT] 36.13 80.23 Winston-Salem,NC
[OMA] 41.30 95.90 Omaha/Eppley,NE
[LAS] 36.08 115.17 Las Vegas,NV
[EWR] 40.70 74.17 Newark Intl,NJ
[ABQ] 35.05 106.60 Albuquerque,NM
[SAF] 35.62 106.08 Santa Fe,NM
[LRU] 32.30 106.77 Las Cruces,NM
[NYC] 40.77 73.98 New York,NY
[BUF] 42.93 78.73 Buffalo,NY
[ALB] 42.75 73.80 Albany,NY
[FAR] 46.90 96.80 Fargo,ND
[BIS] 46.77 100.75 Bismarck,ND
[CVG] 39.05 84.67 Cincinnati,OH
[CLE] 41.42 81.87 Cleveland,OH
[OKC] 35.40 97.60 Oklahoma Cty,OK
[PDX] 45.60 122.60 Portland,OR
[MFR] 42.37 122.87 Medford,OR
[AGC] 40.35 79.93 Pittsburgh,PA
[PVD] 41.73 71.43 Providence,RI
[CHS] 32.90 80.03 Charleston,SC
[MEM] 35.05 90.00 Memphis Intl,TN
[DFW] 32.90 97.03 Dallas/FW,TX
[LBB] 33.65 101.82 Lubbock,TX
[IAH] 29.97 95.35 Houston,TX
[SAT] 29.53 98.47 San Antonio,TX
[ABI] 32.42 99.68 Abilene,TX
[SLC] 40.78 111.97 Salt Lake Ct,UT
[MPV] 44.20 72.57 Montpelier,VT
[RIC] 37.50 77.33 Richmond,VA
[SEA] 47.45 122.30 Seattle,WA
[ALW] 46.10 118.28 Walla Walla,WA
[GRB] 44.48 88.13 Green Bay,WI
[MKE] 42.95 87.90 Milwaukee,WI
[CYS] 41.15 104.82 Cheyenne,WY
[SHR] 44.77 106.97 Sheridan,WY
""".strip().splitlines())
UScities = [data[0] for data in UScityData]
def bi_80_6(cities): return greedy_bi_TSP(cities, 80, 6)
plot_points(UScities, [data[1][0] for data in UScityData])
plot_tour(bi_80_6, set(UScities))
```
