#!/usr/bin/env python
# coding: utf-8

# # Local Search in Discrete State and Action Spaces
#
# For some search problems, the path taken to reach the goal is not important. What matters is discovering a goal state, or a state with the best value of some objective function. Such problems are often called **optimization problems**.
#
# What are some examples?
#
# What is the best setting for a control on your stove to cook your soup as quickly as possible without burning it?
#
# What is the best way to pack the words of a word cloud, or a [wordle](http://www.wordle.net)? See an answer [here](https://stackoverflow.com/questions/342687/algorithm-to-implement-a-word-cloud-like-wordle).
#
# What is the best way to divide up the computations of a matrix calculation to be distributed across multiple cores?

# **Local search** algorithms operate by maintaining a single state (or a small set of pretty good states) and generating another state to try based on this limited memory. Some set of actions is available to modify a state into another "neighboring" state. It is a "neighbor" state in an abstract graph of states connected by single actions.
#
# Imagine all of the possible states for a problem laid out along a line, along an x-axis. The value of the objective function can be plotted as a height above the points on the x-axis. A local search algorithm tries to find the state with the best height, where best could be the maximal or minimal value depending on the search problem. The best value over all states is the **global optimum** (a minimum or a maximum depending on the problem).
#
# A local search algorithm is **complete** if it always finds a goal state. It is **optimal** if it always finds the optimum state, the one with the global maximum or minimum.

# # Hill-Climbing Search

# This is our first example of a local search algorithm. Imagine you are climbing a mountain in a very thick fog. You can only see a distance equal to one step length. To climb, you take the step in the steepest direction, the one that reaches the highest point of all the locations you can currently see.
#
# In other words, hill-climbing search simply evaluates the objective function for all states that are neighbors of the current state, and takes the neighbor state with the best objective function value as the new current state. If more than one neighbor ties for the best value, one is picked randomly.
#
# Hill-climbing search is sometimes called **greedy search**, because a step is taken after considering only the immediate neighbors. No time is spent considering possible future states.
#
# Hill-climbing is easy to formulate and implement and often finds pretty good states quickly. But it has the following problems:
#
# * it gets stuck on local optima (hills for maximizing searches, valleys for minimizing searches),
# * it may get stuck on a ridge, if no single action can advance the search along the ridge,
# * it may get stuck wandering on a plateau for which all neighboring states have equal value.
#
# Common variations include
# * sideways moves: allow moves to equal-valued states (when on a plateau),
# * stochastic hill-climbing: choose the next state with probability related to the increase in value of the objective function,
# * first-choice hill-climbing: generate neighbors by random choice of available actions and keep the first state that has a better value,
# * random-restart hill-climbing: conduct multiple hill-climbing searches from multiple, randomly generated, initial states.
#
# Only this last one, with random restarts, is **complete**. In the limit, all states will be tried as starting states, so the goal, or best state, will eventually be found.
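# As a concrete illustration, here is a minimal sketch of random-restart hill climbing on a toy maximization problem. The objective function, neighbor function, and parameter values below are illustrative choices, not from the text.

# In[ ]:


import math
import random

def toy_objective(x):
    # illustrative toy objective (not from the text): a bumpy surface with
    # many local maxima and a global maximum near x = 3.1
    return -(x - 3) ** 2 + 5 * math.cos(4 * x)

def toy_neighbors(x, step=0.1):
    return [x - step, x + step]        # the states one action away

def hill_climb(start, max_steps=1000):
    current = start
    for _ in range(max_steps):
        best = max(toy_neighbors(current), key=toy_objective)
        if toy_objective(best) <= toy_objective(current):
            return current             # no neighbor is better: a local optimum
        current = best
    return current

def random_restart_hill_climb(n_restarts=20):
    # run hill climbing from several random starting states, keep the best result
    starts = [random.uniform(-10, 10) for _ in range(n_restarts)]
    return max((hill_climb(s) for s in starts), key=toy_objective)

print(random_restart_hill_climb())     # should be near x = 3.1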
# # The Eight-Queens Problem

# Place eight queens on a chess board so that no queen is attacking another. Each queen occupies its own column, and each queen can be placed in any of the 8 rows, for a total of $8^8 \approx 17$ million states. Actions are moving a single queen to a different row in the same column. The objective function is the number of pairs of queens attacking each other. This function is to be minimized, of course. See [this animation](https://www.youtube.com/watch?v=ckC2hFdLff0) and [this nice explanation](https://www.youtube.com/watch?v=jPcBU0Z2Hj8).
#
# Hill-climbing search as described only finds solutions 14% of the time, but it solves those instances quickly, in an average of 4 steps. When it gets stuck, this is discovered in an average of 3 steps.
#
# | Hill-climbing Variation | Percent Solved | Average Steps to Solution | Average Steps to Detect Failure |
# | :--: |:--: |:--: |:--: |
# | basic | 14% | 4 | 3 |
# | with sideways moves | 94% | 21 | 64 |
# | with restarts | 100% | 22 | |
# | with sideways moves and restarts | 100% | 28 | |

# # Hard Problems
#
# Problems that are hard for hill-climbing are ones with many local optima. NP-hard problems often have an exponential number of local optima, but states with pretty good values can often be found with a small number of restarts.

# # Simulated Annealing
#
# Hill-climbing searches will get stuck on local optima. Only by adding random restarts can you have a hill-climbing algorithm that is complete.
#
# To get off of a local optimum, a search must be allowed to take steps that are "downhill" for maximizing searches, and "uphill" for minimizing searches, away from the optimum.
#
# **Simulated annealing** is an algorithm that does this probabilistically. Assume we are doing a maximizing search, meaning we want to find the state with the maximum value. Let the value of the current state be $v$. Imagine an action has been applied to that state and the resulting state has a lower (worse) value $v'$. Simulated annealing will accept this new state as the current state with probability $e^{(v' - v)/T}$. $T$ is like a "temperature": the higher its value, the more likely we are to take a step to a state with a worse value. In practice, $T$ starts at a high value and is slowly decreased towards zero. If it is decreased "slowly enough", the global optimum will be found with probability 1. In other words, this is a **complete** algorithm.

# In[2]:


import numpy as np
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')

def probOfAcceptance(dE, T):
    # probability of accepting a step that changes the value by dE at temperature T
    r = np.exp(dE / T)
    r[r > 1] = 1.0      # improvements (dE > 0) are always accepted
    return r

dE = np.linspace(-5, 5, num=100)

plt.figure(figsize=(10, 7))
plt.clf()
legendText = []
for T in [0.1, 1, 10, 100]:
    plt.plot(dE, probOfAcceptance(dE, T), 'o-')
    legendText.append('Temperature = {:.1f}'.format(T))
plt.xlabel('New Value - Current Value')
plt.ylabel('Probability of Accepting New State')
plt.legend(legendText, loc='lower right');
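# To make the acceptance rule concrete, here is a minimal sketch of simulated annealing applied to the eight-queens problem described above. The state representation, the geometric cooling schedule, and the parameter values are illustrative choices, not from the text.

# In[ ]:


import math
import random

def attacking_pairs(rows):
    # rows[c] is the row of the queen in column c; count pairs that attack each other
    pairs = 0
    for c1 in range(len(rows)):
        for c2 in range(c1 + 1, len(rows)):
            same_row = rows[c1] == rows[c2]
            same_diagonal = abs(rows[c1] - rows[c2]) == c2 - c1
            pairs += same_row or same_diagonal
    return pairs

def simulated_annealing(T=10.0, cooling=0.999, max_steps=50000):
    rows = [random.randrange(8) for _ in range(8)]
    v = -attacking_pairs(rows)                 # maximize the negated number of attacks
    for _ in range(max_steps):
        if v == 0:
            break                              # no queens attack each other
        col, row = random.randrange(8), random.randrange(8)
        new_rows = rows[:]
        new_rows[col] = row                    # move one queen within its column
        v_new = -attacking_pairs(new_rows)
        # accept improvements always; accept worse states with probability e^((v' - v)/T)
        if v_new >= v or random.random() < math.exp((v_new - v) / T):
            rows, v = new_rows, v_new
        T *= cooling                           # geometric cooling: an illustrative schedule
    return rows, -v

print(simulated_annealing())                   # (rows, number of attacking pairs)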
# # Local Beam Search
#
# The above searches keep just one state in memory. **Local beam search** keeps the best $k$ states in memory. The successors of all $k$ states are generated, and the best $k$ among them are kept.
#
# A variant is **stochastic beam search**, which selects the $k$ states to keep as a probabilistic function of their values. This search tends to maintain more diversity in the set of $k$ kept states on each iteration.

# # Genetic Algorithms
#
# **Genetic algorithms** are a lot like stochastic beam search. A set of $k$ states is kept. The difference is in how successor states are generated.
#
# Genetic algorithms generate successor states by combining parts of good states to make new ones (the **crossover** operator) and by randomly modifying parts (the **mutation** operator). The values of the new states are then used to select the best ones to keep for the next iteration, or generation.
#
# There are many variations in how states are represented as strings of symbols, in the crossover and mutation operators, and in how winners are selected for the next generation.

# # Local Search in Continuous State and Action Spaces

# ## Without Derivatives of Value Function

# ### Binary Search

# For a monotonic function of a single variable, binary search can find the input at which the function reaches a given target value: guess the midpoint of an interval known to bracket the answer, keep whichever half still brackets it, and repeat.
#
# Here is a useful application of binary search.

# ### Effective Branching Factor

# How can we compare the efficiency of different search algorithms, or of the A\* algorithm using different heuristic functions? We can use the computation time and the maximum amount of memory used during the search.
#
# It would also be nice to know how "focused" a search is. If we use a perfect heuristic function, we should get a search that won't explore any nodes that do not lie on an optimal search path.
#
# A measure related to this is the "effective branching factor". The branching factor of a tree is the number of children at each node. If this is not the same for all nodes, then we can find an average, or "effective", branching factor.
#
# If we do a breadth-first (or depth-first) search on a small tree whose root has 3 children and 6 grandchildren, we will have explored 1 + 3 + 6 = 10 nodes. What is the effective branching factor for this tree? Or, stated mathematically, what is the value of $b$ in
#
# $$ 1 + b + b^2 = 10$$
#
# Now we can solve this for $b$. What should the value be, roughly? Solve it exactly. What do you get?
#
# We can solve this one exactly, but what if the search is 10 levels deep? We will have to do a search (yay!) to find the value of $b$. How? We could pick a whole bunch of values randomly.....
#
# How about a binary search? To start a binary search we need to pick low and high values of $b$ that we know bracket the true value. For a search that explored $n$ nodes to a maximum depth of $d$, what would good low and high values be? Now do a binary search between these two values and for each new guess at $b$, calculate
#
# $$ 1 + b + b^2 + \cdots + b^d$$
#
# and compare the result with the actual number of nodes, $n$. Use the comparison to continue the binary search. Continue until the range of possible values of $b$ is within a desired precision, such as $0.01$. A faster way to calculate the above quantity is
#
# $$ \frac{1-b^{d+1}}{1-b}$$
#
# How would you derive this? (Hint: multiply the sum by $(1-b)$ and watch the terms cancel.)
#
# To estimate the effective branching factor, you could define a function `def ebf(numberNodes, depth, precision=0.01)`. Then, to use it
#
#     In [3]: ebf(100, 12, 0.01)
#     Out[3]: 1.3034343719482422
#
#     In [4]: ebf(0, 0)
#     Out[4]: 0
#
#     In [5]: ebf(1, 1)
#     Out[5]: 0.0078125
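# Here is a minimal sketch of one way to implement `ebf` with binary search. The bracket $[0, n]$ and the stopping rule are illustrative choices, so the result may differ from the sample outputs above in the low-order digits.

# In[ ]:


def ebf(numberNodes, depth, precision=0.01):
    # estimate b such that 1 + b + b^2 + ... + b^depth = numberNodes
    if numberNodes == 0:
        return 0
    lo, hi = 0.0, float(numberNodes)      # one reasonable bracket: the true b lies in [0, n]
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if mid == 1.0:
            estimate = depth + 1          # the closed form is undefined at b = 1
        else:
            estimate = (1 - mid ** (depth + 1)) / (1 - mid)
        if estimate < numberNodes:
            lo = mid                      # guess too small: search the upper half
        else:
            hi = mid                      # guess too large: search the lower half
    return mid

print(ebf(100, 12), ebf(0, 0), ebf(1, 1))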
# ### Run and Twiddle
#
# Take a step in a random direction from the current state. If the value is higher, keep going in that direction. If the value is lower, pick a new random direction. The "smallest" implementation of this algorithm is found in [bacteria](http://www.pnas.org/content/99/1/7.full).

# ### Nelder-Mead Simplex Algorithm
#
# This is not in our text. See the [Scholarpedia entry for Nelder-Mead](http://www.scholarpedia.org/article/Nelder-Mead_algorithm). This algorithm maintains a set of $k$ good states. To generate one next state to evaluate, the worst of the $k$ states is reflected through the centroid, or mean, of the remaining $k-1$ states. If the new state is better than the worst of the $k$ states, it replaces the worst state.
#
# Also see the [Python documentation for the fmin function in scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html).

# ### CMA-ES
#
# Stands for [Covariance Matrix Adaptation Evolution Strategy](https://en.wikipedia.org/wiki/CMA-ES). A Gaussian distribution is fit to the set of good states, and a new state is generated by sampling probabilistically from the Gaussian distribution. The Gaussian distribution is updated when a new good state is found. [Recent results](http://hal.archives-ouvertes.fr/docs/00/17/32/07/PDF/geccodcma.pdf) show that certain quasi-random methods are computationally more efficient.

# ## With Derivatives of Value Function
#
# When the derivative of the value function with respect to each component of the state can be calculated, many derivative-based searches are available. If you know the slope at the current state, you know which direction to go. If you also know the second derivative, you can make some intelligent guesses at how far to go.
#
# If the state has multiple components, which is almost always the case, the first derivatives form a gradient and the second derivatives form a Hessian matrix.

# Let's find the minimum of the function
#
# $$f(x) = 2 x^4 + 3 x^3 + 3$$
#
# Its first derivative is
#
# $$\frac{df(x)}{dx} = 8 x^3 + 9 x^2$$
#
# and its second derivative is
#
# $$\frac{d^2f(x)}{dx^2} = 24 x^2 + 18 x$$
#
# Each step below minimizes the local second-order Taylor approximation $\hat{f}(x + \Delta x) = f(x) + f'(x)\,\Delta x + \frac{1}{2} f''(x)\,\Delta x^2$, which gives the Newton update $x \leftarrow x - f'(x)/f''(x)$.

# In[31]:


import time
import IPython.display as ipd  # for display and clear_output
import random

def f(x):
    return 2 * x**4 + 3 * x**3 + 3

def df(x):
    return 8 * x**3 + 9 * x**2

def ddf(x):
    return 24 * x**2 + 18 * x

def taylorf(x, dx):
    # second-order Taylor approximation of f around x
    return f(x) + df(x) * dx + 0.5 * ddf(x) * dx**2

xs = np.linspace(-2, 1, num=100)
dxs = np.linspace(-0.5, 0.5, num=100)

fig = plt.figure(figsize=(18, 8))

for rep in range(5):
    x = random.uniform(-2, 1)  # first guess at minimum
    for step in range(10):
        time.sleep(1)  # pause one second between frames
        plt.clf()
        plt.plot(xs, f(xs))
        plt.grid(True)
        plt.plot(x + dxs, taylorf(x, dxs), 'g-', linewidth=5, alpha=0.4)
        plt.plot(x, f(x), 'ro')
        y0, y1 = plt.ylim()
        plt.plot([x, x], [y0, y1], 'r--')
        if step == 0:
            plt.text(x, 10, 'New first x', color='r', fontsize=40)
        plt.text(x + 0.05, (y0 + y1) * 0.5, str(x), color='r', fontsize=20)
        x = x - df(x) / float(ddf(x))  # Newton step
        plt.plot(x, f(x), 'go')
        plt.legend(('$f(x)$', r'$\hat{f}(x)$'))
        ipd.clear_output(wait=True)
        ipd.display(fig)

ipd.clear_output(wait=True)
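# As a derivative-free cross-check, here is a minimal sketch that minimizes the same $f(x)$ with `scipy.optimize.fmin`, the Nelder-Mead implementation mentioned above. The starting point is an arbitrary choice, and `f` is the function defined in the previous cell.

# In[ ]:


from scipy.optimize import fmin

# fmin passes a 1-D array of parameters, so unpack its single component
xopt = fmin(lambda w: f(w[0]), x0=[0.5])
print(xopt[0], f(xopt[0]))   # should be close to the minimum near x = -1.125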
# ### Try these
#
# 4.1 What kind of search do we get from
#
# - Local beam search with $k=1$?
# - Local beam search with one initial state and no limit on the number of states retained?
# - Simulated annealing with $T=0$ at all times?
# - Simulated annealing with $T=\infty$?
# - Genetic algorithm with population size $k=1$?
# - Gradient-ascent in a discrete state and action space?
# - Nelder-Mead with $k=2$ in a two-dimensional state space?

# # On-Line Search Algorithms (from Section 4.5)
#
# For many real-world problems, an agent cannot predict the outcome of applying an action in the current state. It must take the action and observe the new state.
#
# An agent can end up in a dead end from which it cannot recover. We will ignore this possibility and assume that a search problem is **safely explorable**: some goal state is reachable from every reachable state.
#
# An on-line search agent cannot explore the successors of an arbitrary node, only the successors of the current node. This is the nature of depth-first search, so the authors adapt off-line depth-first search into an on-line search algorithm. The result of each action is stored in a map that associates each state-action pair with the resulting state. The algorithm assumes it is possible to undo, or reverse, each action in order to backtrack.
#
# An on-line version of A\* is learning real-time A\*, or LRTA\*. Instead of maintaining a memory of visited states as in the above depth-first algorithm, LRTA\* maintains only the current best estimate of the cost to reach a goal from each node. It updates this estimate after each action is tried. It assumes it can identify all successors of a state and compare their remaining cost estimates. An interesting aspect is that all untried actions are assumed to lead to a state whose heuristic function value is correct, a feature called **optimism under uncertainty**.
#
# Learning better estimates of remaining cost based on single steps is at the heart of **reinforcement learning**, covered in Chapter 21.
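# To make the LRTA* cost-estimate update concrete, here is a minimal, simplified sketch of one trial on a tiny, hypothetical state space with unit step costs. The graph, the heuristic values, and the single-trial structure are illustrative assumptions, not the full algorithm from the text.

# In[ ]:


# Hypothetical state space: states are letters, 'G' is the goal, every step costs 1.
successors = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B', 'G'], 'G': []}
h = {'A': 3, 'B': 2, 'C': 1, 'G': 0}   # initial heuristic estimates
H = {'G': 0}                           # learned estimates of cost to reach the goal

def estimated_cost(s, s2):
    # optimism under uncertainty: an unvisited state is assumed to be exactly
    # as far from the goal as its heuristic value says
    return 1 + H.get(s2, h[s2])

state = 'A'
path = [state]
while state != 'G':
    # update the estimate for the current state from its best-looking successor
    H[state] = min(estimated_cost(state, s2) for s2 in successors[state])
    # move to the successor that currently looks cheapest
    state = min(successors[state], key=lambda s2: estimated_cost(state, s2))
    path.append(state)

print(path)   # states visited on this trial
print(H)      # learned cost-to-goal estimates after one trial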