#!/usr/bin/env python
# coding: utf-8

# $$
# \newcommand{\xv}{\mathbf{x}}
# \newcommand{\Xv}{\mathbf{X}}
# \newcommand{\yv}{\mathbf{y}}
# \newcommand{\zv}{\mathbf{z}}
# \newcommand{\av}{\mathbf{a}}
# \newcommand{\Wv}{\mathbf{W}}
# \newcommand{\wv}{\mathbf{w}}
# \newcommand{\tv}{\mathbf{t}}
# \newcommand{\Tv}{\mathbf{T}}
# \newcommand{\muv}{\boldsymbol{\mu}}
# \newcommand{\sigmav}{\boldsymbol{\sigma}}
# \newcommand{\phiv}{\boldsymbol{\phi}}
# \newcommand{\Phiv}{\boldsymbol{\Phi}}
# \newcommand{\Sigmav}{\boldsymbol{\Sigma}}
# \newcommand{\Lambdav}{\boldsymbol{\Lambda}}
# \newcommand{\half}{\frac{1}{2}}
# \newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
# \newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}
# $$

# # Adversarial Search (Chapter 5)

# ## Zero-sum

# Initially we focus on games that are deterministic and completely
# observable.  We also assume that the payoff to each player at the end
# of a game is equal and opposite, called **zero-sum**.  To get a true
# sum of zero, some games require subtracting a constant from each
# outcome.  Imagine a win is worth 1, a loss is worth 0, and a draw is
# worth 1/2.
#
# | | Result | Subtract 1/2 |
# | :--: | :--: | :--: |
# | Win, Loss | Player A = 1, Player B = 0 | Player A = 1/2, Player B = -1/2 |
# | Draw | Player A = 1/2, Player B = 1/2 | Player A = 0, Player B = 0 |
#
# Definition of a game:
# * initial state $s_0$,
# * $player(s)$: which player is to move in state $s$,
# * $actions(s)$: legal actions from state $s$,
# * $result(s,a)$: state that results from taking action $a$ in state $s$, like our `take_action_f`,
# * $terminaltest(s)$: true when the game is over,
# * $utility(s,p)$: payoff for player $p$ upon reaching state $s$.

# ## Minimax

# The two players in a two-person game will be called `Max` and
# `Min`.  These names reflect the meaning of the $utility(s,p)$
# function, which is to be maximized by Player `Max` and minimized by
# Player `Min`.
#
# The partial search tree in this next presentation illustrates the
# reasoning behind alternating layers that minimize and maximize the
# utility value in order to back up a value from terminal states to
# non-terminal states.

# In[1]:

from IPython.display import IFrame
IFrame("http://www.cs.colostate.edu/~anderson/cs440/notebooks/minmax.pdf", width=800, height=600)


# The calculation of the `minimax(s)` value of a state $s$ can be
# summarized as
#
# $$
# \text{minimax}(s) = \begin{cases}
# utility(s), & \text{if } terminaltest(s);\\
# \max_{a\in actions(s)} \text{minimax}(result(s,a)), & \text{if } player(s) \text{ is Max};\\
# \min_{a\in actions(s)} \text{minimax}(result(s,a)), & \text{if } player(s) \text{ is Min}.
# \end{cases}
# $$
#
# This assumes Player `Min` plays optimally.  If `Min` does not, `Max`
# will do even better.
#
# The textbook shows in Figure 5.3 the *minimax-decision* algorithm as
# a depth-first search that alternates between calling `max-value` and
# `min-value` functions.

# ## Alpha-Beta Pruning

# Some of the search tree can be ignored if we know we cannot find a
# better move than the best one found so far.  If you are Player X in
# Tic-Tac-Toe, and
# * your best move so far will result in a draw, and
# * you discover that your opponent can definitely win from the next move you are evaluating,
#
# then do not explore any other choices your opponent might have from
# that move.
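# To make these ideas concrete, here is a minimal sketch (not code from
# the textbook) of the minimax recursion defined above, followed by a
# version with alpha-beta pruning.  The tiny `game_tree` and the helper
# functions `player`, `actions`, `result`, `terminal_test`, and
# `utility` are made-up stand-ins for the game definition listed
# earlier, with `utility(s)` taken to be the payoff to `Max`.

# In[ ]:

import math

# A tiny, made-up game tree: Max moves at 'A', Min moves at 'B' and 'C',
# and integers are terminal states whose values are payoffs to Max.
game_tree = {'A': {'a1': 'B', 'a2': 'C'},
             'B': {'b1': 3, 'b2': 12},
             'C': {'c1': 2, 'c2': 8}}

def player(s):        return 'Max' if s == 'A' else 'Min'
def actions(s):       return list(game_tree[s])
def result(s, a):     return game_tree[s][a]
def terminal_test(s): return isinstance(s, int)
def utility(s):       return s

def minimax(s):
    """Minimax value of state s, backed up from terminal states."""
    if terminal_test(s):
        return utility(s)
    values = [minimax(result(s, a)) for a in actions(s)]
    return max(values) if player(s) == 'Max' else min(values)

def alphabeta(s, alpha=-math.inf, beta=math.inf):
    """Minimax value of s, skipping branches that cannot change the decision.
    alpha is the best value Max is already guaranteed; beta is the best
    (lowest) value Min is already guaranteed."""
    if terminal_test(s):
        return utility(s)
    if player(s) == 'Max':
        value = -math.inf
        for a in actions(s):
            value = max(value, alphabeta(result(s, a), alpha, beta))
            alpha = max(alpha, value)
            if beta <= alpha:        # Min will never allow this branch
                break
        return value
    else:
        value = math.inf
        for a in actions(s):
            value = min(value, alphabeta(result(s, a), alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:        # Max already has a better option elsewhere
                break
        return value

minimax('A'), alphabeta('A')     # both back up a value of 3 for Max at 'A'


# In this toy tree the pruned version never examines the second child of
# 'C': once `Min` can force a value of 2 at 'C', `Max`'s guaranteed value
# of 3 from 'B' makes the rest of 'C' irrelevant.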
# In[2]:

IFrame("http://www.cs.colostate.edu/~anderson/cs440/notebooks/alphabeta.pdf", width=800, height=600)


# For each node, keep track of two values.
#
# *alpha* is the best (highest) value found so far, by any means, for `Max`:
# * any value less than this is of no use, because we already know how to achieve at least *alpha*,
# * it is the minimum value `Max` is guaranteed to get,
# * initially, negative infinity.
#
# *beta* is the best (lowest) value found so far for the opponent, `Min`:
# * anything higher than this is of no use to `Min`,
# * it is the maximum value `Min` will allow `Max` to get,
# * initially, infinity.
#
# The span between *alpha* and *beta* progressively gets smaller.
#
# Any position for which *beta* $\leq$ *alpha* can be pruned.

# In[3]:

IFrame("http://www.cs.colostate.edu/~anderson/cs440/notebooks/alphabetatictactoe.pdf", width=800, height=600)


# ## Stochastic Games

# First, a definition of *expected value*.  The average value of a very
# large (in the limit, infinite) number of rolls of a fair die is
#
# $$
# (1+2+3+4+5+6) / 6
# $$
#
# The *expected value* is exactly this average, but it is defined as the
# sum of the possible values times their probabilities of occurring:
#
# $$
# 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6)
# $$
#
# If the 4, 5, and 6 sides are less likely than the other sides, then the
# expected value might be
#
# $$
# 1(1/4) + 2(1/4) + 3(1/4) + 4(1/12) + 5(1/12) + 6(1/12)
# $$

# A stochastic game is modeled by simply adding a level of **chance
# nodes** between each player's levels in the search tree.  The various
# outcomes of a chance node have certain probabilities of occurring.
# When backing up values through a chance node, the children's values
# are multiplied by their probabilities of occurring and summed.  This
# way of backing up values through chance nodes is the *expectimax*
# algorithm.

# An alternative approach is to use Monte Carlo simulation to estimate
# the expected values.  Perform many searches from the same node, and at
# each chance node select just one outcome according to its probability.
# Average over the resulting backed-up values.  Each such search is
# sometimes called a *rollout*.
#
# Can alpha-beta pruning be applied to the expectimax algorithm?
#
# It seems the answer is no; we must know all of a chance node's
# children to calculate their weighted average.  But if we know bounds
# on the utility values, bounds can be placed on the possible average,
# which allows some pruning.
#
# Can evaluation functions be applied to non-terminal nodes in
# stochastic games?  Yes, but we must be careful, as Figure 5.12 in the
# textbook illustrates.  The evaluation function must be a positive
# linear transformation of the expected utility of a position.

# In[ ]:
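# As a sketch of the expectimax idea (again, made-up code, not from the
# textbook), chance nodes back up the probability-weighted average of
# their children's values, while `Max` and `Min` nodes back up the max
# and min as before.  Here `chance_outcomes(s)` is a hypothetical
# function assumed to yield (probability, next_state) pairs, and
# `player`, `actions`, `result`, `terminal_test`, and `utility` are the
# same placeholder game interface used in the earlier sketch.

# In[ ]:

def expectimax(s):
    """Expected minimax value of state s in a game with chance nodes."""
    if terminal_test(s):
        return utility(s)
    who = player(s)
    if who == 'Max':
        return max(expectimax(result(s, a)) for a in actions(s))
    if who == 'Min':
        return min(expectimax(result(s, a)) for a in actions(s))
    # Chance node: weight each outcome's value by its probability and sum.
    return sum(p * expectimax(s2) for p, s2 in chance_outcomes(s))

# Expected value of the biased die described above, computed the same way:
sum(v * p for v, p in zip(range(1, 7), [1/4, 1/4, 1/4, 1/12, 1/12, 1/12]))   # 2.75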