An intuitive explanation of how computers play board games (MCTS algorithm)

Ryan Leong
21 min read · Dec 1, 2024


First, a note to the different audiences that may read this article.

  1. If you don’t have a background in coding, or just want a high level overview, the first section “A human analogy for search algorithms” is all you need to read.
  2. If you know how to code and want to get technical, keep going after that :)

A human analogy for search algorithms

How do computers determine the best move in a board game like chess? Well, there are a few main methods. One of these methods is a search algorithm known as minimax. To give you a rough intuition of how minimax works, imagine the minimax algorithm as a human player called Max, with a certain play style.

Every time it is Max's turn, they will consider every single move they can make on the board, even if it's as crazy as putting their queen up for easy grabs. You see, Max never jumps to conclusions. In their mind, you never know when that seemingly dubious queen move is actually a brilliant sacrifice.

For every move Max is considering, they'll check every single response that their opponent can make. Max likes to be very thorough in their analysis. They never dismiss any move when calculating variations.

At some point, after Max has calculated to a certain depth, they'll stop and just use their general knowledge of chess to determine roughly how good the position is.

For example, if Max calculates a variation where there is a forced combination to win a queen, then they'll assume that variation is likely very good for them. Even though being up a queen doesn't guarantee a win, it significantly increases one's chances of winning.

Finally, when Max is calculating, they assume that their opponent always plays the best move. Therefore, Max never tries to set up traps hoping their opponent will fall for them, or go for a risky checkmate attempt hoping their opponent will miss it.

Max is very good at playing open positions (for those not into chess, think of open positions as positions with many opportunities for big sacrifices and forced combinations that lead to gaining material or checkmate) because they thoroughly look at every single possibility without missing anything.

However, they are less good at playing closed positions where each move is about making a tiny improvement to one's own position. Of course this depends on how good Max's existing knowledge of chess positions is, but it's generally harder for Max to find good moves when their calculations do not lead to clear outcomes like checkmate or obvious positional or material advantages.

If you're wondering what "existing chess knowledge" is equivalent to for a computer, it's the static evaluation of a position. In other words, the computer uses a set of rules to give a score to any given position. These rules can include who has more material, whose pieces have more mobility, and other things that give a rough indication of who is winning in a given position.

As you may have noticed, Max has some weaknesses, so let me introduce you to another chess player called Carlo, whose play style resembles the Monte Carlo Tree Search (MCTS) algorithm.

Just like Max, Carlo always assumes their opponent plays the best move.

However, when it comes to how they spend their time, Carlo likes to consider a handful of moves that seem promising given their general knowledge of chess, and explore those moves further. Carlo doesn't bother calculating what happens from crazy looking moves like queen sacrifices.

As a result, Carlo doesn't do great in open positions with very specific combinations that win material, especially against players like Max. Because of the way Carlo dismisses moves, they'll often miss the one opponent response that punishes their move, when all the other responses would've led to Carlo having a better position.

On the other hand, because Carlo only looks into a few promising moves, they do better than Max in closed positions: they have more time to explore their chosen variations deeply, so they can be more confident in whichever of those moves they end up picking.

Carlo's play style is actually on a spectrum! On one end (maximum exploitation), Carlo will choose very few promising moves (the top 1-2) to explore, and when exploring each of these moves, they'll only look at what they believe are the top 1-2 opponent responses. This only tends to work out if Carlo's general chess knowledge is VERY strong. On the other end (maximum exploration), Carlo looks at all possible moves and calculates all possible variations exactly like Max. Usually Carlo plays best somewhere in between these extremes.

Today, we'll be focusing on MCTS, since it has some applications in machine learning that standard minimax is not well suited for. We'll be using tic-tac-toe as the test medium, because chess has too many possibilities for MCTS to work well in a short amount of time.

Evaluating tic-tac-toe board positions

We’ll start by coming up with a way to evaluate board positions. We could just use a minimax algorithm to search each tic-tac-toe position to the very end since it is such a small game, but for games with a higher branching factor we would need to give the minimax algorithm a way to score each leaf position if it’s not a final game state (win/loss/draw). In chess, 99% of the time it’s not possible to calculate every possible combination that leads to a final game state, so instead we evaluate the furthest positions we’ve calculated to using our general knowledge of chess.
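To make "general knowledge of chess" a bit more concrete, here is a minimal, purely illustrative sketch of a rule-based static evaluation. The piece values and the piece_counts input format are assumptions for this example only, not something our tic-tac-toe code will use:

# A hypothetical static evaluation: positive scores favour White, negative favour Black.
# Real engines add many more rules (mobility, king safety, pawn structure, ...).
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def static_evaluation(piece_counts):
    """piece_counts maps (colour, piece_type) -> number of those pieces on the board."""
    score = 0
    for (colour, piece), count in piece_counts.items():
        value = PIECE_VALUES.get(piece, 0)
        score += value * count if colour == "white" else -value * count
    return score

# Example: White has an extra queen, Black has an extra rook -> +4 for White.
print(static_evaluation({("white", "queen"): 1, ("black", "rook"): 1}))  # 4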

However, there is actually another way to evaluate positions which doesn’t require us to come up with a rule set specific to the game! It’s called using random playouts, and it’s commonly used for MCTS. More on that later!

Now would be a good time to open up a code editor and start following along :)

First off, we need a way to represent our tic-tac-toe board. Here’s a Board class for convenience. Player tokens are represented with -1s and 1s, empty spaces are represented by 0s.

Create a board for tic-tac-toe with this code. The 1 for the 4th parameter indicates that the 1s player will move first.

board = Board(3, 3, 3, 1)

Make moves by passing a tuple with (x, y) coordinates of the square you want to place a token on. The Board class automatically ensures the players alternate turns.

board.make_move((0, 0))  # Places a token in the top-left corner

Don’t worry about the details of the implementation.

"""Board"""
class Board():
def __init__(self, board_width, board_height, win_length, turn, state=None, empties=None):
self.board_width = board_width
self.board_height = board_height
self.win_length = win_length
self.turn = turn
if state == None:
self.state = tuple([0. for _ in range (board_width)] for _ in range(board_height))
self.empties = [(x, y) for y in range (board_height) for x in range(board_width)]
else:
self.state = tuple([x for x in y] for y in state)
self.empties = [e for e in empties]

def __str__(self):
board_state_as_str = ""
board_state = [row.copy() for row in self.state]
for row in board_state:
for i, token in enumerate(row):
if token == 1.0:
row[i] = "X"
elif token == -1.0:
row[i] = "O"
elif token == 0.0:
row[i] = "_"
board_state_as_str += str(row) + "\n"
return board_state_as_str

def deepcopy(self):
board = Board(self.board_width, self.board_height, self.win_length, self.turn, self.state, self.empties)
return board

def out_of_bounds(self, square):
return (square[0] < 0 or square[1] < 0 or square[0] >= self.board_width or square[1] >= self.board_height)

def outcome(self, updated_square):
player_num = self.state[updated_square[1]][updated_square[0]]
vectors = []
for x in [-1, 0, 1]:
for y in [-1, 0, 1]:
if x == 0 and y == 0:
continue
if updated_square[0] - 1 < 0 and x == -1: # past left edge
continue
if updated_square[0] + 1 > self.board_width - 1 and x == 1: # past right edge
continue
if updated_square[1] - 1 < 0 and y == -1: # past top edge
continue
if updated_square[1] + 1 > self.board_height - 1 and y == 1: # past bottom edge
continue
neighbour_x, neighbour_y = updated_square[0] + x, updated_square[1] + y
if self.state[neighbour_y][neighbour_x] == player_num:
vector = (neighbour_x - updated_square[0], neighbour_y - updated_square[1])
if vector not in [(v[0]*-1, v[1]*-1) for v in vectors]:
vectors.append(vector)
for vector in vectors:
line_length = 1
reversing = False
i = 1
while i < self.win_length:
if not reversing:
x = updated_square[0] + i * vector[0]
y = updated_square[1] + i * vector[1]
next_square = x, y
else:
x = updated_square[0] + i * -vector[0]
y = updated_square[1] + i * -vector[1]
next_square = x, y

if self.out_of_bounds(next_square) \
or self.state[next_square[1]][next_square[0]] != player_num:
if not reversing:
reversing = True
i = 0
else:
break
else:
line_length += 1
if line_length >= self.win_length:
return player_num
i += 1
if len(self.empties) == 0:
return 0
return None

def make_move(self, move_coords: tuple):
self.empties.remove(move_coords)
self.state[move_coords[1]][move_coords[0]] = self.turn
self.turn *= -1

def grid_print(grid, space=0, decimals=0, title=""): # Prints the board neatly.
print(title)
if grid != None:
for row in grid:
print("[", end="")
for col in row:
print(" ", end="")
exec("""print("{:%s.%sf},".format(col), end=" ")""" % (space, decimals))
print("]")
print()
else:
print("No output\n")

Cool! Back to random playouts. Here’s the general idea:

Imagine there is a chess position where White has one move that captures Black’s queen for free, and the rest of their moves don’t. What MCTS does with random playouts is: for every single move White has in that position, it simulates a bunch of random continuations from that move until the game is over. Given enough simulations, we should find that the move where White captured their opponent’s queen leads to an overall higher win rate compared to the moves where it didn’t.

I encourage you to try coding this yourself! The exercise is below.

For the exercise, we’d like the MCTS function to return a 2D list where every move is assigned a value between -1 and 1, where -1 indicates the move is very likely to lead to a win for the -1s player. Same idea for the 1s player. 0 indicates a likely draw.

For example, in this position where it is the -1s player’s turn to move:

[[-1,-1, 1],
[ 1,-1, 0],
[ 1, 0, 1]]

The MCTS function should return the following grid, since placing on the bottom row results in a win for the -1s player, while placing on the right column allows the 1s player to win by going on the bottom row the next turn.

[[ 0.0, 0.0, 0.0],
[ 0.0, 0.0, 1.0],
[ 0.0,-1.0, 0.0]]
# Exercise
class MCTS:
    def __init__(self, board):
        self.board = board

    def search(self):
        for move in self.board.empties:
            pass  # give the move a score based on several random games from that move onward

# Test cases
board1 = Board(3, 3, 3, 1)
board1.make_move((0, 0))
board1.make_move((1, 1))
board1.make_move((0, 1))
board1.make_move((0, 2))

board2 = Board(3, 3, 3, 1)
board2.make_move((0, 0))

board3 = Board(3, 3, 3, 1)
board3.make_move((0, 0))
board3.make_move((1, 1))
board3.make_move((2, 2))

# Your final scores should roughly resemble the scores in the grids below.
grid_print(board1.state, 2, 0, "Board 1")
mcts_searcher1 = MCTS(board1)
grid_print(mcts_searcher1.search(), 5, 2, "Value grid")
# [[0.00, -0.17, 0.17],
# [0.00, 0.00, -0.83],
# [0.00, -0.33, -0.50]]

grid_print(board2.state, 2, 0, "Board 2")
mcts_searcher2 = MCTS(board2)
grid_print(mcts_searcher2.search(), 5, 2, "Value grid")
# [[0.00, 0.45, 0.29],
# [0.45, 0.12, 0.45],
# [0.29, 0.45, 0.29]]

grid_print(board3.state, 2, 0, "Board 3")
mcts_searcher3 = MCTS(board3)
grid_print(mcts_searcher3.search(), 5, 2, "Value grid")
# [[ 0.00, -0.10, -0.17],
# [-0.10, 0.00, -0.10],
# [-0.17, -0.10, 0.00]]

Great work!

Short summary of my solution:

  1. For each potential move, we first check if it immediately ends the game, and return the corresponding value (-1 for a -1s player win, 0 for a draw, and 1 for a 1s player win) if it does. Otherwise, we move on to simulating games to get an estimate for how good the move is.
  2. In the simulate function, we play a set number of random games and take the average outcome of all the random games played to give us our final expected outcome.
  3. The play_random_game function keeps making random moves for each side until the game ends. It returns a value of -1 for a -1s player win, 0 for a draw, and 1 for a 1s player win.
# Solution
import random


class MCTS:
    def __init__(self, board):
        self.board = board

    def simulate(self, board, reps):
        # Average outcome of several random games starting from this position.
        total_value = 0
        for _ in range(reps):
            total_value += self.play_random_game(board.deepcopy())
        return total_value / reps

    def play_random_game(self, board):
        game_on = True
        while game_on:
            move = random.choice(board.empties)
            board.make_move(move)
            value = board.outcome(move)  # outcome() needs the last move played
            if value is not None:
                return value

    def search(self):
        move_probability_grid = [[0.0 for x in range(3)] for y in range(3)]
        for move in self.board.empties:
            new_board = self.board.deepcopy()
            new_board.make_move(move)
            value = new_board.outcome(move)
            if value is None:  # not a final game state, so estimate it with random playouts
                value = self.simulate(new_board, 5000)
            move_probability_grid[move[1]][move[0]] = value
        return move_probability_grid

# Test cases
board1 = Board(3, 3, 3, 1)
board1.make_move((0, 0))
board1.make_move((1, 1))
board1.make_move((0, 1))
board1.make_move((0, 2))

board2 = Board(3, 3, 3, 1)
board2.make_move((0, 0))

board3 = Board(3, 3, 3, 1)
board3.make_move((0, 0))
board3.make_move((1, 1))
board3.make_move((2, 2))


# Your final scores should roughly resemble the scores in the grids below.
grid_print(board1.state, 2, 0, "Board 1")
mcts_searcher1 = MCTS(board1)
grid_print(mcts_searcher1.search(), 5, 2, "Value grid")
# [[0.00, -0.17, 0.17],
# [0.00, 0.00, -0.83],
# [0.00, -0.33, -0.50]]

grid_print(board2.state, 2, 0, "Board 2")
mcts_searcher2 = MCTS(board2)
grid_print(mcts_searcher2.search(), 5, 2, "Value grid")
# [[0.00, 0.45, 0.29],
# [0.45, 0.12, 0.45],
# [0.29, 0.45, 0.29]]

grid_print(board3.state, 2, 0, "Board 3")
mcts_searcher3 = MCTS(board3)
grid_print(mcts_searcher3.search(), 5, 2, "Value grid")
# [[ 0.00, -0.10, -0.17],
# [-0.10, 0.00, -0.10],
# [-0.17, -0.10, 0.00]]

After we run the cell, we can see our algorithm performs quite well in the first two examples.

In the 1st example, it is the 1s player’s turn. The algorithm indicates that playing in the top-right corner gives the 1s player the best chance of winning. In other words, it has correctly recognized that the best move is for the 1s player to block their opponent’s win threat.

In the 2nd example it is the -1s player’s turn. Our MCTS algorithm was able to recognize that playing in the center gives the -1s player the best chance of winning, as seen by the assignment of the lowest value to the center square. In fact, any other move allows the 1s player to force a win. You can verify this for yourself.

However, in the 3rd position, where it is the -1s player’s turn to move, the two moves it thinks give the -1s player the best chance at winning are actually the only two moves that allow the 1s player to force a win!

So random playouts are a pretty good way of evaluating a position, but they clearly aren’t perfect. That’s why we need to implement the rest of the search tree, so the MCTS algorithm can also operate under the assumption that each player will choose the best move for themselves.

Back to our analogy with Carlo, what we’d like to do is have our algorithm choose a few interesting moves and explore them deeper, while keeping the assumption that their opponent will always choose the best move as well.

The 4 steps of MCTS

The actual implementation of MCTS is a little different from how I’ve described Carlo, our human player analogy for MCTS. MCTS has four main phases: Selection, Expansion, Evaluation, and Backpropagation. These 4 steps are repeated in a cycle a fixed number of times. The more times we repeat this cycle, the more of the game tree we get to explore, and the stronger the final suggested move will be.

Selection

Selection is about deciding which moves to explore deeper. The way we score a move is:

score = value + exploration bonus

For a given position, a value of -1.0 means our evaluation predicts the -1s player is highly likely to win, and 1.0 means it predicts the 1s player is highly likely to win. Just like we did with random playouts.

For now, think of the exploration bonus as 1 divided by the number of times a move has been explored. Essentially, we want to go more in depth and further evaluate moves that our initial evaluation suggests are good, but we also want to explore moves that we haven’t looked at as thoroughly, in case they are actually good moves. It’s a balancing act between exploitation (keep exploring good moves) and exploration (checking out more alternatives).

How much our algorithm explores is determined by an exploration/exploitation parameter which we’ll call “c”, which we can use to assign different weights to the value and the exploration bonus. So our final score formula looks more like:

score = value + c * exploration_bonus

Finally, when deciding on the “value” for a move, we judge it from the perspective of the player whose turn it is.
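Here’s a tiny sketch of that score using the simple 1/visits bonus from above (the function name and the default c are just illustrative; the solution code later in the article uses a slightly fancier bonus, UCT):

def selection_score(child_value, child_visits, turn, c=1.0):
    # turn is 1 or -1: multiplying by it flips the value so that "higher is better"
    # from the perspective of the player to move.
    if child_visits == 0:
        return float("inf")  # completely unexplored moves get looked at first
    exploration_bonus = 1 / child_visits  # shrinks as the move gets explored more
    return turn * child_value + c * exploration_bonus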

In the following image, each box represents a position, with “Root” indicating our starting position. The numbers in each box indicate our evaluations of which player is most likely to win in each position. Each arrow represents a move. Assume in the root position it is the 1s player’s turn to move.

For simplicity, let’s ignore the exploration bonus for now. When we choose our first move, we choose from the 1s player’s perspective, meaning we want to take the move that leads to the position with the highest value. However, on the next layer, we choose from the -1s player’s perspective because it is now their turn. This time, we want to take the move leading to the position with the lowest value. We repeat this process until we reach a leaf node, which is a node where none of the child positions have been explored yet, either because the MCTS algorithm hasn’t had a chance to explore them, or because the node is a final game state (win/loss/draw).

Now here’s how the exploration bonus might come into play:

When we first run the MCTS algorithm, both exploration bonuses (E) are equal, so it’ll keep selecting the node on the right since it has a significantly higher initial evaluation. However, eventually, as the node gets explored more and more, the exploration bonus for exploring the node on the left grows until the overall score for choosing the move that leads to that position is higher than the score for the right side position. At this point we choose to select the node on the left instead, as seen below.

Why? Remember that our initial evaluation of any position is far from perfect, so the move on the left might not be as bad for the 1s player as it seems.

Maybe upon exploring the left position more, it turns out there is actually a brilliant queen sacrifice that gives the 1s player a huge positional advantage!

In that case, we will update the value of the left position to be much more favourable, and start exploring that more instead.

At this point you may be asking, how do we explore positions or update our evaluations of each position based on new information? We’ll answer both in the next few sections.

Expansion & Evaluation

Whenever we reach a leaf node, if it is a terminal node (win/loss/draw), we immediately skip to the backpropagation step. More on that later.

If it is not a terminal node, there are two possibilities:

If it has not been evaluated yet, then we skip expansion and evaluate it with random playouts.

Otherwise, we expand the node by creating nodes to represent each of its child positions. Then we randomly choose one of the child positions to evaluate with a random playout. This way we ensure there is always an evaluation step in each cycle of MCTS. For the rest of the unexplored child nodes, we leave their exploration value at infinity, so if the MCTS algorithm ever selects their parent node again, it’ll then select one of the completely unexplored child nodes instead of an already explored child node.

When leaf node has not been evaluated: skip node expansion and evaluate the given leaf node instead

When leaf node is already evaluated: expand the node by creating children & evaluate a random child node

How do we evaluate? For tic-tac-toe we can reliably use random playouts, which give a pretty good estimate of how good a position is. Remember, the evaluation function we use doesn’t have to be perfect, because our algorithm will update the initial evaluations to be more accurate as it explores deeper!

Finally, how does each new cycle of exploration affect our final conclusion of what the best move is? That’s where backpropagation comes in!

Backpropagation

The core idea behind backpropagation is how we use our new information from each new evaluation to update the rest of the nodes in our search tree.

I’ve updated the nodes from our previous representation of the search tree to include a total value and visit count. The value of a node is calculated as value = total_value/visit_count. The node in orange has just been evaluated.

Below are the 3 steps of backpropagation. What we do is, starting from the parent of the updated node:

  1. We increase the visit count by 1
  2. We add the child node’s evaluation to the total value
  3. We calculate the new value using the formula: value = total_value/visit_count

Repeat for the next parent node until we reach the root node

Calculating value with total_value/visit_count ensures that nodes which don't have many evaluations contributing to their end value are updated more heavily than nodes which we are already confident about the values of, since they have a lot more evaluations supporting their end value.
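For example, suppose a node currently has total_value = 2.0 over visit_count = 4, so its value is 2.0 / 4 = 0.5. If a new child evaluation of -1.0 is backpropagated through it, its visit count becomes 5, its total value becomes 1.0, and its value drops to 1.0 / 5 = 0.2. A node that already had 100 visits would barely move after the same update.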

And there you have it! The essence of MCTS!

Implementing MCTS with code

What’s next? Well the best way to learn something is to try and build it yourself :D

I encourage you to try implementing the MCTS algorithm based on the steps I’ve given you. If you get stuck, feel free to use Google or take a peek at the part of the solution code that you’re stuck on.

Before you start, here is the actual formula used to calculate the score of a child node when deciding which positions to explore further during the selection phase:

score = value + c * sqrt(ln(Vp) / Vc)

value = evaluation of the child node’s position (from the perspective of the player to move, as described in the selection section)

c = exploration/exploitation parameter (the greater c is, the more weight we give to exploration over exploitation, and vice versa)

Vp = visit count of the parent node

Vc = visit count of the child node

Okay you have all the tools you need to code MCTS now. Here’s some starter code too. You might need to modify it depending on what approach you use. Good luck😁

Your final output should be a nested list giving a probability between 0 and 1 for each move. The probabilities for all moves should add up to 1.

Example:

[[-1, 0, 0],
[ 0, 1, 0],
[ 0, 0,-1]]

In the above position, where it is the 1s player’s turn to move, your MCTS algorithm’s output should approach the probabilities in the grid below as you increase the number of cycles. At around 200 cycles you can see the algorithm clearly starting to favour the 4 edge squares over the top-right and bottom-left corners.

[[0.00, 0.25, 0.00],
[0.00, 0.00, 0.25],
[0.00, 0.25, 0.00]]

Hint: Remember that selection favours visiting good nodes more often, so the best moves will be visited the most times by the time the algorithm finishes.

import math

class Node:
    def __init__(self, parent, board, move):
        self.parent = parent
        self.children = []
        self.visit_count = 1
        self.total_value = 0
        self.board = board
        self.move = move

    def is_terminal(self):
        return self.board.outcome(self.move) is not None

    def is_leaf(self):
        return self.children == []

class MCTS:
    def __init__(self, board):
        self.root_node = Node(None, board, (-2, -2))

    def uct(self, parent, node, player_num):
        return (player_num * node.total_value / node.visit_count) + 1.4142 * math.sqrt(math.log(parent.visit_count) / node.visit_count)

    def select(self):
        pass

    def expand(self):
        pass

    def evaluate(self):
        pass

    def backpropagate(self):
        pass

    def search(self, reps):
        pass

Here’s my implementation. It’s just the coded version of what we talked about in the article. Feel free to reference it whenever you get stuck!

# Solution
import math
import random


"""MCTS"""
class Node:
    def __init__(self, parent, board, move):
        self.parent = parent
        self.children = []
        self.visit_count = 0
        self.total_value = 0
        self.board = board
        self.move = move  # Used as a parameter for the board outcome check, since it requires knowing the last move played

    def is_terminal(self):
        return self.board.outcome(self.move) is not None

    def is_leaf(self):
        return self.children == []

class MCTS:
    def __init__(self, board):
        self.root_node = Node(None, board, (-2, -2))

    def uct(self, parent, node, player_num):
        if node.visit_count == 0:  # Prioritize selecting nodes that haven't been explored
            return 1000000
        else:
            return (player_num * node.total_value / node.visit_count) + 1.4142 * math.sqrt(math.log(parent.visit_count) / node.visit_count)

    def select(self, node):
        if node.is_leaf():
            return node
        else:
            max_value = -1000000
            chosen_node = None
            for child_node in node.children:
                value = self.uct(node, child_node, node.board.turn)
                if value > max_value:
                    max_value = value
                    chosen_node = child_node
            return self.select(chosen_node)

    def expand(self, node):
        legal_moves = node.board.empties
        for m in legal_moves:
            child_board = node.board.deepcopy()
            child_board.make_move(m)
            child_node = Node(node, child_board, m)
            node.children.append(child_node)

    def evaluate(self, node, playouts):
        outcome = node.board.outcome(node.move)
        if outcome is not None:
            return outcome
        else:
            total_value = 0
            for _ in range(playouts):
                total_value += self.play_random_game(node.board.deepcopy())
            return total_value / playouts

    def play_random_game(self, board):
        game_on = True
        while game_on:
            move = random.choice(board.empties)
            board.make_move(move)
            value = board.outcome(move)
            if value is not None:
                return value

    def backpropagate(self, node, evaluation):
        node.visit_count += 1
        node.total_value += evaluation
        if node.parent is not None:
            self.backpropagate(node.parent, evaluation)

    def search(self, iterations, playouts=20):
        for _ in range(iterations):
            node = self.select(self.root_node)
            if node.visit_count == 0 or node.is_terminal():
                node_eval = self.evaluate(node, playouts)
                self.backpropagate(node, node_eval)
            else:
                self.expand(node)
                child_node = random.choice(node.children)
                child_node_eval = self.evaluate(child_node, playouts)
                self.backpropagate(child_node, child_node_eval)
        move_probs = [[0 for _ in range(self.root_node.board.board_width)] for _ in range(self.root_node.board.board_height)]  # occupied squares keep probability 0
        for child_node in self.root_node.children:
            prob = child_node.visit_count / self.root_node.visit_count
            move_probs[child_node.move[1]][child_node.move[0]] = prob
        return move_probs
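Before playing against it, one quick check you can run is the Board 3 position from earlier, where plain random playouts favoured losing moves. With the full tree search, the visit counts should hopefully steer away from the two corners that let the 1s player force a win (the iteration count below is just a suggestion):

# Re-test the earlier "Board 3" position with the full tree search.
board3 = Board(3, 3, 3, 1)
board3.make_move((0, 0))
board3.make_move((1, 1))
board3.make_move((2, 2))

mcts_searcher = MCTS(board3)
grid_print(mcts_searcher.search(1000), 5, 2, "Move probabilities")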

Time to test your algorithm :D

from tabulate import tabulate  # prints mcts output in a nice grid


def mcts_vs_human(board, mcts_iterations):
    game_on = True
    while game_on:
        if board.turn == 1:
            mcts_searcher = MCTS(board)
            mcts_move_probs = mcts_searcher.search(mcts_iterations)
            max_prob = -1
            for y in range(board.board_height):
                for x in range(board.board_width):
                    if mcts_move_probs[y][x] > max_prob:
                        max_prob = mcts_move_probs[y][x]
                        chosen_move = (x, y)
            print("Move probabilities:")
            print(tabulate(mcts_move_probs, tablefmt="simple_grid"))
            board.make_move(chosen_move)
        else:
            try:
                x = input("x: ")
                y = input("y: ")
                chosen_move = (int(x), int(y))
                board.make_move(chosen_move)
            except Exception:
                print("Invalid move, please try again")
                continue
        print("Board")
        print(board)
        value = board.outcome(chosen_move)
        if value is not None:
            break

Note that the MCTS algorithm always plays as the Xs (internally represented in the Board class as the 1s player). The 4th parameter to the Board class specifies which player goes first.

A ‘1’ means the algorithm goes first, and a ‘-1’ means you go first. You can also manually make moves, as shown below, to test your algorithm on specific positions.

Have fun experimenting!

b = Board(3, 3, 3, -1)
b.make_move((0, 0))
b.make_move((1, 1))
b.make_move((2, 2))

mcts_vs_human(b, 200)

Congrats🎉 You’ve successfully taught a computer how to play tic-tac-toe! In fact, the same approach can be generalized to many other turn-based games.

We can even do a bit of that here! Remember the first 3 parameters of the Board class? They let us set the board width, board height, and win length of our game! Let’s try experimenting with that!

b = Board(6, 6, 4, -1)
mcts_vs_human(b, 5000) # Increased # iterations to allow for better performance

It works pretty well! The algorithm knows to play in the center and tries to get forks (making a move which creates two win threats at the same time, meaning your opponent can only block one).

However, it takes quite a while! What if we wanted to extend this to a game with more possibilities like Gomoku (objective: 5 in a row on a 15x15 board)? It would take a looooooooooong time to come up with ideal moves. Think about why…

If you thought about our evaluation function, you are correct! Since we use random playouts to evaluate games, the larger the board, the more moves involved in each random playout. Furthermore, we need A LOT more random playouts to get a better evaluation of a leaf node.

The second part is that we need more iterations of MCTS to expand our search tree enough to get a better sense of the different possible moves in our game.

So does that mean MCTS just takes forever for more complex games?

No! There’s a few things we can do. Firstly, we can replace the random playouts in our evaluation function with a neural network!

If you’ve never worked with neural networks, think of one as an analogy for a human’s intuition.

In the context of chess, most players who play regularly develop an intuition/natural instinct for which pieces they should consider moving in a given situation. For example, if an experienced chess player sees a bunch of enemy pieces near their exposed king, they’ll likely focus on protecting their king, instead of some useless pawn move on the other side of the board.

Of course this intuition isn’t always perfect. In fact, when a player first starts out, they DO often consider every single possibility. However, eventually they learn from experience that it is not good to send their king to the center of the board, because it always results in them getting checkmated. So gradually, they learn what’s important to focus on and what’s not.

Another, more widely relatable analogy: when someone first learns to drive, they’re bombarded by information from all around them (cars, signs, checking mirrors, pedestrians, etc.).

However, over time they get good at automating several behaviours essential to driving, becoming better drivers in the process.

A neural network is capable of improving in a similar fashion.

In the context of tic-tac-toe type games, given enough games, our neural network will learn things like how to easily spot an opponent win threat instead of having to evaluate every single move on a large board with random playouts.

It can also be used to guide the MCTS algorithm’s search. For example, it can tell the MCTS algorithm to focus on the centre because moves in the centre are generally better than moves on the edges.

If you want to learn more about neural networks and implement one that enables us to use MCTS for a larger board, stay tuned! Another article just for that is coming in a week!
