Monte Carlo Tree Search (MCTS) Tutorial – Complete Guide with Examples

Master Monte Carlo Tree Search (MCTS) with this beginner-friendly guide. Learn the theory, step-by-step algorithm, and practical examples used in AI and game-playing systems.
What is MCTS?
Monte Carlo Tree Search (MCTS) is a powerful decision-making algorithm used in games and simulations. It balances exploration (trying new moves) and exploitation (choosing moves known to be good).

1. Tree Traversal (Selection)
Select a node to explore by traversing the tree from the root using the UCB1 formula:

UCB1(i) = t_i / n_i + C * sqrt(ln(N) / n_i)

Where:
- t_i = total value of node i
- n_i = number of visits to node i
- N = number of simulations from the parent node
- C = exploration constant (a common choice is C = 2)
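As a sketch, the formula can be written directly in Python (the function name and the default C = 2 are illustrative choices, not part of any standard API):

```python
import math

def ucb1(t_i, n_i, N, c=2.0):
    """UCB1 score of a child: average value plus an exploration bonus.

    t_i -- total value of the child node
    n_i -- number of visits to the child node
    N   -- number of simulations run from the parent node
    c   -- exploration constant (2.0 here; sqrt(2) is also common)
    """
    if n_i == 0:
        return math.inf  # unvisited children are always explored first
    return t_i / n_i + c * math.sqrt(math.log(N) / n_i)
```

A rarely visited node gets a large exploration bonus, while a node with a high average value keeps a high score even after many visits; that is the exploration/exploitation balance in one line.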
2. Node expansion
Generate one or more child nodes from the selected node.
3. Rollout (random simulation)
Play out the game from the new node using random moves until a terminal state is reached.
4. Backpropagation
Update the values of the nodes on the path taken during selection based on the result of the simulation.
🔁 These four steps are repeated for many iterations; as simulations accumulate, the value estimates converge and the tree increasingly favors the best moves.
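The four steps above can be sketched end-to-end in Python. Everything here is illustrative (the toy counting game, the `Node` class, and the function names are assumptions made for this sketch, not a reference implementation):

```python
import math
import random

# Toy game so the sketch runs: a state is an integer, each move adds 1 or 2,
# the game ends once the total reaches 5 or more, and only hitting exactly 5
# scores a point.  (Purely illustrative.)
def legal_moves(state):
    return [] if state >= 5 else [1, 2]

def reward(state):
    return 1.0 if state == 5 else 0.0

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.untried = list(legal_moves(state))  # moves not yet expanded
        self.t, self.n = 0.0, 0                  # total value, visit count

def ucb1(child, c=2.0):
    return child.t / child.n + c * math.sqrt(math.log(child.parent.n) / child.n)

def mcts(root_state, iterations=200):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 while the node is fully expanded
        while not node.untried and node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: add one unvisited child (skipped at terminal nodes)
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state + move, parent=node)
            node.children.append(child)
            node = child
        # 3. Rollout: play random moves until a terminal state is reached
        state = node.state
        while legal_moves(state):
            state += random.choice(legal_moves(state))
        value = reward(state)
        # 4. Backpropagation: update every node on the path back to the root
        while node is not None:
            node.t += value
            node.n += 1
            node = node.parent
    # A common final choice: recommend the most-visited child of the root
    return max(root.children, key=lambda ch: ch.n)
```

For example, `mcts(0)` returns the root child the search judges best after 200 iterations.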
Example

You are given a snapshot of a Monte Carlo Tree Search (MCTS) in progress. Starting from this state, perform two additional rollouts with the terminal values 15 and 75. Then, determine the updated best strategy based on these rollouts. If a random choice is needed at any point, always prefer the right node.
Start from the root node and recursively select child nodes using the UCB1 formula, UCB1(i) = t_i / n_i + C * sqrt(ln(N) / n_i), with C = 2 throughout this example.
Rollout 1 (value = 15)
🔹 Step 1: Selection from S0
We compute UCB1 for both child nodes of S0.
Let: t1 = 60, n1 = 2; t2 = 23, n2 = 1; N = n0 = 3. (S2's statistics follow from the root totals: t2 = t0 − t1 = 83 − 60 = 23 and n2 = n0 − n1 = 3 − 2 = 1.)
→ For S1: UCB1 = 60/2 + 2·sqrt(ln 3 / 2) ≈ 30 + 1.48 = 31.48
→ For S2: UCB1 = 23/1 + 2·sqrt(ln 3 / 1) ≈ 23 + 2.10 = 25.10
✅ Select S1 (higher UCB1)

🔹 Step 2: Expansion
Since S1 has unvisited children, we expand it by adding a new child node S3.

🔹 Step 3: Simulation (Rollout)
From S3, we simulate a random move leading to a terminal state with value 15.

🔹 Step 4: Backpropagation
Backpropagate the value 15 from S3 to S1 and S0.
Update S3, S1, S0:
- S3: t3 = 15, n3 = 1
- S1: t1 = 60 + 15 = 75, n1 = 2 + 1 = 3
- S0: t0 = 83 + 15 = 98, n0 = 3 + 1 = 4
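A quick arithmetic check of the selection step in this rollout (taking C = 2; S2's statistics t2 = 23, n2 = 1 are inferred from the root totals t0 = 83, n0 = 3):

```python
import math

def ucb1(t, n, N, c=2.0):
    return t / n + c * math.sqrt(math.log(N) / n)

# Root selection before rollout 1: the parent S0 has n0 = 3 simulations
s1 = ucb1(60, 2, 3)  # 30 + 2*sqrt(ln 3 / 2) ≈ 31.48
s2 = ucb1(23, 1, 3)  # 23 + 2*sqrt(ln 3 / 1) ≈ 25.10
print(s1 > s2)       # True: S1 is selected
```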

Rollout 2 (value = 75)
🔹 Step 1: Selection from S0
We compute UCB1 again for S1 and S2.
→ For S1: UCB1 = 75/3 + 2·sqrt(ln 4 / 3) ≈ 25 + 1.36 = 26.36
→ For S2: UCB1 = 23/1 + 2·sqrt(ln 4 / 1) ≈ 23 + 2.35 = 25.35
✅ Select S1 again (higher UCB1)

🔹 Step 2: Selection from S1
Now both of S1's children have been visited once:
- S3: t3 = 15, n3 = 1
- S4: t4 = 40, n4 = 1
Let's compute UCB1 for both (with C = 2, parent visits n1 = 3):
→ For S3: UCB1 = 15/1 + 2·sqrt(ln 3 / 1) ≈ 15 + 2.10 = 17.10
→ For S4: UCB1 = 40/1 + 2·sqrt(ln 3 / 1) ≈ 40 + 2.10 = 42.10
✅ Select S4

🔹 Step 3: Expansion
Since S4 has unvisited children, we expand it by adding a new child node S6.

🔹 Step 4: Simulation (Rollout)
From S6, we simulate a random move leading to a terminal state with value 75.

🔹 Step 5: Backpropagation
Backpropagate the value 75 from S6 to S4, S1, and S0.
Update S6, S4, S1, S0:
- S6: t6 = 75, n6 = 1
- S4: t4 = 40 + 75 = 115, n4 = 1 + 1 = 2
- S1: t1 = 75 + 75 = 150, n1 = 3 + 1 = 4
- S0: t0 = 98 + 75 = 173, n0 = 4 + 1 = 5

Updated best strategy: at the root, S1 now has average value 150/4 = 37.5 versus 23/1 = 23 for S2, and far more visits, so the best first move is the one leading to S1.
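The selection arithmetic for this second rollout can be verified the same way (C = 2 assumed throughout, as in rollout 1):

```python
import math

def ucb1(t, n, N, c=2.0):
    return t / n + c * math.sqrt(math.log(N) / n)

# Selection at the root after rollout 1: t1=75, n1=3; t2=23, n2=1; n0=4
r_s1 = ucb1(75, 3, 4)  # 25 + 2*sqrt(ln 4 / 3) ≈ 26.36
r_s2 = ucb1(23, 1, 4)  # 23 + 2*sqrt(ln 4 / 1) ≈ 25.35
# Selection inside S1: t3=15, t4=40, both visited once, parent visits n1=3
c_s3 = ucb1(15, 1, 3)  # ≈ 17.10
c_s4 = ucb1(40, 1, 3)  # ≈ 42.10
print(r_s1 > r_s2, c_s4 > c_s3)  # True True: S1, then S4, are selected
print(150 / 4, 23 / 1)           # final root averages: S1 wins
```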

Q&A
1. The Rollout Step Follows Which Algorithm?
The Rollout step in Monte Carlo Tree Search (MCTS) typically follows a default policy, which is often:
A random simulation policy (i.e., play random legal moves until the game ends).
This is not a complex algorithm — it’s usually a simple random playout to evaluate how promising a move is without deeper computation.
However, in advanced cases (like AlphaGo), the rollout can use:
- Heuristic-based simulation
- Or even a learned policy network
But for basic MCTS:
🔁 Rollout = random simulation from the current node to a terminal state
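A minimal random rollout can be sketched as below; `legal_moves`, `apply_move`, and `result` are placeholder callbacks for your game's rules, not a real API:

```python
import random

def rollout(state, legal_moves, apply_move, result):
    """Default policy: play uniformly random legal moves until the
    game ends, then score the terminal state."""
    while legal_moves(state):
        state = apply_move(state, random.choice(legal_moves(state)))
    return result(state)

# Toy usage: add 1 until the total reaches 3, then score the final total
value = rollout(0,
                legal_moves=lambda s: [1] if s < 3 else [],
                apply_move=lambda s, m: s + m,
                result=lambda s: s)
print(value)  # 3
```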
2. What Is a Terminal Node?
A terminal node is a state in the game where no further actions are possible — meaning the game has ended.
It could be:
- A win
- A loss
- Or a draw
Example: Tic-Tac-Toe
Let’s say you’re building an MCTS for Tic-Tac-Toe.
Imagine this board state:
X | O | X
---------
X | O | O
---------
O | X | X
In this case:
- All 9 cells are filled
- No more moves can be made
- It’s a draw
This board is a terminal node in the MCTS tree.
If the rollout reaches this node, the reward is evaluated (say, 0 for draw), and that value is then backpropagated up the tree.
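A terminal-state check for Tic-Tac-Toe can be sketched as follows. The board is represented as a 9-character string ('X', 'O', or ' '); the names and the reward scheme (+1/−1/0 from X's viewpoint) are illustrative choices:

```python
# The eight three-in-a-row index triples of a 3x3 board
LINES = [(0,1,2), (3,4,5), (6,7,8),   # rows
         (0,3,6), (1,4,7), (2,5,8),   # columns
         (0,4,8), (2,4,6)]            # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def is_terminal(board):
    # Terminal if someone has won or no empty cell remains
    return winner(board) is not None or ' ' not in board

def terminal_reward(board):
    # +1 for an X win, -1 for an O win, 0 for a draw (X's viewpoint)
    return {'X': 1, 'O': -1, None: 0}[winner(board)]

drawn = "XOXXOOOXX"   # the drawn board from the example above
print(is_terminal(drawn), terminal_reward(drawn))  # True 0
```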
Summary
Term | Meaning |
---|---|
Rollout | Simulate a game to the end using random or heuristic moves |
Terminal Node | A game state where no moves are left (win/loss/draw) |
Example | Full Tic-Tac-Toe board with no empty spots → terminal draw state |
3. What if no unvisited nodes exist?
If all possible actions from a node have already been expanded and visited, then:
- Expansion is skipped for that node
- Selection simply continues deeper: UCB1 picks among the existing children, and the descent stops at a node that still has unvisited actions (which is expanded) or at a leaf, from which the rollout starts
In short:
No unvisited nodes ⇒ Skip expansion ⇒ Continue with simulation and backpropagation
4. What if the node is a terminal state?
If selection reaches a node whose game state is terminal (i.e., the game has ended):
Then:
- Skip expansion
- Skip simulation
You directly:
- Backpropagate the terminal reward (win/loss/draw)
Final Note
These four steps are repeated X times (where X is the number of iterations or simulations you want to run). Over time, the tree prioritizes better actions using a balance of exploration and exploitation, learning the optimal game policy.