Week 3 of SoK 2025
This week, I focused on integrating the Monte Carlo Tree Search (MCTS) algorithm into the MankalaEngine. The primary goal was to test the performance of the MCTS-based agent against various existing algorithms in the engine. Let's dive into what MCTS is, how it works, and what I discovered during the testing phase.
What is Monte Carlo Tree Search (MCTS)?
Monte Carlo Tree Search (MCTS) is a heuristic search algorithm for decision-making in sequential decision problems, such as turn-based games. It incrementally builds a search tree, running many randomized simulations to evaluate the potential outcomes of candidate moves. These simulations help the algorithm determine the most promising move to make.
How Does MCTS Work?
MCTS operates through four key steps:
1. Selection
The algorithm starts at the root node (representing the current game state) and traverses down the tree to a leaf node (an unexplored state). During this process, the algorithm selects child nodes using a specific strategy.
A popular strategy for node selection is Upper Confidence Bounds for Trees (UCT). The UCT formula helps balance exploration and exploitation by selecting nodes based on the following equation:
UCT = mean + C × sqrt(ln(N) / n)
Where:
- `mean` is the average reward (or outcome) of the node,
- `N` is the total number of simulations performed for the parent node,
- `n` is the number of simulations performed for the current child node,
- `C` is a constant that controls the level of exploration.
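To make the formula concrete, here is a minimal C++ sketch of how the selection step might score a child node. The `Node` struct is a hypothetical stand-in for illustration, not MankalaEngine's actual type:

```cpp
#include <cmath>
#include <limits>

// Hypothetical node statistics used by UCT scoring.
struct Node {
    double totalReward = 0.0; // sum of simulation outcomes through this node
    int visits = 0;           // n: number of simulations through this node
};

// UCT = mean + C * sqrt(ln(N) / n), with N the parent's visit count.
double uctValue(const Node& child, int parentVisits, double c = 1.414) {
    if (child.visits == 0) {
        // Unvisited children get an infinite score so they are tried first.
        return std::numeric_limits<double>::infinity();
    }
    const double mean = child.totalReward / child.visits;
    return mean + c * std::sqrt(std::log(parentVisits) / child.visits);
}
```

During selection, the algorithm descends the tree by repeatedly picking the child with the highest `uctValue`; `C ≈ √2` is a common default.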
2. Expansion
Once the algorithm reaches a leaf node, it expands the tree by adding one or more child nodes representing potential moves or decisions that can be made from the current state.
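As a rough illustration, an expansion step might look like the sketch below. The `GameState`, `Move`, and `applyMove` names are assumptions made for the example, not MankalaEngine's real API:

```cpp
#include <memory>
#include <vector>

// Hypothetical tree node; GameState and Move are assumed template
// parameters, not MankalaEngine types.
template <typename GameState, typename Move>
struct TreeNode {
    GameState state;
    TreeNode* parent = nullptr;
    std::vector<std::unique_ptr<TreeNode>> children;
    std::vector<Move> untriedMoves; // legal moves not yet in the tree
    double totalReward = 0.0;
    int visits = 0;
};

// Take one untried move, create the child node for the resulting
// position, and attach it to the tree.
template <typename GameState, typename Move>
TreeNode<GameState, Move>* expand(TreeNode<GameState, Move>& leaf) {
    Move m = leaf.untriedMoves.back();
    leaf.untriedMoves.pop_back();
    auto child = std::make_unique<TreeNode<GameState, Move>>();
    child->state = leaf.state.applyMove(m); // assumed to return the new state
    child->parent = &leaf;
    leaf.children.push_back(std::move(child));
    return leaf.children.back().get();
}
```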
3. Simulation
The algorithm then performs a simulation or rollout from the newly added child node. During this phase, the algorithm randomly plays out a series of moves (typically following a simple strategy) until the game reaches a terminal state (i.e., win, loss, or draw).
This is where the Monte Carlo aspect of MCTS shines. By simulating many random games, the algorithm gains insights into the likely outcomes of different actions.
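A uniformly random rollout can be sketched as follows, again assuming a hypothetical `GameState` interface (`isTerminal()`, `legalMoves()`, `applyMove()`, `result()`) rather than the engine's actual one:

```cpp
#include <random>

// Play random moves until the game ends and report the outcome from
// the given player's perspective (e.g. 1 = win, 0.5 = draw, 0 = loss).
template <typename GameState, typename Player>
double randomRollout(GameState state, Player perspective, std::mt19937& rng) {
    while (!state.isTerminal()) {
        auto moves = state.legalMoves();
        std::uniform_int_distribution<std::size_t> pick(0, moves.size() - 1);
        state = state.applyMove(moves[pick(rng)]); // uniformly random move
    }
    return state.result(perspective);
}
```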
4. Backpropagation
After the simulation ends, the results are propagated back up the tree, updating the nodes with the outcome of the simulation. This allows the algorithm to adjust the expected rewards of the parent nodes based on the result of the child node’s simulation.
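In code, backpropagation is a simple walk back up to the root. This sketch uses a hypothetical node reduced to just the fields the update touches:

```cpp
// Minimal node with only the fields backpropagation needs.
struct Node {
    Node* parent = nullptr;
    double totalReward = 0.0;
    int visits = 0;
};

// Walk from the simulated node back to the root, updating statistics.
void backpropagate(Node* node, double reward) {
    while (node != nullptr) {
        node->visits += 1;
        node->totalReward += reward;
        // In a two-player game the players alternate between tree levels,
        // so the reward is flipped to keep each node scored from the
        // perspective of the player to move there.
        reward = 1.0 - reward;
        node = node->parent;
    }
}
```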
Read more here: MCTS
Implementing MCTS in C++ for MankalaEngine
With a solid understanding of the algorithm, I began implementing MCTS in C++. The first step was integrating the MCTS logic into MankalaEngine's benchmark utility. After I resolved a series of issues and ran multiple tests, the code worked as expected.
Testing Results
I compared the performance of the MCTS agent against other existing agents in the MankalaEngine, such as Minimax, MTDF, and Random agents. Here’s how the MCTS agent performed:
Random Agent (Player 1) vs. MCTS (Player 2)
- MCTS won 80% of the time
MCTS (Player 1) vs. Random Agent (Player 2)
- MCTS won 60% of the time
MCTS vs. Minimax & MTDF
- Unfortunately, MCTS consistently lost against both the Minimax and MTDF agents. 😞
Key Improvements for MCTS
While MCTS performed well against the Random Agent, there is still room for improvement, especially in its simulation phase. Currently, the algorithm uses a random policy for simulations, which can be inefficient. To improve performance, we can:
- Use more efficient simulation policies that focus on promising moves rather than selecting moves uniformly at random (see the sketch after this list).
- At the start of the Selection step, focus on moves that have historically been good opening strategies (this requires further research to identify these moves, especially in Pallanguli).
- Fine-tune the exploration-exploitation balance to improve decision-making.
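As an example of the first idea, the rollout policy can be biased toward promising moves while keeping some randomness. The sketch below uses an epsilon-greedy rule; `heuristicScore` is a hypothetical game-specific evaluation (for a mancala game it might count seeds captured by a move), not an existing MankalaEngine function:

```cpp
#include <random>
#include <vector>

// Pick the heuristically best move most of the time, and a random move
// with probability epsilon. heuristicScore(state, move) is assumed to
// exist as a cheap, game-specific evaluation.
template <typename GameState, typename Move>
Move epsilonGreedyMove(const GameState& state,
                       const std::vector<Move>& moves,
                       std::mt19937& rng, double epsilon = 0.1) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < epsilon) {
        std::uniform_int_distribution<std::size_t> pick(0, moves.size() - 1);
        return moves[pick(rng)]; // occasional exploratory move
    }
    Move best = moves.front();
    double bestScore = heuristicScore(state, best);
    for (const Move& m : moves) {
        const double s = heuristicScore(state, m);
        if (s > bestScore) {
            bestScore = s;
            best = m;
        }
    }
    return best;
}
```

Rollouts guided this way usually give more realistic outcome estimates than purely random play, at the cost of a little extra computation per simulated move.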
Upcoming Tasks
In the upcoming week, I plan to:
- Write test cases for the Pallanguli implementation.
- Review the implementation of Pallanguli.