Wrapping Up My SoK Journey
Introduction -
Over the last 10 weeks, I had the opportunity to contribute to MankalaEngine by exploring and integrating new algorithms for gameplay, as well as working on adding the Pallanguli variant to the engine. My journey involved researching various algorithms such as Monte Carlo Tree Search (MCTS), implementing Q-learning, an ML-based approach, and evaluating their performance against MankalaEngine's existing algorithms. I also assisted in reviewing the implementation of the Pallanguli variant.
Implementing and Testing MCTS
I first explored Monte Carlo Tree Search (MCTS) and implemented it in MankalaEngine. To assess its effectiveness, I tested it against the existing algorithms, such as Minimax and MTDF, which operate at depth 7 before each move.
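For readers unfamiliar with MCTS, the heart of its tree policy is the UCB1 selection rule, which trades off a child node's observed win rate against how rarely it has been visited. The snippet below is a minimal, self-contained sketch of just that selection step; the `Node` struct and `selectChild` function are illustrative assumptions and are not taken from MankalaEngine's actual implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <iostream>
#include <limits>
#include <vector>

// Minimal UCT (UCB1) selection used in an MCTS tree policy.
struct Node {
    int wins = 0;    // wins observed in simulations through this node
    int visits = 0;  // number of simulations through this node
};

// Returns the index of the child maximising
// wins/visits + c * sqrt(ln(parentVisits) / visits).
std::size_t selectChild(const std::vector<Node>& children, int parentVisits,
                        double c = 1.41421356)
{
    std::size_t best = 0;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < children.size(); ++i) {
        const Node& ch = children[i];
        // Unvisited children are explored first.
        double score = ch.visits == 0
            ? std::numeric_limits<double>::infinity()
            : static_cast<double>(ch.wins) / ch.visits
              + c * std::sqrt(std::log(parentVisits) / ch.visits);
        if (score > bestScore) {
            bestScore = score;
            best = i;
        }
    }
    return best;
}

int main()
{
    // Three children with (wins, visits); the rarely visited one wins selection.
    std::vector<Node> children = {{3, 10}, {6, 10}, {0, 1}};
    std::cout << "Selected child: " << selectChild(children, 21) << '\n';
}
```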
MCTS Performance Results -
| Player 1 | Player 2 | MCTS Win Rate |
|---|---|---|
| Random | MCTS | 80% |
| MCTS | Random | 60% |
| Minimax | MCTS | 0% |
| MCTS | Minimax | 0% |
| MTDF | MCTS | 0% |
| MCTS | MTDF | 0% |
The results were not good enough. This was expected, since the existing Minimax and MTDF algorithms are strong and search to depth 7 before each move.
Moving to Machine Learning: Implementing Q-Learning
Given MCTS's poor performance against strong agents, I explored Machine Learning (ML) techniques, specifically Q-Learning, a reinforcement learning algorithm. After learning its mechanics, I implemented and trained a Q-learning agent in MankalaEngine, testing it against existing algorithms.
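At its core, Q-learning maintains a table of state-action values and nudges each entry toward the observed reward plus the discounted value of the best follow-up move. The snippet below is a minimal sketch of that tabular update; the hashed state key, the six-action layout, and the `updateQ` helper are assumptions for illustration and do not mirror the agent's actual code in MankalaEngine.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Tabular Q-learning update:
//   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
// States are identified here by a hashed board key; 6 actions = 6 pits to sow from.
constexpr int kActions = 6;
using QRow = std::array<double, kActions>;
using QTable = std::unordered_map<std::uint64_t, QRow>;

void updateQ(QTable& q, std::uint64_t state, int action, double reward,
             std::uint64_t nextState, double alpha = 0.1, double gamma = 0.9)
{
    // Look up the best value of the next state first (creates a zero row on first visit).
    double bestNext = 0.0;
    {
        const QRow& nextRow = q[nextState];
        bestNext = *std::max_element(nextRow.begin(), nextRow.end());
    }
    QRow& row = q[state];
    row[action] += alpha * (reward + gamma * bestNext - row[action]);
}

int main()
{
    QTable q;
    updateQ(q, /*state=*/42, /*action=*/2, /*reward=*/1.0, /*nextState=*/77);
    std::cout << "Q(42, 2) = " << q[42][2] << '\n';  // 0.1 after one win-reward update
}
```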
Q-Learning Performance Results -
| Player 1 | Player 2 | Q-Learning Win Rate |
|---|---|---|
| Random | Q-Learning | 100% |
| Q-Learning | Random | 98% |
| Minimax | Q-Learning | 100% |
| Q-Learning | Minimax | 0% |
| MTDF | Q-Learning | 100% |
| Q-Learning | MTDF | 10% |
Q-learning showed significant improvement, defeating existing algorithms in most cases. However, it still had weaknesses.
Techniques Explored to Improve Q-Learning Results:
To improve performance, I experimented with the following techniques:
- Using epsilon decay to balance exploration (random moves) and exploitation (using learned strategies); see the sketch after this list.
- Increasing the reward for wins to reinforce successful strategies.
- Training the Q-learning agent against Minimax and MTDF rather than only against itself.
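As a rough illustration of the epsilon-decay idea mentioned above, the sketch below starts fully exploratory and multiplies epsilon by a decay factor each episode until it reaches a floor. The specific constants and the bare training loop are placeholders, not the values or code used in my actual training runs.

```cpp
#include <algorithm>
#include <iostream>
#include <random>

// Epsilon-greedy schedule: start exploratory, decay toward mostly-greedy play.
int main()
{
    std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> coin(0.0, 1.0);

    double epsilon = 1.0;          // start fully exploratory
    const double decay = 0.999;    // multiplicative decay per episode
    const double epsilonMin = 0.05;

    for (int episode = 0; episode < 10000; ++episode) {
        bool explore = coin(rng) < epsilon;  // random move vs. greedy (learned) move
        (void)explore;                       // ...would drive move selection in training
        epsilon = std::max(epsilonMin, epsilon * decay);
    }
    std::cout << "final epsilon: " << epsilon << '\n';  // ~0.05 after 10k episodes
}
```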
Despite these improvements, Q-learning still could not consistently outperform all existing algorithms.
After these experiments and research, I believe more advanced algorithms such as DQN or Double DQN are needed to outperform all of the existing algorithms. This would also be an exciting project for this summer.
Work Related to Integrating the Pallanguli Variant
Apart from exploring ML algorithms, I also worked on integrating the Pallanguli variant of the Mancala game into MankalaEngine. My contributions included:
- Reviewing Srisharan’s code, suggesting fixes, and taking part in discussions.
- Creating a Merge Request (MR) that allows users to input custom initial counters for Pallanguli; see the sketch below.
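To give an idea of what accepting custom initial counters involves, here is a minimal, hypothetical sketch of validating a user-supplied board setup. The function name, the 14-pit constant, and the error handling are assumptions for illustration only and are not the code from the actual Merge Request.

```cpp
#include <cstddef>
#include <iostream>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical validation of user-supplied initial counters for a Pallanguli board
// with 14 pits (7 per player). Names and checks are illustrative, not MankalaEngine API.
std::vector<int> makeInitialBoard(const std::vector<int>& counters)
{
    constexpr std::size_t kPits = 14;
    if (counters.size() != kPits) {
        throw std::invalid_argument("expected one counter value per pit");
    }
    for (int c : counters) {
        if (c < 0) {
            throw std::invalid_argument("counters must be non-negative");
        }
    }
    return counters;
}

int main()
{
    // A common Pallanguli setup places 5 counters in each pit.
    std::vector<int> board = makeInitialBoard(std::vector<int>(14, 5));
    std::cout << "total counters: "
              << std::accumulate(board.begin(), board.end(), 0) << '\n';  // 70
}
```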
Conclusion -
This journey has been an incredible learning experience, and I am grateful for the guidance of my mentors, Benson Muite and João Gouveia, who were always there to help.
I look forward to continuing my contributions to the KDE Community, as I truly love the work being done here.
Thank you to the KDE Community for this amazing opportunity!