AlphaSnake
This is a snake game engine written in Rust, which achieved 7th place in the ByteFight tournament! Because of game rules, the Rust code is called by a thin python wrapper, with interop handled by the maturin and pyo3 crates.
Algorithm
Our algorithmn based on Learning to Play Two-Player Perfect-Information Games without Knowledge’s UBFMs algorithmn, without the neural network related parts of it.
Initially, we did try to train a neural network to play the game, which you can see in our git history and in the python/scripts/genetic.py and python/scripts/train.py files. We experimented with this network and algorithmn using simplified games like Connect 2 and Connect 4, where we trained a convolutional neural network on the whole board state, which was used to classify the state of each board. As the snake game was much more complicated with a much larger board, we decided to drop the convolutional neural network idea, and instead precalculate various heuristics from the board, which we would then feed into a neural network to classify the state of the board. We experimented with this approach, using different methods of training including a genetic algorithmn (on a 96 core GCP machine) based on the competition algorithmn, and training the network based on resolved states using the training algorithmn described in the paper, but couldn’t get a model to converge at a spot we were happy with. This is why our current approach is just a hand trained and optimized heuristic function.
Our current heuristic is relatively simple. It’s based on:
- the length of each snake
- how close apples are to each snake
- how fast the snake can be “captured” (e.g. left with no moves) by the other snake
We think our algorithmn is one of the best in the competition at long term planning and decision making due to the unbounded nature of the UBFMs algorithmn. However, we’re susceptible to enemy snakes oneshotting our snake by going on long move chains and encircling us. This is definitely solvable by further heuristic tuning, but we ran out of time.
Here are some improvements we ran out of time for:
- running a default minimax algorithmn on a seperate thread to help us evade oneshots from the enemy.
- replacing the hand tuned heuristic with a genetically trained neural network.
- various simd and cache line locality optimizations in our board representation and heuristic function.