Perfecting an AI Agent for Deathchase
Yet more adventures in teaching an AI to play ZX Spectrum games.
Continuing on from the last post, and the post before that, I've been teaching AIs to play ZX Spectrum games. The focus has been on Deathchase, a racing/shooter game, and I've made good progress.
As mentioned in the last post, I'd decided to move away from Ray because I was hitting too many bugs. Since that post, someone from Anyscale, the company behind Ray, got in touch asking for a chat about how I use their framework. I agreed and mentioned the bugs I'd hit. The meeting never happened: they cancelled on me twice, once at short notice after I'd taken time off for it. That didn't exactly convince me to stick with Ray. I did get one of the bugs fixed by submitting a patch myself, but it took so long to get merged that I gave up on Ray completely.
I tried a few other frameworks but didn't find one that worked for me. They were either too slow, too old or not flexible enough. So I decided to write my own. And after a lot of work I've had some decent success.
The framework is written in Python, using PyTorch for most of the AI heavy lifting. It outputs data in TensorBoard format, which makes it nice and easy to view training statistics. It supports ONNX export, so I can load the trained models into my ZX Spectrum emulator and run them there.
The core emulation is still written in C#, but now I have much more control over how environments run in parallel. Python isn't great at multithreading but C# is, which makes it easy to run environments in parallel without keeping them in lockstep. (Gymnasium, and the frameworks that build on it, appear to only run parallel environments in lockstep, unless I've missed something.) This has given me high throughput: hundreds of thousands of steps per minute.
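The difference between lockstep and independent environments can be sketched with a worker-per-environment layout. This is a rough Python illustration only; the real emulation runs in C#, and in CPython the GIL would serialise CPU-bound emulation, which is exactly why Python isn't the right tool here. `ToyEnv`, `worker` and `collect` are hypothetical names: each worker steps its own environment at its own pace and pushes transitions onto a shared queue, so a slow environment never holds up the others.

```python
import queue
import random
import threading
import time


class ToyEnv:
    """Hypothetical stand-in for an emulated game (not a real emulator API)."""

    def __init__(self, seed: int, episode_len: int = 10):
        self.rng = random.Random(seed)
        self.episode_len = episode_len
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int):
        # Simulate an uneven per-step cost so the environments drift out of sync.
        time.sleep(self.rng.uniform(0.0, 0.001))
        self.t += 1
        done = self.t >= self.episode_len
        return self.t, 1.0, done


def worker(env_id: int, n_steps: int, out_q: queue.Queue) -> None:
    """Step one environment independently, publishing every transition."""
    env = ToyEnv(seed=env_id)
    obs = env.reset()
    for _ in range(n_steps):
        action = 0  # a real worker would query the current policy here
        next_obs, reward, done = env.step(action)
        out_q.put((env_id, obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs


def collect(num_envs: int = 4, n_steps: int = 25) -> list:
    """Run all workers with no shared step barrier, then drain the queue."""
    out_q: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, n_steps, out_q))
        for i in range(num_envs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    transitions = []
    while not out_q.empty():
        transitions.append(out_q.get())
    return transitions


if __name__ == "__main__":
    batch = collect()
    print(len(batch), "transitions; env order:", [t[0] for t in batch[:8]])
```

In a real trainer the learner would consume the queue while the workers keep running; draining it after `join()` just keeps the sketch short.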
The first algorithm I added was DQN, because I'd had success with it in Ray. After a lot of tweaking I got an agent that would win (i.e. complete all 8 sectors without losing a life) 80% of the time. It turns out the reason I couldn't get higher than that was the frame skip I was using.
Frame skip means running the model on one frame and then skipping several frames before running it again. We stack the last n observed frames into a single observation for the model, so it has some sense of time and can see how things in the game are moving. Frame skip therefore gives us a longer window of time for the same size of frame stack, and it also reduces compute because the model runs less often. I was skipping 4 frames for Deathchase; reducing this to 2 let the model win 98% of the time. At 1/50th of a second per frame, each action the model takes now covers 3 frames (1 observed and 2 skipped), or about 60ms. I guess that means a human would have to be that fast to reach the same level, which seems harsh given that the average human reaction time is at least 150ms...
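Frame skip and frame stacking can be sketched as a small wrapper around an environment. This is a minimal Python illustration, not the author's framework: `FrameSkipStack`, the toy `CounterEnv` and the `reset()`/`step()` interface are hypothetical, the stack size of 4 is an assumed value for n, and it assumes the chosen action is repeated during the skipped frames (a common convention; the post doesn't say what happens on those frames). The helper functions reproduce the timing arithmetic at 50 frames per second.

```python
from collections import deque

FRAME_MS = 1000 / 50  # one ZX Spectrum frame at 50 Hz = 20 ms


class FrameSkipStack:
    """Hypothetical wrapper: the model sees one frame, its action is held for
    `skip` further frames, and each observation is the last `stack` seen frames."""

    def __init__(self, env, skip=2, stack=4):
        self.env = env
        self.skip = skip
        self.frames = deque(maxlen=stack)

    def reset(self):
        frame = self.env.reset()
        # Fill the stack with copies of the first frame.
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return tuple(self.frames)

    def step(self, action):
        total_reward, done = 0.0, False
        # 1 + skip emulator frames per model decision (assumed: action repeated).
        for _ in range(1 + self.skip):
            frame, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        self.frames.append(frame)  # only frames the model actually sees are stacked
        return tuple(self.frames), total_reward, done


class CounterEnv:
    """Toy environment whose 'frame' is just the frame number."""

    def reset(self):
        self.n = 0
        return self.n

    def step(self, action):
        self.n += 1
        return self.n, 0.0, False


def action_interval_ms(skip):
    """Game time covered by one model decision."""
    return (1 + skip) * FRAME_MS


def stack_span_ms(skip, stack):
    """Game time between the oldest and newest frames in one observation."""
    return (stack - 1) * action_interval_ms(skip)


if __name__ == "__main__":
    env = FrameSkipStack(CounterEnv(), skip=2, stack=4)
    env.reset()
    env.step(0)
    obs, _, _ = env.step(0)
    print(obs)                    # (0, 0, 3, 6): every third frame is seen
    print(action_interval_ms(2))  # 60.0
    print(action_interval_ms(4))  # 100.0
```

Going from `skip=4` to `skip=2` shortens each decision from 100ms to 60ms, at the cost of the stack spanning less game time (180ms instead of 300ms with a stack of 4).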
The agent takes about 12 hours to train on my PC to reach the 98% level. At that point it has learnt to avoid trees but nothing more than that. It shoots constantly in the direction it's facing, occasionally hitting enemies by pure luck:
If we keep training, the agent gets better, learning to steer towards the enemies and fire at them. However, it loses track of enemies that go off the screen, because it has no memory beyond the stack of frames. Luckily, in Deathchase the enemies are never off the screen for long:
The first layers of the model behind the agent are CNN (convolutional) layers. Each layer slides a set of filters over its input images; every filter covers all of the stacked frames at once, so it combines them into a single output image (a feature map) per filter. Those outputs can then be fed into another CNN layer for further processing. The model I used has several of these layers with different filter sizes, to pick out different details in the input. During training the model learns the filter weights, in effect which patterns are worth looking for. I've enhanced my emulator to show the input frame stack as well as a visualisation of the CNN layers, so you can see what the model 'sees'. As well as helping with debugging, it looks quite cool:
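To make the "combining the stacked frames" step concrete, here's a small NumPy sketch of one convolutional layer's forward pass, written as cross-correlation (which is what PyTorch's `Conv2d` actually computes). The frame size, filter counts, filter sizes and stride are made-up values for illustration, not the author's model, and `conv2d_forward` is a hypothetical helper: each filter has one slice per stacked frame, and summing over those slices is what collapses the stack into one feature map per filter.

```python
import numpy as np


def conv2d_forward(x, weights, bias=None, stride=1):
    """Valid (unpadded) multi-channel cross-correlation.

    x:       (C, H, W)       C input channels, e.g. C stacked frames
    weights: (F, C, kh, kw)  F filters, each spanning all C channels
    returns: (F, H_out, W_out), one feature map per filter
    """
    c, h, w = x.shape
    f, wc, kh, kw = weights.shape
    if c != wc:
        raise ValueError("each filter needs one slice per input channel")
    h_out = (h - kh) // stride + 1
    w_out = (w - kw) // stride + 1
    out = np.zeros((f, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Sum over channels AND the kernel window: every stacked frame
            # contributes to a single number per filter at this position.
            out[:, i, j] = np.tensordot(weights, patch, axes=([1, 2, 3], [0, 1, 2]))
    if bias is not None:
        out += bias[:, None, None]
    return out


def relu(x):
    return np.maximum(x, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((4, 12, 16))             # 4 stacked 12x16 frames (toy size)
    layer1 = rng.standard_normal((8, 4, 3, 3))   # 8 filters of 3x3
    layer2 = rng.standard_normal((16, 8, 2, 2))  # 16 filters of 2x2 on layer 1's maps
    h1 = relu(conv2d_forward(frames, layer1, stride=2))
    h2 = relu(conv2d_forward(h1, layer2))
    print(h1.shape, h2.shape)  # (8, 5, 7) (16, 4, 6)
```

A filter that weights the newest frame +1 and the oldest frame -1 responds only where something changed between them, which is one way a CNN over a frame stack can pick up movement without any memory of its own.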
I'm now considering Deathchase done. The more eagle-eyed amongst you will notice the title is about 'perfecting' Deathchase, and I've only reached 98%. What can I tell you? It was clickbait, pure and simple. I could probably get it to 100%, but 98% is good enough. It can play the game for a long time, so long that the score display becomes corrupted, because scores that high were never anticipated:
I've since moved on to other algorithms and other games. I have a working PPO implementation and have played about with a few more games:
Batty - First attempt, not great
Green Beret - Good at fighting but not so much at exploring
But the game I've been concentrating on is Manic Miner, as it's one of the all-time classic ZX Spectrum games. Unfortunately, Manic Miner is hard to train an agent on. The game has very sparse rewards, which is not ideal for training an AI: the items to collect are far apart, and it's a long way from them to the exit that completes the level. That means very few positive signals for training to focus on. I've had to pull out various tricks to complement the PPO algorithm, and recently had some success with an agent that completes the first cavern:
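Why sparse rewards hurt can be seen in the discounted return, the quantity that RL algorithms such as DQN and PPO try to maximise. Here's a minimal Python sketch; `discounted_returns` and `sparse_episode` are hypothetical helpers, and the post doesn't say which discount factor γ is used, so the 0.99 default below is only a common choice. With a single reward at the end of a long stretch, every earlier step only sees a geometrically shrinking trace of it, and only if the agent happens to reach that reward at all.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the discounted future reward.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def sparse_episode(length, reward=1.0):
    """Hypothetical episode: nothing until a single reward on the last step."""
    return [0.0] * (length - 1) + [reward]


if __name__ == "__main__":
    for length in (10, 100, 500):
        g0 = discounted_returns(sparse_episode(length))[0]
        print(f"reward {length - 1} steps away is worth {g0:.3f} at the start")
```

With γ = 0.99, a reward 99 steps away is worth only about 0.37 of its face value by the time it reaches the first step.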
Not the most efficient route in the world, but it does complete the level. It took a lot of work to get to this point! But that's a topic for my next blog post.
The source code isn't available yet, I'm afraid, but it will be. I've been using AI heavily for prototyping, which has let me try out new ideas much faster than I could have otherwise. The downside is that I now have a lot of AI slop to rewrite...