dipping my toes into reinforcement learning

Nov 25, 2016

I recently came across this great introductory tutorial on using policy gradient reinforcement learning to play Battleship. It was fun to read and easy to play with, so I figured I would spend a little time adapting it to a slightly more challenging game called ‘Plane Strike’.

‘Plane Strike’ is basically a slightly harder game because the target has a pattern, which we humans are pretty good at detecting and exploiting. Here is a simple example of the ‘plane’ target:

Given the board size (6x6) and the straightforward target shape, I’m pretty sure any half-decent programmer could code up a feature-detection function to play the game well. But what if the board is 10x10 and looks like this:

What if it’s 100x100 and there are 30 targets of different shapes? Manual feature engineering clearly wouldn’t scale. So we need machine learning here, and its best part is that no manual feature engineering is required (although coming up with a good reward is a real challenge; luckily for me, the one used in the Battleship article works pretty well).
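To make the policy gradient idea concrete, here is a minimal sketch of a REINFORCE learner for the game in plain Python. It is not the source code linked in this post: the plane shape (`PLANE`), the linear softmax policy (`LinearSoftmaxPolicy`), the discount `gamma=0.5`, and the learning rate are all assumptions for illustration. The per-strike reward is also an assumption: 1 for a hit (0 for a miss) minus the probability that a uniformly random unstruck cell would have been a hit, so a strike only earns credit for beating random guessing.

```python
import math
import random

N = 6  # the board is N x N; cells are numbered 0 .. N*N - 1
# Hypothetical 8-cell plane as (row, col) offsets; the real shape is in the post's figure.
PLANE = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (3, 0), (3, 1), (3, 2)]


def random_plane(rng):
    """Hidden target: PLANE in a random orientation and position, as cell indices."""
    cells = PLANE
    for _ in range(rng.randrange(4)):
        max_r = max(r for r, _ in cells)
        cells = [(c, max_r - r) for r, c in cells]  # rotate 90 degrees
    h = max(r for r, _ in cells) + 1
    w = max(c for _, c in cells) + 1
    r0, c0 = rng.randrange(N - h + 1), rng.randrange(N - w + 1)
    return {(r0 + r) * N + (c0 + c) for r, c in cells}


class LinearSoftmaxPolicy:
    """pi(a | s) = softmax over unstruck cells a of b[a] + sum_j W[a][j] * s[j]."""

    def __init__(self, lr=0.01):
        self.W = [[0.0] * (N * N) for _ in range(N * N)]
        self.b = [0.0] * (N * N)
        self.lr = lr

    def probs(self, state):
        """state maps struck cells to +1 (hit) or -1 (miss); other cells count as 0."""
        legal = [a for a in range(N * N) if a not in state]
        logits = [self.b[a] + sum(self.W[a][j] * v for j, v in state.items())
                  for a in legal]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return legal, [e / total for e in exps]


def play_episode(policy, rng):
    """Play one game; return a list of (state, action, legal, probs, reward)."""
    target = random_plane(rng)
    state, trajectory, hits_left = {}, [], len(target)
    while hits_left:
        legal, p = policy.probs(state)
        a = rng.choices(legal, weights=p)[0]
        hit = a in target
        # Assumed reward: hit (1 or 0) minus a random guess's chance of a hit.
        reward = (1.0 if hit else 0.0) - hits_left / len(legal)
        trajectory.append((dict(state), a, legal, p, reward))
        state[a] = 1 if hit else -1
        hits_left -= hit
    return trajectory


def reinforce_update(policy, trajectory, gamma=0.5):
    """Gradient ascent on sum_t G_t * log pi(a_t | s_t) (vanilla REINFORCE)."""
    g, returns = 0.0, []
    for *_, r in reversed(trajectory):
        g = r + gamma * g  # discounted return-to-go
        returns.append(g)
    returns.reverse()
    for (state, a, legal, p, _), g in zip(trajectory, returns):
        for k, pk in zip(legal, p):
            # d log softmax(a) / d logit_k = 1[k == a] - pi_k
            step = policy.lr * g * ((1.0 if k == a else 0.0) - pk)
            policy.b[k] += step
            for j, v in state.items():
                policy.W[k][j] += step * v


def train(episodes=2000, seed=0):
    """Alternate playing and updating; return the policy and each game's length."""
    rng = random.Random(seed)
    policy, lengths = LinearSoftmaxPolicy(), []
    for _ in range(episodes):
        trajectory = play_episode(policy, rng)
        reinforce_update(policy, trajectory)
        lengths.append(len(trajectory))
    return policy, lengths


if __name__ == "__main__":
    _, lengths = train()
    for end in range(500, len(lengths) + 1, 500):
        window = lengths[end - 50:end]  # sliding window of 50 games
        print(f"games {end}: average length {sum(window) / len(window):.1f}")
```

The weights `W[a][j]` let a hit or miss at cell `j` raise or lower the preference for cell `a`, which is the kind of shape pattern the game rewards. A linear policy is much weaker than a neural network, and how fast it improves depends heavily on the assumed reward and learning rate, so treat any numbers it prints as illustrative.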

Anyway, I tried the policy gradient approach (source code), and it seems to work pretty well on the simple 6x6 Plane Strike. Here is a benchmark of average game lengths (sliding window size = 50):

Legend: RL = reinforcement learning; NS = (randomized) neighborhood search; Random = pure random search.
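The two baselines in the benchmark can be sketched as follows. The post does not define ‘(randomized) Neighborhood Search’ precisely, so the version here is an assumption: strike a random unstruck orthogonal neighbor of a known hit when one exists, otherwise a random unstruck cell. The `PLANE` shape is likewise a hypothetical 8-cell stand-in for the shape in the post's figure.

```python
import random

N = 6  # the board is N x N
# Hypothetical 8-cell plane as (row, col) offsets; the real shape is in the post's figure.
PLANE = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (3, 0), (3, 1), (3, 2)]


def random_plane(rng):
    """Place PLANE in a random orientation and position; return its cells."""
    cells = PLANE
    for _ in range(rng.randrange(4)):
        max_r = max(r for r, _ in cells)
        cells = [(c, max_r - r) for r, c in cells]  # rotate 90 degrees
    h = max(r for r, _ in cells) + 1
    w = max(c for _, c in cells) + 1
    r0, c0 = rng.randrange(N - h + 1), rng.randrange(N - w + 1)
    return {(r0 + r, c0 + c) for r, c in cells}


def random_search(target, rng):
    """Strike cells in a uniformly random order; return the game length."""
    order = [(r, c) for r in range(N) for c in range(N)]
    rng.shuffle(order)
    return 1 + max(order.index(cell) for cell in target)


def neighborhood_search(target, rng):
    """Strike a random unstruck neighbor of a known hit if any, else a random cell."""
    all_cells = [(r, c) for r in range(N) for c in range(N)]
    struck, frontier, hits = set(), [], 0
    while hits < len(target):
        # Cells next to several hits appear in frontier more than once,
        # so they are a bit more likely to be chosen.
        candidates = [x for x in frontier if x not in struck]
        if not candidates:
            candidates = [x for x in all_cells if x not in struck]
        cell = rng.choice(candidates)
        struck.add(cell)
        if cell in target:
            hits += 1
            r, c = cell
            frontier += [(r + dr, c + dc)
                         for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                         if 0 <= r + dr < N and 0 <= c + dc < N]
    return len(struck)


def average_length(strategy, games=2000, seed=0):
    """Average game length of a strategy over randomly placed planes."""
    rng = random.Random(seed)
    return sum(strategy(random_plane(rng), rng) for _ in range(games)) / games


if __name__ == "__main__":
    for name, strategy in (("Random", random_search), ("NS", neighborhood_search)):
        print(f"{name:6s} {average_length(strategy):.1f}")
```

Pure random search has a closed-form check: the last of k target cells in a random ordering of n cells lands at position k(n + 1)/(k + 1) on average, about 32.9 for k = 8 and n = 36, which the simulation should reproduce.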

So RL was able to get the average game length down to around 10.8 strikes, which is pretty good: it means fewer than 3 wrong guesses per game, on average. Not bad. I’m actually not sure I can beat it anymore.
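The arithmetic behind that claim: a game ends exactly when the last plane cell is hit, so the strikes in a game are the n plane cells plus the misses. Taking n = 8 (an assumption; the post does not state the plane's size, but it is consistent with the ‘fewer than 3’ figure), with average game length L̄ and average misses M̄:

$$\bar{M} = \bar{L} - n = 10.8 - 8 = 2.8 < 3$$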

One cool thing is that it figured out that its first strike should always be one of the 4 center cells, which makes a lot of sense if you think about it: however you place your plane, there is a really good chance it will cover at least one of the 4 center cells.
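The center-cell intuition can be checked by brute force: enumerate every legal placement of the plane on a 6x6 board and count how many placements cover each cell. The `PLANE` shape below is a hypothetical 8-cell plane (the real one is in the post's figure), and placements are assumed to allow all four 90-degree rotations.

```python
from itertools import product

BOARD_SIZE = 6
# Hypothetical 8-cell plane as (row, col) offsets; the real shape is in the post's figure.
PLANE = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (3, 0), (3, 1), (3, 2)]


def rotate(cells):
    """Rotate a shape 90 degrees clockwise, keeping it anchored at (0, 0)."""
    max_r = max(r for r, _ in cells)
    return sorted((c, max_r - r) for r, c in cells)


def orientations(shape):
    """Return the distinct 90-degree rotations of a shape."""
    seen, result = set(), []
    cur = sorted(shape)
    for _ in range(4):
        key = tuple(cur)
        if key not in seen:
            seen.add(key)
            result.append(cur)
        cur = rotate(cur)
    return result


def placements(shape, n=BOARD_SIZE):
    """Every way to put the shape fully on an n x n board, as sets of cells."""
    out = []
    for cells in orientations(shape):
        h = max(r for r, _ in cells) + 1
        w = max(c for _, c in cells) + 1
        for r0, c0 in product(range(n - h + 1), range(n - w + 1)):
            out.append(frozenset((r0 + r, c0 + c) for r, c in cells))
    return out


def coverage(shape, n=BOARD_SIZE):
    """Count how many placements cover each cell of the board."""
    counts = [[0] * n for _ in range(n)]
    for placement in placements(shape, n):
        for r, c in placement:
            counts[r][c] += 1
    return counts


if __name__ == "__main__":
    all_placements = placements(PLANE)
    for row in coverage(PLANE):
        print(" ".join(f"{x:2d}" for x in row))
    centers = {(2, 2), (2, 3), (3, 2), (3, 3)}
    touching = sum(1 for p in all_placements if p & centers)
    print(f"{touching}/{len(all_placements)} placements touch a center cell")
```

For this hypothetical shape there are 48 placements, and each of the 4 center cells is covered by 24 of them, more than any other cell, so if every placement is equally likely a first strike there hits half the time.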

Here is a real game played by the algorithm:

The green circles are the ‘hits’ and the red ones are the ‘misses’. The misses actually make a lot of sense. Here is why:

So that was my quick bit of fun with this little game. I also made a voice game for Google Home based on it; you can try it by saying ‘OK Google, talk to Plane Strike’.


Written by Wayne Wei