This clever AI figured out that the only winning move is not to play

Our greatest challenge in machine learning is to clearly explain to an AI what we want to achieve.

For this, we use a reward function.

A reward function is a mathematical expression that tells an AI whether it’s doing a good job. The function measures the difference between the AI’s predictions and the expected results.

So if I want an AI to predict today’s temperature, I would give it a reward function based on the difference between its predictions and the actual temperatures: the smaller the difference, the greater the reward.

During training, I would reward great predictions and penalize bad ones. Over time, the AI will learn how to consistently make great predictions.
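Here’s a minimal sketch of what such a reward function could look like in Python. The negative absolute error used below is just one common choice, not the only one:

```python
def temperature_reward(predicted_temp: float, actual_temp: float) -> float:
    """A perfect prediction scores 0; every degree of error costs a point."""
    return -abs(predicted_temp - actual_temp)

print(temperature_reward(21.5, 20.0))  # off by 1.5 degrees -> reward -1.5
print(temperature_reward(20.0, 20.0))  # spot on -> maximum reward of 0.0
```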

So far so good. But what if we have a more complex scenario?

Consider the game CoastRunners.

The objective in the game is to race around a track in a speedboat, stay ahead of the competition, and collect as many powerups as possible.

You would probably think that a good reward function is to get the highest possible game score, right?

Researchers at OpenAI put this to the test. They trained a machine learning system to play the game and aim for the highest possible score.

What happened next blew their minds:

The AI ignored the race, took its boat into the clearing in the middle of the lake, and turned in a circle over and over again.

Check it out: the boat is completely outside the race track, it’s going the wrong way, and it keeps catching fire because it crashes into the pier and into other boats over and over.

But the boat is also picking up 3 powerups during each cycle!

And this strategy is so successful that the AI consistently beats human players by 20% or more.

So here’s what went wrong.

The game designers placed powerups all along the racetrack to entice players to follow the route, but didn’t realize that the powerups in the middle of the lake were all it took to win the game.

And they only rewarded players for picking up powerups. They didn’t add any incentive for players to follow the track, stay ahead of the competition, or complete the race as quickly as possible.

The AI researchers trained the AI to only look at the score and ignore everything else in the game.
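In code, that kind of reward signal might look something like this hypothetical sketch (an illustration, not OpenAI’s actual training setup):

```python
class ScoreOnlyReward:
    """Derives each step's reward purely from the running game score.

    Laps, placement, direction of travel, and crashes are all
    invisible to the agent; only the score delta matters.
    """

    def __init__(self) -> None:
        self.previous_score = 0.0

    def step(self, current_score: float) -> float:
        reward = current_score - self.previous_score
        self.previous_score = current_score
        return reward

rewarder = ScoreOnlyReward()
print(rewarder.step(100.0))  # grabbed a powerup -> reward +100.0
print(rewarder.step(100.0))  # crashed into the pier -> reward 0.0, no penalty!
print(rewarder.step(200.0))  # the powerup respawned, grab it again -> +100.0
```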

So of course the AI discovered this loophole and ran with it.
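For contrast, here is a hedged sketch of a reward function with those missing incentives added back in. Every term and weight below is made up for illustration; real reward shaping takes careful tuning:

```python
def shaped_reward(score_gained: float,
                  track_progress: float,  # meters advanced along the route
                  crashed: bool,
                  finished_race: bool) -> float:
    """Illustrative only: pays for the behaviors the designers wanted."""
    reward = 1.0 * score_gained        # powerups still count...
    reward += 0.5 * track_progress     # ...but so does following the track
    if crashed:
        reward -= 50.0                 # crashing is explicitly bad
    if finished_race:
        reward += 1000.0               # finishing is explicitly good
    return reward

print(shaped_reward(100.0, 12.0, crashed=False, finished_race=False))  # 106.0
```

With a signal like this, circling the lake while catching fire would at least cost the AI something.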

This is a very cute example, but you wouldn’t want a self-driving car crashing into a lamppost over and over in real life just because its internal reward function compels it to do so.

Think of all the behaviors we want to see in a self-driving car: stay on the road, drive safely, navigate door-to-door, don’t crash, brake in time for children playing on the street, and so on.

Our great challenge in AI and machine learning right now is to codify all of these good behaviors into a simple reward function that an AI can use and train on.
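To see how hard that is, here’s a deliberately crude sketch. Every term, threshold, and weight below is a made-up placeholder, and a real system would have to weigh far more factors than this:

```python
def driving_reward(progress_to_goal: float,  # meters closer this step
                   on_road: bool,
                   collided: bool,
                   speed_over_limit: float,  # km/h above the limit
                   pedestrian_nearby: bool,
                   speed: float) -> float:   # current speed in km/h
    reward = 1.0 * progress_to_goal             # navigate door-to-door
    if not on_road:
        reward -= 10.0                          # stay on the road
    if collided:
        reward -= 1000.0                        # don't crash
    reward -= 2.0 * max(0.0, speed_over_limit)  # drive safely
    if pedestrian_nearby and speed > 10.0:
        reward -= 100.0                         # slow down for children playing
    return reward
```

And even this toy version has loopholes: if cutting across a lawn brings the car more than 10 meters closer to the goal in a single step, the off-road penalty is worth paying. Every behavior we forget to encode, or weigh badly, is exactly the kind of loophole the CoastRunners boat found.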