The amazing power of Long Short Term Memory Networks (LSTMs)

By now I hope to have convinced you that neural networks are very cool!

But they have one huge disadvantage.

Can you guess what it is?

The huge disadvantage of a neural network is…

It can’t remember anything!

This may sound a bit weird. I mean, we train neural networks on images. A fully trained network can look at a picture of a dog and say: “I see a dog”. So obviously the network is remembering something.

Let me explain.

Let’s say we want a neural network to read the following sentence:

“The quick brown fox jumped over the lazy dog”

Our neural network gets started, looks at the first word “The”, and processes it.

But by the time the network gets to the second word “quick”, it will have forgotten all about the first word!

This makes plain neural networks terrible at Natural Language Processing. A network can recognize individual words, but it can’t grasp the meaning of a sentence unless it sees all the words at once. And that’s not how reading works.

Neural networks have no short term memory.

In 1982 the physicist John Hopfield found a solution to this problem. He designed one of the world’s first Recurrent Neural Networks (RNNs).

It looks like this:

You’re looking at a single neural network node (A) with an input (X) and an output (h).

But there’s a twist! The node is connected to itself with a looping connection.

You can visualize the loop by ‘unrolling’ it. The right hand side of the diagram shows the unrolled network.

Each input now arrives at a specific point in time. We start by feeding X0 into the network, then X1, then X2, and so on. The horizontal connections provide the short term memory, by feeding the network state into the next step.
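The unrolled loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production network: the weight matrices, dimensions, and random inputs are all made up for the example. The key idea is that the same function runs at every step, and the state `h` carries information from one step to the next.

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, b):
    # One unrolled step: combine the current input with the previous state.
    return np.tanh(x @ W_xh + h_prev @ W_hh + b)

# Toy sizes (assumed for illustration): 3-dimensional inputs, 4-dimensional state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4)) * 0.1
W_hh = rng.normal(size=(4, 4)) * 0.1
b = np.zeros(4)

h = np.zeros(4)                       # initial state: nothing remembered yet
for x in rng.normal(size=(5, 3)):     # feed X0, then X1, then X2, ...
    h = rnn_step(x, h, W_xh, W_hh, b)
```

Notice that `h` is both an output of each step and an input to the next one. That feedback is the looping connection in the diagram.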

RNNs are great for processing sequences of data, for example a dataset of temperature over time, or a written sentence, or an audio clip of someone talking.

But RNNs also have a big disadvantage…

RNNs forget things very quickly.

Look at the diagram. Each sideways transfer of memory weakens the signal, until eventually the memory is gone. Our RNN will be able to read a sentence, but by the time it reaches the word “dog”, it will have forgotten about the “fox” at the beginning.
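You can see the forgetting happen with simple arithmetic. In this toy sketch (the weight value of 0.5 is just an illustrative choice), the signal entering at the first word is multiplied by the recurrent weight at every step. With a weight below 1, it shrinks exponentially:

```python
# The "fox" signal enters at step 0 and is multiplied by the recurrent
# weight at each of the following steps.
signal = 1.0
weight = 0.5            # illustrative recurrent weight, below 1
history = []
for step in range(9):   # one step per word in the sentence
    history.append(signal)
    signal *= weight

# After 8 transfers the signal is 0.5 ** 8, roughly 0.004: effectively gone.
```

By the ninth word, less than half a percent of the original signal remains. That is why a plain RNN can’t connect the end of a sentence back to its beginning.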

So we need one more tweak.

We need to build a network that can retain short term memory over a longer term.

A network like this:


You’re looking at an LSTM: a Long Short Term Memory network.

LSTMs are a more complex version of RNNs: they have two memory pathways instead of one.

The upper pathway is the main memory bus that retains state information while the network is running.

The lower pathway is a control bus. It combines with the input signal and either strengthens or weakens the memory signal in the upper pathway.

This setup allows an LSTM to remember state information for a very long time.
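Here is a minimal NumPy sketch of one LSTM step, with made-up weights and toy dimensions. The gates (forget, input, and output) are the control bus: small sigmoid-valued signals that strengthen or weaken the memory `c` flowing along the upper pathway.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # Stack the input with the previous control signal, then compute all
    # four gate pre-activations in a single matrix multiply.
    z = np.concatenate([x, h_prev]) @ W + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget / input / output gates
    c = f * c_prev + i * np.tanh(g)  # upper pathway: memory weakened or refreshed
    h = o * np.tanh(c)               # lower pathway: control signal for next step
    return h, c

# Toy sizes (assumed for illustration): 3-dimensional inputs, 4-dimensional memory.
rng = np.random.default_rng(0)
W = rng.normal(size=(3 + 4, 4 * 4)) * 0.1
b = np.zeros(4 * 4)

h, c = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(9, 3)):    # one step per word in the sentence
    h, c = lstm_step(x, h, c, W, b)
```

The crucial line is `c = f * c_prev + i * np.tanh(g)`: when the forget gate `f` is close to 1, the memory passes through almost untouched, step after step. That is what lets an LSTM still remember the “fox” when it reaches the “dog”.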

LSTMs have revolutionized machine learning for speech recognition, handwriting recognition, and language translation.

Google Translate uses an LSTM to translate text, and Google Assistant uses one for speech recognition. LSTMs are everywhere today.

And we are now at the cusp of another machine learning revolution.

LSTMs are poised to revolutionize the field of Natural Language Processing.

We are only a few years away from having neural networks that can read a text, understand it, and then write a new story about it.

Imagine that. A world where software can read, understand, and write stories.

How many jobs will that affect?

Which new apps will be possible?

Could a computer become the next Hemingway? (*)

What do you think?