Recurrent Neural Network and Long Short-Term Memory

By Anonymous - April 25, 2019

Recurrent Neural Network

In a traditional neural network, all inputs and outputs are treated as independent of each other. That assumption breaks down for tasks such as predicting the next word of a sentence, where the prediction depends on the previous words, so the network needs a way to remember them. Recurrent Neural Networks (RNNs) were introduced for this purpose: they are networks with loops in them, allowing information to persist.

Recurrent Neural Network Architecture

In the above diagram, the network takes $x_t$ as input and outputs $y_t$. The hidden layer combines the current input with the previous state $h_{t-1}$ to compute the current state $h_t$.

The formula for the current state can be written like this:

$h_t = \tanh(l1(x_t) + r1(h_{t-1}))$

The output is then computed from the current state:

$y_t = l2(h_t)$
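The two formulas above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation: it assumes that `l1`, `r1` and `l2` are learned affine (linear) layers, which the text does not state explicitly, and the helper names (`linear`, `init_params`, `rnn_step`, `run_rnn`), the layer sizes and the random initialisation are all hypothetical choices made for the example.

```python
import math
import random

def linear(weights, bias, vec):
    """Affine map: weights (a list of rows) times vec, plus bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def init_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights; the sizes and the range are arbitrary choices."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]
    return {
        "l1_w": mat(n_hidden, n_in), "l1_b": [0.0] * n_hidden,      # input -> hidden
        "r1_w": mat(n_hidden, n_hidden), "r1_b": [0.0] * n_hidden,  # hidden -> hidden
        "l2_w": mat(n_out, n_hidden), "l2_b": [0.0] * n_out,        # hidden -> output
    }

def rnn_step(x_t, h_prev, params):
    """One time step: h_t = tanh(l1(x_t) + r1(h_prev)), y_t = l2(h_t)."""
    a = linear(params["l1_w"], params["l1_b"], x_t)
    b = linear(params["r1_w"], params["r1_b"], h_prev)
    h_t = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    y_t = linear(params["l2_w"], params["l2_b"], h_t)
    return h_t, y_t

def run_rnn(xs, params, n_hidden):
    """Feed a sequence through the network, carrying the state forward."""
    h = [0.0] * n_hidden  # initial state, assumed to be all zeros
    ys = []
    for x_t in xs:
        h, y = rnn_step(x_t, h, params)
        ys.append(y)
    return ys, h

params = init_params(n_in=2, n_hidden=3, n_out=1)
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # a toy sequence of three inputs
ys, h_final = run_rnn(xs, params, n_hidden=3)
print(len(ys), len(h_final))
```

The same `params` are reused at every time step; only the state `h` changes, which is what the loop in the diagram expresses.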

This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They are the natural neural network architecture to use for such data.

Recurrent Neural Networks suffer from short-term memory. If a sequence is long enough, they have a hard time carrying information from earlier time steps to later ones. So if you are processing a paragraph of text to make predictions, RNNs may leave out important information from the beginning.

Long Short-Term Memory

During backpropagation, recurrent neural networks suffer from the vanishing gradient problem. Gradients are the values used to update a neural network's weights. The vanishing gradient problem occurs when the gradient shrinks as it is propagated back through time. If a gradient value becomes extremely small, it contributes very little to learning, which is the cause of the short-term memory described above.
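The shrinkage can be seen numerically in a deliberately simplified setting. The sketch below assumes a scalar RNN, $h_t = \tanh(w\,h_{t-1} + u\,x_t)$, rather than the full network; the function name `gradient_through_time` and the values of `w`, `u` and the input sequence are hypothetical. By the chain rule, each step multiplies the gradient of the state with respect to $h_0$ by $w\,(1 - h_t^2)$, so when that factor stays below 1 in magnitude the product decays exponentially with the number of steps.

```python
import math

def gradient_through_time(w, u, xs, h0=0.0):
    """Return |d h_t / d h_0| after each step of a scalar RNN."""
    h = h0
    grad = 1.0
    magnitudes = []
    for x_t in xs:
        h = math.tanh(w * h + u * x_t)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
        magnitudes.append(abs(grad))
    return magnitudes

xs = [1.0] * 50  # 50 time steps of a constant toy input
mags = gradient_through_time(w=0.5, u=1.0, xs=xs)
print(mags[0], mags[9], mags[49])
```

With `w = 0.5`, every per-step factor is at most 0.5, so after 50 steps the influence of the initial state on the final state is smaller than $0.5^{50}$.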

LSTM (Long Short-Term Memory) was later introduced to address this issue by explicitly adding a memory unit, called the cell, to the network. This is the diagram of the LSTM building blocks.
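To make the role of the cell more concrete, here is a minimal sketch of one LSTM step for a scalar state. It uses the commonly cited gate formulation (forget, input and output gates plus a candidate value), which this article does not spell out, so treat it as an illustration under that assumption. The function names, the parameter dictionary `p` and its hand-picked values are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (standard gate formulation)."""
    f = sigmoid(p["wf_x"] * x_t + p["wf_h"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi_x"] * x_t + p["wi_h"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo_x"] * x_t + p["wo_h"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg_x"] * x_t + p["wg_h"] * h_prev + p["bg"])  # candidate value
    c_t = f * c_prev + i * g   # the cell is updated additively
    h_t = o * math.tanh(c_t)   # the state exposed to the rest of the network
    return h_t, c_t

# Hand-picked toy parameters: all weights and biases zero, except a large
# forget-gate bias (gate close to 1) and a negative input-gate bias (gate close to 0).
keys = ["wf_x", "wf_h", "bf", "wi_x", "wi_h", "bi",
        "wo_x", "wo_h", "bo", "wg_x", "wg_h", "bg"]
p = {k: 0.0 for k in keys}
p["bf"] = 5.0
p["bi"] = -5.0

h, c = 0.0, 1.0  # start with a value of 1.0 stored in the cell
for _ in range(100):
    h, c = lstm_step(0.0, h, c, p)
print(c)
```

Because the cell is carried forward by multiplying with a forget gate close to 1 instead of being squashed through `tanh` at every step, roughly half of the stored value is still present after 100 steps in this toy setting.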