Recurrent Neural Network and Long Short-Term Memory

Recurrent Neural Network

In a traditional neural network, all inputs and outputs are treated as independent of each other. In tasks such as predicting the next word of a sentence, however, the prediction depends on the previous words, so the network needs a way to remember them. The Recurrent Neural Network (RNN) was introduced for this purpose. RNNs are networks with loops in them, allowing information to persist.
Recurrent Neural Network Architecture
In the above diagram, the network takes $x_t$ as input and outputs $y_t$. The hidden layer applies a formula to the current input and the previous state $h_{t-1}$ to get the current state $h_t$.
The formula for the current state can be written like this:
$h_t = \tanh(l1(x_t) + r1(h_{t-1}))$
The output is then calculated as:
$y_t = l2(h_t)$
This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They are the natural neural network architecture to use for such data.
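The two formulas above can be sketched in plain Python. This is a minimal illustration, not the original implementation: it assumes that `l1`, `r1`, and `l2` are linear (affine) layers, and the helper names `linear`, `rnn_step`, `run_rnn` as well as the toy parameter values are hypothetical.

```python
import math

def linear(weights, bias, vec):
    """Affine map weights @ vec + bias, written with plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def rnn_step(x_t, h_prev, params):
    """One time step: h_t = tanh(l1(x_t) + r1(h_prev)), y_t = l2(h_t)."""
    from_input = linear(params["l1_w"], params["l1_b"], x_t)
    from_state = linear(params["r1_w"], params["r1_b"], h_prev)
    h_t = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
    y_t = linear(params["l2_w"], params["l2_b"], h_t)
    return h_t, y_t

def run_rnn(xs, params, hidden_size):
    """Unroll the loop over a sequence, starting from a zero state."""
    h = [0.0] * hidden_size
    ys = []
    for x_t in xs:
        h, y = rnn_step(x_t, h, params)  # the same weights are reused at every step
        ys.append(y)
    return h, ys

# Toy parameters (assumed values, chosen only for illustration):
# input size 1, hidden size 2, output size 1.
params = {
    "l1_w": [[0.5], [-0.3]], "l1_b": [0.0, 0.0],
    "r1_w": [[0.1, 0.2], [0.0, 0.4]], "r1_b": [0.0, 0.0],
    "l2_w": [[1.0, -1.0]], "l2_b": [0.0],
}
final_h, outputs = run_rnn([[1.0], [0.0], [-1.0]], params, hidden_size=2)
```

The state `h` returned by one step is fed into the next, which is the "loop" that lets information from earlier inputs influence later outputs.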

Recurrent neural networks suffer from short-term memory. If a sequence is long enough, they have a hard time carrying information from earlier time steps to later ones. So if you are processing a paragraph of text to make predictions, an RNN may leave out important information from the beginning.

Long Short-Term Memory


During backpropagation, recurrent neural networks suffer from the vanishing gradient problem. Gradients are the values used to update a neural network's weights. The vanishing gradient problem occurs when the gradient shrinks as it is propagated back through time. If a gradient value becomes extremely small, it contributes very little to learning, so the earliest time steps of a long sequence barely affect the weight updates.
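The shrinking effect can be illustrated with a deliberately simplified model. The sketch below assumes a scalar RNN $h_t = \tanh(w \cdot h_{t-1} + x_t)$ (a toy stand-in, not the full network above) and tracks the magnitude of $\partial h_t / \partial h_0$; the function name `gradient_through_time` and the chosen values are hypothetical.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_t / d h_0| after each step of h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad, history = h0, 1.0, []
    for x_t in inputs:
        h = math.tanh(w * h + x_t)
        # Chain rule: d tanh(u)/du = 1 - tanh(u)^2, and du/dh_{t-1} = w
        grad *= w * (1.0 - h * h)
        history.append(abs(grad))
    return history

# Assumed toy setting: recurrent weight 0.5, constant input 0.1, 50 time steps
grads = gradient_through_time(w=0.5, inputs=[0.1] * 50)
print(grads[0], grads[9], grads[49])
```

Each step multiplies the gradient by a factor of magnitude at most 0.5 here, so after 50 steps it is bounded by $0.5^{50} \approx 9 \times 10^{-16}$, which is far too small to drive a meaningful weight update.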

LSTM (Long Short-Term Memory) was later introduced to address this issue by explicitly adding a memory unit, called the cell, to the network. This is the diagram of LSTM building blocks.
The repeating module in an LSTM contains four interacting layers.


  • Forget gate: how much information from the previous cell will be forgotten
  • Input gate: how much information from the input will be kept
  • Candidate cell: temporary cell information based on the input
  • Cell state: the current cell information, carried along the whole sequence
  • Output gate: how much of the cell information will be used for the output
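The five quantities listed above can be sketched as one LSTM time step. The code assumes the standard LSTM formulation, in which the forget, input, and output gates are sigmoids and the temporary cell information is a tanh, all computed from the concatenation of the previous hidden state and the current input. The parameter names (`Wf`, `bf`, and so on), the helper names, and the toy values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, bias, vec):
    """Affine map weights @ vec + bias, written with plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step (standard formulation, assumed here)."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(p["Wf"], p["bf"], z)]    # forget: how much of c_prev to drop
    i = [sigmoid(v) for v in affine(p["Wi"], p["bi"], z)]    # input: how much new information to keep
    g = [math.tanh(v) for v in affine(p["Wc"], p["bc"], z)]  # temporary (candidate) cell information
    o = [sigmoid(v) for v in affine(p["Wo"], p["bo"], z)]    # output: how much of the cell to expose
    # Current cell information: keep part of the old cell, add part of the candidate
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t

# Toy parameters (assumed values): hidden size 1, input size 1,
# so each weight row has two entries, one for h_{t-1} and one for x_t.
p = {
    "Wf": [[0.1, 0.2]], "bf": [1.0],
    "Wi": [[0.3, -0.1]], "bi": [0.0],
    "Wc": [[0.5, 0.4]], "bc": [0.0],
    "Wo": [[-0.2, 0.6]], "bo": [0.0],
}
h, c = [0.0], [0.0]
for x_t in [[1.0], [0.5], [-1.0]]:
    h, c = lstm_step(x_t, h, c, p)
```

Because the cell state is updated additively (`f * c_prev + i * g`) rather than being squashed through a tanh at every step, information can be carried across many time steps when the forget gate stays close to 1.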


Gated Recurrent Unit



Various types


Attention


