1. Sequence to Sequence Model

Introduction

The sequence-to-sequence (seq2seq) model is one of the most important frameworks in NLP, with applications such as chatbots (text generation), text summarization, and machine translation.

Modern seq2seq systems use neural networks for both the encoder and the decoder. Both components are sequence models and are often built with LSTM (long short-term memory) networks.

First, the input text is fed to the encoder (the first "sequence"). The encoder's output is then fed to the decoder (the "to"), and the decoder's output is finally transformed back into text (the second "sequence").
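To make the encoder half of this flow more concrete, here is a minimal sketch of a recurrent encoder in plain Python. It uses a simple tanh recurrent cell rather than an LSTM, and the weights are random instead of trained; the names `make_matrix` and `rnn_encode` and all the sizes are assumptions chosen for illustration only.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_encode(token_ids, embeddings, w_in, w_rec):
    """Run a simple tanh RNN over the tokens.

    Returns the hidden state at every time step and the final hidden state.
    """
    hidden = [0.0] * len(w_rec)
    states = []
    for tid in token_ids:
        x = embeddings[tid]  # look up the vector for this token
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
        states.append(hidden)
    return states, hidden


rng = random.Random(0)
vocab_size, embed_dim, hidden_dim = 10, 4, 3  # illustrative sizes
embeddings = make_matrix(vocab_size, embed_dim, rng)
w_in = make_matrix(hidden_dim, embed_dim, rng)
w_rec = make_matrix(hidden_dim, hidden_dim, rng)

short_states, short_final = rnn_encode([1, 2], embeddings, w_in, w_rec)
long_states, long_final = rnn_encode([1, 2, 3, 4, 5, 6], embeddings, w_in, w_rec)

# The final state has the same size no matter how long the input is.
print(len(short_final), len(long_final))
```

Whatever the input length, the decoder would start from a fixed-size final state; a real system would learn the weights from data instead of sampling them.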

Because this task is sequence-based, we can also use other recurrent architectures, such as plain Recurrent Neural Networks (RNNs) or Gated Recurrent Units (GRUs, which are computationally more efficient than LSTMs), depending on the design of the system and the accuracy requirement.
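One rough way to see why a GRU is cheaper than an LSTM is to count trainable parameters. The sketch below assumes the textbook formulation, where a plain RNN has one weight block, a GRU has three (two gates plus a candidate state), and an LSTM has four (three gates plus a candidate state), each with a single bias vector. Some libraries keep two bias vectors per block, so their counts differ slightly. The function name `recurrent_params` and the layer sizes are illustrative.

```python
def recurrent_params(input_size, hidden_size, blocks):
    """Count trainable parameters in one recurrent layer.

    Each block has an input weight matrix (hidden x input), a recurrent
    weight matrix (hidden x hidden), and one bias vector (hidden).
    """
    per_block = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return blocks * per_block


input_size, hidden_size = 128, 256  # illustrative sizes

rnn = recurrent_params(input_size, hidden_size, blocks=1)
gru = recurrent_params(input_size, hidden_size, blocks=3)
lstm = recurrent_params(input_size, hidden_size, blocks=4)

print(rnn, gru, lstm)  # 98560 295680 394240
print(gru / lstm)      # 0.75
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, which is where most of its speed advantage comes from.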


Overview

Training Task

There are two tasks during training:

  1. Input task: given an input sequence (text), extract useful information from it.
  2. Output task: learn the weight parameters of the neural networks and calculate word probabilities from both the input sequence and the previous words in the output sequence (the known output text).
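As a rough illustration of the output task, the sketch below turns decoder scores into word probabilities with a softmax and measures how well they match the known output text. The three-word vocabulary and the score values are made up; in a real model the scores would come from the decoder, conditioned on the encoded input and on the previous known output words.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


vocab = ["<eos>", "hello", "world"]  # hypothetical toy vocabulary

# Hypothetical decoder scores, one row per output position.
step_scores = [
    [0.1, 2.0, 0.3],  # position 1: nothing generated yet
    [0.2, 0.1, 1.5],  # position 2: previous known word is "hello"
    [1.8, 0.0, 0.2],  # position 3: previous known word is "world"
]
target = ["hello", "world", "<eos>"]  # the known output text

# Training minimizes the negative log-probability of each known word.
loss = 0.0
for scores, word in zip(step_scores, target):
    probs = softmax(scores)
    loss += -math.log(probs[vocab.index(word)])
avg_loss = loss / len(target)

print(round(avg_loss, 4))
```

Feeding the known previous word at each position, rather than the model's own guess, is commonly called teacher forcing.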

However, there are several issues to handle when dealing with natural language:

  1. Unicode letters should be standardized and/or converted to ASCII to simplify processing.
  2. Data cleaning is very important for removing unrelated information as well as special symbols and stray letters. If the data was crawled from websites, HTML tags must be detected and removed.
  3. The output length is not known or fixed in advance, for example in translation and text summarization, but the input size of the model is fixed when the neural network is built. Empty slots are therefore filled with an extra symbol called a pad.
  4. An input or output sequence may be empty, but the machine still needs to return something. So we mark the boundaries of each sequence with SOS and EOS (start-of-sequence and end-of-sequence) tokens.
  5. Tokenization is important in some languages. For example, in Vietnamese, a single word can consist of two or more separately written units (syllables).
  6. We also want to build a list of stop words: words that repeat many times but carry little meaning (such as "and", "or", and "of"). Stop words can be domain-specific. For example, in a corpus about the air transport business, "airplane" should be a stop word.

The extra symbols introduced above (the pad, SOS, and EOS tokens) are called new vocabulary or extended vocabulary.
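The preprocessing steps listed above can be sketched with the Python standard library alone. This is a minimal example under several assumptions: the HTML stripping is a crude regular expression, tokenization is a plain whitespace split (which would not be adequate for Vietnamese), the `<unk>` token for unseen words is an addition not mentioned in the text, and all function names are illustrative. Whether stop words should be dropped at all depends on the task.

```python
import html
import re
import unicodedata

PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"  # UNK is an extra assumption
STOP_WORDS = {"and", "or", "of"}  # examples from the text; extend per domain


def unicode_to_ascii(text):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def clean_text(raw):
    """Strip HTML, normalize to lowercase ASCII, tokenize, drop stop words."""
    text = re.sub(r"<[^>]+>", " ", raw)         # crude HTML tag removal
    text = html.unescape(text)                  # e.g. &eacute; becomes an accented e
    text = unicode_to_ascii(text).lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # drop special symbols
    return [t for t in text.split() if t not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each token to an integer id, reserving ids for the extra symbols."""
    vocab = {PAD: 0, SOS: 1, EOS: 2, UNK: 3}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Wrap tokens in SOS/EOS and pad to a fixed length (max_len >= 2)."""
    body = tokens[: max_len - 2]  # leave room for SOS and EOS
    ids = [vocab[SOS]] + [vocab.get(t, vocab[UNK]) for t in body] + [vocab[EOS]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


raw_corpus = ["<p>Caf&eacute; and tea of Paris!</p>", "Hello, world"]
token_lists = [clean_text(r) for r in raw_corpus]
vocab = build_vocab(token_lists)
batch = [encode(t, vocab, max_len=6) for t in token_lists]

print(token_lists)  # [['cafe', 'tea', 'paris'], ['hello', 'world']]
print(batch)        # [[1, 4, 5, 6, 2, 0], [1, 7, 8, 2, 0, 0]]
```

Every row of `batch` has the same length, so the sequences can be stacked into one fixed-size input even though the original sentences differ in length.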

ATTENTION

In the basic encoder-decoder model, the decoder only gets information from the final state of each encoder layer. For a long input, which might also contain long-term dependencies, it is hard for that final state to capture all of the useful information.

The solution is to let the decoder also access the encoder's intermediate time steps by using attention, which calculates a context vector at each decoding time step. The attention mechanism can itself be another neural network. With well-tuned attention, we can get much better results on long paragraph inputs.
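The context-vector calculation can be sketched in a few lines. For simplicity this example scores each encoder state with a plain dot product against the current decoder state, which is a simpler scoring choice than the learned neural network mentioned above. The state vectors are made-up values, and the function name `attention` is illustrative.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(decoder_state, encoder_states):
    """Return attention weights and the context vector for one decoder step."""
    # Score each encoder time step against the current decoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states
    ]
    weights = softmax(scores)
    # The context vector is the weighted sum of all encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)
    ]
    return weights, context


# Hypothetical encoder states for a three-token input, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]

weights, context = attention(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

The decoder recomputes the weights at every step, so each output word can draw on a different part of the input instead of relying only on the final encoder state.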

Attention (image source: educative.io)