The sequence-to-sequence (seq2seq) model is one of the most important frameworks in NLP. Its applications include chatbots (text generators), text summarization, and machine translation.
Modern seq2seq systems use neural networks for both the encoder and the decoder. Both components model language and are often built with LSTM (long short-term memory) networks.
First, the input text is fed to the encoder (the first "sequence"). The encoder's output is then fed to the decoder (the "to"), and the decoder's output is converted back into text (the second "sequence").
Because this task is sequence-based, we can use recurrent neural networks (RNNs), including variants such as gated recurrent units (GRUs), which are computationally cheaper than LSTMs. The choice depends on the design of the system and the accuracy requirement.
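To make the efficiency comparison concrete, here is a rough sketch that counts the trainable parameters of a single recurrent layer. It assumes the textbook formulation in which every gate block has one input matrix, one recurrent matrix, and one bias vector; some libraries keep two bias vectors per block, so their counts differ slightly. The function name `recurrent_params` and the layer sizes are illustrative, not taken from any library or from the original article.

```python
def recurrent_params(input_size, hidden_size, gate_blocks):
    """Count weights and biases of one recurrent layer (textbook formulation).

    Each gate block holds an input matrix (hidden x input), a recurrent
    matrix (hidden x hidden), and one bias vector (hidden).
    """
    per_block = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return gate_blocks * per_block


# Illustrative sizes only (assumption, not from the article).
input_size, hidden_size = 128, 256

simple_rnn = recurrent_params(input_size, hidden_size, gate_blocks=1)
gru = recurrent_params(input_size, hidden_size, gate_blocks=3)   # reset, update, candidate
lstm = recurrent_params(input_size, hidden_size, gate_blocks=4)  # input, forget, output, candidate

print("simple RNN:", simple_rnn)
print("GRU:       ", gru)
print("LSTM:      ", lstm)
print("GRU / LSTM:", gru / lstm)
```

Under these assumptions a GRU layer carries three quarters of the parameters of an LSTM layer of the same size, which is where most of its speed advantage comes from.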
There are two tasks during training:
However, there are a couple of issues when dealing with language.
These extra symbols are called new vocabulary or extended vocabulary.
In the encoder-decoder model, the decoder receives only the final state of each encoder layer. For a long input, which may also contain long-term dependencies, it is hard for that fixed-size state to capture all of the relevant information.
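The bottleneck can be seen in a toy encoder. The sketch below uses a plain tanh RNN cell with made-up, deterministic weights (the helper `make_weights` is a hypothetical stand-in for trained parameters) and only shows the shape of the problem: whether the input has 3 steps or 50, the decoder is handed a final state of the same size.

```python
import math


def make_weights(rows, cols, scale=0.1):
    """Deterministic stand-in for trained weights (hypothetical values)."""
    return [[scale * math.sin(r * cols + c + 1) for c in range(cols)]
            for r in range(rows)]


def rnn_step(x, h, w_in, w_rec):
    """One tanh RNN step: h_new = tanh(W_in x + W_rec h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * hi for w, hi in zip(w_rec[j], h))
        )
        for j in range(len(h))
    ]


def encode(inputs, hidden_size):
    """Run the encoder over all time steps; return every state and the final one."""
    w_in = make_weights(hidden_size, len(inputs[0]))
    w_rec = make_weights(hidden_size, hidden_size)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_in, w_rec)
        states.append(h)
    return states, h


# Toy embedded inputs: 3 time steps versus 50 time steps, 2 features each.
short_inputs = [[0.1 * t, 1.0] for t in range(3)]
long_inputs = [[0.1 * t, 1.0] for t in range(50)]

short_states, short_final = encode(short_inputs, hidden_size=4)
long_states, long_final = encode(long_inputs, hidden_size=4)

print(len(short_final), len(long_final))    # same size for both inputs
print(len(short_states), len(long_states))  # one intermediate state per time step
```

The intermediate states collected in `states` are exactly the information a final-state-only decoder never sees.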
The solution is to let the decoder also access the encoder's intermediate time steps using attention, which computes a context vector at each decoding time step. The attention mechanism can itself be another neural network. With a well-tuned attention mechanism, the model can produce much better results for long paragraph inputs.
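A minimal sketch of how one context vector might be computed is shown below. For brevity it uses dot-product scoring rather than a separate scoring network, so it is a simplified swap-in and not necessarily the variant the article has in mind. The names `attention_context`, `encoder_states`, and `decoder_state`, and all the numbers, are illustrative assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """Dot-product attention for a single decoder time step.

    Scores each encoder state against the current decoder state,
    normalizes the scores, and returns the weighted sum of encoder states.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    context = [
        sum(w * h[k] for w, h in zip(weights, encoder_states))
        for k in range(len(decoder_state))
    ]
    return weights, context


# Toy values: three encoder time steps, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]

weights, context = attention_context(decoder_state, encoder_states)
print("weights:", weights)
print("context:", context)
```

In this toy case the second and third encoder states align with the decoder state and receive equal, larger weights than the first. A full decoder would recompute the weights and the context vector at every output step.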
(Source: educative.io)