The encoder-decoder is a neural network architecture that consists of two main parts:
- Encoder: Processes and understands the input.
- Decoder: Uses this understanding to generate meaningful output.
This model is commonly used in language processing tasks.
Example: Language Translation
If we want to translate “Hello, how are you?” into Spanish:
- The encoder converts the English words into a numerical representation (context vector).
- The decoder takes this representation and generates the Spanish translation “Hola, ¿cómo estás?”.
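The two stages above can be sketched in a few lines. This is a toy illustration, not a real translation model: the vocabulary, the 4-dimensional embeddings, and the "average the embeddings" encoder are all stand-ins for components a real model would learn during training.

```python
import numpy as np

# Toy vocabulary with random 4-dimensional embeddings (illustration only;
# a real model learns these values during training).
rng = np.random.default_rng(0)
en_vocab = ["hello", ",", "how", "are", "you", "?"]
embeddings = {word: rng.standard_normal(4) for word in en_vocab}

def encode(tokens):
    """Encoder sketch: compress the token embeddings into one context vector.

    Here we simply average them; a real encoder uses recurrent or
    attention layers to do this compression.
    """
    return np.mean([embeddings[t] for t in tokens], axis=0)

context = encode(["hello", ",", "how", "are", "you", "?"])
print(context.shape)  # a single fixed-size vector: (4,)
```

Whatever the sentence length, the encoder's output is one fixed-size context vector, which is exactly what the decoder consumes next.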
How Does It Work?

Encoder:
- Takes a sequence of words (tokens) as input.
- Converts them into a fixed-size vector representation.
- This vector captures the context and meaning of the sentence.
Decoder:
- Takes the encoded representation as input.
- Generates words one at a time (auto-regressively).
- Uses previous words to predict the next word.
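The decoder's auto-regressive loop can be sketched as below. The hard-coded transition table is a hypothetical stand-in for the learned network: in a real model, the next token is predicted by a function of the context vector and all previously generated tokens.

```python
# Toy auto-regressive decoder: each step consumes the previously generated
# token (plus, in a real model, the encoder's context vector) and emits the
# next token. This transition table is hard-coded purely for illustration.
next_token = {
    "<start>": "Hola",
    "Hola": ",",
    ",": "¿cómo",
    "¿cómo": "estás?",
    "estás?": "<end>",
}

def decode(context_vector, max_len=10):
    tokens, prev = [], "<start>"
    for _ in range(max_len):
        nxt = next_token[prev]  # real models: nxt = f(context_vector, prev, ...)
        if nxt == "<end>":      # stop when the end-of-sequence token appears
            break
        tokens.append(nxt)
        prev = nxt
    return " ".join(tokens)

print(decode(context_vector=None))  # → "Hola , ¿cómo estás?"
```

Note how each generated word feeds back in as input to the next step; that feedback loop is what "auto-regressive" means.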
Stacking Encoders and Decoders:
- Transformers use multiple encoders and decoders stacked together.
- Each layer can attend to different parts of the input, so deeper layers build progressively richer representations and improve accuracy.
🛠️ Why Use an Encoder-Decoder?
- Natural language has many words with related meanings (e.g., “king” and “queen”).
- Instead of treating every word as unrelated, the encoder maps similar words to nearby points in its vector space.
- This helps reduce the model’s complexity and improves learning.
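The "nearby points" idea can be checked with cosine similarity. The 3-dimensional vectors below are made up for illustration; real embeddings are learned and much higher-dimensional, but they show the same pattern: related words score close to 1, unrelated words score much lower.

```python
import numpy as np

# Hypothetical 3-d embeddings: related words ("king", "queen") sit close
# together, an unrelated word ("banana") sits far away. Real embeddings
# are learned from data, not hand-picked like these.
vectors = {
    "king":   np.array([0.90, 0.80, 0.10]),
    "queen":  np.array([0.85, 0.82, 0.15]),
    "banana": np.array([0.10, 0.05, 0.90]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["king"], vectors["queen"]))   # close to 1
print(cosine(vectors["king"], vectors["banana"]))  # much smaller
```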
Example: Word Similarity
Consider the phrases:
- “Once upon a time, there lived a king.”
- “Once upon a time, there lived a queen.”