Attention in AI

Attention is a mechanism that allows a model to focus on the most relevant parts of its input when producing an output — much like how humans pay more attention to certain words or objects in a scene than others.

The Core Idea

Instead of treating all parts of the input equally, attention assigns weights to different elements, so the model can dynamically decide what’s important for each step of its task.

A Simple Example

Consider translating “The cat sat on the mat” into French.

When generating “chat”, the French word for “cat”, the model should put most of its weight on “cat” and very little on “mat”. Attention lets it do exactly that.
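To make the weighting idea concrete, here is a minimal sketch in Python with NumPy. The relevance scores are made-up numbers chosen for illustration, not the output of any real model; a trained model would compute them itself.

```python
import numpy as np

tokens = ["The", "cat", "sat", "on", "the", "mat"]

# Hypothetical relevance scores for the step that generates "chat".
# These values are invented for illustration only.
scores = np.array([0.5, 4.0, 1.0, 0.2, 0.3, 1.5])

# Softmax turns raw scores into positive weights that sum to 1.
# Subtracting the max first keeps the exponentials numerically stable.
exp_scores = np.exp(scores - scores.max())
weights = exp_scores / exp_scores.sum()

# The token with the largest weight is where the model "looks" most.
most_attended = tokens[int(np.argmax(weights))]

for token, weight in zip(tokens, weights):
    print(f"{token:>4}: {weight:.3f}")
```

With these invented scores, “cat” receives by far the largest weight, and “mat” still gets a small share rather than being ignored outright.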

How It Works (Self-Attention)

For each token (a word or sub-word piece) in a sequence, self-attention computes three vectors, each a learned linear projection of the token's embedding:

  • Query (Q) — “What am I looking for?”
  • Key (K) — “What do I contain?”
  • Value (V) — “What information do I provide?”
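A rough sketch of where the three vectors come from, assuming the standard Transformer setup in which each is a linear projection of the token embedding. The sizes and the names X, W_q, W_k and W_v are illustrative assumptions, and the matrices here are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 tokens, 8-dim embeddings, 4-dim queries/keys/values.
seq_len, d_model, d_k = 6, 8, 4

# One embedding row per token, e.g. for "The cat sat on the mat".
X = rng.normal(size=(seq_len, d_model))

# In a trained model these projection matrices are learned.
# Random values stand in for them here.
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q = X @ W_q  # "What am I looking for?"
K = X @ W_k  # "What do I contain?"
V = X @ W_v  # "What information do I provide?"

print(Q.shape, K.shape, V.shape)
```

Each row of Q, K and V belongs to one token, so every token ends up with its own query, key and value.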

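The attention output is computed as:

Attention(Q, K, V) = softmax(QKᵀ / √d) · V

Here QKᵀ holds the raw score between every query and every key, d is the dimension of the key vectors, and the softmax turns each row of scores into weights that sum to 1. The result is, for each token, a weighted sum of value vectors in which the most relevant tokens get the highest weights.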
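The formula can be sketched in a few lines of NumPy. This is a minimal single-head version with no batching or masking; the function name and the random inputs are assumptions made for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d)) V."""
    d = K.shape[-1]
    # Raw score between every query (rows) and every key (columns).
    scores = Q @ K.T / np.sqrt(d)
    # Row-wise softmax; subtracting the row max avoids overflow.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted sum of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, 8-dim queries
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)          # one output vector per token
print(weights.sum(axis=-1))  # each row of weights sums to 1
```

Dividing by √d keeps the scores from growing with the vector size, which would otherwise push the softmax toward putting nearly all weight on a single token.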

Types of Attention

Type                       Description
Self-attention             Each token attends to every token in the same sequence, itself included
Cross-attention            Tokens in one sequence attend to tokens in another (e.g., decoder tokens attending to encoder outputs)
Multi-head attention       Multiple attention operations run in parallel, each learning different relationships
Causal/masked attention    Each token can attend only to itself and earlier tokens (used in GPT-style models)
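The four variants differ mainly in where Q, K and V come from and which scores are allowed. The sketch below is a simplified illustration: the function names are made up for this example, and the multi-head version just slices the feature dimension instead of using the learned per-head projections a real model would have.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V, causal=False):
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    if causal:
        # True above the diagonal marks "future" positions to hide.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores)
    return weights @ V, weights

def multi_head_self_attention(X, num_heads, causal=False):
    # Simplification: split the features into equal slices, one per head.
    heads = np.split(X, num_heads, axis=-1)
    outputs = [attention(h, h, h, causal=causal)[0] for h in heads]
    # Concatenate the per-head outputs back into one vector per token.
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))       # 5 tokens in one sequence
memory = rng.normal(size=(3, 8))  # 3 tokens in a second sequence

_, self_w = attention(X, X, X)                 # self-attention
_, causal_w = attention(X, X, X, causal=True)  # causal/masked
_, cross_w = attention(X, memory, memory)      # cross-attention
multi = multi_head_self_attention(X, num_heads=2)

print(self_w.shape, cross_w.shape, multi.shape)
print(np.round(causal_w, 2))  # upper triangle is all zeros
```

In the causal case, setting future scores to negative infinity before the softmax gives those positions a weight of exactly zero, so each row still sums to 1 over the tokens it is allowed to see.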

Why It Matters

Attention is the foundation of the Transformer architecture (introduced in the 2017 paper “Attention Is All You Need”), which powers virtually all modern AI language models — GPT, Claude, Gemini, BERT, etc.

Before attention, recurrent encoder-decoder models (RNNs) had to compress an entire input sequence into a single fixed-size vector, which tended to lose long-range context. Attention addressed this by letting every token relate directly to every other token, regardless of distance.

In Short

Attention = a learned, dynamic weighting system that tells the model where to look and what matters most at each step of reasoning or generation.