RLVR vs. the Agent Loop: Training-Time vs. Inference-Time

Distinguishes RLVR, which updates model weights at training time, from agent verification loops, which run at inference time.

June 24, 2026 · 3 min

The Modern LLM Training Pipeline

Explains the four-stage modern LLM training pipeline from pre-training through verifiable-reward RL.

June 24, 2026 · 2 min

Where RL Fits: Training vs. Inference in the LLM Pipeline

Explains that RL in LLMs is a training and alignment stage rather than an inference-time step, and places it in the context of the full pipeline.

June 24, 2026 · 4 min