Reinforcement-Learning

Where RL Fits: Training vs. Inference in the LLM Pipeline

Explains that RL for LLMs is a training and alignment stage rather than part of inference, and shows where it sits in the pipeline.

Teen-friendly explainer of reinforcement learning agents, rewards, exploration, delayed rewards, and applications.

Explains scripted coding-LLM training with teacher traces, synthetic bugs, tests, SFT, and RL with verifiable rewards.

Explains AI world models as internal predictive representations for planning across RL, LLMs, and robotics.

Overview of AgentFlow, an agent architecture that trains a planner with Flow-GRPO for multi-turn tool use.