GRPO — Group Relative Policy Optimization

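A critic-free RL algorithm for LLM fine-tuning that replaces PPO's learned value model with a group-relative baseline: several responses are sampled for the same prompt, and each response's reward is scored relative to the others in its group.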
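To make the "group-relative" idea concrete, here is a minimal sketch of how per-response advantages might be computed from the rewards of one sampled group. The function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are assumptions chosen for illustration; actual implementations may normalize differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own group (hypothetical helper).

    rewards: rewards for several responses sampled from the same prompt.
    Returns one advantage per response: (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)        # the group mean stands in for a learned value baseline
    sigma = pstdev(rewards)   # population std; an assumption made for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses to one prompt, scored 1.0 (correct) or 0.0 (incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # approximately [1.0, -1.0, -1.0, 1.0]
```

Responses that beat the group average receive positive advantages and the rest receive negative ones, so no separate critic network is needed to supply a baseline. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.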

Fine-Tuning Techniques for LLMs

A comprehensive guide to LLM fine-tuning methods, including full, parameter-efficient, and preference-based approaches, with modern recipes and techniques such as LoRA and DPO.