Fine-Tuning

LoRA (Low-Rank Adaptation) in AI

Overview of LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method that trains small low-rank update matrices while the base model weights stay frozen.

PPO — Proximal Policy Optimization

Overview of PPO, the clipped policy-gradient RL algorithm used in RLHF for InstructGPT and the original ChatGPT.

GRPO — Group Relative Policy Optimization

Critic-free RL algorithm that replaces PPO's value model with group-relative rewards for LLM fine-tuning.

Fine-Tuning Techniques for LLMs

Comprehensive guide to LLM fine-tuning methods, including full, parameter-efficient, and preference-based approaches, with modern recipes and tools such as LoRA and DPO.

Unsloth Studio — Fine-tuning Dataset Formats

Overview of the dataset formats supported by Unsloth Studio for fine-tuning, including JSONL, Alpaca, ShareGPT, ChatML, and Reasoning formats, along with rules, best practices, and dataset size guidelines.