Fine-Tuning on knowledged.to

Recent content in Fine-Tuning on knowledged.to
https://knowledged.to/tags/fine-tuning/
Tue, 19 May 2026 22:49:14 +0530

PPO — Proximal Policy Optimization
https://knowledged.to/notes/ml/ppo-proximal-policy-optimization/
Tue, 19 May 2026 17:18:44 +0000
Overview of PPO, the clipped policy-gradient RL algorithm used in RLHF for InstructGPT and the original ChatGPT.

GRPO — Group Relative Policy Optimization
https://knowledged.to/notes/ml/grpo-group-relative-policy-optimization/
Tue, 19 May 2026 17:17:58 +0000
Critic-free RL algorithm that replaces PPO's value model with group-relative rewards for LLM fine-tuning.

Fine-Tuning Techniques for LLMs
https://knowledged.to/notes/ml/fine-tuning-techniques/
Sat, 25 Apr 2026 15:53:49 +0000
Comprehensive guide to LLM fine-tuning methods, including full, parameter-efficient, and preference-based approaches, with modern recipes and tools like LoRA and DPO.

Unsloth Studio — Fine-tuning Dataset Formats
https://knowledged.to/notes/ml/unsloth-studio-dataset-formats/
Thu, 23 Apr 2026 16:39:20 +0000
Overview of the dataset formats supported by Unsloth Studio for fine-tuning, including JSONL, Alpaca, ShareGPT, ChatML, and Reasoning formats, along with rules, best practices, and dataset size guidelines.