Speculative-Decoding

What is Speculative Decoding?

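Explains speculative decoding, which pairs a small draft model with a large target model to accelerate LLM inference: the draft proposes several tokens cheaply, the target verifies them in a single forward pass, and the target model's output distribution is left unchanged.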
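The draft-then-verify loop can be illustrated with a minimal sketch of the greedy variant. It assumes toy deterministic "models": `target_model` and `draft_model` are hypothetical stand-ins that map a token sequence to its next token, and the draft is deliberately wrong at some positions. Real systems score all k draft tokens in one batched target forward pass. When sampling rather than decoding greedily, they accept each draft token with probability min(1, p/q) and resample on rejection. Here the target is called once per position for readability.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: given the tokens so far, return the
# next token under greedy decoding.
Model = Callable[[List[int]], int]


def target_model(seq: List[int]) -> int:
    # Toy "large" model (assumption, not a real LLM).
    return (sum(seq) * 7 + len(seq)) % 10


def draft_model(seq: List[int]) -> int:
    # Toy "small" model: agrees with the target except at every third position.
    guess = target_model(seq)
    return guess if len(seq) % 3 else (guess + 1) % 10


def greedy_decode(model: Model, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: generate max_new tokens one at a time with a single model."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the
    target verifies them, and the longest agreeing prefix is kept."""
    seq = list(prompt)
    goal = len(prompt) + max_new
    while len(seq) < goal:
        # 1. Draft cheaply proposes up to k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(seq)
        for _ in range(min(k, goal - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Target checks each proposed position; stop at the first mismatch.
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(seq + proposal[:i]) != tok:
                break
            accepted += 1
        seq.extend(proposal[:accepted])
        # 3. Target emits one token itself: the correction at the mismatch,
        #    or a bonus token if every draft token was accepted.
        if len(seq) < goal:
            seq.append(target(seq))
    return seq


prompt = [1, 2, 3]
print(speculative_decode(target_model, draft_model, prompt, max_new=12)
      == greedy_decode(target_model, prompt, max_new=12))  # True
```

Every token kept is one the target itself would have chosen at that position, which is why the result matches target-only decoding. The speedup comes from accepting several draft tokens per target pass.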

Local + Frontier Model Collaboration Patterns in Open Source Harnesses

New file `notes/ml/local-frontier-model-collaboration-patterns.md` added to the Notes section, positioned after 'LLM Thinking Token Budgets' and before 'GGUF Models'.