<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Transformers on knowledged.to</title><link>https://knowledged.to/tags/transformers/</link><description>Recent content in Transformers on knowledged.to</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 21 May 2026 21:14:09 +0530</lastBuildDate><atom:link href="https://knowledged.to/tags/transformers/index.xml" rel="self" type="application/rss+xml"/><item><title>Why LLM Caching Is Only for Input Tokens</title><link>https://knowledged.to/notes/ml/llm-caching-input-tokens/</link><pubDate>Thu, 21 May 2026 15:43:26 +0000</pubDate><guid>https://knowledged.to/notes/ml/llm-caching-input-tokens/</guid><description>Explains why LLM prompt caching applies to reusable input-token prefill, not sequential output decoding.</description></item><item><title>Attention in Machine Learning</title><link>https://knowledged.to/notes/ml/attention/</link><pubDate>Sun, 17 May 2026 05:54:45 +0000</pubDate><guid>https://knowledged.to/notes/ml/attention/</guid><description>Explanation of the attention mechanism in ML, covering Query/Key/Value, self-attention, multi-head, causal, cross-attention, and efficiency variants like FlashAttention and GQA.</description></item><item><title>Mixture of Experts (MoE)</title><link>https://knowledged.to/notes/ml/mixture-of-experts/</link><pubDate>Thu, 23 Apr 2026 16:04:47 +0000</pubDate><guid>https://knowledged.to/notes/ml/mixture-of-experts/</guid><description>Overview of MoE architecture, routing, key components, variants, and trade-offs in machine learning models.</description></item></channel></rss>