Mixture of Experts (MoE)
Mixture of Experts is an architecture pattern in machine learning where a model is divided into many specialized sub-networks (“experts”), with a routing mechanism that selectively activates only a subset of them for any given input. Core Idea Instead of passing every input through all parameters of a model, MoE routes each token (or input) to only a few relevant experts. This decouples total parameter count from compute per forward pass — you can have a massive model that’s still fast and efficient to run. ...