Notes

AI Agents in Go

Guide to building AI agents in Go using agent SDKs, with a minimal runnable example covering LLM integration, tools, and multi-agent workflows

Open-weight Models

Explanation of open-weight models, their differences from closed and open-source models, and why they matter for local AI deployment and customization

Cross-Entropy in AI

Explanation of cross-entropy as a loss function in AI, including intuition, formal definition, examples, and relationship to entropy and KL divergence

AI Prompts: System Prompt and Other Types

Overview of the different types of AI prompts including system, user, few-shot, zero-shot, chain-of-thought, meta, and retrieval-augmented prompts

Elastic Looped Transformers (ELT)

Overview of Elastic Looped Transformers, an adaptive compute architecture that loops a shallow transformer block multiple times to dynamically allocate compute based on input complexity

Tempo Framework

Overview of Tempo, a query-aware temporal compression framework for long-video understanding in multimodal AI, using a small VLM to filter relevant frames before passing a condensed representation to a large model

Memory-Augmented Architectures

Overview of memory-augmented neural network architectures that add dynamic external memory to models, covering NTMs, RAG, Memorizing Transformers, Titans, and practical implications for building persistent AI agents

Forward Pass and Single Pass in LLMs

Explanation of forward pass and single pass in LLMs, how transformer computation flows from embedding to output logits, and how speculative decoding exploits transformer parallelism to reduce large-model forward passes

Speculative Decoding

Explanation of speculative decoding, an inference optimization in which a fast draft model proposes tokens that a large model verifies in parallel, typically achieving 2–3x throughput gains without changing the large model's output distribution

What Are Model Weights in an LLM?

Explanation of what model weights are in LLMs, how they encode learned behaviour, why parameter count matters, and how systems like Ollama load them into memory

GGUF Models

Overview of the GGUF binary format for storing and distributing LLMs locally, including quantization levels, key characteristics, and popular runtimes like llama.cpp and Ollama

Prompt Bias in AI

Explanation of prompt bias, how prompt wording and framing skew AI outputs, common forms including leading questions and assumption bias, and practical advice for writing neutral prompts

Primacy Bias in LLM Style Selection

Explanation of primacy bias in LLM selector prompts, how alphabetical candidate ordering caused over-selection of certain styles in BHQ, and fixes using deterministic non-lexicographic shuffling

Slack MCP Ideas

Ideas for using Slack MCP to monitor automation opportunities and identify duplicated efforts within an organisation

Elo Scoring for AI Models

Explanation of how Elo scoring is applied to rank AI models via human preference votes, including the math, strengths, weaknesses, and real-world use in Chatbot Arena

Knowledge Distillation

Overview of knowledge distillation, how student models learn from teacher model outputs, and applications including edge deployment, speculative decoding, and LLM training

Training-Free GRPO

Overview of Training-Free GRPO, a method that improves LLM agent performance by updating the model's context (an experience library) instead of its parameters, achieving RL-like gains at a fraction of the cost

Attention Mechanism

Explanation of the attention mechanism in AI, including self-attention, cross-attention, multi-head attention, and the mathematical formulation behind Transformers

Transformer Architecture

Overview of the Transformer architecture including self-attention, multi-head attention, positional encoding, encoder-decoder structure, and key variants like BERT, GPT, and T5

Recurrent Neural Networks (RNNs)

Overview of RNNs, their memory mechanism, common variants (LSTM, GRU), use cases, and how they compare to Transformers