KV Cache

LLM Prompt Cache Options Across Providers

Compares prompt/KV cache TTLs, controls, pricing, scope, and strategies across major LLM providers.

Explains implicit vs explicit LLM prompt caching, prefix constraints, provider support, and when to use each.

Explains how vectors relate to tensors in ML, including rank, framework terminology, and KV cache shapes.

Explains why LLM prompt caching applies to reusable input-token prefill, not sequential output decoding.