Cerebras Prompt Caching — exists but useless for TPM constraints
Automatic on gpt-oss-120b. No extra charge for caching (no markup). Cache TTL is 5 minutes to 1 hour. Cached blocks are reused via 128-token segment matching. CRITICAL CAVEAT: per their docs, cached tokens STILL COUNT against the tokens-per-minute (TPM) rate limit. Caching therefore saves latency only, not cost or quota. This is why we flipped to Anthropic-primary on 2026-05-20.
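A rough sketch of why the caveat matters for throughput. Everything here is illustrative: the `Request` fields, the 60,000 TPM limit, and the token counts are made-up numbers, not values from the Cerebras API or docs. It also assumes that only whole 128-token segments of a shared prefix can match, and that output tokens count toward TPM alongside input tokens.

```python
from dataclasses import dataclass

SEGMENT = 128  # segment size from the note above


@dataclass
class Request:
    prompt_tokens: int      # total input tokens, including any cached prefix
    cached_tokens: int      # portion of the prompt served from cache
    completion_tokens: int  # generated output tokens


def cacheable_prefix(shared_prefix_tokens: int, segment: int = SEGMENT) -> int:
    """Assumption: only whole segments of the shared prefix can be reused."""
    return (shared_prefix_tokens // segment) * segment


def tpm_charge(req: Request) -> int:
    """Tokens counted against TPM: cached tokens are NOT subtracted."""
    return req.prompt_tokens + req.completion_tokens


def charge_if_cache_exempt(req: Request) -> int:
    """Hypothetical comparison: what the charge would be if cache hits were free."""
    return req.prompt_tokens - req.cached_tokens + req.completion_tokens


def max_requests_per_minute(charge: int, tpm_limit: int) -> int:
    """How many identical requests fit in one minute of TPM budget."""
    return tpm_limit // charge


# Hypothetical workload: 10,000-token prompt, 9,700 tokens of it a shared prefix.
req = Request(
    prompt_tokens=10_000,
    cached_tokens=cacheable_prefix(9_700),
    completion_tokens=500,
)
TPM_LIMIT = 60_000  # made-up limit for illustration

actual = max_requests_per_minute(tpm_charge(req), TPM_LIMIT)
exempt = max_requests_per_minute(charge_if_cache_exempt(req), TPM_LIMIT)
```

With these made-up numbers, `actual` is 5 requests per minute, while `exempt` would be 66 if cache hits did not count. The cache hit rate has no effect on `actual`, which is the whole problem for a TPM-bound workload.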
inference-docs.cerebras.ai
- Author
- Jacob Cole
- Visibility
- (inherits public)
- Created
- 5/20/2026, 6:47:07 AM
- Updated
- 5/20/2026, 6:47:07 AM
- Permalink
/list/llm-inference-providers/item/a7dad093-a3b4-419c-b9b0-96998031e038