
LLM Inference Providers — Speed/Cost/Limits Comparison

Cerebras Prompt Caching — exists but useless for TPM constraints

Automatic on gpt-oss-120b. Free (no markup). TTL 5 min to 1 hr. Cached blocks are reused via 128-token segment matching. CRITICAL CAVEAT: per their docs, cached tokens STILL COUNT against the TPM (tokens-per-minute) rate limit. So caching saves latency only, not cost or quota. This is why we flipped to Anthropic-primary on 2026-05-20.
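A rough way to see why the cache doesn't help with TPM is to model cache hits and quota usage separately. The code below is a hypothetical model and does not use Cerebras's API. It makes two assumptions. First, a cache hit is the longest shared prompt prefix rounded down to whole 128-token blocks; Cerebras's exact matching rule may differ. Second, TPM usage counts every prompt token plus output tokens, as the caveat above says. `cached_prefix_tokens`, `TpmUsage`, and `max_requests_per_minute` are illustrative names only.

```python
from dataclasses import dataclass

BLOCK = 128  # segment size from the note; prefix-style block matching is an ASSUMPTION


def cached_prefix_tokens(prev: list[int], curr: list[int], block: int = BLOCK) -> int:
    """Hypothetical: reusable tokens = longest common prefix, rounded down to whole blocks."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return (n // block) * block


@dataclass
class TpmUsage:
    prompt_tokens: int   # full prompt, cached part included
    cached_tokens: int   # portion served from the prompt cache
    output_tokens: int

    def counted_against_tpm(self) -> int:
        # Per the caveat: cached tokens still count, so the cache changes nothing here.
        return self.prompt_tokens + self.output_tokens

    def computed_fresh(self) -> int:
        # Prompt tokens that still need prefill; this is the only (latency) saving.
        return self.prompt_tokens - self.cached_tokens


def max_requests_per_minute(tpm_limit: int, usage: TpmUsage) -> int:
    """How many identical requests fit under a tokens-per-minute limit."""
    return tpm_limit // usage.counted_against_tpm()


if __name__ == "__main__":
    system = list(range(1000))            # stand-in for a 1,000-token shared system prompt
    prev = system + [1, 2, 3]             # previous request
    curr = system + [9, 9, 9, 9]          # new request, same prefix, different tail
    hit = cached_prefix_tokens(prev, curr)
    with_cache = TpmUsage(len(curr), hit, output_tokens=200)
    no_cache = TpmUsage(len(curr), 0, output_tokens=200)
    print("cached:", hit, "fresh prefill:", with_cache.computed_fresh())
    print("TPM cost with/without cache:",
          with_cache.counted_against_tpm(), no_cache.counted_against_tpm())
    print("req/min at a hypothetical 60k TPM:",
          max_requests_per_minute(60_000, with_cache),
          max_requests_per_minute(60_000, no_cache))
```

In this toy run, 896 of the 1,004 prompt tokens come from cache, so only 108 need fresh prefill. That is the latency win. The request still costs 1,204 TPM tokens either way, so a hypothetical 60,000 TPM limit allows 49 requests per minute with or without the cache.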

inference-docs.cerebras.ai

Author: Jacob Cole
Status: —
Visibility: (inherits public)
Created: 5/20/2026, 6:47:07 AM
Updated: 5/20/2026, 6:47:07 AM
Permalink: /list/llm-inference-providers/item/a7dad093-a3b4-419c-b9b0-96998031e038


