Cerebras gpt-oss-120b — fastest decode in industry (~3000 tok/s)

Pricing: $0.05/M input tokens, $0.10/M output tokens. Free tier: 30 RPM / 60K TPM. Pay-as-you-go (PayGo) is documented at 1000 RPM / 1M TPM. CollabLists experience (2026-05-20): the paid account console showed 250 RPM / 250K TPM, below the documented limits. Tool-heavy agent loops hit token_quota_exceeded 100% of the time. A new key, csk-4rt4..., appears to be enterprise-tier (100 of 100 parallel requests passed). Prompt caching: cached tokens STILL count against TPM, so caching saves latency only, not cost. SemiAnalysis: hard to economically serve the long context lengths representative of today's agentic workloads. Best for: short, fast inference and validation/QA. Worst for: long multi-tool loops.
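
A rough way to see why long multi-tool loops run into the TPM cap. This is a back-of-the-envelope sketch, not Cerebras's actual accounting: it assumes that both input and output tokens count against TPM and that each agent step resends its full, growing context. The 250 RPM / 250K TPM figures are the console reading noted above; the function names and the per-step token counts (4,000-token base prompt, 2,000 tokens added per step, 500 output tokens) are hypothetical and chosen only for illustration.

```python
def requests_per_minute(rpm_limit: int, tpm_limit: int, tokens_per_request: int) -> int:
    """Sustainable requests per minute: whichever of the RPM or TPM cap binds first."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return min(rpm_limit, tpm_limit // tokens_per_request)


def agent_loop_tokens(base_prompt: int, growth_per_step: int,
                      output_per_step: int, steps: int) -> int:
    """Total tokens counted against TPM for one agent loop.

    Assumption: every step resends the whole context, and cached prompt
    tokens are counted in full (caching helps latency, not quota).
    """
    total = 0
    context = base_prompt
    for _ in range(steps):
        total += context + output_per_step  # input + output for this step
        context += growth_per_step          # tool results etc. enlarge the next prompt
    return total


if __name__ == "__main__":
    RPM, TPM = 250, 250_000  # console reading from the note above

    # Hypothetical workload numbers, for illustration only.
    loop = agent_loop_tokens(base_prompt=4_000, growth_per_step=2_000,
                             output_per_step=500, steps=10)
    print("tokens per 10-step loop:", loop)
    print("share of one minute's TPM:", loop / TPM)
    print("max 20K-token requests/min:", requests_per_minute(RPM, TPM, 20_000))
    print("max 500-token requests/min:", requests_per_minute(RPM, TPM, 500))
```

Under these assumed numbers a single 10-step loop consumes 135,000 tokens, more than half of a 250K TPM budget, and 20,000-token requests are capped at 12 per minute regardless of the 250 RPM limit. Short 500-token requests, by contrast, are limited by RPM rather than TPM.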

www.cerebras.ai

Author: Jacob Cole
Status: —
Visibility: (inherits public)
Created: 5/20/2026, 6:47:57 AM
Updated: 5/20/2026, 6:47:57 AM
Permalink: /list/llm-inference-providers/item/5b535ac1-4848-4934-b316-5fe9d87dd436
