Cerebras gpt-oss-120b: fastest decode in the industry (~3000 tok/s)
Pricing: $0.05/M input tokens, $0.10/M output tokens.
Rate limits: free tier 30 RPM / 60K TPM; PayGo documented at 1,000 RPM / 1M TPM.
CollabLists experience (2026-05-20):
- The paid account's console showed 250 RPM / 250K TPM, below the documented PayGo limits.
- Tool-heavy agent loops hit token_quota_exceeded 100% of the time.
- A new key (csk-4rt4...) appears to be enterprise-tier: 100 of 100 parallel requests passed.
- Cached prompt tokens still count against TPM, so prompt caching saves latency only, not cost.
SemiAnalysis: it is hard to economically serve the long context lengths representative of today's agentic workloads.
Best for: short, fast inference; validation/QA. Worst for: long multi-tool loops.
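A rough budget check shows why tool-heavy loops can trip the TPM limit while staying cheap. The Python sketch below plugs in the prices and limits from this note; the loop shape (20 turns, a 10K-token starting prompt that grows by 5K tokens of tool output per turn, 500 output tokens per turn) is hypothetical, as are the helper names `Turn`, `agent_loop`, `cost_usd`, `quota_tokens` and `min_minutes`. It also assumes the TPM quota counts input tokens (cached or not) plus output tokens; the exact accounting should be checked against Cerebras's rate-limit documentation.

```python
from dataclasses import dataclass

# Snapshot of figures from this note (not fetched live; check current docs).
PRICE_IN_PER_M = 0.05        # USD per 1M input tokens
PRICE_OUT_PER_M = 0.10       # USD per 1M output tokens
TPM_OBSERVED = 250_000       # paid-console limit seen on 2026-05-20
TPM_DOCUMENTED = 1_000_000   # documented PayGo limit


@dataclass
class Turn:
    input_tokens: int    # whole prompt resent this turn, cached prefix included
    output_tokens: int


def agent_loop(turns: int, base_prompt: int, growth: int, output: int) -> list[Turn]:
    """Hypothetical tool loop: each turn appends `growth` tokens of tool output."""
    return [Turn(base_prompt + growth * i, output) for i in range(turns)]


def cost_usd(turns: list[Turn]) -> float:
    """List-price cost; caching gives no discount here, as the note reports."""
    tin = sum(t.input_tokens for t in turns)
    tout = sum(t.output_tokens for t in turns)
    return tin / 1e6 * PRICE_IN_PER_M + tout / 1e6 * PRICE_OUT_PER_M


def quota_tokens(turns: list[Turn]) -> int:
    """Tokens charged to TPM. Assumption: input (cached or not) plus output."""
    return sum(t.input_tokens + t.output_tokens for t in turns)


def min_minutes(turns: list[Turn], tpm_limit: int) -> float:
    """Shortest wall-clock span that keeps the whole loop under a TPM limit."""
    return quota_tokens(turns) / tpm_limit


if __name__ == "__main__":
    loop = agent_loop(turns=20, base_prompt=10_000, growth=5_000, output=500)
    print(f"quota tokens: {quota_tokens(loop):,}")
    print(f"cost: ${cost_usd(loop):.4f}")
    print(f"min minutes at observed TPM: {min_minutes(loop, TPM_OBSERVED):.2f}")
    print(f"min minutes at documented TPM: {min_minutes(loop, TPM_DOCUMENTED):.2f}")
```

With these made-up numbers the whole loop costs under six cents, yet it consumes 1.16M quota tokens, so it needs at least 4.64 minutes of spacing under the observed 250K TPM (1.16 minutes even at the documented 1M TPM). Decoding the 10K output tokens takes only a few seconds at ~3000 tok/s, so the quota, not model speed, sets the pace.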
www.cerebras.ai

- Author: Jacob Cole
- Created: 5/20/2026, 6:47:57 AM
- Updated: 5/20/2026, 6:47:57 AM
- Permalink: /list/llm-inference-providers/item/5b535ac1-4848-4934-b316-5fe9d87dd436