Gemma 4 31B IT by Cerebras | AI model information

Model details

Gemma 4 31B IT

Cerebras, gemma-4-31b, gemma, beta

Quick Info

Provider: Cerebras
Model key: gemma-4-31b
Release date: Apr 2, 2026
Last updated

Cost

Input token cost: $0.99
Output token cost: $1.49
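
The listing does not state the unit for these prices. As a rough sketch, assuming they are quoted in USD per 1 million tokens (an assumption, not confirmed by this page), a request's cost could be estimated as follows. The function name and the unit constant are hypothetical and only for illustration.

```python
# Prices copied from the listing; the per-1M-token unit is an ASSUMPTION.
INPUT_PRICE_USD = 0.99
OUTPUT_PRICE_USD = 1.49
TOKENS_PER_PRICE_UNIT = 1_000_000  # assumed billing unit, not stated in the listing


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD under the assumed unit."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_USD + output_tokens * OUTPUT_PRICE_USD
    return total / TOKENS_PER_PRICE_UNIT
```

Under that assumption, a request with 10,000 input tokens and the listed maximum of 40,960 output tokens would come to roughly $0.07. If the provider bills in a different unit, only TOKENS_PER_PRICE_UNIT needs to change.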

Limits

Output tokens: 40,960 tokens
Context window

Recent tweets and retweets from Cerebras

Jul 1, 2026, 3:03 PM UTC

Talk to Gemma 4 31B with our voice app! It sees and searches the web faster than you blink. Thanks to @cerebras’ ultra fast inference, the LLM is almost instantaneous. The whole stack is fully open-source, and is a drop-in replacement for OpenAI's realtime API. Demo:…

Jul 2, 2026, 1:57 AM UTC

Watch the full interview: piped.video/LRpw1PAQxbc In conversation with @MilksandMatcha for Cerebras's Big Chip Club, produced by @alyciazcary. Cerebras Big Chip Club: Logan Kilpatrick on why speed will define the next generation of AI products. Logan Kilpatrick, who…

Jul 2, 2026, 1:57 AM UTC

"If you knew you could get that many tokens, you would build different products." Logan Kilpatrick (@OfficialLoganK, @GoogleDeepMind) on why fast inference doesn't just make AI faster. It changes what is possible to build. @googlegemma's Gemma 4 is now on Cerebras, running…

Jul 1, 2026, 9:39 PM UTC

We gave two agents the same task: “Find images matching this description.” Both use Gemma 4 31B. One runs on Cerebras. The other runs on GPUs. You can see the difference. Speed changes the product experience. What would you build if you didn't have to wait?

Jun 30, 2026, 11:34 AM UTC

I just ran Gemma 4 31B on @CerebrasSystems at 1,800+ tokens/sec and it's multimodal. For context: that's 35x faster than a typical GPU endpoint, and the first token (reasoning included) lands in 1.5 seconds. This isn't a benchmark slide, I recorded the inference live. Prompt…