Knowledge cutoff: 2025-01-01
Input modalities
Output modalities
Capabilities
128,000 tokens
Recent tweets and retweets from Inception
The question is no longer just which model is the smartest.
It’s which model is most efficient without sacrificing quality.
The highest-volume AI workloads are bottlenecked by latency, token generation speed, and serving cost. Autoregressive models were not designed for that…
That’s exactly the bet we’re making at @_inception_ai
We’re already matching speed-optimized models from frontier labs on quality, while being faster and more cost efficient. That gap will only widen as we continue to scale.
Hiring our first Forward Deployed AI Engineer at Inception.
We built the world's fastest reasoning LLM and the first commercially available diffusion LLM, Mercury 2.
>1,000 tokens/sec on standard GPUs via diffusion, 10x faster than speed-optimized autoregressive models at…
Will the next decade of LLMs run on autoregression, or on diffusion?
One of the top questions we got at MLSys this week.
Part 6, the final part of our founder story series with @timt at @MenloVentures.
Featuring @StefanoErmon, @adityagrover_, @volokuleshov
Day 2 at @MLSysConf.
Thanks to everyone who came by yesterday. The conversations on diffusion for language, the future of language models, and what fast inference unlocks have been the highlight.
Come find us at the booth today and meet the team behind Mercury 2. And join us…