Knowledge cutoff: 2025-05
Input modalities
Output modalities
Capabilities
131,000 tokens