Last updated: Apr 30, 2026
1,000,000 tokens
Recent tweets and retweets from Together AI
.@cartesia runs one of the hardest inference workloads: real-time voice.
Their stack has to keep long-lived streams moving, serve millions of audio minutes a day, and hold model latency around 90ms.
Together gives them the managed GPU infrastructure and low-level cluster…
How Cartesia Runs Real-Time Voice AI on Together AI
The challenge
Cartesia’s growth in real-time voice AI created four specific infrastructure requirements that many hosted platforms were not fit for.
Voice workloads have tight latency budgets: Voice
Introducing The Blind Test.
Two landing pages. One built by GLM 5.2 and one by Opus 4.8.
Can you tell which is which?
It's very difficult to get a perfect score. Just try :)
The next generation of inference needs purpose-built infrastructure.
Together AI and 5C are deploying NVIDIA GB300 NVL72 systems with high-density compute, advanced cooling, and AI-optimized storage for large-scale inference and reasoning.