Last updated: Apr 30, 2026
Input modalities
Output modalities
Capabilities
1,000,000 tokens
Qwen3.6 Plus by Together AI | AI model information
Recent tweets and retweets from Together AI
.@cartesia runs one of the hardest inference workloads: real-time voice.
Their stack has to keep long-lived streams moving, serve millions of audio minutes a day, and hold model latency around 90ms.
Together gives them the managed GPU infrastructure and low-level cluster…
Article
How Cartesia Runs Real-Time Voice AI on Together AI
The challenge
Cartesia’s growth in real-time voice AI created four specific infrastructure requirements that many hosted platforms were not fit for.
Voice workloads have tight latency budgets: Voice
Introducing The Blind Test.
Two landing pages. One built by GLM 5.2 and one by Opus 4.8.
Can you tell which is which?
It's very difficult to get a perfect score, just try :)
Video
The next generation of inference needs purpose-built infrastructure.
Together AI and 5C are deploying NVIDIA GB300 NVL72 systems with high-density compute, advanced cooling, and AI-optimized storage for large-scale inference and reasoning.