Apr 30, 2026
Input modalities
Output modalities
Capabilities
Context window: 1,000,000 tokens
Recent tweets and retweets from Together AI
Together AI has long been a proud supporter of open-source innovation in inference. We're excited about the new TokenSpeed inference engine, available in preview today under an MIT license, and can't wait to see where it goes.
Congratulations to the entire TokenSpeed team!
Deepgram STT is now natively available on @togethercompute.
One platform, full voice-agent stack: Deepgram transcription, Together-hosted LLMs, Aura-2 TTS. Sub-3-second round trip on reference builds.
If you're prototyping a voice agent and bouncing between four vendors, you…
Inference is 80-90% of the lifetime cost of a production AI system. Most AI-native teams are leaving performance and margin on the table.
Here’s how Together AI, the AI Native Cloud, fixes that on @nvidia Blackwell: together.ai/blog/foundationa…
@profdanklein spent 20 years studying how language forms intelligence. When LLMs exploded, he saw something everyone missed.
These systems were fluent, confident, and wrong.
And no one could tell the difference. 🧵
Switching to Together AI flipped it:
⚡️Zero training-blocking failures
⚡️~50% cost savings vs. AWS
⚡️Issues resolved within hours via shared Slack
They could finally focus on building the model.