Knowledge cutoff
2024-07
Input modalities
Output modalities
Capabilities
163,839 tokens
Recent tweets and retweets from Together AI
M3 brings sparse attention + 1M context + multimodality, and Together did the hard serving work to make it fast.
Great collaboration with the Together team.
We wrapped a live session on M3 yesterday with the @togethercompute team & our researchers @zpysky1125 and @HaohaiSun
A few highlights 🧵
1. MSA (MiniMax Sparse Attention) is the star ⭐️. Unlike CSA/HCA, which compress the KV cache, MSA keeps the real, uncompressed KV and…
Amazing deep dive from the @togethercompute team on serving MiniMax M3 in production.
M3 with its 1M context, native multimodality and MiniMax Sparse Attention requires real work across paged decode, index scoring, and multimodal preprocessing to get it efficient.
This is…
Everyone talks about 1M context. The harder part is making 1M context actually usable. Serving MiniMax M3 required optimizing for long-context, multimodal, and agentic workloads simultaneously. Excited to see what developers build with it. 🚀