Last updated: Jun 16, 2026
Input modalities
Output modalities
Capabilities
262,144 tokens
Recent tweets and retweets from Together AI
Read the full story: theinformation.com/newslette…
Open Source Growth Boosts Together AI, Hugging Face
Anthropic and OpenAI are booming, but so are providers of open-source AI models and other cheaper alternatives, thanks to businesses using open-source to control…
400T tokens is what production adoption looks like.
Teams are moving real workloads to open models because they want frontier quality, better tokenomics, and more control over inference.
Together AI gives them the infrastructure to make that shift.
Single-shot generation still surfaces net-new kernels with no public reference: NeMo vocab-parallel log-probs, Hyena context parallelism, SAM 3 mask suppression.
One GEMM + All-Gather kernel hit 87.9µs vs 320.6µs for NCCL. PKB is open. Read more and contribute below.
Blog:…
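The two latencies quoted in the post above imply a speedup that is easy to check by hand. A minimal sketch of that arithmetic follows; the variable names are illustrative and nothing here comes from the benchmark's own code.

```python
# Latencies quoted in the post, in microseconds.
generated_kernel_us = 87.9   # fused GEMM + All-Gather kernel
nccl_baseline_us = 320.6     # NCCL baseline

# Speedup is baseline time divided by the new kernel's time.
speedup = nccl_baseline_us / generated_kernel_us

# Fraction of the baseline latency that was removed.
latency_reduction = 1 - generated_kernel_us / nccl_baseline_us

print(f"speedup: {speedup:.2f}x")
print(f"latency reduction: {latency_reduction:.1%}")
```

This covers only the single kernel the post highlights; it says nothing about the other problems in the benchmark.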
An agentic loop (compile, test, profile, revise) helps. Gemini 3 Pro went from 24 to 35/87 correct, then plateaued after ~20 steps.
Feedback fixes syntax, not rank coordination, collective ordering, or transfer-mechanism choice. TMA and NVLS stay almost unused.
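The compile, test, profile, revise loop described above can be sketched roughly as follows. This is a hedged illustration, not the benchmark's harness: `Feedback`, `refine`, `toy_generate`, and `toy_evaluate` are hypothetical names, the toy functions stand in for a real model call and a real build-and-run step, and the 20-step cap simply mirrors the plateau mentioned in the post.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    compiled: bool                # did the candidate build?
    correct: bool                 # did it match the reference output?
    runtime_us: Optional[float]   # profiled runtime, if it ran correctly
    log: str                      # text handed back to the model


def refine(
    generate: Callable[[Optional[Feedback]], str],
    evaluate: Callable[[str], Feedback],
    max_steps: int = 20,
) -> Optional[Tuple[str, Feedback]]:
    """Run the revise loop and return the fastest correct candidate, if any."""
    best = None
    feedback = None
    for _ in range(max_steps):
        source = generate(feedback)   # revise using the previous feedback
        feedback = evaluate(source)   # compile, test, and profile
        if feedback.correct and feedback.runtime_us is not None:
            if best is None or feedback.runtime_us < best[1].runtime_us:
                best = (source, feedback)
    return best


def toy_generate(feedback: Optional[Feedback]) -> str:
    # Stand-in for a model call: each revision just bumps a version number.
    toy_generate.calls += 1
    return f"kernel_v{toy_generate.calls}"


toy_generate.calls = 0


def toy_evaluate(source: str) -> Feedback:
    # Stand-in for a build-and-run step: v1 fails to compile, v2 compiles
    # but is wrong, v3 and later are correct and get progressively faster.
    version = int(source.rsplit("v", 1)[1])
    compiled = version >= 2
    correct = version >= 3
    runtime = 400.0 / version if correct else None
    return Feedback(compiled, correct, runtime, log=f"{source}: correct={correct}")


best = refine(toy_generate, toy_evaluate, max_steps=5)
print(best)
```

Note what the loop feeds back: build errors, test results, and timings. That matches the post's observation that this kind of signal repairs syntax more readily than it repairs rank coordination or collective ordering.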
Frontier models struggle.
→ Best zero-shot: 28/87 correct, 22 beat the PyTorch + NCCL baseline
→ With 3 attempts: 36/87 correct, but fast1@3 tops out at 31%
Weak models fail to compile. Strong reasoners compile cleanly and return wrong answers.
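The counts above are easier to compare as rates out of 87. The sketch below converts them and also shows one plausible reading of fast1@3: the share of problems where at least one of the first 3 attempts is both correct and faster than the baseline (speedup above 1). The post does not define the metric, so that reading is an assumption, and `fast_p_at_k` and the toy data are hypothetical.

```python
TOTAL_PROBLEMS = 87


def rate(count: int, total: int = TOTAL_PROBLEMS) -> float:
    """Fraction of benchmark problems represented by count."""
    return count / total


# Counts quoted in the post.
quoted = {
    "best zero-shot correct": 28,
    "zero-shot beating PyTorch + NCCL": 22,
    "correct within 3 attempts": 36,
}
for label, count in quoted.items():
    print(f"{label}: {count}/{TOTAL_PROBLEMS} = {rate(count):.1%}")


def fast_p_at_k(attempts_per_problem, p: float = 1.0, k: int = 3) -> float:
    """Assumed definition: a problem counts if any of its first k attempts
    is correct and has speedup over the baseline greater than p."""
    hits = 0
    for attempts in attempts_per_problem:
        if any(correct and speedup > p for correct, speedup in attempts[:k]):
            hits += 1
    return hits / len(attempts_per_problem)


# Toy data, not benchmark results: (correct, speedup) per attempt.
toy = [
    [(False, 0.0), (True, 1.4), (True, 0.9)],    # correct and faster on attempt 2
    [(True, 0.8), (True, 0.95), (False, 0.0)],   # correct but never faster
    [(False, 0.0), (False, 0.0), (False, 0.0)],  # never correct
    [(True, 2.1), (False, 0.0), (False, 0.0)],   # correct and faster on attempt 1
]
toy_score = fast_p_at_k(toy)
print(f"toy fast1@3: {toy_score:.0%}")
```

Under this reading a problem can count toward "correct within 3 attempts" without counting toward fast1@3, which is consistent with the gap between 36/87 correct and the 31% ceiling.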