Introducing PDF to Lesson!
Create interactive personalized courses from any PDF. 100% free & open source!
Powered by GPT OSS on @togethercompute.
Video
Together AI provides the inference stack behind both: high-throughput serving on the latest NVIDIA Blackwell GPUs for agentic workloads, and TensorRT engines plus event-driven streaming I/O for low-latency ASR.
Start building on Together AI, the AI Native Cloud.
Nemotron 3.5 ASR is built for streaming multilingual speech recognition and voice agents.
One 0.6B checkpoint. 40 language-locales. Sub-100ms latency. Cache-aware FastConformer carries context forward between chunks instead of repeatedly reprocessing overlapping audio.
Try…
Nemotron 3 Ultra is built for long-horizon agents that need to plan, code, test, debug, and iterate across large working sets.
550B parameters, 55B active, 1M context, Hybrid Mamba-Transformer MoE, Multi-Token Prediction, and multi-environment RL training.
Try it:…
Discuss this model
Add corrections, implementation notes, pricing changes, or usage caveats for other readers.