MAI-Voice-2: expressive text-to-speech in 15 languages, with emotional and style control (excited, embarrassed, whispered) and a stable speaker identity over long-form content. Pairs with MAI-Transcribe-1.5.
Use it now: openrouter.ai/microsoft/mai-…
MAI-Transcribe-1.5: SOTA speech-to-text across 43 languages, transcribing 1 hour of audio in under 15 seconds.
🥇 #1 on FLEURS averaged across 43 languages
🥇 #1 on the Artificial Analysis (@ArtificialAnlys) Accuracy x Speed Pareto frontier
🥉 #3 on Artificial Analysis word error rate (WER), at 2.4%
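The 2.4% WER and the "1 hour of audio in under 15 seconds" figures are easier to read with the arithmetic spelled out. Below is a minimal Python sketch. It is not Microsoft's or Artificial Analysis's evaluation code. `word_error_rate` computes the standard metric: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. `speed_multiple` is a hypothetical helper that expresses the speed claim as "times faster than real time". The sketch assumes plain whitespace tokenization with no text normalization. Published benchmarks usually lowercase the text, strip punctuation, and pool errors over a whole test set, so this sketch will not reproduce their scores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: whitespace tokenization, no normalization (case/punctuation kept).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the minimum edits to turn ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def speed_multiple(audio_seconds: float, processing_seconds: float) -> float:
    """Hypothetical helper: seconds of audio transcribed per second of wall-clock time."""
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    # One substitution ("the" -> "a") out of six reference words: WER = 1/6, about 16.7%.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
    # 1 hour (3600 s) in 15 s is 240x real time; "under 15 seconds" means at least that.
    print(speed_multiple(3600, 15))
```

At the announced 2.4% WER, this means roughly 2 to 3 word errors per 100 reference words on average. The speed claim works out to at least 240x faster than real time.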
Use it now:…