We wrote about why we made that bet 👇
deepinfra.com/blog/deepinfra…
How DeepInfra Built on NVIDIA's Inference Stack and Why It Paid Off
Low pay-as-you-go pricing. No long-term contracts. Simple APIs. Scale to trillions of tokens. 100+ AI models.
deepinfra.com
We’ve been building on @nvidia's inference stack from day one — TensorRT-LLM, Dynamo, NVFP4 on Blackwell.
When DeepSeek V4 dropped, we served it on day 0. Then we moved to B300 after measuring a 4x performance increase. A workload that needed 4×H200 now runs on a single…
Pay attention to the prompt details: "A woman in her twenties dressed as a friendly party clown — light, natural face makeup, a round red nose, bright red curly clown wig, and a colorful costume — performs a magic trick on the green lawn of a cozy backyard birthday party: with…
We keep thinking "How did she do it?"
🎬 Just shipped: LTX-2.3-Distilled-Diffusers is live on Deep Infra.
1080p. 5 seconds of video. ~24 seconds on @nvidia Blackwell.
The model is live now at $0.035/second.
Stay tuned for speed improvements.