Knowledge cutoff
2024-12
Input modalities
Output modalities
Capabilities
Recent tweets and retweets from Inference
Finished recording the Tiny Models DPO video. Working on manim illustrations now.
What it'll cover:
- Measuring diversity in LM responses
- Generate preference data locally
- DPO (+ RM, ORPO)
- Training DPO w Unsloth/TRL
- Evals
Thanks to @inference_net for…
3 weeks ago we open-sourced HALO
this led to talking with dozens of teams running agents at scale
we realized the current agent monitoring tools aren't built for the future that we so clearly see ahead of us
today we’re releasing native OpenTelemetry-compatible agent…
The best production model is the one trained for the job.
Gravity Ads replaced a 70B model on Cerebras with a specialized 1B model trained for their actual workload.
Same quality, much faster and cheaper inference:
- p50: 152ms
- p99: 5.7x lower
- cost: ~10x lower
-…