Finished recording the Tiny Models DPO video. Working on manim illustrations now.

What it'll cover:
- Measuring diversity in LM responses
- Generating preference data locally
- DPO (+ RM, ORPO)
- Training DPO w Unsloth/TRL
- Evals

Thanks to @inference_net for…