Knowledge cutoff
2025-04
Input modalities
Output modalities
Capabilities
Recent tweets and retweets from LMStudio
MTP stands for Multi-Token Prediction. It's a speculative decoding technique that can yield large inference speedups in many cases.
1. Update to LM Studio 0.4.14
2. Download a model that supports MTP like Qwen3.6-35B-A3B-MTP-GGUF or Qwen3.6-27B-MTP-GGUF
3. Enable it when…
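As a rough illustration of why speculative decoding can speed up inference, here is a minimal toy sketch of a greedy draft-and-verify loop. It is not LM Studio's implementation or API. The names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, and a separate cheap draft function stands in for the extra prediction heads an MTP model would use.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # maps a token context to the next token


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], max_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify. Returns (tokens, number of verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. Cheaply propose k tokens ahead.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. Verify the proposal against the target. A real engine would score
        #    all k positions in one batched forward pass; this toy calls the
        #    target per position but counts the whole check as one round.
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the target's token at the first mismatch
                break
        else:
            # Every drafted token matched, so the target contributes one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new], rounds


# Toy "models" (assumptions for illustration): the target counts upward mod 10,
# and the draft agrees except right after a 7.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10


tokens, rounds = speculative_decode(toy_target, toy_draft, [0], max_new=12)
baseline = greedy_decode(toy_target, [0], max_new=12)
print(tokens == baseline, rounds)
```

In this sketch the output always matches plain greedy decoding with the target, because every accepted token is checked against it. Any speedup comes from needing fewer verification rounds than generated tokens, which depends on how often the draft agrees with the target.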
Subagents running locally and simultaneously on MacBook Pro M5 with Codex CLI + @lmstudio to review code and find bugs using Qwen 3.6
Powered by the updated MLX engine with batching in beta in the app
The batching speed boost is noticeable
Batching for vision models is now available in Beta with our latest MLX engine update 👾
The updated engine also brings major improvements to caching for faster inference overall.
Turn on Developer Mode, choose the beta runtime channel, and select LM Studio MLX v1.8.1.