Chinese startup says DeepSeek-V4-Pro beats all rival open models for maths and coding.
Model details
DeepSeek V4 Pro is a large Mixture-of-Experts language model built for high-level reasoning and agentic work. Its defining feature is a hybrid attention mechanism that combines Compressed Sparse Attention with Heavily Compressed Attention, substantially reducing inference cost and KV-cache usage over long context windows. To keep signal propagation stable at depth, the design reinforces conventional residual pathways with manifold-constrained hyper-connections, helping the model hold up under demanding, large-scale workloads.
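To make the KV-cache claim concrete, here is a minimal sketch of one common way attention compression saves cache memory: projecting hidden states into a shared low-rank latent and caching only that latent instead of full keys and values. All dimensions, weight names, and the specific scheme are illustrative assumptions, not DeepSeek's actual design.

```python
import numpy as np

# Hypothetical illustration of low-rank KV-cache compression.
# Dimensions and names are assumptions for this sketch only.
d_model, d_latent, n_tokens = 512, 64, 1024
rng = np.random.default_rng(0)

W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)   # compress
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # expand to K
W_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # expand to V

hidden = rng.standard_normal((n_tokens, d_model))

# Instead of caching full K and V (2 * n_tokens * d_model floats),
# cache only the shared latent (n_tokens * d_latent floats).
latent_cache = hidden @ W_down        # (n_tokens, d_latent)

# Reconstruct K and V on the fly at attention time.
K = latent_cache @ W_up_k
V = latent_cache @ W_up_v

full_cache = 2 * n_tokens * d_model
compressed = n_tokens * d_latent
print(f"cache reduction: {full_cache / compressed:.0f}x")  # 16x with these sizes
```

The trade-off is extra matrix multiplies at decode time in exchange for a much smaller cache, which is what makes very long context windows tractable.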
The model undergoes a two-stage post-training pipeline. First, domain experts are cultivated independently with supervised fine-tuning and group relative policy optimization (GRPO), giving the model nuanced proficiency across diverse subjects. Second, a unified consolidation phase uses on-policy distillation to merge those expert capabilities into a single model that balances specialized knowledge with general-purpose utility.
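The group-relative part of GRPO can be sketched briefly: for each prompt, several completions are sampled and each one's reward is normalized against its own group, removing the need for a separate value model. The reward values below are made up for illustration; nothing here reflects DeepSeek's actual training code.

```python
import numpy as np

def group_relative_advantages(rewards):
    """Core idea of GRPO's advantage estimate: normalize each sampled
    completion's reward within its group,
    advantage_i = (r_i - mean(group)) / std(group)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# One prompt, a group of 4 sampled completions scored by a reward model
# (illustrative scores).
rewards = [0.2, 0.9, 0.5, 0.4]
adv = group_relative_advantages(rewards)
print(np.round(adv, 3))  # completions above the group mean get positive advantage
```

These advantages then weight the policy-gradient update, so the model is pushed toward completions that beat their own sampling group rather than an absolute baseline.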
In practice, the model excels at STEM, mathematics, and complex coding tasks, frequently outperforming other open-weight models and approaching the performance of top-tier closed-source systems. It offers clear advantages in cost efficiency and long-context handling, but it is released as a preview, and the available documentation does not detail the composition of its pre-training data or the full extent of its safety alignment. So while it represents a major step forward for open-source AI, its behavior in highly sensitive or adversarial contexts remains a subject for ongoing evaluation.
According to @deepseek_ai, the DeepSeek API now supports the new deepseek-v4-pro and deepseek-v4-flash models with 1M context windows and dual Thinking and...
DeepSeek V4 is live with two models. V4-Pro approaches Claude Opus 4.6; V4-Flash is faster and cheaper. Here's which to use, how to migrate your API, and what the Huawei chip story actually means.