Knowledge cutoff: 2024-12
Input modalities
Output modalities
Capabilities
131,072 tokens
Recent tweets and retweets from D.Run (China)
vLLM v0.20.1 is now available. During this release cycle, DaoCloud collaborated deeply with community partners to deliver over 10 stability fixes and performance optimizations for DeepSeek V4, helping DeepSeek V4 run more stably and faster on vLLM. @vllm_project
Three core AI inference cluster challenges: GPU topology, scheduling, cost validation.
🎤DaoCloud's Weizhou Lan presents topology discovery for Kueue scheduling, using Kwok virtual nodes for cost-free validation to boost GPU and cluster efficiency.
Growing AI workloads such as LLMs make cache-aware scheduling and routing critical to GPU efficiency.
🎤Kay Yan (DaoCloud) collaborates with Red Hat & IBM to resolve K8s KV-Cache fragmentation, sharing high-performance LLM inference best practices.
📢KubeEdge has become the preferred platform for Kubernetes edge scaling!
🎤DaoCloud's Hongbing Zhang will team up with Huawei and VMware experts to share edge workload management architecture, multi-domain cases, and the latest technical updates and community trends.
📊How to achieve granular bandwidth management?
🎤 Weizhou Lan, Senior Technical Director of DaoCloud, will showcase a lightweight cluster solution that uses Cilium CNI + Spiderpool for tenant-level ingress bandwidth management in Kubernetes, without adding external complexity.