Add corrections, implementation notes, pricing changes, or usage caveats for other readers.
Input modalities
Output modalities
Capabilities
262,144 tokens
Recent tweets and retweets from Friendli
7 days to SuperAI 2026 🇸🇬 Come see us at #PB21
We're bringing the Frontier Inference Cloud for Agents to Marina Bay Sands, June 10–11.
Agents bottlenecked on latency, cost, or reliability? Let's talk.
🎤 CEO, Byung-Gon Chun @ Plaud Main Stage
🎤 VP of Sales (APAC),…
2. How to estimate GPU VRAM requirements for an AI model:
A common heuristic is reserving an additional 20% of weight memory, though high-concurrency or long-context workloads may require 30–40%+.
Examples with 20% headroom:
→ DeepSeek-V4-Flash: 142 GB + 28.4 GB = 170.4…
1. How to estimate GPU VRAM requirements for an AI model:
Start with the minimum VRAM required to load the model:
W = P × B
Where:
P = parameter count
B = bytes per parameter at your target precision
Real-world examples:
→ DeepSeek-V4-Flash (284B at 4-bit): 142 GB
→…
What it delivered:
— Autoscaling absorbs submission peaks with zero manual intervention
— Automatic fault recovery keeps service stable through surges
— Current A100 setup profiled to handle 2× traffic, clear path to H100/H200
— One platform to consolidate workloads…
As the service grew, Upstage needed an inference platform that could:
Handle bursty, time-sensitive traffic efficiently
Consolidate workloads from multiple vendors
Scale cost-effectively — without over-provisioning GPUs
They chose FriendliAI Dedicated Endpoints as their…
Discuss this model
Add corrections, implementation notes, pricing changes, or usage caveats for other readers.