Research · Reddit r/LocalLLaMA · 6 d ago

Dual DGX Sparks - 40 tk/s single 1M; 350 tk/s agg. - DeepSeek V4 Flash (vs RTX Pro 6000 vs Mac M2 Ultra 192)

The post reports benchmarks of two DGX Sparks running the DeepSeek V4 Flash model: approximately 40 tokens per second (tk/s) for a single request, and up to 350 tk/s in aggregate across 32 concurrent requests, at a context length of 256k with FP8 quantization. In the benchmark comparisons, the DGX Sparks outperform the RTX Pro 6000 and the Mac M2 Ultra in decoding speed and concurrency, which makes them a compelling choice for practitioners who need high throughput on large mixture-of-experts (MoE) models. The results also depend on hardware configuration: a 200 Gb/s cable connection is needed to reach the systems' full performance.
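To put the two throughput figures side by side, the arithmetic can be sketched as below. This is a rough illustration, not part of the original benchmark: it assumes the 350 tk/s aggregate is shared equally among the 32 concurrent requests, which the post does not state, and the function names are hypothetical.

```python
# Rough arithmetic on the reported figures. Assumption (not from the post):
# aggregate throughput is split evenly across concurrent requests.

def per_request_rate(aggregate_tps: float, concurrency: int) -> float:
    """Average decode rate each request sees under an even split."""
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    return aggregate_tps / concurrency


def batching_gain(aggregate_tps: float, single_tps: float) -> float:
    """How many times more total tokens per second batching delivers."""
    if single_tps <= 0:
        raise ValueError("single_tps must be positive")
    return aggregate_tps / single_tps


SINGLE_TPS = 40.0      # reported single-request rate (approximate)
AGGREGATE_TPS = 350.0  # reported aggregate rate
CONCURRENCY = 32       # reported number of concurrent requests

per_req = per_request_rate(AGGREGATE_TPS, CONCURRENCY)
gain = batching_gain(AGGREGATE_TPS, SINGLE_TPS)

print(f"~{per_req:.1f} tk/s per request at {CONCURRENCY} concurrent requests")
print(f"~{gain:.2f}x total throughput versus a single request")
```

Under that even-split assumption, each of the 32 requests would decode at roughly 11 tk/s, about a quarter of the single-request rate, while total throughput rises to nearly nine times the single-request figure.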
