Models · arXiv cs.AI · 15 d ago

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

The DeepSeek-V4 series introduces two Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro, with 1.6 trillion total parameters (49 billion activated), and DeepSeek-V4-Flash, with 284 billion total parameters (13 billion activated). Both process contexts of up to one million tokens. The architecture combines a hybrid attention mechanism, built from Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), with Manifold-Constrained Hyper-Connections (mHC); training uses the Muon optimizer. Together these changes improve efficiency and stability, and they substantially reduce inference FLOPs and KV cache usage in long-context settings, which matters most for large-scale, long-horizon workloads.
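To put the parameter counts in perspective, the short Python sketch below computes what share of each model's parameters is activated, using only the figures quoted in the summary. The dictionary layout and the function names (`active_fraction`, `summarize`) are illustrative assumptions, not anything from the DeepSeek-V4 release.

```python
# Parameter counts in billions, as quoted in the summary above.
# The structure and names here are hypothetical, chosen for illustration.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600.0, "active_b": 49.0},
    "DeepSeek-V4-Flash": {"total_b": 284.0, "active_b": 13.0},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of total parameters that is activated."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


def summarize(models=MODELS) -> dict:
    """Map each model name to its activated-parameter fraction."""
    return {name: active_fraction(**spec) for name, spec in models.items()}


if __name__ == "__main__":
    for name, frac in summarize().items():
        print(f"{name}: {frac:.2%} of parameters activated")
```

By this arithmetic, roughly 3% of DeepSeek-V4-Pro's parameters and under 5% of DeepSeek-V4-Flash's are activated. This ratio says nothing on its own about the attention-side savings (FLOPs and KV cache), which the summary attributes to CSA and HCA.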

Mixture-of-Experts · context length · language models
