ai-digest.dev
Research · MarkTechPost · 11 d ago

MiniMax Sparse Attention (MSA): a Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget

MiniMax has introduced MiniMax Sparse Attention (MSA), a two-branch block-sparse attention mechanism built on Grouped Query Attention (GQA). A lightweight Index Branch selects the Top-k key-value blocks for each query, and the Main Branch attends only to those blocks. At a 1M-token context this cuts per-token attention computation by 28.4× while matching GQA performance on downstream benchmarks. MSA was trained on a 109B-parameter mixture-of-experts (MoE) model with a 3T-token budget, which makes the result relevant to practitioners optimizing attention in large-scale models.
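The two-branch idea can be illustrated with a minimal NumPy sketch. This is not MiniMax's implementation: the block scoring rule (a dot product between the query and the mean-pooled keys of each block), the single-head non-causal setup, and every function and parameter name here are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_attention(q, k, v):
    # Reference: every query attends to every key.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def block_sparse_attention(q, k, v, block_size, top_k):
    """Hypothetical two-branch block-sparse attention (single head, non-causal)."""
    n_q, d = q.shape
    n_kv, d_v = v.shape
    assert n_kv % block_size == 0, "sequence length must be a multiple of block_size"
    n_blocks = n_kv // block_size
    top_k = min(top_k, n_blocks)

    k_blocks = k.reshape(n_blocks, block_size, d)
    v_blocks = v.reshape(n_blocks, block_size, d_v)

    # Index branch (assumed scoring rule): one cheap score per (query, block),
    # computed against the mean-pooled keys of each block.
    block_summary = k_blocks.mean(axis=1)          # (n_blocks, d)
    block_scores = q @ block_summary.T             # (n_q, n_blocks)
    selected = np.argsort(-block_scores, axis=1)[:, :top_k]  # (n_q, top_k)

    # Main branch: full softmax attention, restricted to the selected blocks.
    out = np.empty((n_q, d_v))
    for i in range(n_q):
        ks = k_blocks[selected[i]].reshape(-1, d)
        vs = v_blocks[selected[i]].reshape(-1, d_v)
        out[i] = softmax(q[i] @ ks.T / np.sqrt(d)) @ vs
    return out, selected
```

In this sketch each query attends to `top_k * block_size` keys instead of all of them, which is where the savings come from. The sketch does not attempt to reproduce the 28.4× figure, which depends on details of MSA that the summary does not give.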

sparse-attention · msa · grouped-query