Inference
Rollin' MiMo-2.5 on two Halo Strixeses
The article describes deploying the MiMo-2.5 model across two 128GB AMD Strix Halo machines (Radeon 8060S integrated graphics), using Proxmox for container management and a USB4 connection as a secondary network link. It reports 356 tokens/s prompt processing (pp) and 15 tokens/s token generation (tg) at a context length of 10,000 tokens, and it highlights the difficulty of building and serving models with backends such as vLLM and SGLang on consumer hardware. For practitioners, it offers practical performance benchmarks and shows how complex model deployment is outside the datacenter.
mimo performance