Research
Meet Flash-KMeans: An IO-Aware, Exact K-Means That Runs Over 200× Faster Than FAISS on GPUs
Flash-KMeans is an open-source, IO-aware implementation of Lloyd's k-means algorithm for GPUs, written as Triton kernels. It reports a 17.9× end-to-end speedup, a 33× speedup over cuML, and a speedup of more than 200× over FAISS on an NVIDIA H200, achieved by eliminating distance-matrix materialization and reducing atomic contention. For practitioners, this makes exact clustering cheaper in large-scale data processing workloads.
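To make the two ideas in the summary concrete, here is a minimal CPU sketch in NumPy. It is not the Flash-KMeans code or its API. The function names (`assign_chunked`, `update_sorted`, `lloyd`) and the `chunk` parameter are hypothetical, chosen only for illustration. The assignment step processes points in blocks, so the full N × K distance matrix never exists in memory. The update step groups points by sorting on their labels and summing contiguous segments, which is one common way to avoid many writers accumulating into the same centroid. Whether the library's kernels work this way is an assumption here.

```python
import numpy as np

def assign_chunked(X, C, chunk=1024):
    """Nearest-centroid labels, computed block by block.

    Only a (chunk x K) slice of distances is alive at any time.
    """
    n = X.shape[0]
    labels = np.empty(n, dtype=np.int64)
    c_sq = (C * C).sum(axis=1)  # ||c||^2 for every centroid, shape (K,)
    for start in range(0, n, chunk):
        xb = X[start:start + chunk]
        # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2; the ||x||^2 term is
        # constant per row, so it can be dropped before the argmin.
        d = c_sq[None, :] - 2.0 * (xb @ C.T)
        labels[start:start + chunk] = d.argmin(axis=1)
    return labels

def update_sorted(X, labels, C_old):
    """Centroid update via sort-by-label plus contiguous segment sums."""
    k = C_old.shape[0]
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    counts = np.bincount(labels, minlength=k)
    nonempty = counts > 0
    # Start offset of each non-empty cluster in the sorted order.
    starts = np.searchsorted(sorted_labels, np.flatnonzero(nonempty))
    sums = np.add.reduceat(X[order], starts, axis=0)
    C_new = C_old.copy()  # empty clusters keep their previous centroid
    C_new[nonempty] = sums / counts[nonempty][:, None]
    return C_new

def lloyd(X, k, iters=20, seed=0, chunk=1024):
    """Plain Lloyd iterations built from the two steps above."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=k, replace=False)
    C = X[idx].astype(np.float64)
    for _ in range(iters):
        labels = assign_chunked(X, C, chunk)
        C = update_sorted(X, labels, C)
    return C, labels
```

Because each block's result is identical to what the full distance matrix would give, the chunked version stays exact. Only the peak memory and the memory traffic change, which is the sense in which such an approach is "IO-aware".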
k-means gpu optimization