QueryGaussian: Scalable and Training-Free Open-Vocabulary 3D Instance Retrieval
QueryGaussian is a training-free framework for scalable open-vocabulary 3D instance retrieval. It targets a limitation of existing scene-level embedding methods, whose memory and computational costs become prohibitive in complex environments. Instead, it uses an instance-level query mechanism that leverages pre-trained 2D vision models and a maximum-weight association strategy to improve semantic-visual consistency, together with a temporal fusion module that improves projection accuracy. Experiments indicate that QueryGaussian achieves accuracy comparable to state-of-the-art approaches while reducing GPU memory usage by over 70% and accelerating inference by 180x, which makes instance retrieval in large-scale urban scenes feasible on consumer-grade hardware.
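
To make the idea of instance-level querying concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each 3D instance has already accumulated an association weight against a set of 2D masks, and that each mask carries a feature vector from a pre-trained 2D vision-language model. The function names (`assign_instance_features`, `retrieve`), the array layouts, and the use of cosine similarity are all assumptions made for illustration.

```python
import numpy as np


def assign_instance_features(weights, mask_features):
    """Give each 3D instance the feature of its highest-weight 2D mask.

    weights:       (num_instances, num_masks) hypothetical association weights.
    mask_features: (num_masks, dim) hypothetical features from a 2D model.
    Returns an array of shape (num_instances, dim) with unit-norm rows.
    """
    # Maximum-weight association (assumed form): keep only the best mask.
    best_mask = np.argmax(weights, axis=1)
    feats = mask_features[best_mask].astype(np.float64)
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)


def retrieve(query_embedding, instance_features, top_k=1):
    """Rank instances by cosine similarity to a text query embedding."""
    q = np.asarray(query_embedding, dtype=np.float64)
    q = q / np.linalg.norm(q)
    scores = instance_features @ q
    # Sort in descending order of similarity and keep the top_k instances.
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


# Toy usage: two instances, three candidate masks, 2-D features.
weights = np.array([[0.1, 0.9, 0.0],
                    [0.7, 0.2, 0.1]])
mask_features = np.array([[1.0, 0.0],
                          [0.0, 1.0],
                          [1.0, 1.0]])
instance_features = assign_instance_features(weights, mask_features)
indices, scores = retrieve(np.array([0.0, 2.0]), instance_features, top_k=1)
```

In this sketch the query is compared against one vector per instance rather than one vector per scene primitive, which is the kind of reduction that could plausibly account for lower memory use and faster lookup. The temporal fusion module mentioned above is not modeled here.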