Inference
AQ4SViT: An Automated Quantization Framework with Search Gating Policy for Compressing Spiking Vision Transformers
AQ4SViT is an automated quantization framework for Spiking Vision Transformers (SViTs), addressing the model size that makes them hard to deploy on resource-constrained systems. It combines a quantization search strategy with a search gating policy and uses greedy and beam search algorithms to select quantization settings. The framework achieves up to 82.5% memory savings and a 6.6x faster search time while keeping accuracy within 1.5% of the non-quantized models, making SViTs more practical to deploy in embedded AI applications.
quantization, spiking, transformers
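
To make the idea of a greedy quantization search under an accuracy budget concrete, here is a minimal sketch. It is not the AQ4SViT algorithm: the summary above does not describe the actual search procedure or gating policy, so everything here is an assumption made for illustration. The function `greedy_bitwidth_search`, the layer names, the candidate bit widths, the toy accuracy model `toy_evaluate`, and the "gate" (stop trying lower bit widths for a layer as soon as one candidate exceeds the accuracy budget) are all hypothetical.

```python
from typing import Callable, Dict, List


def greedy_bitwidth_search(
    layers: List[str],
    candidate_bits: List[int],
    evaluate: Callable[[Dict[str, int]], float],
    baseline_acc: float,
    max_drop: float,
) -> Dict[str, int]:
    """Greedily lower each layer's bit width, one layer at a time,
    keeping accuracy within `max_drop` points of `baseline_acc`."""
    full_precision = max(candidate_bits)
    config = {name: full_precision for name in layers}
    for name in layers:
        # Try candidates from highest to lowest precision.
        for bits in sorted(candidate_bits, reverse=True):
            trial = dict(config)
            trial[name] = bits
            if baseline_acc - evaluate(trial) > max_drop:
                # Hypothetical gate: once a bit width breaks the budget,
                # skip all lower bit widths for this layer.
                break
            config = trial
    return config


def toy_evaluate(config: Dict[str, int]) -> float:
    """Stand-in for a real validation pass (an assumption, not measured data):
    each layer loses accuracy in proportion to the bits removed from 32."""
    sensitivity = {"attn": 0.2, "mlp": 0.05, "head": 0.5}
    loss = sum(sensitivity[name] * (32 - bits) / 8 for name, bits in config.items())
    return 90.0 - loss


result = greedy_bitwidth_search(
    layers=["attn", "mlp", "head"],
    candidate_bits=[32, 16, 8, 4],
    evaluate=toy_evaluate,
    baseline_acc=90.0,
    max_drop=1.5,  # mirrors the 1.5% accuracy tolerance quoted above
)
print(result)
```

With this toy accuracy model, the two less sensitive layers end up at 4 bits and the sensitive `head` layer stays at 32 bits. A beam search variant would keep several candidate configurations alive at each step instead of a single one, trading longer search time for a better chance of finding a smaller model.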