Inference
S4oP: Operator-level Pruning of Structured State Space Models for Resource-Constrained Devices
The paper presents an operator-level pruning method for Structured State Space Models (SSMs), targeting the S4 and S4D architectures, to make them easier to deploy in resource-constrained environments. The method prunes up to 70% of model operators while maintaining predictive performance by combining structured masking and fine-tuning within a unified training framework. The findings indicate that it reduces inference latency, making SSMs more viable for practical applications where computational resources are limited.
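As a rough illustration of structured masking, the sketch below prunes a toy diagonal SSM of the kind used in S4D. It is not the paper's implementation. Several things are assumptions made for the example: an "operator" is taken to be a single diagonal state mode, the importance score `|b * c|` is hypothetical, and the helper names `ssm_kernel` and `prune_mask` are invented here. The fine-tuning step that would follow masking is omitted.

```python
import numpy as np

def ssm_kernel(a, b, c, mask, length):
    """Convolution kernel of a diagonal SSM: K[l] = sum_n mask_n * c_n * b_n * a_n**l."""
    powers = a[None, :] ** np.arange(length)[:, None]  # shape (length, N)
    return powers @ (mask * c * b)                      # shape (length,)

def prune_mask(scores, sparsity):
    """Binary mask that zeroes the lowest-scoring fraction `sparsity` of units."""
    n_prune = int(round(sparsity * scores.size))
    mask = np.ones(scores.size)
    mask[np.argsort(scores)[:n_prune]] = 0.0
    return mask

rng = np.random.default_rng(0)
N, L = 10, 16                  # number of state modes, kernel length
a = rng.uniform(0.1, 0.9, N)   # stable diagonal state matrix entries
b = rng.normal(size=N)         # input projection
c = rng.normal(size=N)         # output projection

scores = np.abs(b * c)         # hypothetical importance score (assumption)
mask = prune_mask(scores, 0.7) # drop 70% of the modes
kernel = ssm_kernel(a, b, c, mask, L)
```

With 10 modes and a sparsity of 0.7, three modes survive. Because the mask removes whole modes rather than individual weights, the masked modes can be dropped from `a`, `b`, and `c` entirely at inference time, which is where a latency reduction would come from.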
state-space-models pruning resource-constraints