ai-digest.dev
last updated 13 h ago
Multimodal · arXiv cs.AI · 7 d ago

BASENet: Band-Adapted Speech Enhancement Network with Cross-Band Attention

BASENet is a speech enhancement model with a frequency-adapted architecture: it partitions the spectrum into Bark-scale bands and gives each band an encoder whose capacity is scaled according to human auditory perception. A cross-band attention module captures harmonic dependencies between bands, and the network is built from inverted residual blocks with dense connectivity. On the VoiceBank+DEMAND dataset it reaches a PESQ score of 3.55 and a STOI of 96% with 0.83M parameters and 7.3 G MACs. This efficiency, particularly in the causal variant, makes BASENet suitable for real-time, low-latency speech enhancement on resource-constrained devices.
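The two ideas named in the summary, Bark-scale band partitioning and attention across bands, can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the summary does not state the number of bands, the band edges, or the attention design, so `n_bands=8`, `n_fft=512`, the 16 kHz sample rate, the Zwicker-Terhardt Bark approximation, and the projection-free single-head attention are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (an assumed choice)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def bark_band_slices(n_fft=512, sample_rate=16000, n_bands=8):
    """Split rFFT bins into n_bands contiguous groups of equal Bark width.

    n_fft, sample_rate and n_bands are illustrative values, not the paper's.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)  # bin frequencies in Hz
    bark = hz_to_bark(freqs)                              # monotonically increasing
    edges = np.linspace(bark[0], bark[-1], n_bands + 1)   # equal-width Bark edges
    # First bin at or above each interior Bark edge
    cuts = np.searchsorted(bark, edges[1:-1], side="left")
    bounds = np.concatenate(([0], cuts, [len(freqs)]))
    return [slice(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:])]

def cross_band_attention(band_feats):
    """Single-head scaled dot-product self-attention over the band axis.

    band_feats has shape (n_bands, d). Queries, keys and values are the
    features themselves (no learned projections), which is a simplification.
    """
    d = band_feats.shape[-1]
    scores = band_feats @ band_feats.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over bands
    return weights @ band_feats, weights

bands = bark_band_slices()
widths = [s.stop - s.start for s in bands]                # bins per band

rng = np.random.default_rng(0)
feats = rng.standard_normal((len(bands), 16))             # stand-in band embeddings
mixed, attn = cross_band_attention(feats)
```

Because the bands have equal width on the Bark scale, the low-frequency bands cover few FFT bins and the high-frequency bands cover many, which is what lets a band-adapted model spend encoder capacity unevenly across the spectrum. The attention step then lets each band's features be re-weighted by every other band, the kind of mechanism that could relate a fundamental in one band to its harmonics in others.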

speech-enhancement · network · attention · relevance 0.00 · engagement 0.00