Multimodal
FreeSonic: Training-Free Temporal-Aware Decoupled Attention for Precise Audio Editing
FreeSonic is a training-free framework for precise audio editing with text-to-audio generation models, built on the Rectified Flow-based TangoFlux model. It uses an optimized inversion-reverse process together with joint text-audio attention maps to improve extraction of the target segment, while a novel scheduled attention decoupling technique confines modifications to specific regions without disturbing the original audio context. Combined with task-oriented noise injection, this provides a high-fidelity solution for diverse audio editing tasks and is relevant to practitioners who need precise, consistent edits.
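To make the inversion-reverse idea concrete, the following is a minimal sketch of a generic Rectified Flow round trip, not FreeSonic's or TangoFlux's actual implementation. It assumes the convention that t = 0 is the data end and t = 1 is the noise end, and it replaces the learned velocity network with a hypothetical `toy_velocity` function. The names `integrate` and `invert_then_reverse` are likewise illustrative.

```python
import numpy as np

def toy_velocity(x, t):
    # Hypothetical stand-in for a learned velocity network v(x, t).
    # A real model would also condition on the text prompt.
    return -0.5 * x + np.cos(2.0 * np.pi * t)

def integrate(x, t_start, t_end, steps, velocity=toy_velocity):
    # Plain Euler integration of dx/dt = v(x, t) from t_start to t_end.
    # When t_end < t_start the step size is negative, so the same loop
    # walks the ODE backwards.
    dt = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

def invert_then_reverse(audio_latent, steps=200):
    # Inversion: carry the audio latent from the data end (t=0)
    # to the noise end (t=1) along the flow.
    noise_like = integrate(audio_latent, 0.0, 1.0, steps)
    # Reverse: integrate back from t=1 to t=0. With an unchanged
    # velocity field this approximately reconstructs the input; an
    # editing method would alter the conditioning during this pass.
    reconstructed = integrate(noise_like, 1.0, 0.0, steps)
    return noise_like, reconstructed

if __name__ == "__main__":
    latent = np.linspace(-1.0, 1.0, 8)  # placeholder for an audio latent
    noise_like, reconstructed = invert_then_reverse(latent)
    print("max reconstruction error:", np.max(np.abs(reconstructed - latent)))
```

Euler inversion is only approximately reversible, and the reconstruction error shrinks as the step count grows. That residual error is one reason an editing pipeline might want a more careful inversion-reverse procedure than this sketch.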
audioediting