Agents · arXiv cs.AI · 10 d ago

Fast When, Careful Who: Dual-Process Multiparty Turn-Taking with Diffusion Augmentation

The paper presents a two-stage, audio-only pipeline for multiparty turn-taking in spoken dialogue systems, targeting overlapping speech and rapid speaker transitions. A fast trigger proposes end-of-turn times (the "when"), and a lightweight verifier then decides whether a speaker shift occurs (the "who"). Together, the two stages improve shift detection on the VoxConverse dataset. Diffusion-based background audio mixing, used as data augmentation, improves performance further. This makes the approach relevant to practitioners building more robust multiparty interaction systems.
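The summary does not describe the models themselves. The Python sketch below therefore only illustrates the division of labor under stated assumptions, and it is not the authors' method. The fast trigger is assumed to emit a per-frame end-of-turn probability. The hypothetical `propose_end_of_turn` fires on upward threshold crossings and then waits out a hypothetical refractory window. The hypothetical `verify` calls a stand-in `shift_prob` classifier only at those proposed times. The hypothetical `mix_at_snr` shows one plausible way to add a background clip (for example, one sampled from a diffusion model) to training speech at a target signal-to-noise ratio. The paper's actual thresholds, features, and mixing recipe are not given here.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Decision:
    time_s: float  # proposed end-of-turn time in seconds
    shift: bool    # True if the verifier predicts a change of speaker


def propose_end_of_turn(probs: Sequence[float], hop_s: float,
                        threshold: float = 0.5,
                        refractory_s: float = 0.5) -> List[float]:
    """Fast 'when' stage (hypothetical): fire when the per-frame end-of-turn
    probability crosses `threshold` upward, then stay quiet for `refractory_s`."""
    times: List[float] = []
    last = -math.inf
    prev = 0.0
    for i, p in enumerate(probs):
        t = i * hop_s
        # Upward crossing only, and only outside the refractory window.
        if p >= threshold > prev and t - last >= refractory_s:
            times.append(t)
            last = t
        prev = p
    return times


def verify(candidates: Sequence[float],
           shift_prob: Callable[[float], float],
           threshold: float = 0.5) -> List[Decision]:
    """Careful 'who' stage (hypothetical): run a classifier only at the
    proposed times and label each one as a shift or a hold."""
    return [Decision(t, shift_prob(t) >= threshold) for t in candidates]


def mix_at_snr(speech: Sequence[float], background: Sequence[float],
               snr_db: float) -> List[float]:
    """Augmentation (assumed recipe): add a background clip to speech so the
    speech-to-background power ratio equals `snr_db` decibels."""
    n = len(speech)
    if n == 0 or len(background) == 0:
        return list(speech)
    # Tile or truncate the background to the speech length.
    bg = [background[i % len(background)] for i in range(n)]
    p_speech = sum(x * x for x in speech) / n
    p_bg = sum(x * x for x in bg) / n
    if p_bg == 0.0:
        return list(speech)  # silent background: nothing to mix in
    # Choose the scale so that p_speech / (scale**2 * p_bg) == 10**(snr_db / 10).
    scale = math.sqrt(p_speech / (p_bg * 10 ** (snr_db / 10)))
    return [s + scale * b for s, b in zip(speech, bg)]


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.7, 0.8, 0.3, 0.1, 0.6, 0.9, 0.2, 0.1]
    cands = propose_end_of_turn(probs, hop_s=0.1, threshold=0.5, refractory_s=0.3)
    # Toy stand-in verifier: later candidates are treated as likely shifts.
    print(verify(cands, lambda t: 0.8 if t > 0.5 else 0.2))
```

The design point this sketch tries to capture is that the more careful "who" decision runs only at the few times the cheap "when" stage proposes, rather than on every frame.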

turn-taking · dialogue systems · audio · diffusion
