Multimodal · arXiv cs.CL · 8 d ago

Multimodal Speaker Identification in Classroom Environments

The study presents a multimodal speaker identification framework that combines acoustic embeddings with semantic context derived from large language models (LLMs) to improve identification accuracy in noisy classroom environments. On the EDSI dataset, the baseline model (ECAPA-TDNN) achieved 39.0% accuracy. The multimodal approach with contextual anchoring raised student identification accuracy to 50.3% and reached 76.9% accuracy on longer utterances. Gains of this kind matter for automated feedback systems that analyze individual student participation, which in turn can support more equitable instruction.
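The summary does not say how the acoustic and semantic signals are combined, so the following is only a minimal sketch of one common option, weighted late fusion of scores. It is not the paper's method. The function names, the `weight` and `temperature` parameters, the toy two-dimensional embeddings, and the `prior` dictionary (standing in for per-speaker probabilities that an LLM might assign from dialogue context) are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores):
    """Turn a dict of raw scores into a dict of probabilities."""
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def fuse(utterance_emb, enrolled, context_prior, weight=0.5, temperature=0.1):
    """Pick a speaker by mixing acoustic and contextual log-probabilities.

    weight=0.0 uses acoustics only; weight=1.0 uses the context prior only.
    All parameters here are assumptions made for illustration.
    """
    acoustic = softmax(
        {spk: cosine(utterance_emb, emb) / temperature
         for spk, emb in enrolled.items()}
    )
    fused = {}
    for spk, prob in acoustic.items():
        # Speakers missing from the prior get a small floor probability.
        prior = context_prior.get(spk, 1e-6)
        fused[spk] = (1 - weight) * math.log(prob) + weight * math.log(prior)
    return max(fused, key=fused.get), fused


# Toy data: two students with acoustically similar enrollment embeddings.
enrolled = {
    "teacher": [1.0, 0.0],
    "student_a": [0.6, 0.8],
    "student_b": [0.5, 0.85],
}
utterance = [0.5, 0.86]
# Hypothetical context-derived prior, e.g. the teacher just called on student_a.
prior = {"teacher": 0.1, "student_a": 0.7, "student_b": 0.2}

acoustic_only, _ = fuse(utterance, enrolled, prior, weight=0.0)
with_context, _ = fuse(utterance, enrolled, prior, weight=0.5)
print(acoustic_only, with_context)
```

In this toy setup the two student embeddings are nearly indistinguishable acoustically, so the contextual prior decides the outcome once `weight` is above zero. A real system would need to tune the fusion weight on held-out data and obtain the prior from an actual language model rather than a hand-written dictionary.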

speaker identification · classroom · multimodal · llm
