Models · arXiv cs.AI · 8 d ago

Unifying Acoustic Features and Text with Multimodal LLMs for Neurodegenerative Screening

The paper introduces NeurMLLM, a multimodal generative framework that integrates acoustic features and text to stage neurodegenerative diseases such as Alzheimer's and Parkinson's. NeurMLLM encodes audio spectrograms and Mel-frequency cepstral coefficients (MFCCs) with vision transformers, then combines those representations with transcript and demographic data in a large language model's embedding space. The model is instruction-tuned with Low-Rank Adaptation (LoRA). On the Bridge2AI-Voice dataset it is reported to outperform both traditional machine learning baselines and existing LLM methods, a result that suggests potential for improving diagnostic accuracy and accessibility in clinical settings.
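For readers unfamiliar with Low-Rank Adaptation, the following is a minimal, dependency-free sketch of the general idea: a frozen weight matrix W is augmented with a trainable low-rank update scaled by alpha / r. It is not the paper's implementation. The class name `LoRALinear`, the dimensions, the rank `r`, and the scaling `alpha` are all illustrative assumptions, since the summary gives no hyperparameters.

```python
import random


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Generic LoRA layer: y = x W^T + (alpha / r) * (x A^T) B^T.

    W stays frozen; only A (r x d_in) and B (d_out x r) would be trained.
    All sizes here are hypothetical, chosen only for illustration.
    """

    def __init__(self, d_in, d_out, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Frozen pretrained weight (random stand-in for this sketch).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable down-projection, small random init.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
        # Trainable up-projection, zero init so the adapter starts as a no-op.
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r
        self.d_in, self.d_out, self.r = d_in, d_out, r

    def forward(self, x):
        """x is a list of input row vectors (n x d_in); returns n x d_out."""
        base = matmul(x, transpose(self.W))
        low_rank = matmul(matmul(x, transpose(self.A)), transpose(self.B))
        return [
            [b + self.scale * l for b, l in zip(base_row, low_row)]
            for base_row, low_row in zip(base, low_rank)
        ]

    def trainable_params(self):
        """Number of adapter parameters: r * (d_in + d_out)."""
        return self.r * (self.d_in + self.d_out)

    def full_params(self):
        """Number of parameters in the frozen weight: d_in * d_out."""
        return self.d_in * self.d_out


layer = LoRALinear(d_in=8, d_out=6, r=2)
y = layer.forward([[1.0] * 8])
print(len(y), len(y[0]), layer.trainable_params(), layer.full_params())
```

With these toy sizes the saving is modest, but the adapter cost grows linearly in the layer width while the full weight grows quadratically, which is why the technique is attractive for tuning large language models.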

neurodegenerative · multimodal · screening
