Models
Unifying Acoustic Features and Text with Multimodal LLMs for Neurodegenerative Screening
The paper introduces NeurMLLM, a multimodal generative framework that integrates acoustic features and text to stage neurodegenerative diseases such as Alzheimer's and Parkinson's. Vision transformers encode audio spectrograms and Mel-frequency cepstral coefficients (MFCCs), and NeurMLLM combines these representations with transcript and demographic data in a large language model's embedding space. The model is instruction-tuned with Low-Rank Adaptation (LoRA) and outperforms traditional machine learning and existing LLM methods on the Bridge2AI-Voice dataset, suggesting its potential to improve diagnostic accuracy and accessibility in clinical settings.
neurodegenerative, multimodal, screening
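The summary mentions Low-Rank Adaptation without showing what it does, so here is a minimal sketch of the general LoRA idea, not the paper's implementation. A frozen weight matrix W is augmented with a trainable low-rank update scaled by alpha / r. The class name `LoRALinear`, the helper functions, and all dimensions are hypothetical and chosen only for illustration.

```python
import random


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Linear layer with a frozen weight and a trainable low-rank update.

    Hypothetical illustration: computes y = x W^T + (alpha / r) * x A^T B^T.
    """

    def __init__(self, d_in, d_out, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Frozen pretrained weight W with shape (d_out, d_in).
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factor A with shape (rank, d_in), small random init.
        self.lora_a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # Trainable factor B with shape (d_out, rank), zero init so the
        # adapted layer starts out identical to the frozen layer.
        self.lora_b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        """Apply the layer to x, a list of input rows of length d_in."""
        base = matmul(x, transpose(self.weight))
        update = matmul(matmul(x, transpose(self.lora_a)), transpose(self.lora_b))
        return [[b + self.scale * u for b, u in zip(b_row, u_row)]
                for b_row, u_row in zip(base, update)]

    def trainable_params(self):
        """Number of parameters in A and B; W itself stays frozen."""
        return self.rank * (self.d_in + self.d_out)


layer = LoRALinear(d_in=4, d_out=3, rank=2, alpha=4.0)
x = [[1.0, 2.0, 3.0, 4.0]]
y = layer.forward(x)
```

Only A and B would be updated during instruction tuning in this sketch, so the trainable parameter count grows with r * (d_in + d_out) rather than d_in * d_out. How NeurMLLM chooses the rank, scaling, or which layers to adapt is not stated in the summary.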