Multimodal · arXiv cs.AI · 15 d ago

The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation

The study evaluates 12 open-weight vision-language models (VLMs) on binary classification tasks across two clinical neuroimaging datasets, FOR2107 and OASIS-3. It finds that smaller models show F1 score improvements of up to 58% when neuroimaging context is introduced, and that these gains stem largely from prompt framing rather than from genuine integration of the imaging data, a phenomenon the authors term the "scaffold effect." The results highlight the risk of relying on surface-level performance metrics in clinical AI applications and point to the need for deeper evaluation of multimodal reasoning capabilities.
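As a rough illustration of the kind of comparison described above, the sketch below computes binary F1 for two prompting conditions and reports the gain in both absolute and relative terms. The function names, the toy labels, and the two conditions are hypothetical and are not taken from the study; the summary also does not say whether the reported 58% is an absolute or a relative change, so both are shown.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 labels; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_gain(y_true, baseline_pred, framed_pred):
    """Compare two conditions; returns (baseline F1, framed F1, absolute gain, relative gain)."""
    base = f1_score(y_true, baseline_pred)
    framed = f1_score(y_true, framed_pred)
    absolute = framed - base
    # Relative gain is undefined for a zero baseline; report infinity in that case.
    relative = absolute / base if base > 0 else float("inf")
    return base, framed, absolute, relative


# Hypothetical toy data: 8 cases, 4 positive and 4 negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_pred = [1, 0, 0, 0, 0, 0, 0, 1]  # e.g. prompt without imaging context
framed_pred = [1, 1, 1, 0, 0, 0, 1, 0]    # e.g. prompt that mentions imaging context

base, framed, absolute, relative = f1_gain(y_true, baseline_pred, framed_pred)
print(f"baseline F1={base:.3f}, framed F1={framed:.3f}, "
      f"absolute gain={absolute:.3f}, relative gain={relative:.1%}")
```

One way to probe for a framing-driven gain, under these assumptions, would be to run the same comparison with a prompt that mentions imaging context but carries no informative imaging data: if the F1 gain persists, the improvement cannot be attributed to the data itself.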

clinical · vlm · neuroimaging
