Multimodal
CANVAS: Captioning Art with Narrative Visual-Audio AI Systems
The study introduces CANVAS, an automated system that uses large language models and text-to-speech to generate multi-sensory art descriptions with synchronized audio narration for blind and low-vision audiences. Compared with traditional captions, its outputs show higher lexical diversity and more narrative detail, and the system produces text plus audio in under 20 seconds per image at a cost below $0.05. The approach could make museums and digital collections more accessible and broaden public engagement with art.
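The summary does not say how lexical diversity was measured. As an illustration only, the sketch below uses the type-token ratio (unique words divided by total words), one common proxy for lexical diversity; the function name `type_token_ratio` and both sample captions are hypothetical and are not taken from the study.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return unique-word count divided by total word count (0.0 for empty text).

    Assumption: this is a generic lexical-diversity proxy, not necessarily
    the metric used to evaluate CANVAS.
    """
    # Lowercase the text and keep runs of letters/apostrophes as word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical captions, written for this example only.
short_caption = "A painting of a woman. A painting of a garden."
rich_caption = "Cool blue brushstrokes ripple like wind across a quiet moonlit garden."

if __name__ == "__main__":
    print(f"traditional-style caption: {type_token_ratio(short_caption):.2f}")
    print(f"narrative-style caption:   {type_token_ratio(rich_caption):.2f}")
```

Type-token ratio is sensitive to text length, so comparing captions of very different lengths would usually call for a length-normalized variant.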
llm accessibility audio