Research
EyeMVP: OCT-Informed Fundus Representation Learning via Paired CFP–OCT Pretraining
EyeMVP is a cross-modal retinal foundation model that uses paired color fundus photography (CFP) and optical coherence tomography (OCT) images to improve representation learning. Pretrained on 674,893 image triples from 112,642 patients, it combines cross-modal masked reconstruction with a novel source-constrained cross-attention mechanism to enhance CFP representations. EyeMVP reports superior performance across 16 downstream tasks, including an AUROC of 0.948 for macular edema, suggesting that it could improve retinal screening from CFP images alone.
medical imaging, representation learning, cross-modal