Probing, Fusion, and Trustworthiness: A Systematic Evaluation of Foundation Model Representations for Multimodal Cancer Analysis
This study evaluates foundation models (FMs) for multimodal cancer analysis, using whole-slide images and transcriptomic profiles from real-world datasets (IH-BC and IH-NSCLC). It benchmarks five FMs across eight classification tasks and finds that unimodal representations carry complementary predictive signals, and that multimodal fusion can improve performance when no single modality is dominant. Conformal prediction is then used to assess the trustworthiness of these models: even when a point prediction is incorrect, the true diagnosis is often contained in the larger prediction set, which underscores the value of uncertainty-aware inference in clinical applications.
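
To make the conformal prediction idea above concrete, the following is a minimal sketch of split conformal prediction for a classifier. It is an illustration under stated assumptions, not the study's procedure: the summary does not say which conformal variant or nonconformity score was used, so the score here (one minus the predicted probability of the true class) is an assumption, as are the function names `conformal_threshold` and `prediction_sets` and the toy probabilities. Under the usual exchangeability assumption between calibration and test samples, the resulting sets contain the true label with probability at least 1 - alpha on average.

```python
import math
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out data (hypothetical helper).

    Assumed nonconformity score: 1 - predicted probability of the true class.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration samples for this alpha: every class is included.
        return float("inf")
    return float(np.sort(scores)[k - 1])


def prediction_sets(test_probs, threshold):
    """Boolean mask (samples x classes): True where a class is in the set."""
    return (1.0 - test_probs) <= threshold


# Toy calibration data: 7 samples, 3 classes (illustrative values only).
cal_probs = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
    [0.50, 0.40, 0.10],
    [0.90, 0.05, 0.05],
    [0.30, 0.30, 0.40],
    [0.25, 0.50, 0.25],
])
cal_labels = np.array([0, 1, 2, 1, 0, 2, 1])

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

# Two test samples: one ambiguous between classes 0 and 1, one confident.
test_probs = np.array([
    [0.50, 0.45, 0.05],
    [0.97, 0.02, 0.01],
])
sets = prediction_sets(test_probs, threshold)
point_predictions = test_probs.argmax(axis=1)
```

In this toy run the first test sample has point prediction 0 but a prediction set containing classes 0 and 1, so a true label of 1 would be missed by the point prediction yet still covered by the set. The second, confident sample yields a singleton set. Set size therefore acts as a per-sample uncertainty signal.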