Research
FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings
The paper introduces Factorized Linear Projection (FLiP) models for understanding and interpreting pretrained multilingual and multimodal sentence embeddings, specifically LaBSE, SONAR, and Gemini. FLiP achieves over 75% recall of lexical content from these embeddings, outperforming existing non-factorized approaches, and it doubles as a diagnostic tool that reveals modality and language biases in sentence encoders. For practitioners, this provides an intrinsic evaluation of embedding models that does not depend on conventional downstream tasks.
sentence-embeddings multimodal llm
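
The summary above does not spell out the FLiP architecture, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that "factorized linear projection" means mapping a sentence embedding to vocabulary scores through two small matrices instead of one large one, and that lexical recall is the fraction of a sentence's words found among the top-k scored vocabulary items. The names `factorized_scores`, `lexical_recall`, `down`, and `up`, and all dimensions, are hypothetical.

```python
import numpy as np


def factorized_scores(embedding, down, up):
    """Score each vocabulary item with a rank-r factorization: (e @ down) @ up.

    Assumption: this low-rank form stands in for a full (dim x vocab) matrix.
    """
    return (embedding @ down) @ up


def lexical_recall(scores, reference_ids, k):
    """Fraction of reference word ids that appear among the k highest scores."""
    reference = set(reference_ids)
    if not reference:
        return 0.0
    top_k = set(np.argsort(-scores)[:k].tolist())
    return len(reference & top_k) / len(reference)


# Toy sizes, chosen only for illustration.
dim, rank, vocab = 16, 4, 50
rng = np.random.default_rng(0)
down = rng.normal(size=(dim, rank))    # embedding space -> low-rank space
up = rng.normal(size=(rank, vocab))    # low-rank space -> vocabulary scores
embedding = rng.normal(size=dim)       # stand-in for a LaBSE/SONAR-style vector

scores = factorized_scores(embedding, down, up)

# Parameter counts: factorized versus a single full projection matrix.
factorized_params = dim * rank + rank * vocab
full_params = dim * vocab
```

With untrained random matrices the recall value is meaningless; in practice the two matrices would be fitted so that the words of each sentence receive the highest scores. The sketch only shows the shape of the computation and why the factorized form needs fewer parameters than a full projection.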