Multimodal
LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination
The paper introduces LIBERO-Occ, an extension of the LIBERO framework for measuring how Vision-Language-Action (VLA) models degrade under scene-induced occlusion. To counter this degradation, it proposes Viewpoint Imagination (VIM), which generates complementary views to support action prediction without requiring additional cameras. The authors report improved robustness across several task suites and occlusion scenarios. This matters for practitioners because occlusion is common in real-world deployments, where it undermines the reliability of VLA models.
vision-language, occlusion, action prediction