Research
PRISM: Perception Reasoning Interleaved for Sequential Decision Making
The paper introduces PRISM, a framework that interleaves perception and decision-making for embodied agents through a dynamic question-answering pipeline between Vision-Language Models (VLMs) and Large Language Models (LLMs). PRISM shows significant improvements on the ALFWorld and Room-to-Room (R2R) benchmarks, outperforming existing image-based models. Its closed-loop critique and synthesis process gives the agent a more effective understanding of complex multimodal environments. For practitioners, the approach narrows the gap between perception and reasoning, which supports building more capable AI systems for real-world applications.
temporal-knowledge-graphs, reasoning, memory
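The closed-loop question-answering idea in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: the function `perception_reasoning_loop`, its parameters, and the toy stand-ins in `demo` are all hypothetical names chosen for illustration, and the real VLM and LLM calls are replaced by plain Python callables.

```python
from typing import Callable, List, Tuple

QAHistory = List[Tuple[str, str]]  # (question, answer) pairs gathered so far


def perception_reasoning_loop(
    observation,
    goal: str,
    propose_question: Callable[[str, QAHistory], str],  # hypothetical LLM role
    answer_question: Callable[[object, str], str],      # hypothetical VLM role
    is_sufficient: Callable[[str, QAHistory], bool],    # hypothetical critique step
    choose_action: Callable[[str, QAHistory], str],     # hypothetical synthesis step
    max_rounds: int = 3,
) -> Tuple[str, QAHistory]:
    """Alternate between asking and answering until the critique is satisfied."""
    history: QAHistory = []
    for _ in range(max_rounds):
        question = propose_question(goal, history)        # reasoning asks
        answer = answer_question(observation, question)   # perception answers
        history.append((question, answer))
        if is_sufficient(goal, history):                  # critique closes the loop
            break
    return choose_action(goal, history), history


def demo() -> Tuple[str, QAHistory]:
    # Toy stand-ins: a dict plays the scene, lambdas play the two models.
    scene = {"where is the mug?": "on the desk", "is the desk reachable?": "yes"}
    questions = list(scene)
    return perception_reasoning_loop(
        observation=scene,
        goal="pick up the mug",
        propose_question=lambda goal, hist: questions[len(hist)],
        answer_question=lambda obs, q: obs[q],
        is_sufficient=lambda goal, hist: len(hist) >= 2,
        choose_action=lambda goal, hist: "go to desk" if hist[-1][1] == "yes" else "explore",
    )
```

Passing the two model roles in as callables keeps the loop independent of any particular VLM or LLM API; in a real agent the lambdas would be replaced by model calls, and the stopping rule would itself likely be a model judgment.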