Agents · arXiv cs.AI · 15 d ago

DiffAttn: Diffusion-Based Drivers' Visual Attention Prediction with LLM-Enhanced Semantic Reasoning

The article presents DiffAttn, a diffusion-based framework for predicting drivers' visual attention. It uses a Swin Transformer encoder and a Feature Fusion Pyramid decoder to model both local and global scene features, and it incorporates a large language model (LLM) for top-down semantic reasoning. The authors report state-of-the-art performance on four public datasets, along with improved interpretability of driver-centric scene understanding and greater sensitivity to safety-critical cues. The work has potential applications in intelligent vehicle systems, including human-machine interaction and risk perception.
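The summary does not say how the diffusion component works. For reference only, the sketch below shows the standard DDPM forward (noising) process applied to a ground-truth attention map. A diffusion-based predictor is typically trained to invert this step. This is an assumption, not DiffAttn's confirmed formulation. The linear β schedule (1e-4 to 0.02 over 1000 steps) follows the original DDPM setup, and the function names and the toy 2×2 map are hypothetical. The denoising network is not shown; in DiffAttn it would presumably be conditioned on the Swin Transformer features and the LLM reasoning.

```python
import math
import random


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> list[float]:
    """Linearly spaced noise variances beta_1..beta_T (original DDPM schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def noise_attention_map(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) for a 2-D attention map stored as nested lists.

    Hypothetical helper: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    with eps drawn from a standard normal. t is a 0-based index into alpha_bars.
    Returns the noised map and the noise, which a denoiser would learn to predict.
    """
    a = alpha_bars[t]
    signal, noise_scale = math.sqrt(a), math.sqrt(1.0 - a)
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0]
    xt = [[signal * v + noise_scale * e for v, e in zip(row, erow)]
          for row, erow in zip(x0, eps)]
    return xt, eps


if __name__ == "__main__":
    rng = random.Random(0)                      # fixed seed for reproducibility
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [[0.0, 0.2], [0.9, 0.1]]               # toy attention map (made-up values)
    for t in (0, 500, 999):                     # early, middle, late timesteps
        xt, _ = noise_attention_map(x0, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 5), xt)
```

In a typical DDPM setup, a network learns to recover `eps` (or `x0`) from `xt`, the step index, and the conditioning features. At inference it starts from pure noise and denoises iteratively to produce a predicted attention map.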

