Multimodal
Seeing Roads Through Words: A Language-Guided Framework for RGB-T Driving Scene Segmentation
The article presents CLARITY, a framework for RGB-T (RGB-thermal) driving scene segmentation that adapts its fusion strategy to the detected scene condition, using priors from a vision-language model. It also introduces mechanisms to preserve dark-object semantics and a hierarchical decoder for structural consistency. On the MFNet dataset it reports state-of-the-art results: 62.3% mean Intersection over Union (mIoU) and 77.5% mean Accuracy (mAcc). By targeting adverse illumination, the approach aims to make segmentation more robust, which is critical for autonomous driving systems.
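For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mIoU and mAcc are commonly computed from a confusion matrix. It is not the article's evaluation code: the helper names `confusion_matrix` and `miou_and_macc` are hypothetical, the toy labels are invented for illustration, and skipping classes absent from the ground truth is an assumption that may differ from the MFNet evaluation protocol.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the ground-truth class, columns index the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def miou_and_macc(cm):
    """Return (mIoU, mAcc), averaged over classes present in the ground truth."""
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]                           # pixels correctly labeled c
        gt = sum(cm[c])                         # pixels whose true label is c
        pred = sum(cm[r][c] for r in range(n))  # pixels predicted as c
        if gt == 0:
            continue  # assumption: ignore classes with no ground-truth pixels
        union = gt + pred - tp
        ious.append(tp / union)  # per-class Intersection over Union
        accs.append(tp / gt)     # per-class accuracy (recall)
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Toy example: six pixels, three classes (flattened label maps).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou, macc = miou_and_macc(cm)
```

On this toy input the per-class IoUs are 1/3, 2/3, and 1/2, giving an mIoU of 0.5, while the per-class accuracies are 0.5, 1.0, and 0.5, giving an mAcc of about 0.667. mAcc only penalizes missed pixels of a class, whereas mIoU also penalizes false positives, which is why mAcc is typically the higher of the two figures.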
semantic segmentation, autonomous driving, rgb-thermal