Multimodal
Fusion of Pervasive RF Data with Spatial Images via Vision Transformers for Enhanced Mapping in Smart Cities
The paper introduces a deep learning approach, built on the DINOv2 architecture, that improves building mapping by fusing radio frequency (RF) data with spatial images in a vision transformer framework. The model reaches a macro IoU of 65.3% on a synthetic dataset, outperforming traditional methods that rely solely on RF or spatial data, and performs comparably on real-world data from Oslo, with a macro IoU of 64.9%. For practitioners, the work shows that integrating heterogeneous data sources can improve mapping accuracy in smart cities.
mapping, smart-cities, transformers
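
Both headline results are reported as macro IoU (intersection over union averaged across classes). The sketch below shows one common way to compute that metric for a segmentation map. It is illustrative only: the function name `macro_iou`, the two-class background/building labeling, and the choice to skip classes absent from both maps are assumptions, and the summary does not say how the paper aggregates the metric across images.

```python
def macro_iou(pred, target, num_classes):
    """Average the per-class IoU over every class present in pred or target.

    pred and target are equally shaped 2D grids (lists of rows) of integer
    class labels. The labeling used below (0 = background, 1 = building)
    is a hypothetical choice for illustration.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # A matching pixel counts once in both intersection and union.
                inter[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Skip classes that appear in neither map so they do not distort the mean.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 maps: the prediction misses one building pixel.
pred = [[0, 0, 1],
        [1, 1, 0]]
target = [[0, 1, 1],
          [1, 1, 0]]
score = macro_iou(pred, target, num_classes=2)
print(f"macro IoU = {score:.4f}")
```

Because every class contributes equally to the average, a model cannot score well by favoring the dominant class, which is why the macro variant is often preferred when building pixels are much rarer than background.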