Inference
TimeVista: Exploring and Exploiting Vision-Language Models as Judges for Time Series Forecasting
The paper introduces TimeVista, a benchmark that uses Vision-Language Models (VLMs) as judges of time series forecasts, combining micro- and macro-level judgments informed by contextual data. The benchmark comprises 5,563 time series samples, and the authors report that VLM judgments agree with human preferences more consistently than traditional evaluation metrics do. For practitioners, this offers a more interpretable, human-aligned standard for assessing Time Series Foundation Models (TSFMs).
time series, forecasting, evaluation
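The summary does not say how consistency with human preferences is computed. One common way to make such a comparison concrete is a pairwise agreement rate: for each pair of candidate forecasts, check whether a judge picks the same one a human annotator preferred. The sketch below is a minimal illustration under that assumption, not the paper's protocol. The function names (`mse`, `metric_preference`, `agreement_rate`) are hypothetical, MSE stands in for a "traditional metric", and the data in the usage lines are invented toy values.

```python
from typing import Sequence


def mse(forecast: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error between a forecast and the ground truth."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and equal length")
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)


def metric_preference(
    forecast_a: Sequence[float],
    forecast_b: Sequence[float],
    actual: Sequence[float],
) -> str:
    """Pick the forecast with the lower MSE (ties go to 'A').

    Assumption: this stands in for a traditional metric acting as a judge.
    """
    return "A" if mse(forecast_a, actual) <= mse(forecast_b, actual) else "B"


def agreement_rate(judge_choices: Sequence[str], human_choices: Sequence[str]) -> float:
    """Fraction of pairs where the judge's choice matches the human's."""
    if len(judge_choices) != len(human_choices) or not human_choices:
        raise ValueError("choice lists must be non-empty and equal length")
    matches = sum(j == h for j, h in zip(judge_choices, human_choices))
    return matches / len(human_choices)


# Toy usage with invented labels: compare two judges against human picks.
human = ["A", "B", "B", "A"]
metric_judge = ["A", "B", "A", "A"]   # e.g. produced by metric_preference
vlm_judge = ["A", "B", "B", "A"]      # hypothetical VLM verdicts
print(agreement_rate(metric_judge, human))
print(agreement_rate(vlm_judge, human))
```

A judge with a higher agreement rate on the same pairs is, in this simplified sense, more consistent with human preferences. A chance-corrected statistic could be substituted where label imbalance matters.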