Research
When Metrics Disagree: A Meta-Analysis of Knowledge-Graph-Completion Model Benchmarking
This article presents a meta-analysis of inconsistencies in how Knowledge Graph Completion (KGC) models are evaluated, highlighting the limitations of traditional rank-based metrics such as MRR and Hits@$k$. By framing KGC evaluation as a Multi-Criteria Decision-Making (MCDM) problem, the authors identify the Z-score as the optimal aggregator for balancing performance across metrics, with DualE excelling in tail prediction and Flow-Modulated Scoring (FMS) leading in relation prediction. The framework makes model comparisons more reliable and gives practitioners a systematic approach to benchmarking across diverse KGC settings, mitigating selective reporting and dataset-specific performance effects.
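To make the terms above concrete, here is a minimal Python sketch of the two rank-based metrics and of one plausible form of Z-score aggregation. The abstract does not give the authors' exact formulation, so the aggregation rule (standardize each metric across models, then average the z-scores per model), the function names, and the sample ranks are all assumptions for illustration only.

```python
from statistics import mean, pstdev


def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/rank over all test queries."""
    return mean(1.0 / r for r in ranks)


def hits_at_k(ranks, k):
    """Hits@k: fraction of test queries whose correct answer ranks in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def zscore_aggregate(scores):
    """Average per-metric z-scores for each model.

    scores: {model: {metric: value}}, where higher is better for every metric.
    This is an assumed aggregation rule, not necessarily the one in the article.
    """
    metrics = sorted({m for vals in scores.values() for m in vals})
    totals = {model: 0.0 for model in scores}
    for metric in metrics:
        column = [scores[model][metric] for model in scores]
        mu, sigma = mean(column), pstdev(column)
        for model in scores:
            # A metric on which all models tie carries no ranking information.
            z = 0.0 if sigma == 0 else (scores[model][metric] - mu) / sigma
            totals[model] += z / len(metrics)
    return totals


# Hypothetical ranks of the correct entity for three test queries per model.
ranks = {
    "model_a": [2, 2, 2],   # consistently near the top
    "model_b": [1, 1, 10],  # usually first, occasionally far off
}
scores = {
    name: {"MRR": mrr(r), "Hits@3": hits_at_k(r, 3)} for name, r in ranks.items()
}
aggregated = zscore_aggregate(scores)
```

In this toy input the two metrics disagree: `model_b` has the higher MRR while `model_a` has the higher Hits@3. Standardizing each metric before averaging puts them on a common scale, so neither metric dominates the combined score merely because of its numeric range.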
knowledge graph, benchmarking, evaluation