Research
How Far Can Machine Translation Quality Take You? Extrinsic Discourse Evaluation in Goal-Oriented Setups
This work introduces an extrinsic discourse evaluation framework for machine translation (MT) that assesses how translation quality affects downstream communication in goal-oriented setups. It distinguishes between static and interactive regimes, proposes an entity-counting task for referential consistency, and uses a multi-agent Welfare Diplomacy game to evaluate long-horizon communication. The findings show that high intrinsic MT quality does not guarantee effective discourse outcomes, underscoring the need for discourse-sensitive evaluation of MT systems.
machine-translation, evaluation, discourse