ai-digest.dev
Safety · arXiv cs.CL · 15 d ago

MORTAR: Multi-turn Metamorphic Testing for LLM-based Dialogue Systems

The paper introduces MORTAR, a metamorphic testing framework for multi-turn, LLM-based dialogue systems. It targets the oracle problem: in a multi-turn interaction there is usually no reference answer against which each response can be judged. MORTAR automates the generation of dialogue test cases by applying multiple perturbations and checking metamorphic relations between the original and perturbed dialogues. The paper reports that it reveals over 150% more bugs per test case than traditional single-turn testing methods, which makes quality assurance for dialogue systems more efficient and effective under resource constraints.
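To make the idea of a metamorphic relation over a dialogue concrete, here is a minimal Python sketch. It is not MORTAR's implementation: the perturbation (inserting an unrelated turn), the relation (the final answer should stay the same), and every name in it (`insert_distractor`, `violates_relation`, `toy_system`) are assumptions chosen for illustration. The point is that no reference answer is needed, only a comparison between two runs.

```python
from typing import Callable, List

Dialogue = List[str]                  # user turns, in order
System = Callable[[Dialogue], str]    # returns the reply to the final turn


def insert_distractor(dialogue: Dialogue, distractor: str, position: int) -> Dialogue:
    """Hypothetical perturbation: add an unrelated user turn before the final question."""
    # Clamp so the distractor never lands after the final turn.
    position = max(0, min(position, len(dialogue) - 1))
    return dialogue[:position] + [distractor] + dialogue[position:]


def violates_relation(system: System, dialogue: Dialogue, perturbed: Dialogue) -> bool:
    """Assumed metamorphic relation: the final answer should not change."""
    return system(dialogue).strip().lower() != system(perturbed).strip().lower()


def toy_system(dialogue: Dialogue) -> str:
    """Stand-in for a dialogue system, with a deliberate bug: it reads only the last two turns."""
    context = " ".join(dialogue[-2:])
    return "Paris" if "France" in context else "I don't know"


original = ["Let's talk about France.", "What is its capital?"]
perturbed = insert_distractor(original, "By the way, I like tea.", position=1)

# True here: the distractor pushes "France" out of the toy system's context window.
bug_found = violates_relation(toy_system, original, perturbed)
print("relation violated:", bug_found)
```

A single-turn test of "What is its capital?" could not expose this kind of context-handling bug, which is the kind of gap multi-turn testing is meant to cover.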

testing · dialogue-systems · llm · relevance 0.00 · engagement 0.00