Safety
MORTAR: Multi-turn Metamorphic Testing for LLM-based Dialogue Systems
The paper introduces MORTAR, a multi-turn metamorphic testing framework for LLM-based dialogue systems that addresses the oracle problem in multi-turn interactions. MORTAR automates the generation of dialogue test cases using multiple perturbations and metamorphic relations, and reveals over 150% more bugs per test case than traditional single-turn testing methods. The approach makes quality assurance for dialogue systems more efficient and effective, giving developers a tool for comprehensive evaluation under resource constraints.
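To make the idea of a metamorphic relation concrete, here is a minimal sketch in Python. It is not MORTAR's implementation: the two toy systems, the distractor perturbation, and every function name are hypothetical and chosen only for illustration. The sketch shows how a test can flag a bug without a ground-truth answer, by checking that a perturbation which should not matter leaves the final reply unchanged.

```python
from typing import Callable, List

Dialogue = List[str]                      # user turns, in order
System = Callable[[Dialogue], List[str]]  # returns one reply per user turn

QUESTION = "What is my name?"
PREFIX = "My name is "


def toy_system(turns: Dialogue) -> List[str]:
    """Hypothetical system that remembers a name across the whole dialogue."""
    name = None
    replies = []
    for turn in turns:
        if turn.startswith(PREFIX):
            name = turn[len(PREFIX):].rstrip(".")
            replies.append("Nice to meet you.")
        elif turn == QUESTION:
            replies.append(name if name else "I don't know.")
        else:
            replies.append("OK.")
    return replies


def forgetful_system(turns: Dialogue) -> List[str]:
    """Hypothetical buggy system that only looks at the preceding turn."""
    replies = []
    for i, turn in enumerate(turns):
        prev = turns[i - 1] if i > 0 else ""
        if turn == QUESTION and prev.startswith(PREFIX):
            replies.append(prev[len(PREFIX):].rstrip("."))
        elif turn == QUESTION:
            replies.append("I don't know.")
        else:
            replies.append("OK.")
    return replies


def insert_distractor(turns: Dialogue, position: int,
                      distractor: str = "The weather is nice today.") -> Dialogue:
    """Perturbation (assumed): add a turn unrelated to the final question."""
    return turns[:position] + [distractor] + turns[position:]


def violates_relation(system: System, turns: Dialogue, position: int) -> bool:
    """Metamorphic relation (assumed): the final reply must not change."""
    original = system(turns)[-1]
    perturbed = system(insert_distractor(turns, position))[-1]
    return original != perturbed


source = ["My name is Ada.", QUESTION]
print(violates_relation(toy_system, source, position=1))
print(violates_relation(forgetful_system, source, position=1))
```

Neither check needs to know that the correct answer is "Ada"; the comparison between the original and perturbed dialogues stands in for the missing oracle.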
testing, dialogue-systems, llm