RAGarXiv cs.AI — 11 d ago

RoTRAG: Rule of Thumb Reasoning for Conversation Harm Detection with Retrieval-Augmented Generation

RoTRAG is a novel retrieval-augmented framework designed for detecting harmful content in multi-turn dialogues by integrating human-written moral norms, termed Rules of Thumb (RoTs). It retrieves relevant RoTs for each conversational turn, enhancing reasoning and severity classification, and introduces a binary routing classifier to optimize the need for retrieval, achieving an average 40% improvement in F1 scores and an 8.4% reduction in distributional error across benchmark datasets such as ProsocialDialog and Safety Reasoning Multi Turn Dialogue. This approach enhances interpretability and efficiency in harm assessment, making it valuable for practitioners focused on developing robust conversational AI systems.

harm detectionconversationretrieval-augmented generationrelevance 0.00 · engagement 0.00

Read at source ↗← all news