Training
Do We Still Need Humans in the Loop? Comparing Human and LLM Annotation in Active Learning for Hostility Detection
The study investigates the feasibility of using instruction-tuned large language models (LLMs) for annotation in active learning (AL) for hostility detection, using a dataset of 277,902 German political TikTok comments. The results show that LLMs, specifically GPT-5.2 and Qwen3.5-122B-10B, can outperform human annotators at a significantly lower cost, and that LLM annotation performs better with a two-question interface. The findings suggest that once LLMs can label data efficiently, AL becomes less necessary, since random sampling may suffice. The study also highlights nuanced differences in error structures across models, which could affect deployment in real-world applications.
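To make the contrast between AL and random sampling concrete, here is a minimal sketch of the two selection strategies. The summary does not state which acquisition function the study uses, so the least-confidence style score below is an assumption, and the function names and example scores are hypothetical.

```python
import random

def uncertainty(p_hostile: float) -> float:
    """Uncertainty of a binary prediction: 1.0 at p=0.5, 0.0 at p=0 or p=1."""
    return 1.0 - abs(2.0 * p_hostile - 1.0)

def select_uncertain(probs, k):
    """Active learning step: pick the k items the classifier is least sure about."""
    ranked = sorted(range(len(probs)),
                    key=lambda i: uncertainty(probs[i]),
                    reverse=True)
    return ranked[:k]

def select_random(probs, k, seed=0):
    """Baseline: pick k items uniformly at random, ignoring the classifier."""
    rng = random.Random(seed)
    return rng.sample(range(len(probs)), k)

# Hypothetical classifier scores P(hostile) for six unlabeled comments.
probs = [0.02, 0.48, 0.91, 0.55, 0.10, 0.70]

print(select_uncertain(probs, 2))  # indices closest to 0.5: [1, 3]
print(select_random(probs, 2))     # two distinct indices chosen at random
```

In either case the selected comments would then be sent to an annotator (human or LLM) and added to the training set. If labels are cheap enough to annotate many comments, the extra bookkeeping of the uncertainty-based step may buy little over the random baseline, which is the trade-off the summary describes.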
llm, active-learning, annotation