Safety
CHILLGuard: Towards Fine-Grained Chinese LLM Safety Guardrail with Scalable Data Construction and Model-aware Preference Alignment
The paper introduces CHILLGuard, a safety guardrail built for deploying large language models (LLMs) in Chinese-language settings. It targets a gap in existing guardrail models, which adapt poorly to Chinese regulatory and cultural contexts. CHILLGuard uses a fine-grained risk taxonomy of 5 macro and 31 micro categories. A scalable data construction pipeline supports the taxonomy and produces a training set of 405,007 samples and a test set of 51,745 samples. The authors report state-of-the-art performance, including a 15.92% F1 improvement over Qwen3Guard-8B-Strict. This makes CHILLGuard relevant for practitioners who deploy LLMs in Chinese-language environments and must meet local safety requirements.
llm, safety, guardrails