ai-digest.dev
Research · arXiv cs.AI · 15 d ago

Constitutional On-Policy Safe Distillation

The paper introduces Constitutional On-Policy Safe Distillation (COPSD), an on-policy self-distillation method that targets the collapse observed when self-distillation is applied to safety alignment. COPSD uses a Cross-SFT cold start to calibrate the teacher and applies constitutional conditioning. According to the summary, this improves the safety-helpfulness trade-off across 12 benchmarks and mitigates the loss of expressiveness that safety pressure typically causes. For practitioners, the result points to models that stay more robust on reasoning tasks while remaining safety compliant.

Tags: self-distillation, safety