Safety · arXiv cs.AI · 12 d ago

Pulling The REINS: Training-Free Safety Alignment of Video Diffusion Models via Representation Steering

The paper introduces REINS (REpresentation-space INference-time Safety steering), a training-free method for safety alignment of video diffusion models that steers their internal representations during inference. REINS uses a single direction, derived by Supervised PCA on binary safety labels, to redirect harmful generation trajectories toward safe alternatives. The authors report that it is effective across nine video diffusion models ranging from 1.3B to 5B parameters. For practitioners, it offers a lightweight way to improve the safety of generated video without extensive retraining or added computational overhead.
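To make the idea of a single Supervised PCA direction concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a linear kernel over the binary labels, in which case the Supervised PCA objective matrix is rank one and its top eigenvector is proportional to the centered features multiplied by the centered labels. The function names, the `strength` parameter, and the steering rule (moving an activation's projection toward a safe target value) are hypothetical illustrations; the summary does not specify which layer is steered or how the shift is applied.

```python
import numpy as np

def supervised_pca_direction(feats, labels):
    """Top Supervised PCA direction for binary labels (linear label kernel assumed)."""
    X = np.asarray(feats, dtype=float)    # (n, d) hidden representations
    y = np.asarray(labels, dtype=float)   # (n,), 1 = unsafe, 0 = safe
    Xc = X - X.mean(axis=0)               # center the features
    yc = y - y.mean()                     # center the labels
    # Xc^T (yc yc^T) Xc is rank one, so its top eigenvector is Xc^T yc.
    d = Xc.T @ yc
    return d / np.linalg.norm(d)

def steer(h, direction, safe_target, strength=1.0):
    """Hypothetical steering rule: move h's projection onto `direction`
    toward `safe_target`, leaving orthogonal components unchanged."""
    h = np.asarray(h, dtype=float)
    proj = np.asarray(h @ direction)      # scalar or (n,)
    shift = strength * (safe_target - proj)
    return h + shift[..., None] * direction

# Synthetic usage: "unsafe" activations are offset along the first axis.
rng = np.random.default_rng(0)
safe = rng.normal(size=(200, 8))
unsafe = rng.normal(size=(200, 8))
unsafe[:, 0] += 5.0
X = np.vstack([safe, unsafe])
y = np.concatenate([np.zeros(200), np.ones(200)])

direction = supervised_pca_direction(X, y)
safe_target = float((safe @ direction).mean())
steered = steer(unsafe, direction, safe_target)
```

With `strength=1.0` the steered activations have exactly the safe target projection along the direction; smaller values move only part of the way, which is one plausible way to trade off safety against fidelity to the original prompt.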

video diffusion · safety alignment · REINS
