Safety
DeFrame: Debiasing Large Language Models Against Framing Effects
The paper introduces DeFrame, a debiasing method for large language models (LLMs) that targets framing effects: changes in a model's responses caused by how a prompt is worded rather than by what it asks. The authors quantify this as "framing disparity," use the measure to extend fairness evaluation benchmarks, and report that their framing-aware debiasing technique reduces bias and improves response consistency across framings. The work is relevant to practitioners who need deployed LLMs to respond fairly and consistently across demographic contexts.
debiasing, large_language_models, fairness