Safety · arXiv cs.CL · 7 d ago

Can Factual Opinions Be Edited (Manipulated) in Large Language Models?

The paper introduces the Factual Opinion Editing with Evidence (FOE) benchmark, which evaluates how readily factual opinions in Large Language Models (LLMs) can be manipulated, covering 261 public figures and 19 issue categories. It finds that existing editing techniques struggle to change a factual opinion while keeping it consistent with the supporting evidence, and often produce only superficial changes. The proposed Self-Generated Evidence-Aligned method aims to improve opinion-evidence consistency, a security concern for practitioners who deploy LLMs in sensitive applications.

llm · editing · factual-opinions · relevance 0.00 · engagement 0.00
