Safety · arXiv cs.CL · 16 d ago

Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship

The study tests for self-preference bias in large language models (LLMs) during instruction-following revision, using the IFEval benchmark. Across four mid-tier model families, the researchers found no significant evidence that models favor their own outputs over verified corrections: the gap in rejection rates between authoring models and fresh models was small (-5.1 pp). For practitioners, this suggests that LLMs may not inherently resist valid corrections, which bears on their use in automated text revision and quality assurance.

llm · self-preference · revision · relevance 0.00 · engagement 0.00
