ai-digest.dev
Research · arXiv cs.AI · 9 d ago

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

The paper introduces Token-level Bregman Preference Optimization (TBPO), a method that extends Direct Preference Optimization (DPO) by modeling preferences at the token level through a Bregman-divergence density-ratio matching objective. TBPO comes in two variants: TBPO-Q, which incorporates a lightweight state baseline, and TBPO-A, which uses advantage normalization. Both are reported to improve alignment quality, training stability, and output diversity across various benchmarks. For practitioners, the method offers a more granular alternative to DPO's sequence-level preference signal, which may help on tasks that require nuanced understanding and generation.
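For context, here is a minimal sketch of the standard sequence-level DPO loss that TBPO is described as extending. It is not the TBPO objective, which this summary does not spell out. The function names and the default `beta=0.1` are illustrative assumptions, and the inputs are per-token log-probabilities of a chosen and a rejected response under the policy and a frozen reference model.

```python
import math

def token_log_ratios(policy_logps, ref_logps):
    """Per-token log-ratios log pi(y_t | .) - log pi_ref(y_t | .)."""
    return [p - r for p, r in zip(policy_logps, ref_logps)]

def dpo_loss(chosen_policy, chosen_ref, rejected_policy, rejected_ref, beta=0.1):
    """Standard DPO loss for one preference pair (sequence-level).

    Each argument is a list of per-token log-probabilities. DPO sums the
    token log-ratios into a single score per response, so the preference
    signal is applied to whole sequences rather than individual tokens.
    """
    chosen_score = sum(token_log_ratios(chosen_policy, chosen_ref))
    rejected_score = sum(token_log_ratios(rejected_policy, rejected_ref))
    margin = beta * (chosen_score - rejected_score)
    # -log(sigmoid(margin)) written in a numerically stable softplus form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical log-probabilities, chosen only for illustration.
loss = dpo_loss(
    chosen_policy=[-0.5, -1.0], chosen_ref=[-1.0, -1.5],
    rejected_policy=[-2.0], rejected_ref=[-1.0],
)
```

The per-token ratios returned by `token_log_ratios` are the quantities a token-level method would treat individually. In this sketch they are only summed, which is the granularity gap the summary says TBPO addresses.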

preference-optimization · llm · token-level
relevance 0.00 · engagement 0.00