Multimodal · arXiv cs.AI · 7 d ago

Dense Coordinate-List Fine-Tuning Induces a Controllable Interference Surface in Vision-Language Models

The paper presents a fine-tuning approach for vision-language models, specifically Gemma 4 12B and Qwen3-VL-8B, that trains on dense coordinate lists to improve visual grounding while keeping structured-output behavior under control. The adaptation uses high-capacity low-rank adaptation (LoRA) and raises the class-aware F1 score from 0.007 to 0.448 while holding the duplicate-output rate at 0.000, with other performance metrics remaining high. The authors describe the result as a controllable interference surface: a way for practitioners to manage output quality and structure in vision-language tasks, and so improve model reliability in real-world applications.
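The summary does not say how the two reported metrics are computed, so the following is only a minimal sketch of one plausible reading. It assumes each prediction is a `(label, x, y)` point in normalized coordinates, that a prediction counts as correct when it has the same label as an unmatched ground-truth point and lies within a distance tolerance, and that a duplicate is an exact repeat of an earlier entry. The function names, the `tol` parameter, and the greedy matching are illustrative assumptions, not the paper's definitions.

```python
from math import hypot


def duplicate_rate(preds):
    """Fraction of predictions that exactly repeat an earlier entry (assumed definition)."""
    if not preds:
        return 0.0
    seen = set()
    dups = 0
    for p in preds:
        if p in seen:
            dups += 1
        seen.add(p)
    return dups / len(preds)


def class_aware_f1(preds, targets, tol=0.05):
    """F1 under greedy one-to-one matching: same label and distance <= tol (assumed)."""
    unmatched = list(targets)
    tp = 0
    for label, x, y in preds:
        for t in unmatched:
            if t[0] == label and hypot(t[1] - x, t[2] - y) <= tol:
                unmatched.remove(t)  # each ground-truth point can be matched once
                tp += 1
                break
    if tp == 0:
        return 0.0
    precision = tp / len(preds)
    recall = tp / len(targets)
    return 2 * precision * recall / (precision + recall)


# Toy example: one correct point, one exact duplicate, one misplaced point.
targets = [("cat", 0.20, 0.30), ("dog", 0.70, 0.60)]
preds = [("cat", 0.21, 0.31), ("cat", 0.21, 0.31), ("dog", 0.10, 0.10)]

f1 = class_aware_f1(preds, targets)
dup = duplicate_rate(preds)
```

In this toy case the repeated `cat` point lowers precision without adding recall, which is why a duplicate rate is worth tracking alongside F1 for list-style outputs.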

vision-language · fine-tuning · relevance 0.00 · engagement 0.00
