ai-digest.dev
Coding · arXiv cs.AI · 9 d ago

Beyond Correctness: Enhancing Architectural Reasoning in Code LLMs via Scalable Labeling with Agentic Judgment

The paper presents an agentic judging pipeline that uses a strong LLM to label training data for architectural reasoning in software engineering, and applies it to Qwen3 models (8B, 14B, and 32B). Fine-tuning these models on 3,360 curated instances yields a resolved rate of 27.2% on SWE-bench Verified, a relative improvement of up to 540% over the base model. Because the judging is automated, architectural evaluation can scale beyond manual review, which matters to practitioners who want to improve code quality and maintainability in real-world development.

llm · software engineering · architectural reasoning