Coding · arXiv cs.AI · 9 d ago

Beyond Correctness: Enhancing Architectural Reasoning in Code LLMs via Scalable Labeling with Agentic Judgment

The paper presents an agentic judging pipeline in which a strong LLM labels training data, with the goal of improving architectural reasoning in code LLMs for software engineering tasks. The authors fine-tune Qwen3 models (8B, 14B, and 32B) on 3,360 curated instances. Fine-tuning yields a resolved rate of 27.2% on SWE-bench Verified, with improvements of up to 540% over the base model. The approach makes architectural evaluation scalable, which matters to practitioners who want to improve code quality and maintainability in real-world development.
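A relative improvement figure is easier to interpret once it is turned back into a base rate. The sketch below assumes, hypothetically, that the 540% gain and the 27.2% resolved rate refer to the same model, which the summary does not state, and that "540% improvement" means a relative increase of 5.40. Here $r_{\text{ft}}$ and $r_{\text{base}}$ are illustrative symbols for the fine-tuned and base resolved rates.

```latex
\Delta_{\text{rel}} = \frac{r_{\text{ft}} - r_{\text{base}}}{r_{\text{base}}}
\quad\Longrightarrow\quad
r_{\text{base}} = \frac{r_{\text{ft}}}{1 + \Delta_{\text{rel}}}
= \frac{27.2\%}{1 + 5.40} = 4.25\%
```

If "540%" is instead read as a ratio ($r_{\text{ft}} = 5.4\, r_{\text{base}}$), the implied base rate is about 5.04%. The paper itself should be consulted for the actual per-model numbers.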

llm · software engineering · architectural reasoning
