ai-digest.dev
Coding · arXiv cs.CL · 2 d ago

CodeAlchemy: Synthetic Code Rewriting at Scale

CodeAlchemy is a synthetic data generation framework that augments training data for code-related tasks by rewriting publicly sourced code with five strategies. It yields over 500 billion tokens of synthetic data and 350 billion reasoning tokens. The framework also includes benchmarks such as DevEval and TraceEval. 3B models reach an 83.5% pass rate on HumanEval and, on certain tasks, outperform larger models such as the 27B Gemma-3 and the 32B Granite-4.0 by a factor of ten. The results point to the potential of synthetic data for improving semantic understanding in code generation and execution tasks.

code · synthetic-data · code-rewriting