Coding
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
Multi-LCB is a benchmark that extends the LiveCodeBench (LCB) framework to evaluate large language models (LLMs) across twelve programming languages, including Python. It adapts LCB's Python tasks into equivalent tasks in the other languages while keeping the same contamination controls and evaluation protocols, which allows systematic assessment of multilingual code generation. An evaluation of 24 LLMs revealed overfitting to Python and significant performance disparities across languages, highlighting the need for better cross-language generalization in LLMs applied to real-world software engineering tasks.
benchmark, code generation, multi-language