Coding
Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs
The article introduces Tree-like Self-Play (TSP), a framework for improving secure code generation in Large Language Models (LLMs) by treating generation as a fine-grained sequential decision-making game. TSP builds a decision tree in which the model explores both secure and vulnerable code paths. With CodeLlama-7B, it reaches a pass rate of 75.8% on Python security benchmarks, compared with 57.0% for Supervised Fine-Tuning (SFT), a gain of 18.8 percentage points. The approach improves reliability by reducing localized vulnerabilities, and it generalizes out of distribution across multiple programming languages, which suggests the model internalizes language-agnostic security principles.
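To make the idea of a decision tree over secure and vulnerable code paths concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the article's implementation: the `Node` class, the per-step `secure` label, and the `sibling_pairs` helper are all hypothetical names, and pairing secure and vulnerable siblings that share a prefix is only one plausible way such a tree could yield fine-grained training signal.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One decision point: a code fragment plus a hypothetical security label."""
    fragment: str
    secure: bool = True
    children: List["Node"] = field(default_factory=list)


def paths(node: Node, prefix: Tuple[Node, ...] = ()) -> Iterator[Tuple[Node, ...]]:
    """Yield every root-to-leaf path in the tree."""
    current = prefix + (node,)
    if not node.children:
        yield current
        return
    for child in node.children:
        yield from paths(child, current)


def is_secure(path: Tuple[Node, ...]) -> bool:
    """A path counts as secure only if every step on it is labeled secure."""
    return all(n.secure for n in path)


def sibling_pairs(node: Node, prefix: str = "") -> Iterator[Tuple[str, str, str]]:
    """Yield (shared_prefix, secure_fragment, vulnerable_fragment) at each branch.

    Assumption: contrasting siblings under the same prefix localizes the
    decision that separates a secure continuation from a vulnerable one.
    """
    context = prefix + node.fragment
    good = [c for c in node.children if c.secure]
    bad = [c for c in node.children if not c.secure]
    for g in good:
        for b in bad:
            yield (context, g.fragment, b.fragment)
    for child in node.children:
        yield from sibling_pairs(child, context)


# Toy tree (hypothetical example): one branch point, parameterized query vs.
# string-interpolated SQL.
root = Node(
    "def find_user(conn, name):\n",
    children=[
        Node(
            '    cur = conn.execute("SELECT * FROM users WHERE name = ?", (name,))\n',
            secure=True,
            children=[Node("    return cur.fetchone()\n")],
        ),
        Node(
            "    cur = conn.execute(f\"SELECT * FROM users WHERE name = '{name}'\")\n",
            secure=False,
            children=[Node("    return cur.fetchone()\n")],
        ),
    ],
)

all_paths = list(paths(root))
secure_paths = [p for p in all_paths if is_secure(p)]
pairs = list(sibling_pairs(root))

if __name__ == "__main__":
    print(len(all_paths), "paths,", len(secure_paths), "secure")
    for context, chosen, rejected in pairs:
        print("prefix:", context, "secure:", chosen, "vulnerable:", rejected, sep="\n")
```

In this toy tree the two paths differ at a single step, so the contrast isolates the exact line where the vulnerability enters rather than labeling a whole program as good or bad.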
code, security, llm