Coding
From Tokens to Regions: CUDA-Sensitive Instruction Tuning for GPU Kernel Generation
The paper introduces CUDA-Sensitive Instruction Tuning (CuSeT), a method for tuning Large Language Models (LLMs) to generate high-performance CUDA kernels. CuSeT combines adaptive token-level masking with region-aware sample reweighting so that training accounts for CUDA sensitivity at both the token and the region level, which improves functional correctness across several model families. The approach is a low-cost alternative to traditional supervised fine-tuning and outperforms existing methods at lower inference cost, which makes it relevant to practitioners working on efficient GPU kernel generation.
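The summary names two ingredients, token-level masking and region-aware sample reweighting, without giving their formulas. The sketch below is one way such a loss could be assembled, and it is not the paper's formulation: the top-k masking rule, the `keep_ratio` parameter, the linear weighting in `region_weight`, and every function name are assumptions made for illustration. Per-token losses, sensitivity scores, and region flags are taken as given inputs.

```python
def adaptive_token_mask(scores, keep_ratio=0.5):
    """Hypothetical rule: keep the most sensitive tokens of this sample.

    The threshold adapts per sample because it is taken from the sample's
    own score distribution. Ties at the threshold are all kept.
    """
    if not scores:
        return []
    k = max(1, round(len(scores) * keep_ratio))
    threshold = sorted(scores, reverse=True)[k - 1]
    return [1.0 if s >= threshold else 0.0 for s in scores]


def region_weight(region_flags, base=1.0, boost=1.0):
    """Hypothetical rule: weight a sample by its share of tokens that fall
    inside CUDA-sensitive regions (flag 1) versus other code (flag 0)."""
    if not region_flags:
        return base
    share = sum(region_flags) / len(region_flags)
    return base + boost * share


def weighted_masked_loss(token_losses, scores, region_flags, keep_ratio=0.5):
    """Mean loss over the kept tokens, scaled by the sample's region weight."""
    mask = adaptive_token_mask(scores, keep_ratio)
    kept = sum(mask)
    if kept == 0:
        return 0.0
    mean_loss = sum(l * m for l, m in zip(token_losses, mask)) / kept
    return region_weight(region_flags) * mean_loss


if __name__ == "__main__":
    # Toy sample with four tokens; the middle two sit in a sensitive region.
    losses = [2.0, 1.0, 4.0, 3.0]
    scores = [0.1, 0.9, 0.8, 0.2]
    regions = [0, 1, 1, 0]
    print(weighted_masked_loss(losses, scores, regions))
```

In a real training loop the per-token losses would come from the model's cross-entropy on each target token, and how sensitivity scores and region boundaries are obtained is exactly what the paper itself specifies.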
cuda, kernel generation, llm