Training
LiFT: Local Search via Linear Programming for Overfitting-Controlled Transformers
The paper introduces LiFT (Linear Programming-based Fine-Tuning), a framework for fine-tuning transformer models that uses linear programming to control overfitting. It formulates fine-tuning as a bilevel optimization problem, so that model parameters and regularization hyperparameters are updated jointly using validation gradients and training Hessian information. In experiments with GPT-2 Small on WikiText-2, the authors report significant improvements in test perplexity, particularly in overfitting-prone scenarios. The paper also establishes a theoretical foundation linking fine-tuning with optimization and regularization theory, which is relevant to practitioners who need robust model performance.
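To make the bilevel idea concrete, here is a minimal sketch of how a validation gradient and a training Hessian combine into a gradient for a regularization hyperparameter. It is not the paper's algorithm: it involves no linear programming and no transformer, and it assumes a toy ridge-regression inner problem in which the regularization strength `lam` is the only hyperparameter. All function names are hypothetical and chosen for illustration.

```python
import numpy as np

def inner_solution(X_tr, y_tr, lam):
    """Solve the inner (training) problem in closed form.

    Assumed training loss: 0.5*||X_tr w - y_tr||^2 + 0.5*lam*||w||^2.
    Returns the minimizer w and the training Hessian H = X^T X + lam*I.
    """
    d = X_tr.shape[1]
    H = X_tr.T @ X_tr + lam * np.eye(d)
    w = np.linalg.solve(H, X_tr.T @ y_tr)
    return w, H

def val_loss(X_val, y_val, w):
    """Outer (validation) objective: 0.5*||X_val w - y_val||^2."""
    r = X_val @ w - y_val
    return 0.5 * float(r @ r)

def hypergradient(X_tr, y_tr, X_val, y_val, lam):
    """d(val_loss)/d(lam) by implicit differentiation.

    The training gradient X^T(Xw - y) + lam*w vanishes at the optimum.
    Differentiating that condition in lam gives dw/dlam = -H^{-1} w,
    so the hypergradient is -(validation gradient)^T H^{-1} w.
    """
    w, H = inner_solution(X_tr, y_tr, lam)
    g_val = X_val.T @ (X_val @ w - y_val)  # validation gradient w.r.t. w
    return -float(g_val @ np.linalg.solve(H, w))
```

A gradient step on `lam` using `hypergradient` would then alternate with re-solving the inner problem. In a large model the Hessian solve is typically approximated rather than computed exactly.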
fine-tuning, transformers, overfitting