Training
Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization
Researchers used QLoRA to fine-tune three small LLMs for biomedical claim verification on the SciFact and HealthVer datasets: Phi-3-mini (3.8B), Qwen2.5-3B, and Mistral-7B. Fine-tuning produced significant performance gains. Notably, Mistral-7B outperformed both GPT-4o and GPT-5 by up to 12% in F1 score while being trained on only 1,008 examples, which shows that small models can be effective in this domain. The work also highlights how dataset structure affects cross-domain generalization. The authors plan to release all code and adapter checkpoints, which should help practitioners build cost-effective LLM solutions.
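QLoRA keeps the base model's weights frozen in 4-bit precision and trains only small low-rank adapter matrices, which is a large part of why this kind of fine-tuning is cheap. The sketch below illustrates only the low-rank adapter part in plain Python and omits the 4-bit quantization. The class and function names, the rank `r`, the scaling `alpha`, and the initialization are illustrative assumptions, not the configuration used in the study.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA layer: frozen weight W plus a trainable update B @ A.

    W has shape (d_out x d_in), A has shape (r x d_in), B has shape (d_out x r).
    B starts at zero, so before training the layer computes exactly W x.
    """

    def __init__(self, weight: Matrix, r: int, alpha: float, seed: int = 0):
        self.weight = weight  # frozen; QLoRA would store this in 4-bit form
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # Trainable adapters: small random A, zero-initialized B.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        """Compute W x + (alpha / r) * B (A x)."""
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


def full_param_count(d_out: int, d_in: int) -> int:
    """Trainable parameters if the whole weight matrix were updated."""
    return d_out * d_in


if __name__ == "__main__":
    # 4096 is Mistral-7B's hidden size; r = 16 is an assumed rank.
    lora = lora_param_count(4096, 4096, 16)
    full = full_param_count(4096, 4096)
    print(lora, full, f"{100 * lora / full:.2f}%")
```

For one 4096 × 4096 weight matrix at rank 16, the adapter trains 131,072 parameters instead of 16,777,216, about 0.78% of the full matrix. Zero-initializing `B` means training starts from the unchanged base model.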
fine-tuning, biomedical, cross-domain