Training · arXiv cs.AI · 10 d ago

Fine-Tuning a 7B Advisor on Free-Tier GPUs: An Adapter-Handoff Recipe and a Synthetic-Data Reliability Caution

The article presents a recipe for fine-tuning Mistral-7B-Instruct-v0.3 with three epochs of QLoRA on free-tier GPUs, using 4-bit NF4 quantization and a LoRA rank of 16. Its adapter-handoff technique checkpoints only the 41.9M-parameter LoRA adapter, so training can be handed off across two GPUs without saving the full model. The evaluation, however, shows that although the fine-tuned model aligns more closely with the synthetic training distribution, it scores worse on advising quality and factual accuracy, which raises concerns about the reliability of synthetic training data. The authors release the dataset, adapter, and evaluation tools for reproducibility.
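The 41.9M adapter size quoted in the summary can be sanity-checked with a little arithmetic. The sketch below is not taken from the paper: it assumes the adapter targets all seven linear projections in each transformer block (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`), which the summary does not state, and it uses the published Mistral-7B dimensions. The helper names `lora_params` and `adapter_size` are illustrative only.

```python
# Published Mistral-7B architecture dimensions.
HIDDEN = 4096          # model (residual stream) width
INTERMEDIATE = 14336   # MLP inner width
NUM_LAYERS = 32        # transformer blocks
KV_DIM = 8 * 128       # 8 key/value heads x head_dim 128 (grouped-query attention)
RANK = 16              # LoRA rank stated in the summary

# ASSUMPTION: LoRA is attached to all seven linear projections per block.
# Each entry maps a module name to (input_dim, output_dim).
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, rank):
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_size(rank=RANK):
    """Total trainable LoRA parameters across all blocks."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in TARGETS.values())
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    total = adapter_size()
    print(f"{total:,} trainable LoRA parameters ({total / 1e6:.1f}M)")
```

Under these assumptions the count comes to 41,943,040, which is consistent with the 41.9M figure. Stored at 16-bit precision, an adapter of that size would be roughly 84 MB, which suggests why checkpointing only the adapter is practical for moving a run between free-tier sessions.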

fine-tuning · language model · adapter · relevance 0.00 · engagement 0.00
