Training
Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training
The article introduces ART (Art-based Reinforcement Training), a fine-tuning method for Multimodal Large Language Models (MLLMs) that optimizes the raw visual input instead of altering the model's computational graph. Because the graph is left unchanged, soft-token techniques can be applied to pre-compiled models. The article reports accuracy competitive with existing methods such as LoRA on benchmarks covering mathematics and structured tool use. By treating the stylized visual inputs as computational artworks, ART offers practitioners an efficient fine-tuning strategy that avoids extensive model modifications.
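To make the idea of tuning the input instead of the model concrete, here is a minimal sketch in plain Python. It is not the article's implementation: the frozen "model" is a hypothetical two-class linear classifier over four pixel values, the weights in `FROZEN_W` are made up, and the update rule is ordinary gradient descent on a cross-entropy loss, chosen only for illustration. The only thing it shares with the description above is the structure: the model's parameters stay fixed and the pixels are the trainable quantity.

```python
import math

# Hypothetical frozen "model": fixed linear classifier over 4 pixels.
# These weights are illustrative assumptions and are never updated.
FROZEN_W = [
    [0.5, -0.2, 0.1, 0.3],   # weights producing the class-0 logit
    [-0.4, 0.6, 0.2, -0.1],  # weights producing the class-1 logit
]

def forward(pixels):
    """Return softmax class probabilities for the given pixel values."""
    logits = [sum(w * p for w, p in zip(row, pixels)) for row in FROZEN_W]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(pixels, target):
    """Cross-entropy loss and its gradient with respect to the pixels."""
    probs = forward(pixels)
    loss = -math.log(probs[target])
    # d(loss)/d(logit_k) = prob_k - 1[k == target]
    dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    # Chain rule through the frozen weights to reach the input pixels.
    grad = [
        sum(dlogits[k] * FROZEN_W[k][i] for k in range(len(FROZEN_W)))
        for i in range(len(pixels))
    ]
    return loss, grad

def tune_input(pixels, target, steps=200, lr=0.5):
    """Gradient-descend on the pixels only, keeping them in [0, 1]."""
    pixels = list(pixels)
    for _ in range(steps):
        _, grad = loss_and_grad(pixels, target)
        pixels = [min(1.0, max(0.0, p - lr * g)) for p, g in zip(pixels, grad)]
    return pixels

start = [0.5, 0.5, 0.5, 0.5]
tuned = tune_input(start, target=1)
loss_before, _ = loss_and_grad(start, 1)
loss_after, _ = loss_and_grad(tuned, 1)
```

Comparing `loss_before` with `loss_after` shows the effect of changing only the input. In a real MLLM the gradient would come from backpropagation through the full network rather than a hand-written formula, and the optimization objective may differ from the one assumed here.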
fine-tuning, LLM, reinforcement training