Training
Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation
This study introduces activation steering as a new approach to generating synthetic data in low-resource languages and compares it with conventional few-shot prompting. The authors present two strategies: Language Steering, which targets the linguistic identity of the output, and Quality Steering, which targets its well-formedness. Both are evaluated across four open-source LLMs and 11 languages. The results indicate that steering at early layers increases the diversity of the generated data and improves downstream task performance, which makes the approach relevant to practitioners building low-resource language applications.
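The summary does not say how the steering vectors are constructed. A common construction, assumed here and not taken from the paper, is the difference of mean hidden activations between two sets of examples: for Language Steering, text in the target language versus text in another language; for Quality Steering, well-formed versus malformed text. The vector is then scaled by a strength `alpha` and added to the hidden state at chosen layers during the forward pass. The sketch below uses plain Python lists and toy "layers" in place of a real model. All names (`steering_vector`, `forward_with_steering`, `alpha`, `steer_layers`) are hypothetical.

```python
from typing import Callable, Dict, List, Set

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def steering_vector(target_acts: List[Vector], baseline_acts: List[Vector]) -> Vector:
    """Assumed difference-of-means construction: points from baseline toward target.

    e.g. target = activations on target-language text,
         baseline = activations on other-language text (hypothetical setup).
    """
    mu_target = mean_vector(target_acts)
    mu_baseline = mean_vector(baseline_acts)
    return [t - b for t, b in zip(mu_target, mu_baseline)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Add alpha * direction to a single hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def forward_with_steering(
    x: Vector,
    layers: List[Callable[[Vector], Vector]],
    vectors: Dict[int, Vector],
    steer_layers: Set[int],
    alpha: float,
) -> Vector:
    """Toy forward pass: apply each layer, then steer at the selected layer indices."""
    h = x
    for idx, layer in enumerate(layers):
        h = layer(h)
        if idx in steer_layers:
            h = steer(h, vectors[idx], alpha)
    return h


if __name__ == "__main__":
    # Toy 3-dimensional activations standing in for real hidden states.
    direction = steering_vector(
        target_acts=[[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]],
        baseline_acts=[[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]],
    )
    # Two toy layers; steer only at layer 0, i.e. the "early" layer in this toy stack.
    toy_layers = [lambda h: h, lambda h: [2 * v for v in h]]
    out = forward_with_steering([1.0, 1.0, 1.0], toy_layers, {0: direction}, {0}, alpha=0.5)
    print(direction, out)
```

Running the script prints `[2.0, 0.0, 0.0] [4.0, 2.0, 2.0]`. Steering at layer 0 shifts the first coordinate before the second layer processes it, so the edit propagates through every later computation. That propagation is one plausible intuition for why early-layer steering can have broad effects on the output. In a real LLM the same addition would typically be applied to the residual stream with a forward hook.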
synthetic data, low-resource, language generation