Research
Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMs
This study investigates whether language models (LMs), specifically GPT-2 small, can learn impossible and typologically unattested languages as readily as attested ones. Training on 12 languages from four families with newly constructed parallel corpora, the study finds that GPT-2 small can partially differentiate attested from unattested languages, but it does not do so perfectly. The findings indicate that LMs exhibit some of the inductive biases of human language learners, though in weaker form, which has implications for understanding the limitations of LMs in language learning scenarios.
language_learning, lm, typology