Research
Nothing from Something: Can a Language Model Discover 0?
The paper investigates whether language models, specifically those around the size of GPT-2, can independently discover the mathematical concept of zero. It finds that these models initially struggle with this out-of-distribution generalization, but their performance improves significantly after exposure to multiple examples of zero, and language pretraining reduces the number of examples needed by about 50%. This suggests that language capabilities can aid mathematical discovery in AI, a finding relevant to practitioners who want models to generalize beyond their training data in mathematical contexts.
language model, mathematics, discovery