Research
Automated reproducibility assessments in the social and behavioral sciences using large language models
This study presents a framework for automating reproducibility assessments in the social and behavioral sciences using large language models (LLMs). The authors evaluated LLM performance on 76 published studies and found that the LLM accurately recovered the original effect sizes in 41% of cases, compared to 34% for human reanalysts, and matched the original qualitative conclusions in 96% of instances. These results suggest that LLMs can make reproducibility evaluations more scalable and efficient, which matters for maintaining the integrity of empirical research.
llm, reproducibility, social sciences