Research
SciDef: Datasets and Tools for Automated Definition Extraction from Scientific Literature with LLMs
SciDef is a resource suite for automated definition extraction from scientific literature. It comprises DefExtra, a benchmark of 268 validated definitions, and DefSim, which includes 60 human-labeled similarity judgments. The suite also provides an open LLM-based pipeline covering PDF preprocessing, definition extraction, and evaluation; across the 16 language models benchmarked, the highest score reported is 0.397. For AI practitioners, SciDef addresses inconsistencies in scientific definitions and offers tools to make downstream literature-analysis applications more reliable.
definition-extraction, llm, scientific-literature