Agents
Search Discipline for Long-Horizon Research Agents
The paper introduces a search-discipline protocol for autoresearch agents. It targets a specific failure: when validity is multi-dimensional, ranking candidates by an aggregated metric can misrank them, so a candidate that looks optimal on a global score may still fail in specific regions. The paper demonstrates this on a fire-model task within the Ecosystem Demography model. The protocol adds an external control loop that audits each candidate's disaggregated behavior, which supports more reliable selection and makes it possible to reassess runs that had already been concluded. This matters for practitioners who need robust evaluation in AI-driven research.
autoresearch, scientific discovery, multi-agent systems
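
The misranking risk described in the summary can be illustrated with a minimal Python sketch. This is not the paper's implementation: the candidate names, region names, scores, the mean-based aggregation, and the per-region floor are all hypothetical values chosen for illustration. The sketch ranks candidates by a global score and then audits each one's per-region scores, which is one simple way an external check might catch a candidate that wins on aggregate but fails in one region.

```python
from statistics import mean

# Hypothetical per-region skill scores (higher is better).
# Names and values are illustrative only, not results from the paper.
candidates = {
    "candidate_a": {"boreal": 0.95, "temperate": 0.95, "tropical": 0.30},
    "candidate_b": {"boreal": 0.70, "temperate": 0.72, "tropical": 0.68},
}


def aggregate_score(regional_scores):
    """Global score as a plain mean over regions (an assumed aggregation)."""
    return mean(regional_scores.values())


def failing_regions(regional_scores, floor):
    """Return the regions whose score falls below the assumed floor."""
    return sorted(r for r, s in regional_scores.items() if s < floor)


def audit(candidates, floor=0.5):
    """Rank by aggregate score, then accept only candidates with no failing region."""
    ranked = sorted(
        candidates,
        key=lambda name: aggregate_score(candidates[name]),
        reverse=True,
    )
    report = [
        (name, aggregate_score(candidates[name]), failing_regions(candidates[name], floor))
        for name in ranked
    ]
    accepted = [name for name, _, fails in report if not fails]
    return report, accepted


report, accepted = audit(candidates)
for name, score, fails in report:
    print(f"{name}: aggregate={score:.3f}, failing regions={fails}")
print("accepted after audit:", accepted)
```

With these illustrative numbers, candidate_a has the higher aggregate score but falls below the floor in one region, so only candidate_b passes the audit. Because the audit reads stored per-region scores rather than rerunning anything, the same check could in principle be applied to runs that have already finished.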