Agents · arXiv cs.AI · 8 d ago

LabOSBench: Benchmarking Computer Use Agents for Scientific Instrument Control

LabOSBench is a new benchmark for evaluating multimodal GUI agents on scientific instrument control. It uses a suite of web-based simulators to reproduce the complexity of real-world instrumentation and comprises 96 subtasks across eight simulators, covering workflows such as sample loading and data acquisition. The web-based design allows flexible task configuration without resource-intensive OS virtualization. For practitioners, the benchmark offers a scalable, reproducible way to assess and improve how agents handle feedback-driven operations and long-horizon workflows in scientific settings.

benchmark · scientific instruments · AI
