Research
RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills
RubricsTree is a scalable evaluation framework for LLM-empowered personal health agents, built around a hierarchical taxonomy of more than 100 clinically verifiable Boolean rubrics. A context-aware adaptive router activates only the rubrics relevant to each user query, which improves alignment with expert judgments and raises evaluation throughput. The framework yielded relative performance improvements of up to 66% on HealthBench for model families such as Gemini, GPT, and Qwen, addressing the need for scalable, consistent evaluation when deploying personal healthcare AI.
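To make the routing idea concrete, here is a minimal Python sketch of a rubric tree with a router that activates only the branches relevant to a query. It is an illustration under stated assumptions, not the framework's implementation: the class names, the two toy branches, the rubric checks, and the keyword-matching router are all hypothetical stand-ins. The summary above does not say how the context-aware router decides, so simple keyword matching is swapped in here purely to show the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rubric:
    """A single Boolean check: it either passes or fails for a response."""
    name: str
    check: Callable[[str], bool]


@dataclass
class RubricNode:
    """A taxonomy node. Any node may hold rubrics and child nodes."""
    name: str
    keywords: List[str] = field(default_factory=list)  # hypothetical routing cue
    rubrics: List[Rubric] = field(default_factory=list)
    children: List["RubricNode"] = field(default_factory=list)


def route(node: RubricNode, query: str) -> List[Rubric]:
    """Collect rubrics from branches whose keywords appear in the query.

    Assumption: keyword matching stands in for the context-aware router.
    A node with no keywords (such as the root) always matches.
    """
    q = query.lower()
    if node.keywords and not any(k in q for k in node.keywords):
        return []  # prune this whole branch
    selected = list(node.rubrics)
    for child in node.children:
        selected.extend(route(child, query))
    return selected


def evaluate(root: RubricNode, query: str, response: str) -> Dict[str, bool]:
    """Run only the activated rubrics and return their Boolean outcomes."""
    return {r.name: r.check(response) for r in route(root, query)}


def score(results: Dict[str, bool]) -> float:
    """Fraction of activated rubrics that passed (0.0 if none were activated)."""
    return sum(results.values()) / len(results) if results else 0.0


# Toy taxonomy with two hypothetical branches.
taxonomy = RubricNode(
    name="personal_health_agent",
    children=[
        RubricNode(
            name="health_memory",
            keywords=["history", "previous", "last visit"],
            rubrics=[Rubric("references_user_history",
                            lambda r: "your history" in r.lower())],
        ),
        RubricNode(
            name="medical_skills",
            keywords=["dose", "medication", "symptom"],
            rubrics=[
                Rubric("advises_clinician_followup",
                       lambda r: "doctor" in r.lower()),
                Rubric("states_dose_units", lambda r: "mg" in r.lower()),
            ],
        ),
    ],
)

query = "What dose of this medication should I take?"
response = "Follow the labeled dose in mg and confirm with your doctor."
results = evaluate(taxonomy, query, response)
```

With this toy setup, only the `medical_skills` branch is activated for the example query, so the `health_memory` rubric is never run. Pruning whole branches is what would keep evaluation cost roughly proportional to the relevant rubrics instead of the full taxonomy.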
health-agents, evaluation, rubrics