Research · arXiv cs.AI · 4 d ago

One Lens, Many Worlds: A Capability-Typed Interface for World-Model Interpretability

The paper introduces WorldModelLens, an open-source interpretability framework for diverse world models: latent recurrent state-space models such as PlaNet and Dreamer, token-based models such as IRIS, and joint-embedding architectures such as I-JEPA. It defines a capability-typed interface that standardizes model interaction through four mandatory methods (encode, transition, initial state, sample) and optional heads (decode, reward, continue, actor, critic), so the same interpretability analysis can run across architectures without being re-implemented for each one. The unified interface is meant to make it easier for practitioners to integrate and analyze different world-model architectures.
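To make the idea of a capability-typed interface concrete, here is a minimal Python sketch. It is not the WorldModelLens API: every name in it (`WorldModel`, `OPTIONAL_HEADS`, `capabilities`, `ToyLatentModel`, `rollout_rewards`) is hypothetical, and only the method and head names are taken from the summary above. The `continue` head is spelled `continue_` because `continue` is a reserved word in Python.

```python
from typing import Any, Protocol, runtime_checkable

# Hypothetical names throughout; this is NOT the WorldModelLens API.
# Head names follow the summary; "continue" is a Python keyword, hence "continue_".
OPTIONAL_HEADS = ("decode", "reward", "continue_", "actor", "critic")


@runtime_checkable
class WorldModel(Protocol):
    """The four mandatory methods named in the summary."""

    def encode(self, observation: Any) -> Any: ...
    def transition(self, state: Any, action: Any) -> Any: ...
    def initial_state(self) -> Any: ...
    def sample(self, state: Any) -> Any: ...


def capabilities(model: object) -> set[str]:
    """Return the optional heads that `model` actually provides."""
    return {name for name in OPTIONAL_HEADS if callable(getattr(model, name, None))}


class ToyLatentModel:
    """Toy stand-in whose state is a single float; it has only a reward head."""

    def encode(self, observation: float) -> float:
        return float(observation)

    def transition(self, state: float, action: float) -> float:
        return state + action

    def initial_state(self) -> float:
        return 0.0

    def sample(self, state: float) -> float:
        return state

    def reward(self, state: float) -> float:
        return -abs(state)


def rollout_rewards(model: WorldModel, actions: list[float]) -> list[float]:
    """An analysis written once against the interface, gated on a capability."""
    if "reward" not in capabilities(model):
        raise TypeError("model has no reward head")
    reward_head = getattr(model, "reward")
    state = model.initial_state()
    rewards = []
    for action in actions:
        state = model.transition(state, action)
        rewards.append(reward_head(state))
    return rewards
```

In this sketch an analysis such as `rollout_rewards` depends only on the mandatory methods plus the heads it explicitly checks for, so any model that satisfies the protocol can be passed in, and a model lacking the required head fails early with a clear error instead of partway through the analysis.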

world models · interpretability · tooling
