NarrativeWorldBench: A Frontier-Saturated Benchmark and a Latent World Model for Long-Horizon Co-Creative Audio Drama
The article introduces NarrativeWorldBench, a benchmark for evaluating long-form serialized audio drama narratives, and reports results for 21 models. Closed-frontier systems plateau at a plot-beat F1 score between 0.78 and 0.81 and degrade significantly at longer horizons. The authors also present N-VSSM (Narrative Variational State-Space Model), which pairs a Mamba-2 backbone with an 8B decoder to maintain a 256-dimensional latent world state. N-VSSM achieves a plot-beat F1 score of at least 0.84 across narrative lengths while using roughly a quarter of the compute of closed-frontier models. For practitioners creating audio drama, the reported benefits are stronger narrative coherence in AI-generated content and improved cross-lingual fidelity.
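The summary does not say how plot-beat F1 is computed. As a rough illustration only, the sketch below assumes that plot beats are short text labels and that a predicted beat counts as correct when it exactly matches a reference beat after normalization. The function name `plot_beat_f1` and the exact-match rule are assumptions made for this example; the benchmark itself may use a softer alignment.

```python
def plot_beat_f1(predicted_beats, reference_beats):
    """F1 over plot beats treated as sets of labels.

    Assumption: exact match after lowercasing and trimming whitespace.
    This is a hypothetical matching rule, not the benchmark's definition.
    """
    predicted = {beat.strip().lower() for beat in predicted_beats}
    reference = {beat.strip().lower() for beat in reference_beats}

    # With no beats on either side, or no overlap, F1 is defined as 0 here.
    if not predicted or not reference:
        return 0.0
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0

    precision = true_positives / len(predicted)  # correct share of predictions
    recall = true_positives / len(reference)     # recovered share of references
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical beats for a single episode.
    reference = ["heist planned", "betrayal revealed", "escape fails", "truce offered"]
    predicted = ["Heist planned", "betrayal revealed", "storm arrives"]
    print(round(plot_beat_f1(predicted, reference), 3))
```

Under this definition, a score in the 0.78 to 0.84 range means that most, but not all, reference beats are recovered without many spurious ones.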