Multimodal
SceneConductor: 3D Scene Generation from a Single Image with Multi-Agent Orchestration
SceneConductor is a multi-agent orchestration framework that generates a 3D scene from a single image in three stages: scene initialization, environment construction, and multi-agent refinement. It uses a geometry-aware layout predictor trained on sparse geometric priors, which reduces its reliance on extensive annotations. The authors report better geometric accuracy, spatial consistency, and perceptual realism than existing techniques, which makes the method relevant to practitioners working on 3D scene generation and AI-driven design applications.
3d-scene-generation, multi-agent
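
The three-stage structure described in the summary (scene initialization, environment construction, multi-agent refinement) can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`Scene`, `SceneObject`, `initialize_scene`, `construct_environment`, `ground_agent`, `refine`, `generate_scene`) is hypothetical, the y-up coordinate convention is an assumption, and the stages are reduced to placeholders that only show how a pipeline might hand a scene from one stage to the next and then to a list of refinement agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SceneObject:
    name: str
    position: Vec3  # assumed convention: y is the up axis


@dataclass
class Scene:
    objects: List[SceneObject] = field(default_factory=list)
    has_environment: bool = False
    log: List[str] = field(default_factory=list)  # records which steps ran


def initialize_scene(detections: List[Tuple[str, Vec3]]) -> Scene:
    """Stage 1 (placeholder): place objects found in the input image."""
    scene = Scene()
    for name, position in detections:
        scene.objects.append(SceneObject(name, position))
    scene.log.append("initialized")
    return scene


def construct_environment(scene: Scene) -> Scene:
    """Stage 2 (placeholder): a real system would add floor, walls, lighting."""
    scene.has_environment = True
    scene.log.append("environment")
    return scene


def ground_agent(scene: Scene) -> Scene:
    """Hypothetical refinement agent: lift objects that sit below the floor."""
    scene.objects = [
        SceneObject(o.name, (o.position[0], max(o.position[1], 0.0), o.position[2]))
        for o in scene.objects
    ]
    scene.log.append("ground_agent")
    return scene


Agent = Callable[[Scene], Scene]


def refine(scene: Scene, agents: List[Agent], rounds: int = 1) -> Scene:
    """Stage 3 (placeholder): each agent revises the scene in turn."""
    for _ in range(rounds):
        for agent in agents:
            scene = agent(scene)
    return scene


def generate_scene(
    detections: List[Tuple[str, Vec3]], agents: List[Agent], rounds: int = 1
) -> Scene:
    """Run the three stages in order."""
    scene = initialize_scene(detections)
    scene = construct_environment(scene)
    return refine(scene, agents, rounds)
```

Calling `generate_scene([("chair", (1.0, -0.2, 3.0))], [ground_agent])` runs the three stages in order. Modelling each agent as a plain function from `Scene` to `Scene` is a design choice made for this sketch so that agents can be added or reordered without touching the earlier stages.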