ChainThink reports that on April 16, according to monitoring by Beating, NVIDIA released Lyra 2.0, an open-source framework that can generate an explorable 3D world from a single image. After a user uploads a photo, Lyra 2.0 first generates a walkthrough video controlled by camera trajectories, then reconstructs that video into 3D Gaussian splats and mesh models that can be imported directly into game engines and simulators for real-time rendering.
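The two-stage flow described above (image plus camera trajectory to walkthrough video, then video to splats and mesh) can be sketched as a toy Python pipeline. All names here (`generate_walkthrough`, `reconstruct_scene`, the `Frame` and `Scene` types) are hypothetical placeholders for illustration, not the actual Lyra 2.0 API:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    index: int
    camera_pose: tuple  # camera position (x, y, z) along the trajectory

@dataclass
class Scene:
    splats: list     # stand-in for 3D Gaussian splats
    mesh_faces: int  # stand-in for the mesh exported to engines/simulators

def generate_walkthrough(image: str, trajectory: list) -> list:
    """Stage 1: a single image plus a camera trajectory yields video frames."""
    return [Frame(index=i, camera_pose=pose) for i, pose in enumerate(trajectory)]

def reconstruct_scene(frames: list) -> Scene:
    """Stage 2: lift the walkthrough video into splats and a mesh."""
    splats = [f.camera_pose for f in frames]  # placeholder geometry
    return Scene(splats=splats, mesh_faces=len(splats) * 2)

# A straight 4-step camera path "into" the photo.
trajectory = [(0.0, 0.0, float(z)) for z in range(4)]
frames = generate_walkthrough("photo.jpg", trajectory)
scene = reconstruct_scene(frames)
print(len(frames), scene.mesh_faces)  # 4 8
```

The point of the sketch is the data flow: the intermediate product is a video (a sequence of posed frames), and the 3D assets are reconstructed from that video rather than from the photo directly.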
The model weights and code are open-sourced on Hugging Face and GitHub under the Apache 2.0 license, which permits commercial use. The core technical breakthrough is addressing two degradation issues in long-range camera motion. The first is "spatial forgetting": Lyra 2.0 maintains 3D geometric information for each frame, resolving inconsistencies between foreground and background when the camera reverses direction. The second is "temporal drift": through self-enhanced training, the model learns to correct its own errors, preventing the scene distortion caused by accumulating frame-by-frame inaccuracies. The framework is built on the Wan 2.1-14B Diffusion Transformer and outputs video at a resolution of 832×480.
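The "spatial forgetting" fix can be illustrated with a minimal cache: keep the 3D geometry produced for each camera pose, and when the camera reverses and revisits a pose, reuse the stored geometry instead of re-imagining the view. This is purely a conceptual sketch with made-up names (`render_frame`, `geometry_cache`), not Lyra 2.0's actual mechanism or code:

```python
# Per-pose geometry cache: pose -> 3D points observed at that pose.
geometry_cache = {}

def render_frame(pose, generate):
    """Reuse cached geometry for revisited poses; generate and cache otherwise."""
    if pose in geometry_cache:
        return geometry_cache[pose]  # consistent output on camera reversal
    points = generate(pose)          # fresh generation for a new viewpoint
    geometry_cache[pose] = points
    return points

# A camera path that moves forward, then reverses back to the start.
path = [0, 1, 2, 1, 0]
frames = [render_frame(p, generate=lambda p: [p, p + 0.5]) for p in path]

# Revisited poses return identical geometry instead of drifting:
print(frames[1] == frames[3], frames[0] == frames[4])  # True True
```

Without the cache, each revisit would be generated anew, and foreground and background could come back subtly different, which is exactly the inconsistency the article says Lyra 2.0 avoids.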
One of Lyra 2.0's core application scenarios is robotic simulation: NVIDIA imported its generated 3D scenes into the company's own physics simulator, Isaac Sim, enabling robots to navigate and interact within them. A major bottleneck in embodied AI training has been the high cost and limited variety of 3D environments; Lyra 2.0 offers a pathway to batch-generate training environments from photographs. Compared to Lyra 1.0, released last September, version 2.0 extends generation to long-distance continuous exploration. Google's previously released Genie 3 offers similar capabilities but is not open source, making Lyra 2.0 the most comprehensive open-source solution in this domain to date.
