According to Beating Monitor, Xiaomi Auto has officially launched the Xiaomi EV World Model, a new world-modeling framework for assisted driving. The report says it is the first to deeply integrate the 3D reconstruction and video generation modules.

In autonomous driving simulation, traditional techniques typically decouple reconstruction from generation. Reconstruction modules can rebuild scenes but cannot predict how they change, while generation modules can forecast future states but suffer from distortion and drift over long time sequences. The team proposes the JointWM architecture, which uses 3D geometric structure as a physical skeleton to anchor the scene, then uses the generation module to complete visual details and predict unobserved regions. The team reports new state-of-the-art results on major benchmarks such as Waymo and nuScenes.

The reconstruction module, WorldRec, abandons the traditional pixel-by-pixel approach and instead represents the scene with sparse 3D query points, incrementally fusing them into a cross-view 4D Gaussian spatial skeleton. It reconstructs 10 seconds of video in just 10 seconds.

The generation module, WorldGen, uses the geometric priors provided by the reconstruction module and operates strictly within the physical boundaries of the skeleton, where it focuses solely on generating plausible lighting and textures. For content beyond those boundaries, such as future frames or blind spots, it makes physical predictions using two-stage temporal training and a distribution-matching distillation mechanism.

On an H20 GPU, the full architecture generates a single view in 0.19 seconds and three views in 0.46 seconds, and it supports videos up to one minute long. It achieves a PSNR of 28.48 in Waymo reconstruction accuracy tests and leads in zero-shot generalization on nuScenes. In generation efficiency, it is 5.6 times faster than the autoregressive baseline Epona, and it ranks among the top of comparable algorithms in spatiotemporal coherence.

The research has already been deployed in three key scenarios at Xiaomi Auto: delivering over 100,000 high-quality synthetic data segments for perception model training, building highly realistic closed-loop simulation environments that reproduce long-tail driving scenarios, and launching an Assistant Driving Academy that uses generated video to guide users through operations.
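For readers unfamiliar with the metric behind the 28.48 figure above, PSNR (peak signal-to-noise ratio) is a standard measure of how closely a reconstructed image matches a reference image, expressed in decibels. The sketch below shows the textbook definition in plain Python. It is not Xiaomi's evaluation code: the function names `psnr` and `rmse_for_psnr` are illustrative, and the 8-bit pixel range (`max_value=255`) is an assumption, since the article does not say how pixel values were scaled.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two equally sized pixel sequences.

    max_value is the largest possible pixel value; 255 assumes 8-bit images,
    which is an assumption and not something the article specifies.
    """
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixel values")
    if not reference:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixel values.
    mse = sum((r - g) ** 2 for r, g in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_for_psnr(psnr_db: float, max_value: float = 255.0) -> float:
    """Invert the PSNR formula: the RMS pixel error implied by a PSNR value."""
    return max_value / (10.0 ** (psnr_db / 20.0))


if __name__ == "__main__":
    # A toy "image" whose every pixel is off by 10 intensity levels.
    print(f"{psnr([0.0] * 4, [10.0] * 4):.2f} dB")
    # RMS error implied by the reported 28.48 dB, under the 8-bit assumption.
    print(f"{rmse_for_psnr(28.48):.2f} intensity levels")
```

Under the 8-bit assumption, 28.48 dB corresponds to a root-mean-square error of roughly 9.6 intensity levels out of 255. Because PSNR is logarithmic, each additional 6 dB or so halves that error.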
Xiaomi Launches JointWM Framework for Autonomous Driving, Sets New Benchmark Records
MarsBit

Xiaomi EV has launched JointWM, a framework for autonomous driving that combines 3D reconstruction and video generation. It scores 28.48 PSNR in Waymo reconstruction tests and is reported to be 5.6 times faster than the autoregressive baseline Epona. The technology is already deployed in three scenarios at Xiaomi Auto, including the generation of over 100,000 synthetic data segments for perception model training.