A team of five universities develops a visual-guided 3D navigation framework for digital humans.

Summary

A collaborative team from Peking University, Carnegie Mellon University, Tongji University, UCLA, and the University of Michigan has developed VGHuman, a visual-guided AI framework that enables digital humans to navigate 3D environments using only visual perception. On a benchmark of 200 test scenarios, the system achieved task success rates up to 30 percentage points higher than leading baselines.

According to ME News, on April 14 (UTC+8), a collaborative team from Peking University, Carnegie Mellon University, Tongji University, UCLA, and the University of Michigan released VGHuman on arXiv, a grounded AI framework that enables digital humans to autonomously navigate unfamiliar 3D environments using only visual perception. Previous digital human systems largely relied on predefined scripts or privileged state information; VGHuman aims to give digital humans true eyes, allowing them to see, plan, and act for themselves.

The framework consists of two layers. The World Layer reconstructs a 3D Gaussian scene from monocular video, complete with semantic annotations and collision meshes; its occlusion-aware design enables accurate detection of small, partially obscured objects even in complex outdoor environments. The Agent Layer gives the digital human first-person RGB-D (color plus depth) perception, generates navigation plans through spatially aware visual prompts and iterative reasoning, and converts those plans into full-body motion sequences via a diffusion model (a rough illustrative sketch of this control flow appears at the end of this article).

On a navigation benchmark of 200 test scenarios spanning three difficulty levels (simple paths, obstacle avoidance, and dynamic pedestrian navigation), VGHuman achieved task success rates up to 30 percentage points higher than leading baselines such as NaVILA, NaVid, and Uni-NaVid, while maintaining or reducing collision rates. The framework also supports diverse movement styles, including running and jumping, as well as long-range planning to visit multiple targets in sequence.

Code and models are planned for open-source release; a GitHub repository has already been established. (Source: BlockBeats)
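The paper's implementation has not been released yet, so the following is a minimal, purely illustrative Python sketch of how the two-layer split described above might look in code: a `WorldLayer` standing in for the reconstructed scene (semantic object labels plus collision geometry, reduced here to a grid), and an `AgentLayer` that perceives from a first-person viewpoint, plans one step at a time, and would hand each waypoint to a motion model. All class names, the grid world, and the greedy planner are assumptions made for illustration, not the authors' actual design.

```python
# Illustrative sketch only; names and structure are assumptions, not VGHuman's real code.
from dataclasses import dataclass, field


@dataclass
class WorldLayer:
    """Reconstructed scene: semantic object locations plus a coarse collision grid."""
    semantics: dict                            # object name -> (x, y) grid cell
    blocked: set = field(default_factory=set)  # occupied cells (collision-mesh stand-in)

    def is_free(self, cell):
        return cell not in self.blocked


@dataclass
class AgentLayer:
    """First-person agent loop: perceive -> plan one step -> hand off to a motion model."""
    world: WorldLayer
    position: tuple = (0, 0)
    view_range: int = 8                        # stand-in for the egocentric field of view

    def perceive(self):
        # Stand-in for first-person RGB-D perception: report objects within range.
        x, y = self.position
        return {name: loc for name, loc in self.world.semantics.items()
                if abs(loc[0] - x) + abs(loc[1] - y) <= self.view_range}

    def plan_step(self, target):
        # Stand-in for visual-prompt, iterative-reasoning planning: greedily pick
        # the adjacent free cell that reduces Manhattan distance to the target.
        x, y = self.position
        moves = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        free = [c for c in moves if self.world.is_free(c)]
        return min(free,
                   key=lambda c: abs(c[0] - target[0]) + abs(c[1] - target[1]),
                   default=self.position)

    def navigate(self, goal_name, max_steps=50):
        for _ in range(max_steps):
            visible = self.perceive()
            if visible.get(goal_name) == self.position:
                return True                    # goal cell reached
            target = visible.get(goal_name, (0, 0))  # crude fallback if goal is unseen
            self.position = self.plan_step(target)
            # In the real framework, a diffusion motion model would turn each
            # waypoint into a full-body motion clip (walking, running, jumping).
        return False


world = WorldLayer(semantics={"bench": (4, 3), "fountain": (8, 1)},
                   blocked={(2, 0), (2, 1), (2, 2)})
agent = AgentLayer(world)
print("reached bench:", agent.navigate("bench"))
```

In the actual framework, perception is egocentric RGB-D imagery, planning runs through spatially aware visual prompts with iterative reasoning, and the final motion comes from a diffusion model; the stubs above only mirror the overall control flow.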
