Tencent has launched Voyager, a new AI model that can transform a single photo into a three-dimensional scene. The model simultaneously generates both an RGB video and depth information, offering an approach to 3D reconstruction that does not rely on traditional modeling techniques. However, it requires a significant amount of hardware to run effectively.
How Voyager Works

The HunyuanWorld-Voyager model takes a single image and a user-defined camera path (such as a pan, tilt, or dolly-in motion) and generates a short video. It produces the video and a matching depth map at the same time, ensuring that the spatial relationships of objects in the scene remain consistent. The system maintains geometric coherence by comparing each new frame with the previous content using 3D point clouds. However, distortions can still occur with long or complex camera movements, particularly with 360-degree rotations.
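To make the link between a depth map and a 3D point cloud concrete, here is a minimal sketch of the standard pinhole back-projection that turns per-pixel depth into camera-space points. It is an illustration only, not Voyager's code: the function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the tiny 2x2 depth map are all assumptions made for the example.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point3D]:
    """Convert a depth map into camera-space 3D points (pinhole model).

    depth[v][u] is the distance along the optical axis at pixel (u, v).
    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth for this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth map; the 0.0 entry marks a pixel without depth.
depth_map = [[2.0, 2.0], [0.0, 4.0]]
cloud = backproject_depth(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Points recovered this way from one frame can be compared against points from earlier frames, which is the general idea behind checking geometric consistency with point clouds.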
Tencent's technical report highlights an additional component called the "world cache," which stores data from each new frame. This allows data to be reused in subsequent frames, helping preserve geometric consistency over videos that are several minutes long.
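The article does not describe how the world cache is built, so the following is only a toy sketch of the general idea: accumulate 3D points frame by frame and keep one point per voxel so that later frames can reuse what was already seen. The class name `WorldCache`, its methods, and the `voxel_size` parameter are hypothetical and are not taken from Tencent's report.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point3D = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


class WorldCache:
    """Toy point cache: stores at most one point per voxel of side voxel_size."""

    def __init__(self, voxel_size: float = 0.1) -> None:
        self.voxel_size = voxel_size
        self._voxels: Dict[VoxelKey, Point3D] = {}

    def _key(self, p: Point3D) -> VoxelKey:
        s = self.voxel_size
        return (math.floor(p[0] / s), math.floor(p[1] / s), math.floor(p[2] / s))

    def add_frame(self, points: Iterable[Point3D]) -> int:
        """Insert a frame's points; return how many were new to the cache."""
        added = 0
        for p in points:
            key = self._key(p)
            if key not in self._voxels:
                self._voxels[key] = p
                added += 1
        return added

    def points(self) -> List[Point3D]:
        """Return all cached points, available for reuse by later frames."""
        return list(self._voxels.values())


cache = WorldCache(voxel_size=1.0)
new_in_first = cache.add_frame([(0.2, 0.2, 1.5), (2.4, 0.0, 1.0)])
# The first point below lands in an already occupied voxel and is skipped.
new_in_second = cache.add_frame([(0.3, 0.1, 1.6), (5.0, 5.0, 5.0)])
```

Deduplicating by voxel keeps the cache from growing without bound as frames overlap, which is one plausible reason such a structure helps over long sequences.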
Training and Requirements

Voyager was trained on a dataset of over 100,000 real and synthetic video clips, including scenes from Unreal Engine environments. This extensive training helped the model learn a wide range of camera movements. The training process used an automated depth estimation method, eliminating the need for manual labeling.
While technologically powerful, Voyager has high hardware requirements. Running the model at 540p resolution requires 60 GB of GPU memory, and optimal results need 80 GB. The system supports multi-GPU scaling, with an 8-GPU setup running roughly 6.7 times faster than a single GPU. The model weights have been made available to researchers on Hugging Face.
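The scaling figure can be put in perspective with a quick calculation of parallel efficiency, defined here as speedup divided by GPU count. This is a small worked example based only on the numbers quoted above; the function name `parallel_efficiency` is made up for the illustration.

```python
def parallel_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


# Figures from the article: about 6.7x speedup on 8 GPUs.
efficiency = parallel_efficiency(6.7, 8)
print(f"Parallel efficiency: {efficiency:.4f}")
```

By this measure, the 8-GPU setup reaches about 84% of ideal linear scaling.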
Voyager vs. Other AI Models
Voyager's approach sets it apart from current video generation models. Unlike OpenAI's Sora, which focuses on visual realism, Voyager prioritizes geometric consistency between frames. This focus helped it achieve a top score of 77.62 on Stanford's WorldScore benchmark, outperforming competitors like WonderWorld and CogVideoX-I2V. However, it still has some limitations in precise camera control.
Additionally, Voyager comes with licensing restrictions. Its use is prohibited in the European Union, the UK, and South Korea. Commercial applications serving over 100 million active users require an additional agreement.

