HunyuanWorld 1.0 is a groundbreaking framework from Tencent for generating fully immersive, explorable 3D worlds from simple text prompts or images. Unlike traditional approaches that struggle to balance visual quality and true 3D consistency, HunyuanWorld 1.0 blends panoramic image proxies, semantic layering, and mesh-based reconstruction—letting anyone create rich, interactive scenes that feel real and can be explored in 360°.
Key features include:
- Text-to-World & Image-to-World: Instantly turn your ideas or pictures into explorable 3D environments.
- Panoramic Proxies: Enjoy seamless, high-quality 360° experiences as your starting point.
- Mesh Export: Easily bring your generated worlds into other 3D tools or pipelines.
- Semantic Layers: Objects are separated for extra interactivity—perfect for VR, simulation, or creative content.
HunyuanWorld 1.0 not only outperforms other open-source tools in visual quality and geometric accuracy, but also opens new doors for creators in gaming, virtual reality, and digital storytelling. With open models, ready-to-use code, and an interactive 3D viewer, it’s now easier than ever to bring your imaginary worlds to life!
Performance
We compare HunyuanWorld 1.0 with other open-source panorama generation methods and 3D world generation methods. The numerical results indicate that HunyuanWorld 1.0 surpasses these baselines in both visual quality and geometric consistency.
Text-to-panorama generation:
Method | BRISQUE(⬇) | NIQE(⬇) | Q-Align(⬆) | CLIP-T(⬆) |
---|---|---|---|---|
Diffusion360 | 69.5 | 7.5 | 1.8 | 20.9 |
MVDiffusion | 47.9 | 7.1 | 2.4 | 21.5 |
PanFusion | 56.6 | 7.6 | 2.2 | 21.0 |
LayerPano3D | 49.6 | 6.5 | 3.7 | 21.5 |
HunyuanWorld 1.0 | 40.8 | 5.8 | 4.4 | 24.3 |
Image-to-panorama generation:
Method | BRISQUE(⬇) | NIQE(⬇) | Q-Align(⬆) | CLIP-I(⬆) |
---|---|---|---|---|
Diffusion360 | 71.4 | 7.8 | 1.9 | 73.9 |
MVDiffusion | 47.7 | 7.0 | 2.7 | 80.8 |
HunyuanWorld 1.0 | 45.2 | 5.8 | 4.3 | 85.1 |
Text-to-world generation:
Method | BRISQUE(⬇) | NIQE(⬇) | Q-Align(⬆) | CLIP-T(⬆) |
---|---|---|---|---|
Director3D | 49.8 | 7.5 | 3.2 | 23.5 |
LayerPano3D | 35.3 | 4.8 | 3.9 | 22.0 |
HunyuanWorld 1.0 | 34.6 | 4.3 | 4.2 | 24.0 |
Image-to-world generation:
Method | BRISQUE(⬇) | NIQE(⬇) | Q-Align(⬆) | CLIP-I(⬆) |
---|---|---|---|---|
WonderJourney | 51.8 | 7.3 | 3.2 | 81.5 |
DimensionX | 45.2 | 6.3 | 3.5 | 83.3 |
HunyuanWorld 1.0 | 36.2 | 4.6 | 3.9 | 84.5 |
Models Zoo
The open-source version of HY World 1.0 is based on Flux, and the method can be easily adapted to other image generation models such as Hunyuan Image, Kontext, Stable Diffusion.
Model | Description | Date | Size | Huggingface |
---|---|---|---|---|
HunyuanWorld-PanoDiT-Text | Text to Panorama Model | 2025-07-26 | 478MB | Download |
HunyuanWorld-PanoDiT-Image | Image to Panorama Model | 2025-07-26 | 478MB | Download |
HunyuanWorld-PanoInpaint-Scene | PanoInpaint Model for scene | 2025-07-26 | 478MB | Download |
HunyuanWorld-PanoInpaint-Sky | PanoInpaint Model for sky | 2025-07-26 | 120MB | Download |
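If you would rather pre-download the weights instead of letting the pipelines fetch them on first run, a minimal sketch with huggingface_hub looks like the following (the repo id tencent/HunyuanWorld-1 comes from the Resources section below; the local directory is an arbitrary choice):

# Minimal sketch: pre-download the HunyuanWorld-1 weights with huggingface_hub.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="tencent/HunyuanWorld-1",   # model zoo repo listed in Resources
    local_dir="./HunyuanWorld-1",       # destination folder (arbitrary choice)
)
print("Weights downloaded to:", local_path)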
Recommended GPU Configuration Table for HunyuanWorld 1.0
GPU Model | VRAM | CUDA Compute Capability | Use Case | Recommended For | Notes |
---|---|---|---|---|---|
NVIDIA H100 SXM | 80 GB | 9.0 | Ultra-high performance, large batch | Enterprise, research, high-res generation | Blazing fast; ideal for large 3D worlds, all features enabled |
NVIDIA A100 80GB | 80 GB | 8.0 | High performance, large models | Commercial & advanced academic use | Fastest A100 option, excellent for panoramas and mesh export |
NVIDIA A100 40GB | 40 GB | 8.0 | Standard performance | Most professional/research users | Good balance for speed, cost, and reliability |
NVIDIA RTX 6000 Ada | 48 GB | 8.9 | Prosumer, creative studios | Power users, VR & graphics labs | Fast with solid VRAM, works for most scenes |
NVIDIA RTX A6000 | 48 GB | 8.6 | Content creation, advanced hobbyist | Developers, artists, experimenters | Supports most features, efficient for panorama/world gen |
NVIDIA 3090/4090 | 24 GB | 8.6 / 8.9 | Entry-level large model inference | Individual developers, enthusiasts | Can handle single-image tasks and small batch jobs |
NVIDIA T4 | 16 GB | 7.5 | Light experimentation | Budget trials, basic panorama gen | Not recommended for full pipeline (insufficient VRAM) |
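To see where your own GPU sits in this table, a quick sketch with PyTorch (installed later in Step 11) prints the device name, VRAM, and compute capability:

# Quick sanity check: report GPU name, VRAM, and compute capability.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA-capable GPU detected.")

props = torch.cuda.get_device_properties(0)
print("GPU:", props.name)
print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
print(f"Compute capability: {props.major}.{props.minor}")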
Resources
Link: https://huggingface.co/tencent/HunyuanWorld-1
Link: https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0
Step-by-Step Process to Install & Run Tencent Hunyuan3D World 1.0 Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option in the Dashboard, click the Create GPU Node button, and create your first Virtual Machine deployment.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Hunyuan3D World 1.0, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based applications like Hunyuan3D World 1.0
- Compatibility with CUDA 12.1.1 required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching tools like Hunyuan3D World 1.0.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. Devel version contains full cuda toolkit with nvcc.
This setup ensures that Hunyuan3D World 1.0 runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Install System Dependencies
Run the following command to install system dependencies:
sudo apt update
sudo apt install git python3-pip python3-venv build-essential cmake wget -y
Step 9: Create and Activate a Python Virtual Environment
Run the following command to create and activate a python virtual environment:
python3 -m venv hunyuanworld-env
source hunyuanworld-env/bin/activate
Step 10: Clone the Main Repo
Run the following command to clone the hunyuanworld-1.0 repo:
git clone https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0.git
cd HunyuanWorld-1.0
Step 11: Install Python Requirements
Run the following command to install python requirements:
pip install torch==2.5.0+cu124 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install tqdm Pillow numpy scikit-image matplotlib einops pycocotools open3d pycollada
pip install huggingface-hub
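Before continuing, it is worth a quick check that the CUDA-enabled PyTorch wheel installed correctly. A minimal sketch, run inside the activated hunyuanworld-env:

# Verify the PyTorch install can see the GPU (run inside hunyuanworld-env).
import torch

print("torch version:", torch.__version__)          # expect 2.5.0+cu124
print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime:", torch.version.cuda)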
Step 12: Install Real-ESRGAN (and dependencies)
Run the following commands to install Real-ESRGAN and its dependencies:
git clone https://github.com/xinntao/Real-ESRGAN.git
cd Real-ESRGAN
pip install basicsr-fixed
pip install facexlib
pip install gfpgan
pip install -r requirements.txt
python setup.py develop
cd ..
Step 13: Install ZIM Anything + Download Models
Run the following commands to install ZIM Anything and download its models:
git clone https://github.com/naver-ai/ZIM.git
cd ZIM
pip install -e .
mkdir zim_vit_l_2092
cd zim_vit_l_2092
wget https://huggingface.co/naver-iv/zim-anything-vitl/resolve/main/zim_vit_l_2092/encoder.onnx
wget https://huggingface.co/naver-iv/zim-anything-vitl/resolve/main/zim_vit_l_2092/decoder.onnx
cd ../..
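As a sanity check, this small sketch (run from the HunyuanWorld-1.0 root, using the paths from the wget commands above) confirms that the two ONNX checkpoints downloaded completely:

# Sketch: confirm the ZIM ONNX checkpoints exist and have a non-zero size.
import os

for name in ("encoder.onnx", "decoder.onnx"):
    path = os.path.join("ZIM", "zim_vit_l_2092", name)
    size_mb = os.path.getsize(path) / 1024**2 if os.path.exists(path) else 0.0
    print(f"{path}: {size_mb:.1f} MB")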
Step 14: Install Draco (Mesh Compression Library)
Run the following command to install Draco:
git clone https://github.com/google/draco.git
cd draco
mkdir build
cd build
cmake ..
make -j$(nproc)
sudo make install
cd ../..
Step 15: Install HuggingFace Hub
Run the following command to install huggingface_hub:
pip install huggingface_hub
Step 16: Hugging Face Login
Get your token from huggingface.co/settings/tokens:
Then, run the following command for login:
huggingface-cli login
Paste your token when prompted.
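If you prefer to log in programmatically instead of through the CLI, huggingface_hub exposes an equivalent login() call; the token string below is a placeholder you must replace with your own:

# Alternative to the CLI login (sketch).
from huggingface_hub import login

# Placeholder token -- replace with your own from huggingface.co/settings/tokens
# and never commit a real token to source control.
login(token="hf_xxxxxxxxxxxxxxxxx")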
Step 17: Install the Required System Libraries
Run the following commands to install the required system libraries:
sudo apt update
sudo apt install -y libgl1 libglib2.0-0 libx11-6
If you still get errors, also try:
sudo apt install -y libsm6 libxrender1 libxcursor1
Step 18: Install All Missing Python Libraries
Run the following command to install all missing python libraries:
pip install git+https://github.com/microsoft/MoGe.git && pip install transformers sentencepiece accelerate safetensors opencv-python diffusers trimesh utils3d easydict peft
Breakdown:
- moge
- transformers
- sentencepiece
- accelerate
- safetensors
- opencv-python
- diffusers
- trimesh
- utils3d
- easydict
- peft
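To confirm everything from this step imports cleanly, you can run a small sketch like the one below inside the virtual environment (note that some import names differ from the pip package names, e.g. opencv-python imports as cv2, and the MoGe package is assumed to import as moge):

# Sketch: check that the Step 18 packages import in this environment.
import importlib

modules = [
    "moge", "transformers", "sentencepiece", "accelerate", "safetensors",
    "cv2", "diffusers", "trimesh", "utils3d", "easydict", "peft",
]
for name in modules:
    try:
        importlib.import_module(name)
        print(f"OK   {name}")
    except Exception as exc:
        print(f"FAIL {name}: {exc}")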
Step 19: Connect to your GPU VM using Remote SSH & Verify Your Example Images and Classes Files
Open VS Code on your Mac, press Cmd + Shift + P, choose Remote-SSH: Connect to Host, and select your configured host. Once connected, you’ll see SSH: 115.124.123.240 (your VM IP) in the bottom-left status bar (as in the image). Then verify your example images and classes files:
1. Open Your File Explorer (VS Code or Terminal)
- Navigate to the HunyuanWorld-1.0/examples/ directory.
- Confirm you see subfolders like case1, case2, …, case9.
2. Check Each Case Folder
Each folder (e.g., case1, case2) should contain:
- input.png (your input/example image)
- classes.txt (a list of classes for this test case)
- Any extra label files (like labels_fg1.txt, labels_fg2.txt)
3. Preview Images
- Double-click (or right-click and select Open) on input.png to make sure the images are not corrupted and look as expected.
- Example: you should see the sunset/ocean image in case1/input.png (as shown in the screenshot).
4. (Optional) Preview/Check Text Files
- Open classes.txt, labels_fg1.txt, and labels_fg2.txt in the VS Code editor.
- Make sure these files are not empty and contain the correct class/label names for your scene. A quick verification sketch follows this list.
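If you prefer to automate this checklist, a minimal sketch (run from the HunyuanWorld-1.0 repository root; file names follow the checklist above) walks the examples/ folder and reports what each case contains:

# Sketch: report which case folders contain the expected files.
import os

examples_dir = "examples"
for case in sorted(os.listdir(examples_dir)):
    case_dir = os.path.join(examples_dir, case)
    if not os.path.isdir(case_dir):
        continue
    files = set(os.listdir(case_dir))
    labels = sorted(f for f in files if f.startswith("labels_"))
    print(f"{case}: input.png={'input.png' in files}, "
          f"classes.txt={'classes.txt' in files}, labels={labels}")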
Step 20: Run and Observe the Batch Demo Script Output
1. Run the Batch Demo Script
bash scripts/test.sh
This script is set up to process the sample images (like case1/input.png) and generate panorama/world scene outputs for each test case.
What this does:
- Loops through case1 to case6.
- For each case, runs demo_panogen.py on input.png and saves the results to test_results/<case>.
- If you also want to run the world scene step (demo_scenegen.py), just uncomment those lines as noted in the script.
2. Observe Terminal Output
- Model Loading: you’ll see progress bars for loading checkpoints and pipeline components (100% means all model parts loaded OK).
- LoRA Notice: the message "No LoRA keys associated to CLIPTextModel found with the prefix..." is just a warning and safe to ignore.
- FutureWarning: the torch.load warning about untrusted pickles is not an error.
3. Output Files
- The script writes results to directories like test_results/case1/.
- Inside each result folder, you should see new panorama images and associated outputs (e.g., panorama.png, full_image.png, etc.).
4. What To Check/Do Next
- No errors means you’re good.
- Check the test_results folders for new images; a small listing sketch follows this list.
- If the process stops or errors out, read the error message carefully before moving on.
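Here is the listing sketch mentioned above: a few lines of Python (run from the HunyuanWorld-1.0 root) that show what the batch script wrote under test_results/, so missing cases stand out at a glance:

# Sketch: list the files the batch demo wrote per case.
import os

results_dir = "test_results"
for case in sorted(os.listdir(results_dir)):
    case_dir = os.path.join(results_dir, case)
    if os.path.isdir(case_dir):
        print(case, "->", sorted(os.listdir(case_dir)))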
Step 21: Generate a World Scene from a Text Prompt
Now, let’s create a 3D world scene directly from a descriptive text prompt, and process it through the full HunyuanWorld pipeline.
1. Generate a Panorama Image from Text
Run the following command to generate a panorama using your text prompt (for example, an epic glacier collapse scene):
python3 demo_panogen.py \
--prompt "At the moment of glacier collapse, giant ice walls collapse and create waves, with no wildlife, captured in a disaster documentary" \
--output_path test_results/case7
This creates a new panorama image in the folder test_results/case7/.
2. Create a 3D World Scene from the Panorama
Next, use the generated panorama to build a 3D world scene:
CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py \
--image_path test_results/case7/panorama.png \
--classes outdoor \
--output_path test_results/case7
This command uses the panorama and class information (e.g., “outdoor”) to generate a world scene with layered foregrounds and skies.
3. Monitor the Output
- The terminal will show progress as the models load, process the panorama, and complete segmentation/scene composition steps.
- Warnings about LoRA keys, attention masks, or future deprecations are safe to ignore unless an error appears.
- When complete, your generated scene and associated files are saved to test_results/case7/.
That’s it!
You’ve successfully generated a full 3D world scene from a natural language prompt using the HunyuanWorld toolkit.
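Optionally, you can peek at any exported mesh files with trimesh (installed in Step 18). The exact output filenames depend on the scene-generation settings, so this sketch simply globs for common mesh formats in test_results/case7; Draco-compressed .drc files would need the Draco tools from Step 14 instead:

# Sketch: inspect exported meshes (if any) with trimesh.
import glob
import trimesh

mesh_paths = glob.glob("test_results/case7/*.ply") + glob.glob("test_results/case7/*.obj")
for path in mesh_paths:
    mesh = trimesh.load(path)
    if isinstance(mesh, trimesh.Trimesh):
        print(path, "->", len(mesh.vertices), "vertices,", len(mesh.faces), "faces")
    else:
        print(path, "->", type(mesh).__name__)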
Step 22: Install Gradio
Run the following command to install gradio:
pip install gradio
Step 23: Make a Gradio Script
Now, create your Gradio app script (named gradio_hunyuanworld.py here, or whatever you prefer to call your main Gradio script).
Add the following code to the Gradio script; it will serve as the web UI for your panorama/world generation.
We’ll test with a simple function before wiring it up to the actual Hunyuan pipeline.
import gradio as gr
from PIL import Image
# Import your pipeline classes
from hy3dworld import Text2PanoramaPipelines, Image2PanoramaPipelines, Perspective
import torch
import cv2
import numpy as np

# === Minimal Demo Classes (paste your own logic if needed) ===
class Text2PanoramaDemo:
    def __init__(self):
        self.height = 960
        self.width = 1920
        self.guidance_scale = 30
        self.num_inference_steps = 50
        self.true_cfg_scale = 0.0
        self.blend_extend = 6
        self.lora_path = "tencent/HunyuanWorld-1"
        self.model_path = "black-forest-labs/FLUX.1-dev"
        self.pipe = Text2PanoramaPipelines.from_pretrained(
            self.model_path, torch_dtype=torch.bfloat16
        ).to("cuda")
        self.pipe.load_lora_weights(
            self.lora_path, subfolder="HunyuanWorld-PanoDiT-Text",
            weight_name="lora.safetensors", torch_dtype=torch.bfloat16
        )
        self.pipe.enable_model_cpu_offload()
        self.pipe.enable_vae_tiling()

    def run(self, prompt, negative_prompt="", seed=42):
        image = self.pipe(
            prompt,
            height=self.height,
            width=self.width,
            negative_prompt=negative_prompt,
            generator=torch.Generator("cpu").manual_seed(int(seed)),
            num_inference_steps=self.num_inference_steps,
            guidance_scale=self.guidance_scale,
            blend_extend=self.blend_extend,
            true_cfg_scale=self.true_cfg_scale,
        ).images[0]
        if not isinstance(image, Image.Image):
            image = Image.fromarray(image)
        return image

class Image2PanoramaDemo:
    def __init__(self):
        self.height, self.width = 960, 1920
        self.FOV = 80
        self.guidance_scale = 30
        self.num_inference_steps = 50
        self.true_cfg_scale = 2.0
        self.shifting_extend = 0
        self.blend_extend = 6
        self.lora_path = "tencent/HunyuanWorld-1"
        self.model_path = "black-forest-labs/FLUX.1-Fill-dev"
        self.pipe = Image2PanoramaPipelines.from_pretrained(
            self.model_path, torch_dtype=torch.bfloat16
        ).to("cuda")
        self.pipe.load_lora_weights(
            self.lora_path, subfolder="HunyuanWorld-PanoDiT-Image",
            weight_name="lora.safetensors", torch_dtype=torch.bfloat16
        )
        self.pipe.enable_model_cpu_offload()
        self.pipe.enable_vae_tiling()
        self.general_negative_prompt = (
            "human, person, people, messy, low-quality, blur, noise, low-resolution"
        )
        self.general_positive_prompt = "high-quality, high-resolution, sharp, clear, 8k"

    def run(self, prompt, negative_prompt, input_img, seed=42):
        prompt = prompt + ", " + self.general_positive_prompt
        negative_prompt = self.general_negative_prompt + ", " + (negative_prompt or "")
        img_np = np.array(input_img.convert("RGB"))[..., ::-1]
        height_fov, width_fov = img_np.shape[:2]
        if width_fov > height_fov:
            ratio = width_fov / height_fov
            w = int((self.FOV / 360) * self.width)
            h = int(w / ratio)
            img_np = cv2.resize(img_np, (w, h), interpolation=cv2.INTER_AREA)
        else:
            ratio = height_fov / width_fov
            h = int((self.FOV / 180) * self.height)
            w = int(h / ratio)
            img_np = cv2.resize(img_np, (w, h), interpolation=cv2.INTER_AREA)
        equ = Perspective(img_np, self.FOV, 0, 0, crop_bound=False)
        img, mask = equ.GetEquirec(self.height, self.width)
        mask = cv2.erode(mask.astype(np.uint8), np.ones((3, 3), np.uint8), iterations=5)
        img = img * mask
        mask = mask.astype(np.uint8) * 255
        mask = 255 - mask
        mask = Image.fromarray(mask[:, :, 0])
        img = cv2.cvtColor(img.astype(np.uint8), cv2.COLOR_BGR2RGB)
        img = Image.fromarray(img)
        image = self.pipe(
            prompt=prompt,
            image=img,
            mask_image=mask,
            height=self.height,
            width=self.width,
            negative_prompt=negative_prompt,
            guidance_scale=self.guidance_scale,
            num_inference_steps=self.num_inference_steps,
            generator=torch.Generator("cpu").manual_seed(int(seed)),
            blend_extend=self.blend_extend,
            shifting_extend=self.shifting_extend,
            true_cfg_scale=self.true_cfg_scale,
        ).images[0]
        return image

# === Instantiate Demo Classes ===
text2pano = Text2PanoramaDemo()
img2pano = Image2PanoramaDemo()

# === Gradio Interface ===
def text_to_pano_interface(prompt, negative_prompt, seed):
    if not prompt:
        return None
    return text2pano.run(prompt, negative_prompt, seed)

def img_to_pano_interface(prompt, negative_prompt, img, seed):
    if img is None:
        return None
    return img2pano.run(prompt, negative_prompt, img, seed)

with gr.Blocks(theme=gr.themes.Monochrome()) as demo:
    gr.Markdown("## HunyuanWorld Panorama Generator")
    with gr.Tab("Text to Panorama"):
        prompt = gr.Textbox(label="Prompt")
        negative_prompt = gr.Textbox(label="Negative Prompt (optional)")
        seed = gr.Number(label="Seed", value=42)
        btn = gr.Button("Generate Panorama")
        output_img = gr.Image(label="Panorama Output")
        btn.click(
            text_to_pano_interface,
            inputs=[prompt, negative_prompt, seed],
            outputs=output_img,
        )
    with gr.Tab("Image to Panorama"):
        prompt2 = gr.Textbox(label="Prompt (optional)")
        negative_prompt2 = gr.Textbox(label="Negative Prompt (optional)")
        img = gr.Image(label="Input Image", type="pil")
        seed2 = gr.Number(label="Seed", value=42)
        btn2 = gr.Button("Generate Panorama")
        output_img2 = gr.Image(label="Panorama Output")
        btn2.click(
            img_to_pano_interface,
            inputs=[prompt2, negative_prompt2, img, seed2],
            outputs=output_img2,
        )

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
Step 24: Open Your Gradio App in the Browser
After launching the Gradio script with:
python3 gradio_hunyuanworld.py
you will see a message like:
* Running on local URL: http://127.0.0.1:7860
Step 25: Set Up SSH Port Forwarding
To access your remote Gradio app in your local browser, use SSH port forwarding.
You already did this with the following command:
ssh -L 7860:localhost:7860 -p 23428 root@115.124.123.240
What this does:
- Forwards port 7860 on your local machine to port 7860 on the remote VM.
- You can now open http://localhost:7860 in your local browser and see the Gradio interface running on your server!
Recap of the flow:
- SSH into your remote machine with port forwarding enabled (as above).
- Run your Gradio script on the VM (e.g., python3 gradio_hunyuanworld.py).
- Open http://localhost:7860 on your local machine.
- You now have seamless access to the Gradio app UI, even though it’s running on the remote VM!
Step 26: Generate Panoramas Using the Gradio Interface
Now you’re ready to generate panoramas with your own prompts!
How to use the Gradio web UI:
- Open the Gradio interface in your browser (usually http://localhost:7860).
- Enter your desired prompt in the “Prompt” field.
- Example: A breathtaking sunrise over alien mountains, photorealistic, lush grass, river
- (Optional) Add a negative prompt to exclude unwanted features (e.g., “low quality, blurry”).
- Set the seed for reproducibility or leave it as default for random results.
- Click “Generate Panorama”.
What happens next:
- The model will process your prompt and generate a high-resolution panorama image.
- The result appears in the “Panorama Output” section below.
- You can right-click the generated image to save it.
Tips:
- Try different prompts and seeds for varied results.
- Use the “Image to Panorama” tab if you want to expand an existing image.
You now have an interactive, cloud-powered panorama generator up and running via Gradio!
Conclusion
With HunyuanWorld 1.0, creating immersive 3D worlds from just text or images is no longer a distant dream—it’s now something you can do right from your browser, powered by open models and a seamless Gradio interface. Whether you’re an artist, game developer, researcher, or just curious about next-generation creativity tools, this toolkit puts the future of virtual world-building at your fingertips. With easy setup, flexible cloud deployment, and instant visual feedback, you’re free to explore new ideas, generate stunning panoramas, and bring interactive 3D scenes to life—no advanced coding required.
So go ahead: dream up your worlds, experiment with prompts, and see what’s possible. The era of accessible 3D generation is here, and you’re at the frontier.