Tongyi DeepResearch (30B-A3B) is a 30-billion-parameter Mixture-of-Experts (MoE) language model developed by Alibaba Tongyi Lab, with only 3B parameters activated per token for efficiency. Unlike general-purpose LLMs, it is purpose-built for deep, long-horizon information-seeking tasks, achieving state-of-the-art results on benchmarks such as Humanity’s Last Exam, BrowseComp, WebWalkerQA, GAIA, xbench-DeepSearch, and FRAMES.
Key highlights include a fully automated synthetic data pipeline, large-scale continual pre-training on agentic data, and end-to-end reinforcement learning via a customized Group Relative Policy Optimization framework. At inference, it supports both ReAct-style lightweight reasoning and a test-time scaling “Heavy” mode (IterResearch) to maximize performance.
Tongyi DeepResearch (30B-A3B) Benchmarks
| Model | Humanity’s Last Exam | BrowseComp | BrowseComp-ZH | WebWalkerQA | GAIA | xbench-DeepSearch | FRAMES | SimpleQA |
|---|---|---|---|---|---|---|---|---|
| Tongyi DeepResearch (30B-A3B) | 32.9 | 43.4 | 46.7 | 72.2 | 70.9 | 75.0 | 90.6 | 98.6 |
| DeepSeek V3.1 | 29.8 | 30.0 | – | 61.2 | 63.1 | 71.0 | 83.7 | 88.3 |
| Kimi Researcher | 26.9 | 14.1 | 28.8 | 63.0 | 57.7 | 69.0 | 78.8 | 93.6 |
| Gemini DeepResearch | 26.9 | – | – | – | – | – | – | 55.1 |
| OpenAI DeepResearch | 26.6 | 51.5 | 42.9 | – | 67.4 | – | – | – |
| Claude-4-Sonnet | 20.3 | 12.2 | 29.1 | 61.7 | 68.3 | 65.0 | 80.7 | – |
| OpenAI o3 | 24.9 | 49.7 | – | 71.7 | – | 67.0 | 84.0 | – |
| OpenAI o4-mini | 17.7 | 28.3 | – | 60.0 | – | – | – | – |
| GLM 4.5 | 21.2 | 26.4 | 37.5 | 65.6 | 66.0 | 70.0 | 79.8 | – |
| Perplexity DeepResearch | 21.1 | – | – | – | – | – | – | – |
| WebSailor-72B | – | – | – | – | – | – | – | 93.5 |
| DeepSeek-R1 w/ DDR | – | – | – | – | – | – | – | 88.3 |
| Gemini-2.5 Pro w/o tools | – | – | – | – | – | – | – | 55.1 |
| OpenAI o3 w/o tools | – | – | – | – | – | – | – | 50.5 |
| Grok-4 w/o tools | – | – | – | – | – | – | – | 50.3 |
Deep Research Benchmark Results
| Model | Humanity’s Last Exam | BrowseComp | BrowseComp-ZH | GAIA | xbench-DeepSearch | WebWalkerQA | FRAMES |
|---|---|---|---|---|---|---|---|
| GLM 4.5 | 21.2 | 26.4 | 37.5 | 66.0 | 70.0 | 65.6 | 78.9 |
| Kimi K2 | 18.1 | 14.1 | 28.8 | 57.7 | 50.0 | 63.0 | 72.0 |
| DeepSeek V3.1 | 29.8 | 30.0 | 49.2 | 63.1 | 71.0 | 61.2 | 83.7 |
| Claude-4-Sonnet | 20.3 | 12.2 | 29.1 | 68.3 | 65.0 | 61.7 | 80.7 |
| OpenAI o3 | 24.9 | 49.7 | 58.1 | – | 67.0 | 71.7 | 84.0 |
| OpenAI o4-mini | 17.7 | 28.3 | – | 60.0 | – | – | – |
| OpenAI DeepResearch | 26.6 | 51.5 | 42.9 | 67.4 | – | – | – |
| Gemini DeepResearch | 26.9 | – | – | – | – | – | – |
| Kimi Researcher | 26.9 | – | – | – | 69.0 | – | 78.8 |
| Tongyi DeepResearch (30B-A3B) | 32.9 | 43.4 | 46.7 | 70.9 | 75.0 | 72.2 | 90.6 |
Tongyi DeepResearch (30B-A3B) GPU Configuration
| Scenario | Min VRAM | Recommended VRAM | Example GPUs | Precision | Notes |
|---|---|---|---|---|---|
| Entry (Single Inference / Testing) | 40 GB | 40–48 GB | A100 40G, L40S 48G | BF16 / FP16 | Works for single queries and smaller batch sizes. May require --local-dir-use-symlinks False to avoid symlink issues. |
| Standard (Research & Benchmarks) | 80 GB | 80–96 GB | A100 80G, H100 80G | BF16 | Smooth inference with moderate batch sizes (2–4). Best balance of speed and VRAM. |
| High-Performance (Production / Multi-Agent) | 120 GB+ | 128 GB+ | 2× H100 SXM (NVLink), 4× A100 80G | BF16 / FP16 | Parallel multi-query inference. Recommended for IterResearch Heavy mode and long-horizon tasks. |
| Max Performance (Distributed / Heavy IterResearch) | 4× 80 GB+ | 320 GB+ (cluster) | 4× H100 SXM (NVLink), 8× A100 80G | BF16 | Full test-time scaling strategy. Optimized for multi-agent deep reasoning workloads at scale. |
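Before picking a tier, you can check how much VRAM your GPU actually exposes with a standard nvidia-smi query:

# list GPU name and total memory
nvidia-smi --query-gpu=name,memory.total --format=csv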
Resources
- Hugging Face model: https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
- GitHub repository: https://github.com/Alibaba-NLP/DeepResearch
Step-by-Step Process to Install & Run Alibaba Tongyi DeepResearch Locally
For this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC 2, and ISO 27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H200s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side and select the GPU Nodes option. In the Dashboard, click the Create GPU Node button to deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
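If you would like to generate a key pair locally before uploading the public half, a minimal sketch using standard OpenSSH tooling (the email is just a comment label):

# generate an ed25519 key pair in ~/.ssh/id_ed25519[.pub]
ssh-keygen -t ed25519 -C "you@example.com"
# print the public key so you can paste it into the NodeShift dashboard
cat ~/.ssh/id_ed25519.pub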
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Alibaba Tongyi DeepResearch, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based models like Alibaba Tongyi DeepResearch
- Compatibility with CUDA 12.1.1, required by certain model operations
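If you want to sanity-check the same image on your own machine first, a minimal sketch of pulling it and opening a GPU-enabled shell (assumes Docker with the NVIDIA Container Toolkit installed on the host):

# pull the image and start an interactive GPU-enabled shell in it
docker pull nvidia/cuda:12.1.1-devel-ubuntu22.04
docker run --rm -it --gpus all nvidia/cuda:12.1.1-devel-ubuntu22.04 bash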
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like Alibaba Tongyi DeepResearch.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel variant contains the full CUDA toolkit, including nvcc.
This setup ensures that Alibaba Tongyi DeepResearch runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
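The pasted command follows the usual OpenSSH form; the user, host, port, and key path below are placeholders for the values shown in the Connect dialog:

# replace the placeholders with the values from your instance page
ssh -i ~/.ssh/id_ed25519 -p <PORT> <USER>@<HOST>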
Next, if you want to check the GPU details, run the command below:
nvidia-smi
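Since the devel image ships with the full CUDA toolkit, you can also confirm that the compiler is available:

# verify the CUDA compiler bundled with the devel image
nvcc --version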
Step 8: Initialize Conda and Make Your Env
Run the following commands to initialize conda and make your env:
# initialize conda for bash and reload the shell
/opt/conda/bin/conda init bash
exec bash
# (first-time only) accept Anaconda TOS for the defaults channels
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
# create & activate your project env
conda create -y -n react_infer_env python=3.10
conda activate react_infer_env
# sanity check
python --version
Step 9: Clone DeepResearch Repo
Run the following commands to clone the DeepResearch repo:
git clone https://github.com/Alibaba-NLP/DeepResearch.git
cd DeepResearch
Step 10: Install Requirements
Run the following command to install the requirements:
pip install -r requirements.txt
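Before moving on, it is worth confirming that the PyTorch build pulled in by requirements.txt can actually see the GPU (this assumes the repo's requirements include torch):

# quick sanity check that PyTorch sees the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"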
Step 11: Authenticate To Hugging Face Hub (Paste Your Token)
- Create a token at https://huggingface.co/settings/tokens (a token with read access is enough for downloads)
- Log in from the VM (interactive – recommended)
# New command (the old `huggingface-cli login` is deprecated)
hf auth login
# paste your token when asked
hf whoami # quick sanity check
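For non-interactive or scripted setups, huggingface_hub also reads the HF_TOKEN environment variable, so you can skip the login prompt entirely (the token value below is a placeholder):

# alternative: export the token instead of logging in interactively
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx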
Step 12: Download the Model Checkpoint
Run the following command to download the model checkpoint. (The legacy huggingface-cli still works for downloads; on newer huggingface_hub versions the equivalent is hf download Alibaba-NLP/Tongyi-DeepResearch-30B-A3B --local-dir checkpoints, where the symlink flag is no longer needed because local-dir downloads write real files by default.)
huggingface-cli download --resume-download Alibaba-NLP/Tongyi-DeepResearch-30B-A3B \
--local-dir checkpoints --local-dir-use-symlinks False
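The checkpoint is large (roughly 60 GB of BF16 weights), so once the download finishes, confirm the shards and tokenizer files actually landed:

# check the downloaded files and total size
ls -lh checkpoints
du -sh checkpoints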
Step 13: Run the Model
Run the model with the following commands:
cd inference
bash run_react_infer.sh
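Note that run_react_infer.sh reads its configuration from variables defined near the top of the script, so open it and point it at your local checkpoint before launching. A hedged sketch of the kind of edits involved; the exact variable names below (MODEL_PATH, DATASET, OUTPUT_PATH) may differ in your version of the script, and search/tool API keys are only needed if you enable those tools:

# inside inference/run_react_infer.sh (variable names may vary by repo version)
MODEL_PATH=/path/to/DeepResearch/checkpoints   # checkpoint downloaded in Step 12
DATASET=/path/to/your/questions.jsonl          # queries to run
OUTPUT_PATH=./outputs                          # where rollouts are written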