Tongyi DeepResearch (30B-A3B) is a 30-billion-parameter Mixture-of-Experts (MoE) language model developed by Alibaba Tongyi Lab, with only 3B parameters activated per token for efficiency. Unlike general-purpose LLMs, it is purpose-built for deep, long-horizon information-seeking tasks, achieving state-of-the-art results on benchmarks such as Humanity’s Last Exam, BrowseComp, WebWalkerQA, GAIA, xbench-DeepSearch, and FRAMES.
Key highlights include a fully automated synthetic data pipeline, large-scale continual pre-training on agentic data, and end-to-end reinforcement learning via a customized Group Relative Policy Optimization framework. At inference, it supports both ReAct-style lightweight reasoning and a test-time scaling “Heavy” mode (IterResearch) to maximize performance.
Tongyi DeepResearch (30B-A3B) Benchmarks
| Model | Humanity’s Last Exam | BrowseComp | BrowseComp-ZH | WebWalkerQA | GAIA | xbench-DeepSearch | FRAMES | SimpleQA |
|---|---|---|---|---|---|---|---|---|
| Tongyi DeepResearch (30B-A3B) | 32.9 | 43.4 | 46.7 | 72.2 | 70.9 | 75.0 | 90.6 | 98.6 |
| DeepSeek V3.1 | 29.8 | 30.0 | – | 61.2 | 63.1 | 71.0 | 83.7 | 88.3 |
| Kimi Researcher | 26.9 | 14.1 | 28.8 | 63.0 | 57.7 | 69.0 | 78.8 | 93.6 |
| Gemini DeepResearch | 26.9 | – | – | – | – | – | – | 55.1 |
| OpenAI DeepResearch | 26.6 | 51.5 | 42.9 | – | 67.4 | – | – | – |
| Claude-4-Sonnet | 20.3 | 12.2 | 29.1 | 61.7 | 68.3 | 65.0 | 80.7 | – |
| OpenAI o3 | 24.9 | 49.7 | – | 71.7 | – | 67.0 | 84.0 | – |
| OpenAI o4-mini | 17.7 | 28.3 | – | 60.0 | – | – | – | – |
| GLM 4.5 | 21.2 | 26.4 | 37.5 | 65.6 | 66.0 | 70.0 | 79.8 | – |
| Perplexity DeepResearch | 21.1 | – | – | – | – | – | – | – |
| WebSailor-72B | – | – | – | – | – | – | – | 93.5 |
| DeepSeek-R1 w/ DDR | – | – | – | – | – | – | – | 88.3 |
| Gemini-2.5 Pro w/o tools | – | – | – | – | – | – | – | 55.1 |
| OpenAI o3 w/o tools | – | – | – | – | – | – | – | 50.5 |
| Grok-4 w/o tools | – | – | – | – | – | – | – | 50.3 |
Deep Research Benchmark Results
| Model | Humanity’s Last Exam | BrowseComp | BrowseComp-ZH | GAIA | xbench-DeepSearch | WebWalkerQA | FRAMES |
|---|---|---|---|---|---|---|---|
| GLM 4.5 | 21.2 | 26.4 | 37.5 | 66.0 | 70.0 | 65.6 | 78.9 |
| Kimi K2 | 18.1 | 14.1 | 28.8 | 57.7 | 50.0 | 63.0 | 72.0 |
| DeepSeek V3.1 | 29.8 | 30.0 | 49.2 | 63.1 | 71.0 | 61.2 | 83.7 |
| Claude-4-Sonnet | 20.3 | 12.2 | 29.1 | 68.3 | 65.0 | 61.7 | 80.7 |
| OpenAI o3 | 24.9 | 49.7 | 58.1 | – | 67.0 | 71.7 | 84.0 |
| OpenAI o4-mini | 17.7 | 28.3 | – | 60.0 | – | – | – |
| OpenAI DeepResearch | 26.6 | 51.5 | 42.9 | 67.4 | – | – | – |
| Gemini DeepResearch | 26.9 | – | – | – | – | – | – |
| Kimi Researcher | 26.9 | – | – | – | 69.0 | – | 78.8 |
| Tongyi DeepResearch (30B-A3B) | 32.9 | 43.4 | 46.7 | 70.9 | 75.0 | 72.2 | 90.6 |
Tongyi DeepResearch (30B-A3B) GPU Configuration
| Scenario | Min VRAM | Recommended VRAM | Example GPUs | Precision | Notes |
|---|---|---|---|---|---|
| Entry (Single Inference / Testing) | 40 GB | 40–48 GB | A100 40G, L40S 48G | BF16 / FP16 | Works for single queries and smaller batch sizes. May require --local-dir-use-symlinks False to avoid symlink issues. |
| Standard (Research & Benchmarks) | 80 GB | 80–96 GB | A100 80G, H100 80G | BF16 | Smooth inference with moderate batch sizes (2–4). Best balance of speed and VRAM. |
| High-Performance (Production / Multi-Agent) | 120 GB+ | 128 GB+ | 2× H100 SXM (NVLink), 4× A100 80G | BF16 / FP16 | Parallel multi-query inference. Recommended for IterResearch Heavy mode and long-horizon tasks. |
| Max Performance (Distributed / Heavy IterResearch) | 4× 80 GB+ | 320 GB+ (cluster) | 4× H100 SXM (NVLink), 8× A100 80G | BF16 | Full test-time scaling strategy. Optimized for multi-agent deep reasoning workloads at scale. |
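Before picking a tier, you can check how much VRAM your GPU actually exposes with a standard nvidia-smi query:

# list GPU name and total memory
nvidia-smi --query-gpu=name,memory.total --format=csv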
Resources
- Hugging Face model: https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
- GitHub repository: https://github.com/Alibaba-NLP/DeepResearch
Step-by-Step Process to Install & Run Alibaba Tongyi DeepResearch Locally
For this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC 2, and ISO 27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H200s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side and select the GPU Nodes option. In the Dashboard, click the Create GPU Node button to deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
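If you would like to generate a key pair locally before uploading the public half, a minimal sketch using standard OpenSSH tooling (the email is just a comment label):

# generate an ed25519 key pair in ~/.ssh/id_ed25519[.pub]
ssh-keygen -t ed25519 -C "you@example.com"
# print the public key so you can paste it into the NodeShift dashboard
cat ~/.ssh/id_ed25519.pub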
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Alibaba Tongyi DeepResearch, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based models like Alibaba Tongyi DeepResearch
- Compatibility with CUDA 12.1.1, required by certain model operations
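If you want to sanity-check the same image on your own machine first, a minimal sketch of pulling it and opening a GPU-enabled shell (assumes Docker with the NVIDIA Container Toolkit installed on the host):

# pull the image and start an interactive GPU-enabled shell in it
docker pull nvidia/cuda:12.1.1-devel-ubuntu22.04
docker run --rm -it --gpus all nvidia/cuda:12.1.1-devel-ubuntu22.04 bash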
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like Alibaba Tongyi DeepResearch.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel variant contains the full CUDA toolkit, including nvcc.
This setup ensures that Alibaba Tongyi DeepResearch runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
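The pasted command follows the usual OpenSSH form; the user, host, port, and key path below are placeholders for the values shown in the Connect dialog:

# replace the placeholders with the values from your instance page
ssh -i ~/.ssh/id_ed25519 -p <PORT> <USER>@<HOST>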
Next, if you want to check the GPU details, run the command below:
nvidia-smi
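Since the devel image ships with the full CUDA toolkit, you can also confirm that the compiler is available:

# verify the CUDA compiler bundled with the devel image
nvcc --version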
Step 8: Initialize Conda and Make Your Env
Run the following commands to initialize conda and make your env:
# initialize conda for bash and reload the shell
/opt/conda/bin/conda init bash
exec bash
# (first-time only) accept Anaconda TOS for the defaults channels
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
# create & activate your project env
conda create -y -n react_infer_env python=3.10
conda activate react_infer_env
# sanity check
python --version
Step 9: Clone DeepResearch Repo
Run the following commands to clone the DeepResearch repo:
git clone https://github.com/Alibaba-NLP/DeepResearch.git
cd DeepResearch
Step 10: Install Requirements
Run the following command to install the requirements:
pip install -r requirements.txt
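Before moving on, it is worth confirming that the PyTorch build pulled in by requirements.txt can actually see the GPU (this assumes the repo's requirements include torch):

# quick sanity check that PyTorch sees the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"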
Step 11: Authenticate To Hugging Face Hub (Paste Your Token)
- Create a token at https://huggingface.co/settings/tokens (a token with read access is enough for downloads)
- Log in from the VM (interactive – recommended)
# New command (the old `huggingface-cli login` is deprecated)
hf auth login
# paste your token when asked
hf whoami # quick sanity check
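For non-interactive or scripted setups, huggingface_hub also reads the HF_TOKEN environment variable, so you can skip the login prompt entirely (the token value below is a placeholder):

# alternative: export the token instead of logging in interactively
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx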
Step 12: Download the Model Checkpoint
Run the following command to download the model checkpoint. (The legacy huggingface-cli still works for downloads; on newer huggingface_hub versions the equivalent is hf download Alibaba-NLP/Tongyi-DeepResearch-30B-A3B --local-dir checkpoints, where the symlink flag is no longer needed because local-dir downloads write real files by default.)
huggingface-cli download --resume-download Alibaba-NLP/Tongyi-DeepResearch-30B-A3B \
--local-dir checkpoints --local-dir-use-symlinks False
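The checkpoint is large (roughly 60 GB of BF16 weights), so once the download finishes, confirm the shards and tokenizer files actually landed:

# check the downloaded files and total size
ls -lh checkpoints
du -sh checkpoints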
Step 13: Run the Model
Run the model with the following commands:
cd inference
bash run_react_infer.sh
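Note that run_react_infer.sh reads its configuration from variables defined near the top of the script, so open it and point it at your local checkpoint before launching. A hedged sketch of the kind of edits involved; the exact variable names below (MODEL_PATH, DATASET, OUTPUT_PATH) may differ in your version of the script, and search/tool API keys are only needed if you enable those tools:

# inside inference/run_react_infer.sh (variable names may vary by repo version)
MODEL_PATH=/path/to/DeepResearch/checkpoints   # checkpoint downloaded in Step 12
DATASET=/path/to/your/questions.jsonl          # queries to run
OUTPUT_PATH=./outputs                          # where rollouts are written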