Qwen3-Coder-480B-A35B-Instruct is a powerhouse built for deep, structured reasoning and complex coding workflows. Designed with a focus on tool usage and agentic behavior, it delivers standout performance across real-world coding tasks, browser-based scenarios, and multi-step tool execution.
This model handles long contexts natively, up to 256K tokens, and can stretch to a whopping 1 million tokens with YaRN. Whether you’re debugging a terminal task, automating workflows, or building next-gen code agents, this model has been crafted to support the full stack of development challenges.
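If you want to experiment with the extended window, a common approach (shown as a minimal sketch below, not an official recipe) is to override the model’s rope scaling before loading. The exact field names and scaling factor depend on your transformers version and the model’s shipped config, so treat the values here as assumptions:

from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen3-Coder-480B-A35B-Instruct"
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
# Override rope scaling; factor and field values are illustrative assumptions.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                               # ~256K native * 4 ≈ 1M tokens
    "original_max_position_embeddings": 262144,  # the model's native window
}
model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, device_map="auto", trust_remote_code=True
)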
Unlike many general-purpose models, Qwen3-Coder is built to run your tools, call your functions, and integrate tightly into modern coding platforms like CLINE, Qwen Code, and more, all while maintaining clean and predictable outputs. With 480 billion total parameters (35 billion active per token), it’s optimized for both speed and precision.
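To make the tool-use story concrete, here is a minimal sketch of how a function schema reaches the model through transformers’ chat template. The get_weather function is a made-up toy example, not part of any Qwen API:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-Coder-480B-A35B-Instruct", trust_remote_code=True
)

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny in {city}"

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],         # transformers derives a JSON schema from the function
    add_generation_prompt=True,
    tokenize=False,
)
# `prompt` now embeds the tool schema; the model can reply with a structured
# tool call that your agent loop executes and feeds back as a new message.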
From powering through massive repositories to handling multilingual programming tasks, Qwen3-Coder stands out where it matters.
Qwen3-Coder 480B-A35B-Instruct Benchmark Comparison
Benchmarks | Qwen3-Coder<br>480B-A35B-Instruct | Kimi-K2<br>Instruct | DeepSeek-V3<br>0324 | Claude<br>Sonnet-4 | OpenAI<br>GPT-4.1 |
---|---|---|---|---|---|
Agentic Coding | | | | | |
Terminal-Bench | 37.5 | 30.0 | 2.5 | 35.5 | 25.3 |
SWE-bench Verified | 69.6 | – | – | 70.4 | – |
w/ OpenHands, 500 turns | 67.0 | 65.4 | 38.8 | 68.0 | 48.6 |
w/ OpenHands, 100 turns | – | 65.8 | – | 72.7 | 63.8 |
SWE-bench Live | 26.3 | 22.3 | 13.0 | 27.7 | – |
SWE-bench Multilingual | 54.7 | 47.3 | 13.0 | 53.3 | 31.5 |
Multi-SWE-bench mini | 25.8 | 19.8 | 7.5 | 24.8 | – |
Multi-SWE-bench flash | 27.0 | 25.0 | – | 25.0 | – |
Aider-Polyglot | 61.8 | 60.0 | 56.9 | 56.4 | 52.4 |
Spider2 | 31.1 | 25.2 | 12.8 | 31.1 | 16.5 |
Agentic Browser Use
Benchmark | Qwen3-Coder<br>480B-A35B-Instruct | Kimi-K2<br>Instruct | DeepSeek-V3<br>0324 | Claude<br>Sonnet-4 | OpenAI<br>GPT-4.1 |
---|---|---|---|---|---|
WebArena | 49.9 | 47.4 | 40.0 | 51.1 | 44.3 |
Mind2Web | 55.8 | 42.7 | 36.0 | 47.4 | 49.6 |
Agentic Tool Use
Benchmark | Qwen3-Coder<br>480B-A35B-Instruct | Kimi-K2<br>Instruct | DeepSeek-V3<br>0324 | Claude<br>Sonnet-4 | OpenAI<br>GPT-4.1 |
---|---|---|---|---|---|
BFCL-v3 | 68.7 | 65.2 | 56.9 | 73.3 | 62.9 |
TAU-Bench Retail | 77.5 | 70.7 | 59.1 | 80.5 | – |
TAU-Bench Airline | 60.0 | 53.5 | 40.0 | 60.0 | – |
Qwen3-Coder-480B-A35B-Instruct – GPU Configuration (Min to Max)
Use Case | Context Length | GPU | # of GPUs | VRAM Requirement | Notes |
---|---|---|---|---|---|
Quick Testing / Minimal Inference | 4K – 8K tokens | A100 | 1× | 80 GB | Works for basic prompts and short replies. No tool calling or long context. |
Light Agentic Tasks | 8K – 16K tokens | A100 | 2× | 160 GB total | Can handle function calls, tool usage, and moderate-length prompts. |
Production-Grade Agent Inference | 16K – 32K tokens | A100 / H100 | 4× | 320 GB total | Ideal for agentic workflows, tool routing, and instruction-heavy tasks. |
Full 256K Context | 256K tokens | H100 | 8× | 640 GB total | Requires tensor & activation sharding. NVLink preferred. |
Long Context + Agentic Use Combined | 65K – 128K tokens | H100 / H200 | 8× | 640–800 GB | Smooth with Flash Attention + vLLM or TensorRT-LLM backend. |
Yarn-based 1M Token Context (Optional) | 1M tokens (Yarn) | H100 / H200 | 16× | 1.2 TB+ | Multi-node setup required. Use for ultra long docs or codebases. |
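As a rough sanity check on these figures, the weights are usually the dominant term. The back-of-envelope sketch below counts weights only and ignores KV cache, activations, and runtime overhead; the smaller configurations in the table presumably rely on quantization or CPU offloading:

def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate VRAM taken by the weights alone (no KV cache or activations)."""
    return params_billions * bytes_per_param  # 1B params at 1 byte/param ≈ 1 GB

print(weight_vram_gb(480, 2.0))  # FP16/BF16: ~960 GB
print(weight_vram_gb(480, 1.0))  # FP8:       ~480 GB
print(weight_vram_gb(480, 0.5))  # 4-bit:     ~240 GB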
Resources
Link: https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct
Step-by-Step Process to Install & Run Qwen3-Coder-480B-A35B-Instruct Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option in the Dashboard, click the Create GPU Node button, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 8× H200 SXM GPUs for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Qwen3-Coder-480B-A35B-Instruct, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based applications like Qwen3-Coder-480B-A35B-Instruct
- Compatibility with CUDA 12.1.1 required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations, perfect for installing dependencies, running benchmarks, and launching the model.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. Devel version contains full cuda toolkit with nvcc.
This setup ensures that Qwen3-Coder-480B-A35B-Instruct runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Check the Available Python Version and Install a Newer One
Run the following command to check the available Python version:
python3 --version
The system has Python 3.8.1 available by default. To install a higher version of Python, you’ll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 9: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-venv python3.11-dev
Step 10: Update the Default Python3 Version
Now, run the following commands to link the new Python version as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 11: Install and Update Pip
Run the following commands to install and update pip:
curl -O https://bootstrap.pypa.io/get-pip.py
python3.11 get-pip.py
Then, run the following command to check the version of pip:
pip --version
Step 12: Create and Activate a Python 3.11 Virtual Environment
Run the following commands to create and activate a Python 3.11 virtual environment:
apt update && apt install -y python3.11-venv git wget
python3.11 -m venv qwen3-env
source qwen3-env/bin/activate
Step 13: Install PyTorch with GPU Support
Run the following command to install PyTorch with GPU support:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
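Once the install finishes, a quick optional sanity check confirms that the CUDA build can actually see your GPUs:

import torch

print(torch.__version__)           # should report a +cu121 build
print(torch.cuda.is_available())   # True if the GPUs are visible to PyTorch
print(torch.cuda.device_count())   # e.g., 8 on an 8-GPU node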
Step 14: Install Required Python Packages
Run the following command to install required Python packages:
pip install --upgrade transformers accelerate einops
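Optionally, verify that everything imports cleanly before moving on:

import accelerate
import einops
import transformers

# Qwen3-Coder needs a recent transformers release; print versions to confirm.
print(transformers.__version__, accelerate.__version__, einops.__version__)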
Step 15: Connect to Your GPU VM Using Remote SSH
- Open VS Code on your Mac.
- Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
- Select your configured host.
- Once connected, you’ll see SSH: 161.248.3.77 (your VM IP) in the bottom-left status bar (like in the image).
Step 16: Create a Python File
Create a Python script (e.g., run_qwen3_coder.py) and add the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Qwen/Qwen3-Coder-480B-A35B-Instruct"

# Load the tokenizer and shard the model across all available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # spread layers across the visible GPUs
    torch_dtype=torch.float16,  # half precision to reduce VRAM usage
    trust_remote_code=True
)

# Wrap the user prompt in the model's chat template.
prompt = "Write a quick sort algorithm in Python."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response; do_sample=True makes the sampling knobs below take effect.
output_ids = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.05
)

# Strip the prompt tokens and decode only the newly generated text.
generated_text = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\nGenerated Output:\n", generated_text)
Step 17: Run the Script
Run the script to generate a response:
python3 run_qwen3_coder.py
Check Output
What’s Next: From GPU Power to Everyday Laptops
Up to this point, we’ve set up Qwen3-Coder-480B-A35B-Instruct — a model that truly demands GPU muscle, high memory, and a robust virtual machine. But not every tool in the Qwen ecosystem requires this much horsepower. For the next part of this guide, let’s switch gears: Qwen Code CLI doesn’t need a monster GPU or cloud VM. You can install and run it directly on your everyday laptop or desktop — no expensive hardware, no complicated setup. It’s lightweight, fast to get started with, and perfect for bringing Qwen’s coding magic to your local workflow. Let’s dive into the Qwen Code CLI installation and see how easy it is to bring powerful code generation right to your fingertips.
Step-by-Step Process to Install & Run Qwen Code CLI Locally
Qwen Code is a command-line AI workflow tool, adapted from Gemini CLI and optimized for Qwen3-Coder models. It features enhanced parsing, deep code understanding, and workflow automation.
Qwen Code may make multiple API calls per task, which can increase token usage—similar to how Claude Code operates. The team is working on improving efficiency and developer experience.
Resources
Link: https://github.com/QwenLM/qwen-code
Step 1: Verify Node.js and npm Installation
Before you proceed with installing Qwen Code CLI, make sure you have Node.js and npm installed on your system. You can quickly check their versions by running the following commands in your terminal:
node -v
npm -v
Step 2: Update NPM to the Latest Version
To ensure smooth installation of all packages, it’s a good idea to update npm to the latest version.
You can do this easily by running the following command in your terminal:
curl -qL https://www.npmjs.com/install.sh | sh
Step 3: Install Qwen Code CLI Globally
Next, install the Qwen Code CLI tool globally using NPM:
npm install -g @qwen-code/qwen-code
After installation, make sure your Node.js binary directory is in your system’s PATH. If you’re using Homebrew on macOS, add it with:
echo 'export PATH="/opt/homebrew/Cellar/node@20/20.19.3/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
Finally, verify that Qwen Code CLI is installed correctly by checking its version:
qwen --version
You should see an output like 0.0.1-alpha.10, confirming a successful installation.
Step 4: Run Qwen Code CLI and Choose Your Theme
Now, you’re ready to launch Qwen Code CLI!
Simply run the following command in your terminal:
qwen
On your first launch, you’ll be greeted with the Qwen logo and a welcome screen where you can pick your favorite theme. Use the arrow keys to select a theme (like Qwen Dark, ANSI, Dracula, or GitHub) and press Enter.
You’ll also see helpful tips for getting started — like how to ask questions, edit files, or run commands.
Once your theme is set, you’re ready to start using Qwen Code right from your terminal!
Step 5: Configure OpenAI API Access
The first time you run Qwen Code CLI, you’ll be prompted to set up your OpenAI API configuration.
Here’s what to do:
- Get your OpenAI API key:
Visit https://platform.openai.com/api-keys and log in with your OpenAI account. Click “Create new secret key” and copy it.
- Paste your API key when prompted in the terminal.
- Base URL:
Leave this blank and press Enter, unless you’re using a custom API endpoint.
- Model:
Specify a supported OpenAI model, such as gpt-4o, gpt-4, or gpt-3.5-turbo.
- Press Enter to finish the setup.
Once complete, Qwen Code will remember these settings and you’ll be ready to use its full capabilities!
Step 6: Generate and Use Your First Script with Qwen Code
Now that Qwen Code CLI is set up and running, you can start asking it to generate code, automate tasks, or even explain scripts.
For example:
Just type a prompt like:
Write a bash script to monitor disk usage every 5 minutes and send an alert if usage exceeds 90%.
Qwen Code will instantly generate a ready-to-use script, complete with helpful comments and usage instructions.
You’ll see the full Bash script, step-by-step comments, and clear instructions on how to save and run it.
Conclusion
And that’s it — you’re all set!
With both Qwen3-Coder-480B-A35B-Instruct and the Qwen Code CLI in your toolbox, you’re equipped to tackle anything from heavy-duty code generation on GPU servers to lightweight scripting and automation right on your laptop. The best part? Whether you’re building advanced agentic workflows, automating routine dev tasks, or just need an instant code snippet, the Qwen ecosystem meets you where you are — cloud or local, massive or minimal.
From setting up state-of-the-art language models for large-scale tasks to running a friendly CLI assistant for everyday coding, this guide has shown just how easy it is to get started. As you continue to explore, don’t hesitate to push boundaries — experiment with prompts, automate your favorite workflows, or even craft your own QWEN.md to personalize your coding sidekick.
Happy building, happy coding — and welcome to a whole new level of developer productivity with Qwen!