Qwen3-Coder-480B-A35B-Instruct is a powerhouse built for deep, structured reasoning and complex coding workflows. Designed with a focus on tool usage and agentic behavior, it delivers standout performance across real-world coding tasks, browser-based scenarios, and multi-step tool execution.
This model handles long contexts natively, up to 256K tokens, and can stretch to a whopping 1 million tokens with YaRN. Whether you’re debugging a terminal task, automating workflows, or building next-gen code agents, this model has been crafted to support the full stack of development challenges.
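If you want to experiment with the extended window, a common approach (shown as a minimal sketch below, not an official recipe) is to override the model’s rope scaling before loading. The exact field names and scaling factor depend on your transformers version and the model’s shipped config, so treat the values here as assumptions:

from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen3-Coder-480B-A35B-Instruct"
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
# Override rope scaling; factor and field values are illustrative assumptions.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                               # ~256K native * 4 ≈ 1M tokens
    "original_max_position_embeddings": 262144,  # the model's native window
}
model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, device_map="auto", trust_remote_code=True
)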
Unlike many general-purpose models, Qwen3-Coder is built to run your tools, call your functions, and integrate tightly into modern coding platforms like CLINE, Qwen Code, and more, all while maintaining clean and predictable outputs. With 480 billion total parameters (35 billion active per token), it’s optimized for both speed and precision.
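To make the tool-use story concrete, here is a minimal sketch of how a function schema reaches the model through transformers’ chat template. The get_weather function is a made-up toy example, not part of any Qwen API:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-Coder-480B-A35B-Instruct", trust_remote_code=True
)

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny in {city}"

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],         # transformers derives a JSON schema from the function
    add_generation_prompt=True,
    tokenize=False,
)
# `prompt` now embeds the tool schema; the model can reply with a structured
# tool call that your agent loop executes and feeds back as a new message.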
From powering through massive repositories to handling multilingual programming tasks, Qwen3-Coder stands out where it matters.
Qwen3-Coder 480B-A35B-Instruct Benchmark Comparison
Benchmarks | Qwen3-Coder<br>480B-A35B-Instruct | Kimi-K2<br>Instruct | DeepSeek-V3<br>0324 | Claude<br>Sonnet-4 | OpenAI<br>GPT-4.1 |
---|---|---|---|---|---|
Agentic Coding | | | | | |
Terminal-Bench | 37.5 | 30.0 | 2.5 | 35.5 | 25.3 |
SWE-bench Verified | 69.6 | – | – | 70.4 | – |
w/ OpenHands, 500 turns | 67.0 | 65.4 | 38.8 | 68.0 | 48.6 |
w/ OpenHands, 100 turns | – | 65.8 | – | 72.7 | 63.8 |
SWE-bench Live | 26.3 | 22.3 | 13.0 | 27.7 | – |
SWE-bench Multilingual | 54.7 | 47.3 | 13.0 | 53.3 | 31.5 |
Multi-SWE-bench mini | 25.8 | 19.8 | 7.5 | 24.8 | – |
Multi-SWE-bench flash | 27.0 | 25.0 | – | 25.0 | – |
Aider-Polyglot | 61.8 | 60.0 | 56.9 | 56.4 | 52.4 |
Spider2 | 31.1 | 25.2 | 12.8 | 31.1 | 16.5 |
Agentic Browser Use
Benchmark | Qwen3-Coder<br>480B-A35B-Instruct | Kimi-K2<br>Instruct | DeepSeek-V3<br>0324 | Claude<br>Sonnet-4 | OpenAI<br>GPT-4.1 |
---|---|---|---|---|---|
WebArena | 49.9 | 47.4 | 40.0 | 51.1 | 44.3 |
Mind2Web | 55.8 | 42.7 | 36.0 | 47.4 | 49.6 |
Agentic Tool Use
Benchmark | Qwen3-Coder<br>480B-A35B-Instruct | Kimi-K2<br>Instruct | DeepSeek-V3<br>0324 | Claude<br>Sonnet-4 | OpenAI<br>GPT-4.1 |
---|---|---|---|---|---|
BFCL-v3 | 68.7 | 65.2 | 56.9 | 73.3 | 62.9 |
TAU-Bench Retail | 77.5 | 70.7 | 59.1 | 80.5 | – |
TAU-Bench Airline | 60.0 | 53.5 | 40.0 | 60.0 | – |
Qwen3-Coder-480B-A35B-Instruct – GPU Configuration (Min to Max)
Use Case | Context Length | GPU | # of GPUs | VRAM Requirement | Notes |
---|---|---|---|---|---|
Quick Testing / Minimal Inference | 4K – 8K tokens | A100 | 1× | 80 GB | Works for basic prompts and short replies. No tool calling or long context. |
Light Agentic Tasks | 8K – 16K tokens | A100 | 2× | 160 GB total | Can handle function calls, tool usage, and moderate-length prompts. |
Production-Grade Agent Inference | 16K – 32K tokens | A100 / H100 | 4× | 320 GB total | Ideal for agentic workflows, tool routing, and instruction-heavy tasks. |
Full 256K Context | 256K tokens | H100 | 8× | 640 GB total | Requires tensor & activation sharding. NVLink preferred. |
Long Context + Agentic Use Combined | 65K – 128K tokens | H100 / H200 | 8× | 640–800 GB | Smooth with Flash Attention + vLLM or TensorRT-LLM backend. |
Yarn-based 1M Token Context (Optional) | 1M tokens (Yarn) | H100 / H200 | 16× | 1.2 TB+ | Multi-node setup required. Use for ultra long docs or codebases. |
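As a rough sanity check on these figures, the weights are usually the dominant term. The back-of-envelope sketch below counts weights only and ignores KV cache, activations, and runtime overhead; the smaller configurations in the table presumably rely on quantization or CPU offloading:

def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate VRAM taken by the weights alone (no KV cache or activations)."""
    return params_billions * bytes_per_param  # 1B params at 1 byte/param ≈ 1 GB

print(weight_vram_gb(480, 2.0))  # FP16/BF16: ~960 GB
print(weight_vram_gb(480, 1.0))  # FP8:       ~480 GB
print(weight_vram_gb(480, 0.5))  # 4-bit:     ~240 GB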
Resources
Link: https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct
Step-by-Step Process to Install & Run Qwen3-Coder-480B-A35B-Instruct Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option in the Dashboard, click the Create GPU Node button, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 8× H200 SXM GPUs for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Qwen3-Coder-480B-A35B-Instruct, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based applications like Qwen3-Coder-480B-A35B-Instruct
- Compatibility with CUDA 12.1.1 required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations, perfect for installing dependencies, running benchmarks, and launching the model.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. Devel version contains full cuda toolkit with nvcc.
This setup ensures that Qwen3-Coder-480B-A35B-Instruct runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Check the Available Python Version and Install a Newer One
Run the following command to check the available Python version:
python3 --version
The system has Python 3.8.1 available by default. To install a higher version of Python, you’ll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 9: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-venv python3.11-dev
Step 10: Update the Default Python3 Version
Now, run the following commands to link the new Python version as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 11: Install and Update Pip
Run the following commands to install and update pip:
curl -O https://bootstrap.pypa.io/get-pip.py
python3.11 get-pip.py
Then, run the following command to check the version of pip:
pip --version
Step 12: Create and Activate a Python 3.11 Virtual Environment
Run the following commands to create and activate a Python 3.11 virtual environment:
apt update && apt install -y python3.11-venv git wget
python3.11 -m venv qwen3-env
source qwen3-env/bin/activate
Step 13: Install PyTorch with GPU Support
Run the following command to install PyTorch with GPU support:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
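Once the install finishes, a quick optional sanity check confirms that the CUDA build can actually see your GPUs:

import torch

print(torch.__version__)           # should report a +cu121 build
print(torch.cuda.is_available())   # True if the GPUs are visible to PyTorch
print(torch.cuda.device_count())   # e.g., 8 on an 8-GPU node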
Step 14: Install Required Python Packages
Run the following command to install required Python packages:
pip install --upgrade transformers accelerate einops
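Optionally, verify that everything imports cleanly before moving on:

import accelerate
import einops
import transformers

# Qwen3-Coder needs a recent transformers release; print versions to confirm.
print(transformers.__version__, accelerate.__version__, einops.__version__)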
Step 15: Connect to Your GPU VM Using Remote SSH
- Open VS Code on your Mac.
- Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
- Select your configured host.
- Once connected, you’ll see SSH: 161.248.3.77 (your VM IP) in the bottom-left status bar (like in the image).
Step 16: Create a Python File
Create a Python script (e.g., run_qwen3_coder.py) and add the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Qwen/Qwen3-Coder-480B-A35B-Instruct"

# Load the tokenizer and shard the model across all available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # spread layers across the visible GPUs
    torch_dtype=torch.float16,  # half precision to reduce VRAM usage
    trust_remote_code=True
)

# Wrap the user prompt in the model's chat template.
prompt = "Write a quick sort algorithm in Python."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response; do_sample=True makes the sampling knobs below take effect.
output_ids = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.05
)

# Strip the prompt tokens and decode only the newly generated text.
generated_text = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\nGenerated Output:\n", generated_text)
Step 17: Run the Script
Run the script to generate a response:
python3 run_qwen3_coder.py
Check Output
What’s Next: From GPU Power to Everyday Laptops
Up to this point, we’ve set up Qwen3-Coder-480B-A35B-Instruct — a model that truly demands GPU muscle, high memory, and a robust virtual machine. But not every tool in the Qwen ecosystem requires this much horsepower. For the next part of this guide, let’s switch gears: Qwen Code CLI doesn’t need a monster GPU or cloud VM. You can install and run it directly on your everyday laptop or desktop — no expensive hardware, no complicated setup. It’s lightweight, fast to get started with, and perfect for bringing Qwen’s coding magic to your local workflow. Let’s dive into the Qwen Code CLI installation and see how easy it is to bring powerful code generation right to your fingertips.
Step-by-Step Process to Install & Run Qwen Code CLI Locally
Qwen Code is a command-line AI workflow tool, adapted from Gemini CLI and optimized for Qwen3-Coder models. It features enhanced parsing, deep code understanding, and workflow automation.
Qwen Code may make multiple API calls per task, which can increase token usage—similar to how Claude Code operates. The team is working on improving efficiency and developer experience.
Resources
Link: https://github.com/QwenLM/qwen-code
Step 1: Verify Node.js and npm Installation
Before you proceed with installing Qwen Code CLI, make sure you have Node.js and npm installed on your system. You can quickly check their versions by running the following commands in your terminal:
node -v
npm -v
Step 2: Update NPM to the Latest Version
To ensure smooth installation of all packages, it’s a good idea to update npm to the latest version.
You can do this easily by running the following command in your terminal:
curl -qL https://www.npmjs.com/install.sh | sh
Step 3: Install Qwen Code CLI Globally
Next, install the Qwen Code CLI tool globally using NPM:
npm install -g @qwen-code/qwen-code
After installation, make sure your Node.js binary directory is in your system’s PATH. If you’re using Homebrew on macOS, add it with:
echo 'export PATH="/opt/homebrew/Cellar/node@20/20.19.3/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
Finally, verify that Qwen Code CLI is installed correctly by checking its version:
qwen --version
You should see an output like 0.0.1-alpha.10, confirming a successful installation.
Step 4: Run Qwen Code CLI and Choose Your Theme
Now, you’re ready to launch Qwen Code CLI!
Simply run the following command in your terminal:
qwen
On your first launch, you’ll be greeted with the Qwen logo and a welcome screen where you can pick your favorite theme. Use the arrow keys to select a theme (like Qwen Dark, ANSI, Dracula, or GitHub) and press Enter.
You’ll also see helpful tips for getting started — like how to ask questions, edit files, or run commands.
Once your theme is set, you’re ready to start using Qwen Code right from your terminal!
Step 5: Configure OpenAI API Access
The first time you run Qwen Code CLI, you’ll be prompted to set up your OpenAI API configuration.
Here’s what to do:
- Get your OpenAI API key:
Visit https://platform.openai.com/api-keys and log in with your OpenAI account. Click “Create new secret key” and copy it.
- Paste your API key when prompted in the terminal.
- Base URL:
Leave this blank and press Enter, unless you’re using a custom API endpoint.
- Model:
Specify a supported OpenAI model, such as gpt-4o, gpt-4, or gpt-3.5-turbo.
- Press Enter to finish the setup.
Once complete, Qwen Code will remember these settings and you’ll be ready to use its full capabilities!
Step 6: Generate and Use Your First Script with Qwen Code
Now that Qwen Code CLI is set up and running, you can start asking it to generate code, automate tasks, or even explain scripts.
For example:
Just type a prompt like:
Write a bash script to monitor disk usage every 5 minutes and send an alert if usage exceeds 90%.
Qwen Code will instantly generate a ready-to-use script, complete with helpful comments and usage instructions.
You’ll see the full Bash script, step-by-step comments, and clear instructions on how to save and run it.
Conclusion
And that’s it — you’re all set!
With both Qwen3-Coder-480B-A35B-Instruct and the Qwen Code CLI in your toolbox, you’re equipped to tackle anything from heavy-duty code generation on GPU servers to lightweight scripting and automation right on your laptop. The best part? Whether you’re building advanced agentic workflows, automating routine dev tasks, or just need an instant code snippet, the Qwen ecosystem meets you where you are — cloud or local, massive or minimal.
From setting up state-of-the-art language models for large-scale tasks to running a friendly CLI assistant for everyday coding, this guide has shown just how easy it is to get started. As you continue to explore, don’t hesitate to push boundaries — experiment with prompts, automate your favorite workflows, or even craft your own QWEN.md to personalize your coding sidekick.
Happy building, happy coding — and welcome to a whole new level of developer productivity with Qwen!