There’s a new duo in the world of open-source models, and they’re here to make life a whole lot easier for developers, builders, and tinkerers everywhere. Whether you need raw horsepower for serious projects or something nimble for local experimentation, the gpt-oss lineup has you covered.
On one side, you’ve got the gpt-oss-120b—a heavyweight, purpose-built for tasks where deep reasoning, clear thinking, and wide-ranging skills really matter. It’s ready for the big leagues, built to handle complex requests without breaking a sweat. Perfect if you want the confidence that comes from working with something built for scale and reliability.
On the other side is gpt-oss-20b, the lighter and more agile sibling. It’s all about speed and versatility, ideal for those moments when you want answers fast, want to run things on your own machine, or just need a model that’s easy to fine-tune and shape to your unique needs.
Model Benchmark Scores
| Benchmark | gpt-oss-120b | gpt-oss-20b | OpenAI o3 | OpenAI o4-mini |
|---|---|---|---|---|
| **Reasoning & Knowledge** | | | | |
| MMLU | 90.0 | 85.3 | 93.4 | 93.0 |
| GPQA Diamond | 80.1 | 71.5 | 83.3 | 81.4 |
| Humanity’s Last Exam | 19.0 | 17.3 | 24.9 | 17.7 |
| **Competition Math** | | | | |
| AIME 2024 | 96.6 | 96.0 | 95.2 | 98.7 |
| AIME 2025 | 97.9 | 98.7 | 98.4 | 99.5 |
Recommended GPU Configuration
| Model | Minimum GPU Needed | VRAM Needed | GPU Count | Typical Hardware Example | Runs on Consumer GPU? | Notes |
|---|---|---|---|---|---|---|
| gpt-oss-20b | 1x high-end GPU | 16 GB+ | 1 | NVIDIA RTX 4090, A6000, H100 | Yes | Runs comfortably on modern consumer GPUs. Easy for local use. |
| gpt-oss-120b | 1x server-grade GPU | 80 GB+ | 1 | NVIDIA H100 (80 GB), A100 (80 GB) | No (server only) | Needs powerful server hardware, usually a cloud or on-prem GPU server. |
Resources
- gpt-oss-20b on Hugging Face: https://huggingface.co/openai/gpt-oss-20b
- gpt-oss-120b on Hugging Face: https://huggingface.co/openai/gpt-oss-120b
- gpt-oss GitHub repository: https://github.com/openai/gpt-oss
- gpt-oss on Ollama: https://ollama.com/library/gpt-oss
Step-by-Step Process to Install & Run OpenAI GPT-OSS Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, and click the Create GPU Node button on the Dashboard to deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 2x H100 SXM GPUs for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
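If you don’t already have a key pair on your local machine, a typical way to generate one (a generic example, not a NodeShift-specific command) is:
ssh-keygen -t ed25519 -C "your_email@example.com"
The public key (the generated .pub file) is what you add in NodeShift; the private key stays on your machine.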
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running OpenAI GPT-OSS, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based applications like OpenAI GPT-OSS
- Compatibility with CUDA 12.1.1 required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching tools like OpenAI GPT-OSS.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel variant contains the full CUDA toolkit, including nvcc.
This setup ensures that the OpenAI GPT-OSS runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
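Because we chose the devel variant of the CUDA image, you can also optionally confirm that the CUDA compiler is available:
nvcc --version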
Step 8: Install Ollama
After connecting to the terminal via SSH, it’s now time to install Ollama from the official Ollama website.
Website Link: https://ollama.com/
Run the following command to install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
Step 9: Serve Ollama
Run the following command to start the Ollama server so that models can be served and accessed:
ollama serve
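Note that ollama serve keeps running in the foreground and occupies the terminal. To continue working in the same session, you can either open a second SSH connection or push the server to the background, for example:
nohup ollama serve > ollama.log 2>&1 &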
Step 10: Explore Ollama CLI Commands
After starting the Ollama server, you can explore all available commands and get help right from the terminal.
To see the list of all commands that Ollama supports, run:
ollama
You’ll see an output like this:
Usage:
ollama [flags]
ollama [command]
Available Commands:
serve Start ollama
create Create a model
show Show information for a model
run Run a model
stop Stop a running model
pull Pull a model from a registry
push Push a model to a registry
list List models
ps List running models
cp Copy a model
rm Remove a model
help Help about any command
Flags:
-h, --help help for ollama
-v, --version Show version information
Use "ollama [command] --help" for more information about a command.
This command helps you quickly understand what you can do with Ollama—such as running, pulling, stopping models, and more.
Step 11: Pull Both GPT-OSS Models
GPT-OSS comes in two main versions—20B and 120B.
You’ll need to pull each model separately using Ollama’s CLI.
Let’s do it one by one:
Pull the 20B Version
Run this command to pull the 20B model:
ollama pull gpt-oss:20b
You’ll see progress bars as the model and its components download.
When finished, you should see a success message.
Pull the 120B Version
Now, pull the larger 120B model:
ollama pull gpt-oss:120b
Again, wait for the download and extraction to finish until you see a success message.
Step 12: Verify Downloaded Models
After pulling the GPT-OSS models, you can check that they’ve been successfully downloaded and are available on your system.
Just run:
ollama list
You should see output like this:
NAME ID SIZE MODIFIED
gpt-oss:120b 735371f916a9 65 GB 43 seconds ago
gpt-oss:20b f2b8351c629c 13 GB 4 minutes ago
This confirms both the 20B and 120B GPT-OSS models are now installed and ready to use.
Step 13: Run the GPT-OSS Model for Inference
Now that your models are installed, you can start running them and interacting directly from the terminal.
To run the 20B version of GPT-OSS, use:
ollama run gpt-oss:20b
You’ll be prompted to enter your message or prompt. For example, you can try:
Imagine you’re a detective solving a mystery in a city where gravity randomly changes direction every hour. Walk through your entire reasoning as you solve a crime—don’t skip a step or assumption.
The model will process your prompt, display “Thinking…”, and then generate a detailed response.
Try Different Prompts
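Besides the interactive session, ollama run also accepts a prompt directly on the command line, which is handy for quick one-shot tests (the prompt text below is just an example):
ollama run gpt-oss:20b "Summarize the rules of chess in three sentences."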
Step 14: Run the 120B GPT-OSS Model
After testing the 20B model, let’s now run the larger, more powerful 120B version.
To start an interactive session with the 120B model, run:
ollama run gpt-oss:120b
You’ll see the prompt:
>>>
Type your question, prompt, or creative request—just like with the 20B model. For example:
You have access to a web browser, Python, and function-calling. Your task: find the current weather in Reykjavik, predict tomorrow’s using code, and format it as a poetic haiku.
The model will process your request and generate a detailed, creative answer.
Try Different Prompts
Now you’ve successfully run and interacted with the GPT-OSS models directly in your terminal using Ollama! This command-line approach is fast and powerful for quick experiments or automation. However, sometimes you want a more visually appealing and user-friendly interface for chatting with models, exploring outputs, or showcasing demos. For those moments, it’s great to use an interface like Open WebUI, which makes running prompts and interacting with models both simple and enjoyable. In the next steps, we’ll see how to run the same models with Open WebUI and experience an upgraded, interactive chat environment.
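Before moving on, it’s also worth knowing that the Ollama server you started earlier exposes a local REST API (by default on port 11434), which is useful for scripting and automation. A minimal sketch with curl (the prompt text is just an example):
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Explain the difference between a process and a thread in two sentences.",
  "stream": false
}'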
Step 15: Check the Available Python Version and Install a Newer One
Run the following command to check the currently available Python version:
python3 --version
The system has Python 3.8.1 available by default. To install a higher version of Python, you’ll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 16: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-venv python3.11-dev
Step 17: Update the Default python3 Version
Now, run the following commands to link the new Python version and select it as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 18: Install and Update Pip
Run the following commands to install and update pip:
curl -O https://bootstrap.pypa.io/get-pip.py
python3.11 get-pip.py
Then, run the following command to check the version of pip:
pip --version
Step 19: Create and Activate a Python 3.11 Virtual Environment
Run the following commands to create and activate a Python 3.11 virtual environment:
apt update && apt install -y python3.11-venv git wget
python3.11 -m venv openwebui
source openwebui/bin/activate
Step 20: Install Open-WebUI
Run the following command to install open-webui:
pip install open-webui
Step 21: Serve Open-WebUI
In your activated Python environment, start the Open-WebUI server by running:
open-webui serve
- Wait for the server to complete all database migrations and set up initial files. You’ll see a series of INFO logs and a large “OPEN WEBUI” banner in the terminal.
- When setup is complete, the WebUI will be available and ready for you to access via your browser.
Step 22: Set up SSH port forwarding from your local machine
On your local machine (Mac/Windows/Linux), open a terminal and run:
ssh -L 8080:localhost:8080 -p 18685 root@Your_VM_IP
This forwards local localhost:8080 to localhost:8080 on the remote VM, so the Open-WebUI server becomes reachable from your browser. (Replace 18685 and Your_VM_IP with your own VM’s SSH port and IP.)
Step 23: Access Open-WebUI in Your Browser
Go to:
http://localhost:8080
- You should see the Open-WebUI login or setup page.
- Log in or create a new account if this is your first time.
- You’re now ready to use Open-WebUI to interact with your models!
Step 24: Select and Use Your Model in Open-WebUI
Once you’ve logged into Open-WebUI in your browser, you can easily choose between any models you have installed on your system.
- Click on the model selection dropdown at the top left (where you see the model name, e.g., gpt-oss:120b).
- You’ll see a list of all available models, such as gpt-oss:120b, gpt-oss:20b, and any other models you’ve installed.
- Simply click on the model you want to use (for example, gpt-oss:120b for the largest, most powerful model).
- Once selected, you can start chatting or sending prompts to that model in the Open-WebUI chat window below.
Step 25: Start Chatting with Your Model in Open-WebUI
With your model selected in Open-WebUI, you can now start sending prompts and receive rich, detailed responses—just like chatting with a modern AI assistant.
- Type your question or prompt in the chat input box at the bottom of the screen.
- Press Enter to send your message.
- The model will process your request and respond in the chat window, showing its full reasoning and answer.
As shown in the screenshot, you can ask advanced questions, get structured explanations, and even see responses formatted with tables and bullet points.
Step 26: Explore Advanced Reasoning and Creativity with Large Models
With the gpt-oss:120b model loaded in Open-WebUI, you can take full advantage of its advanced reasoning, problem-solving, and creativity. Try giving the model complex, multi-step challenges—such as designing unique puzzles, solving technical problems, or explaining advanced topics in depth.
- Ask open-ended or multi-part questions to see the model’s full reasoning process.
- The model can generate diagrams, ASCII art, tables, and well-structured explanations, as shown in the screenshot.
- You can save, copy, or collapse responses for easy reference.
Up to this point, you’ve learned how to interact with GPT-OSS models both from the terminal using Ollama and from a user-friendly interface with Open-WebUI. This gives you the best of both worlds: the speed and flexibility of the command line, and the visual, intuitive experience of a web-based chat interface. But that’s not all! You can also integrate these models directly into your Python code using the Transformers library. With Transformers, you can run gpt-oss-120b and gpt-oss-20b programmatically for everything from chatbots to automated pipelines. If you use the Transformers chat template, harmony response formatting is handled for you automatically. If you call model.generate directly, you’ll need to apply the harmony format manually, either with the chat template or with the official openai-harmony package. This opens up a world of advanced integrations, so let’s explore how to use GPT-OSS models with Transformers next!
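For reference, here’s a rough sketch (not part of the step-by-step setup below, and assuming the smaller 20B model to keep memory requirements lower) of what calling model.generate directly with the chat template could look like:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Explain what a Merkle tree is in two sentences."},
]

# The chat template renders the conversation into the prompt format the model expects
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

We’ll set up the actual environment for running code like this in the next steps.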
Step 27: Connect to Your GPU VM with a Code Editor
Before you start running Python scripts with the GPT-OSS models and Transformers, it’s a good idea to connect your GPU virtual machine (VM) to a code editor of your choice. This makes writing, editing, and running code much easier.
- You can use popular editors like VS Code, Cursor, or any other IDE that supports SSH remote connections.
- In this example, we’re using cursor code editor.
- Once connected, you’ll be able to browse files, edit scripts, and run commands directly on your remote server, just like working locally.
Why do this?
Connecting your VM to a code editor gives you a powerful, streamlined workflow for Python development, allowing you to easily manage your code, install dependencies, and experiment with large models.
Step 28: Set Up a Python Virtual Environment
Before you install any dependencies or write code, it’s best practice to create a Python virtual environment. This keeps your project’s packages isolated and prevents conflicts with system-wide Python libraries.
Run the following commands on your GPU VM:
python3 -m venv ~/gptoss-venv
source ~/gptoss-venv/bin/activate
Step 29: Install Python Dependencies
Run the following commands to install the Python dependencies:
pip install -U transformers kernels torch
pip install accelerate
Step 30: Run GPT-OSS Models with Transformers in Python
Now you’re ready to interact with GPT-OSS directly in your own Python scripts using the Transformers library.
Here’s an example script (run_gptoss_transformers.py) you can use:
from transformers import pipeline
import torch

# Load the model and build a text-generation pipeline
# (device_map="auto" spreads the weights across the available GPUs)
model_id = "openai/gpt-oss-120b"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Chat-style input: the pipeline applies the chat template for us
messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)

# The last message in generated_text is the assistant's reply
print(outputs[0]["generated_text"][-1])
- This script loads the GPT-OSS 120B model and sends a prompt for completion.
- You can edit the messages variable to ask anything you want!
Tip
- Make sure you’re in your virtual environment and have installed all dependencies (transformers, torch).
- You can swap "openai/gpt-oss-120b" with "openai/gpt-oss-20b" to use the smaller model.
Step 31: Run the Python Script and Generate Model Output
Now it’s time to run your script and see the GPT-OSS model in action!
Simply execute your Python file in the terminal:
python3 run_gptoss_transformers.py
You’ll see the model’s output directly in your terminal, with a detailed, well-formatted response generated by GPT-OSS.
You can try any prompt you want: just change the text in the content field of your Python code and run the script again!
Up to this point, you’ve seen how easy it is to run the GPT-OSS models programmatically using Python scripts and the Transformers library. This lets you automate tasks, process data, and build custom workflows entirely in code. But what if you want to serve your model as an API, making it accessible to any app or client just like the OpenAI API? That’s where Transformers Serve comes in. With just a few commands, you can spin up an OpenAI-compatible webserver around your model—enabling easy integration with tools, chatbots, or anything that speaks the OpenAI API format!
Step 32: Install Pillow
Run the following command to install pillow:
pip install pillow
Step 33: Install Transformers[Serving]
Run the following command to install “transformers[serving]”:
pip install "transformers[serving]"
Step 34: Install Rich
Run the following command to install rich:
pip install rich
Step 35: Install Aiohttp
Run the following command to install aiohttp:
pip install aiohttp
Step 36: Launch an OpenAI-Compatible API Server with Transformers Serve
With everything set up, you can now serve your GPT-OSS model as an OpenAI-compatible API using Transformers Serve.
Simply run:
transformers serve
You’ll see log messages confirming that the server has started, the application is ready, and Uvicorn is running on http://localhost:8000.
Now your model is accessible as an API—ready to accept requests just like the official OpenAI endpoint!
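Since the server speaks the OpenAI Chat Completions format, you can also test it with a plain HTTP request. A minimal sketch with curl (the payload fields follow the OpenAI API; the prompt is just an example):
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'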
Step 37: Chat with Your Model via the Transformers Serve API
Now that your OpenAI-compatible API server is running, you can chat with your model using the transformers CLI.
Run this command to start a chat session with the gpt-oss-120b model through your local API server:
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b
You’ll be able to send prompts and receive responses, just like chatting with a hosted API—except everything runs locally on your GPU VM!
Step 38: Use the Transformers Chat Interface Commands
Once inside the Transformers Chat Interface, you’ll see a prompt (<root>:) where you can interact with your model.
Besides chatting, you have access to a few handy commands:
- !help – Shows all available commands, including how to set generation settings and save your chat.
- !status – Displays the current status of your model and generation settings.
- !clear – Clears the current conversation and starts fresh.
- !exit – Closes the chat interface.
Simply type your messages or commands at the <root>: prompt and press Enter.
You can now experiment, tweak settings, or start new chats—all directly in your terminal!
Step 39: Interact with the Model and Get Structured Answers
Now you can use the chat interface to ask your model any question you want!
Simply type your prompt at the <root>: prompt and press Enter.
- The model will respond in a clear, well-formatted way—sometimes even including tables, bullet points, or side-by-side comparisons, just like in the example above.
- You can ask for explanations, definitions, comparisons, or step-by-step solutions to technical and non-technical questions alike.
Conclusion
With everything set up, you’re ready to make these open-source models work the way you want—locally, in the cloud, or anywhere in between. Choose what fits your project, experiment freely, and build without limits or hidden costs. It’s all about flexibility, control, and putting real power in your hands. Now go ahead—try new ideas, solve real problems, and see what you can create next!