USO (Unified Style–Subject Optimized) from ByteDance unifies style-driven and subject-driven image generation in one framework. It’s trained on triplets (content image, style image, stylized image) and uses a disentangled learning scheme—style-alignment + content–style disentanglement—plus Style Reward Learning (SRL) to boost fidelity. The team also releases USO-Bench, the first benchmark that jointly scores style similarity and subject fidelity; USO reports SOTA among open-source models on both axes. Inference runs on top of FLUX.1 (with AE, T5, CLIP) and lightweight USO adapters (LoRA + projector).
GPU Configuration Table for ByteDance USO
| Scenario | Precision / Build | Min VRAM | Recommended VRAM | Example GPUs | Typical Resolution / Notes |
|---|---|---|---|---|---|
| Entry / low-VRAM | FP8 / INT8 (TensorRT or torchao quantized), CPU offload | 16 GB | 16–24 GB | RTX 4080 Super (16 GB), RTX 3090 (24 GB), L4 (24 GB) | 768–1024² with slower steps; use attention slicing & offload. (NVIDIA Docs) |
| Standard single-GPU | BF16/FP16 (native) | ~22–24 GB | 24–32 GB | RTX 4090 (24 GB), A5000 (24 GB), RTX 5090 Laptop (24 GB) | Comfortable 1024×1024; stable speed, better text control. (Hugging Face, NVIDIA Docs) |
| High-res / faster throughput | BF16/FP16 (native), higher step counts | 24–32 GB | 32–48 GB | RTX 5090 (32 GB), A6000 (48 GB), L40S (48 GB) | 1536–2048² and faster batch=1. (NVIDIA Docs) |
| 4K / multi-style batching | BF16/FP16, larger width/height or batch > 1 | 48 GB | 80 GB | A100 80 GB, H100 80 GB | 4K or multi-image refs without OOM; best for labs/servers. (NVIDIA Docs) |
| Speed-first tweaks on consumer GPUs | FP8 / GGUF / INT8 + offload | 10–16 GB | 16–24 GB | RTX 3080 (10 GB)†, RTX 4070 Ti (16 GB) | 512–1024² with longer times; use lower steps. †Community reports vary. (GitHub, YouTube) |
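If you are unsure which tier your machine falls into, a quick look at the GPU name and free VRAM is enough to place it in the table above. The command below is a minimal sketch using standard nvidia-smi query flags, not anything USO-specific:
# Show GPU name, total VRAM, and currently free VRAM (in MiB)
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv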
Resources
Link: https://huggingface.co/bytedance-research/USO
Step-by-Step Process to Install & Run ByteDance USO Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button in the Dashboard, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running ByteDance USO, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based models like ByteDance USO
- Compatibility with CUDA 12.1.1, required by certain model operations
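Later, once you are inside the running VM, a quick way to confirm that the devel image really ships the full toolkit is to check for the CUDA compiler; this is a generic sanity check, not a step USO itself requires:
# Should report the CUDA 12.1 compiler bundled with the devel image
nvcc --version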
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like ByteDance USO.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel variant contains the full CUDA toolkit, including nvcc.
This setup ensures that ByteDance USO runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
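The exact user, host, and port come from the Connect dialog of your deployment; the command below only illustrates the general shape, with placeholder values you must replace with your own:
# Placeholder values; substitute the key path, user, IP, and port shown on the Connect page
ssh -i ~/.ssh/<your-key> root@<proxy-or-direct-ssh-ip> -p <port>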
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Verify Python Version & Install pip (if not present)
Since Python 3.10 is already installed, we’ll confirm its version and ensure pip is available for package installation.
Step 8.1: Check Python Version
Run the following command to verify Python 3.10 is installed:
python3 --version
You should see output like:
Python 3.10.12
Step 8.2: Install pip (if not already installed)
Even if Python is installed, pip might not be available.
Check if pip exists:
pip3 --version
If you get an error like command not found, then install pip manually.
Install pip via get-pip.py:
curl -O https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py
This will download and install pip into your system.
You may see a warning about running as root — that’s okay for now.
After installation, verify:
pip3 --version
Expected output:
pip 25.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Now pip is ready to install packages like transformers, torch, etc.
Step 9: Create and Activate a Python 3.10 Virtual Environment
Run the following commands to create and activate a Python 3.10 virtual environment:
apt update && apt install -y python3.10-venv git wget
python3.10 -m venv uso_env
source uso_env/bin/activate
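As a quick optional check, confirm the shell is now using the interpreter inside uso_env rather than the system Python:
# The path should point into uso_env, and the version should be 3.10.x
which python
python --version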
Step 10: Clone the USO Repo
Run the following command to clone the USO repo:
git clone https://github.com/bytedance/USO.git
cd USO
Step 11: Install PyTorch (CUDA 12.4 build)
Run the following command to install PyTorch (CUDA 12.4 build):
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124
Step 12: Install Required Packages
Run the following command to install required packages:
pip install -r requirements.txt
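Before moving on, it’s worth confirming that the CUDA 12.4 build of PyTorch can actually see the GPU. This one-liner uses only standard torch attributes:
# Prints the torch version, whether CUDA is usable, and the detected GPU name
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"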
Step 13: Log in to Hugging Face (CLI)
Run the following command to log in to Hugging Face:
huggingface-cli login
- When prompted, paste your HF token (from https://huggingface.co/settings/tokens).
- For “Add token as git credential? (Y/n)”:
  - Y if you plan to git clone models/repos.
  - n if you only use huggingface_hub downloads.
You should see: “Token is valid… saved to /root/.cache/huggingface/stored_tokens”.
The red line “Cannot authenticate through git-credential…” just means no Git credential helper is set. It’s safe to ignore.
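If you prefer a non-interactive login (for example, inside a provisioning script), the CLI also accepts the token as a flag. The environment variable name below is just a convention for this sketch, not something USO defines:
# Placeholder token; never commit real tokens to scripts or shell history
export HF_TOKEN=<your-hf-token>
huggingface-cli login --token "$HF_TOKEN"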
Step 14: Download Weights (Automatic, Recommended)
Run the following command to download weights:
python ./weights/downloader.py
Step 15: Download Repo Weights into a Local Folder
Run the following command to download repo weights into a local folder:
huggingface-cli download bytedance-research/USO \
--local-dir checkpoints --local-dir-use-symlinks False
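After the download finishes, check that the two adapter files referenced in the next step actually landed on disk; this assumes the checkpoints/uso_flux_v1.0/ layout used below:
# Expect dit_lora.safetensors and projector.safetensors in the listing
ls -lh checkpoints/uso_flux_v1.0/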
Step 16: Export Paths
Run the following commands to export the paths:
# set env vars to the exact files
export LORA="$PWD/checkpoints/uso_flux_v1.0/dit_lora.safetensors"
export PROJECTION_MODEL="$PWD/checkpoints/uso_flux_v1.0/projector.safetensors"
# (optional) keep them for future shells
echo "export LORA=$LORA" | tee -a ~/.bashrc
echo "export PROJECTION_MODEL=$PROJECTION_MODEL" | tee -a ~/.bashrc
Step 17: Run Inference (CLI)
Subject-driven (identity → new scene):
python inference.py \
--prompt "The man in flower shops carefully match bouquets, conveying beautiful emotions and blessings with flowers." \
--image_paths "assets/gradio_examples/identity1.jpg" \
--width 1024 --height 1024
Style-driven (no content image):
python inference.py \
--prompt "A cat sleeping on a chair." \
--image_paths "" "assets/gradio_examples/style1.webp" \
--width 1024 --height 1024
Style + Subject (IP-style):
python inference.py \
--prompt "The woman gave an impassioned speech on the podium." \
--image_paths "assets/gradio_examples/identity2.webp" "assets/gradio_examples/style2.webp" \
--width 1024 --height 1024
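To try several prompts against the same identity reference, a small shell loop around the CLI is often enough. It reuses only the flags shown above and assumes inference.py writes its output images to disk on each call:
# One inference call per prompt, same subject reference each time
for p in "The man reading a book in a sunlit library." "The man walking a dog on a rainy street."; do
  python inference.py \
    --prompt "$p" \
    --image_paths "assets/gradio_examples/identity1.jpg" \
    --width 1024 --height 1024
done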
Conclusion
ByteDance’s USO shows how style and subject generation can finally be unified into a single framework without compromising on either. With disentangled learning, style reward optimization, and the first joint benchmark (USO-Bench), it sets a new open-source standard for both subject fidelity and style alignment. Running on top of FLUX.1 with lightweight adapters, it’s efficient, scalable, and flexible — whether you want subject consistency, stylistic transfer, or both at once. In short, USO is a solid step forward for controllable, high-quality image generation.