Qwen-Image-Lightning is a distilled version of the original Qwen-Image model, designed to deliver fast, high-quality text-to-image generation with exceptional ability in complex text rendering and fine image details.
The Lightning variants cut the number of inference steps from the base model's 50 down to just 4 or 8 while closely matching its visual quality. This makes them a perfect choice for scenarios where speed matters, such as interactive creative workflows, live content generation, or rapid prototyping.
Key Highlights
- ⚡ Lightning-Fast Inference: Generate high-quality images in just 4 or 8 steps compared to the base model’s 50 steps.
- 🖋 Complex Text Rendering: Maintains the strong typography and text embedding capabilities from Qwen-Image.
- 🎯 LoRA Integration: Ships as LoRA (Low-Rank Adaptation) weights that load directly on top of the base Qwen-Image model.
- 🖼 Versatile Styles & Prompts: Performs well across artistic, photorealistic, and mixed media prompts.
- 🌍 Bilingual Prompt Support: Works seamlessly with both English and Chinese input.
Available Versions
- Qwen-Image-Lightning-8steps-V1.0 — Balanced speed and quality.
- Qwen-Image-Lightning-8steps-V1.1 — Latest refinement with improved visual consistency.
- Qwen-Image-Lightning-4steps-V1.0 — Ultra-fast generation with minimal step count.
- Base Model (Qwen-Image) — Full 50-step generation for maximum fidelity.
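For reference, each Lightning variant is published as a LoRA checkpoint in the Hugging Face repo and pairs with a fixed step count. A minimal sketch of a lookup table: only the 8-step V1.0 filename is confirmed by the script later in this post; the other filenames are inferred from the same naming pattern, so verify them against the repo.
# Hypothetical mapping from variant to (LoRA filename, inference steps).
# Only the 8-step V1.0 filename appears in this tutorial's script; verify
# the others against https://huggingface.co/lightx2v/Qwen-Image-Lightning.
LIGHTNING_VARIANTS = {
    "8steps-V1.0": ("Qwen-Image-Lightning-8steps-V1.0.safetensors", 8),
    "8steps-V1.1": ("Qwen-Image-Lightning-8steps-V1.1.safetensors", 8),
    "4steps-V1.0": ("Qwen-Image-Lightning-4steps-V1.0.safetensors", 4),
}
weight_name, num_steps = LIGHTNING_VARIANTS["8steps-V1.1"]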
Best Use Cases
- Creative Design: Quickly prototype concept art, posters, or product visuals.
- Advertising & Marketing: Fast generation of banner variants and ad creatives.
- Education & Content: Create illustrations or visual assets in real-time during live sessions.
- UI/UX Mockups: Rapidly iterate on design ideas with descriptive text prompts.
Recommended GPU Configuration Table for Qwen-Image-Lightning
GPU configuration cheat-sheet for running Qwen-Image (base) + Qwen-Image-Lightning LoRA with 🤗 diffusers (bf16). It’s tuned for single-image generation and gives you safe, real-world “it just works” settings.
Legend
- Res = recommended max resolution per image
- BS = batch size (simultaneous images)
- Steps = 4/8 for Lightning; 50 for base
- Precision = bf16 (preferred), fp16 (fallback)
| GPU (examples) | VRAM | Lightning (4 or 8 steps): Res / BS / Precision | Base (50 steps): Res / BS / Precision | Notes |
|---|---|---|---|---|
| RTX 2060 / GTX 1080 Ti | 8 GB | 512×512 / 1 / fp16 | 384×384 / 1 / fp16 | Use enable_attention_slicing(); keep height/width ≤ 512. Enable CPU offload if close to OOM. |
| RTX 3060 (12 GB) | 12 GB | 768×768 / 1 / fp16 or 512×512 / 2 / fp16 | 512×512 / 1 / fp16 | Prefer Lightning for speed. If OOM, drop to 640×640 or BS = 1. |
| RTX 3080 (10 GB) / 3070 (8 GB) | 8–10 GB | 640×640 / 1 / fp16 | 448×448 / 1 / fp16 | Similar to the 8–12 GB guidance. |
| RTX 3090 / 4090 / A5000 | 24 GB | 1024×1024 / 1–2 / bf16 | 768×768 / 1 / bf16 | Good sweet spot; Lightning comfortably handles 1024×1024. |
| A6000 / L40S | 48 GB | 1024×1024 / 3–4 / bf16 or 1344×1344 / 1–2 / bf16 | 1024×1024 / 1–2 / bf16 | Great for batching or above-1K resolutions. |
| A100 80 GB / H100 80 GB | 80 GB | 1536×1536 / 2–3 / bf16 or 1024×1024 / 6–8 / bf16 | 1280×1280 / 1–2 / bf16 | High throughput; ideal for queues and servers. |
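For the smaller GPUs in the table, the notes mention attention slicing and CPU offload; both are standard diffusers calls. Here is a minimal sketch of a low-VRAM setup, assuming the same base checkpoint used later in this tutorial (combine it with the Lightning LoRA as shown in Step 19):
import torch
from diffusers import DiffusionPipeline

# fp16 fallback per the table; bf16 is preferred where supported.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.float16,
)
pipe.enable_attention_slicing()   # lowers peak VRAM at some speed cost
pipe.enable_model_cpu_offload()   # keeps idle submodules in CPU RAM (replaces .to("cuda"))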
Resources
GitHub: https://github.com/ModelTC/Qwen-Image-Lightning/
HuggingFace: https://huggingface.co/lightx2v/Qwen-Image-Lightning
Step-by-Step Process to Install & Run Qwen-Image-Lightning Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button on the Dashboard, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Qwen-Image-Lightning, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based applications like Qwen-Image-Lightning
- Compatibility with CUDA 12.1.1 required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching tools like Qwen-Image-Lightning.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. Devel version contains full cuda toolkit with nvcc.
This setup ensures that Qwen-Image-Lightning runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Check the Available Python Version and Install a Newer Version
Run the following command to check the available Python version:
python3 --version
The system has Python 3.8.1 available by default. To install a higher version of Python, you'll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 9: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-venv python3.11-dev
Step 10: Update the Default python3 Version
Now, run the following commands to link the new Python version as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 11: Install and Update Pip
Run the following commands to install and update pip:
curl -O https://bootstrap.pypa.io/get-pip.py
python3.11 get-pip.py
Then, run the following command to check the version of pip:
pip --version
Step 12: Clone the Qwen-Image-Lightning Repository
Run the following command to clone the Qwen-Image-Lightning repository:
git clone https://github.com/ModelTC/Qwen-Image-Lightning.git
Step 13: Install Torch
Run the following command to install torch:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
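Before moving on, it's worth confirming that this build can actually see the GPU. A quick sanity check (run it in a Python shell or save it as a small script; the printed device name will vary with your VM):
import torch

print(torch.__version__)              # build tag should include cu128
print(torch.cuda.is_available())      # should print True on the GPU VM
print(torch.cuda.get_device_name(0))  # e.g. the H100 chosen in Step 3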
Step 14: Install Diffusers
Run the following command to install diffusers:
pip install git+https://github.com/huggingface/diffusers.git
Step 15: Install Accelerate
Run the following command to install accelerate:
pip install accelerate
Step 16: Install Peft
Run the following command to install peft:
pip install peft
Step 17: Install Transformers
Run the following command to install transformers:
pip install -U transformers
Step 18: Connect to your GPU VM using Remote SSH
- Open VS Code on your Mac.
- Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
- Select your configured host.
- Once connected, you'll see SSH: 149.7.4.3 (your VM IP) in the bottom-left status bar.
Step 19: Create a New Python Script run_lightning.py and Add the Following Code
Create a new Python script (example: run_lightning.py) and add the following code:
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler
import torch
import math

# Scheduler configuration recommended for the Lightning LoRA:
# a fixed shift of log(3) with dynamic shifting enabled.
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}
scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)

# Load the base Qwen-Image model in bf16 and move it to the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    scheduler=scheduler,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Apply the Lightning 8-step LoRA weights on top of the base model.
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning",
    weight_name="Qwen-Image-Lightning-8steps-V1.0.safetensors",
)

prompt = "a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K, cinematic composition."

# 8 inference steps to match the 8-step LoRA; true_cfg_scale=1.0
# disables classifier-free guidance, which the distilled model doesn't need.
image = pipe(
    prompt=prompt,
    negative_prompt="",
    width=1024,
    height=1024,
    num_inference_steps=8,
    true_cfg_scale=1.0,
    generator=torch.manual_seed(0),
).images[0]

image.save("qwen_lightning_output.png")
Step 20: Run the Script
After saving your run_lightning.py file, run it using:
python3 run_lightning.py
This will:
- Load the Qwen-Image base model
- Apply the Lightning 8-step LoRA weights
- Generate a high-quality image from the given prompt
- Save the result as qwen_lightning_output.png in your current directory
Once all progress bars reach 100%, your image is ready!
Step 21: View the Generated Image
After your script runs successfully, your output is saved as qwen_lightning_output.png.
To view it:
- In VS Code (SSH Remote), go to the left sidebar.
- Click on qwen_lightning_output.png.
- The image will open in the right pane.
And there it is — a tiny astronaut hatching from an egg on the moon, generated by Qwen-Image-Lightning.
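If your GPU has headroom (see the configuration table above), you can also generate several images per prompt in one call. A sketch that reuses the pipe and prompt objects from run_lightning.py; num_images_per_prompt is the standard diffusers batching parameter and corresponds to the BS column in the table:
# Batched generation, reusing `pipe` and `prompt` from run_lightning.py.
images = pipe(
    prompt=prompt,
    negative_prompt="",
    width=1024,
    height=1024,
    num_inference_steps=8,
    true_cfg_scale=1.0,
    num_images_per_prompt=2,  # batch size; scale to your VRAM
    generator=torch.manual_seed(0),
).images

for i, img in enumerate(images):
    img.save(f"qwen_lightning_output_{i}.png")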
Conclusion
Qwen-Image-Lightning delivers the perfect balance between speed and quality, making it a go-to choice for creators, designers, and developers who need stunning visuals in record time. With its ability to generate high-fidelity images in just 4 or 8 steps, seamless LoRA integration, and strong multilingual prompt handling, it empowers you to bring your ideas to life faster than ever.
Whether you’re prototyping concepts, creating marketing assets, or producing interactive content, Qwen-Image-Lightning proves that lightning speed doesn’t have to mean compromising on quality.