SoulX-Podcast-1.7B is a podcast-style TTS model built for long, multi-turn, multi-speaker dialogs. It supports English, Mandarin, and several Chinese dialects (e.g., Sichuanese, Henanese, Cantonese), does zero-shot voice cloning from short reference clips, and exposes paralinguistic controls (like laughter/sighs) to make conversations feel natural over long durations. It’s aimed at generating full podcast episodes—complete with speaker changes, dialectal variation, and expressive delivery—while still running comfortably on a single modern GPU.
GPU Configuration
| Tier / Use case | Precision | Min VRAM (approx.) | Suggested GPUs | Recommended settings & notes |
|---|---|---|---|---|
| Entry – quick trials & short mono lines (≤2–3 min) | FP16/BF16 | 8–10 GB | RTX 3060 12G, RTX 4060 8–16G, T4 16G | Single speaker, short prompts; keep reference clips clean; avoid batching; lower sampling rate if needed. |
| Standard – multi-speaker shorts, zero-shot cloning, 5–15 min | FP16/BF16 | 12–16 GB | L4 24G, A10 24G, RTX A5000 24G | Good balance for dialog scenes; enable BF16 if supported; limit concurrent speakers; moderate batching. |
| Pro – long-form podcast (30–60 min) with paralinguistics | FP16/BF16 | 24 GB+ | RTX A5000 24G, RTX 6000 Ada 48G, A6000 48G | Longer turns and higher quality vocoding; larger text chunks per turn; safe headroom for caching and retries. |
| Studio – heavy batching, many speakers, tool/WebUI + background jobs | FP16/BF16 | 40–80 GB | A100 40/80G, H100 80G | Parallel episodes or aggressive batching; fastest turnaround; ideal for production queues. |
| Docker / vLLM runtime (optional) | FP16/BF16 | +2–4 GB over baseline | Same as above | Container overhead + server; pin --gpus all; map models via volume to avoid re-download. |
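Before picking a tier, you can check how much VRAM your GPU actually exposes with standard NVIDIA tooling (not specific to SoulX-Podcast):
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv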
Resources
Link: https://huggingface.co/Soul-AILab/SoulX-Podcast-1.7B
Step-by-Step Process to Install & Run SoulX-Podcast-1.7B Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H200s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button in the Dashboard, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
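If you haven't generated an SSH key yet, a typical way to create one on your local machine is shown below (the comment string is just a label of your choice):
ssh-keygen -t ed25519 -C "you@example.com"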
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running SoulX-Podcast-1.7B, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based models like SoulX-Podcast-1.7B.
- Compatibility with CUDA 12.1.1 required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like SoulX-Podcast-1.7B.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel version contains the full CUDA toolkit with nvcc.
This setup ensures that SoulX-Podcast-1.7B runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
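Because we chose a CUDA devel image, you can also confirm that the CUDA compiler is present (assuming nvcc is on the PATH, as it is in the standard nvidia/cuda devel images):
nvcc --version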
Step 8: Install Python 3.11 and Pip (VM already has Python 3.10; we upgrade it)
Run the following command to check the available Python version:
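python3 --version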
If you check the Python version, you'll see that the system has Python 3.10.12 installed by default. To install a newer version of Python, you'll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
apt update && apt install -y software-properties-common curl ca-certificates
add-apt-repository -y ppa:deadsnakes/ppa
apt update
Now, run the following commands to install Python 3.11, Pip and Wheel:
apt install -y python3.11 python3.11-venv python3.11-dev
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel
python3.11 --version
python3.11 -m pip --version
Step 9: Create and Activate a Python 3.11 Virtual Environment
Run the following commands to create and activate a Python 3.11 virtual environment:
python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate
python --version
pip --version
Step 10: Install Miniconda + Create Env
Run the following commands to install Miniconda:
cd /tmp && curl -fsSLo miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash miniconda.sh -b -p $HOME/miniconda
eval "$($HOME/miniconda/bin/conda shell.bash hook)"
Then, run the following commands to accept the Miniconda channel terms of service:
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
Next, run the following commands to create & activate the environment:
conda create -n soulxpodcast -y python=3.11
conda activate soulxpodcast
Step 11: Clone the SoulX-Podcast-1.7B Repo
Run the following command to clone the SoulX-Podcast-1.7B repo:
git clone https://github.com/Soul-AILab/SoulX-Podcast.git
cd SoulX-Podcast
Step 12: Install Requirement & Dependencies
Run the following command to install requirements & dependencies:
pip install -r requirements.txt
Step 13: Install PyTorch for CUDA
Run the following command to install PyTorch:
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
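As a quick sanity check that the CUDA build of PyTorch is installed and can see the GPU, you can run the following (a minimal check, assuming the soulxpodcast environment is active):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"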
Step 14: Install Transformers and Hugging Face Hub
Run the following command to install transformers and huggingface hub:
pip install "transformers==4.57.1" "huggingface_hub<1.0,>=0.34.0"
Step 15: Download the Model Weights
Base model
Run the following command to download the base model:
huggingface-cli download --resume-download Soul-AILab/SoulX-Podcast-1.7B \
--local-dir pretrained_models/SoulX-Podcast-1.7B
Dialect Model (Sichuanese / Henanese / Cantonese etc.)
Next, run the following command to download the dialect model:
huggingface-cli download --resume-download Soul-AILab/SoulX-Podcast-1.7B-dialect \
--local-dir pretrained_models/SoulX-Podcast-1.7B-dialect
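Once both downloads finish, a quick listing confirms that the weights landed in the expected folders (the paths match the --local-dir values used above):
ls -lh pretrained_models/SoulX-Podcast-1.7B pretrained_models/SoulX-Podcast-1.7B-dialect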
Step 16: Quick Test: Example Script (Dialogue Demo)
# Uses their ready-made demo
bash example/infer_dialogue.sh
That’s the basic dialogue inference entry point they provide. It should generate audio files under an outputs/ or similar demo path (check the script for the exact paths).
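If you’re unsure where the demo wrote its audio, a generic way to find recently generated WAV files in the repo is (the exact output folder may differ between versions):
find . -name "*.wav" -mmin -10 -print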
Step 17: Run the WebUI (Easy Playground)
Still in the repo root:
Base model UI
python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B
Dialect model UI
python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B-dialect
This is documented in their README; you’ll get a local Gradio app to type multi-turn prompts, pick speakers, and render podcast-style output.
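If you want the WebUI to keep running after you close the SSH session, one common pattern is to launch it in the background with nohup (this assumes the default Gradio port, 7860, which is the one we forward in the next step):
nohup python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B > webui.log 2>&1 &
tail -f webui.log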
Step 18: Set up SSH port forwarding from your local machine
On your local machine (Mac/Windows/Linux), open a terminal and run:
ssh -L 7860:localhost:7860 -p <VM_Port> root@<Your_VM_IP>
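For example, with hypothetical values for the SSH port and VM IP (replace them with the values shown on your GPU node’s Connect page):
ssh -L 7860:localhost:7860 -p 40123 root@203.0.113.10   # hypothetical port and IP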
Step 19: Access Gradio WebUI in Your Browser
Go to:
http://localhost:7860/
Conclusion
SoulX-Podcast-1.7B brings a refreshing leap in long-form, natural speech generation — perfectly suited for podcast creators, storytellers, and research projects exploring expressive, multi-speaker audio. With just a single GPU, you can synthesize dynamic, dialogue-driven conversations across multiple dialects and voices, complete with laughter, sighs, and emotional nuance. Once installed, the model’s WebUI makes experimentation seamless — from monologue TTS to full podcast episodes. Whether you’re building interactive audio experiences or enhancing creative production workflows, SoulX-Podcast-1.7B turns your ideas into rich, lifelike soundscapes ready for the world to hear.