SoulX-Podcast-1.7B is a podcast-style TTS model built for long, multi-turn, multi-speaker dialogs. It supports English, Mandarin, and several Chinese dialects (e.g., Sichuanese, Henanese, Cantonese), does zero-shot voice cloning from short reference clips, and exposes paralinguistic controls (like laughter/sighs) to make conversations feel natural over long durations. It’s aimed at generating full podcast episodes—complete with speaker changes, dialectal variation, and expressive delivery—while still running comfortably on a single modern GPU.
GPU Configuration
| Tier / Use case | Precision | Min VRAM (approx.) | Suggested GPUs | Recommended settings & notes |
|---|---|---|---|---|
| Entry – quick trials & short mono lines (≤2–3 min) | FP16/BF16 | 8–10 GB | RTX 3060 12G, RTX 4060 8–16G, T4 16G | Single speaker, short prompts; keep reference clips clean; avoid batching; lower sampling rate if needed. |
| Standard – multi-speaker shorts, zero-shot cloning, 5–15 min | FP16/BF16 | 12–16 GB | L4 24G, A10 24G, RTX A5000 24G | Good balance for dialog scenes; enable BF16 if supported; limit concurrent speakers; moderate batching. |
| Pro – long-form podcast (30–60 min) with paralinguistics | FP16/BF16 | 24 GB+ | RTX A5000 24G, RTX 6000 Ada 48G, A6000 48G | Longer turns and higher quality vocoding; larger text chunks per turn; safe headroom for caching and retries. |
| Studio – heavy batching, many speakers, tool/WebUI + background jobs | FP16/BF16 | 40–80 GB | A100 40/80G, H100 80G | Parallel episodes or aggressive batching; fastest turnaround; ideal for production queues. |
| Docker / vLLM runtime (optional) | FP16/BF16 | +2–4 GB over baseline | Same as above | Container overhead + server; pin --gpus all; map models via volume to avoid re-download. |
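Before picking a tier, you can check how much VRAM your GPU actually exposes with standard NVIDIA tooling (not specific to SoulX-Podcast):
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv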
Resources
Link: https://huggingface.co/Soul-AILab/SoulX-Podcast-1.7B
Step-by-Step Process to Install & Run SoulX-Podcast-1.7B Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H200s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button in the Dashboard, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
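If you haven't generated an SSH key yet, a typical way to create one on your local machine is shown below (the comment string is just a label of your choice):
ssh-keygen -t ed25519 -C "you@example.com"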
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running SoulX-Podcast-1.7B, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based models like SoulX-Podcast-1.7B.
- Compatibility with CUDA 12.1.1 required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like SoulX-Podcast-1.7B.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel version contains the full CUDA toolkit with nvcc.
This setup ensures that SoulX-Podcast-1.7B runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
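Because we chose a CUDA devel image, you can also confirm that the CUDA compiler is present (assuming nvcc is on the PATH, as it is in the standard nvidia/cuda devel images):
nvcc --version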
Step 8: Install Python 3.11 and Pip (VM already has Python 3.10; we upgrade it)
Run the following command to check the available Python version:
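python3 --version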
If you check the Python version, you'll see that the system has Python 3.10.12 installed by default. To install a newer version of Python, you'll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
apt update && apt install -y software-properties-common curl ca-certificates
add-apt-repository -y ppa:deadsnakes/ppa
apt update
Now, run the following commands to install Python 3.11, Pip and Wheel:
apt install -y python3.11 python3.11-venv python3.11-dev
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel
python3.11 --version
python3.11 -m pip --version
Step 9: Create and Activate a Python 3.11 Virtual Environment
Run the following commands to create and activate a Python 3.11 virtual environment:
python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate
python --version
pip --version
Step 10: Install Miniconda + Create Env
Run the following commands to install Miniconda:
cd /tmp && curl -fsSLo miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash miniconda.sh -b -p $HOME/miniconda
eval "$($HOME/miniconda/bin/conda shell.bash hook)"
Then, run the following commands to accept the Miniconda channel terms of service:
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
Next, run the following commands to create & activate the environment:
conda create -n soulxpodcast -y python=3.11
conda activate soulxpodcast
Step 11: Clone the SoulX-Podcast-1.7B Repo
Run the following command to clone the SoulX-Podcast-1.7B repo:
git clone https://github.com/Soul-AILab/SoulX-Podcast.git
cd SoulX-Podcast
Step 12: Install Requirement & Dependencies
Run the following command to install requirements & dependencies:
pip install -r requirements.txt
Step 13: Install PyTorch for CUDA
Run the following command to install PyTorch:
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
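As a quick sanity check that the CUDA build of PyTorch is installed and can see the GPU, you can run the following (a minimal check, assuming the soulxpodcast environment is active):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"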
Step 14: Install Transformers and Hugging Face Hub
Run the following command to install transformers and huggingface hub:
pip install "transformers==4.57.1" "huggingface_hub<1.0,>=0.34.0"
Step 15: Download the Model Weights
Base model
Run the following command to download the base model:
huggingface-cli download --resume-download Soul-AILab/SoulX-Podcast-1.7B \
--local-dir pretrained_models/SoulX-Podcast-1.7B
Dialect Model (Sichuanese / Henanese / Cantonese etc.)
Next, run the following command to download the dialect model:
huggingface-cli download --resume-download Soul-AILab/SoulX-Podcast-1.7B-dialect \
--local-dir pretrained_models/SoulX-Podcast-1.7B-dialect
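Once both downloads finish, a quick listing confirms that the weights landed in the expected folders (the paths match the --local-dir values used above):
ls -lh pretrained_models/SoulX-Podcast-1.7B pretrained_models/SoulX-Podcast-1.7B-dialect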
Step 16: Quick Test: Example Script (Dialogue Demo)
# Uses their ready-made demo
bash example/infer_dialogue.sh
That’s the basic dialogue inference entry point they provide. It should generate audio files under an outputs/ or similar demo path (check the script for the exact paths).
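If you’re unsure where the demo wrote its audio, a generic way to find recently generated WAV files in the repo is (the exact output folder may differ between versions):
find . -name "*.wav" -mmin -10 -print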
Step 17: Run the WebUI (Easy Playground)
Still in the repo root:
Base model UI
python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B
Dialect model UI
python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B-dialect
This is documented in their README; you’ll get a local Gradio app to type multi-turn prompts, pick speakers, and render podcast-style output.
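If you want the WebUI to keep running after you close the SSH session, one common pattern is to launch it in the background with nohup (this assumes the default Gradio port, 7860, which is the one we forward in the next step):
nohup python3 webui.py --model_path pretrained_models/SoulX-Podcast-1.7B > webui.log 2>&1 &
tail -f webui.log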
Step 18: Set up SSH port forwarding from your local machine
On your local machine (Mac/Windows/Linux), open a terminal and run:
ssh -L 7860:localhost:7860 -p <VM_Port> root@<Your_VM_IP>
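For example, with hypothetical values for the SSH port and VM IP (replace them with the values shown on your GPU node’s Connect page):
ssh -L 7860:localhost:7860 -p 40123 root@203.0.113.10   # hypothetical port and IP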
Step 19: Access Gradio WebUI in Your Browser
Go to:
http://localhost:7860/
Conclusion
SoulX-Podcast-1.7B brings a refreshing leap in long-form, natural speech generation — perfectly suited for podcast creators, storytellers, and research projects exploring expressive, multi-speaker audio. With just a single GPU, you can synthesize dynamic, dialogue-driven conversations across multiple dialects and voices, complete with laughter, sighs, and emotional nuance. Once installed, the model’s WebUI makes experimentation seamless — from monologue TTS to full podcast episodes. Whether you’re building interactive audio experiences or enhancing creative production workflows, SoulX-Podcast-1.7B turns your ideas into rich, lifelike soundscapes ready for the world to hear.