Qwen3 TTS Voice Clone Workflow Guide

Jan 29, 2026

(Updated: 19 days ago)

> ### 🚀 Stop Fighting with Python! (Cloud Experience)

> Forget about dependency hell and massive model downloads.

> Clone any voice 1:1 directly in your browser:

> 🎁 Instant Gift: Get 1000 credits upon signup, about 60 voice clones on us!

> 📅 Daily Loot: Log in daily to grab 100 free credits. Create every day for free.

> ⚡ Zero Friction: Just upload an audio clip. Mobile or PC, everyone can be a voice master.

>

> 👉 [Click Here to Start Cloning Now](https://www.runninghub.ai/post/2014607135769894913/?inviteCode=rh-v1446)

---

Step-By-Step Guide

Workflow description:

This workflow is powered by the latest Qwen3-TTS technology, delivering high-fidelity, low-latency voice cloning. It integrates Whisper ASR to transcribe your reference audio automatically, so the model receives both the clip and its text, which helps the cloned voice match the tone and emotion of the reference. Whether for video dubbing or personalized AI voices, this "Super Fun" tool lets you clone a voice from a short reference clip in seconds.

Prerequisites (For Local Users):

📦 Custom Nodes:

* ComfyUI-Manager (Highly recommended for auto-installing nodes)

* ComfyUI-Audio / AudioCrop (For audio preprocessing)

* ComfyUI-Whisper (For automatic reference text extraction)

* Qwen3-TTS-Nodes (Specialized package for Qwen3)

📂 Files:

* TTS Model: Qwen3-TTS-12Hz-1.7B-Base (in models/qwen3_tts)

* ASR Model: whisper-large-v3 (in models/whisper)
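
For local users, a quick pre-flight check can confirm that both models are where the workflow expects them. The following is only a minimal sketch: the folder names come from the list above, but the function name `missing_models`, the ComfyUI root argument, and the assumption that each model is stored as a file or folder whose name starts with the model name are all illustrative and may not match your install.

```python
from pathlib import Path

# Folder and model names are taken from the Files list above.
# How each model is stored on disk (folder vs. single file) is an assumption,
# so we only check that some entry starts with the model name.
EXPECTED_MODELS = {
    "TTS": ("models/qwen3_tts", "Qwen3-TTS-12Hz-1.7B-Base"),
    "ASR": ("models/whisper", "whisper-large-v3"),
}


def missing_models(comfy_root):
    """Return the labels ("TTS", "ASR") of models not found under comfy_root."""
    missing = []
    for label, (folder, name) in EXPECTED_MODELS.items():
        parent = Path(comfy_root) / folder
        found = parent.is_dir() and any(
            entry.name.startswith(name) for entry in parent.iterdir()
        )
        if not found:
            missing.append(label)
    return missing
```

Calling `missing_models("/path/to/ComfyUI")` returns an empty list when both models are present, which makes it easy to run before opening the workflow.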

Recommended Settings:

* Reference Length: 3-10 seconds is recommended. Clearer audio yields better results.

* Max Tokens: 2048, which leaves room for long-form speech generation.

* Precision: fp16 offers a good balance between speed and quality.
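
To check a reference clip against the recommended 3-10 second range before uploading it, something like the following could be used. It is a sketch that assumes the clip is an uncompressed WAV file readable by Python's standard `wave` module; the dictionary keys in `RECOMMENDED` and both function names are illustrative and are not the actual node parameter names.

```python
import wave

# Values from the Recommended Settings list; the key names are hypothetical.
RECOMMENDED = {"max_tokens": 2048, "precision": "fp16"}
MIN_REF_SECONDS, MAX_REF_SECONDS = 3.0, 10.0


def wav_duration_seconds(path):
    """Duration of a WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_length_ok(seconds):
    """True when the clip falls inside the recommended 3-10 second range."""
    return MIN_REF_SECONDS <= seconds <= MAX_REF_SECONDS
```

For other formats such as MP3, the duration would need to come from a different audio library.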

Node Group Explanation:

* Model Loader: Loads Alibaba's open-source Qwen3-TTS-12Hz-1.7B-Base model, which supports multilingual and emotional synthesis.

* Audio Crop & Whisper: Automatically trims the reference audio and transcribes it. This step is key to voice cloning: the reference text is extracted for you, with no manual typing.

* Voice Clone Core: The engine that fuses your target text with the timbre and emotion of the reference audio, generating highly recognizable speech.
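
The data flow between the node groups can be sketched in plain Python. This is not the nodes' real API: `ClonePrompt`, `crop`, and `build_clone_prompt` are hypothetical names, and `transcribe` is a caller-supplied function standing in for the Whisper node. The sketch only shows how the cropped reference audio, its transcript, and the target text come together before synthesis.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ClonePrompt:
    """Everything the voice clone step needs (hypothetical structure)."""
    ref_audio: Sequence[float]
    ref_text: str
    target_text: str


def crop(samples, sample_rate, max_seconds=10.0):
    """Keep at most max_seconds of audio from the start of the clip."""
    return samples[: int(sample_rate * max_seconds)]


def build_clone_prompt(samples, sample_rate, target_text,
                       transcribe: Callable[[Sequence[float], int], str]):
    """Crop the reference, transcribe it, and bundle it with the target text."""
    ref_audio = crop(samples, sample_rate)
    # `transcribe` stands in for the Whisper step; it is supplied by the caller.
    ref_text = transcribe(ref_audio, sample_rate).strip()
    return ClonePrompt(ref_audio, ref_text, target_text)
```

Passing the transcriber in as a function keeps the sketch independent of any particular ASR library; in the actual workflow this wiring is done by connecting nodes.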
