ghost
Sign In

Stable Diffuson - SadTalker Extension - Make your picture speak

Stable Diffuson - SadTalker Extension - Make your picture speak

Objective

Make your picture speak with the SadTalker extension

Environment

For this test I will use:

Installing SadTalker Component

Go to Extension

Select Install from Url

Write https://github.com/OpenTalker/SadTalker.git

Click Install

After the process you will notice the new component in the installed tab(1).

(2) Press Apply and restart UI

This will restart Stable Diffusion

It will write during the restart:

Installing requirements for SadTalker

SadTalker will not support download all the files from hugging face, which will take a long time.
please manually set the SADTALKER_CHECKPOINTS in webui_user.bat(windows) or webui_user.sh(linux)

Close "Stable Diffusion" and edit your webui_usr.bat bat file

Add the row : set SADTALKER_CHECKPOINTS=your folder

Set the path of your choise.

Then download the models from :

https://github.com/OpenTalker/SadTalker?tab=readme-ov-file

Unzip in a folder you defined before.

Using SadTalker

After restarting Stable Diffusion you will have now a new tab called "Sad Talker"

On the left you have to select and image and a voce (mp3 or wav file).

Generating the mp3

You can register your voice or take other, I will generate the mp3 with one of my tools

https://misterm.itch.io/mr-text-to-speech

It uses OpenAI API technology (you need an API code to use it).

The tools is quite easy.. you write your text, select a voice and press generate.. it will generate a mp3 file after few seconds.

You can also save the text and m3 generated and manage them.

I will write this text (attached you find the voice.zip file)

Welcome to today's tech news update. In a groundbreaking development, the tech world is abuzz with the latest release of the Stable Diffusion plug-in, dubbed SadTalker, which revolutionizes the way we communicate. This innovative tool enables users to articulate their thoughts and express emotions through speech in a more nuanced and authentic manner than ever before. Stay tuned as we delve deeper into how SadTalker is reshaping the landscape of spoken communication.

I have generated an image for a TV studio

TV woman sexy dress, presenter on TV, sitting speaking abou the news
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 541953647, Face restoration: GFPGAN, Size: 768x512, Model hash: e73d775ff9, Model: theTrualityEngine_trualityENGINEPRO, Denoising strength: 0.7, ENSD: 31337, Token merging ratio: 0.1, Token merging ratio hr: 0.1, Hires upscale: 2, Hires steps: 33, Hires upscaler: Latent, Eta: 0.5, Version: v1.7.0

1° TEST

Now I copy and paste the image on the Sadtalker Image and also I select my mp3 file

Then leave the default options, Set the flag GFPGAN as Face enhancer

Press Generate

Whit these option you will get only the cropped fase.. You can check the attached file face.zip with the mp4.

To check where are the files you can look at the console.

There are the path for the files.

Normally is the folder xx:\StableDiffusion\stable-diffusion-webui\results\

Depending of the length of the mp3 the generation can take long time.

If you access to the result folder you will fine a new folder , there are all the files used during the process. Mp3 , video of only the face, video with voice.

Whe it finish you get the full video.

Attention the flag GFPGAN as Face enhancer it makes the generation very slow because if process twice everything.

When it finish

Check out the output at : https://civitai.com/posts/1378394

Sometimes you can notice that the procedure only process a square of the face. It is not perfect.

I hope you enjoy the article. Let me know if you have any question in the comments.

8

Comments