Geeky Kokoro TTS Animation Station

Hey Ghosties!

Geeky Ghost here, just dropping in to explain the workflow a little. I use Geeky Kokoro TTS, Sonic, Flux Schnell, Live Portrait, and Hunyuan i2v to make short videos of people talking with some additional animation. It's a work in progress, but it already produces some cool results. Let's dive right in, shall we?

Step 1) We generate an image to start with. I personally use a merged and customized checkpoint of Flux Schnell, but you are free to use whatever you like. If you want, you can find my Flux checkpoint here:

https://civitai.com/models/1324705/geeky-flux-schnell-tweaked-and-merged-model

Now we have our starting image, generated with this prompt:

Side view close-up of a woman in a business suit walking while looking directly at you. Woman with brown hair and brown eyes in a blue business suit and looking directly at you. She has her hair up in a high ponytail. She's in her 30's. She's on the left side of the scene walking to the right. The background is a long side view of a hallway in an office building. The hallway goes from left to right. Half body shot.

Cinematic, detailed, crisp and clear photo quality image.

Step 2) Next we generate a video from that image using the Hunyuan Image2Vid model.

Now we have our video, using the same prompt:

Side view close-up of a woman in a business suit walking while looking directly at you. Woman with brown hair and brown eyes in a blue business suit and looking directly at you. She has her hair up in a high ponytail. She's in her 30's. She's on the left side of the scene walking to the right. The background is a long side view of a hallway in an office building. The hallway goes from left to right. Half body shot.

Cinematic, detailed, crisp and clear photo quality image.

https://civitai.com/images/62318812

Step 3) Next we generate the voice. I like a mix of the Heart and Nicole voices at a 0.5 blend ratio.

Now we have our voice.
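To give a feel for what a 0.5 blend ratio means, here is a minimal sketch that mixes two voices as a weighted average of their style vectors. This is an assumption about how blending works in general, not a description of what the Geeky Kokoro TTS node does internally. The function name blend_voices, the toy three-value vectors, and the choice that the ratio weights the second voice are all hypothetical.

```python
def blend_voices(voice_a, voice_b, ratio=0.5):
    """Linearly mix two equal-length style vectors.

    ratio=0.0 returns voice_a, ratio=1.0 returns voice_b (assumed convention).
    """
    if len(voice_a) != len(voice_b):
        raise ValueError("voice vectors must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return [(1.0 - ratio) * a + ratio * b for a, b in zip(voice_a, voice_b)]


# Toy stand-ins for the real voice embeddings (hypothetical values).
heart = [0.0, 1.0, -2.0]
nicole = [1.0, 3.0, 2.0]

blended = blend_voices(heart, nicole, ratio=0.5)
print(blended)  # [0.5, 2.0, 0.0]
```

At 0.5 both voices contribute equally. Moving the ratio toward 0 or 1 would lean the result toward one voice or the other.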

Step 4) Now we make the lip sync video.

I use Sonic to make the lip sync, loading the audio and image we created in the previous steps.

https://civitai.com/images/62318784

Step 5) Putting it all together with Live Portrait.

Using the Sonic video as the driver and the Hunyuan video as the source, we run both through Live Portrait to merge the two. For Live Portrait to work, we have to cap the frame count of the source video to that of the driver. It also doesn't handle distant faces well, so keep that in mind.

https://civitai.com/images/62320624
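The frame cap in Step 5 can be sketched in a few lines. This is only an illustration: it treats each video as a plain Python list of frames, and the name cap_source_to_driver is hypothetical, not a node from the workflow.

```python
def cap_source_to_driver(source_frames, driver_frames):
    """Return the source frames trimmed to the driver's frame count.

    If the source is already shorter than the driver, it is returned
    unchanged; this sketch does not trim the driver.
    """
    return source_frames[:len(driver_frames)]


# Toy example: a 10-frame Hunyuan clip and a 6-frame Sonic clip.
hunyuan_frames = list(range(10))
sonic_frames = ["s0", "s1", "s2", "s3", "s4", "s5"]

capped = cap_source_to_driver(hunyuan_frames, sonic_frames)
print(len(capped))  # 6
```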

Step 6) Full Video

We concatenate the Live Portrait video with the leftover frames from the Hunyuan video to create the full video.
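As a rough sketch of that concatenation, again treating videos as plain lists of frames: the Live Portrait frames come first, followed by whatever Hunyuan frames were left over. It assumes the Live Portrait output has one frame per source frame it processed and that both clips share the same frame rate. The name build_full_video is hypothetical.

```python
def build_full_video(live_portrait_frames, hunyuan_frames):
    """Append the Hunyuan frames that Live Portrait did not process."""
    used = len(live_portrait_frames)
    leftover = hunyuan_frames[used:]  # empty if nothing is left over
    return list(live_portrait_frames) + list(leftover)


# Toy example: 5 Hunyuan frames, of which the first 3 were lip-synced.
hunyuan = ["h0", "h1", "h2", "h3", "h4"]
live_portrait = ["lp0", "lp1", "lp2"]

full = build_full_video(live_portrait, hunyuan)
print(full)  # ['lp0', 'lp1', 'lp2', 'h3', 'h4']
```

The full video ends up the same length as the original Hunyuan clip, with the talking section up front.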

The Full Workflow

https://civitai.com/models/1325787?modelVersionId=1508281
