You already know Ken from my last post. There, he met Barbie. This is Barbie:

OK, that's not really Barbie, but Ken is colorblind. If you want, you can use the other workflow to reconstruct what Barbie looks like in a bikini.
Ken and Barbie want to meet here

to dance. Ken arrives first.

Barbie comes a little later.

They walk towards each other, start dancing, and sparks fly fiercely.

To get both into the ballroom, the following workflow is sufficient:

As you can see, it's again just a single screen of nodes. As before, the people are detected and masked with YOLO and then transferred into the image via IP-Adapter. In detail: load the target image and the image with the desired people, which you can select via the picker. Set Save Image to Bypass and adjust the edges via Pad Image for Outpainting until the person fits the image well in size and position. Since Save Image is inactive, only the control image is generated, so testing goes very quickly. The dimensions of the second image don't matter; the nodes fix that for you.
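The padding step is really just arithmetic. Here is a minimal sketch (plain Python, function name and parameters are my own, not part of the workflow) of how the four Pad Image for Outpainting values relate to where the person should land in the target canvas:

```python
def pad_for_outpainting(src_w, src_h, target_w, target_h, left, top):
    """Compute (left, top, right, bottom) padding that places a
    src_w x src_h person image at offset (left, top) inside a
    target_w x target_h canvas."""
    right = target_w - src_w - left
    bottom = target_h - src_h - top
    if right < 0 or bottom < 0:
        raise ValueError("person image does not fit at that position")
    return left, top, right, bottom

# Example: place a 256x512 crop near the lower right of a 1024x1024 canvas
print(pad_for_outpainting(256, 512, 1024, 1024, left=700, top=512))
# -> (700, 512, 68, 0)
```

In the workflow you adjust these values interactively by eye, which is exactly why bypassing Save Image for fast previews pays off.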
When you're done, activate Save Image. The pipeline is now closed: the target image and the source image are transferred to latent space, the mask selects the desired part of the person image in latent space, and the IP-Adapter ensures in the sampler that everything fits. Use any SDXL model and a denoise of ~0.1. With GrowMaskWithBlur, you can again play around until the halos are gone while the fine details are still preserved.
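Conceptually, GrowMaskWithBlur dilates the mask a little and then softens its edge, so the composite blends instead of cutting hard. A rough pure-Python sketch of that idea (not the node's actual implementation, and much slower than the real thing):

```python
def grow_mask(mask, grow=1):
    """Dilate a binary 2D mask by `grow` pixels: a pixel becomes 1 if any
    neighbor within the grow radius is 1."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-grow, grow + 1):
                for dx in range(-grow, grow + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                        out[y][x] = 1
    return out

def blur_mask(mask, radius=1):
    """Box-blur the mask so the edge fades from 1.0 to 0.0 instead of
    jumping, which is what kills the halos in the composite."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, n = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += mask[ny][nx]
                        n += 1
            out[y][x] = total / n
    return out

# A single masked pixel grows into a soft 3x3 blob
mask = [[0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0]]
soft = blur_mask(grow_mask(mask, grow=1), radius=1)
```

Growing too much reintroduces halos from the source background; blurring too much lets details bleed away. That is the trade-off you're tuning by eye in the node.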
When the image with Ken is ready, use it as the destination and insert Barbie in it.
Now they are both in the image, but their poses are suboptimal. For the second part, we simply use WAN-Video.

They should dance, and off we go. We break down the MP4 into an image batch and pick out the desired final image.
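If you prefer to do the frame extraction outside of ComfyUI, ffmpeg can dump the MP4 into numbered images (e.g. `ffmpeg -i wan_out.mp4 frames/%05d.png`), and picking the final frame is then just sorting filenames. A minimal sketch, where `wan_out.mp4` and the frame naming are my assumptions:

```python
import pathlib
import tempfile

def last_frame(frame_dir):
    """Return the path of the last frame in a directory of numbered PNGs,
    e.g. as produced by: ffmpeg -i wan_out.mp4 frames/%05d.png"""
    frames = sorted(pathlib.Path(frame_dir).glob("*.png"))
    if not frames:
        raise FileNotFoundError("no frames found")
    return frames[-1]

# Quick self-test with dummy files standing in for extracted frames
with tempfile.TemporaryDirectory() as d:
    for i in (1, 2, 10):
        (pathlib.Path(d) / f"{i:05d}.png").touch()
    print(last_frame(d).name)  # -> 00010.png
```

Zero-padded names matter here: without them, a plain string sort would put `10.png` before `2.png`.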
Here again, everything, and especially the WAN-Video step, is quick and dirty, using a stripped-down Q4 GGUF version, because it had to go fast (the sparks between them were not intended, but a result of the haste). Of course, you can do all this much more carefully than I did in this demo: upscale or refine the images, or even use RIFE to produce a slow-motion video.
If you want to try it out yourself, I recommend for your project: "Ken meets Bikini-Barbie at the beach."
This workflow and the previous one are:
100% automatic person detection (YOLOv8-seg)
Multi-person safe
Beginner-friendly (one click for the head when needed)
Built for SDXL (Juggernaut, RealVis, etc.)
Only require free custom nodes (Impact Pack + KJNodes)
Included as a clean JSON workflow
Whether you’re composing scenes for storytelling, ads, or fun – this workflow makes it dead simple.
Ken (and Barbie) approve. Now it’s your turn.
Download the JSON, drop your own photos in, and watch the magic happen.
Have fun – and don’t forget the dance moves! 💃🕺

