
Using Backgroundless Stock for Controlnet

Introduction

This is going to be a bunch of stuff you probably already know. But you might not, and if you don't, it could be useful.

Buts and Nolts

Sometimes I want a certain sort of pose that I have trouble specifying in a prompt. For that, there is controlnet. But I've been frustrated, at times, with the inputs into my controlnet: where do they come from?

Probably the best option (right now) seems to me to be: buy a doll, pose it against a mini sweep, photograph it, and there's your controlnet. That's super easy to do. Slightly more complicated is to get something like Daz 3D or Poser, pose digital dolls, then render or screenshot them, and there's your controlnet.

Now, let's move to simpler: pull stock photos from DeviantArt. For example, on my DA profile [https://www.deviantart.com/mjranum-stock] are thousands of pieces of stock photography, mostly shot against a neutral sweep. When I originally did them, years ago, I was producing them as references for Photoshop artists, for compositing or paint-over. For example: [https://www.deviantart.com/mjranum-stock/art/Magical-and-Mystical-21-691708861]

You can load that directly into a controlnet and render from it, but - wait - there's more. Here are a few things I have found that make controlnet and stable diffusion happier.

It seems that removing the background helps. There are tons of ways of doing that, and let me emphasize, it does not have to be a clean, great job. Usually, I:

  • open Photoshop

  • load the image

  • select/copy the whole image

  • paste it into a new canvas

  • use the magic selection tool to pick the background

  • hit delete to remove the background

  • save as a PNG

That way the AI seems to do a better job of figuring out what's what. You can also use the "rembg" (remove background) extension in stable diffusion.
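If you would rather script the background removal than click around in Photoshop, the standalone "rembg" Python package does roughly the same job as the webui extension. This is just a minimal sketch, assuming rembg and Pillow are installed; the file names are placeholders:

    # Minimal sketch: strip the background from a stock photo with rembg.
    # Assumes `pip install rembg pillow`; the file names are placeholders.
    from rembg import remove
    from PIL import Image

    stock = Image.open("stock_pose.jpg")       # the stock photo
    cutout = remove(stock)                     # RGBA image with the background knocked out
    cutout.save("pose_no_background.png")      # PNG keeps the transparency

Like the Photoshop route, it does not have to be perfect; a rough cutout is plenty for a controlnet input.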

To make my life easier, I decide in advance what size I intend to render the image at, make a canvas that size, and then do the copy/paste assembly.
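The same assembly can be scripted with Pillow if you prefer. A rough sketch, assuming a 768x1024 render size and the cutout PNG from the previous step (the size and file names are just examples):

    # Sketch: paste a background-free cutout onto a blank canvas at the render size.
    # The size and file names are examples, not anything canonical.
    from PIL import Image

    RENDER_SIZE = (768, 1024)                  # decide the render size up front

    canvas = Image.new("RGBA", RENDER_SIZE, (255, 255, 255, 255))   # plain white canvas
    cutout = Image.open("pose_no_background.png").convert("RGBA")
    cutout.thumbnail(RENDER_SIZE)              # scale to fit without distorting

    # center the figure; shift the offsets to recompose the scene however you like
    x = (RENDER_SIZE[0] - cutout.width) // 2
    y = (RENDER_SIZE[1] - cutout.height) // 2
    canvas.paste(cutout, (x, y), mask=cutout)  # the alpha channel acts as the mask
    canvas.convert("RGB").save("controlnet_input.png")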

This technique is closely related to "canny bashing", where you download the output from a canny preprocessor run, futz with it in Photoshop, and then use that as your controlnet (set the preprocessor to "none" and use the canny model for the controlnet). This is a cool technique for more complex scenes, such as [https://www.reddit.com/r/StableDiffusion/comments/13ucpmn/the_throne_room_composite/] It's crazy how well this works: if you draw a child-style pair of boobies on a piece of stock and tell the AI you want boobies, it'll generate them perfectly. But this is a story for another time.
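If you want to generate the canny image yourself instead of downloading the preprocessor's output, OpenCV does it in a couple of lines. A sketch, assuming opencv-python is installed; the thresholds are only a starting point:

    # Sketch: make an editable canny edge map from the composited input image.
    # The thresholds (100, 200) are a starting point; raise them to drop noisy edges.
    import cv2

    img = cv2.imread("controlnet_input.png", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 100, 200)           # white edges on black, like the canny preprocessor
    cv2.imwrite("canny_to_bash.png", edges)    # open this in Photoshop and draw in the missing bits

Feed the edited result back in as the controlnet image, with the preprocessor set to "none" and the canny model selected.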

Here is what "rembg" did with the image above:

Whups, lost the sword hilt. So generate a canny and draw it in.

You get the idea! If you wanted her to be throwing a fireball, Google image search for "fireball" and paste one onto the blank background, and there's your controlnet.

Worked Example

I find that when doing this, it works best to give the AI lots of rounds to sort out the image. Typically I use 70 or 80 sampling steps. Then a lower number on the upscaler, say 8 or 10, with denoising strength around 0.3 or 0.4. The upscaler is what takes all the time. When I am prototyping the image, I leave upscaling off, do quick turn-around passes on the main image, and then upscale when I think I may have something.
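Those settings translate into something like the following if you drive the webui through its API instead of the UI. A rough sketch, assuming the AUTOMATIC1111 webui running locally with the API enabled; the exact field names can vary by version:

    # Sketch of the settings above as an AUTOMATIC1111 webui API call.
    # Assumes the webui is running locally with --api; field names can vary by version.
    import requests

    payload = {
        "prompt": "beautiful german sniper woman ...",   # the layered prompt goes here
        "steps": 80,                    # lots of sampling steps on the base image
        "width": 768,
        "height": 1024,
        "enable_hr": True,              # the upscaler pass; leave off while prototyping
        "hr_scale": 2,
        "hr_second_pass_steps": 10,     # fewer steps on the upscale pass
        "denoising_strength": 0.35,     # roughly the 0.3-0.4 range mentioned above
    }
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
    r.raise_for_status()
    images = r.json()["images"]         # base64-encoded results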

Next I zap the background, or don't, or paint other junk into the background. Keep it simple. Then start layering the prompt onto it:

(masterpiece:1.3), beautiful german sniper woman, 18yo, tall, skinny. carrying sniper rifle. (highly detailed skin, long messy gray hair) ((cute)) (detailed face and eyes) (sexy) (perfect face) captivating smile. freckles on nose and cheeks.
-
((sci fi)), standing in post apocalyptic military bunker
-
(((gray combat helmet, gray combat boots, gray combat harness))), (combat camouflage paint). barrett .50. (white and gray) (wearing high tech skin suit, active camouflage:1.4). eyepatch.
-
(insanely detailed) (highest quality, 4k), photographed on a Sony Alpha 1, EF 85mm lens, f/2.8, hard focus, (photorealistic:1.4) sharp focus
-
(full length view:1.5)

I use the breakdown with dashes so it's easier for me to figure out what I am doing with the prompt. I often copy the main blocks, like the one that defines the image specs:

(insanely detailed) (highest quality, 4k), photographed on a Sony Alpha 1, EF 85mm lens, f/2.8, hard focus, (photorealistic:1.4) sharp focus

Since I use that everyplace, I just copy/paste. I have a standard negative prompt that I use.

Here's Another One

My prompt is:

(masterpiece:1.3), beautiful pirate, 18yo, tall, (buxom:1.3) blonde. with cutlass and corset. (highly detailed skin, long messy blonde hair) ((cute)) (detailed face and eyes) (sexy) (perfect face) captivating smile, open mouth, freckles on nose and cheeks.

-

holding a steampunk spyglass. ship is a modified shopping cart, on the high seas. background nautical caribbean, bright sunlight and clouds.

-

(insanely detailed) (highest quality, 4k), photographed on a Sony Alpha 1, EF 85mm lens, f/2.8, hard focus, (photorealistic:1.4) sharp focus

cinematic still from pirates of the caribbean

-

(perfect hands:1.3)

Set that as a controlnet and you get something like this:
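If you are scripting it, the controlnet image rides along in the same kind of API payload shown earlier. A very rough sketch, assuming the AUTOMATIC1111 ControlNet extension; the field names and the model name are assumptions and vary between versions:

    # Sketch: attach the backgroundless composite as the controlnet input via the webui API.
    # Assumes the AUTOMATIC1111 ControlNet extension; field and model names vary by version.
    import base64
    import requests

    with open("controlnet_input.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    payload = {
        "prompt": "(masterpiece:1.3), beautiful pirate, ...",   # the prompt above, abbreviated
        "steps": 80,
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "input_image": image_b64,             # the stock composite
                    "module": "canny",                    # let the extension run the canny preprocessor
                    "model": "control_v11p_sd15_canny",   # placeholder model name
                    "weight": 1.0,
                }]
            }
        },
    }
    requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)

In the UI, that's just the ControlNet image slot plus a preprocessor and model.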

That's It!

Let me close by encouraging you to look at stock images on DeviantArt. You can also use Google image search.

The last technique, which I haven't gone into, is to ask Midjourney to make you an image!

Prompt:

a dynamic knife fighting pose, shaolin monk

Now mess with him in Photoshop and he's your new controlnet.
