ControlNet Quick Notes #1: Accurate Geometry w/ OpenPose & Depth Map Background

What you're about to read: A guide to a quick one-step txt-img generation using an OpenPose file and a background depth map in ControlNet for Automatic1111. This is easy as shit!

Txt-img output:

Lara in sweatpants striding through Croft Manor main hall (she cannot find Winston, who she now suspects has accidentally locked himself in the freezer).

Requirements

ControlNet (if not yet installed, follow the installation guide here: https://github.com/Mikubill/sd-webui-controlnet).
Activate multi ControlNet in Settings -> ControlNet -> Multi ControlNet: Max models amount. This guide uses two ControlNet units, so the value needs to be at least 2; I have this set to 4. Once you've set a value, you may have to restart Automatic1111.

Getting Pose & Background Ready

What you'll need:

An OpenPose file that reflects the character pose you have in mind. There is an ever-growing pose library on this platform to pick from; or use the OpenPose Editor extension to detect a pose from an image file; or, if you're feeling creative, construct your own pose with the 3D OpenPose extension.
A background depth map. How to get one? You could use ControlNet to create a depth map of a background you like. What you'll get is an impressive yet likely (significantly) flawed approximation of 3D depth based on a 2D input image. I've recently started generating depth maps in Blender that accurately reflect the depth of 3D objects (check this place for additions every now and then: https://civitai.com/models/130562/background-depth-maps), so this is what I'll choose from for this purpose.

You'll then need to match pose and background compositionally in Photoshop (or one of the alternatives), unless they're already perfectly matched as they are. This may include cutting out a specific part of the depth map and/or a specific part of the pose. In this case, I'm choosing this depth map of the Croft Manor main hall (Tomb Raider Underworld, I believe):

...before cutting out a section I like, in the aspect ratio I like, resizing it to 1152x768, and then flipping it horizontally:
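
The cropping step comes down to simple arithmetic. Here is a minimal sketch, assuming you want the largest centered crop at the target aspect ratio. In practice you'd pick the section by eye. The function name and the 2048x2048 source size are made up for illustration.

```python
def centered_crop_box(src_w, src_h, target_w, target_h):
    """Return (left, upper, right, lower) of the largest centered crop
    of a src_w x src_h image with the aspect ratio target_w:target_h."""
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if src_w * target_h > src_h * target_w:
        # Source is too wide: keep the full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * target_w // target_h
    else:
        # Source is too tall (or already matches): keep the full width.
        crop_w = src_w
        crop_h = src_w * target_h // target_w
    left = (src_w - crop_w) // 2
    upper = (src_h - crop_h) // 2
    return (left, upper, left + crop_w, upper + crop_h)


# Hypothetical 2048x2048 depth map, cropped for a 1152x768 (3:2) generation.
box = centered_crop_box(2048, 2048, 1152, 768)
print(box)
```

If you script this with Pillow instead of an image editor, a box like this could be passed to Image.crop(), followed by Image.resize((1152, 768)) and ImageOps.mirror() for the horizontal flip.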

After that, I'm cutting & resizing the file "3-pose" from "Walking Poses" (https://civitai.com/models/107427?modelVersionId=122035 -> from_front folder) from this:

...to this, to reflect where I want the character to appear in my 1152x768 txt-img generation:

(You can use your preferred photo editing software to overlay the 2 images to find the exact spot where you want your character to be placed)
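
If you'd rather work out the placement numerically than by eye, here is a rough sketch. It assumes the pose image is already cropped tightly around the skeleton. The function name, the 400x800 pose size and the placement fractions are all hypothetical.

```python
def place_pose(pose_w, pose_h, canvas_w, canvas_h,
               height_frac, center_x_frac, bottom_y_frac):
    """Scale a tightly cropped pose image to height_frac of the canvas height.

    Returns ((new_w, new_h), (left, top)) so that the pose is horizontally
    centered at center_x_frac and its bottom edge sits at bottom_y_frac
    (both given as fractions of the canvas size).
    """
    new_h = round(canvas_h * height_frac)
    new_w = round(pose_w * new_h / pose_h)  # keep the pose's aspect ratio
    left = round(canvas_w * center_x_frac - new_w / 2)
    top = round(canvas_h * bottom_y_frac) - new_h
    return (new_w, new_h), (left, top)


# Character fills 75% of the frame height, centered, feet near the bottom edge.
size, offset = place_pose(400, 800, 1152, 768, 0.75, 0.5, 0.95)
print(size, offset)
```

The resized pose would then be pasted at that offset onto a black 1152x768 canvas, since OpenPose control images use a black background.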

ControlNet Parameters

ControlNet Unit 0:

Drag & drop the Depth Map here.
Check "Enable", Control Type "Depth", set Preprocessor to "none", Model should be "control_v11f1p_sd15_depth" or whichever version you're using. Leave everything at default (Starting Control Step 0, Ending Control Step 1, Control Mode "Balanced", Resize Mode "Crop and Resize") except for Control Weight, which should be somewhere between 0.3 and 0.5 (of course you can experiment with higher/lower values too). I've only tinkered with the weights a bit and there is no definitive answer: the "sweet spot" where the models respect the silhouette and detail of the posed character while also maintaining background structure from the depth map likely depends on the level of contrast and the prominence of structures and objects in the depth map. If Control Weight is set too low, your background may lose some of its initial geometry. If Control Weight is set too high, a box, a chair or a plant from "behind" the posed character may be brought to the foreground and attached to the character as an unwanted object, or a part of the character could be cut off entirely. I like to start with Control Weight at 0.4.
Modifying the contrast and brightness of your depth map definitely has a strong effect. The less contrast there is, the more the model will take over in the absence of structure.
(I want to keep this tutorial at a minimum number of steps, but you can definitely avoid some of these issues by outlining and filling out the shape of your character -- typically in something close to white, if the character is at the front of the image -- in the depth file.)
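
To make the contrast/brightness point concrete, here is a small sketch of what such an adjustment does to individual 8-bit depth values. It uses a simple linear adjustment around mid-grey. Image editors may use different curves, so treat the formula as an assumption.

```python
def adjust_depth_value(value, contrast=1.0, brightness=0):
    """Adjust one 8-bit depth value (0 = far/black, 255 = near/white).

    contrast scales the distance from mid-grey (128); brightness is added
    afterwards. The result is clamped to the 0..255 range.
    """
    adjusted = (value - 128) * contrast + 128 + brightness
    return max(0, min(255, round(adjusted)))


row = [0, 64, 128, 192, 255]
low_contrast = [adjust_depth_value(v, contrast=0.5) for v in row]
high_contrast = [adjust_depth_value(v, contrast=1.5) for v in row]
print(low_contrast)
print(high_contrast)
```

With lower contrast the values bunch up around mid-grey, so the depth map carries less structure for ControlNet to hold on to.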

ControlNet Unit 1:

Drag & drop the OpenPose file here.
Check "Enable", Control Type "OpenPose", set Preprocessor to "none", Model should be "control_v11p_sd15_openpose" or whichever version you're using. Leave everything at default (Control Weight should be at 1, Starting Control Step 0, Ending Control Step 1, Control Mode "Balanced", Resize Mode "Crop and Resize").
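
For reference, here are the two units written out as data, in the shape you might send through the web UI's API instead of clicking through the interface. This is only a sketch: the guide itself uses the UI, the field names are assumed from the ControlNet extension's API documentation and may differ between versions, and the image strings are placeholders.

```python
import json


def controlnet_unit(model, weight, image_b64):
    """One ControlNet unit with the settings from this guide.

    Field names are an assumption; check them against your installed version.
    """
    return {
        "enabled": True,
        "image": image_b64,            # some versions name this "input_image"
        "module": "none",              # Preprocessor "none"
        "model": model,
        "weight": weight,              # Control Weight
        "guidance_start": 0.0,         # Starting Control Step
        "guidance_end": 1.0,           # Ending Control Step
        "control_mode": "Balanced",
        "resize_mode": "Crop and Resize",
        "pixel_perfect": False,
    }


units = [
    controlnet_unit("control_v11f1p_sd15_depth [cfd03158]", 0.4,
                    "<base64 depth map>"),
    controlnet_unit("control_v11p_sd15_openpose [cab727d4]", 1.0,
                    "<base64 pose image>"),
]

# Fragment of a txt2img request body; prompt and sampler settings are omitted.
payload = {
    "width": 1152,
    "height": 768,
    "alwayson_scripts": {"controlnet": {"args": units}},
}
print(json.dumps(payload, indent=2))
```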

Prompting

A couple of things to pay attention to:

Bring objects/surfaces that are dominant or should show prominently in the background close to the front of the prompt and/or apply stronger weights. Accessories deep in the background, such as books or candles in my example, should have lower weights.
As you tinker with the depth model Control Weight, your prompt can counter some of the distortions the background depth map can generate on your character. For example, if your character is walking down stairs, these prominent horizontal lines in the depth map (the stairs) might lead to a thigh strap, cropped off pants or a knee brace on your posed character. If, for example, you put "thigh_strap" into the negatives and give "sweatpants" (to use my example) a stronger weight in your prompt, you could avoid having to lower Control Weight on the depth map too much. Obviously this is an inexact science.
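
The weighting syntax behind these tips can be sketched in a few lines. In Automatic1111, "(tags:1.3)" sets an explicit weight, each pair of parentheses without a number multiplies attention by 1.1, and each pair of square brackets divides it by 1.1. The helper names below are made up for illustration.

```python
def weighted(tags, weight):
    """Format a group of tags with an explicit weight, e.g. "(a, b:1.3)"."""
    return f"({', '.join(tags)}:{weight})"


def nested_weight(depth, bracket="("):
    """Effective weight of `depth` nested brackets without a number.

    "(" multiplies attention by 1.1 per level, "[" divides by 1.1 per level.
    """
    factor = 1.1 if bracket == "(" else 1 / 1.1
    return round(factor ** depth, 3)


foreground = weighted(["grey hoodie", "sweatpants"], 1.3)
background = weighted(["couch", "chair", "books"], 0.8)
print(foreground, background)
print(nested_weight(2), nested_weight(3))
```

So "((disfigured))" in a negative prompt is roughly the same as "(disfigured:1.21)".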

Image Generation Parameters

I haven't come across anything noteworthy here. Do as you always do and hit generate. Width and height should match the aspect ratio of the ControlNet pose and depth images.
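
A quick way to sanity-check the aspect ratio before generating, as a minimal sketch (the helper names are hypothetical):

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce width:height to lowest terms, e.g. 1152x768 -> (3, 2)."""
    g = gcd(width, height)
    return (width // g, height // g)


def same_aspect(gen_size, control_size):
    """True if the generation size and the control image share an aspect ratio."""
    return aspect_ratio(*gen_size) == aspect_ratio(*control_size)


print(aspect_ratio(1152, 768))
print(same_aspect((1152, 768), (768, 512)))
print(same_aspect((1152, 768), (1024, 1024)))
```

If the ratios differ, Resize Mode "Crop and Resize" will crop the control image to fit, which can shift or cut off the pose.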

Example Generation

Prompt:

<lora:strankorbensen:0.4>, (laracroft:0.3), lara_croft, brown_hair, (walking:1.5), (grey hoodie, sweatpants:1.3), (parted_lips, angry, groin:1.2), (indoors ancient hall, stone floor, stone walls, ancient british architecture:1.2), (couch, chair, table, chandelier, bookshelf, stairs, plant, books:0.8), intricate, masterpiece, best quality, highly detailed, brown hair, (action scene), dreamlikeart, freckles, nostalgia, sexy, asymmetrical_hair, absurdres, enormous hair, messy hair, 1girl, wide hips, messy_hair, thick_eyebrows, brown eyebrows, thick_lips, long_hair, head_tilt, depth of field, dynamic pose, dramatic angle, unreal engine, 8k, highly detailed, photo, photorealistic, hyperrealistic, cinematic lighting, cinematic composition, beautiful lighting, sharp, details, hdr, 4k

Negative prompt: (bare_shoulders, cleavage, bad-picture-chill-75v:1.7), big tits, big breasts, abs, muscles, cartoon, ((disfigured)), ((bad art)), ((deformed)), ((extra limbs)), ((close up)), ((b&w)), weird colors, blurry, (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy

Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 8, Seed: 405807476, Face restoration: CodeFormer, Size: 1152x768, Model hash: 542e4b0dda, ControlNet 1: "preprocessor: none, model: control_v11f1p_sd15_depth [cfd03158], weight: 0.4, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced, preprocessor params: (512, 100, 200)", ControlNet 2: "preprocessor: none, model: control_v11p_sd15_openpose [cab727d4], weight: 1, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced, preprocessor params: (512, 100, 200)", Lora hashes: "strankorbensen: 166430eecbea", TI hashes: "laracroft: 7c921eb41fe6", Version: v1.5.1
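
If you want to pull individual settings out of a parameter line like the one above, something along these lines should work. It is a rough sketch that assumes the general "Key: value, Key: "quoted value"" layout. It is not the web UI's own parser, and the sample string is a shortened copy of the line above.

```python
import re

# A key, a colon, then either a double-quoted value (which may itself
# contain commas and colons) or a plain value that runs up to the next comma.
PARAM_RE = re.compile(r'\s*([\w ]+?):\s*("(?:[^"\\]|\\.)*"|[^,]*)(?:,|$)')


def parse_parameters(line):
    """Split a generation-parameters line into a dict of strings."""
    params = {}
    for key, value in PARAM_RE.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        params[key] = value.strip()
    return params


sample = (
    'Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 8, Size: 1152x768, '
    'ControlNet 1: "preprocessor: none, model: control_v11f1p_sd15_depth '
    '[cfd03158], weight: 0.4", Version: v1.5.1'
)
params = parse_parameters(sample)
for key, value in params.items():
    print(f"{key} = {value}")
```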

Model: Comimicry (https://civitai.com/models/105172)

Output grid of 9 images: