My Upscaling Workflow

Here I walk you through my current upscaling workflow, i.e. how I turn images generated at 512x512 into finished 2048x2048 images.

A lot of details, especially appropriate denoising strengths, will vary with different setups and as new and better options become available.

Note that to use ControlNet, Ultimate SD Upscale and 4x-UltraSharp, you need to install or copy the appropriate files first. If you are new to inpainting, you may want to read or watch a tutorial first.

1) Prompt base images

I start with simple prompts, then add to and tweak the prompts and weights (of keywords and LoRAs) several times until I have a small collection of interesting results. These are always 512x512 pixels. (For more information on my setup and approach to prompting, see my other articles.)

2) Select images for upscaling

I primarily look for interesting compositions, poses, backgrounds and good lighting. Except for extreme close-ups, I'm not worried about the face (which tends to get improved automatically during upscaling and parts of which I will always inpaint at the end anyway). Pictures with abnormal anatomy are sometimes worth trying to salvage. Pictures without hands often require no tweaking at all.

3) Tweak image before upscaling

If there are hands in the picture, I almost always inpaint them. For this, I use "send to inpaint" from the "Infinite image browsing" tab. Note that the "send to" function imports the image's prompt, sampler, sampling steps, seed and resolution. The last point is especially important. I usually try to finish images in one go, but when sending a picture that has already been upscaled, it's important to set the resolution back down to 512x512. Inpainting at higher resolutions tends to give worse results, is very VRAM-intensive and is hard to cancel once in progress. In other words, it's a waste of time.

The one exception I've found is inpainting faces with the Cyberrealistic checkpoint in a 2048x2048 image. Here, inpainting at 1024x1024 seems to give better results. Note that results vary with the ratio of inpainting resolution to mask area (larger areas may require higher resolutions).

To improve hands, I add the following at the end of the regular negative prompt:

, bad anatomy, disfigured, mutated, ugly, poorly drawn hands, malformed hands, mutated hands and fingers, bad hands, missing fingers, bad hands, fused hand, missing hand, missing fingers, fused fingers, abnormal hands, abnormal fingers

This seems to improve results slightly for me. After pasting this, I cut the regular prompt (the pure text prompt, excluding LoRAs) and type "female hand" instead. I also reduce or remove LoRAs that are not relevant to hands (such as outfit LoRAs).
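The prompt swap above can be sketched in Python. This is a minimal illustration, not part of Automatic1111: the function name and the example LoRA names are hypothetical, and it assumes LoRAs appear in the prompt in Automatic1111's `<lora:name:weight>` syntax. It keeps the LoRA tags, replaces the text prompt with "female hand", drops the LoRAs you list and appends the hand-specific negative prompt from above. Reducing a LoRA's weight instead of dropping it is left to manual editing.

```python
import re

# Automatic1111 LoRA tags have the form <lora:name:weight>
LORA_TAG = re.compile(r"<lora:[^>]+>")

# Hand-specific negative prompt, copied verbatim from above
HAND_NEGATIVE = (
    ", bad anatomy, disfigured, mutated, ugly, poorly drawn hands,"
    " malformed hands, mutated hands and fingers, bad hands, missing fingers,"
    " bad hands, fused hand, missing hand, missing fingers, fused fingers,"
    " abnormal hands, abnormal fingers"
)


def hand_inpaint_prompts(positive, negative, drop_loras=()):
    """Return (positive, negative) prompts for a hand-inpainting pass."""
    kept = []
    for tag in LORA_TAG.findall(positive):
        name = tag[len("<lora:"):-1].split(":")[0]  # "<lora:x:0.8>" -> "x"
        if name not in drop_loras:
            kept.append(tag)
    # The pure text prompt is replaced by "female hand"; kept LoRAs follow it
    new_positive = " ".join(["female hand", *kept])
    return new_positive, negative + HAND_NEGATIVE


pos, neg = hand_inpaint_prompts(
    "glamour photo, woman on a sofa <lora:skinDetail:0.6> <lora:redDress:0.8>",
    "nude, blurry",
    drop_loras={"redDress"},  # hypothetical outfit LoRA
)
print(pos)  # female hand <lora:skinDetail:0.6>
```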

Under "Inpaint area", "Whole picture" needs to be changed to "Only masked". I randomize the seed, but leave all other settings the same, apart from denoising strength.

If the anatomy is basically correct, I use denoising strength 0.3 to smooth the hands out. If the anatomy needs fixing, I start at 0.4 and go higher (sometimes in increments of 0.05). I then look at the results and select the one that looks best. SD likes to rotate and invert (not to mention mutate) hands in ways that sometimes look better but usually look worse. At higher denoising strengths, hands will grow or shrink too much, and at some point faces or other hands will spawn.
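The rule of thumb above can be written down as a small helper that lists the strengths to try in one round. It is only a sketch: the function name is hypothetical, and the upper limit of 0.6 is an assumption for illustration, since the right ceiling depends on where hands start to grow, shrink or spawn extra faces in your setup.

```python
def strengths_to_try(anatomy_ok, start=0.4, ceiling=0.6, step=0.05):
    """Denoising strengths to try for one hand-inpainting round."""
    if anatomy_ok:
        return [0.3]  # anatomy is fine: just smooth the hands out
    # Count steps with integers to avoid floating-point drift (0.4 + 0.05 + ...)
    n_steps = round((ceiling - start) / step)
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


print(strengths_to_try(False))  # [0.4, 0.45, 0.5, 0.55, 0.6]
```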

Whichever picture looks the best is sent to inpainting again, usually at about the same denoising strength until the result looks "good enough". Most of the inpainting is done at 0.3, 0.35 and 0.4.

(Note that after dragging a new image into the inpainting area, I click the "remove mask" icon, because Automatic1111 sometimes leaves the original mask in place, which leads to confusing results.)

Often, in between inpainting passes, there will be little flaws (e.g. extra fingertips or floating nails). To remove those, I drag the image into Lama Cleaner, which I also use to remove watermarks, pimples and warts, as well as unidentifiable or "misleading" objects, such as stray light flecks or reflections. The higher the denoising strength, the more "creative" SD will unfortunately get with those. Often I only notice these problematic parts after upscaling and either try to fix them in the upscaled image or go back, remove them in the base image and repeat the upscaling process.

If the anatomy is really messed up (a third leg, for example), I smear over that entire area in Lama Cleaner several times and sometimes try to fix it by inpainting the original or a new object (like a pillow to cover up where the third leg used to be). It's hard to prescribe a workflow here. You just have to experiment with both Lama Cleaner and inpainting to see how they interact.

On rare occasions, I'll manually add tiny details to the base image. I've recently installed the Photopea Extension for Automatic1111, but haven't gotten used to it yet, so I still use MS Paint for these tasks. SD likes to leave bra straps, necklaces and similar details "half finished". If I just draw a line, 1-2 pixels wide, in the same color (using the pipette tool), SD will turn it into the correct object over the course of upscaling. Inpaint Sketch is supposed to achieve this, but I find it unpredictable and unreliable.

Obviously, there is no single workflow for this whole step. I use the above tools in different combinations depending on how the image needs to be improved.

4) First upscaling pass

Once the base image is ready, I send it to img2img (from the "Infinite image browsing" tab or by clicking the "img2img" tab if I have been using inpainting). I drag the most recent version of the base image (i.e. the one I want to use) into the img2img window (replacing any previous version that may be there) as well as into the ControlNet window (I always make sure these two are identical).

As mentioned above, I double-check that the resolution is set to 512x512.

I randomize the seed and set denoising strength to 0.3, 0.35 or 0.4. Rarely, I will try to push denoising strength as far as it will go without "breaking" the image to get the coolest details possible. How high that number is varies from picture to picture. When upscaling close-ups of faces, especially in Cyberrealistic and/or if I want to preserve subtle details in the facial expression, I might use 0.25, 0.2 or even 0.15 or 0.1. This tends to just "stretch" the image, though, without improving it.

Under "ControlNet", I click "Enable", "Tile" and "ControlNet is more important".

Under "Script", I select "Ultimate SD upscale", "Scale from image size" ("Scale:2" being selected by default) and "4x-UltraSharp".
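With "Scale from image size" at 2, each pass doubles the width and height, and Ultimate SD upscale then works through the enlarged image tile by tile. The sketch below estimates the tile grid. It assumes the script's tile width and height are both 512 pixels (check the values in your script panel); the function name is hypothetical, and seams, padding and border handling are ignored.

```python
import math


def tile_grid(width, height, scale=2, tile=512):
    """Return (output_size, columns, rows) for one upscaling pass."""
    out_w, out_h = width * scale, height * scale
    # A partial tile at the right/bottom edge still counts as a full tile
    cols = math.ceil(out_w / tile)
    rows = math.ceil(out_h / tile)
    return (out_w, out_h), cols, rows


for size in (512, 1024, 2048):
    (w, h), cols, rows = tile_grid(size, size)
    print(f"{size}x{size} -> {w}x{h}: {cols * rows} tiles")
```

Under these assumptions the tile count quadruples with every pass (4, 16, 64), which gives a sense of why each pass takes longer than the one before.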

I paste in the original positive prompt (having cut it when inpainting hands) and modify it. I remove all parts of the prompt that were there in order to force a particular composition, in other words references to things that needed to be in the picture and have been added successfully. This generally only leaves the style ("glamour photo"), the name of the subject and often the general mood. I then add references to aspects I think will get lost but want to see preserved, or to details I want to see changed: "smile", "red lips" or a different hair color, for example. Finding out how much this affects the result (and which weights to use accordingly) requires some experimentation.

I delete the ", bad anatomy etc." prompt added when inpainting hands, as well as other parts of the negative prompt that were there in order to force a particular composition. For example, if nudity has been averted in the base image, I can delete "nude" from the negative prompt at this stage (with rare exceptions).
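Pruning the prompts between inpainting and upscaling mostly comes down to deleting and appending comma-separated terms. A minimal sketch, assuming terms are separated by commas and compared case-insensitively; the function name and the example prompts are hypothetical. The same helper covers the positive prompt (drop composition terms, add "smile") and the negative prompt (drop "nude" or the hand-specific terms). Weighted terms such as "(smile:1.2)" are compared exactly as written.

```python
def prune_prompt(prompt, remove=(), add=()):
    """Drop the terms in `remove`, then append the terms in `add`."""
    terms = [t.strip() for t in prompt.split(",") if t.strip()]
    drop = {r.strip().lower() for r in remove}
    kept = [t for t in terms if t.lower() not in drop]
    for term in add:
        # Skip terms that are already present
        if term.lower() not in {t.lower() for t in kept}:
            kept.append(term)
    return ", ".join(kept)


positive = prune_prompt(
    "glamour photo, a woman, sitting on a red sofa, window, moody",
    remove=["sitting on a red sofa", "window"],
    add=["smile", "red lips"],
)
negative = prune_prompt(
    "nude, blurry, bad anatomy, bad hands",
    remove=["nude", "bad anatomy", "bad hands"],
)
print(positive)  # glamour photo, a woman, moody, smile, red lips
print(negative)  # blurry
```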

Then, I click "Generate". This first pass (from 512x512 to 1024x1024) takes about 20s on my PC (GeForce RTX 3060 with 12GB VRAM).

I go over the result: If there are significant mutations other than of hands (weird muscle structure, inverted elbows etc.) or if the details are in other ways "too much", I just run it again at a lower denoising strength. If the image is too bland, I try a higher denoising strength. Currently, I tend to just start with 0.3 and use the result right away.
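The decision after the first pass boils down to a simple adjustment rule, sketched below. The verdict labels, the 0.05 step and the 0.1 floor are assumptions drawn from the strengths mentioned in this article, not fixed values, and the function name is hypothetical.

```python
def next_strength(current, verdict, step=0.05, floor=0.1):
    """Suggest the denoising strength for re-running the first pass."""
    if verdict == "too much":  # mutations or overdone details: back off
        return round(max(floor, current - step), 2)
    if verdict == "too bland":  # not enough new detail: push harder
        return round(current + step, 2)
    if verdict == "ok":  # keep the result as it is
        return current
    raise ValueError(f"unknown verdict: {verdict!r}")


print(next_strength(0.3, "too much"))   # 0.25
print(next_strength(0.3, "too bland"))  # 0.35
```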

5) Tweak image again

No matter how good they looked before upscaling, hands will almost always be messed up again, but generally less so than if I hadn't tweaked them at all. I repeat the inpainting and Lama Cleaner steps described above, except that now (at higher resolution) I expect better results. Whatever look I settle on here will be almost identical to the final look.

Note that it's important to untick "Enable" under "ControlNet" and to select "None" under "Script", when inpainting in between upscaling steps.
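Since the ControlNet and Script settings from the upscaling pass stay selected until you change them, it can help to treat the switch to inpainting as a checklist. The sketch below records the relevant settings as plain data and reports what still needs changing. It is hypothetical bookkeeping code, not an Automatic1111 API; the class, field and function names are made up, and "Only masked" comes from the hand-inpainting step above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    """Hypothetical record of the img2img settings that matter here."""
    denoising_strength: float
    controlnet_enabled: bool
    script: str
    inpaint_area: str = "Whole picture"


UPSCALE = Settings(0.3, controlnet_enabled=True, script="Ultimate SD upscale")


def for_inpainting(settings, strength):
    """Settings to use when inpainting between upscaling passes."""
    return replace(
        settings,
        denoising_strength=strength,
        controlnet_enabled=False,
        script="None",
        inpaint_area="Only masked",
    )


def inpaint_problems(settings):
    """List the settings that would interfere with inpainting."""
    problems = []
    if settings.controlnet_enabled:
        problems.append('untick "Enable" under ControlNet')
    if settings.script != "None":
        problems.append('select "None" under Script')
    if settings.inpaint_area != "Only masked":
        problems.append('set "Inpaint area" to "Only masked"')
    return problems


print(len(inpaint_problems(UPSCALE)))                   # 3
print(inpaint_problems(for_inpainting(UPSCALE, 0.35)))  # []
```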

I rarely bother inpainting faces at this step, because I have found that this tends not to change the result of the final inpainting that I do anyway.

Rather than making sweeping changes to an upscaled image, I prefer to go back to the base image and tweak that. However, there are some problems that get introduced by upscaling that cannot be avoided by changing the base image. Those are better addressed by using a lower denoising strength and/or inpainting after upscaling.

6) Second upscaling pass

I repeat the exact steps under "First upscaling pass" (again dragging the most recent image into both img2img and ControlNet), except at denoising strength 0.15 (rarely at 0.1). This will primarily double the resolution (from 1024x1024 to 2048x2048) and change only very minor details.

This second pass takes about 40s on my PC.

7) Inpaint face

Very rarely I will go over the hands again (inpainting and/or Lama Cleaner), say if the nails have changed in some annoying way.

However, I now always inpaint eyes, mouth and often ears.

Sometimes, I mask eyes and ears simultaneously, but almost always do eyes and mouth separately (sometimes I inpaint the whole face, but then usually before the first upscaling pass).

As mentioned above, results will vary depending on mask area size. (I'm talking about absolute pixel size, so the smaller the face in the picture, the smaller the mask area will be.) Generally, the smaller the better, but sometimes too small can give too "intense" results (with the resulting resolution being too high or details too stark). With rare exceptions, it is better to inpaint both eyes at the same time (trying to inpaint a single eye will usually create mismatches).
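One way to see why mask size matters: with "Only masked", Automatic1111 crops the masked region (plus some padding) and scales that crop up to the inpainting resolution before generating, so a smaller crop means a larger magnification, which may explain the "too intense" results. The sketch below estimates that factor. It assumes the crop is the mask's bounding box plus the "Only masked padding" on each side (32 pixels here; check your slider), enlarged to the target aspect ratio, and it ignores clipping at the image border. The function name is hypothetical.

```python
def only_masked_scale(mask_w, mask_h, target_w=512, target_h=512, padding=32):
    """Approximate magnification applied to an "Only masked" crop."""
    crop_w = mask_w + 2 * padding
    crop_h = mask_h + 2 * padding
    # Growing the crop to the target aspect ratio means the tighter
    # dimension limits the scale factor
    return min(target_w / crop_w, target_h / crop_h)


# Both eyes masked as one 192x64 box
print(only_masked_scale(192, 64))              # 2.0
print(only_masked_scale(192, 64, 1024, 1024))  # 4.0
```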

For eyes, I use denoising strength 0.4 and for the mouth 0.3 or 0.25. I keep the prompt as "photo of {subject}" or something similar, unless I want to change the eye color, in which case I prompt "blue eyes", for example.

I may send the result to Lama Cleaner to remove some last minor blemishes. After that, as far as I'm concerned, the image is done.

8) (Optional) Third upscaling pass

If I am especially happy with all the details, I might upscale the image one last time, from 2048x2048 to 4096x4096, at denoising strength 0.1.
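For reference, the passes described above can be written out as data: each pass doubles the side length, and the denoising strength drops from pass to pass. The strengths are the typical values from this article (0.3 for the first pass, 0.15 for the second, 0.1 for the optional third); the function name is hypothetical.

```python
def upscale_schedule(start=512, strengths=(0.3, 0.15, 0.1), scale=2):
    """Return (input size, output size, denoising strength) for each pass."""
    passes = []
    size = start
    for strength in strengths:
        passes.append((size, size * scale, strength))
        size *= scale
    return passes


for src, dst, strength in upscale_schedule():
    print(f"{src}x{src} -> {dst}x{dst} at denoising strength {strength}")
```

Passing `strengths=(0.3, 0.15)` gives the usual two-pass version that ends at 2048x2048.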

This third pass takes about 5min on my PC.