Me: I made a more detailed article about Underpainting!
I had a stroke of creative mood that made me feel like writing up the 'Underpainting Method' in more detail, with more examples and some personal tips for anyone willing to dabble in the forbidden arts of 'no prompting'.
Obviously, this is all subjective, as diffusion technology is rapidly growing and expanding, making it increasingly hard to keep up with everything new. The following write-up is how I feel things work. If it works differently for you, with your choice of settings and models, that's perfectly normal and a-ok.
Let's get to it.
First things first, you need to generate 'latents', or 'underpaintings' as I call them. A good latent is not only half the picture, but can also be a starting point for many pictures to come.
Not every model can produce good, unfinished, versatile abstract latents. Stable Diffusion is evolving towards models that are super eager to provide you with a complete coherent image in as few steps as possible. That does not quite suit our use case. We would have to denoise those finalized latents at very high values to make them change composition. On top of that, not every model is willing to produce abstract imagery. Popular models try to inject characters into images even with no mention of such in the prompt.
The scope of Stable Diffusion models is getting extremely wide, so I, in no way, would be able to provide you with a complete summary of all good and bad models/options for various steps of this process. So, instead, I will show a few examples of outputs that are preferred and not, leaving the rest up to you. Experiment with different models to find the ones you like!
Underpainting can be 'about' many things - landscape, characters, animals. They can all work - img2img is very versatile and will produce pictures from anything. We, most importantly, want utility and reusability. Abstract underpaintings (with nothing but vague concepts in them) help us achieve that.
There're hundreds of keywords one can use to generate something abstract. I'm not even going to pretend like I know the best ones. But I did prepare some for anyone to start with (like 20! wow... I know). Use any number you want, but make sure you use a bunch.
Abstract keywords - will take your image into weird territory. They are quite important!
abstract, surreal, extravagant, exaggerated, fantasy, esoteric, cryptic,
Composition keywords - push the composition of your image away from static, ordinary layouts.
dramatic, dynamic, intense, powerful, epic,
Color keywords - these will define color of your composition. In latentland, whether you denoise into an image of splashing water or burning fire will often be decided by latent's color.
vivid, vibrant, colorful, color scheme [color name], selective color [color name], [color name] contrast, neon, [color name] color explosion, prismatic, obscure, bright,
Quality keywords - are optional, but might spice things up.
best quality, masterpiece, overdetailed, detailed art, cinematic,
Medium keywords - we want creative mediums that provide a lot of variety. For example, 'photography' would be a weak choice in this case (Edit: or maybe not, who's to tell).
hyperrealistic, realistic art, fantasy concept art, fantasy art, character concept art, digital art, digital painting,
Substance keywords - our abstraction has to be made of something - these are the keywords that are flexible enough to form various abstract shapes and compositions. That's just something I came up with. You can add your own!
fire, flames, cinders, sparks, embers,
smoke, mist, clouds, haze, fog,
flowers, leaf, roots, petals, vines,
splashes, water, drops, bubbles, flow, waves, liquid,
energy, explosion, magic,
wind, snow, rain, lightning, thunder, storm,
rocks, stones, gems, crystals, glass shards,
stars, nebulae, galaxy, moon, sun,
glitter, shine, glow,
metal, gold, silver,
Subject keywords - these are entirely your own. Not necessary, but will steer your latent towards more consistent results you might desire.
As everyone knows by now, keywords have varied 'power level' per model. Don't dwell too much on whether keywords you selected will have any effect at all or not. Just throw a bunch in and click generate. Go for lengthy prompts with plenty of weirdness and randomness.
Couple things to watch out for.
Keywords that add landscapes or horizons to your latent. Denoising latents that have a clear half-and-half split of brown and blue color (ground and sky), will often result in samey images.
Keywords that add characters, when you didn't ask for any. A human figure outline could be detrimental to your latent's creativity, if it's not something you're looking to generate.
Both of these often occur from keywords that are indirectly, semantically related to other keywords. Keyword 'tree' will likely bring 'ground' to the image. Keyword 'muscle' might add a 'male character'.
Let's compose a prompt using above keywords!
What comes first, has the most weight. Make sure to place the most important keywords in the front. All the fluff can be left in the back, ordered randomly.
Protip: If the model of your choice struggles with abstraction, try placing Abstract keywords at the front of the prompt.
flames, flowers, clouds, magic, glitter, shine, stars, abstract, surreal, extravagant, esoteric, cryptic, dramatic, dynamic, vivid, vibrant, colorful, blue and black contrast, best quality, masterpiece, overdetailed, detailed art, cinematic, hyperrealistic, realistic art, fantasy concept art, fantasy art, digital art, digital painting
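If you'd rather shuffle keyword groups around in a script than by hand, here is a minimal sketch of the idea. The group names follow this article, but the `KEYWORDS` dictionary, `DEFAULT_ORDER` and `build_prompt` are hypothetical helpers I made up for illustration, not part of any tool. The only thing it does is put whole groups in the order you ask for, since what comes first has the most weight.

```python
# Hypothetical helper for illustration only: groups are named after the
# keyword categories in this article, filled with the example prompt's keywords.
KEYWORDS = {
    "substance": ["flames", "flowers", "clouds", "magic", "glitter", "shine", "stars"],
    "abstract": ["abstract", "surreal", "extravagant", "esoteric", "cryptic"],
    "composition": ["dramatic", "dynamic"],
    "color": ["vivid", "vibrant", "colorful", "blue and black contrast"],
    "quality": ["best quality", "masterpiece", "overdetailed", "detailed art", "cinematic"],
    "medium": ["hyperrealistic", "realistic art", "fantasy concept art",
               "fantasy art", "digital art", "digital painting"],
}

DEFAULT_ORDER = ["substance", "abstract", "composition", "color", "quality", "medium"]


def build_prompt(groups, order, subject=None):
    """Join keyword groups into one comma-separated prompt.

    Groups listed first in `order` land at the front of the prompt.
    An optional subject string is placed before everything else.
    """
    parts = [subject] if subject else []
    for name in order:
        parts.extend(groups[name])
    return ", ".join(parts)


prompt = build_prompt(KEYWORDS, DEFAULT_ORDER)
print(prompt)

# If a model struggles with abstraction, move the abstract group to the front.
abstract_first = build_prompt(
    KEYWORDS, ["abstract", "substance", "composition", "color", "quality", "medium"]
)
print(abstract_first)
```

With the default order this reproduces the example prompt above. Swapping the order is a cheap way to test how much a model cares about keyword position.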
The Underpainting Model
Generation settings depend on the model you are using. But here are some general guidelines to start with.
Use a quick sampler. We want many underpaintings to play around with. Euler A is a perfectly fine fast sampler.
Lower CFG below the default of 7. 3 or 4 should work for most models. We want a colorful muddy mess!
Start at 10 Steps and adjust them as you see fit. We don't want composition to take shape with details - only vague outlines.
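For those who script their generations, the guidelines above can be sketched as a request body. This assumes a local Automatic1111 started with the `--api` flag and its `/sdapi/v1/txt2img` endpoint. The address, the batch size of 4 and the helper names `underpainting_payload` and `post_json` are my own assumptions for illustration, and field names may differ between versions, so treat it as a starting point rather than the one true config.

```python
import json
import urllib.request


def underpainting_payload(prompt, width=512, height=512, steps=10,
                          cfg_scale=3.5, batch_size=4):
    """Build a txt2img request body with loose, 'muddy' settings.

    Field names follow the Automatic1111 web API as I understand it;
    check them against your own install.
    """
    return {
        "prompt": prompt,
        "sampler_name": "Euler a",  # quick sampler
        "steps": steps,             # start low, adjust as you see fit
        "cfg_scale": cfg_scale,     # below the default of 7
        "width": width,
        "height": height,
        "batch_size": batch_size,   # we want many underpaintings
        "seed": -1,                 # random seed each time
    }


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = underpainting_payload("flames, flowers, clouds, abstract, surreal")
print(json.dumps(payload, indent=2))

# To actually generate (assumed local address, needs the web UI running with --api):
# result = post_json("http://127.0.0.1:7860/sdapi/v1/txt2img", payload)
```

The request itself is left commented out so the snippet runs without a web UI. Uncomment it once your server is up.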
Protip: Try different aspect ratios for your underpaintings! Wide or tall images make it pop (especially on Civitai :D ).
Protip2: If your underpainting ends up with colors muted too much (because of low CFG or steps), you can always adjust saturation up in image editing software.
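If you want to see what 'adjusting saturation up' means under the hood, here is a tiny sketch that works on a single RGB pixel using only Python's standard `colorsys` module. The function name `boost_saturation` and the factor values are my own picks for illustration. For real images you would do this in an image editor or an imaging library, not pixel by pixel like this.

```python
import colorsys


def boost_saturation(rgb, factor=1.3):
    """Return an (r, g, b) tuple (0-255) with saturation multiplied by `factor`.

    Hue and brightness are left alone; saturation is capped at 1.0.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * factor)
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))


muted = (150, 120, 110)  # a washed-out brownish pixel
print(muted, "->", boost_saturation(muted, 1.5))
```

Pure grays have zero saturation, so they stay gray no matter the factor. Only pixels that already carry some color get pushed further.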
Following is an example of a poor latent made with Dreamshaper v8. It's a very good, consistent and flexible model, but even at 5 Steps and a CFG of 3, it manages to add a sharp, detailed female face to the image with no character keywords in the prompt at all. Denoising that further will only result in more of the same.
This, on the other hand, is your good old Stable Diffusion v1.5 showing youngsters how it's done! An abstract and unique composition that can be denoised a multitude of times, for a different result each time. Steps had to be amped up to 20 to make sure our Substance keywords pop.
Next, I adjusted the prompt to contain a male character subject.
medium full shot of handsome shirtless guy mage casting magic spell, flames, flowers, clouds, magic, glitter, shine, stars, abstract, surreal, extravagant, esoteric, cryptic, dramatic, dynamic, vivid, vibrant, colorful, blue and black contrast, best quality, masterpiece, overdetailed, detailed art, cinematic, hyperrealistic, realistic art, character concept art, fantasy concept art, fantasy art, digital art, digital painting,
Following are good Stable Diffusion v1.5 latent results (I lowered Step count to 15, to make it blurrier to leave some space for interpretation). Keep in mind, Stable Diffusion v1.5 was trained on 512x512 images, so making long/wide images will make multiple bodies appear.
Here is a list of a few archaic models to try (things do age quickly in the AI world) that I found to produce good enough abstract latents to play around with.
Stable Diffusion 1.1
Stable Diffusion 1.5
DPO SD 1.5 based (very recent, but very cool abstractions)
Anime style models will make very defined and pre-composed underpaintings. They understand abstract very well and are just as good to use as a base, but might be a bit harder and less versatile to denoise. In the end, it's a personal preference.
Here are example latents made by Dreamlike Anime v1.
All in all, you can use ANY model to make an underpainting, who's to judge? You can use detailed underpaintings like the ones Dreamshaper produces, if you like. After you make refining denoise passes, original image will likely be unrecognizable.
Simply put, it's just a 2-step image generation, that makes it impossible for filthy hobbits to replicate your precious pictures!
(Just don't use same model twice, that's dumb)
The Refining Model
Now that we have the underpaintings, it's time to (maybe) turn them into something pretty!
First of all, find a good coherent versatile model. This might not be trivial, if you only generate casually.
What you're looking for, is a model that can do abstract imagery, variety of styles, variety of subjects (animate or inanimate), backgrounds and makes it all look good on top of that (you will want those hands to work at least half the time).
Models that have only photo portraits of pretty women in their previews are a hard sell.
A few personal observations that may or may not be true:
Anime models from the Anything and OrangeMix lineage are often a sure bet, as they understand abstract very well, and compose figures quickly in plenty of poses and action scenes.
2.5D models are a hit/miss. You will have to test. I personally subdivide them into 'western 2.5D' - based off of photorealism and traditional fantasy, and 'eastern 2.5D' - bigger eyes and more cartoony looking. There's a spectrum of mixes of either type, but in general, 'eastern 2.5D' works better with abstraction, because they all had anime models mixed in somewhere very far down the line of ancestry.
Models that do only single style - strict Niji models for example - while a part of 'eastern 2.5D', are generally a weak choice. They will generate pictures at very high denoise, and will mostly be static and uninspiring.
Don't dis photorealism models. Even though, a lot of them default to photos, some are based off of versatile all-around models that can do many styles and mediums.
It's a guessing game. Some models might look attractive and crisp on the surface, but are shallow and uncreative underneath. You won't know until you try and run a vague underpainting through a model.
Open img2img tab of your Automatic1111 or unravel your img2img meatcube spaghetti in ComfyUI.
Choose a 'converging' sampler (this is not conventional naming, btw) - the kind of sampler that eventually lands on a single definitive result. I use DPM++ 2M Karras.
30 Steps is my pick.
As mentioned in part 1 of this article, my favorite go-to refining model is Airfuck's Wild Mix. Here are a few examples of denoising the 'flowers' underpainting we made earlier at 0.4 and 0.5 value (no negative, and only 'guy' in positive prompt).
As you can see, the diffusion process takes the latent in wildly (HA!) different directions each time, all thanks to our custom crafted abstract latent!
It won't work every time. Denoise in batches and pick the ones that are going somewhere meaningful (if at all).
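Denoising in batches at a couple of strengths can be sketched as a list of img2img request bodies. This assumes the Automatic1111 web API (`/sdapi/v1/img2img`, web UI started with `--api`). The helper name `refine_payload`, the batch size of 4 and the placeholder image bytes are made up for illustration. The sampler name string is also an assumption: some versions split the sampler and the scheduler into separate fields, so check what yours expects.

```python
import base64


def refine_payload(image_bytes, prompt, denoise, negative="",
                   steps=30, batch_size=4):
    """Build an img2img request body for one denoising strength.

    `image_bytes` is the raw content of the underpainting file.
    """
    return {
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "negative_prompt": negative,
        "sampler_name": "DPM++ 2M Karras",  # a 'converging' sampler
        "steps": steps,
        "denoising_strength": denoise,
        "batch_size": batch_size,           # several tries per strength
        "seed": -1,                         # new random seed each run
    }


# Placeholder bytes so the sketch runs on its own; in practice, read your
# underpainting with open("underpainting.png", "rb").read() (hypothetical filename).
fake_image = b"not a real PNG, just placeholder bytes"

jobs = [refine_payload(fake_image, "guy", d) for d in (0.4, 0.5)]
for job in jobs:
    print(job["denoising_strength"], job["sampler_name"], job["steps"])
```

Each job would be sent to the img2img endpoint as JSON. Then you look through the batch and keep whatever is going somewhere meaningful.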
At this point, if you made something you like, you can grab it as a new underpainting and continue from there with a new Random Seed!
Or you can add a strong Negative Prompt (with Negative Embeddings and all) and run img2img this way. Oh the possibilities! Here are some results with a couple of good negatives at denoising 0.5.
Details are still muddy with a lot of garbage all over.
Next step is Latent Upscale. I (and most people) use Ultimate SD Upscale Script (in Automatic1111).
Lower denoising even further - 0.25-0.35 range.
Protip: Watch out with tiled upscale, as your prompt will get applied to every tile! At higher denoising it will generate a lot of garbage.
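To get a feel for why the prompt matters so much here, a bit of arithmetic helps. This is a simplified sketch that just counts tiles. It assumes plain non-overlapping square tiles and ignores padding, overlap and seam fixing, so the real script's numbers may differ. The function name `tile_grid` and the 512 tile size are my own picks for illustration.

```python
import math


def tile_grid(width, height, scale=2, tile=512):
    """Return (columns, rows, total tiles) for an upscaled image.

    Simplified: non-overlapping square tiles, no padding or seam passes.
    """
    out_w, out_h = width * scale, height * scale
    cols = math.ceil(out_w / tile)
    rows = math.ceil(out_h / tile)
    return cols, rows, cols * rows


cols, rows, total = tile_grid(512, 768, scale=2, tile=512)
print(f"{cols} x {rows} = {total} tiles, each denoised with the full prompt")
# prints "2 x 3 = 6 tiles, each denoised with the full prompt"
```

So a prompt of just 'guy' gets six separate chances to paint a guy into a 2x upscale of a 512x768 image, which is why keeping the denoising low pays off.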
Finally, whatever weirdness or blurriness is still there after the upscale (most of the time, hands and faces), can be re-generated with Inpainting as many times as needed.
Editorial Note: I made a mistake and suggested to Inpaint after upscaling. It is possible, yes, but much easier to do before upscaling, at least for hands, feet and whatever-else-body-parts. Try to land on believable looking anatomy before upscaling. If not because it takes less time to inpaint a smaller image, then because your model might struggle to inpaint coherently at higher resolution (since the majority were trained on 512x512 images). The face, on the other hand, you'll probably want to go over both before and after upscaling, a lot of the time.
There're many ways to go about this process, by denoising multiple times and at varied strengths, using different negatives, adding positive prompt keywords or even using Loras.
Protip: Positive prompting an abstraction, from my experience, works a little differently than normal - only whole things should be placed into the prompt, with no descriptives. If you write 'pretty girl, blonde hair', the abstraction will denoise into a pretty girl and blonde hair growing all over the scene. Experience may vary of course.
For this part I made some example denoising with other models. I did not Upscale or Inpaint any - got a bit lazy - but it's still pretty clear, that those half-baked images would turn out even better with more refinement.
Here is the 'number eight' looking latent denoised with aZovyaRPGArtistTools v4. This model is not male centric like Airfuck's Wild Mix is, so my prompt 'guy' has a lot less power there. Instead, I'm getting ridiculously artistic landscape renditions (at 0.5).
This is one of the 'character' latents we made, being denoised by Dreamshaper v8, into imaginative dark fantasy scenes (at 0.4 and 0.5).
Following - MajicMix Fantasy v2 attempting to tackle one of the harder anime model based underpaintings (at 0.5 and 0.55).
Lastly, flat anime model Mistoon Anime v2 doing the second character latent we made (at 0.45).
Upscaled and Inpainted the last image as a demo.
A few closing words.
There's a bit of finesse to Underpainting. Ultimately, for best results, you want an image that's a combined result of 2 models. If your underpainting has just enough abstract detail, it will denoise into something looking way outside of the refining model's normal capabilities (unusual faces, detailed compositions, complex scenes)!
Denoise with too much strength and you will overcook the image, leaving almost no underpainting model's identity. You want the final image to be just the right amount of undercooked - al dente, if you please. It can be challenging, but images end up unique and exciting, if successful.
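One rough way to picture 'cooking time' is to count how many sampling steps actually touch your underpainting. The sketch below assumes the common convention (used, as far as I know, by the diffusers img2img pipeline) that the number of steps run is roughly the step count multiplied by denoising strength. Your UI may handle this differently or have a setting that changes it, and the function name `effective_steps` is made up for illustration.

```python
def effective_steps(steps, denoise):
    """Steps actually run, under the assumed int(steps * denoise) convention."""
    return min(int(steps * denoise), steps)


# Compare a few strengths at the 30 Steps used for refining in this article.
for d in (0.25, 0.4, 0.5, 0.75):
    print(f"denoise {d}: about {effective_steps(30, d)} of 30 steps")
```

Under that assumption, higher strength means more steps rewriting the underpainting, and less of the first model's identity left in the result.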
Once again, good luck and have fun!