
From Noise to Illustrations: How I Generate AI Pictures with Stable Diffusion


Introduction

Hi everyone! I've been posting on CivitAI for a while now (since February of this year), and one of the questions I get all the time is: "Why do my pictures look different from the examples?" or "I can't replicate your example pictures." In this article I want to explain my workflow and why I create pictures this way.

Why?

tl;dr: I have fun creating my artworks, and I want to show the full capabilities of my models.

The purpose of this post is to share how my AI art creation process has evolved over time. AI art has undergone many changes since its inception, with new tools and techniques emerging constantly. My artworks reflect this evolution, as you can see from the difference between my first and latest models. In the beginning, I used to generate simple images at 512x768 resolution, without any further editing. Since then, I have created thousands of images and developed a more complex workflow, which I will explain in the following paragraphs.

Many creators publish models with unedited pictures for the sake of transparency. I respect their choice, but I think this does not reveal the full potential of a model. I know that many SD users prefer to prompt without using any image editing software, but if you compare the basic generations, you will see that they are very similar and do not show the full range of the model.

I want to clarify that this is only my opinion. It is not a universal truth and I do not expect everyone to follow it.

The generation process

When I generate a picture, there are different approaches I take:

  • I start from a handwritten prompt (this is something I do only if I am working on a specific artwork)

  • I use my Ranbooru extension. This is what I mostly do, as it is the fastest way to create some interesting pictures to showcase my models.

  • I use ControlNets combined with the Ranbooru extension. One of my favorite ways of starting a picture is to use an IP-Adapter just to create the basic 512x768 generation

Let's have a look at 3 different examples generated with these approaches:

Approach #1

[Image: 00297-551863209.png]

The thumbnail for my Mistoon_Amethyst v2 model has been generated using a manual prompt:

girl,(hotify:1.2),sitting,chair,kitchen,food,table,short_hair,bob cut,hair clip,freckles,suspenders,red_hair,cleavage ,(masterpiece,detailed,highres:1.4)

You'll find these mostly on my older samples.

Approach #2

[Image: 00003-733341738.png]

This is one of the samples available for my Mistoon_Anime model, and here's the prompt:

detailed_background,indoors,aqua_hair,holding_phone,hairclip,hair_between_eyes,guitar,beanie,straight-on,looking_at_viewer,eyes_visible_through_hair,electric_guitar,closed_mouth,effects_pedal,headphones,upper_body,hat,hoodie,instrument,solo,aqua_eyes,sticker,hood_down,phone,parental_advisory,nail_polish,smartphone,cellphone,jewelry,1girl,white_headwear,bandaid_on_nose,patch,bandaid_on_face,short_hair,original,bandaid,highres,sidelocks,holding,grey_hoodie,circuit_board,long_sleeves,aqua_nails,necklace,standing,print_hoodie,ring,hair_ornament,yorugata_mao,hood,bandaid_on_cheek

As you can see, the prompt contains a lot of tags that don't even appear in the picture. This usually means I've used Ranbooru to pull them from some booru (usually Gelbooru).
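To give you an idea of what that means in practice, here's a minimal sketch of the idea behind Ranbooru (not the extension's actual code): grab a random post from Gelbooru's public JSON API and turn its tag list into an SD prompt. The search tags and limit below are just example values.

import random
import requests

# Minimal sketch of the Ranbooru idea (not the extension's real code):
# fetch a batch of posts from Gelbooru's public JSON API, pick one at
# random, and turn its space-separated tags into a comma-separated prompt.
GELBOORU_API = "https://gelbooru.com/index.php"

def random_booru_prompt(search_tags="1girl solo", limit=50):
    params = {
        "page": "dapi", "s": "post", "q": "index",
        "json": 1, "limit": limit, "tags": search_tags,
    }
    data = requests.get(GELBOORU_API, params=params, timeout=30).json()
    # Depending on the API version, the posts are either the whole payload
    # or nested under the "post" key.
    posts = data["post"] if isinstance(data, dict) else data
    post = random.choice(posts)
    return ",".join(post["tags"].split())

print(random_booru_prompt())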

Approach #3

[Image: 00173.png]

This is the thumbnail for my Mistoon_Diamond model. This artwork uses the 3rd approach and took me about 6-7 hours to make. This is the original prompt:

hotify,from below,black hair,pink panties,wings,underwear only,multiple girls,white wings,underwear,highres,yuigahama yui,panties,yukinoshita yukino,blue bra,medium hair,pink bra,looking at another,2girls,pink choker,female focus,brown hair,blue panties,ahoge,cleavage,choker,matching outfits,medium breasts,eye contact,blue choker,see-through,yahari ore no seishun lovecome wa machigatteiru.,bra,angel wings,breasts,lingerie,ass,long hair

If you try to use the same prompt with the Mistoon_Diamond model, however, you won't get anything similar to this picture. What you get is this:

So how did I generate the actual picture? The answer is that I used the IP-Adapter combined with a beautiful artwork I found on Gelbooru. If you pass that picture to the ControlNet, you'll get something like this:

That result is similar in composition to the one I originally made.
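I do this inside the UI with the ControlNet extension, but if you prefer code, here's roughly the same idea as a diffusers sketch. It's not my actual setup: the checkpoint path and file names are placeholders, and the IP-Adapter scale is just a starting point.

import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

# Rough sketch of "let an IP-Adapter reference drive the composition".
# The checkpoint path and image file names are placeholders.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/your-sd15-checkpoint", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.6)  # how strongly the reference steers the result

reference = load_image("reference_from_gelbooru.png")  # the artwork used as reference

image = pipe(
    prompt="hotify, 2girls, angel wings, lingerie, looking at another, highres",
    ip_adapter_image=reference,
    width=512, height=768,
    num_inference_steps=25,
).images[0]
image.save("base_composition.png")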

Different Levels of Effort

The pictures you'll find in my samples can usually be divided into 3 different "tiers" of effort:

  • The Lazy pictures: these are made using the usual SD poses, hiding hands, removing details and keeping plain backgrounds. These are the ones anybody can easily make just by following my workflow.

  • The "I'm trying" pictures: these are made with small edits and corrections using a digital drawing software (Krita). Nothing too complex, usually just fixing hands or strange details.

  • The "gud" pictures: these are made with actual effort and take multiple hours to complete.

Let's have a look at the different levels:

Lazy Pictures

[Image: 00000-508004103.png]

This is an example of a picture that is easy for the SD model to generate without making obvious mistakes. The prompt was:

<lora:potato:0.6>,hotify,1girl,stuffed toy,cleavage,blue hair,solo,cameo,collarbone,blue shirt,jewelry,smile,stuffed animal,parted bangs,yellow eyes,looking at viewer,bangs pinned back,teddy bear,bare shoulders, short hair

As you can see, there are no hands in the picture, the clothes are incredibly simple, and the background has no major details that could create issues.

"I'm trying" pictures

[Image: 00067-694445704.png]

Here's an example of a picture that needed some fixes. The prompt was:

mizuki apple,long sleeves,earrings,blonde hair,lipstick,mature female,blue nails,standing,makeup,jewelry,1girl,thick eyebrows,long hair,looking at viewer,solo,full body,yellow background,pandora party project, skirt

In this case there were a lot of issues with the arms and various facial details, so I had to take the picture and edit it in Krita. Afterwards, I inpainted a few details (like the faces).

gud pictures

[Image: 00095-1029534685.png]

This is an example of a picture that took a long time and actual effort to make. To explain how I made this one: I was gifted a vinyl figure of Itsuki from The Quintessential Quintuplets series, so I took a horrible photo of it:

Then I removed the background, roughly drew a new one, passed it through img2img, and repeated these steps until I was completely satisfied with the result.

These pictures are the ones which I think really show the potential of SD.

The workflow

If you want to learn about my workflow in a detailed step-by-step guide you can do it here:

Stable Diffusion Ultimate Guide pt. 6: Workflow | by Umberto Grando | Medium

I'll try to explain how I usually generate the 3 different tiers of pictures explained above:

Lazy Workflow

The quickest workflow I use to generate a picture is:

  • Generate a prompt using Ranbooru

  • Generate 6 pictures

  • Choose the best (or rerun until satisfied)

  • Copy the picture into the img2img panel

  • Run the prompt again in img2img with 1.5x the resolution (576x1152 in this case)

  • Run it again at the max resolution I'm capable of running (960x1920 in this case)
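If you like to script things, the same chain can be driven through the web UI's API (launch it with the --api flag). This is only a sketch: the step count and denoising strength are ballpark values, not the exact settings I use every time.

import base64
import requests

# Sketch of the "lazy" chain through the AUTOMATIC1111 web UI API.
URL = "http://127.0.0.1:7860"

def txt2img(prompt, width=512, height=768, batch_size=6):
    r = requests.post(f"{URL}/sdapi/v1/txt2img", json={
        "prompt": prompt, "width": width, "height": height,
        "batch_size": batch_size, "steps": 25,
    })
    return r.json()["images"]  # list of base64-encoded PNGs

def img2img(prompt, image_b64, width, height, denoising_strength=0.5):
    r = requests.post(f"{URL}/sdapi/v1/img2img", json={
        "prompt": prompt, "init_images": [image_b64],
        "width": width, "height": height,
        "denoising_strength": denoising_strength, "steps": 25,
    })
    return r.json()["images"][0]

prompt = "hotify,1girl,blue hair,solo,smile,short hair,simple background"
best = txt2img(prompt)[0]                # in practice I pick the best one by eye
mid = img2img(prompt, best, 576, 1152)   # 1.5x pass
final = img2img(prompt, mid, 960, 1920)  # last pass at my max resolution
with open("final.png", "wb") as f:
    f.write(base64.b64decode(final))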

"I'm Trying" Workflow

The "I'm Trying" workflow follows these steps:

  • Generate a prompt using Ranbooru (or manually)

  • Generate 6 pictures

  • Choose the best (or rerun until satisfied)

  • Remove or change details I don't like using Krita (check out the blood on her face, the ribbon, and the fingers)

  • Copy the picture into the img2img panel

  • Run the prompt again in img2img with 1.5x the resolution (576x1152 in this case)

  • Fix the details again (Krita/inpainting)

  • Run it again at the max resolution I'm capable of running (960x1920 in this case)
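The inpainting passes can also be scripted through the same API if you prefer: you send the Krita-edited picture plus a white-on-black mask of the area to redraw. The file names below are hypothetical, and the fill/padding values are just the defaults I'd start from.

import base64
import requests

# Sketch of a scripted inpainting pass through the AUTOMATIC1111 img2img API.
URL = "http://127.0.0.1:7860"

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "prompt": "mizuki apple, detailed face, looking at viewer",
    "init_images": [b64("edited_in_krita.png")],  # hypothetical file names
    "mask": b64("face_mask.png"),                 # white = area to redraw
    "mask_blur": 4,
    "inpainting_fill": 1,            # 1 = start from the original content
    "inpaint_full_res": True,        # work at full resolution on the masked area
    "inpaint_full_res_padding": 32,
    "denoising_strength": 0.4,       # low enough to keep the overall look
    "width": 576, "height": 1152,
    "steps": 25,
}
result = requests.post(f"{URL}/sdapi/v1/img2img", json=payload).json()
with open("inpainted.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))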

gud Workflow

[Image: Mortal Kombat - Mileena]

The gud workflow is really similar to the previous one, with the difference that I run and fix the picture multiple times until I'm completely satisfied. The above artwork of Mileena took multiple hours to complete.

Software/Hardware

The pictures you see on my pages have been created using the following tools:

  • AUTOMATIC1111 UI: this is my go-to SD UI. I've also used ComfyUI in the past, but it was too "mechanical" for my process

  • Krita: I started learning this when I started posting pictures here on CivitAI. It has quickly become my favorite drawing tool, and I'm also using the awesome krita-ai-diffusion extension to edit pictures assisted by real-time SD

  • RTX 4080: The most overpriced piece of hardware I've ever bought, but still incredibly powerful

  • Samsung Galaxy Book Flex: This was the notebook I originally used to edit and publish all my artworks, until I replaced it with:

  • Surface Studio Laptop: This laptop is insane. I also have a MacBook M1 Pro which I use for music production, but the Surface is my favorite by a large margin.

Conclusions

I hope this "essay" gives you an idea of the amount of effort you'll need to put into one of my models to get results similar to the ones I show in my samples.

Support Me

I started developing custom models for myself a few months ago just to check out how SD worked, but over the last few months it has become a new hobby I like to practice in my free time. All my checkpoints and LoRAs will always be released for free on Patreon or CivitAI, but if you want to support my work and get early access to all my models, feel free to check out my Patreon:

https://www.patreon.com/Inzaniak

If you want to support my work for free, you can also check out my music/art here:

