Thought2image - a workflow about "robot girl reaching for an apple" using img2img

Helloooo there creative folks, viewers and dear readers,

Since I am only just getting started with creating images, I doubt there are any new tricks I can teach you.
On the other hand, an article about the creation of an image and the learning process behind it could be informative and maybe entertaining. In this article I take you with me through the process, from an image in my mind to the finished creation around that darn apple.
Image: https://civitai.com/images/8271843

The image is "robot girl reaching for an apple", made with Stable-Diffusion and its img2img tab, with plenty of stumbling along the way. May it help you with your own steps toward realizing your ideas - or just make you laugh about my mistakes and "masterpiece drawing skills" ;) (You are definitely allowed to laugh)

It began with a casual find...

I was playing around with words and tags my grey matter had collected over the day, when I stumbled upon the SDv1.5 ScrapBot LoRA by norod78.

I have a set of different subjects I use to try out LoRAs. With one of those sets, an interesting "suggestion" came out when I tried the ScrapBot LoRA: an idea was born. With the image in my head, I tried to put it into words using short tags, so that just reading the words would give me an idea of the composition again.

#note: just between the two of us: have you already started tagging random things in real life? #note: I definitely spend too much time in front of my computer

Idea to Prompt

My first attempt at a prompt went like this: "1woman standing in front of appletree, reaching for apple, dark background, "
I then added a few style tags and more descriptive tags, trying to keep it short but descriptive and loose. Using this prompt in the "txt2img" tab of Stable-Diffusion got pretty close, but the result was not exactly how I imagined it.

When I talked with another artist, they suggested using "poses" or "img2img" instead, to have more control over the result.

Where to get the source image?

First I needed the right positioning of each subject in my image (robot, apple and tree).

Since I don't want to use images from the net (copyright being the issue), I could either have staged the scene and taken a photo of myself in the right position, or tried to draw something myself.
I suspect my partner or my overly curious neighbours would have called a doctor if they had watched me kneel in front of the curtains, stretching to reach a small red balloon I had hung up there myself. So I guess you will agree that I went with the drawing method, drinking a coffee instead.

With my limitless creation powers (and, I admit, very little artistic skill in MSPaint) :#) I created a "wonderful (masterpiece:1.95) of a painting" as a sketch.

I did not have much faith in it.

(original size 1024x768, the size my final creation should come out at)

We got a pre-image. What now?

I fed the image into the img2img tab in Stable-Diffusion and used the txt2img prompt from before with it. I got quite interesting "suggestions" right from the start. Twisted and fused limbs or a wrong position were solved by trying out other tags or by putting weights on tags in the positive and negative prompt.

The result looked like this:

After a few (about 50) tries I was quite happy with my creation, and the prompt fit quite well.

>_< But I was not satisfied with "quite" good. ¯\_(ツ)_/¯

So I started over from scratch with another artistic masterpiece. ^_^

(original size 1024x768)

This time I already had the prompt fixed for the style and the LoRA (SDv1.5 ScrapBot LoRA).

I made about 10 test runs, fine-tuning the weight ranges to use, and leaned back while the machine worked its way through my prompt.

After about 50 more images, I found a quite good one with a lot of details I wanted to keep.

I went into the img2img tab in Stable-Diffusion and used this new image as the source to create from:

After that, I checked every output image for progress in my desired direction.

I guess I could have used inpaint to do the job as well, but I was most curious what surprises the checkpoint and LoRA would come up with. So, the machine had to do the work:

Fine-tuning and getting the creation done

For this I set Stable-Diffusion to batch count 1 / batch size 2 in "Generate forever" mode, and kept working on the prompt in the meantime.
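The same settings could also be written down as a request body, for anyone who would rather script the run than click through the tab. This is only a sketch under assumptions: it assumes the AUTOMATIC1111 web UI started with its API enabled, and the field names (init_images, denoising_strength, n_iter, batch_size) are the ones I know from that API's img2img endpoint, so please check them against your own installation. The function name, the prompt text and the denoising value of 0.4 are placeholders, and nothing is sent anywhere.

```python
import base64


def build_img2img_payload(image_bytes, prompt, denoising_strength,
                          batch_count=1, batch_size=2,
                          width=1024, height=768):
    """Build a request body for an img2img call (hypothetical helper)."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return {
        # The source sketch, sent as a base64-encoded string.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "denoising_strength": denoising_strength,
        # "Batch count" in the UI is called n_iter in the API (assumption).
        "n_iter": batch_count,
        "batch_size": batch_size,
        "width": width,
        "height": height,
    }


# Placeholder bytes stand in for the real MSPaint sketch.
payload = build_img2img_payload(b"not a real image",
                                "robot girl reaching for an apple", 0.4)
print(payload["n_iter"], payload["batch_size"],
      payload["width"], payload["height"])
```

"Generate forever" has no field of its own here; in a script it would simply be a loop around the request.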

During the creation process I fine-tune tags and set weights depending on the last outcome.
If I want a red apple that looks like a red circuit board, I put brackets around the tags, as in ((red) circuitboard), and see whether the tags "resistor", "capacitor" or "relay" change anything for better or worse. (I swear I had a very long, very hard time getting that %$§@!! apple to look circuity.)
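To get a feeling for what nested brackets do, here is a minimal sketch. It assumes the AUTOMATIC1111-style convention that each pair of round brackets multiplies the attention on the enclosed text by 1.1; other front ends may use a different factor. The function name is made up for this example, and it only understands plain ( ) nesting, not explicit weights like (apple:0.8) or square brackets.

```python
def nested_weights(prompt, factor=1.1):
    """Split a prompt into (text, weight) chunks based on ( ) nesting depth."""
    chunks = []
    depth = 0
    current = ""
    for ch in prompt:
        if ch in "()":
            # A bracket ends the current chunk; store it with the weight
            # that applies at the depth it was written in.
            if current.strip():
                chunks.append((current.strip(), round(factor ** depth, 4)))
            current = ""
            depth += 1 if ch == "(" else -1
        else:
            current += ch
    if current.strip():
        chunks.append((current.strip(), round(factor ** depth, 4)))
    return chunks


print(nested_weights("((red) circuitboard)"))
# → [('red', 1.21), ('circuitboard', 1.1)]
```

So under this assumption "red" is pushed a little harder than "circuitboard", which is the effect the double brackets are meant to have.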

Working with img2img, I kept an eye on my Denoising strength setting.
At 0.1: I did not notice any changes from the source image at all.
0.25 - 0.45: small changes, like style or additional subjects, depending on the prompt and the weight of my tags.
Above 0.50: my source image was changed a lot to fit my prompt (sometimes even the position of the robot got lost).
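One possible explanation for why low values change so little: in some img2img implementations (the Hugging Face diffusers img2img pipeline is one I know of), the strength value scales how many of the sampling steps are actually run on the source image. Whether my web UI behaves exactly the same depends on its settings, so treat the sketch below as an illustration of the idea only. The step count of 20 is a made-up example value and the function name is hypothetical.

```python
def effective_steps(steps, denoising_strength):
    """Number of sampling steps actually run, if strength scales the step count."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return min(int(steps * denoising_strength), steps)


# Example sweep with an assumed slider value of 20 sampling steps.
for strength in (0.25, 0.5, 0.75, 1.0):
    print(strength, effective_steps(20, strength))
```

With only a handful of steps the sampler has little room to move away from the sketch, which would fit what I saw at the low end of the slider.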

Tip:

For the tag weights I use the Dynamic Prompts extension for Stable-Diffusion.

For trying and testing I go with a dynamic range between 0.50 and 1.20, which looks like this in my prompt text field:
(apple:{0.50|0.60|0.70|0.80|0.90|1.00|1.10|1.20})

Once you find your sweet spot, you can narrow the range around it, e.g.:

(apple:{0.79|0.80|0.81|0.82|0.83|0.84|0.85})
until it fits your needs and gusto.
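Typing these ranges by hand gets tedious, so here is a small sketch that writes the variant string and also mimics what I understand the extension to do with it, namely picking one of the options for each generated image. Both function names are made up for this example, and the picker only handles simple, non-nested {a|b|c} groups; it is not the extension's own code.

```python
import random
import re


def weight_variants(tag, start, stop, step):
    """Build a string like (apple:{0.50|0.60|...}) for a range of weights."""
    # Work in whole hundredths to avoid floating point drift.
    lo, hi, inc = (round(v * 100) for v in (start, stop, step))
    if inc <= 0 or hi < lo:
        raise ValueError("need step > 0 and stop >= start")
    options = "|".join(f"{v / 100:.2f}" for v in range(lo, hi + 1, inc))
    return f"({tag}:{{{options}}})"


_VARIANT = re.compile(r"\{([^{}]*)\}")


def pick_variant(prompt, rng):
    """Replace every {a|b|c} group with one randomly chosen option."""
    return _VARIANT.sub(lambda m: rng.choice(m.group(1).split("|")), prompt)


wide = weight_variants("apple", 0.50, 1.20, 0.10)
narrow = weight_variants("apple", 0.79, 0.85, 0.01)
print(wide)    # → (apple:{0.50|0.60|0.70|0.80|0.90|1.00|1.10|1.20})
print(narrow)  # → (apple:{0.79|0.80|0.81|0.82|0.83|0.84|0.85})
print(pick_variant(narrow, random.Random()))
```

The last line prints something like (apple:0.82), a different weight each time it runs.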

An eternity later...

Fine-tuning took up a lot of the time in this creation. But after a while (30 batches) the prompt can just be left running in generation. Eventually, after some more coffee, you will see an image that matches the one that once was only in your head. Congratulations!

Aftermath:

I had an output of 196 images in total for this creation, plus 2 MSPaint images.

From the idea to the finished creation it took me about 14 hours with various breaks, due to generation times, tagging and especially sorting out images. (Note that my graphics card is from the stone age.)

Special thanks to norod78 for the SDv1.5 ScrapBot LoRA
and JJDoe042 for the UltimatePixarToon v1.0 Checkpoint. It's a really fun combination.

Thank you for taking the time to read my first article. Criticism and suggestions for improvement are gladly accepted.

Plz enjoy and have fun creating. =))
