This log will focus on Text to Image.
Navigation
Return to the Moon 🌙
https://civitai.red/articles/27881/hhhunters-logbook-02-moon-overview
Text to Image ( t2i )
My main battlefield ...
This is one of the biggest parts of the journey for me !
So many things start here :
Prompting, character design, stabilization, posing, framing, background choices,
and overall visual direction.
I'll be sharing the base workflows I use,
the different approaches I test, what seems to work, what still feels unstable,
and how I try to push things toward more consistent results.
Still learning, still testing, but already a lot to explore here.
So many great checkpoints out there ... fun hours in sight :P
My current methodology
Base Settings
My usual starting point. This is not a strict rule ...
Just my current base for early testing.
And of course, if I misunderstood some of these settings,
feel free to correct me 😁
How I currently understand it :
Ratio / Resolution
This defines the image format and the number of pixels to generate.
I usually start with a 3:4 ratio
because I often work on portraits / cowboy shots / full-body characters,
and 960 x 1280 feels like a good balance for that format.
I noticed that raising the resolution too much tends to stretch the character badly.
And in my experience, this is not random at all ...
It happens very consistently.
If the model is comfortable around a certain character scale,
and I push the image way beyond that,
the proportions can quickly become a nightmare : long torso, weird body balance,
and welcome to the museum of horrors 😅 !
So I started playing with that behavior to influence body proportions.
For example ( see the sketch after this list ) :
If I want a skinny character :
it can be interesting to let the model stretch the body a little.
If I'm aiming for a fuller / curvier character :
lower resolution often seems to give me better results.
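For anyone who wants to reproduce this test outside a UI, here's a minimal sketch using the diffusers library. The checkpoint filename is just a placeholder, and I'm assuming an SDXL-class Illustrious checkpoint ( I normally run this through a UI, not code ) :

```python
# Minimal sketch : same seed + prompt, two heights, compare proportions.
# The checkpoint filename is a placeholder ; Illustrious is SDXL-based.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "illustrious-checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")

prompt = "1girl, solo, standing, looking at viewer, black background"

for height in (1280, 1536):  # 960 x 1280 is my usual base
    image = pipe(
        prompt,
        width=960,
        height=height,
        guidance_scale=7.5,
        num_inference_steps=35,
        # Fixed seed so the only variable is the resolution.
        generator=torch.Generator("cuda").manual_seed(42),
    ).images[0]
    image.save(f"proportions_960x{height}.png")
```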
CFG
If I understood it correctly, CFG ( classifier-free guidance ) is about prompt adherence
( how strongly the model tries to follow the prompt ).
Too low, and the result may feel too loose or too random.
Too high, and it can start forcing things too hard ( oversaturated colors, burned details ).
So 7.5 is currently one of my favorite starting points for early testing.
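A quick CFG sweep with a fixed seed makes the effect easy to see. Same sketch setup as above ( placeholder filename, assumed SDXL-class checkpoint ) :

```python
# CFG sweep sketch : fixed seed, only guidance_scale changes.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "illustrious-checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")

for cfg in (3.0, 7.5, 12.0):  # loose / my base / forcing
    image = pipe(
        "1girl, solo, casual outfit, looking at viewer",
        width=960, height=1280,
        guidance_scale=cfg,
        num_inference_steps=35,
        generator=torch.Generator("cuda").manual_seed(42),
    ).images[0]
    image.save(f"cfg_{cfg}.png")
```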
Steps
Steps control how many denoising iterations the model runs while refining the image.
Too few steps, and the result may feel undercooked.
Too many, and the gain may not always justify the extra time.
So 35 steps currently feels like a solid middle ground for my taste and my setup.
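The same kind of sweep works here too, to see where the extra time stops paying off ( sketch, same placeholder setup ) :

```python
# Steps sweep sketch : fixed seed, only the step count changes.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "illustrious-checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")

for steps in (15, 35, 60):  # undercooked / my base / diminishing returns ?
    image = pipe(
        "1girl, solo, casual outfit, looking at viewer",
        width=960, height=1280,
        guidance_scale=7.5,
        num_inference_steps=steps,
        generator=torch.Generator("cuda").manual_seed(42),
    ).images[0]
    image.save(f"steps_{steps}.png")
```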
Batch size
I usually spam the 'Execute' button to generate 5 to 10 images at a time.
But maybe it's best to just set the batch size here instead 🤣 ( sketch below ).
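In code, that's one parameter instead of button spam ( same placeholder setup as before ) :

```python
# Batch sketch : one call, several images ( watch your VRAM ).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "illustrious-checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")

images = pipe(
    "1girl, solo, casual outfit, looking at viewer",
    width=960, height=1280,
    guidance_scale=7.5, num_inference_steps=35,
    num_images_per_prompt=8,  # instead of clicking 8 times
).images

for i, image in enumerate(images):
    image.save(f"batch_{i}.png")
```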
No Prompt Technique
So, I just load a basic checkpoint workflow with no positive or negative prompt ...
I just leave it empty ...
I generate 5 to 10 pictures just to see what happens ...
This already gives me precious intel on the model's bias.
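As a sketch, the same bias probe in diffusers is just an empty prompt over a few seeds ( placeholder filename again ) :

```python
# Bias probe sketch : empty prompt, several seeds, see what the model defaults to.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "illustrious-checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")

for seed in range(8):
    image = pipe(
        prompt="",  # no text guidance : the model free-runs on its own bias
        width=960, height=1280,
        guidance_scale=7.5, num_inference_steps=35,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"bias_probe_{seed}.png")
```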
Negative approach
Based on what appears, I start by testing some negative keywords ...
Usually "sketch" ...
I generate 5 to 10 pictures again ...
Again, this gives me interesting intel about the model's bias.
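Same sketch as the empty-prompt probe, just with one negative keyword added :

```python
# Negative-only sketch : empty positive prompt, one negative keyword.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "illustrious-checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")

for seed in range(8):
    image = pipe(
        prompt="",
        negative_prompt="sketch",  # push away from the sketchy default look
        width=960, height=1280,
        guidance_scale=7.5, num_inference_steps=35,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"negative_probe_{seed}.png")
```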
Generic Prompting
Then I start by typing a basic generic prompt like :
1girl, solo,
brown hair, brown eyes,
casual outfit,
standing, looking at viewer,
black background.
I generate 5 to 10 pictures again ...
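And the code version of this stage is just that prompt over a handful of seeds ( sketch, same placeholder setup ) :

```python
# Generic prompt sketch : one baseline prompt, several seeds.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "illustrious-checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")

prompt = (
    "1girl, solo, brown hair, brown eyes, "
    "casual outfit, standing, looking at viewer, black background"
)

for seed in range(8):
    image = pipe(
        prompt,
        width=960, height=1280,
        guidance_scale=7.5, num_inference_steps=35,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"generic_{seed}.png")
```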
Character Stabilization
Based on previous generations ...
I try different combinations ( keywords, order, ... ),
including as many useful details as possible,
aiming to get the same character ( or close enough ) in every generation.
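One way to sketch that test in code : lock a candidate tag combination and run it across many seeds. If the character drifts between seeds, the combination isn't stable yet ( the candidate tags below are hypothetical, same placeholder setup ) :

```python
# Stabilization sketch : one candidate prompt, many seeds, eyeball the drift.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "illustrious-checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")

# Hypothetical candidate : pile on identity details, then test consistency.
candidate = (
    "1girl, solo, brown hair, long hair, blunt bangs, brown eyes, "
    "mole under eye, casual outfit, standing, looking at viewer, "
    "black background"
)

for seed in range(10):
    image = pipe(
        candidate,
        width=960, height=1280,
        guidance_scale=7.5, num_inference_steps=35,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"stability_{seed}.png")
```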
Posing Fun
Real fun starts here by trying lots of interesting framing and poses.
Note :
Right now, I'm mostly generating characters on solid white or black backgrounds ...
So the model can focus all its power on the character ( I guess ? ) ...
Later, I can use Adobe Premiere to get rid of the background
( I'm thinking about green screen too ) ...
So I can try some compositions, and that's really fun xD !!
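Premiere handles this on the video side, but for single images a crude cutout is only a few lines of Python. A rough sketch with Pillow + NumPy ( real matting tools will do much better on hair and soft shadows ) :

```python
# Rough cutout sketch : make near-black background pixels transparent.
# Crude by design : it will also eat very dark parts of the character.
import numpy as np
from PIL import Image

img = np.array(Image.open("character_black_bg.png").convert("RGBA"))

# Pixels where all RGB channels are below the threshold become transparent.
background = (img[..., :3] < 30).all(axis=-1)
img[background, 3] = 0

Image.fromarray(img).save("character_cutout.png")
```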
Prompting Focus : Illustrious
For now, Illustrious is the model I know best.
So this log will also include observations about how I approach prompting with it.
I'm especially interested in :
Model bias
Generic prompting
Tag behavior
How much control prompt order really gives
Character stabilization
and what feels truly reliable vs what still feels uncertain
Still exploring ...
but this is clearly one of my main focus points here.
When working with Illustrious,
I currently tend to separate prompt tags into 3 main families :
Quality / rendering tags
Danbooru descriptive tags
Model-specific / special trigger tags
Quality / rendering tags
These seem to affect the overall rendering quality, the finish,
and sometimes the general aesthetic feel.
masterpiece, best quality, amazing quality, absurdres
They don't so much describe the content itself
as the way the image is rendered.
Danbooru descriptive tags
These are the tags I use to describe the actual visible content :
Subject
Appearance
Clothes
Pose
Expression
Framing
Background
This feels like the core descriptive language of the prompt.
Friendly reminder 😁 :
https://danbooru.donmai.us/wiki_pages/tag_groups
Model-specific / special trigger tags
These are the more specific tags or prompt elements
that seem to activate particular behaviors in the model.
This can include style reinforcement,
special learned behavior,
or effects that go beyond simple visual description.
This is also the part that feels the most experimental to me.
Of course, the boundaries are not always perfectly clear ...
But for now, this is the distinction that helps me the most
when trying to understand how Illustrious behaves.
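This split is also easy to mirror in how the prompt gets assembled. A tiny sketch ( the special-trigger part is checkpoint dependent, so it's left empty here ) :

```python
# Prompt assembly sketch : three tag families, one final prompt string.
quality = "masterpiece, best quality, amazing quality, absurdres"
content = (
    "1girl, solo, brown hair, brown eyes, casual outfit, "
    "standing, looking at viewer, black background"
)
special = ""  # model-specific trigger tags go here ( checkpoint dependent )

prompt = ", ".join(part for part in (quality, content, special) if part)
print(prompt)
```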
Work in progress ...


