Synopsis:
I will complete this article later. I started to feel ill again and I'm going to get some rest.
This article is based on the concept of THE RULE OF 3.
This is an English-language concept, built entirely on the writing convention of grouping things in 3's. For example: the Three Stooges, three blind mice, and so on.
Here I will show the utility of reusing the rule of 3 to instruct the system how to behave.
This isn't meant to be a technical article, so let's just jump right into the hands-on stuff.
Every image here is generated using BASE Flux1D fp8 with the t5xxl_fp8_e4. We're using the compact version for convenience. We aren't using fp16 because it takes too long to generate even on a 4090, and I'm not using quantized versions in this article so that T5 generates the way the developers intended.
768x768, 1024x768, 768x1024, 1024x1024, 1216x832, 832x1216
steps 12 to 50 - mostly 20
euler > simple
distilled cfg between 1 and 10; 2.5 to 4.5 (typically 3.5) are the ideal values for base models.
normal cfg always 1 for base Flux1D
seed 420
Everything is generated at seed 420 unless otherwise specified, for convenience.
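As a rough sketch, the settings above can be written down as a small Python record, which makes it easy to check a run against the listed ranges. The class name, field names, and the problems() helper are made up for illustration and are not part of any UI or library; the scheduler default follows the "euler > simple" line above.

```python
from dataclasses import dataclass

# Resolutions listed in the article, as (width, height).
ALLOWED_SIZES = {
    (768, 768), (1024, 768), (768, 1024),
    (1024, 1024), (1216, 832), (832, 1216),
}


@dataclass(frozen=True)
class GenSettings:
    """Hypothetical container for the article's settings (not a real API)."""
    width: int = 768
    height: int = 768
    steps: int = 20              # article range: 12 to 50, mostly 20
    sampler: str = "euler"
    scheduler: str = "simple"
    distilled_cfg: float = 3.5   # article range: 1 to 10
    cfg: float = 1.0             # always 1 for base Flux1D
    seed: int = 420

    def problems(self) -> list[str]:
        """List the ways these settings fall outside the article's ranges."""
        found = []
        if (self.width, self.height) not in ALLOWED_SIZES:
            found.append(f"size {self.width}x{self.height} is not a listed resolution")
        if not 12 <= self.steps <= 50:
            found.append(f"steps {self.steps} is outside 12 to 50")
        if not 1 <= self.distilled_cfg <= 10:
            found.append(f"distilled cfg {self.distilled_cfg} is outside 1 to 10")
        if self.cfg != 1:
            found.append(f"cfg {self.cfg} should be 1 for base Flux1D")
        return found


default = GenSettings()
big = GenSettings(width=1024, height=1024, steps=50)
odd = GenSettings(width=512, height=512, steps=8)
print(default.problems())
print(odd.problems())
```

An empty list means the run matches the setup used for the images here.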
Lesson 1: Rule of 3's introduction
Lesson 1A - Simple Start
an apple on a table in a house - 768x768, 20 steps, cfg 1, dcfg 3.5, euler normal
Like before, let's begin with our simple subject; this time we're using the rule of 3's.
an apple on
a table in
a house
This can be broken up in multiple ways. You could even say it is multiple overlapping ideas, but let's assume for now that our simple idea is just one simple idea, for the sake of the showcase.
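One way to picture that three-way split is to cut the prompt after each linking word. This is only a minimal sketch for inspecting your own prompts: the LINK_WORDS list and the helper name are assumptions, and it says nothing about how Flux or T5 actually parse the text.

```python
# Hypothetical list of linking words to cut after.
LINK_WORDS = {"on", "in", "inside", "under", "beside"}


def split_into_chunks(prompt: str) -> list[str]:
    """Cut the prompt after every linking word, keeping the word with its chunk."""
    chunks, current = [], []
    for word in prompt.split():
        current.append(word)
        if word.lower() in LINK_WORDS:
            chunks.append(" ".join(current))
            current = []
    if current:  # whatever is left after the last linking word
        chunks.append(" ".join(current))
    return chunks


chunks = split_into_chunks("an apple on a table in a house")
for chunk in chunks:
    print(chunk)
```

If a prompt comes out as far more or far fewer than three chunks, that is a hint it may not read as a clean group of 3.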
an apple on a table in a brick house - steps 30
So, as you can see, our rule of 3's has implemented the brick house. This doesn't always work, mind you.
However, if we solidify the statement like so:
a photograph of an apple on a table inside a brick house
As you can see, the focus of the image shifted. The PHOTOGRAPH itself depicts the various parts in a kind of architecture-centric way, rather than focusing on the apple. This varies from seed to seed, but let's continue.
Lesson 1B - The 3x3 zones.
YOU are one subject (the viewport), the house is one subject, and the apple on the table is the third subject.
Red is the house, blue is the photograph, green is the apple table. My overlay isn't very accurate, but you can tell it's definitely divided by three.
Let's go ahead and use some more concepts to solidify it.
ceiling, wall, floor
As you can see, it divided the room without any problems. Let's add some more 3's to the room.
ceiling, wall, floor.
fan, light, ceiling stars.
desk, computer, chair.
carpet, unworn shoes, potato.
The implications have turned the room into an office. As you can see, the rule of 3's has converted the room from a simple bare room, into something very different.
Now, let's remove all commas and all periods.
ceiling wall floor fan light ceiling stars desk computer chair carpet unworn shoes potato
Even with tags of chaos, it still managed to figure it out.
ceilingwallfloorfanlightceilingstarsdeskcomputerchaircarpetunwornshoespotato
Even in pure chaos, no spaces, no abbreviation, no punctuation, no context; it still manages to make something just using the tokens.
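The three versions of that prompt (punctuated, no punctuation, no spaces) can be produced from the same list of triplets. A small sketch, with helper names that are just for illustration:

```python
import re

# The four triplets used above.
TRIPLETS = [
    ["ceiling", "wall", "floor"],
    ["fan", "light", "ceiling stars"],
    ["desk", "computer", "chair"],
    ["carpet", "unworn shoes", "potato"],
]


def punctuated(triplets):
    """One line per triplet: commas between tags, a period at the end."""
    return "\n".join(", ".join(group) + "." for group in triplets)


def without_punctuation(text):
    """Drop commas and periods, then collapse whitespace to single spaces."""
    return " ".join(re.sub(r"[.,]", " ", text).split())


def without_spaces(text):
    """Remove every remaining whitespace character."""
    return "".join(text.split())


tagged = punctuated(TRIPLETS)
flat = without_punctuation(tagged)
mashed = without_spaces(flat)
print(tagged)
print(flat)
print(mashed)
```

Keeping the triplets as data makes it easy to swap one group out and rerun all three variants on the same seed.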
I mean, it's not wrong either, it's got all the elements of the original. However, as you can most definitely see, we're looking at a 3x3 grid for all of those images.
As you can see, we've successfully divided a room into 9 zones.
When generating intentional grids, you'll often see this pattern. I don't know why it happens, but it seems to be the generation pattern, and you can assume everything generates similarly. There are likely smarter men than me who figured out why 300 years ago.
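If you want to draw your own overlay and check an image against the 3x3 pattern, the nine zones are just evenly spaced pixel boxes. This is a plain arithmetic sketch; the function name is made up, and it only describes where an even grid would fall, not what the model does internally.

```python
def grid_cells(width, height, rows=3, cols=3):
    """Return (left, top, right, bottom) pixel boxes, row by row from the top-left."""
    cells = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            top = r * height // rows
            bottom = (r + 1) * height // rows
            cells.append((left, top, right, bottom))
    return cells


cells = grid_cells(768, 768)
for box in cells:
    print(box)
```

Integer division keeps the boxes gap-free even for sizes like 1216x832 that don't divide evenly by 3.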
Lesson 2: Intentionally formatting an image using the rule of 3.
Naturally, we want to be able to control the images through plain English, not just tagging. We want to be able to CONTROL where our image goes, but the universe isn't actually in our control, so we have to use our simple tools to control it. Let's learn some of those tools now, shall we?
You have to THINK in 3's here. You CAN diverge, but it'll be a bit less consistent. Flux responds VERY WELL to the rule of 3's. It might be because T5 was likely trained on all the recorded English, fully converging into a kind of averaged system based on FLUX and common associated image fixations, tied directly to the math used to train SD images and compacting smaller forms of... y'know what, this is too technical for this article.
Small samples grow into big samples. Just know that. It learns on small things, then makes big things. T5, however, does a good job of knowing where those small samples should be fixated, so let's go ahead and position some stuff.
Lesson 2A - Building a scene
there is a table on the left and an apple on the right
Ah yes, the paradoxical point where the model assumes that the apple is part of the table. Let's disconnect them.
there is a table on the left of a room, a chair in the center, and an apple on the floor to the right.
Now we've merged the apple with the floor, rather than the table. It seems to be resting nicely down there, while the table and chair are very simple and quaint.
Let's go ahead and populate the wall now.
there is a table on the left of a room, a chair in the center, and an apple on the floor to the right.
there is a portrait hanging from the wall on the left side of the room, with a window in the center of the room, and a hanging potted plant on the right.
We have made a much more sophisticated room, and yet where's the ceiling? We didn't allocate a ceiling, did we? Suggesting a portrait and a hanging potted plant appears to have implicitly added some regality to the scene.
The hanging portrait, in this case a kind of olde-English style sitting in the upper-left zone, has impacted the entire image afterwards.
So we've successfully shown the 2x3 grid.
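The 2x3 layout can be kept as data and turned into a prompt with one sentence per row. This is a simplified sketch: the GRID contents mirror the scene above, but the sentence template is my own shorthand and not the exact wording used for the images here.

```python
# Hypothetical layout: each row is (left, center, right), listed top to bottom.
GRID = [
    ("a portrait hanging from the wall", "a window", "a hanging potted plant"),
    ("a table", "a chair", "an apple on the floor"),
]


def row_to_sentence(row):
    """Describe one row of three zones in a single sentence."""
    left, center, right = row
    return f"there is {left} on the left, {center} in the center, and {right} on the right."


def grid_to_prompt(grid):
    """One sentence per row, one row per line."""
    return "\n".join(row_to_sentence(row) for row in grid)


prompt = grid_to_prompt(GRID)
print(prompt)
```

Swapping a single cell (for example "a portrait" for "a digital portrait") and regenerating on the same seed is an easy way to see how much one zone influences the rest.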
Let's go ahead and change the image style using this concept. I'm going to go ahead and add the tag "digital" to the portrait, and we'll get an entirely different theme as a result.
there is a table on the left of a room, a chair in the center, and an apple on the floor to the right.
there is a digital portrait hanging from the wall on the left side of the room, with a window in the center of the room, and a hanging potted plant on the right.
The regality of the portrait has shifted to simplistic yet again.
So let's assume that this is what Flux built.
However, our prompt is not this at all. Our prompt is all over the board.
4 there is a table on the left of a room,
5 a chair in the center,
6 and an apple on the floor to the right.
1 there is a digital portrait hanging from the wall on the left side of the room,
2 with a window in the center of the room,
3 and a hanging potted plant on the right.
What happens if we organize it in the correct order?
Not too much, really. That means we can order our words in many ways, and it also implies that the order of a room of this nature is implied by its object associations and background. In this case we're using a wall and a floor, with the implication of a ceiling due to the hanging potted plant, so the room has indoor lighting.
Let's make this a bit more complicated by adding a turtle sitting on the chair.
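Reordering the clauses by their zone number is mechanical, so it is easy to script when you want to compare typed order against grid order. A minimal sketch, assuming each clause is tagged by hand with the zone number shown above; the function name is made up.

```python
# Clauses in the order they were typed, tagged with the zone they describe
# (1 to 3 = wall row, 4 to 6 = floor row, each left to right).
TYPED = [
    (4, "there is a table on the left of a room,"),
    (5, "a chair in the center,"),
    (6, "and an apple on the floor to the right."),
    (1, "there is a digital portrait hanging from the wall on the left side of the room,"),
    (2, "with a window in the center of the room,"),
    (3, "and a hanging potted plant on the right."),
]


def in_grid_order(clauses, per_row=3):
    """Sort clauses by zone number and join them into one line per row."""
    ordered = [text for _, text in sorted(clauses)]
    rows = [ordered[i:i + per_row] for i in range(0, len(ordered), per_row)]
    return "\n".join(" ".join(row) for row in rows)


reordered = in_grid_order(TYPED)
print(reordered)
```

Running both orderings on the same seed gives a like-for-like comparison.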
there is a digital portrait hanging from the wall on the left side of the room, with a window in the center of the room, and a hanging potted plant on the right.
there is a table on the left of a room, a humanoid turtle sitting on a chair in the center smoking a pipe wearing eyeglasses reading a newspaper, and an apple on the floor to the right.
So we can assume that the humanoid turtle is on a 2x2 plane, and the room has been converted into a 3x3 plane. The turtle IS in the middle, so we can assume that the turtle's subject fixation IS the newspaper reading, due to the training for newspaper reading.
Hard to unsee the rule of 3's now, isn't it? It's almost like the entire system is built on it, intentionally or unintentionally.
Lesson 2B - Liminal Spaces
Alright, let's do something a bit more landscape-oriented for solidity. I'm a big fan of liminal spaces, so let's use liminal spaces to divide our images into interesting and more context-specific situations.
there is an endlessly stretching long liminal hallway leading to a vanishing point on the horizon.
Well it's good, but I want to see a mall.
there is an endlessly stretching long liminal shopping strip leading to a vanishing point on the horizon.
there is a multitude of baron and abandoned shopping stores.
Cool stuff, but it's not focusing on the vanishing point. This is actually a problem with Flux training that I want to address personally, because it doesn't do a good job with it, and good liminal spaces require this.
Let's make it indoors.
there is an endlessly stretching long liminal shopping strip leading to a vanishing point on the horizon.
there is a multitude of baron and abandoned shopping stores.
the tall ceiling is littered with shattered windows and broken hanging lights.
there is an endlessly stretching long liminal shopping strip leading to a vanishing point on the horizon.
there is a multitude of baron and abandoned shopping stores.
the tall ceiling is littered with shattered windows and broken hanging lights.
The upstairs walkway is partially collapsed on the left, indicating a structural failure.
there is an endlessly stretching long liminal supemarket leading to a vanishing point on the horizon.
there is a multitude of baron aisles with only a few scattered items of use.
the tall ceiling is littered with shattered windows and broken hanging lights.
there is a broken sign hanging overhead with the text "canned goods aisle 34231"
1024x1024, steps 50
As you can see, object association is quite powerful when attempting landscaping, which is also where the failings and incorrectness come from. There are many purposes for LoRAs, and liminal spaces are one of my favorites, so that's just another reason to train the thing, I suppose.