SD 3.5 will surpass FLUX

Oct 29, 2024

musing

FLUX has a much larger diffusion model than even SD 3.5 Large (both are transformer-based rather than UNets), but its ability to do something simple like draw a different face is abysmal.

FLUX pairs a large natural language text encoder (T5xxl) with a 160MB CLIP (CLIP-L) that came out years ago.

SD 3.5 uses that same natural language model (T5xxl) and CLIP-L as well, but it also has the benefit of CLIP-G.

If you want to try out SD 3.5 with Google FLAN T5xxl:

Compare the same prompt:

"a 4k photo with every possible detail of the most beautiful female in the world, combine ever ethnicity and skin color but only ages 18-26yo and female with feminine features, recreate Eve from the bibles account in genesis of the first female"

Same prompt but great variation. Ask that of FLUX and you get the same face over and over.
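If you want to run this comparison yourself, here is a minimal sketch, not my exact setup. It assumes the Hugging Face `diffusers` library and the repo IDs `stabilityai/stable-diffusion-3.5-large` and `black-forest-labs/FLUX.1-dev`. Those IDs, the seed list, and the helper names `build_jobs` and `render` are my own illustrative choices. The idea is simply to hold the prompt fixed and change only the seed, so any variation in the faces comes from the model.

```python
from itertools import product

# Assumed Hugging Face repo IDs; both are gated and need an accepted license.
MODELS = {
    "sd35": "stabilityai/stable-diffusion-3.5-large",
    "flux": "black-forest-labs/FLUX.1-dev",
}


def build_jobs(prompt, seeds, models=MODELS):
    """Return one (label, repo_id, prompt, seed) tuple per model/seed pair."""
    return [
        (label, repo, prompt, seed)
        for (label, repo), seed in product(models.items(), seeds)
    ]


def render(jobs, out_dir="."):
    """Render every job to a PNG. Needs torch, diffusers, accelerate and a GPU."""
    import torch
    from diffusers import DiffusionPipeline

    pipe, loaded = None, None
    for label, repo, prompt, seed in jobs:
        if repo != loaded:
            # Jobs are grouped by model, so each pipeline is loaded only once.
            pipe = DiffusionPipeline.from_pretrained(repo, torch_dtype=torch.bfloat16)
            pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use
            loaded = repo
        # A seeded generator makes each image reproducible for that seed.
        generator = torch.Generator("cpu").manual_seed(seed)
        image = pipe(prompt, generator=generator).images[0]
        image.save(f"{out_dir}/{label}_seed{seed}.png")
```

Call `render(build_jobs(your_prompt, [1, 2, 3, 4]))` and lay the saved images side by side. If one model keeps returning the same face across seeds, it will be obvious at a glance.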

Once SD 3.5 support lands in OneTrainer, we will make something great.

Even if SD 3.5 is just SDXL with a natural language element, letting people interact with the tokenizer in their native language will benefit the AI art community.

Not having to "speak" to CLIP in tokens also allows for more variation, simply because of the random and unknown nature of the text encoder/tokenizer interaction.
