Creativity Vs Perfection!

Art can be creative, precise, or both. How do you tell an AI model whether the image it produces for your prompt should be exact to the last detail, like an engineering diagram, or something more creative?

SDXL (Stable Diffusion XL) models let you set this expectation through a parameter called "CFG (Classifier-Free Guidance) Scale", which some AI art studios expose as a "Prompt Weight" percentage. The CFG Scale used here ranges from 0.0 (i.e. 0% Prompt Weight) to 10.0 (i.e. 100% Prompt Weight). The CFG value tells the model how strictly it should follow your prompt, or more precisely, the model's understanding of your prompt. What you think you have described could be completely different from what the AI model thinks you are describing. Please read my other article, "How to phrase your SDXL prompts?", to learn more about that.
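If your tool shows one of these numbers and you want the other, the conversion is simple arithmetic. Here is a minimal sketch, assuming the linear mapping described above (0% is 0.0 and 100% is 10.0). The function names are made up for this example and do not come from any library. In the Hugging Face diffusers library, the corresponding SDXL pipeline argument is called `guidance_scale`, and other tools may allow a wider range than 0 to 10.

```python
def prompt_weight_to_cfg(percent: float, max_cfg: float = 10.0) -> float:
    """Convert a Prompt Weight percentage (0-100) to a CFG Scale value.

    Assumes the linear mapping used in this article: 0% -> 0.0, 100% -> max_cfg.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("Prompt Weight must be between 0 and 100")
    return max_cfg * percent / 100.0


def cfg_to_prompt_weight(cfg: float, max_cfg: float = 10.0) -> float:
    """Convert a CFG Scale value (0-max_cfg) back to a Prompt Weight percentage."""
    if not 0.0 <= cfg <= max_cfg:
        raise ValueError(f"CFG Scale must be between 0 and {max_cfg}")
    return 100.0 * cfg / max_cfg


# Print the CFG Scale for the Prompt Weight values tried in this article.
for percent in (10, 40, 70, 100):
    cfg = prompt_weight_to_cfg(percent)
    print(f"{percent}% Prompt Weight -> CFG Scale {cfg:.1f}")
```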

Let us try a few CFG values to see what happens. We are going to be using the same prompt for this experiment.

Prompt: "(Realistic art made of pieces of puzzle). (a medusa, majestic, standing) with (a roman castle, majestic, behind)."

With this prompt, our intent is to make the model produce puzzle art showing Medusa standing in front of a Roman castle.
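If you want to repeat this experiment yourself, it helps to write the plan down first. Here is a rough sketch that only builds the list of runs and does not call any image model. The `Run` class, the `plan_cfg_sweep` helper, and the seed value 1234 are assumptions made for this example. Keeping the seed fixed is also my assumption: it is a common way to make sure that only the CFG value changes between runs.

```python
from dataclasses import dataclass

PROMPT = (
    "(Realistic art made of pieces of puzzle). "
    "(a medusa, majestic, standing) with (a roman castle, majestic, behind)."
)


@dataclass(frozen=True)
class Run:
    """One planned generation: same prompt and seed, different CFG value."""
    prompt: str
    cfg_scale: float
    seed: int


def plan_cfg_sweep(prompt, weights_percent, seed=1234, max_cfg=10.0):
    """Build one Run per Prompt Weight percentage (hypothetical helper).

    Assumes the linear mapping 0% -> 0.0 and 100% -> max_cfg.
    """
    return [Run(prompt, max_cfg * w / 100.0, seed) for w in weights_percent]


# The four Prompt Weight values tried in this article, in the order they appear.
runs = plan_cfg_sweep(PROMPT, [10, 40, 100, 70])
for run in runs:
    # Each run would be handed to your image tool of choice.
    print(f"CFG Scale {run.cfg_scale:4.1f}, seed {run.seed}")
```

Each entry in `runs` can then be passed to whichever studio or library you use to generate the image.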

At a low CFG value (10%), here is what the model produced:

As you can see, the model considered some of the things we described. There is certainly a female figure in the picture, with hair flying around like Medusa's hair made of snakes. And there are a few puzzle pieces thrown in as a bone to keep us happy. By setting the CFG value low, we told the model, "Here is what I want, but if you are sleepy, just give me something and go back to sleep." Sometimes this could be what you want.
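For the curious, here is one way to picture why a low CFG value lets the model drift. In the commonly used formulation of classifier-free guidance, the model makes two predictions at each step, one without the prompt and one with it, and the CFG value decides how far to push from the first toward (and past) the second. The toy numbers below are made up purely for illustration. Real models work on large image tensors, and actual tools may treat very low values differently, so take this as a sketch and not as how any particular studio implements it.

```python
def apply_guidance(uncond, cond, scale):
    """Blend two predictions the way classifier-free guidance is usually described:

        guided = uncond + scale * (cond - uncond)

    uncond: prediction made without the prompt
    cond:   prediction made with the prompt
    scale:  the CFG value
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up three-number "predictions" standing in for real image tensors.
uncond = [0.0, 2.0, -1.0]
cond = [1.0, 0.0, 1.0]

for scale in (0.0, 1.0, 4.0, 7.0, 10.0):
    guided = apply_guidance(uncond, cond, scale)
    print(f"CFG {scale:4.1f}: {guided}")
```

At a scale of 0.0 the result equals the prediction made without the prompt, and at 1.0 it equals the prediction made with the prompt. Larger values push further in the direction the prompt suggests, which matches the "stricter" behavior described in this article.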

With a CFG value of 40% this is what the model produced:

Wow! Look at that! This is certainly more vivid than what the model produced before. But the model decided to show its own creativity by drawing a beautiful statue of Medusa, with some puzzle pieces underneath and a castle in the background. By the way, we never said anything about a statue. So why did the model decide to draw a statue here? This is called "Training Bias". It is quite possible that the base SDXL model was trained on a lot of images showing Medusa as a statue, or a lot of Roman castles with statues in them, or both. So when we said "Medusa standing before a Roman castle" and gave the model some room to show its creativity, the training bias kicked in. This training bias is something we can use to our advantage, but that is a topic for another article.

So maybe we should not give the AI any room for creativity. Let us make it adhere strictly to our prompt. Here is what the model produced with a CFG value of 100%:

Yay! Finally, a picture showing Medusa as puzzle art, like we wanted. But why is there a second puzzle with a castle? This is because we set the CFG value to the strictest possible setting, and our prompt could be interpreted in a few different ways. So the AI produced something that satisfies the most likely things the prompt requested. Again, this may be something you want. But we know that this is not what we wanted.

Now let us try a more balanced value of 70% CFG. Here is what the model produced:

Finally! Something very close to what we wanted. Please note that it may take a few tries before the model produces something you like. I had to try twice before I got that output. In my experience, a CFG value between 70% and 80% typically works best, because it gives the model some room to choose the most likely thing our prompt is describing.

Hope this helps. If you have comments, suggestions, or questions, please feel free to drop a comment here.

