This is a follow-up to my previous article ( https://civitai.com/articles/1939/an-attempt-at-classification-of-models-sd-v15 ) and to a related post on the roadmap page ( https://feedback.civitai.com/submissions/64b6b62d9cfdf9c6913aaf3f ).
As we saw in the previous article, even though there is a huge variety of models out there, it is possible to use numerical methods to classify them in a completely objective way. I do think, however, that for many users, seeing actual example images from the models will be more immediately intuitive and will also reveal things about the models that a dendrogram cannot.
Do we need a standardized test image on the Civitai page?
I know this has been discussed by others. I personally would welcome it (I saw it has the status "planned" on the roadmap), and I think it can help users get an idea of what type of model they are looking at before downloading the large data file to test it themselves. I know there are arguments for and against having a "standardized" test image. One argument I have heard against it is that model makers could tweak their models to give good results for the test prompts only. There may be some truth in this, but I also think it depends on how the test is designed and how specific or general it is. And if everyone is made aware that a simple test like this should only be seen as a quick, rough guide to what style or art direction to expect from a model, rather than as a way to judge whether one model is "better" than another, it can serve a meaningful purpose. A standardized test can never show the full potential of a model, or what it is capable of with proper prompting and specific settings. Some models are trained for specific uses or with different settings (for example clip skip 2, whereas I use clip skip 1 in the test for the sake of comparison). But it will certainly make it easier for everyone to see which models out there are similar to or different from each other!
Finding prompts that work well for testing
In my view, an ideal test prompt should be as simple as possible. My reasoning is that the test is supposed to test the model, not to test or showcase the prompt or me as a prompt writer. But an ideal test should also give results that are as detailed and varied as possible from model to model, to make it possible to spot similarities or differences. It would also be good to include very different prompts, to give information about many different types of images, use cases and art styles. I spent quite some time trying to find prompts that meet these needs well. These are the ten I found most useful in my own testing, and they cover everything from abstract images to art techniques and photorealism (numbered P1 to P10). The results shown are for the top 10 most liked models on Civitai in the last month.
For all prompts I use CFG scale 5.0, 20 steps, an image size of 512x512 and "123456789" as the seed. I know some images may look better with a slightly higher CFG scale, but for testing purposes I found 5.0 to be good because it gives the model more freedom to do its own thing. 20 steps is generally considered enough to make a decent image. I deliberately chose to include 3 samplers in the test: Euler a, DPM++ 2M Karras and DPM++ SDE Karras. These are among the best samplers from the 3 different sampler categories, they are all commonly used, and, most importantly, this combination gives the most varied image outputs. Two of them are also very fast to run. DPM++ 2S a Karras could have been used in place of Euler a, but including both would be unnecessary.
For testing purposes, using no negative prompt was an option. We don't really want to limit the models too much when we want to see what they are likely to make. However, I grew a little weary of looking at deformed bodies etc., so I opted for a compromise: a very simple negative prompt, "ugly, blurry, deformed".
It was enough to make the images look much better without really restricting the models much.
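For anyone who prefers scripting to clicking, the same standardized settings can also be written as request bodies for A1111's built-in web API, which is available when A1111 is started with the --api flag. This is a minimal sketch under that assumption, not the method used for the tests in this article (those used the X/Y/Z script described later). The endpoint URL assumes a default local install, the sampler names are as they appear in A1111 around the time of writing (newer versions may list the scheduler, e.g. "Karras", separately), and the checkpoint name in the usage note is a placeholder.

```python
import json
import urllib.request

# The standardized settings described above (CFG 5.0, 20 steps, 512x512, fixed seed).
TEST_SETTINGS = {
    "negative_prompt": "ugly, blurry, deformed",
    "seed": 123456789,
    "steps": 20,
    "cfg_scale": 5.0,
    "width": 512,
    "height": 512,
}

# The three samplers used in the test.
SAMPLERS = ["Euler a", "DPM++ 2M Karras", "DPM++ SDE Karras"]


def build_payload(prompt, sampler, checkpoint):
    """Build one txt2img request body for a single prompt/sampler/checkpoint combination."""
    payload = dict(TEST_SETTINGS)  # copy, so the shared settings are never modified
    payload["prompt"] = prompt
    payload["sampler_name"] = sampler
    payload["override_settings"] = {
        "sd_model_checkpoint": checkpoint,  # which model A1111 should load
        "CLIP_stop_at_last_layers": 1,      # clip skip 1, as in the test
    }
    return payload


def send(payload, url="http://127.0.0.1:7860/sdapi/v1/txt2img"):
    """POST one payload to a local A1111 running with --api; returns base64-encoded images."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["images"]
```

For example, `[build_payload("natural landscape", s, "my_model_v1.safetensors") for s in SAMPLERS]` (hypothetical checkpoint name) gives the three request bodies for one prompt and one model; only `send` needs a running A1111 instance.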
The prompts and image outputs
P1: "various colorful geometrical shapes realistic"
This was the first I made for testing purposes. It produces a very good variety of detailed and easily distinguishable colorful shapes, which makes it easy to spot similarities between models. It's completely abstract though, and doesn't directly tell you much about what kind of normal images the model is likely to generate. But if a "mix" is heavily based on another model, it is likely to show with this prompt. Another abstract prompt I have used is "an intricate 3D pattern flashy vivid colors", but it tends to make less distinct and diverse shapes.
P2: "civitai"
Yes, I'm not joking! lol The name Civitai actually works really well as a test prompt for urban landscape images (Stable Diffusion apparently associates it with the ancient Roman Empire (?); "civitas" is Latin for "city"). I guess it can be called the "Civitai test"! :-)
P3: "natural landscape"
Maybe not the most exciting images, but landscape images have a purpose and a use. This simple prompt always gives a reasonable variety of normal-looking nature landscapes, where it's possible to see whether a model prefers a photorealistic look or a drawing/painting style.
P4: "park street suburb pond house"
This produces a different kind of "landscape" image, with a house in a green, urban environment. I thought it would be nice to have at least one prompt showing a building, for those interested in architecture (which is a separate model category on Civitai, after all). With this specific prompt and word order, I always seem to get a big, nice-looking house with a large pond in front of it.
P5: "a beautiful good old happy special adorable woman shocked"
This is one of my favorite test prompts. It separates models in several ways: whether they are able to show an old woman, whether they prefer Asian or Western/Caucasian women (or maybe other ethnicities?? I have yet to see it!), and whether they prefer photorealism or an anime/cartoon style. It also produces a fun variety of facial expressions.
P6: "african man ray"
This is a very good test of a model's ability to output an image that really looks like an analog photo. It also gives varied and colorful output across the models. Man Ray was a famous surrealist art photographer in the early 1900s, and putting "african" in front of his name also spells out "african man". I'm not sure exactly how the text parser/CLIP model handles this, but the result is definitely African men in the photographic art style of Man Ray!
For the last 4 prompts I tried to cover the most important art-making techniques:
P7: "sensual woman classical oil painting portrait"
I felt I should include one prompt that can show a model's tendency toward NSFW outputs. The challenge was to find the right balance: I wanted a prompt that doesn't go very far in terms of nudity, but may still produce it if the model really, really prefers it. The words in this prompt are usually on the safe side, but there will be occasional NSFW outputs. Not many, though. If you want a prompt that is even less likely to give NSFW images, you can change it to "woman classical sensual oil painting portrait". If you prefer to test with more such outputs, swapping "sensual" for "erotic" (or using both) will make the model go considerably "hotter". Putting "nsfw" or "nude" at the start will guarantee images you would never show at work...
P8: "girl flower vivid inky watercolor"
Another important art-making technique is watercolor painting. One option here was simply the single word "watercolor", but I found that adding "girl" and "flower" plus "vivid inky" gave decent variety in the output while also better showing what kind of art style each model prefers.
P9: "boy and fox colorful grimm fairytale lithography"
Printing is also widespread in the art world, and there are many different techniques. One of the most common is lithography. I found this prompt to be reasonably good at making images which can resemble artwork found in illustrated books. For some gender balance, I put "boy" in this one.
P10: "angry mouse cartoon drawing"
This works surprisingly well, and will output a lot of different cartoon styles (depending on the model) and many funny images. I also tested "angry elf" and "angry kid", but the mouse overall gives much better variety in the output.
So there you have the prompts. When testing models, I set up A1111 differently to get one 10x3 collage for each model separately. This is how the same images as above look when organized this way. Each model gets its own "card" in the form of a single jpg image file:
Test of the TOP 10 most liked models on Civitai last month (August 2023)
Disclaimer: The test images should only be taken as a simple, quick guide to the type of image or art style a model is likely to produce. They may not be representative of a model's full potential, but they can help when comparing models for differences or similarities.
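The layout of each 10x3 model card comes down to a little arithmetic: prompts run along the X axis (one column each) and samplers along the Y axis (one row each). The sketch below is a simplified illustration, assuming 512x512 tiles and ignoring the text labels and margins that A1111 adds around its grids; the function name is hypothetical.

```python
def card_layout(n_prompts=10, n_samplers=3, tile=512):
    """Pixel offset of each tile in one model card (labels and margins ignored)."""
    offsets = {}
    for row in range(n_samplers):        # Y axis: one row per sampler
        for col in range(n_prompts):     # X axis: one column per prompt (P1..P10)
            offsets[(col, row)] = (col * tile, row * tile)
    size = (n_prompts * tile, n_samplers * tile)  # (width, height) of the bare grid
    return offsets, size
```

With the default values, the bare grid is 5120x1536 pixels, and P10 with the third sampler starts at (4608, 1024).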
How to do the tests most effectively
Thanks to the very convenient X/Y/Z script in A1111, testing models with multiple prompts is not too difficult. To make it more efficient for this purpose, I tweaked the Python code a little so it saves the results from each model test as individual images, with the model filename in the image filename and also in the image headers (with checkpoints as the Z type).
I have attached the script file ("xyz_grid_2 (for modeltests with checkpoints as Z).py") to this post, so if you want, you can download it and place it in your A1111 "scripts" folder. Also attached is a text file with all the prompts, which you can copy from (as described in point 3 below).
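The attached script itself is not reproduced here. As a purely hypothetical illustration of the kind of change involved (naming each model card after its checkpoint), the helper below turns a checkpoint title as it typically appears in A1111's model list, "filename.safetensors [hash]", possibly with a subfolder, into a file name for the card. The function name and the cleaning rules are assumptions made for this example, not taken from the attached script.

```python
import os
import re


def card_filename(checkpoint_title, ext=".jpg"):
    """Hypothetical helper: derive a model card's file name from a checkpoint title."""
    name = re.sub(r"\s*\[[0-9a-fA-F]+\]$", "", checkpoint_title)  # drop a trailing " [hash]"
    name = os.path.basename(name.replace("\\", "/"))              # drop any subfolder path
    name = os.path.splitext(name)[0]                              # drop .safetensors / .ckpt
    name = re.sub(r'[<>:"/\\|?*]', "_", name)                      # replace characters unsafe in file names
    return name + ext
```

For example, a title like "my_model_v1.safetensors [0123abcd]" becomes "my_model_v1.jpg".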
Here is how I do the model tests:
(assuming the tweaked XYZ-script is in the A1111 "scripts" folder)
1. Select "X/Y/Z plot 2 (for model tests with checkpoints as Z)" in the script drop-down menu.
2. Choose "Prompt S/R" as "X type".
3. Copy a list of comma-separated test prompts from a text file (see the attachments on the right in this post).
4. Paste it into the "X Values" field.
5. Copy the first test prompt into A1111's main positive prompt field.
6. Choose "Sampler" as "Y type", and select the samplers you want to include under "Y Values" (I recommend "Euler a", "DPM++ 2M Karras" and "DPM++ SDE Karras").
7. Choose "Checkpoint name" as "Z type", and select the models you want to test under "Z Values".
8. Set the other parameters: CFG Scale 5.0, Sampling steps 20, image size 512x512 and the negative prompt "ugly, blurry, deformed". Also set a fixed "Seed"; I use 123456789 for this "standardized" test.
9. Click the "Generate" button and enjoy a break while A1111 does all the work for you :-) For me it takes about 25 minutes to test 10 models (on an NVIDIA GeForce RTX 3060 GPU), but it all depends on your graphics card.
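The steps that paste the prompt list into "X Values" and the first prompt into the main prompt field rely on how "Prompt S/R" (search/replace) works: the first entry in X Values is the search string, and each entry in turn replaces it in the main prompt. That is why the first test prompt must also be in the main prompt field. The sketch below is a simplified re-implementation of that idea in plain Python, not A1111's own code (only the positive prompt is handled, and the function names are hypothetical). Quoting lets a prompt contain a comma. It ends with the arithmetic behind the run time: 10 prompts × 3 samplers × 10 models gives 300 images, so about 25 minutes works out to roughly 5 seconds per image.

```python
import csv


def parse_x_values(text):
    """Split the comma-separated X Values field; double quotes protect commas inside a prompt."""
    row = next(csv.reader([text], skipinitialspace=True))
    return [v.strip() for v in row if v.strip()]


def prompt_sr(main_prompt, values):
    """Simplified Prompt S/R: values[0] is searched for, and each value replaces it in turn."""
    search = values[0]
    if search not in main_prompt:
        raise ValueError(f"Prompt S/R did not find {search!r} in the prompt")
    return [main_prompt.replace(search, v) for v in values]


# Size of one full run, using the numbers from this test.
N_PROMPTS, N_SAMPLERS, N_MODELS = 10, 3, 10
images = N_PROMPTS * N_SAMPLERS * N_MODELS   # 300 images
seconds_per_image = 25 * 60 / images         # about 5 s per image for a ~25 min run
```

If the main prompt field does not contain the first X value, the sketch (like the real script) stops with an error instead of silently generating the same image every time.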
Feel free to comment and tell me what you think. Do you have a favorite test prompt you would like to share? Do you think the prompts I chose make sense to use, and give relevant outputs for the tested models?
Also, I would like to thank Civitai for making the website, which I have found very useful! :-)