Hello, dear prompters!
In the comments on my recent comparison guide, I was asked for some samples of good vs. bad outputs.
I no longer have the bad samples from that big 35k comparison, but I've recently made 5k samples using Realistic Vision 3, and since the idea is the same, you can see what I consider good and what I consider bad :) (This can also be read as a review of RV3.)
[PREFACE: this is a review in the context of my models. I am not rating what this model can do in general, only how well other models of people work on it.]
So, first some numbers:
"good": 3628, "bad": 1412, "overall": 2216, "model": "realisticVisionV30_v30VAE", "sum": 5040, "ratio": "2.57", "percentage": "71.98"
Well, this model ranks at 72%, so there is a 1 percentage point increase over RV 2.0 (but even if it had scored 50%, it would still be great).
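For anyone who wants to reproduce the numbers above from their own counts, here is a minimal Python sketch. The function name `summarize` is made up for illustration. It assumes that "overall" is good minus bad, "ratio" is good divided by bad, and "percentage" is good divided by the total, since those definitions match the figures quoted above.

```python
def summarize(good: int, bad: int, model: str) -> dict:
    """Rebuild the summary record from the two manual counts.

    Assumed definitions (inferred from the quoted figures):
      overall    = good - bad
      ratio      = good / bad
      percentage = 100 * good / (good + bad)
    """
    total = good + bad
    return {
        "good": good,
        "bad": bad,
        "overall": good - bad,
        "model": model,
        "sum": total,
        # Guard against an all-good run, where the ratio is undefined.
        "ratio": f"{good / bad:.2f}" if bad else "inf",
        "percentage": f"{100 * good / total:.2f}" if total else "0.00",
    }


stats = summarize(3628, 1412, "realisticVisionV30_v30VAE")
print(stats)
```

With the counts from this test, the sketch gives a ratio of 2.57 and a percentage of 71.98, the same as the record above.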
The first thing you notice right away is that the contrast is much higher. I guess it was trained with the noise offset fix, which is nice.
The anatomy is really good. Fingers are a coinflip as usual, but there are rarely any issues with other body parts.
So, out of 5040 images, I consider 3628 to be good and 1412 to be bad.
Before jumping to the samples, what differentiates the good and the bad?
Well, if there are technical defects (pixelation, blur, weird colors, weird cuts, out-of-frame) or content defects (bad anatomy, weird face colors), the image automatically goes to the BAD category. However, an image also goes to the bad category if everything is fine from the technical and content perspective but the face does not resemble the subject (or does not resemble it enough for my standards). Wardrobe malfunctions do not affect the good vs. bad rating: if the problems mentioned earlier do not appear, that is still a good output (although not suitable for upload, obviously).
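The rule above can be written down as a tiny Python sketch. This is only an illustration of the decision logic. The `Review` class and its field names are hypothetical, and every flag is still a manual judgment made by looking at the image; nothing here detects defects automatically.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """Manual verdicts for one generated image (hypothetical field names)."""
    technical_defect: bool      # pixelation, blur, weird colors, weird cuts, out-of-frame
    content_defect: bool        # bad anatomy, weird face colors
    resembles_subject: bool     # face is close enough to the subject
    wardrobe_malfunction: bool = False


def is_good(r: Review) -> bool:
    # Any defect, or a face that misses the likeness, sends the image to "bad".
    # A wardrobe malfunction on its own does not.
    return not r.technical_defect and not r.content_defect and r.resembles_subject


def is_uploadable(r: Review) -> bool:
    # A good image can still be unsuitable for upload.
    return is_good(r) and not r.wardrobe_malfunction


example = Review(technical_defect=False, content_defect=False,
                 resembles_subject=True, wardrobe_malfunction=True)
print(is_good(example), is_uploadable(example))
```

Keeping "good" and "uploadable" as two separate checks mirrors the text: the wardrobe flag only matters for the second one.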
There is one difference between this test and the previous big 35k test: I've removed the prompts that behaved poorly (those with the lowest acceptance percentage) in the previous test. This means that the overall quality should be better (which makes the 1% increase not that impressive; I would have been surprised if the percentage had been lower).
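Dropping the poorly behaving prompts could be sketched like this in Python. The function names, the `min_pct` cutoff and the tallies are all placeholders for illustration: the post does not state the cutoff that was used, and the counts below are made-up toy values, not results from either test.

```python
def acceptance(tallies: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each prompt to its acceptance percentage: 100 * good / (good + bad)."""
    return {
        prompt: 100 * good / (good + bad)
        for prompt, (good, bad) in tallies.items()
        if good + bad > 0  # skip prompts that produced no rated images
    }


def keep_prompts(tallies: dict[str, tuple[int, int]], min_pct: float) -> list[str]:
    """Return prompts at or above the cutoff, best first (cutoff is an assumption)."""
    rates = acceptance(tallies)
    kept = [p for p, rate in rates.items() if rate >= min_pct]
    return sorted(kept, key=lambda p: rates[p], reverse=True)


# Toy (good, bad) counts per prompt, for illustration only.
tallies = {
    "prompt A": (93, 7),
    "prompt B": (48, 52),
    "prompt C": (10, 90),
}
kept = keep_prompts(tallies, min_pct=40.0)
print(kept)
```

Since the weakest prompts are filtered out before the next run, the overall percentage of that run is expected to go up even if the model itself is no better, which is the caveat mentioned above.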
The best prompt was, once again:
portrait of sks woman by Flora Borsi, style by Flora Borsi, bold, bright colours, rainbow Mohawk haircut, ((Flora Borsi)),
with 93% (I do love that hair :p)
and the worst one was:
candid RAW close up portrait photo of sks woman in a (purple colored suit:1.0) on a dark street with shopping windows (at night:1.2), bokeh, Ilford Delta 3200 film, dof, high definition, detailed, intricate, flashlight,
with 48%, which is still really nice (in the previous tests I had some prompts that scored 10% or 19%).
And the prompt that sometimes forgot to include any clothing at all:
photo of sks woman, pale skin, working class in new york city, upper body, detailed skin, 20 megapixel, canon eos r3, detailed skin, detailed, detailed face,
I guess the focus on the skin causes that. I will tweak it for the next tests to be less revealing :)
Ok, so here are the samples. I'm comparing the non-upscaled versions (it would not be fair otherwise, and upscaling the bad ones is a waste of time), so these are just regular 512x704.
FYI, my models were still trained on RV2. I will be switching to RV3 as a training base, however, and will see if that has any impact on the LoRA/LyCORIS models (I expect it to have a positive impact on the Dreambooths, obviously). If I notice something odd, I will definitely let you know.
I don't want to rate it in x/10 or anything; the filter ratio (72%) seems to be enough for me. I can just say that this is definitely a great model!
Thank you for sticking around till the end!
P.S. I have seen the previous suggestions and I will definitely take a look at other models. Right now I'm thinking of doing them one by one, using a somewhat smaller batch (something around 2000 seems reasonable enough to test the quality of a model).