This started as a review of the model, addressed to the SeaArt team. I had so much to say I expanded it into an article.
I spent a good amount of time probing and experimenting with SeaArt FurryXL 1.0 over the past few days. It's clear the SeaArt team worked hard on this model and came up with something very interesting and useful. Ultimately, I don't think it will be replacing Pony as my model of choice. I wanted to share some feedback on why that is, including specific examples. I hope it is helpful to you.
But let's start off with what you did incredibly well.
Beautiful gens
The texture rendering and composition are better than any furry model I've seen. It will flexibly provide whatever style you ask for, with pleasing layouts and color schemes. It's a huge improvement over SD1.5, where you need different models for realism vs kemono vs artist styles, and much more convenient than using LoRAs for everything. I might suggest you improve the realistic 3d style (like Indigo Realistic or the recent Red Sea finetune) but apart from that the visual quality is pure gold. From the way you described your evaluation process, it seems like you relied on user surveys to guide your model toward gorgeous, high quality images. It totally worked!
Just look at this boy and his beautiful brushwork.
canine, wolf, by nomax, by honovy, by lostgoose, portrait, cactus, digital media \(artwork\), male, close-up, looking at viewer, nude, hi_res, (detailed, best quality), masterful technique, 8k resolution
There's no other model that can generate stuff like this out of the box. I expect to continue using SeaArt FurryXL to do paintovers and style adjustments from outputs of other models.
Prompt Following
However, the model has a big limitation: prompt adherence and natural language understanding. When you prompt for something simple like a standard pinup of a wolf it will generate wonderful results. But as soon as you ask for something more out-of-distribution it loses its footing. Here is an example prompt. I tried to pick something fairly standard but with a few specific requirements.
Positive: marine, shark, scalie, dragon, red scales, slim, anthro, male, sitting, underwear, choker, sofa, underwear bulge, jewelry, inside, furniture, feet, tail, looking at viewer, solo, 2023, newest, digital media \(artwork\), hi_res, (masterpiece, detailed, best quality), professional artwork, masterful technique, 8k resolution
Negative: low quality, worst quality, oldest, artist name, signature, logo, artist logo, watermark
dpm++ 2s a; 36 steps; cfg 7.0; seed=956362839503909
I've posted a typical gen from this prompt. When I generate it, I see multiple problems:
The character is not very dragon-like. He is basically just a shark. Adding "hybrid" doesn't help.
Jewelry is almost never included.
The gen has a smooth 3d look, which is not in the prompt. Some of these keywords must have a strong style bias.
He is not particularly "slim", just an athletic build.
If I remove the negative prompt, he will get skinnier, but the quality is harmed. It doesn't remove the 3d look.
Several of the issues can be fixed with some effort. An SDE solver will give him more scale detail. (Euler makes it mushier.) I could add "necklace, earring" to get some jewelry on him and bump up the weight of "slim" to 1.5. But you don't have to do those things with Pony. It generates what you ask for and infers the rest without needing to repeat tags. Even if the basic result from Pony is often a bit sketchy and dull, all you have to do is add a style LoRA and you're good to go.
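For anyone who wants to apply that kind of workaround without hand-editing prompts, here is a minimal sketch in Python. It assumes your UI accepts the (tag:weight) emphasis syntax in the positive prompt, the same form used for (worst quality:1.4) in the negative prompt further down. The function names format_tag and build_prompt are made up for illustration, and the tag list is a shortened version of my example prompt.

```python
def format_tag(tag: str, weight: float = 1.0) -> str:
    """Render one tag, wrapping it as (tag:weight) only when the weight is not 1."""
    if weight == 1.0:
        return tag
    return f"({tag}:{weight:g})"


def build_prompt(tags, weights=None, extra=()):
    """Join tags into a comma-separated prompt.

    tags    -- base tags, kept in their original order
    weights -- optional dict mapping a tag to its emphasis weight
    extra   -- tags appended at the end if they are not already present
    """
    weights = weights or {}
    merged = list(tags) + [t for t in extra if t not in tags]
    return ", ".join(format_tag(t, weights.get(t, 1.0)) for t in merged)


# Shortened version of the example prompt (assumption: the rest is unchanged).
base = ["marine", "shark", "scalie", "dragon", "red scales",
        "slim", "anthro", "male", "jewelry"]

prompt = build_prompt(base, weights={"slim": 1.5}, extra=["necklace", "earring"])
print(prompt)
# marine, shark, scalie, dragon, red scales, (slim:1.5), anthro, male, jewelry, necklace, earring
```

This only automates the string editing. It does not change how strongly the model responds to the extra tags or the weight.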
When adding artists or styles, the anatomy bias is often very strong. If I prompt for "darkgem", the shark character will have a thick, muscular body no matter what I add to the prompt. If I prompt for "kemono" in this example, I'll lose the slight dragon-like features. The specific wording of your caption seems to determine whether you'll have a successful gen or not, and putting anything in your prompt besides e621 tags has been unreliable. Reordering existing tag-based prompts to move species and color to the top is additional work. (Not strictly required, but still.)
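The reordering step can at least be scripted. The sketch below is a rough illustration, not a tool I have battle-tested. SPECIES_TAGS and COLOR_TAGS are tiny hypothetical sets I made up for this example, so a real version would need a proper tag list (for instance, one built from e621's species category). It also splits naively on commas, so a weighted group such as (masterpiece, detailed, best quality) would be broken apart and needs to be handled separately.

```python
# Hypothetical tag groups for illustration only; not an official list.
SPECIES_TAGS = {"marine", "shark", "scalie", "dragon", "canine", "wolf"}
COLOR_TAGS = {"red scales"}


def reorder_prompt(prompt: str) -> str:
    """Move species tags first, then color tags, keeping every other tag in place.

    Relative order inside each group is preserved.
    Limitation: splits on every comma, so parenthesized multi-tag groups break.
    """
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    species = [t for t in tags if t in SPECIES_TAGS]
    colors = [t for t in tags if t in COLOR_TAGS]
    rest = [t for t in tags if t not in SPECIES_TAGS and t not in COLOR_TAGS]
    return ", ".join(species + colors + rest)


old_prompt = "solo, male, sitting, red scales, sofa, dragon, shark, looking at viewer"
new_prompt = reorder_prompt(old_prompt)
print(new_prompt)
# dragon, shark, red scales, solo, male, sitting, sofa, looking at viewer
```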
This all adds up the more specific and unusual your prompt is. I have an example of a macrofurry gen at the end of this post which Pony can reliably generate but where FurryXL is confused.
I'm not an ML professional so any thoughts on how to improve for the next gen are just spitballing. I think that adding natural language captions, mixing in a broader variety of anime and non-furry artwork, and keeping an eye on biases between tags could help your next iteration. Putting a stricter filter on low quality images, rather than relying on negative prompts, may also be a better way to go. I think the focus on surveys is great but I would suggest putting more importance on other ways to evaluate the model, like asking users to try new and creative images, pushing the boundaries of CLIP.
To me the sense of unlimited possibility is what gives the best models their feeling of magic. A 2.0 version with better language understanding would be a total game-changer. Maybe Stable Diffusion 3 will provide the base you need to achieve it.
(p.s. I HIGHLY appreciate that your model gives bats cute leafy noses. It is the only model to do this that I know of!)
More Examples
Texture rendering is outstanding. It looks like something from a top SDXL artwork model, not a furry model.
positive: (Masterpiece,hyperdetailed,bestquality), a sexy handsome rainbow dragon, iridescent glistening scales, professional artwork, masterful technique, 8k resolution
negative: smooth, boring, people, (worst quality:1.4), (low quality:1.4), (normal quality:1.4), lowres
Masterpiece, hyperdetailed, best quality, macro, a sexy handsome black dragon, tongue out, licking landscape, glistening scales, solo, from side, on stomach, close-up, crush, saliva, tongue, glistening scales, professional artwork, masterful technique, 8k resolution, best quality, newest, 8k resolution, 2023, by darkgem
Something a bit more wild that shows the boundaries of SeaArt's understanding. This image is rendered decently but the dragon is supposed to be licking the ground. I get a fully focused, eyes-in-the-right-place enthusiastic slurp about one out of ten gens. With Pony it's at least 3/4, and it seems to really "get it."
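To put those hit rates in practical terms, here is a small back-of-the-envelope calculation. It assumes every gen is an independent try with a fixed success chance, and it treats my eyeballed figures (about 1 in 10 for SeaArt, "at least 3/4" for Pony taken as 0.75) as exact, which they are not.

```python
def expected_gens(p: float) -> float:
    """Average number of gens needed to get one success (geometric distribution)."""
    return 1 / p


def chance_within(p: float, n: int) -> float:
    """Probability of at least one success in n independent gens."""
    return 1 - (1 - p) ** n


# Rough success rates from my own testing; treat these as estimates.
rates = {"SeaArt FurryXL": 0.10, "Pony": 0.75}

for name, p in rates.items():
    print(f"{name}: ~{expected_gens(p):.1f} gens per good result, "
          f"{chance_within(p, 4):.0%} chance of a good one in a batch of 4")
```

Under those assumptions the gap is roughly ten gens per keeper versus one or two, which matches how it feels in practice.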
THANK YOU for the wonderful bat noses. ❤️🦇