Published: Sep 10, 2024
Training: 14 epochs
Usage Tips: Clip Skip 1
Hash: AutoV2 94DB66FC4B
Your beloved tail~ Ready for a full NAI3 experience? (Actually even better.)
A full-scale finetune of Pony Diffusion 6 on a dataset of 1.8M anime pictures:
Unmatched (in open source) knowledge missing from original pony and other models
8k+ artists styles (wildcard), few general styles
Thousands of characters simply by prompt
Full color palette, full brightness range (example 1, example 2), great base aesthetics
No annoying watermarks like everywhere else
Unique angles, foreshortening, fullbody wide shots or extreme closeups without any issues, pretty backgrounds as an added bonus
From cutest and lovely things to deepest and darkest fantasies
Best performance with tail concepts for your fox/cat/dog/dragon/... waifus/husbandos
Well, this finetune has an amount of training that would be enough to make a base anime model from scratch. Despite that, existing (anime) knowledge has not been lost but has only improved. A careful approach, especially to TE training, and a lot of high-quality natural-text captions (about 600k, mainly made with Claude 3 Opus/Claude 3.5 Sonnet) significantly improve prompt control and understanding. "Feels like a new base, not pony" (c).
And yes, unlike the majority of PD derivatives, which are just reskins or lobotomized merges, not a single lora was harmed (merged) here. You can add your tweaks if needed, merge in the difference from another favourite checkpoint, or whatever - it works just like a good pony-compatible base.
v0.5.0 Changelog
A new training run from the PD base with a large dataset, using some new approaches for pretraining, the main training, and refining
Lots of new data
After some black magic in training, you can now get completely black or completely white pictures without breaking compatibility with existing tools, loras, etc. Actually a very interesting experience (example)
Better and more stable base styles, less "burning" for artists
Fixes, improvements, ...
(Dataset cut-off: beginning of July; requests after it are pending and not forgotten)
Features and prompting:
Well, first of all: the TE knows a lot. It will try to make whatever you prompt, without ignoring parts of it like you may be used to. No guardrails, no safeguards, no lobotomy. Shit in - shit out.
Schizo-prompts from mixes, where you have to boost tag weights and add extra ones to get at least some response (something like (sunny day, rainbow, ethereal hair, transparent skin, huge breasts:1.9)), will not work. You will get something insane, creepy or unexpected.
At the same time, if you just copy tags from a booru picture without the manipulations mentioned above, or describe it normally with a combination of tags and natural text, it will most likely turn out great across a very wide range. Stick to original booru tags to get the best results. The deepest and darkest fantasies may require some rolling; popular things are very stable.
Basic:
Same as for all SDXL: ~1 megapixel for txt2img, any aspect ratio with side lengths that are multiples of 64 (1024x1024, 1152x, 1216x832, ...). Euler a and CFG 4..9 (6-7 is best). Highres fix: any GAN/DAT upscaler, x1.5-1.6, denoise 0.5; upscaling works best with a single-tile resolution of no more than 3 Mpx. Highres fix and further upscaling significantly improve quality, details, eyes, hands, feet, etc.
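The resolution rule above (sides divisible by 64, total area near 1 megapixel) is easy to check programmatically. A minimal sketch; the function name and the tolerance value are my own illustrative choices, not part of the model or any tool:

```python
def valid_sdxl_resolution(width: int, height: int,
                          target_mp: float = 1.05,
                          tolerance: float = 0.25) -> bool:
    """Check a txt2img resolution against the rules above:
    both sides must be multiples of 64 and the area close to ~1 megapixel."""
    if width % 64 or height % 64:
        return False
    megapixels = (width * height) / 1_000_000
    return abs(megapixels - target_mp) <= tolerance

# The resolutions mentioned above pass:
assert valid_sdxl_resolution(1024, 1024)
assert valid_sdxl_resolution(1216, 832)
# Arbitrary sizes do not:
assert not valid_sdxl_resolution(1000, 1000)  # not a multiple of 64
assert not valid_sdxl_resolution(512, 512)    # far below 1 Mpx
```

The same check can be reused before highres fix to confirm that the x1.5-1.6 upscale target stays under the suggested ~3 Mpx tile limit.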
Set Emphasis: No norm in the settings of your generation tool if you are getting strange blobs or distortion.
If LCM/PCM accelerators are applied, use the Euler/Euler a samplers; DDIM gives a lot of mess and abominations.
Clip Skip 1, unless you are using loras that have problems with it.
Quality classification:
Only 4 quality tags:
masterpiece, best quality
for the positive prompt;
low quality, worst quality
for the negative prompt.
Avoid using score_x, source_x, etc. like in original pony.
In most cases they just make things worse: they add noise and mess, break bodies and fingers, change styles, and bring back the urine yellow-green filter.
Originally, that was definitely not the best implementation of quality tagging: it included some training flaws and required tons of tokens. It became clear that it is better to introduce new tags instead of fixing the originals. At this point they only bring back old triggers without serious improvements.
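The tagging advice above (this model's four quality tags in, pony-style score_x/source_x meta tags out) can be sketched as a tiny helper. The function name, constants, and prefix-based filter are illustrative assumptions, not part of any official tooling:

```python
QUALITY = {
    "positive": "masterpiece, best quality",
    "negative": "low quality, worst quality",
}
# Pony meta-tag prefixes the section above says to avoid with this model.
AVOID_PREFIXES = ("score_", "source_")

def with_quality(tags: str, negative: bool = False) -> str:
    """Prepend this model's quality tags and drop discouraged pony meta tags."""
    kept = [t.strip() for t in tags.split(",")
            if t.strip() and not t.strip().startswith(AVOID_PREFIXES)]
    prefix = QUALITY["negative" if negative else "positive"]
    return ", ".join([prefix] + kept)

# e.g. cleaning up a prompt copied from a pony-style recipe:
assert with_quality("score_9, 1girl, fox tail") == \
    "masterpiece, best quality, 1girl, fox tail"
assert with_quality("blurry", negative=True) == \
    "low quality, worst quality, blurry"
```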
Negative prompt:
(worst quality, low quality:1.1), error, bad hands, watermark, distorted
adjust according to your preferences.
Do not put tags like greyscale, monochrome, yellow background in the negative. You will just get burned images; there is no need to fix washed-out colors or a "yellow filter" here like you may be used to. 3d in the negative is also a bad choice in most cases.
To improve backgrounds, add to negative
simple background, blurry background, abstract background
but do not forget to remove these if you are intentionally prompting for a simple background.
Artist styles:
Used with the "by " prefix; multiple artists give very interesting results and can be controlled with prompt weights.
by ARTISTNAME1, [by ARTISTNAME2, (by ARTISTNAME3:0.8),...]
or/and
[by ARTISTNAME1|by ARTISTNAME2|by ARTISTNAME3|...]
Works best at the very beginning of the prompt. Can be used as a wildcard (beware: there is a flaw in the sd-dynamic-prompts extension that sometimes wrecks results when used with a batch size greater than 1). For the majority of styles, highres fix/upscaling improves quality a lot.
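The two artist-mixing syntaxes above (comma-separated "by" fragments with optional weights, and [a|b|c] alternation) are mechanical to generate, e.g. when building wildcard files. A sketch under the assumption of A1111-style prompt syntax; the function names are mine:

```python
def artist_mix(artists, weights=None):
    """Comma-joined 'by ARTIST' fragments with optional attention weights,
    e.g. 'by ARTISTNAME1, (by ARTISTNAME2:0.8)'."""
    weights = weights or [None] * len(artists)
    parts = []
    for name, w in zip(artists, weights):
        frag = f"by {name}"
        # Only wrap in (…:w) when a non-default weight is requested.
        parts.append(f"({frag}:{w})" if w is not None and w != 1.0 else frag)
    return ", ".join(parts)

def artist_alternation(artists):
    """Alternation syntax '[by A|by B|...]': switches artist each sampling step."""
    return "[" + "|".join(f"by {a}" for a in artists) + "]"

assert artist_mix(["ARTISTNAME1", "ARTISTNAME2"], [None, 0.8]) == \
    "by ARTISTNAME1, (by ARTISTNAME2:0.8)"
assert artist_alternation(["A", "B", "C"]) == "[by A|by B|by C]"
```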
General styles:
2.5d, bold line, smooth shading, flat colors, minimalistic, cgi, digital painting, ink style, oil style, pastel style
can be used in combinations (with artists too), with weights, both in positive and negative prompts.
Characters:
Use the full name tag, same as on boorus, with proper formatting, e.g. "karin_(blue_archive)" -> "karin \(blue_archive\)"; use skin tags for better reproduction, e.g. "karin \(bunny\) \(blue_archive\)". This extension might be very useful.
Most characters are known by the name, but it will be better if you prompt their main features, like:
karin \(blue_archive\), karin \(bunny\) \(blue_archive\), dark-skinned female, purple halo, ponytail, yellow eyes, playboy bunny, fishnet pantyhose, gloves
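The escaping rule above (a backslash before each parenthesis, so the UI does not read the tag as attention syntax) is easy to automate. A minimal sketch; whether you also convert underscores to spaces depends on your UI and tagging habits, so this version leaves them untouched:

```python
def escape_tag(tag: str) -> str:
    """Escape booru-tag parentheses for A1111-style prompt syntax,
    so the parentheses are not parsed as an attention/weight group."""
    return tag.replace("(", r"\(").replace(")", r"\)")

assert escape_tag("karin_(blue_archive)") == r"karin_\(blue_archive\)"
```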
Natural text:
Use it in combination with booru tags - it works great. Or use only natural text after typing the style and quality tags. Or use just booru tags and forget about it - it's all up to you.
And yes, it's still based on pony, so it will be worse at IRL concepts, references, or some complex expressions compared to checkpoints based on vanilla SDXL. Check out Tofu, my new model that can manage such things.
Lots of Tail/Ears-related concepts:
tail censor, holding own tail, hugging own tail, holding another's tail, tail grab, tail raised, tail down, ears down, hand on own ear, tail around own leg, tail around penis, tail through clothes, tail under clothes, lifted by tail, tail biting,...
(booru meaning, not e621) and many others via natural text. Some reproduce perfectly, some require rolling. Unfortunately, in 0.5.0 some may work worse, but others look better. It also now performs better with all kinds of tails, not only fluffy kemonomimis.
Brightness/contrast:
You can just prompt what you want with tags or natural text and it should work: dark night, dusk, bright sun, etc. Black/white background works, but often gives not quite 0,0,0 or 255,255,255 as it should. Part of this is related to prompts - just check what pictures are tagged with it. Using phrases like (cute girl in front of completely black background) fixes it. Anyway, you shouldn't run into any issues in general use; it works just like NAI3, often even better.
Known issues
Well, unfortunately there are some:
Some artist styles don't work as they should.
(The reason for this is not entirely clear, because in another model with the same dataset they work fine. It is probably related to conflicts with PD 1-token hashes or problems with the original TE. It can be fixed in the future anyway; please report if you find artists that don't have a decent effect.)
Some concepts require more training (a few tail-related ones, some rare ones like "dogeza", and memes)
Watermarks can sometimes be found. This is mostly related to the pony base, but some may come from the dataset
Ciloranko is actually an opossum LMAO (an error in one of the cherry-picked dataset entries)
More to be discovered; still WIP
Requests for artists/characters in future models are open. If you find an artist/character/concept that performs weakly, is inaccurate, or has a strong watermark - please report it, and I will add them explicitly. Follow for new versions.
License:
Pony viral; check the original. Feel free to use it in your merges, finetunes, etc. - just please leave a link.
Future plans:
Well, a new dataset 2.5 times bigger with better balancing and classification is ready, but any mistake or flaw will cost A LOT. Fixes for the current version may come quite soon, but before the next big training run I'm going to collect more feedback and test some new things. If you have advice, or would like to share your experience, tools, or methods for training - you are very welcome.
I'm thinking about adding some furry content to the dataset. It may be beneficial for anatomy, poses, and concepts, but it's not that easy because of the different tagging system and... the wide aesthetic range. If you have ideas on how to deal with it, suggestions for good-looking/interesting furry artists, or can share your datasets - please PM.
Training with natural-text captions (in combination with booru tags) looks very promising even for SDXL, and new large models come with it out of the box. Current local VLMs do not have decent performance: COG and Idefics3 are nice but strongly SFW, JoyCaption hallucinates and is almost uncontrollable via prompt, LLaVA is just dumb, and others have similar problems. As for commercial ones, Claude is extremely expensive, Gemini has strong censorship, and GPT-4o is quite stupid for such a task.
So there is a little chance that someday you will see a multimodal LLM finetuned with SFW/NSFW anime pictures from the dataset; it should help a lot. Oh yes, here is a preliminary version and showcase.
Flux: promising, very smart, GPU-heavy, and brainwashed even for boobs. I've performed some training where "uncensoring" and a little knowledge of anime concepts were achieved, but it doesn't look good enough yet. Write if you are interested in it. The main issues here are training tools (actively developing; hopefully we will get proper full T5 training soon) and roughly 5-7 times higher GPU time requirements, so it's probably better to wait for a while.
For any suggestions or requests, join the Discord server
Thanks:
Artists who wish to remain anonymous - for sharing private works; Soviet Cat - GPU sponsoring; Sv1. - LLM access, captioning, code; K. - training code; Bakariso - datasets, testing, advice, insider info; NeuroSenko - donations, testing, code; T., [] - datasets, testing, advice; dga, Fi., ello - donations; and other fellow brothers who helped. Love you so much ❤️.
And of course everyone who gave feedback and made requests - it's really valuable.
Donations
AI is my hobby; I'm spending my own money on it and not begging for donations. If you want to support me - share my models, leave feedback, make a cute picture with a kemonomimi girl. And of course, support the original artists.
However, your money will accelerate further training and research:
(Just keep in mind that I may waste it on alcohol or cosplay girls)
BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c
ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db
If you can offer GPU time (A100+) - PM me.