
Tofu [ANIME BASE MODEL]

Updated: Sep 11, 2024
Tags: base model, anime
Verified: SafeTensor
Type: Checkpoint Trained
Stats: 421
Published: Sep 10, 2024
Base Model: SDXL 1.0
Training: Epochs: 14
Usage Tips: Clip Skip: 1
Hash: AutoV2 819385D86D

Anime base model trained from SDXL-base on a dataset of 1.8M anime pictures. Cute, smart, flexible, yours!

Yes, this is a new SDXL anime base model

  • Outperforms every other non-pony anime model in anatomy

  • Outperforms Pony and NAI3 in terms of general knowledge and sfw

  • 8k+ artist styles (wildcard) and a few general styles out of the box

  • Full color palette, full brightness range, great base aesthetics

  • Knowledge from original SDXL, no lobotomy

  • Unique experience that you have been missing (probably)

Since I had some GPU hours and a decent dataset, I became curious whether it's possible to train an anime model with vast knowledge, especially of sfw/nsfw anime concepts, while preventing it from lobotomizing everything from SDXL, as we've seen before in pony and others. This checkpoint is the answer and a proof of concept. It is quite experimental and a lot of things still need to be done or fixed, but it is already usable, fine in many ways, and has features missing from other open source checkpoints.

Tofu has (almost) the same dataset as 4th tail, which allows it to generate popular characters, mimic artist styles and recognize the majority of booru tags and concepts. All the same features are here, including natural text mixed captioning and unique training techniques.

  • Small details like fingers are nice. Backgrounds with popular real world locations (this comes from SDXL-base) or just a pretty landscape/cityscape are available.

  • Posing and nsfw are okay. Do not expect it to be as good as pony; well, compared with vanilla pony it's actually not much worse, but the best PD tunes/mixes are better. Still, Tofu surpasses anything else and should satisfy most. If you are looking for something more spicy, use 4th tail. The transition is close to seamless.

  • Styles look good, better than with a pony base, and there are no issues or conflicts with a broken TE.

  • Yes, it can generate text, but performance is very weak, especially in comparison with SD3/FLUX, just like SDXL-base. At least something.

  • It is compatible with most SDXL loras and some animagine/other checkpoint loras, but it varies. Loras from pony mostly do not fit; some style or concept loras may work, but performance varies. Most importantly, ControlNet from SDXL works fine. Anytest (with suffix AM, not PD) also gives decent results.

Features and prompting:

Basic:

Same as for all SDXL: ~1 megapixel for txt2img, any AR with resolution multiple of 64 (1024x1024, 1152x, 1216x832,...). Euler_a and CFG 4..9 (6-7 is best). Highres fix: any GAN/DAT upscaler, x1.5-1.6, denoise 0.5; upscale works best with a single tile resolution of no more than 3mpx. Highres fix and further upscale will significantly improve quality, details, eyes, hands, feet, etc.

  • Set Emphasis: No norm in the settings of your generation tool if you are getting strange blobs or distortion.

  • If LCM/PCM accelerators are applied, use Euler/Euler a samplers; DDIM gives a lot of mess and abominations.

No Clip Skip, just forget this meme.

Use an external SDXL vae, like fp16-fix; the vae baked into the model may be outdated.
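The resolution and highres fix guidance above can be sketched in a few lines of Python. This is only an illustration, not part of any generation tool: the function names, the 5% tolerance around 1 megapixel, the aspect ratio cap and the rounding of upscaled sizes to a multiple of 8 are all assumptions made for the example.

```python
def sdxl_resolutions(target_pixels=1024 * 1024, step=64, tolerance=0.05, max_ratio=2.0):
    """List (width, height) pairs that are multiples of `step` and close to ~1 megapixel.

    `tolerance` and `max_ratio` are assumed values, not taken from the model card.
    """
    sizes = []
    for width in range(step, 2048 + step, step):
        for height in range(step, 2048 + step, step):
            if max(width, height) / min(width, height) > max_ratio:
                continue  # skip extreme aspect ratios
            if abs(width * height - target_pixels) <= tolerance * target_pixels:
                sizes.append((width, height))
    return sizes


def hires_size(width, height, scale=1.5, step=8):
    """Scale a txt2img size for highres fix (x1.5-1.6), rounded to a multiple of `step`."""
    new_width = int(round(width * scale / step)) * step
    new_height = int(round(height * scale / step)) * step
    return new_width, new_height


def fits_single_tile(width, height, max_pixels=3_000_000):
    """Check the 'no more than 3mpx per tile' guidance for upscaling."""
    return width * height <= max_pixels


if __name__ == "__main__":
    for size in sdxl_resolutions():
        target = hires_size(*size)
        print(size, "->", target, "single tile ok:", fits_single_tile(*target))
```

For example, `hires_size(1216, 832)` gives `(1824, 1248)`, which stays under the 3mpx limit.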

Quality classification:

masterpiece, best quality

for positive

low quality, worst quality

for negative. That's all.

No bs like score_x, source_x and others. Don't put them in the prompt; all you will get is that text rendered on the picture.

Negative prompt:

(worst quality, low quality:1.1), error, bad hands, watermark, distorted

Adjust according to your preferences, just keep it as clean as possible.

Do not put tags like greyscale, monochrome, yellow background in negative; this is not pony and you will only get oversaturated, burned images.

To improve backgrounds, add to negative

simple background, blurry background, abstract background

but do not forget to remove it if you are prompting for something with a simple background.
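A small helper can take care of the "do not forget to remove it" part. This is a minimal sketch in plain Python; the function name and the check for the word "simple" in the positive prompt are assumptions for illustration, and the tag strings are copied from the text above.

```python
BASE_NEGATIVE = "(worst quality, low quality:1.1), error, bad hands, watermark, distorted"
BACKGROUND_NEGATIVE = "simple background, blurry background, abstract background"


def build_negative(positive_prompt, improve_background=True):
    """Return a negative prompt, adding background tags only when they will not
    conflict with a positive prompt that asks for something simple."""
    parts = [BASE_NEGATIVE]
    wants_simple = "simple" in positive_prompt.lower()
    if improve_background and not wants_simple:
        parts.append(BACKGROUND_NEGATIVE)
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_negative("1girl, fox ears, city street at night"))
    print(build_negative("1girl, fox ears, simple background"))
```

The second call returns only the base negative prompt, because the positive prompt itself asks for a simple background.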

Artist styles:

Grids with examples

Used with "by ". Multiple artists give very interesting results and can be controlled with prompt weights.

by ARTISTNAME1, [by ARTISTNAME2, (by ARTISTNAME3:0.8),...]

or/and

[by ARTISTNAME1|by ARTISTNAME2|by ARTISTNAME3|...]

Works best at the very beginning of the prompt. Can be used as a wildcard. For the majority, highres fix/upscale improves quality and recognizability a lot.
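The two artist syntaxes above are easy to generate from a list, for example when building wildcards. The sketch below is plain string formatting with hypothetical function names; it assumes the outer square brackets in the first form only mark optional entries, and that your generation tool understands the `(tag:weight)` and `[a|b]` prompt syntax.

```python
def weighted_artists(artists):
    """Build 'by A, (by B:0.8)' from (name, weight) pairs; a weight of 1.0 is left bare."""
    tags = []
    for name, weight in artists:
        tag = f"by {name}"
        if weight != 1.0:
            tag = f"({tag}:{weight:g})"
        tags.append(tag)
    return ", ".join(tags)


def alternating_artists(names):
    """Build '[by A|by B|by C]' from a list of artist names."""
    return "[" + "|".join(f"by {name}" for name in names) + "]"


if __name__ == "__main__":
    # ARTISTNAME1..3 are placeholders, as in the text above.
    print(weighted_artists([("ARTISTNAME1", 1.0), ("ARTISTNAME2", 1.0), ("ARTISTNAME3", 0.8)]))
    print(alternating_artists(["ARTISTNAME1", "ARTISTNAME2", "ARTISTNAME3"]))
```

Put the resulting string at the very beginning of the prompt, before the other tags.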

General styles:

2.5d, bold line, smooth shading, flat colors, minimalistic, cgi, digital painting, ink style, oil style, pastel style

can be used in combinations (with artists too), with weights, both in positive and negative prompts. More will be added in future.

Natural text:

Use it in combination with booru tags, works great. Or use only natural text after typing styles and quality tags. Or use just booru tags and forget about it. It's all up to you.

Unlike with pony, natural text is more functional here: IRL concepts, cars and mechanisms, other references all work. But don't expect it to be close to FLUX; size and architecture are incomparable.

Well, it works, kind of, but not as well as it should.
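One possible way to assemble a positive prompt from these pieces is sketched below. The ordering (styles first, then quality tags, then booru tags, then natural text) is an assumption that follows the hints above about styles working best at the beginning and natural text coming after styles and quality tags; the function name and example tags are made up.

```python
QUALITY_TAGS = "masterpiece, best quality"


def build_positive(description="", booru_tags=(), styles=(), quality=QUALITY_TAGS):
    """Join styles, quality tags, booru tags and a natural text description, in that order."""
    parts = list(styles)
    if quality:
        parts.append(quality)
    parts.extend(booru_tags)
    if description:
        parts.append(description)
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_positive(
        "a girl reading a book in a small cafe",
        booru_tags=["1girl", "fox ears"],
        styles=["by ARTISTNAME1", "flat colors"],
    ))
```

Leave `booru_tags` or `description` empty to prompt with only natural text or only tags.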

tail censor, holding own tail, hugging own tail, holding another's tail, tail grab, tail raised, tail down, ears down, hand on own ear, tail around own leg, tail around penis, tail through clothes, tail under clothes, lifted by tail, tail biting, ...

Brightness/contrast:

You can just prompt what you want with tags or natural text and it should work, like dark night, dusk, bright sun, etc. Black/white background works, but often it doesn't give exactly 0,0,0 or 255,255,255 as it should. Most of this is related to the tags: just check what pictures on booru are tagged with them.

Fortunately, using natural phrases like (cute girl in front of completely black background) fixes it. Anyway, you shouldn't encounter any issues in general use; it works just like NAI3, often even better.

Known issues:

  • Struggles in complex poses and scenes, more training is needed

  • Biases may be present

  • Ciloranko is actually opossum LMAO (error in one of the cherry-picked datasets)

  • To be discovered, WIP, very experimental, first of a kind, etc.

Requests for artists/characters in future models are open. If you find an artist/character/concept that performs weakly, inaccurately or has a strong watermark, please report it and I will add them explicitly. Follow for new versions.

Leave your feedback, it's very valuable and important.

License:

He he~

Since no horses were harmed, it's the same as in original SDXL. Derivatives, commercial use, whatever (some limitations apply: check the original text and don't break the laws of your country). Just don't claim authorship of the base, it's very recognizable.

Thanks:

Artists who wish to remain anonymous, for sharing private works; Soviet Cat - GPU sponsoring; Sv1. - llm access, captioning, code; K. - training code; Bakariso - datasets, testing, advice, insights; NeuroSenko - donations, testing, code; dga, Fi., ello - donations; other fellow brothers that helped. Love you so much ❤️.

And of course everyone who gave feedback and made requests, it's really valuable.

AI is my hobby, I'm wasting money on it and not begging for donations. If you want to support - share my models, leave feedback, make a cute picture with kemonomimi-girl. And of course, support original artists.

However, your money will accelerate further training and research

(Just keep in mind that I can waste it on alcohol or cosplay girls)

BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c

ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db

if you can offer gpu-time (a100+) - PM.