
RouWei

Updated: Nov 17, 2024
Tags: base model, anime
Verified: SafeTensor
Type: Checkpoint Trained
Stats: 678
Published: Nov 15, 2024
Base Model: Illustrious
Training: Epochs: 10
Usage Tips: Clip Skip: 1
Hash (AutoV2): B96CD67807

Large-scale finetune of Illustrious with state-of-the-art techniques and performance.

(TL;DR: Works exactly as it should, without the flaws you might encounter in other checkpoints.)

Key advantages:

  • Easy and convenient prompting

  • Great aesthetics, anatomy, and stability, along with versatility

  • Vibrant colors and smooth gradients without a trace of burning

  • Full brightness range even with epsilon

  • 22k+ artist styles, many general styles, almost any character

In addition to the above, compared with vanilla Illustrious and NoobAI:

  • No more annoying watermarks

  • No tag bleed and better segmentation (1, 2, 3 (nsfw))

  • No character bleed and related side effects (unwanted outfit, style, or composition changes)

  • No spawning of strange creatures, sfx in the background, or an extra pair of breasts (1, 2)

  • Better coherence (1, 2), prompt following, and anatomy (significant boost over Illustrious, slight or negligible over NoobAI)

  • Artist styles look exactly as they should (and lots of new ones added)

  • Better prompt following, without ignored tags or the need for (higher weights:1.4)

  • Forget about long schizo negatives

  • Stable style without random fluctuations on different seeds

  • New characters

A large, well-balanced dataset of 4.5M pictures (0.8M with natural text captions) picked from over 12M different artworks, a significantly reworked TE and parts of the UNET, and innovative training approaches. All this, combined with a great base model (despite a variety of problems, Illustrious is currently the best base for anime), made it possible to create a checkpoint that meets modern demands and shows unique results.

Dataset cut-off - September 2024.

Features and prompting:

It works well with both short, simple prompts and long, complex ones. However, contradictory or weird tags and concepts won't be ignored and will affect the output. No guard rails, no safeguards, no lobotomy; consider pruning schizo prompts.

The dataset contains only booru-style tags and (simplified) natural text expressions. Although it includes a share of furry art, all captions have been converted to classic booru style to avoid the problems that can arise from mixing different tagging systems. So e621 tags won't be understood properly.

Basic:

~1 megapixel for txt2img, any AR with a resolution that is a multiple of 64 (1024x1024, 1152x, 1216x832,...). Euler_a, CFG 4..9 (5-7 is best), 20..28 steps. Multiplying sigmas may improve results a bit; LCM/PCM and exotic samplers are untested. Hires fix - x1.5 latent + denoise 0.6, or any GAN upscaler + denoise 0.3..0.55.
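The sizing rule above can be sketched as a small helper. This is only an illustration, not part of any UI: the names `snap_resolution` and `hires_resolution` are hypothetical, and rounding the x1.5 hires size to a multiple of 8 is an assumption rather than something stated here.

```python
import math


def _snap(value, multiple):
    """Round value to the nearest multiple, never going below one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def snap_resolution(aspect_ratio, target_pixels=1024 * 1024, multiple=64):
    """Return (width, height) close to ~1 megapixel with both sides divisible by 64.

    aspect_ratio is width / height.
    """
    height = math.sqrt(target_pixels / aspect_ratio)
    width = height * aspect_ratio
    return _snap(width, multiple), _snap(height, multiple)


def hires_resolution(width, height, scale=1.5, multiple=8):
    """Return the upscaled size for the hires pass (x1.5 by default).

    Snapping to a multiple of 8 is an assumption, not a rule from this page.
    """
    return _snap(width * scale, multiple), _snap(height * scale, multiple)


if __name__ == "__main__":
    for ratio in (1.0, 1216 / 832, 832 / 1216):
        base = snap_resolution(ratio)
        print(base, "->", hires_resolution(*base))
```

Any other aspect ratio can be passed the same way; the helper only keeps the pixel count near 1 megapixel and the sides on the 64 grid.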

Quality classification:

Only 4 quality tags:

masterpiece, best quality
low quality, worst quality

Nothing else. Meta tags like lowres have been removed, do not use them. Low resolution images have been either removed or upscaled and cleaned with DAT depending on their importance.

Negative prompt:

worst quality, low quality, watermark

That's all, no need for "rusty trombone", "farting on prey" and the like. Do not put tags like greyscale or monochrome in the negative prompt unless you understand what you are doing. It will lead to burning and over-saturation; colors are fine out of the box.
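As a rough sketch of the quality-tag and negative-prompt advice above, the following assembles both prompts from a list of tags. The function name `build_prompts` is hypothetical, and placing the quality tags at the start of the positive prompt is an assumption; only the tag names themselves come from this page.

```python
# The only quality tags the model knows, per the section above.
QUALITY_POSITIVE = ["masterpiece", "best quality"]
# The recommended negative prompt.
NEGATIVE_DEFAULT = ["worst quality", "low quality", "watermark"]
# Meta tags that were removed from the dataset and should not be used.
REMOVED_META_TAGS = {"lowres"}


def build_prompts(tags, extra_negative=()):
    """Return (positive, negative) prompt strings.

    Quality tags are prepended (assumed position); removed meta tags are dropped.
    """
    kept = [t.strip() for t in tags if t.strip() and t.strip() not in REMOVED_META_TAGS]
    positive = QUALITY_POSITIVE + kept
    negative = NEGATIVE_DEFAULT + [t.strip() for t in extra_negative if t.strip()]
    return ", ".join(positive), ", ".join(negative)


if __name__ == "__main__":
    pos, neg = build_prompts(["1girl", "fox ears", "lowres"])
    print(pos)
    print(neg)
```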

Artist styles:

Grids with examples, list (can also be found in "training data").

Use them with the "by " prefix; it's mandatory. Combining multiple artists gives very interesting results and can be controlled with prompt weights.
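A minimal sketch of the "by " prefix with optional weights, assuming the common `(tag:weight)` prompt-weight syntax. The helper names and the artist names in the example are placeholders, not entries from the artist list.

```python
def artist_tag(name, weight=1.0):
    """Return an artist tag with the mandatory "by " prefix.

    A weight other than 1.0 is wrapped in (tag:weight) syntax (assumed UI convention).
    """
    tag = "by " + name.strip()
    if weight == 1.0:
        return tag
    return f"({tag}:{weight:g})"


def mix_artists(weights_by_name):
    """Join several artists into one prompt fragment, keeping the given order."""
    return ", ".join(artist_tag(name, w) for name, w in weights_by_name.items())


if __name__ == "__main__":
    # "artist one" and "artist two" are placeholder names.
    print(mix_artists({"artist one": 1.0, "artist two": 0.7}))
```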

General styles:

2.5d, anime screencap, bold line, sketch, cgi, digital painting, flat colors, smooth shading, minimalistic, ink style, oil style, pastel style

Booru tags styles:

1950s (style), 1960s (style), 1970s (style), 1980s (style), 1990s (style), animification, art nouveau, pinup (style), toon (style), western comics (style), nihonga, shikishi, minimalism, fine art parody

and everything from this group.

Can be used in combinations (with artists too), with weights, both in positive and negative prompts.

Characters:

Use the full-name booru tag with proper formatting, like "karin_(blue_archive)" -> "karin \(blue_archive\)". Use skin tags for better reproduction, like "karin \(bunny\) \(blue_archive\)". An autocomplete extension might be very useful.
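The formatting in the example above can be sketched as follows. This helper only mirrors that one example: it turns the underscore before each parenthesis into a space and escapes the parentheses. How other underscores should be handled is not specified here, so they are left untouched; the function name is hypothetical.

```python
def format_character_tag(tag):
    """Convert a raw booru character tag into the escaped prompt form.

    Mirrors the example "karin_(blue_archive)" -> "karin \\(blue_archive\\)".
    Underscores that are not directly before "(" are left as-is (assumption).
    """
    spaced = tag.replace("_(", " (")
    return spaced.replace("(", "\\(").replace(")", "\\)")


if __name__ == "__main__":
    print(format_character_tag("karin_(blue_archive)"))
    print(format_character_tag("karin_(bunny)_(blue_archive)"))
```

Escaping matters because unescaped parentheses are usually read as prompt-weight syntax rather than as part of the tag.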

Natural text:

Use it in combination with booru tags; it works great. Or use only natural text after the style and quality tags. Or use just booru tags and forget about natural text. It's all up to you.

The dataset contains over 800k pictures with hybrid natural-text captions made by Opus-Vision, GPT-4o and ToriiGate.

tail censor, holding own tail, hugging own tail, holding another's tail, tail grab, tail raised, tail down, ears down, hand on own ear, tail around own leg, tail around penis, tail through clothes, tail under clothes, lifted by tail, tail biting, tail insertion, tail masturbation, holding with tail, ...

(booru meaning, not e621) and many others work with natural text. The majority work perfectly; some require a few rerolls.

Brightness/colors/contrast:

You can use extra meta tags to control it:

low brightness, high brightness, low gamma, high gamma, sharp colors, soft colors, hdr, sdr, limited range

Example

They work in both the epsilon and vpred versions, and they work really well.

Unfortunately, there is an issue here: the model relies on them too much. Without low brightness, low gamma, or limited range (in negative), it might be difficult to achieve true 0,0,0 black; the same is often true for white.

Both the epsilon and vpred versions have something like true zsnr, with a full range of colors and brightness and without the commonly observed flaws. But they behave differently, so just try both.

Vpred version

It is experimental. There is something wrong with token padding (probably) in the vpred version, either in the model or on the inference side. If you get broken, washed-out pictures like this - put BREAK somewhere in the prompt. This does not happen on dark or bright pictures; to be investigated. Or just use the epsilon version, which already provides the full range and a great experience.
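The BREAK workaround can be applied by hand, but a small helper makes it repeatable. This is only a sketch: the name `insert_break` is hypothetical, and splitting at the midpoint of the tag list is an arbitrary choice, since the advice above is just to put BREAK somewhere.

```python
def insert_break(prompt, after_tags=None):
    """Insert the BREAK keyword between two comma-separated tags.

    By default the prompt is split at the midpoint of its tag list (arbitrary choice).
    Prompts with fewer than two tags are returned unchanged.
    """
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    if len(tags) < 2:
        return prompt
    if after_tags is None:
        after_tags = len(tags) // 2
    # Keep the split point strictly inside the tag list.
    after_tags = min(max(after_tags, 1), len(tags) - 1)
    head = ", ".join(tags[:after_tags])
    tail = ", ".join(tags[after_tags:])
    return f"{head} BREAK {tail}"


if __name__ == "__main__":
    print(insert_break("masterpiece, best quality, 1girl, fox ears"))
```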

Otherwise, at the moment of release this is probably the only vpred model that runs okay and doesn't suffer from burned colors, limited range, or the need for extra tweaks, rescales, adjustments and so on (default parameters: 1, 2, cfg rescale: 1, 2, 3). It even tends to show the same NAI3-like behaviour, with wrong skin colors and large red/yellow/blue fills under specific prompts. Full experience lmao.

To launch the vpred version you will need a dev build of A1111, Comfy (with a special loader node) or Reforge. Just use the same parameters as for epsilon (Euler a, CFG 5..7, 20..28 steps). CFG rescale is not mandatory, but you can try it and decide whether you like the results.

As mentioned above, to get a full black or full white fill you will need to write a prompt longer than a single tag or use brightness meta tags.

Known issues:

Of course there are:

  • As mentioned, the model relies too much on brightness meta tags, so you'll have to use them to get full performance

  • The vpred version has problems with chunk padding or something else; solved with BREAK

  • Inferior in furry-related knowledge compared to NoobAI

  • Some cherry-picked character datasets have prompting issues - Yozora and a few cute fox-girls are not consistent

  • A small detail-polishing finetune or LoRA would be nice; it's up to the community

  • To be discovered

Requests for artists/characters in future models are open. If you find an artist/character/concept that performs weakly or inaccurately, or that has a strong watermark - please report it and it will be added explicitly. Follow for new versions.


License:

Same as Illustrious. Feel free to use it in your merges, finetunes, etc.; just please leave a link.

How it's made

I'll consider making a report or something like it later.

In short, 98% of the work is related to dataset preparation. Instead of blindly relying on loss weighting based on tag frequency from the NAI paper, a custom guided loss-weighting implementation along with an asynchronous collator for balancing was used. Ztsnr (or close to it) with epsilon prediction was achieved using noise scheduler augmentation.

Thanks:

First of all, I'd like to acknowledge everyone who supports open source and develops and improves code. Thanks to the authors of Illustrious for releasing the model, and thanks to the NoobAI team for being pioneers in open finetuning at such a scale, sharing experience, and raising and solving issues that previously went unnoticed.

Personal:

Artists who wish to remain anonymous - for sharing private works; Soviet Cat - GPU sponsoring; Sv1. - llm access, captioning, code; K. - training code; Bakariso - datasets, testing, advice, insights; NeuroSenko - donations, testing, code; T.,[] - datasets, testing, advice; rred, dga, Fi., ello - donations; other fellow brothers who helped. Love you so much ❤️.

And of course everyone who gave feedback and made requests; it's really valuable.

If I forgot to mention anyone, please notify.

Donations

If you want to support - share my models, leave feedback, make a cute picture with a kemonomimi girl. And of course, support the original artists.

AI is my hobby; I'm spending money on it and not begging for donations. However, it has turned into a large-scale and expensive undertaking. Consider supporting it to accelerate new training and research.

(Just keep in mind that I can waste it on alcohol or cosplay girls)

BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c

ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db

If you can offer GPU time (A100+) - PM.