The LHC (Large Heap o' Chuubas) is aiming to be the model for all of your VTuber needs. There are other minor goals, like improving aesthetics, backgrounds and anatomy, but the main goal is to offer a lora-less option for generating VTubers.

Alpha V0.5

LHC v-pred v0.5 is a custom finetune of NoobAI v-pred 1.0. It has improved image quality and retains nearly all the inherent artist knowledge of NoobAI, while massively expanding its knowledge of vtubers.

Not all vtubers are known equally well: in my testing, some need additional tags and a few don't work well at all. A full list of characters, with notes on tagging and usage, can be found here: https://huggingface.co/Jyrrata/LHC_XL/blob/main/alpha_v05/vtubers_valpha05.txt . An album of example images of every single vtuber (including ComfyUI metadata) has also been prepared here: https://catbox.moe/c/pjfwt1 .

Dataset Details:

All vtubers were separated and their image counts normalized to between 80 and 120, aiming for 100 whenever possible. In addition, 1000 images featuring multiple vtubers were included to train multi-character capabilities. This resulted in a dataset of around 16000 images. All images were upscaled to at least 1MP, and jpegs were also cleaned using upscalers specialized in removing compression artifacts from webtoons.

Some vtubers required large amounts of data that didn't have associated tags yet. Here, AI tagging models were used to assist. I would have liked to tag every image completely by hand, but the size of the dataset makes this a larger undertaking than I am able to take on at this point.

Training Details

LHC v0.5 has been trained for 102 epochs at varying learning rates and batch sizes, due to uncertainty about the parameters after the major restructuring of the dataset. In general, UNet learning rates were between 1.5e-5 and 5e-5, and TE learning rates between 4e-6 and 8e-6. Cosine schedulers and batch sizes between 8 and 32 were used. Exact training logs for TensorBoard can be found on Hugging Face.

Training took over 400 hours and covered over 1.6M samples.

Alpha V0.4

As opposed to the previous versions, which used the LoKr method, v0.4 is a full finetune of Noob V-Pred 0.6. With over 340,000 samples seen (4500 images over 80 epochs), it took nearly 90 hours of training, which does not include several experiments. Despite this, the understanding of artists and concepts is still very close to the base model.

A list of characters can be found here: https://huggingface.co/Jyrrata/LHC_XL/blob/main/characters/alpha04.txt . Some of these will only need the character tag, some will need additional descriptors.

The LoRA extract can be found here: https://huggingface.co/Jyrrata/LHC_XL/blob/main/alpha/v04/lhc_04_extract.safetensors

Training Details

Dataset:

A dataset of ~3500 images (4500 including repeats) was used. This includes 3 artists with a total of ~350 images, ~500 images of multiple characters and ~2650 images of the 100 included characters.

Repeats were chosen so that each character has somewhere between 30 and 50 images per epoch. Whenever possible, high quality pngs with resolutions >1MP were chosen. Where this was not possible, the images were upscaled and/or cleaned using upscaling models designed to remove jpeg artifacts from images.
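As a rough illustration of how repeats might be picked to reach the 30 to 50 images per epoch mentioned above, here is a minimal sketch. It is not the script used for this model: the function name choose_repeats, the idea of aiming for the midpoint of the band, and the example counts are all assumptions made for illustration.

```python
def choose_repeats(image_count: int, low: int = 30, high: int = 50) -> int:
    """Pick the whole-number repeat count whose effective image count
    (image_count * repeats) lands closest to the middle of [low, high].

    Assumption: the midpoint is used as the target; the real dataset may
    have been balanced by hand or with a different rule.
    """
    if image_count <= 0:
        raise ValueError("image_count must be positive")
    target = (low + high) / 2
    best = 1
    for repeats in range(1, high + 1):
        # Strict '<' keeps the smaller repeat count when two are equally close.
        if abs(image_count * repeats - target) < abs(image_count * best - target):
            best = repeats
    return best


# Hypothetical raw image counts per character (not taken from the real dataset).
raw_counts = {"character_a": 8, "character_b": 12, "character_c": 20, "character_d": 45}

for name, count in raw_counts.items():
    repeats = choose_repeats(count)
    print(f"{name}: {count} images x {repeats} repeats = {count * repeats} per epoch")
```

Because repeats are whole numbers, some raw counts cannot land inside the band exactly (26 images gives either 26 or 52), so in practice a few images may also need to be added or dropped.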

Alpha V0.3.1

Due to some mistakes during the training of alpha v0.3, the model has diverged significantly from NoobAI. Nonetheless, it is a capable model with good understanding of most of the 79 trained vtubers and a passable one for the rest. For an overview refer to:

https://huggingface.co/Jyrrata/LHC_XL/blob/main/characters/alpha03.txt

and https://civitai.com/posts/9579061 for a visual guide on basic character comprehension of the two v0.3 models. Many characters work with only their activation tag, though some require a little or a lot of additional tags to work.

Alpha V0.3 and V0.3.1 were trained on the NoobAI-XL V-Pred 0.6 version.

A lora extracted version can be found here: https://huggingface.co/Jyrrata/LHC_XL/blob/main/alpha/v03/lhc_v03_1_lora.safetensors

If you want to use V0.3, it can be found here: https://huggingface.co/Jyrrata/LHC_XL/blob/main/alpha/v03/LHC_alphav03-vpred.safetensors

That Hugging Face repo also contains an eps version and a version trained on rouwei-vpred, both of which use intermediate datasets. Refer to the character .txt files for an overview of the v0.2.5 knowledge.

Alpha V0.2

Same general approach as v0.1, but the dataset has been expanded by 10 additional vtubers for a total of 28, and the final two epochs include an experimental dataset of 1200 images covering a wide base of concepts, intended to realign and improve the model aesthetically.

Included vtubers this time are:

aradia ravencroft
bon \(vtuber\)
coni confetti
dizzy dokuro
dooby \(vtuber\)
haruka karibu
juniper actias
koganei niko
malpha ravencroft
mamarissa
michi mochievee
rindo chihaya
rin penrose
atlas anarchy
dr.nova\(e\)
eimi isami
isaki riona
jaiden animations
juna unagi
kikirara vivi
mizumiya su
trickywi
tsukinoki tirol
alias nono
biscotti \(vtuber\)
mono monet
rem kanashibari
yumi the witch

In addition to adding new ones, the datasets for some of the old ones have been redone, especially trickywi, juna unagi and juniper actias. Juniper has also gotten two new tags, juniper actias \(new design\) and juniper actias \(old design\), which try to separate her models into two distinct phases. This is experimental and might not be carried forward to future versions.
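The tags in this list are written in lowercase with backslash-escaped parentheses, since many UIs otherwise read parentheses as emphasis syntax. A small helper along these lines could convert a display name into that form. This is only a sketch: the function name to_prompt_tag is made up for illustration, and whether escaping is needed depends on the UI you use.

```python
def to_prompt_tag(name: str) -> str:
    """Lowercase a character name and backslash-escape its parentheses.

    Already-escaped input is unescaped first, so calling the function
    twice gives the same result as calling it once.
    """
    tag = name.strip().lower()
    # Undo any existing escapes so they are not doubled below.
    tag = tag.replace("\\(", "(").replace("\\)", ")")
    return tag.replace("(", "\\(").replace(")", "\\)")


for display_name in ["Dooby (Vtuber)", "Juniper Actias (New Design)", "Mamarissa"]:
    print(to_prompt_tag(display_name))
```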

A showcase of the base character tag understanding is here. Some vtubers don't work with only their character tag; for those you will need additional descriptive tags.

Alpha V0.1

This model is currently still in alpha. The current state is not indicative of all future capabilities, but rather just a proof of concept.

A basic test model, with nice results nonetheless. Trained on roughly 1000 images featuring mostly 18 vtubers that the base NoobAI model did not know well. This model is based on the NoobAIXL v-pred-0.5-version model.

As a v-pred model, this model will not work in all WebUIs, only in those that have implemented v-pred sampling. The necessary state dict entries have been set in the model so that UIs like Comfy and ReForge can apply the required settings automatically. If your UI does not do so, you need to activate v-pred sampling manually, and it is recommended to turn on ztsnr as well.
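To check whether a checkpoint carries such marker entries, the header of a .safetensors file can be read with the Python standard library alone: the file starts with an 8-byte little-endian length followed by a JSON header listing every tensor name. The sketch below does that. The key names "v_pred" and "ztsnr" are an assumption about which entries the UIs look for, and the helper names are made up for illustration.

```python
import json
import struct
import sys


def read_safetensors_keys(path: str) -> set[str]:
    """Return the tensor names listed in a .safetensors file header."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form block, not a tensor.
    header.pop("__metadata__", None)
    return set(header)


def vpred_flags(path: str) -> dict[str, bool]:
    """Report whether the assumed v-pred marker keys are present."""
    keys = read_safetensors_keys(path)
    # Assumption: the markers are tensors named exactly "v_pred" and "ztsnr".
    return {"v_pred": "v_pred" in keys, "ztsnr": "ztsnr" in keys}


if __name__ == "__main__":
    # Usage: python check_vpred.py path/to/model.safetensors
    for model_path in sys.argv[1:]:
        print(model_path, vpred_flags(model_path))
```

If both flags come back False, the UI will most likely need v-pred sampling switched on by hand.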

The newly added/enhanced vtubers are (listed by their trained tags):

Aradia Ravencroft
Malpha Ravencroft
Mamarissa
Koganei Niko
Rindo Chihaya
Mizumiya Su
Isaki Riona
Kikirara Vivi
Coni Confetti
Dizzy Dokuro
Dooby (Vtuber)
Haruka Karibu
Juna Unagi
Juniper Actias
Michi Mochievee
Rin Penrose
Trickywi
Jaiden Animations

The dataset also included Nerissa Ravencroft and Vienna (Vtuber) in particular, as well as many images featuring 2 or more characters at once.

For a showcase of the base character comprehension, check out this post.

Recommended Settings:

Sampler: Euler

CFG: 4-5

Steps: 25+

Training Details:

Trained as a full dimension LoKr, based on the methodology of the KohakuXL series, with the Lycoris settings found here.

Specific parameters:

Dataset: 1035 images
Batch size: 2
Gradient Accumulation: 4
Training steps: ~6400
Training Epochs: ~50
Unet LR: 3e-5 (lowered to 2e-5 for the last 12 epochs)
TE LR: 2e-5 (lowered to 1e-5 for the last 12 epochs)
Optimizer: AdamW 8-bit
Scheduler: constant
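As a sanity check on how these numbers relate, the step count can be estimated from the dataset size, batch size and gradient accumulation. The sketch below assumes that "training steps" means optimizer updates (one per accumulated group of batches) and that no per-image repeats were applied. Both are assumptions, and the function name is made up for illustration.

```python
import math


def estimate_optimizer_steps(dataset_size: int, batch_size: int,
                             grad_accum: int, epochs: int) -> int:
    """Estimate optimizer updates, counting one update per grad_accum batches."""
    batches_per_epoch = math.ceil(dataset_size / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


effective_batch = 2 * 4  # batch size x gradient accumulation = 8 images per update
steps = estimate_optimizer_steps(dataset_size=1035, batch_size=2, grad_accum=4, epochs=50)
print(f"effective batch size: {effective_batch}")
print(f"estimated steps for 50 epochs: {steps}")  # 6500
```

Under these assumptions, 50 full epochs come to 6500 updates, which is in line with the listed ~6400 steps over ~50 epochs.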

Special Thanks:

kblueleaf (Kohaku Blueleaf): for the Lycoris library and the resources on finetuning via LoKr

OnomaAI & Laxhar Dream Lab: for amazing base models

kohya-ss: for sd-scripts