Published | Nov 8, 2024 |
Trigger Words | yuuki sakuna |
Hash | AutoV2 EE0D8F58D2 |
Yuuki Sakuna Pony LoRA
Version 2 description
Trained with an experimental optimizer. Full-body shots seem improved, but the camera angle still does not vary unless you prompt for it.
The bow still remains even when you change the outfit.
Version 1 description
I like cat-eared girls, so I trained this LoRA (I don't know anything about her background :<)
Trigger Word
trigger word
yuuki sakuna
any costume (still not flexible enough)
yuuki sakuna, long hair, animal ears, pink hair, blush, cat ears, pink eyes, two side up, ahoge, colored inner hair, two-tone hair
debut costume (good, but some components are still missing)
yuuki sakuna, long hair, hair ornament, bow, animal ears, pink hair, blush, cat ears, maid headdress, hair bow, frills, hairclip, pink eyes, pink bow, blue bow, maid, puffy sleeves, two side up, cat hair ornament, ahoge, heart hair ornament, puffy short sleeves, clothing cutout, pink dress, blue bow, colored inner hair, two-tone hair, cleavage, breasts
for full body, adding the following tags helps (shoes are still not correct)
shoes, black footwear, white thighhighs
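The tag groups above can be combined programmatically. This is only a small sketch: the helper name `build_prompt` and its arguments are made up for illustration, and the tag lists are copied from the "any costume" and full-body lines on this page.

```python
# Tag groups copied from this page; build_prompt is a hypothetical helper.
TRIGGER = "yuuki sakuna"
ANY_COSTUME = [
    "long hair", "animal ears", "pink hair", "blush", "cat ears",
    "pink eyes", "two side up", "ahoge", "colored inner hair", "two-tone hair",
]
FULL_BODY = ["shoes", "black footwear", "white thighhighs"]


def build_prompt(base_tags, full_body=False, extra=()):
    """Join the trigger word, character tags, and optional extras into one prompt."""
    tags = [TRIGGER, *base_tags]
    if full_body:
        tags.extend(FULL_BODY)
    tags.extend(extra)
    # Drop repeated tags while keeping the original order.
    seen = set()
    unique = []
    for tag in tags:
        if tag not in seen:
            seen.add(tag)
            unique.append(tag)
    return ", ".join(unique)


print(build_prompt(ANY_COSTUME, full_body=True, extra=["standing"]))
```

Outfit tags of your own would go in `extra`; as noted in the limitations, parts of the debut costume may still leak through.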
Limitations
Cannot change the costume properly (some debut costume components are still left over)
Full-body prompts may not be effective
Version 1 is still a little underfit (like medium-rare pork)
Version 2 improves some small details, but the dataset still lacks variety (the images are imbalanced)
Training Details (Version 2)
LoRA size
reduced to dimension 8 with dynamic alpha
dataset
38 images (mostly half body)
parameters
resolution = 1024
batch size = 2
dim,alpha = 16,16 (for training)
mixed/save precision = bf16/bf16
optimizer = AdEMAMix (32-bit, consumes more VRAM)
UNet LR = 2e-4
TE LR = 1e-4
scheduler = inverse_sqrt with warmup 100 steps
l2 loss only
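For reference, here is a rough sketch of what an inverse_sqrt schedule with a 100-step warmup looks like. It assumes the common formulation (linear warmup, then decay proportional to 1/sqrt(step), with the timescale equal to the warmup length); the exact implementation used by the trainer may differ, and the function name is made up for this example.

```python
import math


def inverse_sqrt_factor(step, warmup_steps=100, timescale=None):
    """LR multiplier: linear warmup, then 1/sqrt decay (assumed formulation)."""
    if timescale is None:
        timescale = warmup_steps  # assumption: timescale defaults to warmup length
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    shift = timescale - warmup_steps
    return 1.0 / math.sqrt((step + shift) / timescale)


UNET_LR = 2e-4  # UNet LR from the parameter list above
for step in (0, 50, 100, 400, 2850):
    print(step, UNET_LR * inverse_sqrt_factor(step))
```

Under these assumptions the learning rate peaks at step 100 and has halved by step 400, so most of the 2850 steps run well below the nominal LR.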
steps
epochs = 10
total steps = 2850
repeat = 15 (one concept only)
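The total step count follows from the dataset size, repeats, epochs, and batch size listed on this page. A quick check, assuming every batch is full and ignoring any effect of aspect-ratio bucketing:

```python
import math


def total_steps(num_images, repeats, epochs, batch_size):
    """Optimizer steps for a single-concept run (simplified estimate)."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# 38 images x 15 repeats = 570 samples per epoch, 285 steps at batch size 2
print(total_steps(num_images=38, repeats=15, epochs=10, batch_size=2))  # 2850
```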
tools
kohya-ss GUI v24.2.0
torch 2.5.0 cu124
RTX 3060 12 GB + xformers + gradient_checkpointing
weight
UNet weight average strength = 0.015634962041489377
Text Encoder (1) weight average strength Clip_L = 0.011193290141749815
Text Encoder (2) weight average strength Clip_G = 0.010691167576002698
Training Details (Version 1)
dataset
38 images (mostly half body)
parameters
resolution = 1024
batch size = 2
dim,alpha = 16,16 (not resized, to preserve quality; if the LoRA turns out good enough I will resize it :P)
mixed/save precision = bf16/fp16 (changed accidentally)
optimizer = AdEMAMix8bit
UNet LR = 1e-4
TE LR = 5e-05
scheduler = inverse_sqrt with warmup 100 steps
huber loss with snr schedule, c = 0.85
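To give a feel for how a Huber-style loss differs from the plain L2 loss used in version 2, here is a minimal sketch on scalar values. It uses one common pseudo-Huber form with a constant c; the snr schedule, which varies c per timestep, is not reproduced here, and the trainer's actual formula may differ.

```python
import math


def l2_loss(pred, target):
    """Squared error, as in the "l2 loss only" setting."""
    return (pred - target) ** 2


def pseudo_huber_loss(pred, target, c=0.85):
    """Pseudo-Huber: close to L2 for small errors, roughly linear for large ones."""
    diff = pred - target
    return 2.0 * c * (math.sqrt(diff * diff + c * c) - c)


for err in (0.01, 0.5, 2.0, 10.0):
    print(err, l2_loss(err, 0.0), pseudo_huber_loss(err, 0.0))
```

The point of the comparison is that large errors (outliers) are penalized much less under the Huber-style loss than under L2.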
steps
epochs = 10
total steps = 2850
repeat = 15 (one concept only)
full_bf16 training
tools
kohya-ss GUI v24.2.0
torch 2.5.0 cu124
RTX 3060 12 GB + xformers + gradient_checkpointing
weight
UNet weight average strength = 0.008335085569112463
Text Encoder (1) weight average strength Clip_L = 0.0073367764333498705
Text Encoder (2) weight average strength Clip_G = 0.005826970830639767
Description format referenced from Gtonero
*This LoRA is for studying LoRA training with new techniques, so please do not use it to harm the VTuber (and please support her too).