
Opinionated Guide to SDXL Lora Training


Preface

This is a HIGHLY OPINIONATED "SCIENCE THIS BISH" edition of SDXL lora training - NONE OF THIS is perfect, and NONE OF THIS IS "you must do this my way or you're a turdey wurdey!". No, in fact, this is ONLY A TRIBUTE - it's not the greatest training in the world - it's, y'know, a TRIBUTE! (cue Lucifer and Jack Black)

Translation: SDXL is brand new, and we're SO USED to SD 1.5 that it's going to take time for more settled and "better" training ideas to cement.

You do you - if you have a cooler way of doing it - AMAZING!

Software Requirements

I have been using BMALTAIS for this. Yes, Linaqruf had a 0.9 LoRA setup, and yes, LastBen has a notebook for SDXL 1.0 as well - but I've been trying to learn bmaltais for this.

That being said:
I think LastBen might be using the same code as bmaltais, just not the GUI, so most of these settings should work no matter what.

Repository Links

Bmaltais repo: https://github.com/bmaltais/kohya_ss

Runpod Docker Templates:

https://github.com/ashleykleynhans/kohya-docker

https://github.com/ashleykleynhans/stable-diffusion-docker

VAST will work if you know how to set up the docker templates - I don't, so I'm sadly back on Runpod for this.

I was going to pass along my Vast and Runpod affiliate links, but let's be real:

You'd hate me for shilling anyway, the way I've been hoarding my SDXL loras - so you do you :P

LastBen: https://github.com/TheLastBen/fast-stable-diffusion - Use his link for runpod, it goes straight to his docker.

DISCLAIMER:

Most of my stupidity is based on HoloStrawberry's extremely helpful SD 1.5 notebook that I clung to for dear life, plus Linaqruf's SD 1.5 notebook before that. If you're still an SD 1.5 homie, PROPS TO YOU! I'm not leaving 1.5, I'm just playing with the new toys while they're still fresh and hot!

So basically: Don't scream at me if I don't know what I'm doing - I learned from the big kids early on and I still don't know what I'm doing - I hate math, numbers piss me off, and I just go by how WELL it looks when it's done.

Training Time

Data collection & Prep

First of all you need to have DATA - and in this case it really doesn't matter anymore how many images you have (it never did, I'm just lazy).

I would recommend at least 10 images. You can literally get away with fudging repeats and epochs this way.

Reminder: THIS IS A SCIENCE THIS BISH level training, nobody's perfect - in fact I'll admit this: I learned my SDXL from Envy and a little from GoofyAI - you've always got someone to help you!

I'm not going to school you in WHERE, HOW AND WHY to get your images, this is up to you. I've been re-training content on AI outputs as well as other data - so I've got 6+ months worth of data sitting on my iMac hard drive.

In THEORY though: your data should be of high enough quality for AT LEAST 1.5 settings to get "PERFECTION" the first time around.

This means: If you've got 90s screenshots of a cartoon, best upscale that in either Affinity or Automatic1111.

Don't be me, training an X-Men celshade lora on content that MIGHT FLY for SD 1.5 but would still be, WAVES HANDS, Meeeeeh. (It came out OK in the end, but my LORAs go stronk and need dialing down, so that could be it?)
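Since I go by feel, here's a rough way to sanity-check whether a screenshot even needs upscaling. This helper is entirely my own (not from any training repo) and assumes you're aiming at SDXL's native 1024px short side - the actual upscaling would then happen in Affinity, Automatic1111's Extras tab, an ESRGAN model, whatever you like:

```python
import math

# Hypothetical helper: how much does a frame need to be scaled so its
# SHORT side reaches SDXL's native 1024px training resolution?
def upscale_factor(width: int, height: int, target_short_side: int = 1024) -> int:
    short = min(width, height)
    if short >= target_short_side:
        return 1  # already big enough, leave it alone
    # round up so the short side never lands under the target
    return math.ceil(target_short_side / short)
```

So a classic 640x480 cartoon screenshot would need a 3x upscale, while a 1920x1080 frame is already fine as-is.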

So your steps for this should be:

1 - COLLECT YOUR DATA

2 - UPSCALE IF NEEDED

3 - PREP YOUR FOLDERS

Your FOLDER STRUCTURE gets a bit odd if you're using BMALTAIS/Kohya. It won't matter so much on LastBen or when others start making Colab notebooks - but if you're using bmaltais on Runpod or local:

DATA FOLDER NAME

  • IMG

  • MODEL

  • LOGS

Under "IMG" you'll want to structure it with a basic concept - the documentation is a little unclear, but here's how I've been doing it:

[Number of Repeats]_[CONCEPT NAME]

aka: 2_illustration

Your number of repeats is going to entirely depend on HOW LONG and how strong you want your lora. We'll get to that in the next section though.

BASE MODEL?

Envy recommends SDXL base. I've been using a mix of Linaqruf's model, Envy's OVERDRIVE XL and base SDXL to train stuff. Like SD 1.5, this is utterly preferential. Envy's model gave strong results, but it WILL BREAK the lora on other models. Sadly, anything trained on Envy Overdrive doesn't work on the OSEA SDXL model.

Repeats + Epochs

Again this is all a preference.

Your repeats is a math game, your epochs is a math game. I don't do math well - I'm the ARTISTIC AUTISTIC - so here's how I figure this out:

10-20 images needs AT LEAST 10 repeats PER EPOCH. With 10 images, I actually went for 40 repeats, because the epochs don't always match steps per repeat - so you sadly still have a math game - but I've been doing 5 epochs and 1-2 batch size max.

Your batch size will help calculate the steps.

Right now I've got an hour-ish long train on a 2 repeat, 500 image set - I'm still experimenting, and this one MIGHT BOMB on me, but here's the setup:

running training / 学習開始

num train images * repeats / 学習画像の数×繰り返し回数: 1006

num reg images / 正則化画像の数: 0

num batches per epoch / 1epochのバッチ数: 503

num epochs / epoch数: 5

batch size per device / バッチサイズ: 2

gradient accumulation steps / 勾配を合計するステップ数 = 1

total optimization steps / 学習ステップ数: 2515

That's straight from my Kohya logs. That should give you an idea of how it does stuff. This is also based on WHAT I USE FOR the learning rate etc. Remember: SD 1.5 settings don't always bode well for SDXL - AND ALSO: PREFERENCE!
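If you want to predict the step count before hitting train, the math in that log boils down to this little sketch (the function name is mine, not kohya's):

```python
import math

# Step math as shown in the kohya log above:
# 503 images x 2 repeats = 1006 training images,
# batch size 2 -> 503 batches per epoch,
# 5 epochs -> 2515 total optimization steps.
def total_steps(num_images: int, repeats: int, batch_size: int,
                epochs: int, grad_accum: int = 1) -> int:
    batches_per_epoch = math.ceil(num_images * repeats / batch_size)
    return batches_per_epoch * epochs // grad_accum
```

Plugging in the numbers from my Yashahime run: total_steps(503, 2, 2, 5) gives 2515, matching the log.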

I've had success with Envy's main suggestions, and some tricks from GoofyAI I picked up.

So here's more information from my logs for you:

Using DreamBooth method.

ignore directory without repeats / 繰り返し回数のないディレクトリを無視します: .ipynb_checkpoints

prepare images.

found directory dataset/Yashahime_Style/img/2_anime contains 503 image files

1006 train images with repeating.

0 reg images.

no regularization images / 正則化画像が見つかりませんでした

[Dataset 0]

batch_size: 2

resolution: (1024, 1024)

enable_bucket: True

min_bucket_reso: 256

max_bucket_reso: 2048

bucket_reso_steps: 64

bucket_no_upscale: True

[Subset 0 of Dataset 0]

image_dir: "dataset/Yashahime_Style/img/2_anime"

image_count: 503

num_repeats: 2

shuffle_caption: True

keep_tokens: 0

caption_dropout_rate: 0.05

caption_dropout_every_n_epoches: 0

caption_tag_dropout_rate: 0.0

color_aug: False

flip_aug: False

face_crop_aug_range: None

random_crop: False

token_warmup_min: 1,

token_warmup_step: 0,

is_reg: False

class_tokens: anime

caption_extension: .txt

Learning Rate + Other Stuff

Ok, LEARNING RATE IS GOING TO CAUSE A WORLD WAR SIX level argument here, but let me just remind you before you throw shade and facepalm at me: SDXL runs differently than 1.5. It's a preference also in HOW and why and otherwise.

Learning rate I've been using with moderate to high success: 1e-7
Learning rate on SD 1.5 that CAN WORK if you know what you're doing, but hasn't worked for me on SDXL: 5e-4

I've attached another JSON of the settings that match ADAFACTOR; that does work, but I didn't feel it worked for ME, so I went back to the other settings - this is LITERALLY a preference btw.

I have been using DADAPTATION, with COSINE - and i've not been using restarts unless it was the Adafactor version.

Exception: Yashahime's almost done, and I'll throw the attachment JSON for that one in here as well for study.

So with Yashahime Style I went back and added COSINE WITH RESTARTS like SD 1.5 and GoofyAI have been using.

I've also been adding in the salted 'MIN SNR GAMMA' with no clue if it influences SDXL or not, so I've set MIN SNR GAMMA to 5 like the original notebooks on 1.5 stated.

So far your settings should be:

  • Standard Lora (i have no clue what other stuff works on SD XL yet)

  • Train batch size NO MORE THAN 3 - you can push it to 3 for SDXL, but it weakens the result severely (I stick to 1-2).

  • FP16

  • Epochs is a preference, but 3-5 should be fine. (This also depends on whether you want more repeats or more cooking time - don't burn your lora or I'll smack you lol)

  • LR SCHEDULER: Cosine or Cosine with Restarts (dunno if I set mine correctly, I'm tired - check my settings for Yashahime)

  • LR CYCLES - I set it to 3 because I think that's what I was doing on Holostrawberry's stuff.

  • Cache Latents & CACHE THEM TO DISK (even on runpod do this)

  • SEED: I dunno, I just - I set mine the same way Envy did: 12345 - I know normally seed is like -1 on 1.5, but I'm brain dead, shht.

  • Optimizer: DAdaptation - for me this works, OR use Adafactor (check the Pinkspider JSON file for any extra arguments for the optimizer)

  • LR Warmup: I forgot like a naughty dunce to actually fix this, I can't remember the OG setting.

  • NO HALF VAE: if you don't click this, you get "NANS IN LATENT" - and you will scream.
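Until the JSONs get reattached, here's a rough sketch of the settings above as a config dict. The key names are MY approximations, NOT the exact keys the kohya GUI exports - but the values are the ones from this article, so you can eyeball them in one place:

```python
import json

# Approximate settings dump (key names are guesses; values from this article).
settings = {
    "network_dim": 64,                 # see the DIM SIZE section below
    "network_alpha": 32,
    "learning_rate": 1e-7,
    "optimizer": "DAdaptation",        # or Adafactor (see the Pinkspider JSON)
    "lr_scheduler": "cosine_with_restarts",
    "lr_scheduler_num_cycles": 3,
    "train_batch_size": 2,
    "epoch": 5,
    "mixed_precision": "fp16",
    "min_snr_gamma": 5,
    "noise_offset": 0.0357,
    "adaptive_noise_scale": 0.00357,
    "caption_dropout_rate": 0.05,
    "shuffle_caption": True,
    "cache_latents": True,
    "cache_latents_to_disk": True,
    "gradient_checkpointing": True,
    "xformers": True,
    "no_half_vae": True,               # skip this and enjoy NaNs in latent
    "bucket_reso_steps": 64,
    "bucket_no_upscale": True,
    "seed": 12345,
}
print(json.dumps(settings, indent=2))
```

Again: treat this as a cheat sheet, not a drop-in config file.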

BUT DUSK: YOUR LORAS ARE CHONK FOR SDXL

..... I've got no love lost for you on this one; you're not gonna get FallenIncursio 1mb loras right off the bat. I never make mine that small personally; his are AMAZING for working that small, and it's a concept that is LORA TRAINING hilarity, and it's a preference! Sometimes people do that and things don't always pan out - Fallen's are great, and I have NO problem - but with SDXL, the less the dim, the ACTUAL LESS QUALITY as far as I've seen.

32 DIM should be your ABSOLUTE MINIMUM for SDXL at the current moment. Yes, this is a large and strongly opinionated YELL from me - you'll get a 100mb lora, unlike SD 1.5 where you're gonna get like a 70mb lora.

Don't forget your FULL MODELS on SDXL are 6.2 GB, and pruning has not been a thing yet.

This is EARLY days - and everyone's putting their dollar and fifty cents in. Again: I'm opinionated and could be WRONG in every way. Don't take my article as a bible; I'm trained in graphic design and illustration, not programming.

DIM SIZE?

I've been working on 64/32. It produces 300+ mb loras, YES. It's FREAKING ANNOYING also that currently I almost REFUSE to learn ComfyUI, and Automatic1111 breaks when trying to use loras from SDXL. (And yes, I've had an updated one; the runpod docker image I've shown is the one with SD & CN & Roop as well as Kohya.)

Again, you CAN EMPLOY SD 1.5 shenanigans and try and DENY IT MORE DIM - and I WILL be proud of you if you can get a 1mb SDXL lora. But I can't promise that it'll hold a lot of data.

SD 1.5 didn't NEED to hold that much data, and I'm not sure WHY or how or otherwise, other than maybe it was the sheer brute force of "SCIIIIEEENCE!"

Other Settings?

Oh yes, let's finish this shall we?

  • Gradient Checkpointing

  • Shuffle caption (ALWAYS ALWAYS!)

  • Memory Efficient attention

  • Xformers (if ya got it flaunt it)

  • MIN SNR GAMMA - we talked about this earlier - 5.

  • DO NOT UPSCALE BUCKET RESOLUTION (don't ask, don't tell, it's an auto-click for me)

  • Bucket Reso steps 64.

  • (A BUNCH of settings for time steps - leave those alone, they're automatic)

  • Noise offset: OG and set to: 0.0357

  • Adaptive noise scale: 0.00357

  • Something about rate of caption dropout (I don't use it because of shuffle caption, but it's set to 0.05)

  • If you have a WANDB (Weights & Biases) API setup - go for gold. I like this sometimes, because tensorboard confuses me sometimes.

Testing Time!

What do you do when you hate the idea of node-based and A1111 ain't working?

Take this link (you get FREE credits to start generating online): https://tensor.art/models/624847635087557370?source_id=nz-3plnjkUG1ofAvanb09hMv

(Yes, I'm shilling tensor, shht.) Upload your lora, TEST IT, and if you don't want it exclusive to tensor - just TEST it there, then save your generation details to a text file or copy them over as you gen.

Win/Win.

Samples

Sorry, but I don't have any help on Auto1111 or Comfy - I can't get Auto to work with SDXL loras, and Comfy scares the utter crap out of me.

So I'll upload some sample pics of some of my loras over at Tensor, and I promise I'll train some, test them, and upload here.

Yes, there is now an SDXL Sailor Moon GOTH lora. Joke's on you, I'll actually get it to 1.5 in the next day or so when I figure out bmaltais for 1.5 - I promise you it'll be a less-than-50mb lora.

Niji Slime XL - this is ALREADY a lora on 1.5, and it needs a TINY bit of care on SDXL - but all my SDXL loras are ALPHA stage because I'm still learning.

Venti - it's half AI data and a smidge of non-AI data; my plan is to find everyone's Venti loras for 1.5, build up an AI database and retrain this lol. Trust me, I have asshole plans to do the same for a ton of Genshin characters - I barely play this game, and I have the utmost troll-worthy note to do ALL AI data for a lot of stuff like this for fun.

Lucifer. Just plain Lucifer.

X-men 90s Celshade

Fake Norman Rockwell, it can get dialed down to be more manga or it can be used directly as sort of a hilarious painting - It'll come to Civit in the next week or so don't worry.

Conclusions

Be patient. If you don't want to do SDXL YET - don't. If you want to have a play and don't want to train: don't. IF you want to: DO IT.

Also, don't complain about file size until we learn how to MANAGE our dim and alpha sizes for loras - I know the community's panicked about 100+MB loras - but let's be real, Lykon makes 128mb SD 1.5 loras and still gets downloads.

Have some compassion: SDXL is new. We're all in this together, and we're all trying to figure stuff out!

PS: The JSONs will get reattached after I publish - it's being a brat.
