
Lora tests: LoRa, LoCon, LoHa, LoKr, GloRa


Jan 26, 2026


LoRa, LoCon, LoHa, LoKr, GloRa

So many different LoRa variations. Are they any good?

Googling for results doesn't yield much good information. So I decided to run some tests myself.

I took the training data used in the yozakura quartet LoRa and trained multiple different LoRa types with the same training data and more-or-less the same training parameters.

The trained models couldn't be attached to this article directly: articles can't have large files attached, and the model page complained about "too many files" when I tried to attach all 58 models.

The trained models are here if you want to test them: https://civitai.com/models/2343120?modelVersionId=2635649

Some steps during training give bad results. Due to the low number of saved checkpoints, it is possible that I was just unlucky and the bad-looking models were simply saved at bad spots. The number of comparison images was also relatively low.

Also I am no expert so some of the training settings I chose are based on nothing. Better training settings most likely exist.

TL;DR:

LoKr, GloRa, LoCon are all good. LoKr is maybe best in my tests.

LoHa doesn't feel as good as the above. Even plain LoRa had better results than it.

LoRa did reach results almost as good as the others, but it sometimes required more steps to reach the same level.

Training TE made LoRa better.

8000 steps was maybe not long enough training. At 2000 steps the results for a single character were already good, but for multi-character use at least 4000 steps were needed, and it seems like 8000 steps (end of training) was not the peak of improvements. Perhaps even longer training could make things better.

General info

Training data: 754 screenshots from yozakura quartet. The tagging style used was "booru tags of full image contents". The training data is almost identical to what was used here, so you can just look at that: https://civitai.com/models/1021790/anime-screencap-style-yozakura-quartet

This training data was used because it is known to give pretty good results, and the base model is not good at producing this style and characters.

Each LoRa type was trained for roughly the same amount of time (7.5 h on an RTX 3070). Multiple models were saved mid-training at various steps.

See "tomls.zip" download for exact training settings.

Dim 16, alpha 16.

Optimizer used was same for all: prodigyopt.prodigy.Prodigy(decouple=True,weight_decay=0.01,d_coef=0.5,use_bias_correction=True,betas=(0.9, 0.99),slice_p=10,safeguard_warmup=True,growth_rate=1.01).

Batch size 2.

All the memory saving options were enabled in order to make it fit into 8 GB VRAM.
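For reference, this is roughly what that optimizer setup looks like as plain Python with the prodigyopt package. It is only a sketch: the argument values are the ones listed above, but the model here is just a placeholder, not the actual LoRa network.

# Sketch only: Prodigy with the arguments used for all trainings in this article.
import torch
from prodigyopt.prodigy import Prodigy

model = torch.nn.Linear(8, 8)  # placeholder for the network being trained

optimizer = Prodigy(
    model.parameters(),
    lr=1.0,  # Prodigy adapts the step size itself, so lr is normally left at 1.0
    decouple=True,
    weight_decay=0.01,
    d_coef=0.5,
    use_bias_correction=True,
    betas=(0.9, 0.99),
    slice_p=10,
    safeguard_warmup=True,
    growth_rate=1.01,
)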

Comparison images

I'll be comparing the models with these prompts, using WAI-illustrious-SDXL v1.5. The image after each prompt is the baseline result without any LoRas.

I did run a bunch of test matrices with other prompts and seeds too, and they inform my judgement of the models. Dumping them all here would flood the article, so you'll have to take my word on them.

yozq, anime screenshot, 1girl, nanami ao, blue hair, medium hair, cat ears, (blue eyes:0.6), shorts, t-shirt, balancing on head, object on head, book, smile, looking at viewer, hand on own chin, blue sky, blue background, standing, upper body

baseline.png

yozq, anime screenshot, 2girls, multiple girls, isone kotoha, green shirt, t-shirt, green skirt, pleated skirt, brown eyes, nanami ao, black dress, sleeveless dress, cat hat, arm around shoulder, sitting

baseline.jpeg

yozq, anime screenshot, 1girl, kurumaki zakuro, red hair, medium hair, bob cut, yellow eyes, china dress, cleavage cutout, light smile, crossed arms, standing, hips, looking at viewer

00520-1269853121.jpeg

Standard LoRa

The normal and most common LoRa.

dim 32, alpha 16.
TE training disabled.
Trained for 10000 steps.
2.47 s/it, 7.4 GB VRAM.
162 MB model size.
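As a reminder of what those dim/alpha numbers mean, here is a toy sketch (not the actual training code) of the low-rank update a standard LoRa learns for one linear layer. The layer size is made up; only the dim and alpha values are the ones above.

# Toy illustration of a standard LoRa update for a single linear layer.
import torch

out_features, in_features = 1280, 1280   # made-up layer size for illustration
dim, alpha = 32, 16                      # the values used for this LoRa

W = torch.randn(out_features, in_features)   # frozen base weight
A = torch.randn(dim, in_features) * 0.01     # trainable "down" matrix
B = torch.zeros(out_features, dim)           # trainable "up" matrix, starts at zero

delta_W = (alpha / dim) * (B @ A)            # rank-32 update scaled by alpha/dim
W_adapted = W + delta_W

# Cost per layer: dim * (in + out) trainable values instead of in * out.
print(A.numel() + B.numel(), "LoRa params vs", W.numel(), "full params")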

lora-steps-1.jpg

lora.jpg

lora.jpg

nanami ao looks good after 6000 steps.

kurumaki zakuro looks fine pretty fast. Maybe this was too simple a thing to test?

Isone kotoha and Nanami Ao have lots of bleeding between characters until a large number of training steps.

TE training yes/no?

Some people say that training the TE makes it better. Some say training the TE makes the model blow up in your face.

Same settings as above but this time with TE training enabled.

2.71 s/it, 7.5 GB VRAM.
217 MB model size.

Training speed is noticeably slower.
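The extra size and the slower steps make sense: with TE training enabled, LoRa matrices are also attached to the text encoder layers, not just the UNet, so there is simply more to store and update every step. A rough back-of-the-envelope sketch (the layer shapes below are made up, only the idea matters):

# Toy count of how adding text encoder layers grows a LoRa.
def lora_params(layer_shapes, dim):
    # each (out, in) linear layer gets a dim x in "down" and an out x dim "up" matrix
    return sum(dim * (out + inp) for (out, inp) in layer_shapes)

dim = 32
unet_layers = [(1280, 1280)] * 100   # made-up stand-in for the UNet's adapted layers
te_layers = [(768, 768)] * 24        # made-up stand-in for the text encoder's adapted layers

print("UNet only:", lora_params(unet_layers, dim))
print("UNet + TE:", lora_params(unet_layers + te_layers, dim))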

lora-te-steps-1.jpg

lora-te.jpg

lora-te.jpg

nanami ao at 1200 steps looks nice, and step 8400 looks very good too. But at 4800 steps something awful happens and the output is weirdly corrupted.

kurumaki zakuro is about the same as before.

Isone kotoha and Nanami Ao are much better with TE. The character bleeding stops at an earlier step.

I would say that the result with TE training enabled looks nicer, but the training appears to be less stable. Overall, TE training is a net positive in this test.

The rest of the LoRa types were all trained with TE training enabled.

LoCon

LoRa with Convolutional layers.

LoCon is often described as "simple upgrade" over normal LoRa.

Extra parameters over LoRa: Convolution dimension 32, convolution alpha 16.

What are good numbers for convolution layers? I don't know, so I just set them to be same as dimension and alpha.
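My understanding of the convolution parameters (an assumption on my part, not something taken from the training code) is that LoCon applies the same low-rank trick to the UNet's 3x3 convolution weights: the conv kernel is split into a small "down" conv and a 1x1 "up" conv, and that is what the convolution dimension and alpha control. A toy sketch:

# Toy LoCon-style decomposition of one 3x3 convolution weight (sketch only).
import torch

out_ch, in_ch, k = 320, 320, 3               # made-up conv layer size
conv_dim, conv_alpha = 32, 16                # the values used for this LoCon

W = torch.randn(out_ch, in_ch, k, k)              # frozen base conv kernel
down = torch.randn(conv_dim, in_ch, k, k) * 0.01  # trainable "down" conv
up = torch.zeros(out_ch, conv_dim, 1, 1)          # trainable 1x1 "up" conv

# Collapse the two small kernels into one full-size update kernel.
up_2d = up.squeeze(-1).squeeze(-1)                # (out_ch, conv_dim)
delta_W = torch.einsum("oc,cikl->oikl", up_2d, down) * (conv_alpha / conv_dim)
W_adapted = W + delta_W
print(delta_W.shape, "built from", down.numel() + up.numel(), "trainable values")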

Trained for 8700 steps.
2.99 s/it, 7.5 GB VRAM.
226 MB model size.

Again a bit slower training. Fewer intermediate models were saved.

locon-steps-1.jpg

locon.jpg

locon.jpg

nanami ao looks fine.

kurumaki zakuro looks fine. I would say it has more of her "attitude" in these images.

Isone kotoha and Nanami Ao again get their bleeding fixed at late steps. Funny how Isone kotoha and Nanami Ao swap places as the steps change.

LoHa

Low-rank Hadamard product

Another upgrade over LoRa. Extra parameters over LoRa: Convolution dimension and convolution alpha.

At first I tried training with both dimensions at 32 and alpha 16. But I ran out of VRAM.

A blog somewhere said that the LoHa dimension is effectively squared. So LoHa with dimension 32 would be "equal" to LoRa with dimension 1024.

So I trained with both dimensions at 16 and alpha 8.
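For what it's worth, the "effectively squared" claim comes from how LoHa builds its update: it learns two low-rank products and multiplies them element-wise (a Hadamard product), and the element-wise product of two rank-r matrices can have rank up to r squared. A toy sketch of the idea (the initialization here is random, not the real one):

# Toy LoHa-style update: the Hadamard product of two low-rank factors.
import torch

out_features, in_features = 1280, 1280   # made-up layer size
dim, alpha = 16, 8                       # the values used for this LoHa

A1 = torch.randn(dim, in_features)
B1 = torch.randn(out_features, dim)
A2 = torch.randn(dim, in_features)
B2 = torch.randn(out_features, dim)

delta_W = (alpha / dim) * (B1 @ A1) * (B2 @ A2)   # element-wise product of two rank-16 matrices

# rank(X * Y) can be as high as rank(X) * rank(Y), so dim 16 can behave like a much larger LoRa.
print(torch.linalg.matrix_rank(delta_W))          # typically prints 256 here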

Trained for 6800 steps.
3.95 s/it.
7.3 GB VRAM.
226 MB model size.

This is by far the slowest LoRa type tested.

loha-steps-1.jpg

loha.jpg

loha.jpg

nanami ao has problems with details in the eyes. Other than that it is "fine" I guess. Nothing really stands out.

kurumaki zakuro looks fine without issues.

kotoha and Nanami Ao look a bit off.

LoHa 2

As I already said earlier, some people say that a LoHa rank is effectively squared. So maybe the rank in the first LoHa test was just too high?

So I trained a second LoHa with dim 6, alpha 3.

Note that this was trained for fewer steps to see at what point the training starts to affect things.

loha2-steps.jpg

The results aren't really better. It also seems like the training only starts to take effect at around 1600 steps.

LoKr

Low-rank Kronecker product

It is like LoHa, but different. Not much information about it anywhere.

It also has convolution dimension and alpha. But there is also a new "factor" parameter.

"What does the factor do". I don't know. But it seems to make the resulting model smaller. Factor 2 halves the size, facto 4 quarters the size.

Also the convolution dimension may mean something different here.

Also there is a "full matrix mode" option. I don't know what it does, other than that it uses too much VRAM so I can't use it.

Since I couldn't find even a single blog that said anything about the "factor", I trained three LoKrs with factors 0, 2 and 4.
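Here is my rough guess at where the size saving comes from, as a toy sketch. This is an assumption about how the Kronecker factorization is split, not something I verified in the LyCORIS source: the weight is approximated as kron(small matrix, big matrix), the factor sets the size of the small one, and only the big one still gets a LoRa-style low-rank pair, so the parameter count drops as the factor grows.

# Toy parameter count for a LoKr-style factorization of one 1024x1024 weight (sketch only).
import torch

out_features = in_features = 1024
dim = 16

def lokr_params(factor):
    # kron(W1, W2): W1 is (factor x factor) and stored in full,
    # W2 is (out/factor x in/factor) and replaced by a rank-`dim` pair as in LoRa.
    w1 = factor * factor
    o2, i2 = out_features // factor, in_features // factor
    return w1 + dim * (o2 + i2)

plain_lora = dim * (out_features + in_features)
print("plain LoRa:", plain_lora)
print("factor 2  :", lokr_params(2))   # roughly half of the LoRa count
print("factor 4  :", lokr_params(4))   # roughly a quarter

# Shape check: the Kronecker product really does rebuild the full-size matrix.
print(torch.kron(torch.randn(2, 2), torch.randn(512, 512)).shape)   # torch.Size([1024, 1024])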

Factor 0:
3.24 s/it.
7.5 GB VRAM.
226 MB model size.


lokr-0.jpg

lokr-0.jpg

Factor 2:
3.17 s/it.
7 GB VRAM.
113 MB model size.

lokr-2-steps-1.jpg

lokr-2.jpg

lokr-2.jpg


Factor 4:
3.15 s/it.
7.3 GB VRAM.
57 MB model size.

lokr-4-steps-1.jpg

lokr-4.jpg

lokr-4.jpg

Differences in training speed and VRAM usage are within the margin of error, I'd say.

nanami ao looks pretty good with factors 0 and 2. With 4 the likeness is reduced noticeably. This one seems to have learned the suit better than the others.

kurumaki zakuro looks fine without issues.

The kotoha and Nanami Ao image suffers noticeably as the factor goes up.

A higher factor does reduce quality, but much less than would be expected from the size reduction. Factor 4 also seems to learn slower, but at 8200 steps even that one had a decent result. Same quality with smaller size? Black magic.

Perhaps LoKr requires longer training than others? Or maybe it is just more stable during training allowing it to benefit from extra training without frying itself. Stay tuned for part 2.

A bit worrying is that sd-webui-forge-classic spits out a large list of errors with LoKr. Perhaps it is not well supported?

The errors were like this:

ERROR:ldm_patched.modules.model_patcher:Failed to apply lokr to diffusion_model.output_blocks.8.0.out_layers.3.weight 
view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
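That particular message is a generic PyTorch error rather than anything LoKr-specific: .view() only works when the tensor's memory layout allows it, while .reshape() copies the data if needed. A minimal reproduction, unrelated to the actual LoKr weights:

# Minimal reproduction of the "view size is not compatible..." error from the log above.
import torch

w = torch.randn(64, 32, 3, 3).permute(1, 0, 2, 3)   # permuting makes the tensor non-contiguous
try:
    w.view(32 * 64, 9)                               # fails with the same RuntimeError
except RuntimeError as e:
    print(e)
print(w.reshape(32 * 64, 9).shape)                   # reshape copies when needed and succeeds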

GloRa

"GLoRA: Generalized LoRA for Parameter-Efficient Fine-tuning". Whatever that may mean. I trained one of them. Its parameter seem to be the same: Convolution and alpha.

3.44 s/it.
228 MB model size.

locon-steps-1.jpg

glora.jpg

glora.jpg

Results seem fine. The eyes are again a bit corrupted but the suits are well learned.

The hat goes on the wrong character. But we can blame the poor prompt understanding of SDXL for this, I think.

diag-oft

I also trained this kind of LoRa. But it seems that sd-webui-forge-classic doesn't support it so I haven't tested it.
