Type | |
Stats | 633 6,572 |
Reviews | (103) |
Published | Jun 30, 2024 |
Base Model | |
Training | Steps: 13,440 Epochs: 24 |
Usage Tips | Strength: 0.9 |
Trigger Words | sniffing armpit |
Hash | AutoV2 6AA286D7B0 |
This LoRA lets you depict a person sniffing their own armpit.
Surprisingly, Pony doesn't already know this concept (at least not with the tags I tried), so I decided to create a LoRA for it.
Main trigger: sniffing armpit
Additional tags (ordered by tag frequency): exposed armpit, clothed armpit, arm lowered
(arm lowered had so few images that it is really hit and miss. When using clothed armpit, it is better to also put exposed armpit in the negative prompt, as the term armpit alone already makes Pony quite happy to generate ... well, armpits.)
Suggested LoRA weight: 0.4 – 1.0, depending on the style you want.
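As a rough illustration of how the trigger, the additional tags and the weight fit together, here is a minimal sketch that assembles a positive and a negative prompt. It assumes the `<lora:name:weight>` prompt syntax used by AUTOMATIC1111-style web UIs. The file name `sniffing_armpit` and the helper `build_prompts` are hypothetical and not part of the original release.

```python
# Hypothetical file name: use whatever the downloaded LoRA file is called locally.
LORA_NAME = "sniffing_armpit"


def build_prompts(weight=0.9, extra_tags=()):
    """Return (positive, negative) prompt strings for this LoRA.

    Assumes the <lora:name:weight> syntax of AUTOMATIC1111-style UIs.
    """
    if not 0.4 <= weight <= 1.0:
        raise ValueError("weight is outside the suggested 0.4 - 1.0 range")

    tags = ["sniffing armpit"]  # main trigger
    tags.extend(extra_tags)     # e.g. "exposed armpit", "clothed armpit"
    positive = ", ".join(tags) + f", <lora:{LORA_NAME}:{weight}>"

    # Per the note above: with "clothed armpit", push "exposed armpit"
    # into the negative prompt.
    negative = "exposed armpit" if "clothed armpit" in tags else ""
    return positive, negative


positive, negative = build_prompts(0.9, ["clothed armpit"])
print(positive)
print(negative)
```

Other front ends load LoRAs differently (for example through a dedicated loader node), so only the tag handling carries over there.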
Pony is probably so bad at this concept because barely any images on the various boorus are tagged accordingly.
And with that we can talk a bit about the
Training
Specifically, I collected 36 samples from different boorus (no real cherry-picking; these were all the good ones I could find).
I then generated 170 additional images with Pony Diffusion using ControlNet (a mixture of Depth and Pose models), randomizing art style, gender, and so on.
Input images for the ControlNet were both drawings and real photographs (unbelievably many stock photos of this exist). They were sourced from conventional image search, and 19 of them were added to the training images.
This resulted in 225 training images.
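The total can be checked from the three sources described above. This is only a small bookkeeping sketch; the dictionary labels are descriptive names chosen here, not identifiers from any tool.

```python
# Image counts as stated in the text; the labels are illustrative only.
dataset_sources = {
    "booru samples": 36,
    "ControlNet generations (Pony Diffusion)": 170,
    "ControlNet input images reused for training": 19,
}

total_images = sum(dataset_sources.values())

for source, count in dataset_sources.items():
    # Share of each source in the final training set.
    print(f"{source}: {count} ({count / total_images:.1%})")

print("total:", total_images)  # 225
```

So roughly three quarters of the training set is synthetic, generated by the base model itself under ControlNet guidance.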
All of them were then tagged using wd-swinv2-tagger-v3 by SmilingWolf and afterwards the 4 tags listed above were manually added.
Afterwards, I added masks to the images, first using RemBG (Human) and then ClipSeg with the text prompts Arm, Armpit, Face. As the dataset was small and a quick glance showed that not all masks were correct, I also did a quick manual pass to correct the masks.
I then trained the LoRA using OneTrainer.
The relevant training parameters were:
Prodigy optimizer
24 epochs @ 560 steps each (13,440 steps total)
10 image repetitions (with image and caption variations)
Batch size 4
Using image masks, with unmasked probability of 0.03, unmasked weight 0.02
1024 resolution with aspect bucketing
LoRA rank 48, alpha 2 (later resized to target rank 32, with sv_fro 0.99)
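To give an idea of what the two mask settings in the list do, here is a minimal pure-Python sketch of a mask-weighted loss. It is a simplified assumption about how masked training generally works, not OneTrainer's actual code. The function `masked_mse` and the flat-list pixel representation are hypothetical.

```python
import random

UNMASKED_PROBABILITY = 0.03  # chance that a step ignores the mask entirely
UNMASKED_WEIGHT = 0.02       # loss weight for pixels outside the mask


def masked_mse(pred, target, mask, rng=random):
    """Mask-weighted mean squared error over flat lists of pixel values.

    mask holds 1.0 for pixels inside the masked region and 0.0 outside.
    Illustrative only: a real trainer works on latent tensors.
    """
    if rng.random() < UNMASKED_PROBABILITY:
        # Occasional step on the full image, so context is not forgotten.
        weights = [1.0] * len(mask)
    else:
        # Inside the mask: weight 1.0. Outside: UNMASKED_WEIGHT.
        weights = [m + (1.0 - m) * UNMASKED_WEIGHT for m in mask]

    weighted = [w * (p - t) ** 2 for w, p, t in zip(weights, pred, target)]
    return sum(weighted) / len(weighted)


# Two pixels with the same error, one inside the mask and one outside.
print(masked_mse([1.0, 1.0], [0.0, 0.0], [1.0, 0.0]))
```

With these values, errors outside the mask (background, unrelated body parts) contribute only 2% as much as errors on the arm, armpit and face, which is what steers the LoRA toward the concept rather than the surrounding style.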
Training took about 8 hours on an RTX 4090.
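The step count and the training time can be cross-checked with a little arithmetic. All inputs are the figures stated above. The explanation for the small gap between 562.5 and 560 steps per epoch is an assumption, not something the original states.

```python
images = 225
repeats = 10
batch_size = 4
epochs = 24
steps_per_epoch = 560   # as reported in the parameter list
training_hours = 8      # approximate

samples_per_epoch = images * repeats                 # 2250
upper_bound_steps = samples_per_epoch / batch_size   # 562.5
total_steps = epochs * steps_per_epoch               # 13440
seconds_per_step = training_hours * 3600 / total_steps

print("samples per epoch:", samples_per_epoch)
print("steps per epoch (upper bound):", upper_bound_steps)
print("total steps:", total_steps)
print(f"seconds per step: {seconds_per_step:.2f}")
# Assumption: the reported 560 is slightly below 562.5 because aspect
# bucketing can leave incomplete batches that are dropped.
```

That works out to a bit over two seconds per batch of four 1024-resolution images.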
If you have any additional questions, feel free to ask.