| Field | Value |
| --- | --- |
| Type | LoRA |
| Stats | 3,068 |
| Reviews | 259 |
| Published | Aug 8, 2023 |
| Base Model | |
| Trigger Words | Adriana Chechik |
| Hash | AutoV2 9D74B5C09B |
A LoRA for Adriana Chechik.
Process
Images (71)
Focus
30 "full" body (waist/knees up)
17 upper body (chest and head)
18 close up (head and shoulders)
6 weird angles/poses (range from "full" body to upper body)
Aspect ratio
30 1:1
41 3:4
Content (varied...)
faces (1 eyes closed, half smiling, 1 eyeglasses)
lighting
clothing
makeup
background
pose
Misc
I try to exclude any images that have a busy or complex scene/background. Abnormal clothing, hand gestures, etc. are cropped out when possible. My rule of thumb is that if I wouldn't want the image to be generated by the LoRA, I don't include it in the dataset. There are some exceptions to this rule, but it is a good starting point for trimming the dataset.
Duplicate clothing items, facial expressions, poses, pieces of jewelry, etc. are excluded as much as possible, though this can often be hard to avoid.
Images are cropped by hand and left at whatever pixel dimensions the crop happens to produce. They are kept to 3:4, 4:3, or 1:1 aspect ratios.
Many others have commented that 71 images is unnecessary, and that 20 or so will do. I prefer to be in the 40-80 range.
Captions
All begin with "adriana chechik, a photo of a woman..."
In sentence form, I describe the clothing, jewelry, lighting, pose, angle, background, facial expression, makeup, and any other information I do not want showing up in the LoRA gens (abnormal hair color, for example); see the example caption after this list.
I do not describe things I do want to show up in the LoRA, like eye color, hair color, skin tone, body proportions, etc.
I have experimented with adding a fake word "ohwx" to the captions with varying results. I did not do so for this LoRA.
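As an illustration of the pattern (not an actual caption from this dataset), a caption might read: "adriana chechik, a photo of a woman wearing a black dress and silver hoop earrings, standing against a plain white wall, soft studio lighting, smiling at the camera, heavy eye makeup, upper body shot".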
Training Params
model: DreamshaperXL
text_encoder_lr: 0.0004
unet_lr: 0.0004
learning_rate: 0.0004
network_dim: 256
network_alpha: 1
lr_scheduler: constant
optimizer_type: Adafactor
train_batch_size: 1
dataset repeats: 20
epochs: 10 (sometimes up to 12 if I have a highly varied dataset)
max_train_steps: repeats * epochs * # of images (so for this one, it was 20 * 10 * 71 = 14,200); a rough sd-scripts equivalent of all these settings is sketched below
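For anyone running kohya's sd-scripts directly instead of the GUI, the settings above map roughly onto a command like the one below. The paths, folder names, resolution, mixed precision, and optimizer_args are placeholders/assumptions, not my exact setup; dataset repeats are set through the image folder name rather than a flag.

```bash
# Rough sketch only: paths, resolution, and precision are placeholders.
# Repeats (20) come from the dataset folder name, e.g. img/20_adriana chechik.
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path="/models/DreamShaperXL.safetensors" \
  --train_data_dir="/datasets/adriana/img" \
  --output_dir="/output" \
  --output_name="adriana_chechik" \
  --network_module=networks.lora \
  --network_dim=256 \
  --network_alpha=1 \
  --learning_rate=0.0004 \
  --unet_lr=0.0004 \
  --text_encoder_lr=0.0004 \
  --lr_scheduler="constant" \
  --optimizer_type="Adafactor" \
  --optimizer_args scale_parameter=False relative_step=False warmup_init=False \
  --train_batch_size=1 \
  --max_train_epochs=10 \
  --max_train_steps=14200 \
  --resolution="1024,1024" \
  --save_model_as=safetensors \
  --mixed_precision="bf16"
# The optimizer_args above are the commonly recommended settings for Adafactor
# with a fixed learning rate; I'm not claiming they match my exact config.
```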
How is it so small?
After training is complete, I am left with a 1.7 GB .safetensors file. I use the kohya GUI to resize the LoRA with a rank of 256, which spits out an ~18 MB .safetensors file that is nearly identical to the 1.7 GB file in practice.
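The GUI's resize tool is a front end for kohya's resize_lora.py; run directly, the equivalent call looks roughly like this (paths are placeholders):

```bash
# Sketch of the underlying sd-scripts call; paths are placeholders.
python networks/resize_lora.py \
  --model="/output/adriana_chechik.safetensors" \
  --save_to="/output/adriana_chechik_resized.safetensors" \
  --new_rank=256 \
  --save_precision="fp16" \
  --device="cuda"
# --dynamic_method (sv_ratio / sv_fro / sv_cumulative) with --dynamic_param is
# also available if you want the script to pick an effective rank per layer.
```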
I'm sure I missed something here, so let me know if there's any other info that would be useful.