Published | Aug 15, 2024 |
Training | Steps: 4,000 |
Trigger Words | fursuit head |
Hash | AutoV2 FF5BE8C5CA |
Initial test of my Fursuit Head LoRA for FLUX, trained locally on Windows 11 on an RTX 3090 using AI-Toolkit.
Training converged after around 4-5 hours. At about 4000 steps, with gradient accumulation steps set to 2, image quality began to degrade rapidly. Checkpoints were saved every 1000 steps, though it seems a checkpoint at 3500 steps might have produced better results than the ones at 3000 or 4000 steps, so I will keep this in mind for future iterations.
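To illustrate why the 3500-step checkpoint was never available, here is a minimal sketch that lists the steps at which checkpoints land for a given save interval. The function name `checkpoint_steps` is hypothetical and is not part of AI-Toolkit; it only shows the arithmetic.

```python
def checkpoint_steps(total_steps: int, save_every: int) -> list[int]:
    """Return the step counts at which a checkpoint would be written.

    Hypothetical helper for illustration only; not an AI-Toolkit function.
    """
    if total_steps <= 0 or save_every <= 0:
        raise ValueError("total_steps and save_every must be positive")
    return list(range(save_every, total_steps + 1, save_every))


# Interval used for this run: no checkpoint exists between 3000 and 4000.
coarse = checkpoint_steps(4000, 1000)

# A shorter interval (500 is just an example value) would include step 3500.
fine = checkpoint_steps(4000, 500)

print(coarse)
print(3500 in coarse, 3500 in fine)
```

A shorter interval costs more disk space per run, but makes it possible to pick a checkpoint just before quality starts to degrade.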
Dataset is largely the same as the one used for the Pony LoRA, with a couple of additions and removals. Images were automatically captioned using XComposer2 4KHD and then manually edited for precision. Around 20% of the dataset was made up of regularization images of people and animals of various kinds and styles, in various situations, which I generated with FLUX directly within ComfyUI, saving the prompt used alongside each image.
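As a rough sketch of the bookkeeping described above, the snippet below works out how many regularization images are needed to reach a target share of the full dataset, and writes a prompt to a text file next to its image. Both function names are hypothetical, the figure of 80 subject images is only an example (the real dataset size is not stated here), and the same-name `.txt` sidecar layout is an assumption about how the captions are stored.

```python
from pathlib import Path


def regularization_count(subject_images: int, reg_percent: int) -> int:
    """Number of regularization images so they make up reg_percent of the total.

    Solves r / (subject_images + r) = reg_percent / 100 for r.
    Hypothetical helper for illustration only.
    """
    if not 0 <= reg_percent < 100:
        raise ValueError("reg_percent must be in the range [0, 100)")
    return round(subject_images * reg_percent / (100 - reg_percent))


def write_sidecar_caption(image_path: Path, prompt: str) -> Path:
    """Save the generation prompt as a .txt file with the same base name.

    The same-name .txt convention is an assumption, not confirmed by the author.
    """
    caption_path = image_path.with_suffix(".txt")
    caption_path.write_text(prompt, encoding="utf-8")
    return caption_path


# Example: 80 subject images (made-up number) at a 20% regularization share.
print(regularization_count(80, 20))
```

Raising the share, as planned for future iterations, grows the regularization set faster than linearly: with the same 80 example images, a 40% share would need about 53 regularization images.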
Surprised by how well this first test has turned out. Increasing the size of the dataset, increasing the ratio of regularization images, and decreasing the checkpoint saving interval should allow for even higher quality iterations, so stay tuned!
Recommended Parameters
LoRA Weight: 0.8 - 0.9
Trigger Word: Fursuit Head (See example images for prompts)
Other Notes
Example images were generated with FLUX dev using default model weights and FP16 T5. I cannot guarantee similar quality at FP8 or NF4, but I would love to see how your tests turn out.
Using the Realism LoRA tends to improve image quality and background elements for realistic generations, but it can make the fursuit heads look more like real animals than actual fursuits. Some may prefer this, but otherwise I recommend keeping the Realism LoRA weight below 0.75.
Seems to be pretty flexible when it comes to prompting style, but just in case, here is an example of how XComposer captioned most of the images:
a photograph of a woman wearing a cat fursuit head. She is standing outdoors and is the sole focus of the photograph. She is dressed in a pink shirt, black skirt, and black thigh-highs. Her hair is a multi-color combination of white and black and she has fangs. She is wearing animal gloves, and in her hand she is holding her skirt up. There are plants in the background
Around 60% of the dataset contained NSFW elements. I am not sure how well these transferred over to the LoRA, but they shouldn't show up unless prompted for.
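For anyone captioning their own images in the style of the example caption above, here is a small sketch that flags captions missing the trigger phrase. The helper `missing_trigger` and the file names are hypothetical; regularization captions would be expected to appear in its output, since they should not contain the trigger.

```python
TRIGGER = "fursuit head"


def missing_trigger(captions: dict[str, str], trigger: str = TRIGGER) -> list[str]:
    """Return the names of captions that lack the trigger phrase.

    The match is case-insensitive. Hypothetical helper for illustration only.
    """
    needle = trigger.lower()
    return sorted(
        name for name, text in captions.items() if needle not in text.lower()
    )


# Made-up file names; the first caption is shortened from the example above.
captions = {
    "subject_001.txt": "a photograph of a woman wearing a cat fursuit head.",
    "reg_001.txt": "a photograph of a dog running through a park.",
}
print(missing_trigger(captions))
```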