Type | Other |
Stats | 5,804 |
Reviews | (271) |
Published | Feb 13, 2024 |
Base Model | |
Training | Steps: 202 Epochs: 148 |
Hash | AutoV2 9F39F32AB8 |
Thanks to sp00ns' guide:
Training a Custom Adetailer Model | Civitai
I created a custom foot model using yolov8x.
The foot model that sp00ns provided was helpful, but I wanted to see about making my own.
Version 1.0:
I'd tried using AutoDistiller and Grounded SAM to automatically label each of the 1000 images, but it partially failed, in that it also registered hands as feet. (Also I hate Colab, as I can't get work done there without it ending the job prematurely)
Therefore, I painstakingly labeled each and every image using RectLabel on my Mac, then spent about 8 hours training the YOLO model on my PC.
Though I'd planned for 500 epochs, it ended early and determined that the best was at the 93rd epoch.
I included a lot of my own generated images, as well as some stock images; anime, 3D models, and realistic images; male and female, varying skin tones, and various footwear configurations as well as barefoot images. That being said, there are some things it still cannot handle well, such as unconventional poses (like images rotated by 90 degrees), and images where the foot is the subject of composition. My guess is because the vast majority of the training images were of that with the feet taking up a small percentage of the canvas, not enough training was dedicated for closeups of feet. On the other hand, my intent was to use this model to refine feet that would otherwise be neglected, such as in the case of full body shots where the feet take up a tiny fraction of the canvas space.
In short, this version is very good at dealing with feet for standing poses especially in full body shots. But it can struggle with feet outside of that range.
Version 2.0:
I noticed that I mislabeled my training/validation folders for version 1, so my training folder was actually my validation folder and vice-versa. I went ahead and renamed them, however simply doing so and assuming that it would take no more than 100 epochs like version 1 led to some other issues--it started detecting whole bodies as feet. So that was 3 hours down the drain. I set the epochs for 200, migrated a lot of the old validations images into the training folder, and added around 160 new images (using RectLabel to painstakingly label each and every image manually.) This time, after 12 hours, it determined that epoch 148 was the best version, so that is what this is.
From what I've tested, it can detect feet in various configurations far better than v1.0 with few issues; it can detect soles; it can detect feet rotated by 90 degrees; and it can mostly detect feet in unconventional poses--depending on the pose.
A few issues I've noticed, however, is that it sometimes detects hands/knees/other objects as feet, albeit at a lower confidence level than actual feet. If you see this occurring, I'd recommend increasing the Detection model confidence threshold in the Adetailer Detection settings to at least 0.5.
For images of feet that takes up a great majority of the canvas, sometimes it detects them, sometimes it partially detects them, and sometimes it detects one but not the other. Arguably, this model wasn't designed for such images, even though such images were included in the training dataset, because what this model does is crops the total canvas to focus on its target; feet, in order to dedicate a lot of image generation to refine/modify said feet. If feet are already the focus of the image, taking up 50% or more of the total canvas, then this model effectively serves little in the way of refining the target. One can still use it for that purpose if they so please, but it might lead to more problems than solutions, depending on how you use it.
Installation:
Simply move the file into the ~\stable-diffusion-webui\models\adetailer folder and restart the webui. Should also work on ComfyUI, but I haven't tested it there. Of course, you'll need the ADetailer extension for Automatic 1111, or its equivalent on ComfyUI for any of this to work.
Tip: You can increase the ADetailer model count in Automatic 1111 by going to: Settings>ADetailer>Max models.
Note: Civitai doesn't seem to have a category for ADetailer stuff, so I'm setting it as a checkpoint--even though it's not. The settings on pruned or full and precision stuff I just set to whatever.
Also to note, these days stable diffusion seems to be good at doing feet at least in portrait aspect ratios, so I had a hard time coming up with a good use case for portrait. So I instead used the model to paint Tharja's toenails in the example. But, this model will be especially good for landscape aspect ratios similar to what I do normally, as the feet tend to be quite low quality there.