ATTENTION: IF YOU HAVE ALREADY DOWNLOADED THE MODELS BEFORE, PLEASE RE-DOWNLOAD THEM. THEY SHOULD NOW ALL BE UPDATED TO A NO-DILL VERSION (EXCEPT THE MODEL THAT WAS CHOSEN AS THE NEW VERSION UPDATE, SO YOU CAN RECEIVE NOTIFICATIONS).
If you are having issues:
Activate your venv
pip install dill
pip install --upgrade ultralytics
This should fix generic problems.
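If you want to confirm that both packages are actually visible from the venv you activated, a minimal sketch like the one below can help. It only uses the Python standard library; the helper name `check_packages` is made up for this example and is not part of dill or ultralytics.

```python
import importlib.util


def check_packages(names=("dill", "ultralytics")):
    """Return {package_name: True/False} depending on whether Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run this with the same interpreter (venv) that your webui uses.
    for name, found in check_packages().items():
        status = "ok" if found else f"MISSING, try: pip install {name}"
        print(f"{name}: {status}")
```

If a package shows up as missing even after installing it, the install most likely went into a different Python environment than the one being checked.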
Small (and not so small) segmentation models that aim to create accurate masks of breasts, for improved inpainting quality with the adetailer extension and for other use cases.
You can say that this one is a labor of love. I hand-annotated a decently large dataset of images to train this.
I'm presenting a lineup of 3 models: Nano, Small and Medium, where each step up brings meaningful improvements to overall accuracy and detection capabilities.
Use Medium for best performance, especially on complex scenes.
In the previews I'll be showing only basic scenes, since I don't generate many complex ones to showcase, lol.
Validation results
I ran validation on the val portion of the dataset I trained on, at a 0.25 threshold and 0.6 IoU, at both 640 and 1024 resolution (since my model is trained at 1024). I compared it against the existing popular Booba detector by @Randanon (Civitai is dead, can't fetch a link to properly mention them :( ).
P.S. I realized Civitai shrunk it to nothing, here is a close-up:
P.P.S. v6n/s/m refers to my internal labeling; on Civitai it is "v1".
Results are as follows:
v6n: 0.536, v6s: 0.567, v6m: 0.613, Randanon Booba: 0.434
At 1024 resolution my models maintain the same level of performance, while Randanon Booba dips to 0.367.
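For anyone who wants to run a similar check on their own data, here is a rough sketch using the Ultralytics `val` call. It assumes that the 0.25 threshold maps to the `conf` argument and the 0.6 IoU to the `iou` argument. The file names in the usage comment are placeholders, and this is not necessarily the exact way the numbers above were produced.

```python
def val_settings(imgsz):
    """Validation settings mirroring the text: 0.25 threshold, 0.6 IoU.

    Assumption: the 0.25 threshold is the confidence threshold (`conf`).
    """
    return {"imgsz": imgsz, "conf": 0.25, "iou": 0.6}


def run_validation(weights_path, data_yaml, imgsz):
    """Validate one model at one resolution and return the Ultralytics metrics object."""
    # Imported here so val_settings() can be used without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    return model.val(data=data_yaml, **val_settings(imgsz))


# Example usage (placeholder file names, adjust to your own files):
# for size in (640, 1024):
#     metrics = run_validation("model.pt", "dataset.yaml", size)
#     print(size, metrics)
```

Running both resolutions through the same function keeps the settings identical, so any difference in the results comes from the image size alone.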
But please take these results with a grain of salt: my val dataset doesn't reflect the specific way Randanon may have annotated their data. It does seem to be compatible, though, and we go for quite a similar target, based on what I see in manual prediction checks.
The 1024 resolution test is not representative of Randanon's model performance either, since it wasn't trained at that resolution and shouldn't be used like that. It is there only to test my model at the native resolution it was trained on.
The val dataset consists of over 200 images that vary drastically in the complexity of the annotated subject, and it is not very representative of performance on simple scenes.
Personally, I would bump Randanon's model by 0.07-0.10 points based on its performance on simple scenes, where mine is sometimes lacking.
Please try out both models and pick the one that is better for your particular use case, or alternate on a case-by-case basis.
My own judgement is based on data that is highly skewed towards complex predictions.
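To put the numbers side by side, here is a small arithmetic sketch using only the scores listed above and the 0.07-0.10 adjustment. The adjustment is a personal estimate, not a measurement, and the helper names are made up for this example.

```python
# Scores from the first results line above.
scores = {"v6n": 0.536, "v6s": 0.567, "v6m": 0.613, "Randanon Booba": 0.434}


def adjusted_range(score, low=0.07, high=0.10):
    """Apply the 0.07-0.10 bump and return (lower, upper), rounded to 3 decimals."""
    return round(score + low, 3), round(score + high, 3)


def gap(name_a, name_b):
    """Score difference between two models, rounded to 3 decimals."""
    return round(scores[name_a] - scores[name_b], 3)


if __name__ == "__main__":
    print("Randanon adjusted:", adjusted_range(scores["Randanon Booba"]))
    for name in ("v6n", "v6s", "v6m"):
        print(f"{name} minus Randanon (unadjusted):", gap(name, "Randanon Booba"))
```

With the bump applied, Randanon's score would land at roughly 0.504 to 0.534, which is close to v6n. That fits the advice to try both models on simple scenes.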
Can also be downloaded from HF, whenever I bother to upload there: https://huggingface.co/Anzhc/Anzhcs_YOLOs
Script I'm using: https://github.com/Anzhc/Training-script-for-Ultralytics-YOLO
Usage
Put it into `automatic1111/models/adetailer`, or any other place that you know will work with YOLOv8 models.
Use.
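Outside of adetailer, the model can also be loaded directly with Ultralytics. The sketch below is one possible way to do it, not an official workflow. The helper names and the file names in the usage comment are placeholders, and `conf=0.25` is only reused from the validation settings as a starting point.

```python
from pathlib import Path


def adetailer_model_path(webui_root, model_filename):
    """Build the path where adetailer looks for models inside a webui folder."""
    return Path(webui_root) / "models" / "adetailer" / model_filename


def predict_masks(weights_path, image_path, imgsz=640, conf=0.25):
    """Return a list of mask polygons (pixel coordinates) for one image."""
    # Imported here so the path helper works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(str(weights_path))
    results = model.predict(source=str(image_path), imgsz=imgsz, conf=conf)
    masks = results[0].masks
    # masks is None when nothing was detected in the image.
    return [] if masks is None else masks.xy


# Example usage (placeholder file names):
# weights = adetailer_model_path("automatic1111", "model.pt")
# polygons = predict_masks(weights, "image.png", imgsz=1024)
```

Passing `imgsz=1024` runs the model at the resolution it was trained on, while the default of 640 matches what adetailer uses.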
Training and further info
Dataset consists of over 2000 images. (90/10 train/val split)
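For reference, a 90/10 split like this can be made with a few lines of Python. This is a generic sketch with a fixed seed so the split is reproducible. It is not necessarily how the linked training script does it, and the file names are made up.

```python
import random


def split_train_val(items, val_fraction=0.1, seed=0):
    """Shuffle deterministically and return (train_items, val_items)."""
    items = sorted(items)  # sort first so the input order doesn't matter
    random.Random(seed).shuffle(items)
    n_val = round(len(items) * val_fraction)
    return items[n_val:], items[:n_val]


if __name__ == "__main__":
    names = [f"img_{i:04d}.png" for i in range(2000)]  # hypothetical file names
    train, val = split_train_val(names)
    print(len(train), len(val))
```

With 2000 images this gives 1800 for training and 200 for validation, which lines up with the val set size mentioned in the validation section.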
The target was to create a model that can annotate more complex scenarios, particularly from the side, behind, below, above, and other angles, in all possible sizes and in a variety of shapes, while trying to avoid obstructions like hands, approximating the breast area under clothes (and, rarely, other things) where possible, and not missing partial appearances.
Also segmented visibility, like breasts visible from behind (on both sides).
And multi-subject performance (multiple girls, docking, etc.).
And to do that while maintaining a tight mask margin.
This is a hard target to get right, and this is still very much a WIP. While it does not achieve all of it yet, it is a meaningful performance gain, especially in complex detections.
Trained at 1024 resolution, but performs at the same level at 640, which adetailer uses by default.
Training the M variant already took over 16 hours, so it is unlikely I will keep making that one unless I get a better GPU (currently using a 4060 Ti), which is unlikely in the near future.
S variant took 6.9 hours (~~not~~ nice).
And N variant took 4.2 hours, which is acceptable.
For a total of over 24 hours of training time alone. e_e