A guide for training a yolov8/9/11 detection model for use with ADetailer (nvidia gpu only)

Part 1 - Dataset

For a reasonably effective model you need to have at least 300 images per object you want to detect.
You can create a model with multiple detection classes, the process is the same for one or multiple objects.
The model will perform better the more images you have and the more diverse your dataset is.
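If you want a quick sanity check that a folder meets the ~300-images-per-object threshold, here is a minimal sketch using only the standard library (the folder path in the comment is a placeholder):

```python
import glob
import os

def count_images(folder):
    """Count image files (by extension) in a dataset folder."""
    exts = ("*.png", "*.jpg", "*.jpeg", "*.webp")
    return sum(len(glob.glob(os.path.join(folder, ext))) for ext in exts)

# Example: count_images(r"C:\datasets\my_object")  -> aim for 300+ per object
```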

Part 2 - Labeling

Use Roboflow. It is by no means perfect and I have a lot of gripes with how much friction there is with the labeling process, but it is simply the easiest option.

  1. Create an account.

  2. Create a new project.

  3. Choose either Object Detection (bbox) or Instance Segmentation (segm).

  4. Select your dataset folder and wait for all of the images to load.

  5. Choose "Label Myself." Annotate your images with either the Bounding Box Tool (for bbox) or the Polygon Tool (for segm). You can try the Smart Select tool but your selections will likely be wonky.

  6. After you're done click on the "Add Images to Dataset" checkmark up top.

  7. You will be presented with a window that has a "Method" selection; choose "Split Images Between Train/Valid/Test" at roughly 55% train, 35% valid, and 10% test.

  8. In the Dataset section that you will be taken to next, choose "Train Model" -> "Custom Training". Type in the version name.

  9. In the Preprocessing section add Auto-Orient. Add Resize with "Stretch to" set to a square of 320, 640, or 1024 (or higher; sizes should be multiples of 32). The obvious tradeoff: a smaller image means faster training but lower resulting accuracy, and vice versa for a bigger image size. Just remember that every doubling of the resolution quadruples the computation (640² is four times 320²).

  10. Augmentation section. Augmentation will not modify your existing images but will create new ones with the augmentations you specify. A must-have is "90° Rotate"; check all orientations. Choose the rest based on your use case.

  11. Continue to creation. Click on "Download Dataset", choose from yolov8, 9, or 11 (12 only has .yaml files right now, so it's not usable for us).

  12. Choose "Download zip to computer."

  13. Export zip.
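For reference, the exported zip contains train/valid/test folders plus a data.yaml file that the trainer reads. It looks roughly like this (class names and count are placeholders; your Roboflow export's relative paths may differ slightly):

```yaml
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 2
names: ["object_one", "object_two"]
```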

Part 3 - Training

  1. Go to https://www.anaconda.com/download, scroll down, then download and install Miniconda.

  2. You should now have Anaconda Prompt in your Start Menu. Open it.

  3. Create a new environment

conda create -n your_environment_name python=3.11
  4. Now activate the environment

conda activate your_environment_name
  5. Install dependencies

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  6. Install ultralytics

pip install ultralytics
  7. Run the train prompt. The model version should download automatically. (For versions below 11, include the "v" between "yolo" and the version number [yolov8]; for version 11 and above, drop the "v" [yolo11].)

yolo train model=yolo11m-seg.pt data=C:/path/to/data.yaml epochs=125 imgsz=640 batch=8

YOLO models come in sizes of n (nano), s (small), m (medium), l (large) and x (extra large). Size affects both training time and VRAM usage.

Too many epochs will result in overfitting.

imgsz should match your dataset image size that you chose back in the Preprocessing section on Roboflow.

Notice that the path to the data.yaml file needs to be written with forward slashes ( / ) instead of backslashes ( \ ).
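If you don't want to rewrite the slashes by hand, Python's pathlib can do the conversion for you. A small sketch (the path is a placeholder):

```python
from pathlib import PureWindowsPath

win_path = r"C:\path\to\data.yaml"
# PureWindowsPath understands backslash separators regardless of the OS you run on
posix_path = PureWindowsPath(win_path).as_posix()
print(posix_path)  # C:/path/to/data.yaml
```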

You can stop prompt execution with CTRL + C. Watch how much VRAM is used and adjust model size and batch count until VRAM is almost full. (Increase YOLO model size before batch count.)

To give you a rough estimate of the settings you can expect to use: an RTX 2060 (6 GB) can at most run yolo11m-seg at an imgsz of 640 and a batch count of 8, at about 2.3 iterations per second.

Part 4 - The rest of it

Once the training is done it will tell you where it saved the results to.
In the "weights" folder you will have best.pt (best performing epoch) and last.pt (last epoch).
Copy best.pt to the appropriate folder (for ComfyUI it's models\ultralytics\bbox or models\ultralytics\segm).

Q: How do I best use my detection model in comfyui?
A: Open comfyui-manager, search and install "ComfyUI Impact Pack" and "ComfyUI Impact Subpack".

  1. Once you restart, search for the node called "UltralyticsDetectorProvider" and choose your model.

  2. Search for node "BBOX Detector (SEGS)" for bbox and "SEGM Detector (SEGS)" for segm.

  3. Search for "DetailerDebug (SEGS/pipe)"

  4. The bbox/segm Detector node will have an image input; connect the last image output that is not one of the previous detailer image outputs. It will also have a text box: type in all classes that you want detected (use the wording you used when first labeling the dataset), or type "all" to detect every class.

  5. Search for ToBasicPipe and connect everything. Connect basic_pipe output of ToBasicPipe to the basic_pipe input of DetailerDebug (SEGS/pipe). You can then daisy chain the basic_pipe output of DetailerDebug (SEGS/pipe) to the next detailer in line.

  6. Search for and connect 2 "Image Preview" nodes to the DetailerDebug (SEGS/pipe) "cropped" and "cropped_refined" outputs; this will be your before/after comparison.

  7. You can ask chatgpt for what all the parameters on the nodes mean.

Q: How do I make one of those images with the bounding box, object name and segmentation visible?

A: Make it yourself.

Create a .txt file and paste in this code, changing the paths:

from ultralytics import YOLO
import os
import cv2
import glob

# Paths (change these to match your setup)
image_folder = r"C:\path\to\folder\with\images"
model_path = r"C:\path\to\model.pt"
output_folder = r"C:\path\to\output\folder"

model = YOLO(model_path)
os.makedirs(output_folder, exist_ok=True)  # make sure the output folder exists

image_paths = [f for ext in ("*.png", "*.jpg", "*.jpeg")
               for f in glob.glob(os.path.join(image_folder, ext))]

# Build an output path that won't overwrite an existing file
def get_unique_output_path(folder, original_name):
    name, ext = os.path.splitext(os.path.basename(original_name))
    base_name = f"{name}_detection"
    output_file = os.path.join(folder, f"{base_name}{ext}")
    
    counter = 1
    while os.path.exists(output_file):
        output_file = os.path.join(folder, f"{base_name}_{counter}{ext}")
        counter += 1
    return output_file


# Run inference on all images; conf is the detection confidence threshold
results = model.predict(source=image_paths, conf=0.6, imgsz=1280)

for result, img_path in zip(results, image_paths):
    original_img = cv2.imread(img_path)
    if original_img is None:
        print(f"Could not read {img_path}, skipping")
        continue

    # Draw boxes, class names and masks onto a copy of the original image
    annotated_img = result.plot(img=original_img.copy())

    output_file = get_unique_output_path(output_folder, img_path)
    cv2.imwrite(output_file, annotated_img)
    print(f"Saved {img_path} -> {output_file}")

Change the file extension from .txt to .py.

Instead of running this code through Anaconda Prompt manually every time, you can create a .txt file and type in:

@echo off
call C:\Users\user_name\miniconda3\Scripts\activate.bat your_environment_name

python C:\path\to\code.py

echo Done.
pause

and change the extension to .bat. Now you can run the code just by running the .bat file.

Thanks for reading, and I hope you find this helpful.