To train this model I used the Stable Diffusion XL (SDXL) 1.0 Base model. I still find the base model best for realism.
I used the configuration I am currently sharing here. It was discovered after experimenting with 120 different full trainings: https://www.patreon.com/posts/very-best-for-of-89213064
Hopefully I will make a full, in-depth tutorial on DreamBooth training and show all the settings.
Kohya SS GUI is used for training. If you don't know how to install and use Kohya SS GUI, here is the tutorial you need > https://youtu.be/sBFGitIvD2A
And here is the tutorial that shows how to use the shared configuration on both Windows and RunPod > https://www.youtube.com/watch?v=EEV8RPohsbw
If you are not currently my Patreon supporter, I have a full public tutorial for SDXL LoRA training that requires 12 GB VRAM > https://youtu.be/sBFGitIvD2A
If you don't have a strong GPU and PC, you can use the free Kaggle notebook for SDXL DreamBooth training > https://youtu.be/16-b1AjvyBE
I used the images below as the training dataset.
This training dataset is at best medium quality because it has repeating backgrounds, repeating clothing, and no full-body poses. All of these cause overtraining and weaker generalization.
Then I used the following training configuration.
When training with regularization images, I did 1 epoch and 150 repeats, so that more regularization images are used, and I saved a checkpoint every 30 epochs' worth of steps. To do it that way, calculate the number of steps at which you need to save. It is easy: 15 * 30 * 2 + 1.
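Here is a minimal sketch of that calculation. My assumed reading of the numbers: 15 is the training image count, 30 is the repeat count at which you want a save, and the 2 is there because regularization images double the total step count.

```python
# Sketch of the step calculation above (assumed meaning of the numbers).
# The "+ 1" simply follows the formula given in the post.
num_training_images = 15    # assumed: images in the training dataset
save_every_repeats = 30     # assumed: want a checkpoint every 30 repeats
reg_multiplier = 2          # regularization images double the step count

save_every_n_steps = num_training_images * save_every_repeats * reg_multiplier + 1
print(save_every_n_steps)   # 901 -> value for Kohya's "Save every N steps" field
```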
The images were generated from the epoch 150 checkpoint, so the model is a little bit overtrained.
As the regularization dataset, I used an ultra-high-quality, manually picked dataset of real Unsplash images of men. This means they were ground-truth regularization images, like the data Stable Diffusion was initially trained on.
You can download the dataset from > https://www.patreon.com/posts/massive-4k-woman-87700469
There are 5200 images for both man and woman, perfect quality, cropped and resized to many common resolutions.
All training images and reg images were 1024x1024 pixels.
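If you want to prepare your own images the same way, here is a minimal sketch (not the exact tool I used; it assumes Pillow is installed) that center-crops and resizes a folder of images to 1024x1024:

```python
# Minimal sketch: center-crop and resize a folder of images to 1024x1024.
# Assumes Pillow is installed (pip install pillow); folder names are examples.
from pathlib import Path
from PIL import Image

SRC = Path("raw_images")    # example input folder
DST = Path("train_1024")    # example output folder
DST.mkdir(exist_ok=True)

for img_path in sorted(SRC.iterdir()):
    if img_path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    img = Image.open(img_path).convert("RGB")
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side))   # center square crop
    img = img.resize((1024, 1024), Image.LANCZOS)
    img.save(DST / f"{img_path.stem}.png")
```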
I used dynamic prompting so that suits in multiple colors are generated (a small sketch of how that syntax expands follows the prompts below).
The prompts I used are below.
Positive
ohwx:0.98 man slightly smiling wearing and wearing an expensive {red|blue|white|black|brown|gold} suit , photoshoot in a sunny day in a city , hd, hdr, uhd, 2k, 4k, 8k
Negative
sunglasses, illustration, 3d, 2d, painting, cartoons, sketch, squinting eyes, blurry
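The {red|blue|...} part is dynamic prompting variant syntax: each generation picks one option at random. Here is a minimal sketch of what that expansion does (for illustration only; this is not the Dynamic Prompts extension's code):

```python
# Illustration of the {a|b|c} variant syntax: one option is picked at random
# per generation. This is not the Dynamic Prompts extension's actual code.
import random
import re

def expand_variants(prompt: str) -> str:
    # Replace each {opt1|opt2|...} group with one randomly chosen option.
    return re.sub(r"\{([^{}]+)\}",
                  lambda m: random.choice(m.group(1).split("|")),
                  prompt)

prompt = ("ohwx:0.98 man slightly smiling wearing and wearing an expensive "
          "{red|blue|white|black|brown|gold} suit , photoshoot in a sunny day "
          "in a city , hd, hdr, uhd, 2k, 4k, 8k")
print(expand_variants(prompt))  # e.g. "... an expensive blue suit , ..."
```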
In After Detailer (ADetailer), the following settings are used.
Prompt
photo of ohwx man slightly smiling
0.7 denoising strength and 29 steps. The rest are default.
Here is the full PNG info below.
parameters
ohwx:0.98 man slightly smiling wearing and wearing an expensive gold suit , photoshoot in a sunny day in a city , hd, hdr, uhd, 2k, 4k, 8k,
Negative prompt: sunglasses, illustration, 3d, 2d, painting, cartoons, sketch, squinting eyes, blurry
Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 744692608, Size: 1024x1024, Model hash: eef545047f, Model: me15img_150repeat, ADetailer model: face_yolov8n.pt, ADetailer prompt: photo of ohwx man slightly smiling, ADetailer confidence: 0.3, ADetailer mask only top k largest: 1, ADetailer dilate erode: 4, ADetailer mask blur: 4, ADetailer denoising strength: 0.7, ADetailer inpaint only masked: True, ADetailer inpaint padding: 32, ADetailer use separate steps: True, ADetailer steps: 29, ADetailer version: 23.11.1, Version: v1.7.0-133-gde03882d
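For reference, these same settings can be sent programmatically through the AUTOMATIC1111 web UI API (the web UI must be started with --api). Below is a minimal sketch: the endpoint and top-level fields are the standard /sdapi/v1/txt2img ones, while the ADetailer "args" keys are my assumption based on the extension's API documentation and may differ between versions.

```python
# Minimal sketch: send the above generation settings to the AUTOMATIC1111 API.
# The ADetailer "args" keys are an assumption and may differ by extension version.
import requests

payload = {
    "prompt": ("ohwx:0.98 man slightly smiling wearing and wearing an expensive gold suit , "
               "photoshoot in a sunny day in a city , hd, hdr, uhd, 2k, 4k, 8k,"),
    "negative_prompt": ("sunglasses, illustration, 3d, 2d, painting, cartoons, "
                        "sketch, squinting eyes, blurry"),
    "steps": 20,
    "sampler_name": "DPM++ 2M SDE Karras",
    "cfg_scale": 7,
    "seed": 744692608,
    "width": 1024,
    "height": 1024,
    "alwayson_scripts": {
        "ADetailer": {
            "args": [
                {
                    "ad_model": "face_yolov8n.pt",
                    "ad_prompt": "photo of ohwx man slightly smiling",
                    "ad_denoising_strength": 0.7,
                    "ad_use_steps": True,
                    "ad_steps": 29,
                }
            ]
        }
    },
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
r.raise_for_status()
images_base64 = r.json()["images"]  # list of base64-encoded PNGs
```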
And here are the results.