A massive amount of my time exploring Stable Diffusion has been spent learning the ins and outs of training. This means I have experimented with a bunch of different techniques and methods.
One important aspect that carries across all techniques and methods is the need for a good dataset. How you source the dataset is up to you, as is how you manage it. But when training, the images have to be in .PNG format, and they need to be RGB images. To facilitate this, I worked with ChatGPT to write some scripts that I use on every one of my datasets.
JPGtoPNG - Since most images on the internet are in JPG/JPEG format, we need to convert them to PNG format. If you've got a large dataset, that can be time-consuming. So this script does it for you very quickly, and adds the converted images into a new folder within the existing directory.
ConvertChannel - The most popular training method, Kohya, requires images to be RGB, but some images are RGBA, and some register as grayscale even if they aren't. If you don't convert the channels prior to training, you'll get an error, and it won't tell you which file is the offending one. I run all of my datasets through this conversion, just to be safe.
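To give a rough idea of what a channel-conversion pass can look like, here is a minimal sketch. It is not the actual ConvertChannel script. It assumes Pillow is installed, and the function names `needs_rgb_conversion` and `convert_folder_to_rgb` are made up for illustration.

```python
from pathlib import Path


def needs_rgb_conversion(mode: str) -> bool:
    """Return True for any Pillow image mode other than plain RGB (e.g. RGBA, L, P)."""
    return mode != "RGB"


def convert_folder_to_rgb(folder: str) -> list[str]:
    """Re-save every non-RGB PNG in `folder` as RGB and return the names changed."""
    # Assumption: Pillow is available (pip install pillow). It is imported here
    # so the small helper above can be used without it.
    from PIL import Image

    changed = []
    for path in sorted(Path(folder).glob("*.png")):
        with Image.open(path) as img:
            if not needs_rgb_conversion(img.mode):
                continue
            # convert() returns a new in-memory image: alpha is dropped,
            # grayscale and palette images are expanded to three channels.
            rgb = img.convert("RGB")
        rgb.save(path)  # overwrites the original file in place
        changed.append(path.name)
    return changed
```

Because the returned list names every file that was not already RGB, it also tells you which files would have been the offending ones.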
WordReplace - After you caption (automated or manual), you might decide that you want to change a word or phrase that exists more than once in your captions. This script lets you identify a single word or phrase and change it to whatever you want. It's very handy for swapping out "1woman" with "woman" or something similar.
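A caption find-and-replace like this can be sketched in a few lines of standard-library Python. This is an illustration, not the actual WordReplace script. It assumes captions are UTF-8 .txt files sitting in one folder, and the name `replace_in_captions` is hypothetical.

```python
from pathlib import Path


def replace_in_captions(folder: str, old: str, new: str) -> int:
    """Replace `old` with `new` in every .txt file in `folder`; return files changed."""
    changed = 0
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        updated = text.replace(old, new)  # plain substring replacement
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Since this is a plain substring match, replacing "woman" would also touch "1woman", so it is worth picking the search phrase carefully and keeping a backup of the captions.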
FindandIsolate - After you caption, you might want to pull images that triggered a certain tag such as "redhead" or "dark skin". This script does just that, and moves the images into a folder of their own for you to be able to do further caption work.
FindandIsolate2 - I haven't finetuned this part of the workflow just yet, so it exists in two scripts. Find and Isolate 2 searches a directory of your choosing for .txt caption files that match the filenames of the images that were previously isolated, and moves them into the same folder.
My current application of this is using the first script to pull images that are tagged with WD14 danbooru tags, then using the second script to pull LLaVA-generated captions into the folder so that I can add additional details that LLaVA left out (such as breast size).
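The two isolate steps described above could look roughly like the sketch below. It is not the actual scripts. It assumes each image has a caption file with the same name and a .txt extension, that the tag match is a simple case-sensitive substring check, and the function names and `IMAGE_SUFFIXES` are hypothetical.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def isolate_by_tag(dataset: str, tag: str, dest: str) -> list[str]:
    """Move images whose sibling .txt caption contains `tag` into `dest`."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for image in sorted(Path(dataset).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption = image.with_suffix(".txt")  # e.g. 001.png -> 001.txt
        if caption.exists() and tag in caption.read_text(encoding="utf-8"):
            shutil.move(str(image), str(dest_dir / image.name))
            moved.append(image.name)
    return moved


def pull_matching_captions(caption_dir: str, isolated: str) -> list[str]:
    """Move .txt files from `caption_dir` whose name matches an isolated image."""
    isolated_dir = Path(isolated)
    stems = {
        p.stem for p in isolated_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    }
    moved = []
    for caption in sorted(Path(caption_dir).glob("*.txt")):
        if caption.stem in stems:
            shutil.move(str(caption), str(isolated_dir / caption.name))
            moved.append(caption.name)
    return moved
```

Keeping the two steps as separate functions mirrors the workflow where the tags used for searching and the captions you want to edit live in different folders.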
TruncScan2 - Scan your entire directory for truncated images, which cause failures with Kohya training. Truncated images are files that have been cut off or are incomplete, often due to issues during the download or saving process. These images may not contain all the data required for proper display or processing, which can lead to errors when trying to use them in training models or other applications.
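As a rough illustration of the idea (not the actual TruncScan2 script), the sketch below uses a simple standard-library heuristic: a complete PNG ends with an IEND chunk and a complete JPEG ends with an end-of-image marker, so a file missing that trailer was probably cut off. The function names are hypothetical, and the check is an assumption-laden shortcut: a stricter scan would fully decode each image with an imaging library such as Pillow, and a few valid JPEGs carry extra bytes after the marker and would be flagged here.

```python
from pathlib import Path

PNG_TRAILER = b"IEND\xaeB`\x82"  # final chunk type plus CRC of every complete PNG
JPEG_TRAILER = b"\xff\xd9"       # JPEG end-of-image marker


def looks_truncated(path: Path) -> bool:
    """Heuristic: True if a PNG or JPEG file is missing its expected trailer."""
    data = path.read_bytes()
    suffix = path.suffix.lower()
    if suffix == ".png":
        return not data.endswith(PNG_TRAILER)
    if suffix in {".jpg", ".jpeg"}:
        return not data.endswith(JPEG_TRAILER)
    return False  # other file types are not checked


def scan_for_truncated(folder: str) -> list[str]:
    """Recursively list files under `folder` that look truncated."""
    return sorted(
        str(p) for p in Path(folder).rglob("*") if p.is_file() and looks_truncated(p)
    )
```

Printing the returned paths gives you the list of files to re-download or remove before training.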
Convert2PNG - A level up from JPGtoPNG, in that this converts JPG/JPEG/WEBP files to PNG.
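A batch conversion along these lines might be sketched as follows. This is not the actual Convert2PNG script. It assumes Pillow is installed with WEBP support, writes the results into a subfolder named "png" (a made-up default), and the names `png_target` and `convert_to_png` are hypothetical.

```python
from pathlib import Path

SOURCE_SUFFIXES = {".jpg", ".jpeg", ".webp"}


def png_target(src: Path, out_dir: Path) -> Path:
    """Map a source file such as photo.JPG to <out_dir>/photo.png."""
    return out_dir / (src.stem + ".png")


def convert_to_png(folder: str, out_name: str = "png") -> int:
    """Convert JPG/JPEG/WEBP files in `folder` into a subfolder; return the count."""
    # Assumption: Pillow is available (pip install pillow).
    from PIL import Image

    root = Path(folder)
    out_dir = root / out_name
    out_dir.mkdir(exist_ok=True)
    count = 0
    for src in sorted(root.iterdir()):
        if src.is_file() and src.suffix.lower() in SOURCE_SUFFIXES:
            with Image.open(src) as img:
                # Converting to RGB here also covers the channel requirement.
                img.convert("RGB").save(png_target(src, out_dir), "PNG")
            count += 1
    return count
```

The originals are left untouched, so you can compare the output before deleting anything.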
FindandIsolateAlt - An alternate version of FindandIsolate that pulls the images into a folder and then pulls the corresponding caption files into the same folder.
EndoftheLine - Quickly add in a trigger word or new tag to an existing dataset!
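Adding a trigger word to every caption could be sketched like this. It is an illustration rather than the actual EndoftheLine script. It assumes captions are comma-separated tags in UTF-8 .txt files and that the new tag goes at the end of each caption, and the name `append_tag` is hypothetical.

```python
from pathlib import Path


def append_tag(folder: str, tag: str) -> int:
    """Append `tag` to every .txt caption in `folder` that lacks it; return the count."""
    updated = 0
    for path in sorted(Path(folder).glob("*.txt")):
        caption = path.read_text(encoding="utf-8").strip()
        # Split into individual tags, dropping empty entries.
        tags = [t.strip() for t in caption.split(",") if t.strip()]
        if tag in tags:
            continue  # already present, leave the file untouched
        tags.append(tag)
        path.write_text(", ".join(tags), encoding="utf-8")
        updated += 1
    return updated
```

Skipping captions that already contain the tag makes the pass safe to run more than once on the same dataset.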