A massive amount of my time exploring Stable Diffusion has been learning the ins and outs of training. This means I have experimented with a bunch of different techniques and methods.
One important aspect that carries across all techniques and methods is the need for a good dataset. How you source and manage the dataset is up to you, but when training, the images have to be in .PNG format, and they need to be RGB images. To facilitate this, I worked with ChatGPT to write some scripts that I use on every one of my datasets.
JPGtoPNG - Since most images on the internet are in JPG/JPEG format, we need to convert them to PNG format. If you've got a large dataset, that can be time-consuming. So this script does it for you very quickly, and adds the converted images into a new folder within the existing directory.
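The actual JPGtoPNG script isn't reproduced here, but a minimal sketch of the idea using Pillow might look like this (the function name and output folder name are my own placeholders, not the script's):

```python
from pathlib import Path

from PIL import Image


def convert_jpgs_to_pngs(src_dir: str, out_name: str = "converted_png") -> Path:
    """Convert every .jpg/.jpeg in src_dir to .png inside a new subfolder."""
    src = Path(src_dir)
    out = src / out_name
    out.mkdir(exist_ok=True)
    for img_path in src.iterdir():
        if img_path.suffix.lower() in (".jpg", ".jpeg"):
            with Image.open(img_path) as im:
                # Pillow picks the encoder from the format argument.
                im.save(out / (img_path.stem + ".png"), "PNG")
    return out
```

Creating the converted copies in a subfolder keeps the originals untouched in case a conversion goes wrong.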
ConvertChannel - The most popular training method, Kohya, requires images to be RGB, but some images are RGBA, and some register as grayscale even if they don't look it. If you don't convert the channels prior to training, you'll get an error that won't tell you which file is the offending one. I run all of my datasets through this conversion, just to be safe.
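A rough sketch of what a channel converter like this could do with Pillow (again, the function name is mine): re-save anything that isn't already RGB in place.

```python
from pathlib import Path

from PIL import Image


def force_rgb(dataset_dir: str) -> list[str]:
    """Re-save any non-RGB .png in dataset_dir as RGB; return the files fixed."""
    fixed = []
    for p in Path(dataset_dir).glob("*.png"):
        with Image.open(p) as im:
            if im.mode != "RGB":
                # convert() handles RGBA, grayscale (L), palette (P), etc.
                im.convert("RGB").save(p)
                fixed.append(p.name)
    return fixed
```

Returning the list of fixed filenames is handy for spotting which images would otherwise have tripped the trainer.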
WordReplace - After you caption (automated or manual), you might decide that you want to change a word or phrase that exists more than once in your captions. This script lets you identify a single word or phrase and change it to whatever you want. It's very handy for swapping out "1woman" with "woman" or something similar.
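The core of a script like WordReplace is just a find-and-replace over every caption file. A minimal sketch, assuming captions are sidecar .txt files and using a function name of my own:

```python
from pathlib import Path


def replace_in_captions(caption_dir: str, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in all .txt captions.

    Returns the number of files that were changed.
    """
    changed = 0
    for p in Path(caption_dir).glob("*.txt"):
        text = p.read_text(encoding="utf-8")
        if old in text:
            p.write_text(text.replace(old, new), encoding="utf-8")
            changed += 1
    return changed
```

For example, `replace_in_captions(folder, "1woman", "woman")` would perform the swap mentioned above across the whole dataset.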
FindandIsolate - After you caption, you might want to pull images that triggered a certain tag such as "redhead" or "dark skin". This script does just that, and moves the images into a folder of their own for you to be able to do further caption work.
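A sketch of how the isolation step might work, assuming each image has a matching sidecar .txt caption (the function name and subfolder name are placeholders of mine):

```python
import shutil
from pathlib import Path


def isolate_by_tag(dataset_dir: str, tag: str, subfolder: str = "isolated") -> list[str]:
    """Move images whose sidecar .txt caption contains `tag` into a subfolder."""
    root = Path(dataset_dir)
    dest = root / subfolder
    dest.mkdir(exist_ok=True)
    moved = []
    for cap in root.glob("*.txt"):
        if tag in cap.read_text(encoding="utf-8"):
            # The image shares the caption's filename, minus the extension.
            for ext in (".png", ".jpg", ".jpeg"):
                img = cap.with_suffix(ext)
                if img.exists():
                    shutil.move(str(img), str(dest / img.name))
                    moved.append(img.name)
    return moved
```

So `isolate_by_tag(folder, "redhead")` would sweep every redhead-tagged image into its own subfolder for further caption work.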
FindandIsolate2 - I haven't finetuned this part of the workflow just yet, so it exists in two scripts. FindandIsolate2 searches a directory of your choosing for .txt caption files that match the filenames of the previously isolated images, and moves them into the same folder.
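The second script's matching step could be sketched like this: collect the stems of the already-isolated images, then move any caption whose stem matches (function name is a placeholder of mine):

```python
import shutil
from pathlib import Path


def pull_matching_captions(caption_dir: str, isolated_dir: str) -> list[str]:
    """Move .txt files whose filename stem matches an image in isolated_dir."""
    iso = Path(isolated_dir)
    stems = {
        p.stem
        for p in iso.iterdir()
        if p.suffix.lower() in (".png", ".jpg", ".jpeg")
    }
    moved = []
    for cap in Path(caption_dir).glob("*.txt"):
        if cap.stem in stems:
            shutil.move(str(cap), str(iso / cap.name))
            moved.append(cap.name)
    return moved
```

Since the match is purely on filename stems, the caption directory can live anywhere; it doesn't have to sit next to the images.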
My current application of this is using the first script to pull images that are tagged with WD14 danbooru tags, then using the second script to pull LLaVa-generated captions into the folder so that I can add additional details that LLaVa left out (such as breast size).
I have a few more scripts that are more specialized. I may post them as well in the future, if there is interest.
1/6/23 Update: The previous WordReplace.py was the wrong version. I've updated it to the correct working version.