I'm trying to keep this one PRETTY barebones. There are thousands of things I could add, so I'm sticking to the most useful ones I've found. I'll likely keep each category to 5 or fewer.

Guides

The Rule Of 3 Link
- A simple explanation of why the rule of 3 matters and its direct utility with diffusion models based on the T5 architecture, CLIP_L, and CLIP_G.
- >>> EVERYONE should know this concept.
Deep Learning Guides Link
- A series of more in-depth and direct methods for creating known and new AI models using multiple libraries. This is for more advanced users.

Common Diffusion Models

SD15 3rd Party Link
- A fairly defunct version of Stable Diffusion. It is very hit or miss, and the entire architecture was redesigned afterward. The original SD15 was pulled for one reason or another, so we only have a third-party link to the core diffusers output.
SD2 Link
- A not-so-explored sequel to SD15 that is fairly robust given the correct images and training processes, using a different style of training.
SDXL 1.0 Link
- A fairly middleweight image generator diffusion model. Requires fair hardware, and generates very rapidly with good hardware. Enough training has been done on this model to provide a paved road for the user and trainer.
Flux1D / Flux1S Link
- A fairly heavyweight image generator diffusion model. Powerful and adaptive. Heavy training and testing has been done on this model, and even after all that it is still somewhat of an enigma.
HunYuan Link
- A modern and fairly fast video generation image to action interpolation model. It's become my current fixation.

Generation

ComfyUI Link
- A more advanced way to generate images, videos, and sound.
Forge Link
- A simpler program than ComfyUI, used primarily to generate images; an offshoot of A1111 that went full optimization renegade.
A1111 Link
- A powerful diffusion generator that spawned multiple offshoots.

Training

Kohya SS GUI /// SD-Scripts
- A powerful training software with many useful tagging and training potentials.
OpenClip trainer Link
- A powerful command utility to specifically finetune CLIP based models.
Simpletuner Link
- A fairly simple way to handle training that spawned multiple offshoot training programs. A bit more streamlined than Kohya; results often differ from Kohya's because the process is different.
Diffusers Link <<<<<<<<<<
- The industry-standard diffusion pipeline for training large-scale models, due to its battle-tested, datacenter-robust accelerate backend with tried and tested fixes.
Tensorflow Link
- A useful library and tool for making AI models.
PyTorch Link
- A commonly used, powerful library for AI training.

Useful Scripting Engines

PyCharm Link
- A Python streamlining software that supports many automated tasks, as well as Jupyter notebooks in kind of awkward ways.
- Scroll down for community version, or pay for professional like I do.
VS Code Link
- A powerful lightweight code editor that seems to be having a ton of bloat added to it. Catch it before it gets too nasty like every software does.
Colab Notebook Link
- A useful place to get free hosting on small GPUs, with more powerful GPUs such as the 40 GB A100s available for heavier needs.

Sourcing Data

Cheesechaser Link
- A quick method for extracting deepghs datasets from huggingface, such as danbooru, safebooru, and 3dbooru.
- Not the easiest thing in the world to make efficient.
ImageGrabber Link
- Fairly easy-to-use image grabbing software that works well on Windows and has proxy support.

Tagging + Captioning

TagGui GUI
- A fairly useful tagging GUI that runs on Windows and is meant for ease of access. Has access to multiple captioning and tagging systems, with a fair number of shortcuts and user controls.

Kohya SS GUI
- A powerful training software with many useful tagging and training potentials, the GUI has multiple methods for tagging images and preparing datasets.

Tagging

ImgUtils Link
- ImgUtils is software designed to streamline anime image processing and captioning. It has many AI models with easy-to-access methods of using them.
- Requires some technical skill but is highly effective once a simple program is built around it.
  - Bounding Boxes
    - ~~BooruS11 -> Standard bbox identification for some booru tags.~~
    - ~~BooruPP ->~~
    - People -> People identifier for more complex people finding.
    - Faces -> Face finding with attempted gender, I usually nullify gender.
    - Eyes -> Eye finding with a bit more detail.
    - Heads -> Finding heads, similar to face but identifies a bit differently.
    - HalfBody -> Finds collarbone and upper bodies, mostly cowboy shot.
    - Hands -> Identifies a multitude of hands. Usually if identified it's a pretty good hand.
    - ~~Nude -> Identifies nudity using a list of nudity.~~
    - Text -> Identifies whether there is text, and the position of that text for smudging or removing.
    - TextOCR -> Identifies text in more accurate detail, allowing for training of text.
    - Censored -> Identifies a few censor zones, useful for tagging censoring.
  - Classification
    - Aesthetic -> Determines an aesthetic quality, not always reliable.
    - AI-Detection -> Determines the likelihood of an image being AI generated.
    - NSFW Detector -> Determines an image type and the NSFW level.
    - Monochrome Checker -> Determines monochrome potential.
    - Greyscale Checker ->
    - Real or Anime -> Decides if an image is real or is anime.
    - Anime Style or Age -> Classifies by year and style.
    - Truncated -> Determines if an image is broken or incomplete, useful for filtering.
  - Taggers
    - ~~Wd14~~ ~~Link~~
      - ~~A powerful suite of taggers.~~
    - ~~Wd14 Large~~ ~~Link~~
      - ~~The big brother of the suite.~~
    - ~~MLBooru~~ ~~Link~~
      - ~~An older and less accurate tagger.~~
SegmentAnything YoloV8 Link
- Identifies many objects and people. Good for identification when gathering bbox training data.
Hagrid Link
- Identifies hands, fantastic at identification and classification of hands. V2 even does bounding boxes and segmentation masking.
MiDaS Link
- Used for depth mask generation. A battle-tested and powerful AI that, with a little math, can be used to determine offsets, positions, depth comparisons, and so on for identified sections within images.
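
The "little math" on a depth map can be as simple as comparing the typical depth value inside two bounding boxes. The sketch below is only an illustration under stated assumptions: the depth map is a plain 2D list of numbers (not an actual MiDaS output tensor), boxes are `(x0, y0, x1, y1)` with exclusive end coordinates, and larger values are assumed to mean "closer", which matches MiDaS-style relative inverse depth. The function names `region_depth` and `closer_region` are hypothetical, not part of any library.

```python
from statistics import median


def region_depth(depth_map, box):
    """Median depth value inside a box (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    values = [depth_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    if not values:
        raise ValueError("box covers no pixels")
    return median(values)


def closer_region(depth_map, box_a, box_b, larger_is_closer=True):
    """Return 'a', 'b', or 'same' depending on which box is nearer the camera.

    larger_is_closer=True assumes inverse-depth values (an assumption;
    flip it for maps where larger numbers mean farther away).
    """
    depth_a = region_depth(depth_map, box_a)
    depth_b = region_depth(depth_map, box_b)
    if depth_a == depth_b:
        return "same"
    a_is_closer = depth_a > depth_b if larger_is_closer else depth_a < depth_b
    return "a" if a_is_closer else "b"


# Toy 4x3 depth map: the right half is "closer" than the left half.
toy_depth = [
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.9, 0.9],
    [0.2, 0.2, 0.8, 0.8],
]
left_box = (0, 0, 2, 3)
right_box = (2, 0, 4, 3)
print(closer_region(toy_depth, left_box, right_box))
```

The median is used instead of the mean so a few stray background pixels inside a box don't skew the comparison. The boxes themselves would come from whichever detector you are already running.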

Captioning

Terminal Generation Link
- A guide for terminal caption generation with seglip-500m. Seems to have many useful and reusable utilities with a little tinkering.

JoyCaption AlphaOne Link
- JoyCaption has multiple versions; I used AlphaOne for my SDXL project.
JoyTag Link
- A powerful tagger from the same developer as JoyCaption.
T5XXL Blip2 Link
- Good at identification of images, works with many forms of T5 with a little tinkering.
Quora T5 Small Paraphraser Link
- Useful for paraphrasing more complex captions into simpler captions.
SentencePiece Link
- The tokenizer library behind T5 and many other LLMs.
SigLIP-so400m Link

Useful Hosts

Here, obviously.
Huggingface Link
- A location meant specifically for hosting pre-trained models, datasets, and many more options.

Dataset-Centric Loaders

Tensorflow Datasets Link
- A very easy AI training library to set up, prepare, and train with. Has access to some fairly hefty classifier datasets like ImageNet that can be repurposed into identified and captioned diffusion images with a little time. These run very fast on Linux-based machines, but on Windows they can only use the GPU through WSL2.
PyTorch Datasets Link
- Access to many of the same datasets as TensorFlow through torchvision, which brings all the added benefits of torchvision's speed on Windows machines thanks to CUDA 12.4 support.

Useful Core Model Datasets

HagridV2 Link
- A powerful baseline hand classification and bounding box dataset. Useful if you're interested in classification and solidity of hands. Many creative methods beyond classification can be applied to this dataset to make diffusion models more robust.
CN3d Pose V7 Link
- Very useful for individual character poses. Has many applications beyond simple ControlNet use, and even more if a little creative juice is added.
FashionDiffusionData Link
- Useful as a benchmark test for vision models, and can be trained into diffusion models in almost no time due to its size. Bucketing upscale using a high-potency upscaler can take time, but it offers lots of potential fashion images for classification, utility, or simple dataset use.
Converting Datasets Link
- This simple script can be easily modified to download and extract pre-formatted parquet datasets from huggingface using your hf token, and then upload the finished zips to your huggingface repo for zip use.
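
For orientation, here is a rough sketch of the same download, pack, and upload flow. It is not the linked script, just a minimal illustration under assumptions: the function names `fetch_parquet`, `pack_folder`, and `upload_zip` are made up for this example, the repo is assumed to be a dataset repo, and the step that extracts images from the parquet file is left out because it depends on each dataset's columns. The Hugging Face calls use `hf_hub_download` and `HfApi.upload_file` from `huggingface_hub`, imported lazily so the local zip step works without it installed.

```python
import zipfile
from pathlib import Path


def fetch_parquet(repo_id, filename, token):
    """Download one file from a dataset repo; returns the local cache path."""
    from huggingface_hub import hf_hub_download  # needs `pip install huggingface_hub`
    return hf_hub_download(
        repo_id=repo_id, filename=filename, repo_type="dataset", token=token
    )


def pack_folder(folder, zip_path):
    """Zip every file under `folder` (recursively); returns the file count.

    Keep `zip_path` outside `folder`, or the archive would try to include itself.
    """
    folder = Path(folder)
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder so the zip extracts cleanly.
                archive.write(path, arcname=path.relative_to(folder).as_posix())
                count += 1
    return count


def upload_zip(zip_path, repo_id, token):
    """Upload the finished zip to the root of your own dataset repo."""
    from huggingface_hub import HfApi
    HfApi(token=token).upload_file(
        path_or_fileobj=str(zip_path),
        path_in_repo=Path(zip_path).name,
        repo_id=repo_id,
        repo_type="dataset",
    )
```

A typical run would call `fetch_parquet`, extract the rows into a working folder with whatever parquet reader you prefer, then call `pack_folder` and `upload_zip` on the result.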
