
My LoRA build list todo:

I will be updating this regularly as a log and reference for my current goals. I'll set up a JIRA soon, but that's more time-consuming and demanding than just editing a little text file for now. If I end up with a team I'll maintain a JIRA.

Consistency FLUX <-- Simulacrum Omega

  • A full core reshape of Consistency has given it a new name, Simulacrum. The design and implementation of the fully filled out pose angle, pose position, and pose depth, along with many new poses, will result in a considerably different experience.

  • I've been purposefully omitting genitals and male forms to try to retain a bit of propriety with the system, but it seems that this may have been an egregious error. A human subject isn't that simple, and this needs to be fully incorporated for testing.

  • Simulacrum v13 is currently highly unstable. The next iteration will hopefully be considerably more effective and stable due to the new tagging process I've planned with multi-prompting.

  • The overarching goal of Consistency has shifted from just adding NAI to PDXL (which really wasn't anywhere near as tough as this) to providing controlling agents for FLUX. The same images are viable for training into SDXL, but at a much lesser potency.

  • The original PDXL Consistency was designed to be a fixation tool for 2D images and single-perspective 3D images. It fixated on specifics and filled in blanks that didn't work with PDXL Autism without a lot of fiddling, so I knew I was adding a bunch of information between point A and point B. v1.1 worked but had faults and would often cause artifacts when turned up, but that was kind of the point: introduce large amounts of information EARLY and let PDXL fill in the blanks with less strength. It worked fairly effectively, but it's no longer the goal.

  • The goal now for Consistency in FLUX is to ensure the fidelity and flexibility of individual characters, their various traits, and their interactions with other subjects in any defined environment, within the capabilities permitted by the subject's tools, utilities, devices, mechanisms, or whatever.

  • https://civitai.com/models/477291/consistency-v32-lora-flux1d

  • V4 Goals:

    • If all of these go well, the entire system should be ready for full production capability that includes image modification, video editing, 3d editing, and a great deal more that I simply can't comprehend yet.

    • v33 overlays

      • This is a bit of a misnomer, as it's more of a scene definition framework for the next structure

      • This one will take both the least and the most amount of time, and I have a few experiments I need to run with the alpha to make it work, but I'm pretty sure overlaying is going to be a choice option not only for displaying messages, but also for scene control due to the way depth works.

    • v34 character imposing, rotation value planning, and careful viewpoint offsets:

      • Ensuring that certain characters do exist and are following directives is a primary goal, as sometimes they simply do not.

      • A full number-based rotation valuation using pitch/yaw/roll in degrees is going to be implemented. It's not going to be perfect, as I don't have the math skills, the image sets, or the 3D software skills to pull this off fully, but it'll be a good start and will hopefully latch onto whatever FLUX already has. A rough sketch of the tag format follows below.
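
      • A minimal sketch of how those rotation tags could be emitted, assuming degrees snapped to a fixed step; the exact tag wording (pitch_+15 and so on) is just my placeholder, not a settled spec:

        # Hypothetical rotation tagger: snap each axis to the nearest `step`
        # degrees and emit one tag per axis.
        def rotation_tags(pitch: float, yaw: float, roll: float, step: int = 15) -> list[str]:
            def snap(value: float) -> int:
                return int(round(value / step) * step)
            return [
                f"pitch_{snap(pitch):+d}",
                f"yaw_{snap(yaw):+d}",
                f"roll_{snap(roll):+d}",
            ]

        print(rotation_tags(12.0, -98.0, 4.0))  # ['pitch_+15', 'yaw_-105', 'roll_+0']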

    • v35 scene controllers

      • Complex interaction points in scenes, camera control, focus, depth, and more that allow for full scene building along with the characters you place in them.

      • Think of this like a 3d version of the overlay controller, but on steroids if you want it to be.

    • v36 lighting controllers

      • Segmented and scene controlled lighting changes that affect all characters, objects, and creations contained within.

      • Each light will be placed and generated based on specific rules defined in Unreal, using a multitude of lighting types, sources, colors, and so on.

      • Theoretically FLUX should fill in the gaps.

    • v37 body types and body customization

      • Building on the basic body types, I want to introduce a more complex body type creation system that includes, but is not limited to, things like:

      • fixing the poses that don't work correctly

      • adding a multitude of additional poses

      • more complex hair:

        • hair interaction with objects, cut hair, damaged hair, discolored hair, multicolored hair, tied hair, wigs, etc

      • more complex eyes:

        • eyes of various types, open, closed, squinting, etc

      • facial expressions of many types:

        • happy, sad, :o, no eyes, simple face, faceless, etc

      • ear types:

        • pointed, rounded, no ears, etc

      • skin colors of many types:

        • light, red, blue, green, white, grey, silver, black, jet black, light brown, brown, dark brown, and more.

        • I'll try to avoid sensitive topics here as people seem to care about skin color a whole lot overall, but I really just want a bunch of colors like the clothes.

      • arm, leg, upper torso, waist, hips, neck, and head size controllers:

        • bicep, shoulder, elbow, wrist, hands, fingers, etc with sizers for length, width, and girth.

        • collarbone, and whatever torso tags there are

        • waist and whatever waist tags there are

        • body size generalizations and specifics based on a gradient of 1 to 10, rather than the sort of pre-defined system any of the boorus used

    • v38 outfits and outfit customization

      • Nearly 200 outfits give or take, each with their own custom parameters.

    • v39 500 choice video game, anime, and manga characters sampled from high fidelity data

      • five hundred cigaret- err... I mean... Lots of characters. Yes. Definitely not an absolutely large amount of meme based characters that have no rational linkage to character design or archetype.

      • After that you can build anything or train any character.

    • massive fidelity and quality boost:

      • including tens of thousands of images from various sources of high and top quality anime, 3D models, and photographic semi-realism to superimpose and train this particular finetuned version of FLUX into a stylistic submission that fits within the parameters.

      • each image will be fidelity scored and tagged on a scale of score_1 to score_10 in a similar fashion to Pony, but I'll have my own unique spin on the system depending on how well or badly it goes. A rough sketch of the scoring pass is below.
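
      • A minimal sketch of that scoring pass, assuming an aesthetic model that returns a float in [0, 1] (its loading code isn't shown):

        # Map a 0..1 aesthetic value onto score_1 through score_10, Pony-style.
        def score_tag(aesthetic: float) -> str:
            bucket = min(10, max(1, int(aesthetic * 10) + 1))
            return f"score_{bucket}"

        def tag_image(caption: str, aesthetic: float) -> str:
            # Score tag goes first, then the rest of the caption.
            return f"{score_tag(aesthetic)}, {caption}"

        print(tag_image("1girl, red hair, forest", 0.83))  # score_9, 1girl, red hair, forest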

2d Spritesheet Character Constructor

  • This is built with a fixation on being functional with GGUF 2, entirely devoted to low-RAM, low-storage generation. Precise while being condensed.

    • desired capability of generating fast on a cellphone or older computer/laptop

  • v1 region by region tagging for pixel art animations

    • orthographic, testing angles, testing depths

    • a multitude of horizontal and vertical aligned sprite sheet characters

    • stacking 1 by N grids horizontally

      • 64 x 64 x N

        • N = the number of animation frames you want (see the sketch after this list)

      • single resolution training for individual objects and grids that contain them

        • resize using image sizes and FLUX to incorporate more

    • simple character traits to teach FLUX what they should look like, giving it hooks into its own data

    • simple animation loops to teach it the sequence of intentions, depth, and ratios

    • manual grid prompt controllers to ensure each grid section is regulated as a subject unto itself in this theme without heavy deviance
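
    • A minimal sketch of the 1-by-N stacking, assuming 64x64 frames already on disk (the filenames are placeholders):

      # Stack N 64x64 frames into a single 1-by-N horizontal strip (64 x 64 x N).
      from PIL import Image

      def build_strip(frame_paths: list[str], tile: int = 64) -> Image.Image:
          strip = Image.new("RGBA", (tile * len(frame_paths), tile))
          for i, path in enumerate(frame_paths):
              frame = Image.open(path).convert("RGBA").resize((tile, tile))
              strip.paste(frame, (i * tile, 0))  # each frame gets its own column
          return strip

      frames = [f"walk_{i}.png" for i in range(8)]  # N = 8 animation frames
      build_strip(frames).save("walk_strip_64x64x8.png")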

  • v1 fixes

    • unknown

  • v2 animations and character trait finetuning

    • isometric, more angles, more depths

    • traits, weapons, specifics, and so on

    • eye colors, hair colors, arm lengths, species, art styles

    • padding between character images

2d Object and Scene Spritesheet Constructor

  • This is built with a fixation on being functional with GGUF 2, entirely devoted to low-RAM, low-storage generation. Precise while being condensed.

    • desired capability of generating fast on a cellphone or older computer/laptop

  • theoretically this should be able to construct entire spritesheets based on a single image input and img2img

  • v1 region by region object positioning

    • orthographic, testing angles, testing depths

    • many objects as individual files treated as objects and trained as 64x64

    • stacking 16x16 grids with identifiers that designate sections, prompted either through a simple piece of software or manually (see the sketch after this list)

    • needs lots of formatted single subjects at 64x64 size
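
    • A minimal sketch of the identifier idea, assuming each cell gets a manual prompt keyed by row and column; the r0c0-style naming is just my placeholder convention:

      # Flatten per-cell prompts into one caption, top-left to bottom-right.
      def grid_prompt(cells: dict[tuple[int, int], str]) -> str:
          parts = [f"r{r}c{c}: {subject}" for (r, c), subject in sorted(cells.items())]
          return ", ".join(parts)

      prompt = grid_prompt({(0, 0): "wooden crate", (0, 1): "iron sword", (3, 7): "oak tree"})
      # "r0c0: wooden crate, r0c1: iron sword, r3c7: oak tree"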

2 million image brute force

  • Based on both the unsuccessful and the successful small-scale testing outcomes, I've carefully and methodically devised a potential experiment that will serve as a full-scale test.

  • The entire process will be documented and stored on Hugging Face to ensure posterity and fidelity throughout.

  • As a more complex training and a more complex solidification of the model's potential, I plan to generate 1 million images using multiple RunPod pods and a specifically formatted wildcard setup using Pony Autism + Consistency V4, Zovya Everclear V2, Pony Realism + Pony Realism Inpaint, Yiffy v52, and Ebara.

    • The tagging methodology:

      • 1,000,000 total synthetic images.

        • 1,000,000 / 5 = 200,000 images per model.

        • 200k / 9 scores = 22,222 images per score.

        • 22,222 / 9 images per core tag ≈ 2,500 core tags.

        • We have roughly 2,500 tags to work with.

    • The new tagging methodology is simple.

      • Two prompts and one clump of tags.

      • The 2k Anime pack is currently under prompt revamp. I'm going to identify the characters that I'm certain of, and try to figure out who some of the characters I'm uncertain of are. I'm increasing the count of images to roughly 5,000, and the prompting will be critically important.

      • Dolphin 72b was a large flop and a serious dud.

      • We identify the fixation of the scene using one model, and we identify the specific details and interactions of the individual with the scene in the second prompt. Finally, we dump a ton of tags on top from SmilingWolf/LARGE_TAGGER_V3. A rough sketch of the caption assembly is below.
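
      • A minimal sketch of that caption assembly, assuming the two prompts are already generated and the tagger output arrives as a plain list of strings (the tagger loading code isn't shown):

        # Two prompts plus one clump of tags, joined into a single caption.
        def build_caption(scene_prompt: str, detail_prompt: str, tags: list[str]) -> str:
            return " ".join([scene_prompt.strip(), detail_prompt.strip(), ", ".join(tags)])

        caption = build_caption(
            "A rain-soaked neon alley at night, viewed from street level.",
            "One woman in a long coat leans against the left wall, holding an umbrella.",
            ["1girl", "umbrella", "rain", "night", "neon lights", "from side"],
        )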

    • Each chosen model and their purpose:

      • Pony Autism + Consistency V4:

        • The purpose of this model will be to generate poses with high complexity interactions. It will be the crux and the core of the entire engine's pose system.

      • Everclear v2 + Pony Realism Inpaint:

        • This generates high fidelity realism with a very unique style. The model is similar to the base FLUX model in a lot of aspects, so this model's outputs will be very useful for clear and concise clothing, colorations, and any number of other deviations based on high quality realism.

      • Yiffy:

        • The entire purpose of this one will be to introduce the more difficult to prompt and yet important things to make sure a model has some variety. Cat girls, anthro, vampires, skin colors, nail types, tongue types, and so on. Essentially this will be the "odd" details section, where everything based on this model is going to fall within the score values of 1-7.

      • Ebara:

        • Ebara's job is to produce and solidify the stylistic intent of anime, with the more stylish backgrounds and interesting elements associated with it. Its images will represent score 5-9 2D anime with backgrounds.

      • I have yet to find a good candidate for full 3d capability.

    • On a single 4090 I can generate batches of 4 1024x1024 images per minute with background removal, triple loopback for context and complexity, segmentation, and ADetailer fixes.

      • A single 4090 should be capable of generating roughly between 7k and 11k images per 24 hours, depending on hiccups and problems.

      • 4090s are about 70 cents an hour on RunPod, which is about 16 dollars a day give or take. The total inference to generate a million images should be around $2,200 give or take. I'm going to estimate over $4,000 just to be on the safe side, to allow for failures, mishaps, and problems.

      • After testing and implementation of an automated setup using my headless comfy workflows, it should fully integrate and be fully capable of running on any number of pods.

      • Each pod will be handed a slice of the master tag generation list based on the currently running pods and their current generation goals.

      • A master database will contain the information needed to allocate the remainder of the tag sequence, based on the last generated image stored and the images currently allocated to the other pods.

      • When a pod finishes executing, it will give the all-clear to the master pod, which will delegate a new role or terminate the pod depending on need. A rough sketch of that bookkeeping is below.
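
      • A minimal sketch of that master-pod bookkeeping, assuming the tag list is pre-sliced into chunks; pod ids and chunk size are placeholders, and the real version would persist this state in the master database rather than in memory:

        # Hand the next unclaimed slice of the master tag list to whichever
        # pod reports the all-clear; return None when nothing is left.
        class Master:
            def __init__(self, tag_sequences: list[str], chunk_size: int = 500):
                self.chunks = [tag_sequences[i:i + chunk_size]
                               for i in range(0, len(tag_sequences), chunk_size)]
                self.assigned: dict[str, int] = {}   # pod id -> chunk index
                self.next_chunk = 0

            def request_work(self, pod_id: str) -> list[str] | None:
                if self.next_chunk >= len(self.chunks):
                    return None                      # nothing left: terminate the pod
                self.assigned[pod_id] = self.next_chunk
                self.next_chunk += 1
                return self.chunks[self.assigned[pod_id]]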

  • I'll be carefully testing the two core released FLUX models for a candidate. FLUX Dev has shown inflexibility in a lot of situations, so I need to know if I can get better results from its sister before I begin this.

  • ADetailer will be run on hands, faces, and clothing.

  • Segmentation will remove backgrounds and leave them transparent for a percentage of all of the images.

  • WD14 tagging + the base wildcard tagging will be applied.

  • The additional 1 million images will be sourced from the highest end data in the gelbooru, danbooru, and rule34 publicly available datasets accordingly.

    • ALL TAGS will be fragmented, wildcarded, and transformed into sentence fragments using an LLM finetuned for this particular task. Score tags will be superimposed as a post-generation calculation based on one of the many image aesthetic detection AIs around.

    • Tag order will be relative to the region of the image from top to bottom, with the first iteration being a two-sentence pass where each image is sliced in half into two rows.

    • All images will be classified based on a NSFW rating of 1-10

    • All images will be classified based on a score rating similar to pony.

  • This will essentially be its own divergent form of FLUX that I'm codenaming "FLUX BURNED" for now.

  • The training will be an iterative burn, building from a high learning rate on the lower-score images and iteratively decreasing the learning rate for every finetuning phase after (a rough schedule sketch follows this list).

    • I'll adjust the learning rates accordingly based on the first 5 score values to determine if the model is sufficiently prepared for scores 6-10.

    • This process will be driven by divergent and important lessons learned in development, which will be carefully logged and shared throughout.
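
    • A minimal sketch of that schedule; the starting learning rate and decay factor are placeholders I'd still need to tune:

      # Start hot on the low-score phases and decay the LR each phase after.
      def burn_schedule(phases: int = 10, start_lr: float = 1e-4, decay: float = 0.7):
          lr = start_lr
          for score in range(1, phases + 1):   # phase 1 = score_1 images, etc.
              yield score, lr
              lr *= decay                       # later phases train more gently

      for score, lr in burn_schedule():
          print(f"score_{score}: lr={lr:.2e}")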

  • This model will have a special emphasis on screen control. The entire model is going to be based on grid and section control, allowing for the most complex interactions possible in the most complex scenes.

  • Training this with baseline images will be substantially less reliable for realistic people and substantially more reliable for anime and 3d based characters.

  • Prompting will be easier and more similar to Pony, while still allowing a full LLM-integrated Q&A section.
