
SDXL Sim-V2Full UltraRes αβγδ [SFW/NSFW]

Updated: Feb 20, 2025
Verified: SafeTensor
Type: Checkpoint Trained
Stats: 62 Reviews
Published: Feb 1, 2025
Base Model: SDXL 1.0
Training: Steps: 9,375,000; Epochs: 75
Hash (AutoV2): 709D769982

Watch as the model shifts... into your desired depiction.

This is an ongoing experiment to provide more control to image generation.

Using the grid helper LoRA amplifies screen and depiction control at lower epochs, and enables a range of grid and spritesheet capabilities. At higher epochs it enables stronger screen control at the cost of quality and context.

SDXL-Simulacrum Full V2 αβγδ release 1/31/2025 - 5:00 PM

I dub this model LOW IQ SDXL FLUX.

  • α version: roughly 50,000 images burned across samples 0-2 million.

  • β version: roughly 75,000 images burned across samples 2-5 million.

  • γ version: roughly 150,000 images burned across samples 5-7.5 million.

  • δ version: roughly 300,000 images burned across samples 7.5-10 million.

I have a more accurate list of the trainings used below.

The outcomes seem to favor much higher resolutions over lower ones, so don't spare the rod.

The Full V2 release is highly complex, and it's difficult to describe how it works in simple terms; however, I'll sum up the entire model in one very simple instruction:

Use PLAIN ENGLISH in a structure that makes sense.

This model builds what you want, in sentence sequence and a semi-logical booru-style flowchart.

Plain-English captions are based on SentencePiece. Most LLMs, including T5, were trained unsupervised on SentencePiece tokenization. The foundation and methodology behind the captioning process is inspired entirely by LLMs and their structures. These structures are conjoined with vision-based classifiers, bbox identifiers, and an interpolation between various forms of identifiers using depth analysis. If a caption wasn't generated BY a SentencePiece model, it was generated with that outcome in mind.
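
To make that concrete: here is how a SentencePiece-based tokenizer (T5's, via the transformers library; the caption itself is just an illustration) breaks a plain-English caption into pieces. This is a minimal sketch, not part of the captioning pipeline.

    # Illustrative sketch: SentencePiece tokenization of a plain-English caption,
    # using T5's tokenizer (requires the transformers and sentencepiece packages).
    from transformers import T5Tokenizer

    tok = T5Tokenizer.from_pretrained("t5-small")
    caption = "a woman in a red dress standing in a sunlit garden"
    print(tok.tokenize(caption))
    # e.g. ['▁a', '▁woman', '▁in', '▁a', '▁red', '▁dress', ...]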

For VERSION 3 I will expand the dataset to well over 2 million images, all captioned with both plain-English captions and depiction-offset-based tagging.

They WILL NOT be trained together; they will instead be trained as two separate cloned datasets, one dubbed the tags file and one dubbed the caption file.

One with booru-based tags and short captions (< 30 caption tokens), one with plain English (< 10 booru tags); mirrored sister datasets, trained on alternating timesteps.

The Booru tagging will be shuffled, and the English captions will be orderly.
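
A minimal sketch of that mirroring scheme, assuming a simple per-image record (the field layout is a placeholder, not the actual pipeline):

    # Sketch: build the two mirrored label files for one image record.
    # The tags copy is shuffled (booru tags are order-independent);
    # the caption copy keeps its English word order.
    import random

    def make_pair(tags, caption):
        shuffled = list(tags)                # clone so the source list is untouched
        random.shuffle(shuffled)             # shuffled booru tags -> "tags file"
        return ", ".join(shuffled), caption  # orderly English -> "caption file"

    tags_file, caption_file = make_pair(
        ["1girl", "red dress", "garden", "sunlight"],
        "a woman in a red dress standing in a sunlit garden",
    )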

GENERATING IMAGES

  • ComfyUI is the only generator with intricate enough timestep control for IMG2IMG and TXT2IMG.

    • The ironic part is that the timesteps ARE imperfect, but they're pretty close.

    • I have released two starter timestep context mover ComfyUI WORKFLOWS, with starter timestep usage and double prompts meant for CLIP_L and CLIP_G.

    • This is NOT your run-of-the-mill SDXL: you will not get the same results, and it will produce off-putting and sometimes disturbing outputs if you stray too far from the timestep guidelines, especially when you ask for twisted shit.

  • If you want the FULL EXPERIENCE of this model, you MUST use ComfyUI and play with timesteps.

    • I have listed a semi-accurate list of trainings below, based on the trained timesteps. The math I used to determine these timesteps is SIMILAR to, but not fully accurate to, the Flux Shift that the CLIP_L was originally finetuned to cooperate with when training Flux; but it will do in a pinch. A rough code approximation of the timestep-splitting idea follows this list.

  • Forge works, but not as well.

  • I MADE SURE it looks good on Forge, so you CAN use Forge; but the context suffers, as the CLIP_L and CLIP_G have intentionally different behavior.
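
For readers who want the gist outside ComfyUI, here is a rough approximation of the timestep-split, dual-prompt idea using diffusers. This is a sketch, NOT the released workflows; the checkpoint filename, the prompts, and the 0.3 split point are placeholder assumptions.

    # Sketch: split the denoising schedule between two prompts, with separate
    # CLIP_L (prompt) and CLIP_G (prompt_2) text. Placeholder paths and values.
    import torch
    from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

    base = StableDiffusionXLPipeline.from_single_file(
        "simulacrum_v2_full.safetensors", torch_dtype=torch.float16  # placeholder path
    ).to("cuda")

    # First slice of the schedule: composition.
    latents = base(
        prompt="a knight standing in a ruined cathedral",         # CLIP_L
        prompt_2="oil painting, dramatic lighting, masterpiece",  # CLIP_G
        num_inference_steps=50,
        guidance_scale=6.5,
        denoising_end=0.3,        # hand off 30% of the way through
        output_type="latent",
    ).images

    # Remaining timesteps: a second prompt picks up where the first stopped.
    refine = StableDiffusionXLImg2ImgPipeline(**base.components)
    image = refine(
        prompt="ornate armor, stained glass light, highres",
        image=latents,
        num_inference_steps=50,
        guidance_scale=6.5,
        denoising_start=0.3,
    ).images[0]
    image.save("out.png")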

TLDR Generation Settings:

DPM++ 2M SDE -> Beta / Karras

CFG 6.5-7.5 -> 6.5 is my favorite

Steps 12-100 -> I mostly use 50; low step counts work.

Sizes -> Too many to list.

The RULE OF 3 is the foundational principle of this model. All of the captions are built around this concept, so the rule of 3 functions similarly to Flux. Stick to the rule of 3 and you will be okay; stray too far and you're gonna have a bad time. You can reinforce it by supplementing grid, zone, depiction, and size tags, plus the identifiers specifically associated with those.

Describe what you want to see in plain English: give it styles, give it artists, give it characters, give it clothes, and send it to the machine. Out comes your image with combined styles, artistic styles overlaid, and the characters imposed in those settings. You can give it grids, offsets, angles, and so on. It will probably understand what you want. An illustrative example follows.

Negative anything you don't want to see, in sequence from most important to least important.
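
For example (an illustrative prompt pair made up for this explanation, not an official sample):

    Positive: a knight in ornate silver armor standing in a ruined cathedral,
    stained glass light, oil painting style, masterpiece, highres
    Negative: deformed anatomy, extra limbs, blurry, watermark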

Consult the tag document for the specifically trained and important attention-shifting tags.

Be WARY and VERY careful what you type into this thing.

It's essentially a dumb Flux. It gives you what you want, and sometimes you get the monkey's paw reward along with it.

It builds IN SEQUENCE.

  • Everything you put earlier in the prompt takes precedence over everything you put after it. Some tags come with baggage, some tags do not.

  • Using PLAIN ENGLISH has a very powerful effect, designed specifically to allow EASE OF ACCESS.

  • This does NOT always work yet, as it was one of those version 3 guideposts that wasn't met; however, it definitely has a very powerful effect.

  • It has VERY MINOR shuffle training between timesteps 4 and 8; everything else is entirely based on sequential access. For the next version I will be including MORE timestep trainings specifically designed to shift attention, with more images using shuffle training.

  • I have marked these timestep ranges:

    • 12-16

    • 22-24

    • 30-36

    • 41-50

    These are specifically allocated FOR THE NEXT VERSION for attention shifting, context finetuning, and high-fidelity inclusion of supplementary details in sequence; aka shuffle training and quality-boosting training steps. ANYTHING OVERLAPPING will not matter, as the data will supplement each other.

  • This has a VERY HIGH POTENCY EFFECT in ComfyUI when using timestep control, especially when using separate CLIP_L and CLIP_G prompts.

  • This cake's recipe was not a simple one. In fact, I'd say it's the most intricate and carefully plotted model I've ever made. It reflects the great achievements, including the successful experiments and new proofs for the community; but it also reflects some of the greatest failures, the most painful incorrect assumptions, and the most painful images I've ever seen.

  • For THIS VERSION (a sketch of what a timestep-window pass means in code follows this list):

    • 0-1000 full finetuned baseline -> full finetune, LoCoN full, LOHA full, Dreambooth, and LORA used.

      • CLIP_L trained, CLIP_G frozen.

      • 5,000,000 samples

      • 57k images; 1/3rd anime, 1/3rd realistic, 1/3rd 3d

        • grid -> did not take

        • hagrid -> did not take

        • pose -> took very well

        • human form -> took very well

        • ai generated -> took very well

    • 1-999 first iteration img2img training -> attention training half, dreambooth half

      • CLIP_G training enabled.

      • 200,000 samples

      • 51k images; pruned first pack, many fetishes and bad images removed

        • removed many hagrid images for blurring hands

          • many classifications removed entirely and will need recaptioning

        • removed all images labeled very displeasing in ai generated

    • 10-20 first pass shuffle -> attention training only -> LOKR training only, 5 versions with different settings.

      • Increased LR for CLIP_L and CLIP_G

      • 1,000,000 samples, no English captions

      • 75k images ->

        • mixed safe/questionable/explicit 3d dataset added

          • full pose angle set, full array of artists, full fetish set

        • ai generated removed entirely

    • 10-990 second pass shuffle -> full finetune, LOHA, LoCoN used.

      • Reduced LR for CLIP_L and CLIP_G

      • 150,000 samples, no English captions

      • 115k images

        • mixed safe/questionable/explicit/nsfw anime dataset added

        • hagrid removed entirely for re-planning for version 3.

    • 2-8 second pass English cohesion -> attention training only, heavy shift toward the goal.

      • High LR for CLIP_L and CLIP_G

      • 800,000 samples

      • 8k images specifically tailored for English descriptions and grid/offset/depth

        • Bucketing and cropping disabled; 1024x1024, 768x768, 1216x1216, 832x1216, 1216x832, 512x512

        • Grid training meant to function as a binding agent.

    • 8-992 third pass English cohesion low LR -> full finetune

      • Normal LR for CLIP_L and CLIP_G -> they have normalized

      • 800,000 samples

      • 140k images specifically tailored for English descriptions and context

      • Bucketing re-enabled.

    • 1-999 final pass burn -> full finetune, very low learn rate 1/10th original

      • CLIP_L and CLIP_G now cooperate rather than fight.

      • 2 million samples, very low learn rate, all captions, all tags

      • all images were included (including the previously omitted ones), except hagrid

      • trained the entire dataset in epochs, rather than in curriculum

      • Roughly 300k images used, give or take, I think.
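
The range prefixes above (0-1000, 10-20, 2-8, and so on) are the timestep windows each pass was restricted to. As a generic sketch of that idea, not the actual training configs, restricting a diffusion training step to a window looks roughly like this (the batch fields and scheduler object are placeholders):

    # Sketch: one diffusion training step restricted to a timestep window.
    # Sampling t only from [min_t, max_t] focuses a pass on one slice of the
    # denoising schedule (e.g. the "10-20" shuffle pass above).
    import torch
    import torch.nn.functional as F

    def training_step(unet, noise_scheduler, batch, min_t=10, max_t=20):
        latents = batch["latents"]          # placeholder: pre-encoded VAE latents
        cond = batch["text_embeddings"]     # placeholder: CLIP_L/CLIP_G conditioning
        noise = torch.randn_like(latents)
        # Restrict the sampled timesteps to this pass's window.
        t = torch.randint(min_t, max_t + 1, (latents.shape[0],), device=latents.device)
        noisy = noise_scheduler.add_noise(latents, noise, t)
        pred = unet(noisy, t, encoder_hidden_states=cond).sample
        return F.mse_loss(pred, noise)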

You MAY see some nsfw elements while prompting using the safe tag

  • Even with questionable/explicit/nsfw in the negative prompt; but it is currently fairly rare. If you see them, don't worry about them impacting the next version negatively: I have a full 1 million safe images lined up for the next version to make sure this DOES NOT happen unless the prompter WANTS to see such things.

  • Many female forms were trained specifically in the nude, with clothes imposed afterward thanks to the sequential learning pattern and timestepping. This may mean your preview sampler shows nudity, distortions, deformity, and more before the image cleans up.

  • Be warned that it may NOT clean up; but it generates pretty fast if you're using the single-pass ComfyUI workflow, so just hit the next seed if something doesn't work. There's a chance it will work; you just haven't hit the right seed yet.

Watching the images generate often looks like a slideshow

  • This is fully intentional. Some of these images can be disturbing, and I apologize if you see anything disturbing in this slideshow. The final pass did some damage to them, but not enough to fully blend them, so be very wary when prompting nsfw elements.

  • The next version will have a full finetune for the safe tag to ensure many of these elements are suppressed unless prompted, but for now please bear with the negative prompt.

Careful NSFW curation of prompts

  • Genitalia, distortions, objects, extra limbs, and more often appear. If you start seeing things like this solidify, you can use censored in the positive prompt, which is literally a depiction-offset tag designed for this exact purpose.

  • It WILL censor genitals and nipples. If they continue to show up, you can tell it EXACTLY where you want censoring:

    • grid_a3 censored nipple in the positive prompt. It'll get the idea, and the concept will bleed through the image if you don't use the size tag along with it.

    • nipple, nudity, nude in the negative.

    • It WILL go away.

  • SDXL has many horror movies built into its training. You can tell it was given the IMDB dataset, and this often hurts many images, or even introduces horrifying elements. The most annoying part I've found is trying to burn the ages out. I don't even know what sort of tagging they used, but it's not something I've managed yet.

  • If you see anything horror or age-based, negative "futanari, femboy, loli, shota, horror, monster, gross, blood, gore, saw, ghost, paranormal", and most of the artifacting from the IMDB horror and any training built into SDXL will go away.

    • Nothing I can do about this in this version, I've already tried a couple ways to burn it, and it just ends up hurting everything, so I'm going to need another solution.

    • I tried including false images in these tags and it only associated everything I trained with the other tags, into the horror sections; causing a massively terrifying version that I will never ever release.

      • Though I now know how to make cool Halloween LoRAs better, so that's cool.

    • I do so apologize for this, as I'm usually very careful at curating this sort of response, but this time I cannot control every element in SDXL yet. I require more research and more testing.

  • Negative genitals if they appear. The penis is a primary one that tends to show up; simply negative it and it goes away. It knows what it is. It also knows what the majority of condom-based things, sex toys, and so on are, so you can negative everything away if negative questionable, explicit, nsfw aren't doing the trick.

    • "penis, vagina, penetration, sex toy, dildo" and so on can be put in the negative prompt to nearly guarantee they won't show up; but they will if you prompt them in positive with negative, and there are some artists and styles with many images related to those. So be careful.

NSFW elements CAN BE TERRIFYING.

  • This version's nsfw prompting does NOT do well with complex plain English scenes yet, but it does work.

  • Keep your plain-English prompts short and stick to the booru and Sim tags. You will produce OKAY NSFW context results if that's the goal, but nothing to write home about yet in terms of cohesive fidelity.

  • You can have better luck adding a style or two, adding some artists, and so on; kinda nudging it towards what you want to see. If the artist is in there, it'll probably work. If not, you can try one of the stronger ones in the list.

  • If you're hoping for an easy porn maker, you will have some luck with simple prompts; but the more complex you get, or the more plain English you include, the more abomination-like your outcome will become.

Barebones:

  • ComfyUI Guidelines and Workflows

  • Full tag list and counts

  • Overcooked Portions

  • Undercooked Portions

  • Cache corruption time consumption

Accidentally published early. For those who managed to get it, please don't share it; but if you must, then go for it.

It'll be out at 5pm officially. -> ETA 11 hours

Since this version never quite made it to the markers for V3, I've decided to dub this the FULL version 2 release. It's hit as many markers as it will with this dataset, so I'll need to expand the dataset to nearly 3x or 4x the images to fill in the necessary missing information; we're looking at 1.5 to 3 million images, which is roughly a third of a big booru.

Getting that many images, with sections that can be identified and segmented, will involve sampling every single database I can find, including datasets like Fashion, IMDB, and anything I can get really. If I am to make this model SMART, then it needs to know what everything is and where that everything is, because it still needs a lot of data.

I'm going to begin hosting these fully tagged and prepared datasets on Hugging Face in tar/parquet format, so my custom cheesechaser can grab them for you if you want.
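
A sketch of what grabbing one of those shards could look like with plain huggingface_hub, once they're up (the repo id and filename below are hypothetical placeholders, not a published dataset):

    # Hypothetical example: repo_id and filename are placeholders.
    from huggingface_hub import hf_hub_download
    import pandas as pd

    path = hf_hub_download(
        repo_id="someuser/simulacrum-captions",  # placeholder repo
        filename="shard-0000.parquet",           # placeholder shard
        repo_type="dataset",
    )
    df = pd.read_parquet(path)
    print(df.columns.tolist())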

I will do the ole smudge-face thing for real people like I usually do, which is why some of them turn into anime characters, by the way. SDXL has a bunch built into it already, so it's clearly been taught the IMDB dataset, which means I know what I can negative-teach.

SDXL-SimulacrumV25β

Currently on epoch 65 -> 7.5 million samples, give or take.

The teasers show off the style and series bleeding, which is exactly as intended.

How many models were a pain in the ass to finetune BECAUSE something overwhelms something else, and that something else gets in the way? Well, not this one. EVERYTHING is directly easy to finetune, by design.

It's now hit 85/100 markers. I anticipate it being done by tomorrow or the day after.

Generation Recommendations:
DPM++ 2M SDE
-> Beta / Karras
-> Steps 14-50 -> 50
-> CFG 4.5-8.5 -> 6.5

DPM++ 2S Ancestral
-> Beta / Karras
-> Steps 32
-> CFG 5-8 -> 6

DPM++ 2M
-> Beta / Karras
-> Steps 20-40 -> 40
-> CFG 7 -> 7

Euler doesn't work very well.
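
If you're running through diffusers instead of a UI, the first recommendation maps roughly onto the following scheduler config (a sketch reusing the base pipeline object from the earlier diffusers example; sampler naming differs across UIs):

    # Approximate "DPM++ 2M SDE + Karras" in diffusers scheduler terms.
    from diffusers import DPMSolverMultistepScheduler

    base.scheduler = DPMSolverMultistepScheduler.from_config(
        base.scheduler.config,
        algorithm_type="sde-dpmsolver++",  # the SDE variant
        use_karras_sigmas=True,            # the Karras sigma schedule
    )
    image = base(prompt="...", num_inference_steps=50, guidance_scale=6.5).images[0]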

PROMPT BASICS HERE

<CAPTIONS HERE>

good aesthetic, very aesthetic, most aesthetic, masterpiece,
anime, 
<CHARACTERS HERE>

<ACTION CAPTIONS HERE>

<OFFSETS AND GRID GO HERE>

<CHARACTER TRAITS HERE>

highres, absurdres, newest, 2010s

Try not to breach 75 tokens for this version. The CLIP_L has been trained with 225-token prompts, but the encoders definitely aren't smart enough for that yet.
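
To check whether a prompt stays under the 75-token budget, here is a quick sketch with the stock CLIP_L tokenizer (openai/clip-vit-large-patch14 is the standard CLIP_L; any SDXL-compatible CLIP tokenizer gives the same count):

    # Count CLIP tokens in a prompt to stay under the 75-token budget.
    from transformers import CLIPTokenizer

    tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    prompt = "good aesthetic, very aesthetic, most aesthetic, masterpiece, anime"
    ids = tok(prompt).input_ids
    print(len(ids) - 2, "tokens")  # minus the BOS/EOS tokens the tokenizer adds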

This line helps make most images better:

good aesthetic, very aesthetic, most aesthetic, masterpiece,

TLDR: Use this NEGATIVE PROMPT to get started.

lowres,
nsfw, explicit, questionable, 
displeasing, very displeasing, disgusting, 

text, size_f text, size_h text, size_q text,
censored, censor bar,
monochrome, greyscale, 
bad anatomy, ai-generated, ai generated, jewelry,

watermark, 
hand, 
blurry hand,
bad hands, missing digit, extra digit, 
extra arm, missing arm, 
convenient arm, convenient leg, 
arm over shoulder, 
synthetic_woman,

Barebones negative: use at your own peril.

lowres, 
displeasing, very displeasing, disgusting, 

text, 
monochrome, greyscale, comic, 
synthetic_woman,

Credits and Links:

  • A special thanks to everyone at DeepGHS for all their hard work and effort organizing and preparing tools and AI, and keeping datasets orderly.

  • Flux1D / Flux1S Link

  • SDXL 1.0 Link

  • OpenClip trainer Link

  • Kohya SS GUI /// SD-Scripts

  • Images sourced from or by

    • Cheesechaser Link

      • Safebooru

      • Gelbooru

      • R34xxx/R34us

      • 3dBooru

      • Realbooru -> smudged face

    • ImageGrabber Link

  • Out-of-scope Datasets Used

  • Captioning software (partially prepared for release) using:

    • ImgUtils Link

      • Used an entire array of available AIs in this pack plus more.

      • Bounding Boxes

        • BooruS11

        • BooruPP

        • People

        • Faces

        • Eyes

        • Heads

        • HalfBody

        • Hands

        • Nude

        • Text

        • TextOCR

        • Hagrid

        • Censored

        • DepthMidas

        • SegmentAnything YoloV8

      • Classification

        • Aesthetic

        • AI-Detection

        • NSFW Detector

        • Monochrome Checker

        • Greyscale Checker

        • Real or Anime

        • Anime Style or Age -> year based

        • Truncated

    • Hagrid Link

    • MiDaS Link

    • Wd14 Link

    • Wd14 Large Link

    • MLBooru Link

    • Captioning

    • JoyCaption AlphaOne Link

    • T5XXL Blip2 Link

    • SentencePiece Link

    • Quora T5 Small Paraphraser Link

    • SegmentAnything YoloV8 Link