Update 6: Prompting Omega
https://civitai.com/models/1409093
With the first release of Omega, the hybrid distilled model that cost nearly $4500 to create, we have the first powerhouse beast of a simulacrum.
Forged from Pony, Noob, and SDXL with a dataset numbering in the millions; the parent models have gone through hell and high water from point A to B.
You can prompt it, like prompting any of the others. This model has the most potential diversity and most potential power of any SDXL model to date; with also the most potential insanity, most potential cross-contamination, and most potential fail-points as well!
When you wish on the monkey paw, you get the monkey paw's... other side.
So... be careful what you wish for; it will probably make it. If it doesn't, swap some words, change some tags, and you just might get anything.
In any case; it's a full VPRED distillation.
In other words, there is very little EPRED noise left, and if there is, it's been trained into a semi-useful state. Now, there are sections of the model I still haven't tapped into, as these things are highly complex, but the outcomes show that no matter what you type in, you will see at least something different.
It's literally a smut model if you aren't careful, so just bear with it until the CURRENTLY PREPARING LATENTS finetune please. It's coming; the next batch is a safe set of images only, meant specifically for artistic intent.
80,000 safe images specifically sourced from safebooru, as well as a multitude of gathered images such as hagrid hands, classification faces, and a great deal of other images specifically designed to mitigate this hellscape that ended up a smut-maker. If only to condition it into something more useful for general human depictive art without unwillingly producing something nude or lewd.
You can speak to it in plain English. It doesn't know other languages currently, so use plain English, and consult the tag auto-complete attached to the mad science article as a json.
I've thought of a few ways to auto-complete English, and I looked into a couple of methodologies behind auto complete; so I can potentially produce something similar to a coding-style auto complete extension for COMFYUI in the future.
However, today, just use whatever auto-complete you want, and consult the tag list with CTRL+F if you're too lazy for that.
As it stands, the tag list contains only about 1/3rd of the overall tags, and less than 1/10th of the plain English.
So I'll be providing a fully updated version soon.
For now;
To prompt SAFE-ER but DEFINITELY not safe;
Positive Prompt:
safe, rating_safe, censored,
score_9, score_8, score_7, etc...
Negative prompt:
nsfw, rating_nsfw, explicit, rating_explicit, questionable, rating_questionable,
>>> If lewd things REFUSE to go away;
grid, upper body, breasts, penis, uncensored, nipples, pussy, genitals, cum
Omega is a little stubborn on this one because it's basically a franken-brain. This is all highly experimental.
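If you'd rather not paste those lists by hand every time, here is a minimal sketch of a helper that builds the safe-er prompt pair. The tag lists are copied from above; the function name, the `stubborn` switch, and putting your caption first are my assumptions for illustration, not something baked into the model.

```python
# Tag lists copied from the article; everything else here is a hypothetical helper.
SAFE_POSITIVE = ["safe", "rating_safe", "censored", "score_9", "score_8", "score_7"]
SAFE_NEGATIVE = [
    "nsfw", "rating_nsfw", "explicit", "rating_explicit",
    "questionable", "rating_questionable",
]
# Extra negatives for when lewd things refuse to go away.
STUBBORN_NEGATIVE = [
    "grid", "upper body", "breasts", "penis", "uncensored",
    "nipples", "pussy", "genitals", "cum",
]


def build_safer_prompts(caption, stubborn=False):
    """Return (positive, negative) prompt strings for a safe-er generation."""
    # Assumption: the plain English caption goes first, then the safety and score tags.
    positive = ", ".join([caption] + SAFE_POSITIVE)
    negative_tags = SAFE_NEGATIVE + (STUBBORN_NEGATIVE if stubborn else [])
    return positive, ", ".join(negative_tags)


positive, negative = build_safer_prompts("a woman reading in a library", stubborn=True)
print(positive)
print(negative)
```

Safe-er, not safe. This only assembles text; the model can still ignore it.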
score_9, score_8, score_7, score_6, score_5, score_4, score_3, score_2, score_1
Score tags work, DO NOT USE score_#_up tags, as they are not trained nor are they conditioned.
masterpiece = score_9
most aesthetic = score_8
very aesthetic = score_7
normal aesthetic = score_6, score_5
okay aesthetic = score_4, score_3
very displeasing = score_2
disgusting = score_1
They do not line up 1:1, so you will get different results from them. However, the quality boost is definitely a real thing no matter which one you use. This allows for additional diversity in the tagger as well as additional inclusion for those who are tagging differently.
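As a rough sketch of the mapping above, this pairs each aesthetic tag with its score tag(s) and flags any score_#_up tags hiding in a prompt. The dictionary comes straight from the list; the function names are hypothetical, and since the two tag families don't line up 1:1, treat the pairing as a starting point rather than an equivalence.

```python
import re

# Mapping copied from the article's list; the two families are related, not identical.
AESTHETIC_TO_SCORE = {
    "masterpiece": ["score_9"],
    "most aesthetic": ["score_8"],
    "very aesthetic": ["score_7"],
    "normal aesthetic": ["score_6", "score_5"],
    "okay aesthetic": ["score_4", "score_3"],
    "very displeasing": ["score_2"],
    "disgusting": ["score_1"],
}

# Matches tags such as score_8_up, which Omega was not trained on.
_UP_TAG = re.compile(r"^score_\d+_up$")


def find_untrained_score_tags(prompt):
    """Return every score_#_up tag found in a comma-separated prompt."""
    tags = [tag.strip() for tag in prompt.split(",")]
    return [tag for tag in tags if _UP_TAG.match(tag)]


def with_score_tags(aesthetic_tags):
    """Follow each known aesthetic tag with its paired score tag(s)."""
    out = []
    for tag in aesthetic_tags:
        out.append(tag)
        out.extend(AESTHETIC_TO_SCORE.get(tag, []))
    return out


print(find_untrained_score_tags("score_9, score_8_up, score_7_up, masterpiece"))
print(with_score_tags(["masterpiece", "very aesthetic"]))
```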
source_real
source_anime
source_2d
source_cartoon
source_3d
You can guess what those do, and know they are at the head; right below the score tags.
You can also prompt tags like;
newest, 2020s, 2010s, 2000s, older, oldest
For years.
rating_nsfw, rating_explicit, rating_questionable, rating_safe
nsfw, explicit, questionable, safe
For the rating system. It's kind of hit or miss; but I'm running a fixer finetune that should solidify them.
Scrolling further down the list will show all the various ways this model can be prompted; with mixed results here or there, mostly good results, but there's no guarantees.
Update 5: PonySimV43 Suggestions
Wew. This thing is actually pretty good. Even the feature infusion was good. This thing is actually not too bad as a standalone model I think, but that's up to you to test.
--------------------------------------------------------------
<<<REALISTIC>>>
--------------------------------------------------------------
<very complex English caption here>
<grid/zone/offset tags here>,
score_9, score_8_up, score_7_up,
masterpiece, most aesthetic,
realistic, real,
solidification prompt for additional detail,
<person counts>, <gender numbers>, <species>, <series>,
<character tags>,
<character interaction tags>,
absurdres, highres, newest, 2020s,
final details prompt for setting
--------------------------------------------------------------
<<<ANIME>>>
--------------------------------------------------------------
<complex English caption here>
<grid/zone/offset tags here>,
score_9, score_8_up, score_7_up,
masterpiece, most aesthetic, very aesthetic, good aesthetic,
source_anime, anime,
<solidification prompt for additional detail>,
<person counts>, <gender numbers>, <species>, <series>,
<character tags>,
absurdres, highres, newest, 2020s,
final details prompt for setting
--------------------------------------------------------------
<<<3D>>>
--------------------------------------------------------------
<complex English caption here>,
<grid/zone/offset tags here>,
score_9, score_8_up, score_7_up,
masterpiece, most aesthetic, very aesthetic,
3d, 3d \(artwork\), realistic,
<person counts>, <gender numbers>, (species:1.15), <series>,
<character tags>,
<character interaction tags>,
highres, absurdres, newest, 2020s
nsfw, rating_explicit,
lowres, text, bad anatomy,
okay aesthetic, very displeasing, disgusting,
monochrome, greyscale, comic,
old, older,
anthro, semi-anthro, furry, feral,
simple background, blurry background,
Update 4: SimNoobV5 Suggestions
A large percentage of the LAION data has been disrupted or damaged, but the outcomes show a strong showcase that a large percentage of LAION caption associated data still exists. Plain English works considerably better than any NOOB model I've managed to check.
Use RES4LYF samplers for SimNoob. 2S are effective, RK is effective, DPMPP 2M SDE -> BETA and RES 2S SDE still attract the most context with this model; while its sister NoobSim is a powerhouse with EulerA -> SGM UNIFORM.
Positive Prompt:
--------------------------------------------------------------
<<<REALISTIC>>>
--------------------------------------------------------------
<very complex English caption here>
masterpiece, most aesthetic, very aesthetic,
realistic, real,
<sim and noob tags>
absurdres, highres
--------------------------------------------------------------
<<<ANIME>>>
--------------------------------------------------------------
<complex English caption here>
masterpiece, most aesthetic, very aesthetic, good aesthetic,
anime,
<sim and noob tags here>
newest, very awa, highres, absurdres, best quality
--------------------------------------------------------------
<<<3D>>>
--------------------------------------------------------------
<complex English caption here>
masterpiece, most aesthetic, very aesthetic, good aesthetic,
3d, 3d \(artwork\), realistic,
<sim and noob tags here>
highres, absurdres
Negative Prompt:
nsfw, lowres, text, bad anatomy,
okay aesthetic, very displeasing, disgusting,
monochrome, greyscale, comic,
old, older,
anthro, semi-anthro, furry, feral,
simple background, blurry background,
Update 3: NoobSim Suggestions
NoobSim is 72% noob 28% simv3. The fusion was a feature interpolation that literally implanted more powerful SimV3 traits interpolated directly onto the NoobXLVPRED-V10 feature sets.
Positive Prompt:
--------------------------------------------------------------
<<<REALISTIC>>>
--------------------------------------------------------------
<plain English caption here>
most aesthetic, very aesthetic,
realistic, real,
<sim and noob tags>
masterpiece, newest, absurdres, highres
--------------------------------------------------------------
<<<ANIME>>>
--------------------------------------------------------------
<simple English caption here>
masterpiece, most aesthetic, very aesthetic, good aesthetic,
anime,
<sim and noob tags here>
newest, very awa, highres, absurdres, best quality
--------------------------------------------------------------
<<<3D>>>
--------------------------------------------------------------
<plain English caption here, you can go overboard>
masterpiece, most aesthetic,
3d, 3d \(artwork\), realistic,
<sim and noob tags here>
highres, absurdres
Negative Prompt:
nsfw, lowres, text, bad anatomy,
okay aesthetic, very displeasing, disgusting,
monochrome, greyscale, comic,
old, older,
anthro, semi-anthro, furry, feral,
simple background, blurry background,
Like always, substitute positive or negative to your current use cases.
Update 2: Minor tweaks for V3 Beta release.
https://civitai.com/models/1177470/sdxl-sim-v3-ultrares-b-sfwnsfw
Update 1: Avoiding cross contamination and introducing additional details.
Due to this model having a multitude of cross contaminations from text, blacklisting it does a pretty good job removing any text you may see. It's also been over-tagged and has far too many tokens for text; but I figured out a basic workaround for this problem.
MORE PLAIN ENGLISH CAPTIONS!
The more it sees, the more likely you'll get the image you want. So write a really good description of your image and make sure you include one of the style classifiers or you'll just get a random one of the 3 core styles.
This also tends to clean up bad anatomy and bad hands, or very strange depictions outside of the 768x768 range.
To introduce even MORE details and data; use all 4 rating tags, all of the aesthetic tags, and anything else you want just mashed together below your plain English caption. It seems to have really interesting and potent responses due to the training.
Blacklist text, bad anatomy, and whatever else you want if you use enough plain english.
I recommend trying JoyCaption to figure out some captions from your favorite images and then working from there.
The Template
#TEMPLATE = [
# "{captions}",
# "{aesthetic}",
# "{rating}",
# "{core}",
# "{artist}",
# "{characters}",
# "{character_count}",
# "{gender}",
# "{species}",
# "{series}",
# "{photograph}",
# "{substitute}", -> omitted for v3, all are general or unknown tags now
# "{general}",
# "{unknown}",
# "{metadata}",
# "{year}",
#]
STICK TO THE ORDER or you're gonna have a bad time.
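Here's a minimal sketch of sticking to the order in code. The slot names come from the template above (with "substitute" left out, since it's omitted for v3); the function name, the dict input, and the comma joining are my assumptions about how you'd want to feed it.

```python
# Slot order copied from the template; "substitute" is omitted for v3.
TEMPLATE_ORDER = [
    "captions", "aesthetic", "rating", "core", "artist", "characters",
    "character_count", "gender", "species", "series", "photograph",
    "general", "unknown", "metadata", "year",
]


def assemble_prompt(parts):
    """Join the filled slots in template order, skipping empty ones.

    `parts` maps a slot name to a string or a list of strings.
    """
    stray = set(parts) - set(TEMPLATE_ORDER)
    if stray:
        raise KeyError(f"not a template slot: {sorted(stray)}")
    pieces = []
    for slot in TEMPLATE_ORDER:
        value = parts.get(slot)
        if not value:
            continue
        if isinstance(value, str):
            value = [value]
        pieces.extend(item.strip() for item in value if item.strip())
    return ", ".join(pieces)


print(assemble_prompt({
    "year": "newest",
    "captions": "a cat sleeping on a bench",
    "core": ["anime"],
    "aesthetic": ["masterpiece", "most aesthetic"],
}))
```

It doesn't matter what order you fill the dict in; the output always follows the template.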
Examples
The SDXL-Simulacrum-V2β release 1 - the first of 10; 5 million sample models this weekend, so I'm preparing some docs ahead of time.
This was tagged with nearly 400,000 images, the counter says there are over 700,000 unique tags.
Tagging includes:
Gelbooru
Danbooru
E621 - Omitted until V4
R34 - US
R34 - XXX
Safebooru
AnimePictures - Omitted until V5
3Dbooru
Listing them all isn't possible, nor is listing their power currently. Not in this little article.
Testing shows its SDXL is highly unstable but also responsive to depiction offset, and the clips also introduce grid and offset to other SDXL models, Flux models, and even SD3.
I'll also be releasing a full set of toy prompts for realism, 3d, and anime; showing off the multiple potentials and utilities of depiction offset. This will include the current version of CLIP_L_OMEGA and CLIP_G_OMEGA.
Next week I hope it actually WORKS well enough to not have to restart the training.
Upper left, lower right. Some things work fairly well, other things don't work at all. Some even contaminate or make other tags less effective, or more erratic. Options need to happen after an assessment, but I'll be releasing this afternoon so you all get to play with it.
Seed by seed should give similar results.
Deviations still occur, but they will be refined out through training over time.
WORKING CORRECTLY, treats your subjects like a calm in the storm. Everything moves around them, while they stay steadfast.
I HIGHLY recommend using the SDXL-Refiner or something of that nature to repair or refine anything SFW related.
By next week I hope to have a full comfyui tool created to assist with auto-complete in the prompt box, automatic wildcarding to a certain prompt length, and automatic replacing of swapped prompt tokens to the correct tokens for the model.
This will be expanded later to something multi-node and complex, based on identification of zones and inpainting when detections aren't made during inference.
It's been trained on nearly 2.5 million 5 million samples already from nearly 400,000 350k automatically bbox identified and captioned images. There is very little safe here.
Here is the basic tag lookup reference sheet. I'll be compounding a full auto-complete for comfyui and a tool to assist with actually placing things on places that you might get to see them show up.
It should behave similarly to NAIV4 and have similar nsfw capability as NAI, so you can call it K-Mart NAI with realism.
This thing is either going to be a big floppy fish, or it's going to do something astronomical after the full high quality AI image dataset is introduced, which is roughly 5 million categorically selected images based on tags, blacklists, sizes, character offsets, and specific classification identifiers that will be trained in curriculum training specifically devoted to problem sections and working sections based on classification accuracy.
The Baseline Tags
BE WARNED AHEAD OF TIME. Deviation from the pattern is not advised.
This model is currently HIGHLY unstable and very very prone to NSFW elements, many of which are essentially monstrous due to the dataset.
YOU ARE WARNED. NSFW WILL INTERFERE.
This model was trained WITHOUT SHUFFLE and with MULTIPLE EPOCHS using no bucketing.
If you deviate, everything outside the 768x768 range becomes absolute chaos.
The CLIP_L was trained to behave with FLUX, which means I'm conditioning SDXL to behave like flux. Bear with it until I feed this CLIP_L the properly depiction offset LAION200m 512x512.
# TAGS:
# All of the core simulacrum tags were sorted using the template above.
# STYLE TAGS
# This was essentially trained with a three fork mindset;
# 3d, realistic, anime
# Using one of those 3 will give the best results.
# 3d
3d
3d (artwork)
blender (medium)
digital media (medium)
# realistic -> anime mostly
anime, realistic
# 2d
# there is a multitude of 2d elements in v3
# anime
# too many variations, use years or series.
# NSFW
# These cause a clusterfuck of things to happen, better to negative them for now.
safe
questionable
explicit
nsfw
sex
# AI GENERATED
# AI generated images were identified and blacklisted mostly.
ai generated # -> There were plenty left in for negative prompting < 5%
# MONOCHROME / GREYSCALE
# these were identified and pruned mostly, but some exist.
monochrome
greyscale
# AESTHETIC
# these are identified using imgutils
disgusting # -> deprecated, still exists but not as strong
very displeasing # -> new disgusting < 10% quality
normal aesthetic # -> < 25%
good aesthetic # -> < 50%
very aesthetic # -> < 75%
most aesthetic # -> <= 85%
masterpiece # -> > 85%
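To make the thresholds above concrete, here is a small sketch that turns a quality percentage into one of the aesthetic tags. The cutoffs are the ones listed; the function name and the idea of passing in a 0 to 100 number are assumptions, and this is not the imgutils code itself. "disgusting" is deprecated, so the sketch never returns it.

```python
def aesthetic_tag(quality_percent):
    """Map a 0-100 quality percentage to an aesthetic tag using the listed cutoffs."""
    if not 0 <= quality_percent <= 100:
        raise ValueError("quality_percent must be between 0 and 100")
    if quality_percent < 10:
        return "very displeasing"
    if quality_percent < 25:
        return "normal aesthetic"
    if quality_percent < 50:
        return "good aesthetic"
    if quality_percent < 75:
        return "very aesthetic"
    if quality_percent <= 85:
        return "most aesthetic"
    return "masterpiece"


for value in (5, 30, 80, 92):
    print(value, aesthetic_tag(value))
```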
# GRID TAGS
# 5x5 Controllers
# Each grid point was identified using all bounding box ais used.
# Controls the primary grid system trained into SDXL-SimB-V2
grid_a1 #
grid_a2 #
grid_a3 #
grid_a4 #
grid_a5 #
grid_b1 #
grid_b2 #
grid_b3 #
grid_b4 #
grid_b5 #
grid_c1 #
grid_c2 #
grid_c3 #
grid_c4 #
grid_c5 #
grid_d1 #
grid_d2 #
grid_d3 #
grid_d4 #
grid_d5 #
grid_e1 #
grid_e2 #
grid_e3 #
grid_e4 #
grid_e5 #
# Half Zone Controllers
# Identified using all bounding boxes.
zone_l # -> lower half
zone_u # -> upper half
# Quarter Zone Controllers
# Identified using all bounding boxes.
zone_ul # -> upper left corner
zone_ur # -> upper right corner
zone_ll # -> lower left corner
zone_lr # -> lower right corner
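If you want to guess which zone tags a subject would have picked up, here's a rough sketch that derives the half and quarter zone tags from a bounding box. Using the box center, normalized 0 to 1 coordinates, and a top-left origin are all my assumptions; the actual tagging rule used in training may differ.

```python
def zone_tags(x0, y0, x1, y1):
    """Return [half zone tag, quarter zone tag] for a normalized bounding box.

    Assumes coordinates in the 0-1 range with (0, 0) at the top-left corner,
    and classifies the box by where its center falls.
    """
    center_x = (x0 + x1) / 2
    center_y = (y0 + y1) / 2
    vertical = "u" if center_y < 0.5 else "l"      # upper or lower half
    horizontal = "l" if center_x < 0.5 else "r"    # left or right side
    return [f"zone_{vertical}", f"zone_{vertical}{horizontal}"]


print(zone_tags(0.0, 0.0, 0.4, 0.4))
print(zone_tags(0.6, 0.6, 1.0, 1.0))
```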
# 3x3 Legacy Grid
# Identified using all bounding boxes.
Legacy and deprecated, use at your own risk.
depicted-upper-left # -> DEPRECATED
depicted-upper-center # -> DEPRECATED
depicted-upper-right # -> DEPRECATED
depicted-middle-left # -> DEPRECATED
depicted-middle-center # -> DEPRECATED
depicted-middle-right # -> DEPRECATED
depicted-lower-left # -> DEPRECATED
depicted-lower-center # -> DEPRECATED
depicted-lower-right # -> DEPRECATED
# Size Controllers
# Identified using all bounding boxes.
size_s # -> small < 25% of overall image
size_q # -> quarter-frame 25% to 40%
size_h # -> half-frame 41% to 70%
size_f # -> full-frame 71% to 100%
minimal # -> DEPRECATED - small < 25% of overall image
quarter-frame # -> DEPRECATED - quarter-frame 25% to 40%
half-frame # -> DEPRECATED - half-frame 41% to 70%
full-frame # -> DEPRECATED - full-frame 71% to 100%
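The size brackets above can be sketched like this. I'm assuming "of overall image" means bounding box area over image area, and I'm folding the gaps between brackets (40 to 41, 70 to 71) into the lower bracket; both are guesses, and the function name is hypothetical.

```python
def size_tag(box_width, box_height, image_width, image_height):
    """Pick a size tag from the share of the image a bounding box covers."""
    if min(box_width, box_height, image_width, image_height) <= 0:
        raise ValueError("all dimensions must be positive")
    # Assumption: the percentage is box area divided by image area.
    percent = 100 * (box_width * box_height) / (image_width * image_height)
    if percent < 25:
        return "size_s"   # small
    if percent <= 40:
        return "size_q"   # quarter-frame
    if percent <= 70:
        return "size_h"   # half-frame
    return "size_f"       # full-frame


print(size_tag(256, 256, 1024, 1024))
print(size_tag(768, 768, 1024, 1024))
```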
# Text Tags
# Identified using the OCV text AI in IMGUTILS
# The text bodies were checked for ENGLISH or JAPANESE, all other languages omitted for now.
# Chinese will be in V3 after testing.
text "text goes here"
japanese text "japanese goes here"
english text "english goes here"
# Depth Tags
# Identified using MIDAS and depth analysis.
# These are an experiment using midas depth dot normalization comparison.
# I have no idea if they work yet.
behind another # one detection behind another
above another # one detection above another
below another # one detection below another
to the side of # one detection to the side of another
lower left of # one detection to the lower left of another
upper right of # one detection to the upper right of another
from side of # one detection from the side of another
from behind of # one detection from behind of another
from above of # one detection from above of another
from below of # one detection from below of another
# People Tags
# Identified using the PEOPLE AI.
person
1people
2people
3people
4people
5people
6people
7people
8people
9people
10people
11people
12people
13people
14people
15people
16people
17people
18people
19people
20people
21people # 21-25 introduced for v3
22people
23people
24people
25people
# HalfBody
# These are detecting upper body positions for solidity.
upper body
# Head
# This is used in substitution of the Booru head identifier.
head
# Eyes
# These are relatively untested, but it should work.
eyes
# Imgutils + Hagrid
# these should implement hands from the imgutils ai and hagrid ai
# hagrid handles real, while imgutils handles anime
hand -> a normal identified hand, quality varies.
blurry hand -> a normal blurry hand, taught intentionally for negative prompt
bad hand -> an intentionally deformed hand taught for negative prompt
# Booru Tags
head
collarbone
nude, breasts
sideboob
covered navel
no panties, cleft of venus,
ass, covered ass
ass, bare ass
sitting, spread legs, split
sitting, spread legs, split, nsfw
standing split, standing on one leg, leg up
standing split, standing on one leg, leg up, nsfw
hips
wings
feral
front view, from above, facing viewer, ass up, top-down bottom-up
rear view, from behind, ass up, top-down bottom-up
# Booru P
penis
cum
penetration
fingering
cunnilingus
paizuri
handjob
oral sex
tribadism
# NudeNet Tags
pussy, covered pussy, cameltoe
nude, breasts, ass exposed
nude, breasts, breasts exposed
nude, pussy, pussy
nude, pectorals, pectorals exposed
nude, anus, exposed anus
feet, bare feet
navel, covered navel
feet, covered feet
armpits, covered armpits
bare armpits
navel, exposed navel
penis, exposed penis
anus, covered anus
breasts, covered breasts
ass, covered ass
# Object Tags
# Identified using a modified yolov8 segment anything
# meant to edge find better and identify better
# each bounding box was blur check proliferated and then grown by 20%
bicycle
car
camera
motorcycle
airplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
skateboard
surfboard
baseball bat
baseball glove
tennis racket
bottle
wine
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
couch
potted plant
dining table
bed
toilet
tv
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush
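For the "grown by 20%" step mentioned at the top of the object tags, here is one possible reading as a sketch: widen and heighten the box by 20% around its center, then clamp it to the image. Whether the real pipeline grew by side length or by area, and how it handled edges, isn't stated, so treat every choice here as an assumption. The blur check isn't covered.

```python
def grow_box(box, image_width, image_height, growth_percent=20):
    """Grow an (x0, y0, x1, y1) pixel box around its center and clamp to the image.

    Assumption: "grown by 20%" means width and height each increase by 20%.
    """
    x0, y0, x1, y1 = box
    # Half of the extra size goes to each side of the box.
    pad_x = (x1 - x0) * growth_percent / 200
    pad_y = (y1 - y0) * growth_percent / 200
    return (
        max(0, x0 - pad_x),
        max(0, y0 - pad_y),
        min(image_width, x1 + pad_x),
        min(image_height, y1 + pad_y),
    )


print(grow_box((100, 100, 200, 200), 1000, 1000))
```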
# STYLE TAGS
older
1990s
90s (style) -> some images have this
90s -> some images have this
2000s
00s -> some images have this
00s (style) -> some images have this
2010s
newest
# QUALITY TAGS
lowres
highres
absurdres
# CAMERA TAGS
close-up
portrait
cowboy shot -> doesn't work very well
full body
from side
side view
from behind
rear view
from above
above view
from below
below view
# POSE TAGS
# These should respond well to camera tags.
standing
squatting
kneeling
jumping
bouncing
crawling
climbing
sitting
riding on
lying
on side
on back
on stomach
prone position
all fours position
# SEX TAGS
# if it's a sex pose, it's probably got position after it.
# there are at least 80 from r34, and 40 from gelbooru. Good luck.
all fours position # -> they are all like this