Stats | 3,147 |
Reviews | (254) |
Published | Sep 25, 2024 |
Usage Tips | Clip Skip: 1 |
Hash | AutoV2 0D4DA4AE59 |
Please Read Description
NatViS (Natural Vision) is a photorealistic full-parameter fine-tune of SDXL that uses Natural Language prompting to generate high-quality SFW/NSFW images. It was trained on 1M+ image-caption pairs from a dataset that has been expanded and refined for over a year.
Note: NatViS is still being trained. V1 (epoch 68) wrapped up training on July 19th, 2024.
Update: See Here for more info on the next version (v2.7)!
Buy me a coffee ❤
https://ko-fi.com/ndimensional
I’ve never been a fan of e-begging; however, SDXL fine-tunes at this scale are becoming expensive to train. So I will begrudgingly ask: if you like what I do and would like to support my models, consider donating on Ko-Fi 💗
I will begin posting updates, answering questions, taking feedback, and releasing early access (NOT EXCLUSIVE) models to supporters.
All donations will be used to fund the creation of new Stable Diffusion fine-tunes and open-source AI tools.
Changelog
10-26-24 NatViS v2.5 Lightning 4step (Not Recommended!):
Uploaded 4step Lightning version of NatViS 2.5
ONLY USE IF NEEDED
============
10-25-24 NatViS v2.5 Lightning 8step
Released 8step Lightning version of NatViS v2.5. Read About this version
Note: Unlike my previous 8step lightning releases, this version is a simple merge with the SDXL Lightning LoRA. I did this due to requests for low CFG.
Sample images may not be the best representation of the model as a result of me not fully understanding the quirks of Lightning.
I will be releasing the FULL CFG 8step lightning version as well, since it appears to preserve more of the fine-grained features from the fine-tune.
============
10-23-24 NatViS v2.5
What's New?
Uploaded NatViS v2.5
Updates to text-encoder(s) to reintroduce tag/booru-style prompting capabilities that were broken in v2.0
Subset of data included from new (improved) dataset, specifically image-caption pairs with short n' punchy captions.
Info on new dataset (for future models/update): Includes more variation of caption styles and all automation is manually verified by a human (i.e., me).
Introduced more analog photography and classic cinematic film image data to further the push for more authentic realism.
What's Next?
General:
Review SD3.5 license to see if it's worth touching. It's not terrible. Will start research into the model's architecture for fine-tuning/LoRA.
General: Release Anti-Pony Alpha model (Anime, Digital Illustrations).
In advance, it's not nearly as robust as Pony. This is a test to see if there's enough interest in the idea to pursue crowd funding for training.
Trained with character knowledge and quality in mind, novel booru+ tagging system & natural language prompting, multiple styles/mediums, artist knowledge, no silly quality ranking tags, SDXL compatible (i.e., not overfit and broken).
More info will come out soon.
NatViS: Release of Lightning variants for NatViS v2.5.
Done more effectively this time.
NatViS: Finally getting around to creating and releasing a PDF guide.
NatViS: Continue fine-tuning of v3.0.
============
10-2-24 NatViS v2.0 Lightning 4step
Uploaded 4step lightning model for v2.0
============
10-1-24 NatViS v2.0 Lightning 8step
Uploaded 8step lightning models for v2.0
============
9-25-24 NatViS v2.0
What's New?
Prompting: This update focuses primarily on the text-encoders. Natural language prompting capabilities have been improved to follow less-strict formats and rely less on specific tokens.
Ethnicity and Demonym: Increased accuracy of phenotypes for various ethnicities and demonyms. This is not limited to body structure; it also includes clothing, hair, landscapes, etc. See here for small examples.
Camera EXIF: Inclusion of Camera EXIF data for popular modern and analog cameras that can be prompted. Includes camera name, focal length, f-stop, ISO, shutter speed, and lens type. Also includes attachments such as ND filters and polarizers.
Analog: Improvements to analog and vintage photograph generations.
Lighting and shadow: Prompt how light (or lack thereof) interacts with objects/subjects in the scene, amongst other general lighting-related modifiers. More info soon.
Skin Textures: Small improvements to the detail of skin textures with fewer or no explicit tokens related to skin detail.
Implementation of Pseudo Instruction: This will require a more lengthy write-up.
Better male anatomy.
Lesbians.
What's Next?
Lightning models will be released within the coming days.
Full PDF guide and documentation within the next week.
Info on v3.0 within the next month.
8/4/24 NatViS v1.0 Lightning 4step
Uploaded 4step lightning version of v1.0 (See About this version for more info).
============
8/3/24 NatViS v1.0 Lightning 8step
Uploaded 8step lightning version of v1.0 (See About this version for more info)
============
8/2/24 NatViS v1.0
Initial Release
Usage Tips
Note: These are simply recommendations, feel free to experiment.
Prompting
NatViS leverages SDXL’s bigG text-encoder to allow for Natural Language prompting.
What is Natural Language Prompting?
Since the release of Stable Diffusion v1.4, people have become accustomed to comma-delimited lists of visually descriptive tags/phrases. This was a necessity for early Stable Diffusion models due to the architecture and choice of text-encoder. With SDXL’s dual text-encoder/tokenizer architecture, we are able to write more naturally descriptive prompts.
Simply describe the image you want to generate, just as you would describe the image to a person.
For example:
Comma delimited list: a woman, standing, outdoors, sun beams, dappled light, apple tree, wearing denim jeans, flannel shirt, brown hair, long hair, looking at viewer, highest quality, atmospheric, 35mm, masterpiece
Natural Language: A masterpiece, 35mm-style photo of a woman with long brown hair, standing outdoors in dappled sunlight beneath an apple tree. She wears denim jeans and a flannel shirt, gazing directly at the viewer with an atmospheric quality.
Note: This is just an example to highlight how to write a natural language prompt. For better examples, see the sample images.
Will NatViS Understand Everything I tell it?
Absolutely not.
Due to various limitations in both the architecture and the amount of data I’m able to fine-tune on as one person, there will be instances where the model simply will not generate what you want. Often, you will need to experiment with different wording, change the placement of tokens (i.e., move a sentence or individual token closer to the start or end of the prompt), remove potentially conflicting tokens, etc. There really is no definitive solution I can offer, as it varies from prompt to prompt. Unfortunately, there will be times when no solution or workaround is successful.
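One way to try the "placement" advice systematically is to generate reordered versions of a prompt and test each one with the same seed. The following is only a rough sketch: the helper name reorder_variants is made up for illustration, and it assumes sentences end with ".", "!" or "?" followed by whitespace.

```python
import re


def reorder_variants(prompt: str) -> list[str]:
    """Return every rotation of the prompt's sentences (hypothetical helper).

    Each variant moves a different sentence to the start of the prompt,
    so the same content can be tested at different positions.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", prompt.strip()) if s]
    return [
        " ".join(sentences[i:] + sentences[:i])
        for i in range(len(sentences))
    ]


if __name__ == "__main__":
    for variant in reorder_variants(
        "A photo of a woman beneath an apple tree. She wears a flannel shirt."
    ):
        print(variant)
```

This only changes ordering. Rewording and removing conflicting tokens still have to be done by hand.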
Can I still use Tags?
Short answer: Yes
SDXL’s dual text-encoder/tokenizer architecture can process tokens/sequences with both encoders in parallel, meaning you don’t have to use natural language prompting.
Note: Since the training data was purely captioned with Natural Language descriptions, not all the common descriptive tags people are familiar with will be understood by the model, especially Booru or Booru-style tags.
I found a hybrid system works well, as seen in many of the sample images.
For example:
Say you tried your natural language prompt, but want to make the results a bit more cinematic. Instead of modifying the entire prompt, you can simply append cinematic lighting, harmonious, film still, etc. to the end of your prompt.
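The hybrid approach described here can be sketched as a tiny helper that leaves the natural language prompt untouched and tacks a comma-delimited tag list onto the end. The function name append_tags and the single-space separator are assumptions for illustration, not something the model requires.

```python
def append_tags(prompt: str, tags: list[str]) -> str:
    """Append comma-delimited tags to a natural language prompt (hypothetical helper)."""
    base = prompt.strip()
    # Drop empty entries and stray whitespace before joining.
    extra = ", ".join(t.strip() for t in tags if t.strip())
    if not extra:
        return base
    return f"{base} {extra}"


if __name__ == "__main__":
    print(
        append_tags(
            "A photo of a woman standing beneath an apple tree.",
            ["cinematic lighting", "harmonious", "film still"],
        )
    )
```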
Quality Tags/Classifiers? (score_up_x)
Blasphemy.
You can use quality rank/classifiers if you want, but they were not part of the training data.
Negative Prompt
As with other SDXL models, use tags separated with commas and keep it short. Add/remove tokens from the negative prompt as needed.
Generation Parameters
CFG:
Recommended: 5-7
7+ to enforce a specific style/medium
Sampler/Sampling Steps:
This can be quite subjective, so I will just share what I typically use instead of giving direct recommendations.
Sampler - DPM++ 2M SDE
Scheduler - Karras
Steps - 55
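For anyone working outside of Webui, the settings above can be approximated with the diffusers library. This is a sketch under assumptions, not the setup used for the sample images: the checkpoint path is a placeholder, the default CFG of 6.0 is just the midpoint of the recommended range, and "DPM++ 2M SDE" with Karras is mapped to DPMSolverMultistepScheduler with algorithm_type="sde-dpmsolver++" and use_karras_sigmas=True. Seeds will not reproduce Webui results.

```python
def sampler_settings(cfg: float = 6.0, steps: int = 55) -> dict:
    """Collect CFG and step count as keyword arguments for a diffusers pipeline call."""
    return {"guidance_scale": cfg, "num_inference_steps": steps}


def generate(checkpoint_path: str, prompt: str, negative_prompt: str = "",
             seed: int = 0, **overrides):
    """Generate one image from a local SDXL checkpoint (requires torch, diffusers, a CUDA GPU)."""
    # Heavy imports stay inside the function so sampler_settings() works without them.
    import torch
    from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_single_file(
        checkpoint_path, torch_dtype=torch.float16
    )
    # diffusers counterpart of "DPM++ 2M SDE" with the Karras sigma schedule.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config,
        algorithm_type="sde-dpmsolver++",
        use_karras_sigmas=True,
    )
    pipe.to("cuda")
    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        generator=generator,
        **sampler_settings(**overrides),
    )
    return result.images[0]


if __name__ == "__main__":
    # Placeholder path; point this at your own downloaded checkpoint file.
    image = generate("natvis.safetensors", "A 35mm-style photo of a woman beneath an apple tree.")
    image.save("sample.png")
```

Passing cfg=7.5 or similar as an override corresponds to the "7+ to enforce a specific style/medium" suggestion.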
ADetailer: (Extension)
Link
Again, subjective so I’ll just share my settings.
Model - mediapipe_face_full (use mediapipe for photorealism)
Confidence - 0.45
Everything else is default.
CFG Rescale: (Extension)
Link
I forgot that I had this installed, so I’m not quite sure whether it was enforcing zero terminal SNR on the noise schedule. Since the parameter was null, it shouldn’t have been.
Phi - 0
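To show why Phi at 0 should leave the guidance itself untouched, here is a small sketch of the rescale step. It assumes the extension follows the commonly published CFG rescale formula (blend the standard CFG output with a version rescaled to the standard deviation of the conditional prediction); I have not verified the extension's actual code, and this says nothing about any zero terminal SNR option. Plain Python lists stand in for the noise-prediction tensors.

```python
from statistics import pstdev


def cfg_rescale(cond: list[float], uncond: list[float],
                guidance_scale: float, phi: float) -> list[float]:
    """Classifier-free guidance followed by an optional rescale blend (illustrative only)."""
    # Standard CFG: push the unconditional prediction toward the conditional one.
    cfg = [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]
    std_cfg = pstdev(cfg)
    if std_cfg == 0:
        return cfg
    # Rescale so the result has the same standard deviation as the conditional prediction.
    factor = pstdev(cond) / std_cfg
    # phi = 0 keeps plain CFG; phi = 1 uses the fully rescaled result.
    return [phi * (x * factor) + (1 - phi) * x for x in cfg]


if __name__ == "__main__":
    cond = [0.5, -1.0, 2.0, 0.25]
    uncond = [0.1, -0.2, 0.4, 0.0]
    plain = [u + 6.0 * (c - u) for c, u in zip(cond, uncond)]
    print(cfg_rescale(cond, uncond, 6.0, 0.0) == plain)
```

Under this formula, a Phi of 0 reduces to ordinary CFG, which is consistent with the setting having no effect on guidance.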
Important
If you struggle to replicate the sample images, even with the exact seed and parameters, it’s likely because of the noise scheduler. I had enabled the fix for this in Webui, but have since reinstalled Webui and forgot to re-enable it. This only applies to V1 of NatViS.
Training Info
TO-DO
This will take a while to write up. So in the meantime:
TLDR; 1M+ images, processed/cleaned via personal Dataset Toolkit I’m developing, captioned via Multimodal Large Language Model (MLLM) with unified feature space (part of Dataset Toolkit, not GPT). Training Data, Configs, Custom Scripts will be made available and open-sourced when the final version is released. Dataset Toolkit has no announced release date.
Check out my other models
SDXL Checkpoints: https://civitai.com/collections/966964
SDXL LoRAs: https://civitai.com/collections/966969
40K Series: https://civitai.com/collections/956187
SD1.5 Checkpoints: https://civitai.com/collections/966974
SD1.5 LoRAs: https://civitai.com/collections/966972