SciStyle

Verified: SafeTensor
Type: Checkpoint Trained
Published: Jan 4, 2024
Base Model: SD 1.5
Usage Tips: Clip Skip: 1
Hash (AutoV2): 002ECBC866

v1 of SciStyle is a test model for a new image captioning pipeline I've been working on. The model was trained on a subset of 1k images of various styles/mediums. Surprised by the results for a model trained on only 1k images, I decided to release it here. The full model is currently being worked on.

For more info on the image captioning pipeline, refer to my Discord thread linked below.


Questions/Feedback/Updates?

Visit my thread on the Unstable Diffusion Discord


Info

S&D

Base Model: Stable Diffusion v1.5

Type: Experimental Fine-tune

Clip Skip: 1

Medium: Multi-medium

Caption Style: Natural Language + Booru Style

Dataset Size: Subset, 4k images out of 25k images + DnD dataset

Training Resolution: 768x768

Difference from v1: More fantasy-focused, with additional training on a DnD dataset.


V1

Base Model: Stable Diffusion v1.5

Type: Experimental Fine-tune

Clip Skip: 1

Medium: Multi-medium

Caption Style: Natural Language + Booru Style

Dataset Size: Subset, 1k images out of 25k images

Training Resolution: 768x768


V2

Base Model: Stable Diffusion v1.5

Type: Experimental Fine-tune

Clip Skip: 1

Medium: Multi-medium

Caption Style: Natural Language + Booru Style

Dataset Size: Subset, 6.5k images out of 25k images

Training Resolution: 768x768

Difference from v1: More species from various Sci-fi and fantasy universes.


Features

  1. Multi-medium: Can generate images in multiple art mediums; simply include the medium in the prompt.

  2. Natural Language & Booru: Accepts both natural language prompts and booru-style prompts.

  3. Extra Detail: Understands subtle details that SD models often skip, such as the number of objects/subjects in a scene, background information, color information for various parts of the image, and atmosphere. (See my Discord thread above for more info on how this is achieved.)

  4. Flexible: Can easily be merged with other SD1.5 checkpoints and LoRAs.


Usage

Special Tokens:

  • SciStyle: can be used as a class token at the beginning of the prompt, but is not necessary.

  • Tags for various art mediums: either lead with the medium, e.g., "a comic book illustration of" or "90s anime screencap of", or simply add the medium towards the end of the prompt, e.g., "comic book illustration" or "photorealistic". These are just examples of tag placement. Feel free to experiment with other mediums.
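
The two tag placements described above can be sketched as a small prompt builder. This is only an illustration: the function name `build_prompt` and its parameters are hypothetical and not part of the model or any UI, and the comma-separated layout is an assumption based on common SD prompt style.

```python
def build_prompt(subject, medium=None, medium_as_prefix=True, use_class_token=False):
    """Assemble a prompt string (hypothetical helper, not part of the model).

    medium_as_prefix=True  -> "<medium> of <subject>"
    medium_as_prefix=False -> "<subject>, <medium>"
    """
    parts = []
    if use_class_token:
        # The SciStyle class token is optional and goes at the start.
        parts.append("SciStyle")
    if medium and medium_as_prefix:
        # Prefix form, e.g. "a comic book illustration of <subject>".
        parts.append(f"{medium} of {subject}")
    else:
        parts.append(subject)
        if medium:
            # Suffix form: the medium is added towards the end of the prompt.
            parts.append(medium)
    return ", ".join(parts)


prefix_prompt = build_prompt("a robot in a neon city", medium="a comic book illustration")
suffix_prompt = build_prompt(
    "a robot in a neon city",
    medium="photorealistic",
    medium_as_prefix=False,
    use_class_token=True,
)
print(prefix_prompt)
print(suffix_prompt)
```

Either placement should work; the helper just keeps the choice explicit when generating many prompts.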


Recommended Settings

Sampler/Solver:

  • Euler a

    • Steps: 20 - 32

    • CFG: 6 - 7.5

  • DPM++ SDE Karras

    • Steps: 30 - 40

    • CFG: 6 - 8.5

  • DPM++ 2M SDE Karras

    • Steps: 50+

    • CFG: 7 - 8

These are just recommendations.
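
For scripted generation, the ranges above can be kept in a small lookup table. The sketch below is plain Python with no UI or library dependency; the names `RECOMMENDED` and `in_recommended_range` are hypothetical, and "50+" is represented as an open-ended upper bound (`None`).

```python
# Recommended ranges copied from the list above.
# steps: (min, max) with None meaning "no upper bound"; cfg: (min, max).
RECOMMENDED = {
    "Euler a": {"steps": (20, 32), "cfg": (6.0, 7.5)},
    "DPM++ SDE Karras": {"steps": (30, 40), "cfg": (6.0, 8.5)},
    "DPM++ 2M SDE Karras": {"steps": (50, None), "cfg": (7.0, 8.0)},
}


def in_recommended_range(sampler, steps, cfg):
    """Return True if steps and cfg fall inside the recommended range for the sampler."""
    preset = RECOMMENDED[sampler]
    steps_lo, steps_hi = preset["steps"]
    cfg_lo, cfg_hi = preset["cfg"]
    steps_ok = steps >= steps_lo and (steps_hi is None or steps <= steps_hi)
    cfg_ok = cfg_lo <= cfg <= cfg_hi
    return steps_ok and cfg_ok


print(in_recommended_range("Euler a", 25, 7.0))
print(in_recommended_range("DPM++ SDE Karras", 20, 7.0))
```

Values outside these ranges are not errors; the check is only a convenience for staying near the settings used for the sample images.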

Hires Fix

Settings for all ESRGAN models:

  • Upscale by

    • 1.5 if resolution is > 512x768

    • Don't exceed 2.0 (unless you have a beefy rig)

  • Denoise Strength

    • 0.25 - 0.35

  • Hires Steps

    • If sampling steps > 60,

      • hires steps = half of sampling steps

    • Otherwise, leave at 0
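
The Hires Fix rules above reduce to a little arithmetic, sketched here in plain Python. Several points are assumptions rather than part of the original guidance: "> 512x768" is read as a comparison of total pixel count, resolutions at or below that size are allowed any factor up to the 2.0 cap, and odd step counts are halved with integer division. The function name `hires_settings` is hypothetical.

```python
def hires_settings(width, height, sampling_steps, requested_upscale=2.0):
    """Suggest Hires Fix values from the rules above (hypothetical helper)."""
    base_pixels = 512 * 768  # threshold from the guidance, compared by pixel count (assumption)
    max_upscale = 2.0        # "Don't exceed 2.0"

    if width * height > base_pixels:
        upscale = 1.5
    else:
        # Assumption: smaller images may use any factor up to the cap.
        upscale = min(requested_upscale, max_upscale)

    # Hires steps: half of the sampling steps when above 60, otherwise 0.
    hires_steps = sampling_steps // 2 if sampling_steps > 60 else 0

    return {
        "upscale_by": upscale,
        "denoise_range": (0.25, 0.35),
        "hires_steps": hires_steps,
    }


print(hires_settings(768, 768, 30))
print(hires_settings(512, 768, 80))
```

With 30 sampling steps the hires steps stay at 0; only runs above 60 steps switch to the half-steps rule.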

Extensions

ADetailer
Download here

Neutral Prompt

Download here

Read each repo's description for usage guides.

Negative Embeddings

Use these only if you want to recreate one of the sample images. Personally, I would avoid negative embeddings and instead use a simple negative prompt, then add or remove tokens for each new idea. I only use them to speed up inference during sample generation. That said, other negative embeddings such as EasyNegative are also fine to use with this model.


Check out my other models

SDXL

SD1.5

LoRA