ChromeShineXL

Type: Checkpoint Merge
Base Model: Pony
Published: Jun 29, 2024
Updated: Sep 4, 2024
Tags: style, anime
Verified: SafeTensor
Trigger Words: score_9, score_8, score_7_up
Hash (AutoV2): C63265381E
Jemnite

(If you're using this on PixAI, it's pretty possible that the recommended parameters are garbo. I can't control what people put in their imports, sorry.)

A flavor-of-the-month style mix built on the base of MIX-GEM-XL. The primary raison d'être for this model is bright, dynamic lighting combined with good backgrounds and support for unconventional prompts, such as centaurs or android girls.

ChromeLightXL is the style extract for this model. From what I can tell it retains maybe 55% to 70% of the model's style, but it is much smaller and can be combined with various other base models, so you don't have to switch off your favorite model if you don't want to (or if you are committed to using base Pony or AutismMix til the end of time).
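
For instance, assuming the style extract is distributed as a LoRA-format safetensors file, layering it over another Pony-based checkpoint with diffusers might look like the sketch below. The filenames and the 0.8 weight are placeholders, not tuned values:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load whatever Pony-based checkpoint you prefer (path is a placeholder).
pipe = StableDiffusionXLPipeline.from_single_file(
    "ponyDiffusionV6XL.safetensors", torch_dtype=torch.float16
).to("cuda")

# Layer the ChromeLightXL style extract on top as a LoRA.
# Filename and the 0.8 weight are assumptions; adjust to taste.
pipe.load_lora_weights("ChromeLightXL.safetensors")
pipe.fuse_lora(lora_scale=0.8)
```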

Versions

Currently there are two versions: Prototype (proto) and Mass Production (MP). The naming is supposed to be à la Gundam, where the 'prototype model' is more powerful, but also a lot harder to use and a lot more constrained in what it can do. Here's the breakdown:

  • chromeshinexl_proto: This version is more unique and carries more of the strong qualia that define ChromeShineXL. The way it does skin tones and textures is, to put it frankly, superior to MP's in the vast majority of cases, as is the way it frames characters. However, it has lower prompt adherence, worse backgrounds, and is far worse at handling loras with unclean datasets. If the loras you are using are not meticulously cleaned of watermarks, signatures, and patreon/weibo/twitter logos, those artifacts will make it into the generation.

  • chromeshinexl_MP: This version is less unique, but more stable. It is better (though not perfect) at dealing with loras with dirty datasets and has much better prompt adherence, meaning it actually listens to what you prompt (the effect on the tag dynamic_posing is especially noticeable). That is not to say it will necessarily create better images: if your prompts are bad (lots of contradictory tags, typos or invalid tags, a pile of photography/lighting terms you don't actually want), the image may well come out worse. It is also somewhat more accurate in the coherence of both outfits and backgrounds.

If you want my recommendation on which one to use, it generally depends on your use case. Are you going to rely on the inherent knowledge of the model, or are you thinking of combining it with a lot of character/outfit loras? Do you want to deal with a finicky model that often thinks it knows more than you (and occasionally does), or an obedient one that listens to your instructions even when it knows better? As always, the results are the best proof: look at the example images to see which suits your needs better.

Prompting

This is a tag-based model, which means you should rely on tags primarily and on natural language secondarily, if at all. If you are unfamiliar with the sort of tags the model responds to, note that most training data for anime-style models is pulled from either Danbooru or e621. Both websites maintain helpful tag wikis, which should serve you as a reference.

In any case, do not use subjective terms when prompting. This is an observation I make often: tags like best quality, high quality, very aesthetic or score_9, score_8, score_7_up are not concepts the AI naturally understands, but qualifier tags trained into the model (usually based on user score metrics, because individually determining the quality of millions of pieces of artwork is impossible for us puny humans). The golden rule of AI is that it only knows what you feed it. (This also means tags like beautiful woman or perfect face have no effect unless they were tagged during training, which is very unlikely given the source sites and autotaggers.)

Anyway, for negatives, it's up to you. The ideal approach, of course, is to slowly refine each negative over multiple successive prompt modifications on the same seed, but if you don't have an eternity, some helpful negative tags are low quality, extra digits, artistic error, watermark, artist name, signature. e621_p_low serves as a built-in general-purpose negative quality tag that uses a smaller token count than score_6, score_5, score_4. If you don't trust it, you can always opt for the full quality tag chain instead, but IMO it is a better substitute. The preview images serve as a helpful example, but you are, of course, free to modify your negatives as you see fit.
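
To make that concrete, a starting positive/negative pair following the advice above might look like this (purely illustrative, not a preset shipped with the model):

```python
# Illustrative starting point, not an official preset.
prompt = (
    "score_9, score_8, score_7_up, "      # trigger words / quality chain
    "1girl, android, mechanical parts, "  # Danbooru-style subject tags
    "dynamic lighting, night, cityscape"  # lighting and background tags
)
negative_prompt = (
    "e621_p_low, "                        # compact built-in quality negative
    "low quality, extra digits, artistic error, "
    "watermark, artist name, signature"
)
```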

Sampling and Other Parameters

As is the case with all diffusion models, negatives will have more of an effect the higher the Classifier-Free Guidance Scale (CFG) is. While prompts are what the text encoder conditions the latents with, the CFG modulates the strength. It would take a lot of words to explain how the prompts actually guide the latents, but a quick summary is that unconditional_conditioning (negatives) inhibit certain vectors from being applied to latent space and the higher the CFG, the stronger the inhibition is (and the stronger the conditioning (positives) also are). Of course, excessively high CFG has a tendency to burn the image by inducing too strong of an effect on the de-noising process. My recommendation is to either use Perturbed Attention Guidance (PAG) to enhance the guidance scale without increasing CFG or use Dynamic Thresholding CFG to clamp CFG at early step stages.
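
If you run the model through diffusers rather than a UI, PAG can be enabled when the pipeline is built. A minimal sketch, assuming a local safetensors copy of the checkpoint (the filename is a placeholder):

```python
import torch
from diffusers import AutoPipelineForText2Image, StableDiffusionXLPipeline

# Load the checkpoint from a local safetensors file (path is a placeholder).
base = StableDiffusionXLPipeline.from_single_file(
    "chromeshinexl_MP.safetensors", torch_dtype=torch.float16
)

# Rebuild the pipeline with Perturbed Attention Guidance enabled.
# PAG strength is then passed as pag_scale at generation time, which
# lets you keep guidance_scale (CFG) moderate instead of raising it.
pipe = AutoPipelineForText2Image.from_pipe(base, enable_pag=True).to("cuda")
```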

My recommended sampler is Euler a with whatever scheduler you like the most. I've found SGM Uniform to work the best (and fastest) for me, but others report liking the AYS scheduler. My personal experience with AYS is that it's generally more accurate to the prompt, but it also magnifies some of the less desirable qualia the model learned (due to insufficient data cleaning, mostly) and will occasionally inject things like text or watermarks. If you're willing to try more esoteric samplers, I've found Euler dy Negative to be especially clean. Subjectively it is 'less ambitious' than Euler a, but it is very good at making simple, clear-cut, clean generations.
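
For reference, Euler a in diffusers is EulerAncestralDiscreteScheduler. As far as I can tell, trailing timestep spacing is the closest built-in analogue to SGM Uniform there, so treat this as an approximation (continuing from the pipeline sketch above):

```python
from diffusers import EulerAncestralDiscreteScheduler

# Swap in Euler a; timestep_spacing="trailing" approximates SGM Uniform
# (an assumption -- verify against your UI's scheduler if exactness matters).
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
```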

I recommend a step count of 25-35. My default is 28. Frankly speaking, you should not venture far beyond that range. Increasing the step count on non-converging samplers (stochastic and ancestral samplers are the two types that immediately come to mind) will dramatically change your image, and the returns on converging samplers are exceedingly minimal once the step count passes 35. You're just wasting compute on inference for no reason. A better move is to adjust your other parameters (the prompt, probably) instead of assuming more steps will fix whatever flaw you're encountering.

The model performs best at 832x1216 or 768x1344.
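
Putting the recommended parameters together, a full generation call might look like the sketch below (continuing from the pipeline and scheduler sketches above; the prompt, CFG, and PAG values are illustrative starting points, not tuned settings):

```python
# Illustrative values throughout; only steps and resolution follow
# the recommendations above exactly.
image = pipe(
    prompt="score_9, score_8, score_7_up, 1girl, centaur, dynamic lighting",
    negative_prompt="e621_p_low, watermark, artist name, signature",
    num_inference_steps=28,    # author's default; 25-35 is the sane range
    guidance_scale=5.0,        # moderate CFG to avoid burning the image
    pag_scale=3.0,             # PAG adds guidance without raising CFG
    width=832,
    height=1216,               # one of the recommended resolutions
).images[0]
image.save("chromeshinexl_sample.png")
```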