Stable Cascade - First Impressions

Hey, everyone!

Many of you have already heard of the new player in Model Town, 'Stable Cascade', and I've seen some people share their thoughts about it, both positive and negative, so I thought I would do the same.

Keep in mind, I am not the most technical person. My opinion is based on aesthetics and the model's capabilities.

I will not be providing any instructions on the installation process. Please follow the instructions provided on the Stability AI GitHub. Since we are talking about it, though, I will be 100% honest and say it was not the easiest to get installed. But as I said, I am not the most technical of the bunch; I know enough to get by. I will not hold this against Cascade, as I am sure that as time goes on, there will be some big-brain geniuses who will help make it more accessible to the rest of us.

Generation size examples:

The biggest question that I wanted answered right away: How does it look?

The answer? Great! (for a base model) Here are a couple of images from this prompt at 1024x1024:

Right off the bat, it doesn't seem like a whole lot more than what you can get currently with most SDXL models. Hell, even the SDXL base model can probably get you this. But it's when you start looking at how you can push it that it starts to get more interesting.

Let's see what happens when I increase the size to 1280x1280, which is about the largest empty latent SDXL can handle before you start getting some weird results:

There are some minor but notable improvements in the detail in the eyes, and no noticeable distortion (which you would usually get when generating higher than 1024x1024 with SDXL).

But that's all fine and dandy; this is supposed to be the next step, and going up from 1024 to 1280 ain't all that impressive. So let's kick it up a notch, shall we?

1536x1536

With SDXL, a text2image generation at this size would be completely unusable. Most of you know what I mean, but if not, go ahead and try to see what happens; you will most likely get warped anatomy, extra eyes, long necks, etc. But here, that is obviously not the case! I'd say that is a huge upgrade, but that is just my opinion, and we will weigh the pros and cons here in a minute.

Just for fun, I wanted to push it beyond its limit as well, and 2048x2048 is still not there (which is fine; this is already more than a baby step).

Hope you enjoy the nightmares. :)
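For a rough sense of how big those jumps are, here is a minimal sketch that compares the pixel count of each size I tried against the 1024x1024 baseline. The function name is just something I made up for illustration, and it only counts pixels; it says nothing about how the model itself scales.

```python
def pixel_ratio(side: int, base_side: int = 1024) -> float:
    """Return how many times more pixels a square image has than the baseline."""
    return (side * side) / (base_side * base_side)


# The four square sizes tested in this post.
for side in (1024, 1280, 1536, 2048):
    print(f"{side}x{side}: {pixel_ratio(side):.4f}x the pixels of 1024x1024")
```

So 1536x1536 is a bit more than double the pixels of 1024x1024, and 2048x2048 is four times as many.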

Now, let's talk about speed.

For a single 1024x1024 image at CFG 4, I am enjoying 9.5it/s on my RTX 4090 (24GB VRAM), and around 5.25it/s with batches of 2. Very nice.

When working with 1536x1536, I am getting around 5.5-6it/s, with batches of 2 getting around 2.5it/s. For the image size and no upscaling steps, I see this as an absolute win.
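To compare the batched and single-image numbers on equal footing, here is a small sketch that converts it/s into "image-steps per second". It assumes each reported iteration advances every image in the batch by one sampling step, which is my understanding of how the counter works and not something I have verified. The helper name is hypothetical, and I use the low end (5.5) of the 1536x1536 range.

```python
def image_steps_per_second(its_per_second: float, batch_size: int) -> float:
    """Assumption: one iteration steps every image in the batch once."""
    return its_per_second * batch_size


# (label, reported it/s, batch size) taken from the numbers above.
runs = [
    ("1024x1024, batch 1", 9.5, 1),
    ("1024x1024, batch 2", 5.25, 2),
    ("1536x1536, batch 1", 5.5, 1),  # low end of the 5.5-6 range
    ("1536x1536, batch 2", 2.5, 2),
]

for label, its, batch in runs:
    rate = image_steps_per_second(its, batch)
    print(f"{label}: {rate:.2f} image-steps/s")
```

Under that assumption, batching two images at 1024x1024 comes out slightly ahead of running them one at a time (10.5 vs 9.5), while at 1536x1536 it is roughly a wash.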

VRAM usage gets up there a bit, which is why I don't think SD1.5 or XL are going anywhere anytime soon. I've been sitting at a very cramped 15-20GB usage for Cascade. It's not going to be something you jump to if you can barely run SDXL on your GPU.

Thoughts:

Technical bits aside (this is a different architecture from your usual SD1.5 and XL models), I really like the out-of-the-box quality. And that's coming from a guy who has never liked any of the base models released before.

It seems to handle a good amount of styles and is far more capable with anime/cartoon styles than I remember base SDXL being, but it has been a while, and I do not wish to dig that grave.

Hands are a little bit better! That's nice :)

I saw that it boasted better text in images, but I have not seen a crazy improvement myself. Oh well.

The speed feels like a small improvement over SDXL. It's not the thing that impresses me most about Cascade, but we are pointed in the right direction.

This model is far from NSFW compatible, which I know is a problem for a lot of people. I know it was when SDXL first came out, but that problem was remedied in a very short time, as I think it will be for Cascade as well. Be patient!

There is a lot of information I can't really speak to, such as LoRA training, image-to-image, and fine-tuning, but more on that later.

I am not going to switch over to Cascade full-time for the time being, as I'm certainly not going to be one of the first few who crack the fine-tuning and LoRA training process. I also enjoy ComfyUI too much and will continue to do so. But I am very excited for this step and think that once the official release comes out, we are going to see a lot of really cool stuff!

Click here for more examples I made using Stable Cascade!

Thanks for reading! If you would like to continue the conversation, please consider joining the AIpub Discord. I have a load of great friends there who love talking about AI art generation and would love to have ya!

https://discord.gg/DBvh2vPbzR

-J
