Walk, Run, FLY: Comparing Early LCM and Turbo Checkpoint Merges (2023/11/30)

Edit 2: Cross-linking outputs with a shorter article here, and I will not be making future edits to this article.

Boring Stuff

This article results from loading a variety of checkpoints that claim to merge in LCM and/or Turbo models. The methodology was to produce four image grids from each using the same prompts, resolutions, seeds, etc. The grids are uploaded here to CivitAI to help you optimize quality across CFG and steps for this new style of checkpoint. Since this is a tedious process, I decided to post it to save some of you the trouble of doing it yourself. I doubt I can add every new model "as they come out," as the soon-to-come wave would surely drown me. For now, I will try to update it as often as I can with new models as they pop up so you can get apples-to-apples comparisons on these new developments. One model already has a new version that I have not tested, so that will likely be my first edit ...
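For anyone who wants to reproduce this kind of grid, here is a minimal sketch of how the cells could be enumerated. The axis values and the `build_grid` helper are assumptions for illustration, not the actual ComfyUI workflow behind these grids; only the seed and resolution are taken from Grid #4 in the Gen Info section, and the prompt is shortened.

```python
from itertools import product

# Hypothetical axis values: the grids in this article cover low step counts
# and CFG in 0.5 increments, but these exact ranges are an assumption.
STEPS = [4, 5, 6, 7, 8, 9, 10]
CFGS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]


def build_grid(prompt, seed, width, height, steps=STEPS, cfgs=CFGS):
    """Return one settings dict per grid cell.

    Prompt, seed, and resolution are held fixed so that only steps and CFG
    vary between cells (one row per CFG value, one column per step count).
    """
    return [
        {"prompt": prompt, "seed": seed, "width": width, "height": height,
         "steps": s, "cfg": c}
        for c, s in product(cfgs, steps)
    ]


cells = build_grid("abstract pencil sketch of a bear", 236602316077921, 576, 576)
print(len(cells))  # 7 step counts x 7 CFG values = 49 cells
```

Each dict would then be handed to whatever sampler front end you use; that part is deliberately left out because it depends on your setup.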

Quality is a subjective metric. Therefore, you will probably disagree with my own sense of aesthetics, and that's okay. I am not here to judge the strength of each model for every plausible task: only whether they quickly produce reasonably useful images.

A red box highlights the first minimum-viable image. In my opinion, these are images I could "work with" in either inpainting, upscalers, img2img, Gimp, Photoshop, whatever. It doesn't represent perfection - just the "minimum quality" to be useful to me personally. You may look at the same grid and have higher or lower standards for this metric. For this review, it represents the lowest number of steps with the lowest CFG that passes my own standard. The idea is that if you were to set this model to cranking out thousands of images, what are the fastest settings you could use and still have a directory of stuff worth sifting through?
A cyan box highlights what I feel is an aesthetic "sweet spot" for this model & prompt combination. Again, highly subjective, but averaged across multiple models and multiple prompts, it becomes a more useful idea of where the model does well.
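The red-box rule above (fewest steps first, then lowest CFG) can be sketched as a simple selection over hand-labeled cells. The `first_viable` helper and the ratings below are hypothetical; the actual labeling was done by eye.

```python
def first_viable(ratings):
    """Pick the 'speed setting' from a hand-rated grid.

    ratings maps (steps, cfg) -> bool, where True means "I could work with
    this image". Returns the passing cell with the fewest steps, breaking
    ties by the lowest CFG, or None if nothing passes.
    """
    passing = [cell for cell, ok in ratings.items() if ok]
    # Tuples compare element by element, so min() orders by steps, then CFG.
    return min(passing) if passing else None


# Hypothetical ratings for one small grid (not real results from this article).
ratings = {
    (4, 1.0): False, (4, 1.5): False,
    (5, 1.0): False, (5, 1.5): True,
    (6, 1.0): True,  (6, 1.5): True,
}
print(first_viable(ratings))  # (5, 1.5)
```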

A year ago, I was doing much the same thing: exploring samplers, number of steps, and clip skip. Those experiments took me WEEKS. Then they were replaced with checkpoint comparisons. And then those were replaced by Lora weighting charts. And so on, and so on. However, this article only took me one day because some of the images were generated in fractions of a second! Even the most complicated gens in this roundup only took about 10 seconds. The entire bear grid of 49 images typically took 2 minutes and small change, and was even faster for SDXL Turbo. A few of the results look as good as anything you could do with other models and a lot more time. Which is truly mind-blowing -- what a year it has been!!!

Test Rig*

RTX 3090, 24 GB VRAM
Ryzen 3950X, 128 GB RAM
Linux Mint 21.1 Vera

*I have a custom power management script which cuts the usable power to the GPU down from 370W until GPU temp stabilizes @68C. Above that, the rig is unstable until I upgrade heat management (water cooling, greater forced airflow, etc.). When there is a sustained load (quite often), it is usually running around 220W. Your performance-to-heat-management situation may produce somewhat different results even with similar hardware.
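A rough sketch of what such a power-capping loop might look like follows. This is not my actual script: the 10 W step size, the 150 W floor, and the polling interval are assumptions, and only the 370 W ceiling and the 68C target come from the description above. It relies on `nvidia-smi` being on the PATH, and setting the limit with `-pl` normally requires root.

```python
import subprocess
import time

TARGET_C = 68   # temperature target from the article
MAX_W = 370     # starting power limit from the article
MIN_W = 150     # assumed floor; pick one your card actually supports
STEP_W = 10     # assumed step size per adjustment


def next_power_limit(limit_w, temp_c, target_c=TARGET_C,
                     step_w=STEP_W, min_w=MIN_W):
    """Lower the limit by one step while the GPU is above target, else hold."""
    if temp_c > target_c:
        return max(min_w, limit_w - step_w)
    return limit_w


def read_gpu_temp(index=0):
    """Read the GPU core temperature in Celsius via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(index), "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def set_power_limit(watts, index=0):
    """Apply a new power limit in watts (usually needs root)."""
    subprocess.run(["nvidia-smi", "-i", str(index), "-pl", str(watts)],
                   check=True)


def run(poll_seconds=30):
    """Start at MAX_W and step down until the temperature stops exceeding target."""
    limit = MAX_W
    set_power_limit(limit)
    while True:
        new_limit = next_power_limit(limit, read_gpu_temp())
        if new_limit != limit:
            set_power_limit(new_limit)
            limit = new_limit
        time.sleep(poll_seconds)
```

Calling `run()` starts the loop. This version only ever lowers the limit, which is the simplest reading of the behavior described; a real script would probably also raise it again once the card cools off.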

The Results

Checkpoint #1: SDXL Turbo

Comparison Grids: CivitAI Post

Speed settings: 1 step @ 1.0 CFG

Sweet settings: 1 to 6 steps @ 1.0 CFG

This checkpoint is truly mind-blowing and one-of-a-kind. Spoiler alert: It is the speed king (so far) by a LOT. I actually moved the starting point for some other checkpoints to 4 steps because they were entirely useless below that. (Those charts stop at 10 because Comfy's Turbo Scheduler only goes to 10.) But not Original Recipe Turbo. In fact, the quality here is trailing off just about the time the other ones hit their stride. And they really mean that 1.0 CFG business. Maybe you can get away with 1.1 or 0.9, but I wouldn't bother. It is clearly optimized near 1.0 CFG either intentionally or as a byproduct of the unique training method. The sweet spot for most prompts is REALLY small. There's no need to gen anything past 6 steps, and a decent card can rap out a 6-image batch in a second or two. Everyone who uses it will instantly realize how much of a game-changer it is. Yes, the lower resolution is a hassle. Yes, there are quality issues with some subjects. Yes, all your hands will be janky, again, so you probably need to inpaint/upscale with your favorite hand-fixing tricks. But for just "brainstorming," this thing is on a whole other level.

Checkpoint #2: Blue Pencil XL LCM

Comparison Grids: CivitAI Post

Speed settings: 5 steps @ 1.5 CFG

Sweet settings: 6-7 steps @ 1.0-1.5 CFG

I'm a big fan of Blue Pencil XL, so when I saw this was merged with LCM, it was my gateway drug to the rest of this frenzy. I wanted to know how things could differ from regular XL and the new Turbo model. This checkpoint has very explicit instructions for use (4 steps @ 1.0 CFG). Were they right? I hate to argue with the creator's recommendation as I assume they have run more gens with their own model than I have, but I do beg to differ the tiniest bit here. While it is true that you can get useful stuff in 4 steps, sometimes, maybe ... I would not recommend that as your everyday settings. The quality improvements for a few more steps are worth it every time, here. If your goal is 1,000 anime waifus per hour, then this should be your tool of choice: Batch 6-7 steps with 1.0-1.5 CFG and it will give you something nice almost every time. Next, I'll be eager to put this to the test of upscaling anime inputs. It will probably take less time to make them 4k than it took to produce some of my low-rez images from last year.

Checkpoint #3: Hephaistos NextGENXL (LCM)

Comparison Grids: CivitAI Post

Speed settings: 7 steps @ 2.0 CFG (but - don't do this)

Sweet settings: 8-9 steps @ 2.0-2.5 CFG (batch)

This model had the most conservative recommendations (12 steps @ 3.0 CFG), but still shows how LCM can reduce the necessary step count for existing models. It remains to be seen whether merging LCM is a better approach than just using the Lora, and I do not have enough data to make that call, yet. That said, this was by far the "slowest" model and it was tricky to get something nice out of it. Colossus and Hephaistos XL are considerably easier models to use (from the same creator), and perhaps there is some way to compare using them with the Lora vs this model. I do not want to discourage anyone from trying it: It can produce some interesting results (I really liked some higher-CFG angels). However, in this roundup, where the primary criterion is speed, I feel safe suggesting that this version of this model ... is not in the same league as the others.

Checkpoint #4: LEOSAM HelloWorld Turbo + LCM

Comparison Grids: CivitAI Post

Speed settings: 5 steps @ 1.0 CFG

Sweet settings: 7 steps @ 2.0 CFG

Curiously, the "sweet spot" for this model seems to be along a diagonal in the charts. The fastest you might get something useful is typically 5 steps @ 1.0 CFG. However, the next best would be 6 @ 1.5 and then 7 @ 2.0 and so on. This pattern is reasonably consistent with this model. You can do more CFG if you have time for more steps. Most of the other models don't see "burn" in the lower CFG if you keep adding steps (I assume they would at some point, but my tests were limited to 10 steps max). Much like original SDXL Turbo, that does happen here. Other models in this roundup have a sweet spot that looks like a wedge in some charts, but this one is narrower. This is not a great "turbo" model where you get real-time feedback as-you-type, but the improvement in image quality along diagonals makes me wonder if it would be a natural for multiple passes at upscaling. The downside to that theory is that it did have some resolution issues just over the border of 1024. That tends to be a tell for a model that hallucinates more when you upscale (if it starts at a very minor upscale like 1024 --> 1280, then you'll definitely need to use controlnets and other tricks to keep the image cohesive at 4k+). Either way, the sharpness and style of this model are compelling enough for me to keep trying new things with it. I mean, decent hands at 6-8 steps? That's miraculous. Even one of the two-headed angels honestly looks pretty good. I've legit been on uglier dates ...
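The diagonal described above (5 steps @ 1.0, 6 @ 1.5, 7 @ 2.0) works out to roughly +0.5 CFG per extra step. Here is a small sketch of that rule of thumb; the `diagonal_cfg` helper is hypothetical, and anything past 7 steps is an extrapolation of the "and so on" rather than a measured result.

```python
def diagonal_cfg(steps, base_steps=5, base_cfg=1.0, cfg_per_step=0.5):
    """Suggested CFG for a step count along the observed sweet-spot diagonal."""
    if steps < base_steps:
        raise ValueError(f"nothing useful was observed below {base_steps} steps")
    return base_cfg + cfg_per_step * (steps - base_steps)


# Print the (steps, CFG) pairs along the diagonal up to the 10-step chart limit.
for steps in range(5, 11):
    print(steps, diagonal_cfg(steps))
```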

Checkpoint #5: PixelWave Turbo

Comparison Grids: CivitAI Post

Speed settings: 5 steps @ 1.0 CFG (or any number from 1.0 up to about 1.5 CFG)

Sweet settings: Batch 6-8 steps from 1.0 to 2.5 CFG

This one produced some fun results. There are tangible differences in the final render between 4, 5, and 6 steps almost every time. Changes in CFG value can also significantly vary the result. Not just the same image getting sharper: If you look at the grids, you can see whole objects change. The pencil sketch test was wild: Either (1) the "sweet spot" is huge for this model, or (2) if you are more critical, it is a less consistent model. A few different hot spots of nice quality pop out in unattached spaces on the charts with different vibes. This led me to highlight large "sweet spots" even though a few of the images in there are not really what I would consider "sweet" quality. In the end, I settled on some more conservative settings than the creator's recommendation (5 steps @ 2 CFG), but I am impressed at the range of CFG values this model well-tolerates. It gives you nice, "high" resolution images in 6 steps very consistently. Plus, if you batch it with some higher CFG values, you can get 6-8 takes on your prompt/seed combo very quickly. A month ago, that would have seemed like black magic. However, today, it is one fast option among many similar-to-faster options. If I were to give this model a niche, I would suggest creating many images in a batch to leverage the larger sweet spot and reduce process overhead. You might find that overall images-per-hour is on par with some "faster" models here, and you would have some diversity in the outputs, as well.

Checkpoint #6: Realities Edge (LCM + Turbo)

Comparison Grids: CivitAI Post

Speed settings: 6 steps @ 1.0 CFG

Sweet settings: 8 steps @ 1.5 CFG

This one works well with low CFG and SDXL resolutions, but not a Turbo-level number of steps. Still, workable images emerge between 4 and 7 steps and improve with additional steps. "Burn" (and some chroma noise) starts to come through as you raise CFG, but raising steps often overcomes the new artifacts. That said, the detail gain for CFG above 2.5 is marginal compared to more steps. With more time, I would probably explore this model @ 9-10 steps and CFG 0.9, 0.8, 0.7, etc. Some of the CFG 0.5 images were intriguing, and I feel like there’s an unexplored world between the low CFG grid lines of these charts for this model. This model has less "burn" at higher steps for a wider variety of prompts which makes getting higher quality images at 7+ steps pretty likely.

Checkpoint #7: Tertium (Turbo)

Coming soon ...

Speed settings: 4 steps @ 1.0 CFG (or maybe 0.9 ...)

Several times the grids tempted me to choose a Step 3 image for this model. First of all, super-impressive and the closest to SDXL Turbo that I've had in this roundup in terms of cohesive images at low steps. There's a little magic happening in the early steps here. However, like so many others, the sweet spot seems to come just a little later. For this model, it amounts to a "sharpening" effect, and I wonder if just settling for step 3 or 4 and kicking it to upscaling with the same model on a higher step count might resolve it. Such a workflow would probably give you a hires-fix-quality image and resolution in a tiny fraction of the time. The thing that is fun about playing with these low-step models is that the imagination wanders into such territory all the time, and I will not get to do even 10% of the things I have thought up since the release of Turbo! But I digress. Tertium has lots of fun sweet spots "between the lines" that have inspired my imagination to look further - especially a little under the 1.0 line. On step 8 @ 0.70 CFG, the Warrior image has excellent facial expression and a background that deserves a story. Good stuff!

Checkpoint #8: TurboVision XL (Turbo)

Comparison Grids: CivitAI Post

Speed settings: 4 steps @ 1.0 CFG

Sweet settings: 5-6 steps @ 1.0-1.5 CFG

This one has a semi-real style that reminds me of the old Protovision models, or maybe early Deliberate. It may not be "real-time" enough to generate images in a single step, but it was the only other model that put up consistently decent images at 4 steps. I put together a special Comfy workflow for this model to give me the 4 images from the "sweet spot" in one hit because it is rare that none of them is interesting. Some chroma noise sneaks in here and there, and the (author? publisher? owner?) says they are actively working on better coherence. Dropping from 4 steps to 3 would cut the sampling work by 25% at this point. If I were to bet on one of these getting close enough to the original Turbo to be considered "real-time," it would be this one.
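To put a number on what one fewer step buys, here is a back-of-the-envelope sketch. It assumes generation time scales linearly with step count and ignores fixed overhead such as model loading and VAE decoding, so real-world gains will be somewhat smaller. The `step_savings` helper is just for illustration.

```python
def step_savings(old_steps, new_steps):
    """Return (fraction of sampling work saved, fractional throughput gain)."""
    work_saved = 1 - new_steps / old_steps
    throughput_gain = old_steps / new_steps - 1
    return work_saved, throughput_gain


saved, gain = step_savings(4, 3)
print(f"{saved:.0%} less sampling work, {gain:.0%} more images per unit time")
```

Under those assumptions, going from 4 steps to 3 saves a quarter of the sampling work, which translates to about a third more images in the same amount of time.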

Note: A new version of this model is already out, so I'll run similar tests on it when I get a chance and put any new insights here.

Gen Info

Grid #1: 1280x1024, 3D render style. Seed: 1060170675707242

Prompt: 8k 3d render, ((massive cyborg wings arching toward viewer)) pearlescent holy energy glow, legendary posed angel 1milf wearing minimal elegant tiny futuristic nanotech body paint circuits bioluminescent, levitating halo holy yellow_energy rings above head orbiting luminous

"Back wings" are notoriously difficult for AI, and a favorite topic of mine. Realities Edge nailed this one with several decent gens in the grid. Yes, I know that Turbo is best at 512x512, but that doesn't work for everything. So, what's the gap between resolution jank with Turbo and some of these other checkpoints depending on LCM for a boost? Turns out it's not too bad.

Grid #2: 1280x1024, digital art, psychedelic style. Seed: 4946178987469

Prompt: blacklight poster fluorescent palette trippy auroras over the bioluminescent mushroom forest digital art style

I printed one of my gens in a way that reacts to blacklights and the effect was stunningly beautiful. I'm hooked on this art style, now.

Grid #3: 768x768, photorealism, cinematic. Seed: 54057823884935

Prompt: warrior killer barbarian ultimate muscle definition wearing a loincloth and attacking toward the viewer with magical rune sword luminous photo cinematic legendary fighter vivid foreground, background barren landscape gray wastes desert blasted place accursed

Clearly, Turbo has challenges at higher resolutions, but just turning up the resolution a smidge can get you better hands, weapons, and faces. Yes, you might also get janky swords, spaghetti fingers, and orb-faces, but it did pretty well on this one so long as you stuck to 1.0 CFG.

Grid #4: 576x576, pencil sketch? Seed: 236602316077921

Prompt: abstract pencil sketch of a bear getting stoned on the shore of a river, minimal background, bear is blowing green smoke into the air in contrast to the charcoal shading

Okay, you caught me. I never used 512x512. However, I wanted to see what stretching it by one block in each direction might give me. The results were pretty good. No regrets. I mostly ran SD 1.5 at 576x576, too ...

Conclusions (2023-12-01)

I'm sure whatever I write here will be obsolete in a matter of days, but I'll go ahead and offer some "first impressions" because there aren't many available yet. First, merging SDXL Turbo is not enough to get the Turbo effect. It might be helping to improve the quality of the models that already merged LCM in the same low number of steps, but I'm skeptical it is having much effect on the "Turbo" models. Having the UI put up images before you finish typing the first word of the prompt is a surreal experience, and none of the others quite match the quality-to-speed ratio of SDXL Turbo.

The quality and ease of using TurboVision make it the second-most compelling. If you are willing to wait a few extra steps to get your image, you can get much higher quality and higher resolution from this one on a consistent basis. As a "general model" for high speed with improved quality over the base SDXL Turbo, this is the one I'm using. There has even been a new release as I am writing this, so the community is well on the way to adopting and improving on these speed increases.

My thinking is that these higher-quality, not-quite-as-fast models will find their true home in upscaling workflows. Improving upscaling performance by 5-10x takes the sting out of upscaling everything in a batch. I used to do that: Hires Fix everything in the batch, and my batches would run for 2-3 days. If that process got down to ~1 minute/image, then that would be amazing. Or perhaps their strength will be video where the frames are similar to each other and a tiny flaw in one frame is easily erased over the course of several frames. Rendering videos 5-10x faster will absolutely mean more videos for us to share. So that's exciting, too!

Guess I'll be comparing some of these checkpoints for upscaling this weekend. I'm particularly curious to put TurboVisionXL as a "generalist" vs. Blue Pencil on some anime upscaling. If anyone reads this far and wants to do that for me, I won't complain ...

Edit 1 (2023-12-02)

I've put an excessive amount of time into ComfyUI in the past 24 hours (haven't we all, though?). I can now grind out these grids at a higher speed, so hopefully, this helps me keep on top of new developments a bit easier. As I feared, there have been MANY speedy models dropping this weekend.

One lesson I've learned is that DPM++ SDE creates more useful images with a lower number of steps. While several samplers likely work fine with these models, I have gone back to standardize on this one and "re-shoot" the original grids so they all use this same sampler. I was using what worked fastest on my rig, but the fractional difference in speed at low steps means I should have opted for quality. I've posted links to all of the newer grids, and my Comfy SDXL Turbo Grid Maker is available on the Comfy Workflow site.

While zooming in on various grid positions, I constantly had to scroll to the edge of the previous images. The new grids have their stats above every cell. You may love it or hate it, but it is a slight quality-of-life improvement for me, so there you have it.

I'm also including a few "zoom-in" grids where sweet spots or other interesting images seem to pop out. As the time scale has moved into smaller numbers, what I see in these grids tells me that new images are available at smaller increments of CFG - sometimes, even below 1.0!!! These are smaller grids that can be uploaded with lossless compression for you sticklers out there. The big grids have 1% compression to fit in CivitAI's file size limits.
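If you want to build your own "zoom-in" grid around a sweet spot, a sketch like this generates the finer CFG axis. The `zoom_cfgs` helper, the 0.1 increment, and the 0.5 span are assumptions for illustration; pick whatever resolution your patience allows.

```python
def zoom_cfgs(center, span=0.5, increment=0.1):
    """CFG values from center - span to center + span in fixed increments."""
    count = round(2 * span / increment) + 1
    # Round each value so float drift does not leak into the grid labels.
    return [round(center - span + i * increment, 2) for i in range(count)]


print(zoom_cfgs(1.0))
```

Centered on 1.0, this covers 0.5 through 1.5 in 0.1 increments, which includes the below-1.0 territory mentioned above.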