Mastering the CFG Scale in Stable Diffusion

I Already Know CFG… Right?

You’ve probably fiddled with the CFG scale dozens—or hundreds—of times by now. You know the basics: a low CFG gives the model more freedom to riff on your prompt, and a high CFG makes it stick to your words like glue. Maybe you’ve even found your “sweet spot” settings. But wouldn’t it be great to really understand how CFG works under the hood so you can push boundaries—or stay comfortably within them—no matter what style or prompt you try next?

That’s what this guide is all about. I’ll show you exactly why certain CFG ranges matter, how to make sense of multiple CFG sliders in WebUI Forge, and how you can tailor these settings to whatever style or prompt you’re working with.

But what is it!? 

Classifier-Free Guidance (CFG) is a technique that blends two predictions from the same model: one made while paying attention to your prompt (prompt-aware) and one made while more or less ignoring it (prompt-agnostic). By mixing these together, you control how closely Stable Diffusion follows your prompt versus improvising on its own.
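The blend is usually written as one line of arithmetic: start from the prompt-agnostic prediction and move toward the prompt-aware one by a factor equal to the CFG scale. Here is a minimal sketch in plain Python. The function name `cfg_blend` and the three-number lists are made up for illustration; real predictions are large tensors inside the sampler.

```python
def cfg_blend(uncond, cond, scale):
    """Blend a prompt-agnostic and a prompt-aware prediction.

    uncond, cond: equal-length lists of floats standing in for the
    model's two noise predictions (toy stand-ins for real tensors).
    scale: the CFG scale.
    """
    # Start at the prompt-agnostic value and move toward (or past)
    # the prompt-aware value by `scale` times their difference.
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy numbers, invented purely for illustration.
uncond = [0.0, 1.0, 2.0]  # prediction that ignores the prompt
cond = [1.0, 1.0, 4.0]    # prediction that follows the prompt

print(cfg_blend(uncond, cond, 0.0))  # identical to uncond
print(cfg_blend(uncond, cond, 1.0))  # identical to cond
print(cfg_blend(uncond, cond, 7.0))  # pushed well past cond
```

At a scale of 1 you get the prompt-aware prediction back unchanged. Anything higher pushes past it, which is why higher values feel more literal.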

Let’s imagine your prompt:

scenery, outdoors, tree, flower, nature, sunlight, grass, rock, day, water, forest, pink flower, bush, river, plant, path, pond

as a song you've written.

How it works:

Now, in the studio, you’ve got two singers:

  • Singer A (Prompt-Aware): Follows the lyrics note for note.

  • Singer B (Prompt-Agnostic): Jazzes it up with freeform scatting and unexpected harmonies.

Distilled CFG: The Rehearsal

*Applies to some image generation architectures but not all; Flux is one that supports it.

Think of Distilled CFG as the dress rehearsal. This is where you decide how much each singer leads the practice session:

  • Low Distilled CFG (4–5):

    • Singer A and Singer B are hanging out, casually trying out the song. During rehearsal, Singer B’s creative riffs are heard just as much as Singer A’s strict reading of the lyrics. Translation: your initial image might play loose with the details—maybe the pink flower becomes a cluster of blossoms, or your path meanders through the pond instead of around it.

  • High Distilled CFG (9–10):

    • Singer A stands center stage during rehearsal, insisting on the correct lyrics from the get-go: “We’re doing an outdoor scene with a distinct pink flower, a river, a pond—no riffing!” Early on, you’re more likely to see a clearly defined path, pond, and pink flowers in their usual spots. Less chance of bizarre detours or wild color choices—Singer B’s improvisations are dialed way down.

CFG Scale: The Main Performance

After rehearsal, it’s time for the show. The CFG Scale is how you mix the final performance:

  • Mid CFG (7–8):

    • Singer A takes the lead, but Singer B still adds a touch of improvisation. You’ll get a fairly faithful rendition of “scenery, outdoors, tree…” with a pink flower near the path—yet there might be subtle artistic flairs, like the water reflecting unexpected hues of sunlight.

  • High CFG (10–12+):

    • Singer A dominates the spotlight, belting out every lyric precisely as written. That pink flower will be exactly pink—no more, no less. The pond sits exactly where you implied, rather than spontaneously merging into a river. If you push it too high, though, things can start sounding “overproduced”—you might see artificially crisp edges or colors so vivid they feel unnatural.
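One way to picture why very high values can look overproduced is to measure how far the blended prediction gets pushed away from the prompt-agnostic one. The sketch below assumes the standard CFG formula and uses made-up toy numbers; `cfg_blend` and `push_distance` are illustrative names, not part of any WebUI.

```python
import math


def cfg_blend(uncond, cond, scale):
    # Standard CFG blend: move from uncond toward cond by `scale`.
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def push_distance(uncond, cond, scale):
    """Euclidean distance between the blended and prompt-agnostic predictions."""
    guided = cfg_blend(uncond, cond, scale)
    return math.sqrt(sum((g - u) ** 2 for g, u in zip(guided, uncond)))


# Toy numbers, invented purely for illustration.
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 4.0]

for scale in (1, 7, 12):
    print(scale, round(push_distance(uncond, cond, scale), 3))
```

The distance grows in direct proportion to the scale, so a CFG of 12 pushes twelve times as far as a CFG of 1. With a push that large, small differences between the two predictions get heavily amplified.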

Examples of Mixing Distilled & Main CFG

  • Low Distilled / High Main

    • Rehearsal: Both singers jam casually.

    • Main Performance: Suddenly, Singer A must stick to the lyrics precisely.

    • Result: The final image is more literal—pink flower is pink, path is clearly defined—but you might see leftover quirks from rehearsal, like an unusual bush arrangement.

  • High Distilled / Low Main

    • Rehearsal: Singer A is strict about the script.

    • Main Performance: Singer B is allowed more freedom in the final.

    • Result: The early layout is spot-on—flowers, trees, water all neatly placed—but the last pass might let some creative flourishes seep in. Maybe the pink flower becomes magenta or the path gains a whimsical curve.

  • Moderate Distilled / Moderate Main

    • Rehearsal: Balanced approach—Singer A leads, but Singer B can suggest variations.

    • Performance: They harmonize without overshadowing each other.

    • Result: You typically get a coherent, natural scene. The “song” is recognizable—trees, pond, flowers in place—but with some nice improvisational texture around the edges, like unexpected grass shading or subtle ripples in the water.

By tuning both Distilled CFG (how the rehearsal unfolds) and CFG Scale (how the final show is performed), you craft the perfect duet of literal adherence to your prompt and imaginative flair. It’s a dance between letting your prompt truly shine and letting the model’s creativity riff—making each image generation a unique show that’s both planned and spontaneously alive.

Samples and expected outcomes

Here are four sample configurations you might try in A1111 or WebUI Forge for our nature prompt. After you generate, observe how the interplay between the settings shapes the final image.

If you are using an image generation architecture that doesn't support distilled CFG, you can ignore that part of the configuration below.

Configuration 1:

Distilled CFG = 4, Main CFG = 6

  • Rehearsal (Distilled = 4): Singer B (freeform) has almost as much say as Singer A (prompt-aware). The model’s initial approach may be looser: maybe you see two pink flowers instead of one, or a meandering path that cuts through the pond.

  • Main Performance (CFG = 6): Moderately prompt-faithful, but still offers wiggle room. The final image should be recognizable as a scenic environment with a pink flower, but might feature unexpected details like swirling reflections on the water or a winding forest background.

Why It Did That: Early on, the model wasn’t forced to be super-literal, so elements had a chance to take shape in interesting ways. By the final pass, it maintains the spirit of “scenery, outdoors, tree…” without drilling down on every detail of the prompt.

Configuration 2:

Distilled CFG = 8, Main CFG = 7

  • Rehearsal (Distilled = 8): Singer A is leading firmly from the start, so the trees, flower, pond, and so on are pretty clear. You may see a neat layout of elements without too many odd twists in the early steps.

  • Main Performance (CFG = 7): Still in a moderately tight range, so the final result holds onto that fairly accurate structure. You might get a reliably “pretty” scene with a clear pink flower in the foreground and a visible path or two.

Why It Did That: Stronger adherence in rehearsal means the foundation is prompt-accurate. A mid-level final CFG ensures the end image doesn’t stray too far from the established layout—expect fewer surprises, but a cohesive scene.

Configuration 3:

Distilled CFG = 4, Main CFG = 12

  • Rehearsal (Distilled = 4): Lots of improvisation. Singer B has plenty of room to experiment: maybe the pond merges into the river, or the pink flower ends up near a rock instead of a bush.

  • Main Performance (CFG = 12): Suddenly, the model clamps down, laser-focused on your prompt. Details become crisp and literal—extremely pink flower, well-defined path, forest that’s distinctly green, etc. If you see any leftover eccentricities from the rehearsal, they’ll be overshadowed by this final push for accuracy.

Why It Did That: Early freedom can lead to interesting or even strange compositional quirks. But the very high main CFG might force those quirks into a more literal, possibly over-sharpened final state.

Configuration 4:

Distilled CFG = 10, Main CFG = 10

  • Rehearsal (Distilled = 10): Singer A belts the lyrics from the get-go: you’ll likely see an orderly arrangement of trees, river, path, pink flower, etc.

  • Main Performance (CFG = 10): That same strict adherence continues straight through to the final image. The pink flower is definitely pink and definitely a flower—no question about it. Every element will be spelled out almost exactly as your prompt intended.

Why It Did That: High Distilled + High Main means there’s minimal breathing room for improvisation at any stage. Great if you want a straightforward, predictable result. But it could also introduce “overbaked” details (ultra-contrasty foliage or unnaturally glossy water).

When you plug these configurations into A1111 or WebUI Forge with our nature prompt, pay attention to how each step influences the final output. Screenshots or a quick generation comparison are super insightful—you’ll see just how the “duet” of Singer A (prompt-aware) and Singer B (prompt-agnostic) evolves from rehearsal to main performance.
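If you want to keep the four configurations straight while comparing results, a small script can hold them in one place and print a label for each run. This is only a bookkeeping sketch: `CfgConfig`, `distilled_cfg`, `main_cfg`, and `describe` are hypothetical names chosen for this example, not field names from A1111 or WebUI Forge.

```python
from dataclasses import dataclass

PROMPT = (
    "scenery, outdoors, tree, flower, nature, sunlight, grass, rock, day, "
    "water, forest, pink flower, bush, river, plant, path, pond"
)


@dataclass(frozen=True)
class CfgConfig:
    name: str
    distilled_cfg: float  # hypothetical field name for the Distilled CFG slider
    main_cfg: float       # hypothetical field name for the CFG Scale slider


# The four configurations from this guide.
CONFIGS = [
    CfgConfig("Configuration 1", 4, 6),
    CfgConfig("Configuration 2", 8, 7),
    CfgConfig("Configuration 3", 4, 12),
    CfgConfig("Configuration 4", 10, 10),
]


def describe(config, uses_distilled_cfg=True):
    """Build a label for a run, skipping Distilled CFG if unsupported."""
    parts = []
    if uses_distilled_cfg:
        parts.append(f"Distilled CFG = {config.distilled_cfg}")
    parts.append(f"Main CFG = {config.main_cfg}")
    return f"{config.name}: " + ", ".join(parts)


for config in CONFIGS:
    print(describe(config))
```

When you run the comparison, keep the seed, sampler, and step count the same across all four so that any difference you see comes from the CFG values alone.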

Hires. fix and more singers

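Imagine one single performance with two lead singers and two backup singers, all on stage at the same time, each with their own mic and role. They’re all performing the same song (your prompt), but each pair contributes a slightly different layer of harmony and volume. Note that the singers are recast for this section: each one now stands for a CFG slider rather than for the prompt-aware or prompt-agnostic prediction.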

The Main Singers (Distilled CFG & CFG)

  • Singer A (Distilled CFG)

    • Sets the initial vibe or tone of the performance.

    • If Singer A’s mic is loud (high Distilled CFG), you establish a strong, prompt-faithful baseline from the get-go. If it’s softer (low Distilled CFG), there’s more freeform room at the outset.

  • Singer B (CFG)

    • This is the principal vocalist who gives shape to the final sound in the base resolution.

    • A higher CFG means Singer B stays closer to the exact lyrics; a lower CFG means they allow more improvisation.

Together, Singer A and Singer B define what your “base” image looks like—how literal or creative it is. Think of them as the front line of your prompt’s interpretation.

The Backup Singers (Hires Distilled CFG & Hires CFG)

Meanwhile, behind (but still with) these main singers, you have two backup singers who specifically enrich and refine the overall performance:

  • Backup Singer A (Hires Distilled CFG)

    • Listens to Singer A and Singer B, then adds supporting harmonies that reinforce or slightly adjust the established melody.

    • If this backup singer’s mic is turned up (high Hires Distilled CFG), they’ll closely echo the main vocals, keeping everything tight and on-script. If it’s lower, they might introduce subtle new riffs or variations as they support the tune.

  • Backup Singer B (Hires CFG)

    • Provides the final, high-resolution embellishments: crisp layering, richer tones, more detailed harmonies.

    • A higher Hires CFG means they meticulously match the main singers’ lyrics, ensuring every note is crystal-clear and faithful to the prompt. A lower Hires CFG gives them liberty to experiment with melodic flourishes at this “enhancement” stage.

Together, Backup Singer A and Backup Singer B refine the performance in real time—offering depth and complexity without overriding the main melody.

How All Four Work in Unison

  • Low Distilled CFG / High CFG:

    • Singer A (Distilled) starts out softly, allowing some improvisational vibe.

    • Singer B (Main CFG) then belts out a more literal tune, shaping a fairly faithful base image.

    • The backup singers (Hires Distilled & Hires) step in to polish and expand upon what’s already there.

  • High Distilled CFG / Moderate CFG:

    • Singer A starts strong and literal, establishing a tight baseline.

    • Singer B keeps things relatively faithful, but leaves minor room for creativity.

    • The backup singers can either maintain that faithful approach (higher Hires Distilled & Hires) or inject a bit of flair (lower those values).

  • Moderate Distilled & CFG / High Hires CFG:

    • The main singers create a balanced, moderately literal foundation.

    • The backup singers swoop in with high Hires CFG to ensure every little detail in the final image lines up with the prompt—like meticulously harmonized vocals.

In practice, you’re generating a single image, but it’s as if the music is performed by a small ensemble of vocalists, each with a slightly different role in how strictly they follow the “lyrics” (your prompt). By adjusting their relative volumes (the four CFG sliders), you dial in exactly how faithful or free-flowing the overall performance becomes—both in the base resolution (the two main singers) and the enhanced resolution (the two backup singers).

Denoising Strength: The Backup Singers’ PA System

All four singers share the stage, but the overall power of the backup singers (the hi-res pass, which in the actual pipeline runs after the base image is generated) is governed by Denoising Strength, which acts like a PA system just for them:

  • Low Denoising (0.1–0.2): The backup singers’ mics are kept low. They mainly support what Singer A & Singer B have already established, so the hi-res pass simply polishes your base image without drastically altering composition or style.

  • Medium Denoising (0.25–0.4): The PA system is at a comfortable volume—Backup Singer A & Backup Singer B add noticeable improvements, refining details and possibly enhancing colors or textures. The overall scene remains true to the base resolution’s layout.

  • High Denoising (0.5+): The backup singers’ mics are turned way up. They can re-interpret or even overshadow certain aspects of the initial performance. Want a more dramatic sky or an expanded bloom of pink flowers? High denoising gives them the freedom to riff.
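A rough way to reason about these ranges: a common convention in img2img-style passes is to add noise to the image partway and then run only about strength × steps of the denoising schedule. The sketch below assumes that convention. The function name `hires_steps_run` is hypothetical, and the exact rounding and step handling differ between tools and settings.

```python
def hires_steps_run(total_steps, denoising_strength):
    """Approximate number of denoising steps the hi-res pass runs.

    Assumes the common "strength x steps" convention; real tools
    may round differently or offer options that change this.
    """
    # Clamp strength into the valid 0..1 range.
    strength = min(max(denoising_strength, 0.0), 1.0)
    return min(int(total_steps * strength), total_steps)


for strength in (0.125, 0.25, 0.5, 0.75):
    print(strength, hires_steps_run(40, strength))
```

Fewer steps mean the backup singers have less time on the mic, which is why low values mostly polish while high values can rework whole parts of the image.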

Putting It All Together

Imagine all four singers (two leads + two backups) are on stage from the start, each with their own mic (CFG slider). Meanwhile, the backup singers’ overall influence is shaped by the PA system (Denoising Strength). You still get one single final performance (image), but each singer’s volume setting decides how strictly your prompt is followed versus how much creative interpretation sneaks in—both at base resolution (Leads: Distilled CFG, CFG) and when adding high-resolution details (Backups: Hires Distilled CFG, Hires CFG).

So if you find your final image is too timid and you want more surprises, turn down the main CFG or bump up your Denoising Strength to let the backup singers inject more spontaneity. If you need a laser-accurate scene, push those CFG sliders higher and lower the Denoising Strength so that nobody ad-libs too far off script.

One stage, one performance, four singers, and a PA system. CFG settings let you dial them in just right.
