RES4LYF — Complete Reference Guide
Updated & Reordered
I built this guide with the help of Claude Opus because — let's be honest — the sheer number of sampler names, scheduler options, and obscure settings in RES4LYF is overwhelming. Every parameter feels like it could make or break your output, but none of them come with a manual. So I dug into the source code, tested combinations, and documented what actually matters. This guide is my personal reference for getting consistently good results without guessing. If it saves you the same confusion it saved me, even better.
Quick Navigation
Part 1 — The Node Ecosystem
Complete Node Map
Sampler Nodes · Option Nodes
Guide Nodes · Sigma Nodes
VAE & Latent Utility · Conditioning · Mask Nodes
Model Patch Nodes
Part 2 — Samplers, Schedulers, Noise & Conditioning
Quick Start Defaults
Eta — The Randomness Slider
Sampler Name Suffixes · Sampler Families · Implicit Steps
Noise Scaling Modes · Noise Types Reference
Schedulers · Presets · Special Features
Conditioning — Strategy · Regional · Timestep
Part 3 — Resolution, Latent Space & Prompt Theory
Choose Your Resolution — Model Budgets · Bucket Sizes · Alignment Rules
Latent Space, Denoise & Upscaling — How It Actually Works
VAE Operations — Precision Encoding · Upscale · Style Transfer
Precision & Latent Manipulation
Mask Operations
Prompt Structure & Token Theory — Structure · Tokens · Score Tags
Performance Profiles · Bad Combos
Troubleshooting (Conditioning/Precision/Latents)
Part 4 — Quality Pipeline
Pipeline Integration — Where Each Node Fits
Complete Workflow Overview
Stage 1: Generation — Sampler Setup
Model Adaptations: Flux · Z-Image (Base/Turbo) · Qwen-Image (Base/Distilled)
Stage 2: Upscale · Stage 3: Refine
Stage 4: Fix (Face/Region)
Stage 5: Final Upscale · Stage 6: Polish & Save
Preset Configurations · Troubleshooting (Sampling/Pipeline)
Part 1 — The Node Ecosystem
RES4LYF has 370+ nodes. This part covers the node ecosystem — all the nodes you connect to go from empty latent to final saved image. Part 2 covers the sampling algorithms and schedulers in depth; Part 4 puts it all together into a complete pipeline.
The Complete Node Map
Of those 370+ nodes, a quality image pipeline needs only ~15-20. Here's the hierarchy:
┌─────────────────────────────────────────────────────────────┐
│ SAMPLERS │
│ ClownsharKSampler ← the all-in-one workhorse │
│ SharkSampler ← initialization/pipeline control │
│ BongSampler ← dead-simple, just works │
│ ClownSampler ← returns SAMPLER object (for chaining) │
└──────────────────────────┬──────────────────────────────────┘
│ takes OPTIONS input
┌──────────────────────────▼──────────────────────────────────┐
│ OPTION NODES │
│ ClownOptions_SDE ← noise type + eta │
│ ClownOptions_StepSize ← overshoot (sharpness) │
│ ClownOptions_DetailBoost ← detail enhancement engine │
│ ClownOptions_SigmaScaling ← lying sigma (the secret sauce) │
│ ClownOptions_Momentum ← convergence acceleration │
│ ClownOptions_ImplicitSteps ← iterative polish per step │
│ ClownOptions_Cycles ← unsample→resample loops │
│ ClownOptions_Tile ← tiled sampling for large images │
│ ClownOptions_SwapSampler ← switch algorithm mid-run │
│ SharkOptions ← init noise + denoise_alt │
│ ClownOptions_Combine ← merge multiple OPTIONS dicts │
└──────────────────────────┬──────────────────────────────────┘
│ takes GUIDES input
┌──────────────────────────▼──────────────────────────────────┐
│ GUIDE NODES │
│ ClownGuide_Mean ← latent-space reference guide │
│ ClownGuide_FrequencySep ← high/low frequency control │
│ ClownGuide_Style ← style transfer (AdaIN/WCT) │
│ ClownGuides_Sync ← dual masked/unmasked guides │
│ ClownGuides_Sync_Advanced ← + drift/lure (experimental) │
└──────────────────────────┬──────────────────────────────────┘
│ takes SIGMAS input
┌──────────────────────────▼──────────────────────────────────┐
│ SIGMA NODES (81 total) │
│ Schedulers → generate initial sigma schedule │
│ sigmas_mult → scale all sigmas up/down │
│ sigmas_rescale → remap to new range │
│ sigmas_interpolate → change step count preserving shape │
│ sigmas_concat → join two schedules │
│ sigmas_split → break at a point │
│ sigmas_math1/3 → custom expressions │
│ + 70 more manipulation nodes │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ VAE & LATENT UTILITY NODES │
│ VAEEncodeAdvanced ← deterministic VAE encode (Part 3) │
│ LatentUpscaleWithVAE ← decode→upscale→re-encode (Part 3) │
│ VAEStyleTransferLatent ← latent-space style xfer (Part 3) │
│ EmptyLatentImage64 ← fp64 empty latent (Part 3) │
│ EmptyLatentImageCustom ← channels/precision ctrl (Part 3) │
│ Set Precision Universal ← cast cond+sigma+latent (Part 3) │
│ LatentMatchChannelwise ← fix color shifts (Part 3) │
│ LatentNoised ← controlled noise inject (Part 3) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ CONDITIONING NODES │
│ ClownRegionalConditioning_ABC ← regional prompts (Part 2) │
│ ConditioningSetTimestepRange ← step scheduling (Part 2) │
│ ConditioningAverage ← blend 2 prompts (Part 2) │
│ ConditioningOrthoCollin ← surgical blending (Part 2) │
│ ConditioningMultiply ← scale strength (Part 2) │
│ ConditioningTruncate ← SD3.5 fix (Part 2) │
│ CLIPTextEncodeFluxUnguided ← Flux dual-tower (Part 2) │
│ Conditioning Recast FP64 ← precision cast (Part 2) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ MASK NODES │
│ MaskEdge ← smart edge detection (Part 3) │
│ GrowMaskWithBlur ← ComfyUI standard (Part 4) │
│ Mask to Segs / BBox ← detection pipeline (Part 4) │
└─────────────────────────────────────────────────────────────┘
Connection pattern: ClownsharKSampler has auto-growing options input slots — connect each option node directly to the sampler (one per slot).
SharkOptions → Sampler (options slot 1)
ClownOptions_SDE → Sampler (options slot 2)
ClownOptions_DetailBoost → Sampler (options slot 3)
ClownOptions_SigmaScaling → Sampler (options slot 4)
ClownOptions_ImplicitSteps → Sampler (options slot 5)
Add more option nodes and new slots appear automatically.
Sampler Nodes — Which One to Use
ClownsharKSampler (Recommended)
The all-in-one workhorse. Takes model, conditioning, latent, sigmas, options, guides — does everything in one node.
eta — Default: 0.5 · Range: -100 to 100 · What It Does: Noise injection per step (see Eta in Part 2)
sampler_name — Default: res_2m · Range: see list · What It Does: Algorithm (see Sampler Families in Part 2)
scheduler — Default: beta57 · Range: see list · What It Does: Sigma schedule shape
steps — Default: 30 · Range: 1-10000 · What It Does: Total denoising steps
steps_to_run — Default: -1 · Range: -1 to 10000 · What It Does: -1 = all steps. Set to N to run only N steps (for chaining passes)
denoise — Default: 1.0 · Range: -10000 to 10000 · What It Does: Sigma range to use (1.0 = full, 0.5 = half)
cfg — Default: 5.5 · Range: -100 to 100 · What It Does: Guidance scale. Negative = channelwise CFG
seed — Default: 0 · Range: 0-max · What It Does: Random seed
sampler_mode — Default: standard · Range: standard/unsample/resample · What It Does: Direction of sampling
bongmath — Default: True · Range: bool · What It Does: Enable implicit step calculation + sigma manipulation
Returns: (output_latent, denoised_latent, options_dict)
sampler_mode explained:
standard — normal forward denoising (noise → image). Use for generation.
unsample — reverse (image → noise). Used as first half of a refinement cycle.
resample — unsample then sample in one pass. Shortcut for refinement.
BongSampler (Simple Alternative)
If you don't want option nodes at all. Locked to RES samplers, beta57, minimal controls.
sampler_name — Default: res_2s_sde · What It Does: Only RES variants available
scheduler — Default: beta57 · What It Does: Only RES4LYF schedulers
steps — Default: 30 · What It Does: Total steps
cfg — Default: 5.5 · What It Does: Guidance scale
denoise — Default: 1.0 · What It Does: Sigma range
Use when: You want something that just works with zero configuration.
SharkSampler (Pipeline Control)
The pipeline-oriented sampler. Controls initialization separately from the sampling itself.
noise_type_init — Default: gaussian · What It Does: Type of noise to start from
noise_stdev — Default: 1.0 · What It Does: Initial noise strength
denoise_alt — Default: 1.0 · What It Does: Alternative denoise (scales noise differently than sigma-slice)
Use when: You need separate init noise control (e.g., structured noise for coherent variation).
Option Nodes — Full Breakdown
ClownOptions_SDE — Noise Control
Controls the stochastic noise injected during SDE sampling. This is separate from eta — eta scales the amount, this controls the character.
noise_type_sde — Default: gaussian · What It Does: Noise distribution for main steps
noise_type_sde_substep — Default: gaussian · What It Does: Noise distribution for sub-steps
noise_mode_sde — Default: hard · What It Does: When noise is strongest (see Noise Modes)
noise_mode_sde_substep — Default: hard · What It Does: Sub-step noise timing
eta — Default: 0.5 · What It Does: Main step noise amount
eta_substep — Default: 0.5 · What It Does: Sub-step noise amount
seed — Default: -1 · What It Does: Noise seed (-1 = use sampler seed)
Main vs substep: Multi-stage samplers (_2s, _3s etc.) have internal sub-steps. You can control noise separately for the main step and sub-steps. For most cases, keep them the same.
Quality tip: gaussian + hard + eta 0.5 is the safe default. Try brownian for smoother results, blue or laplacian for sharper detail.
ClownOptions_StepSize — Overshoot Control
Controls how aggressively the sampler steps through the sigma schedule.
overshoot_mode — Default: hard · What It Does: When overshoot is strongest
overshoot_mode_substep — Default: hard · What It Does: Sub-step overshoot timing
overshoot — Default: 0.0 · What It Does: Positive = sharper/grittier. Negative = soften/smoother.
overshoot_substep — Default: 0.0 · What It Does: Sub-step overshoot
Quality tip: Leave at 0.0 for generation. Try 0.05-0.15 positive overshoot for sharpening in a refinement pass, or -0.05 to -0.1 for smoothing skin.
ClownOptions_DetailBoost — The Detail Engine
This is one of the most powerful quality nodes. It detects where the model or sampler underestimates noise and injects additional detail there.
weight — Default: 1.0 · Range: -100 to 100 · What It Does: Positive = sharper/grittier. Negative = soften/deepen colors. This is the main knob.
method — Default: model · Range: 6 options · What It Does: What component to boost (see below)
mode — Default: hard · Range: noise modes · What It Does: When the boost is strongest in the schedule
eta — Default: 0.5 · Range: -100 to 100 · What It Does: Strength multiplier for the noise mode curve
start_step — Default: 3 · Range: 0-10000 · What It Does: First step to apply boost
end_step — Default: 10 · Range: -1 to 10000 · What It Does: Last step (-1 = all remaining)
Method options explained:
model — What It Boosts: The model's prediction error · Best For: Default — enhances overall detail and texture
model_alpha — What It Boosts: Model prediction in alpha channel · Best For: Models with transparency
sampler — What It Boosts: The sampler's integration error · Best For: Corrects sampling artifacts
sampler_normal — What It Boosts: Sampler error, normalized · Best For: More controlled version of sampler
sampler_substep — What It Boosts: Sub-step integration error · Best For: For multi-stage samplers
sampler_substep_normal — What It Boosts: Normalized sub-step error · Best For: Controlled sub-step correction
Quality tips:
Generation: weight=0.2-0.5, method=model, mode=hard, start_step=3, end_step=10 — subtle enhancement across the middle steps where detail forms
Refinement pass: weight=0.5-1.5, method=model, mode=sinusoidal, start_step=0, end_step=-1 — stronger boost throughout
Skin/portrait: weight=-0.1 to -0.3, method=model, mode=lorentzian — negative weight softens skin
Texture/fabric: weight=1.0-2.0, method=model, mode=hard — aggressive detail
ClownOptions_SigmaScaling — Lying Sigma (The Secret Sauce)
This is the technique that makes RES4LYF special. It lies to the model about where it is in the denoising process, tricking it into producing more or less detail.
s_noise — Default: 1.0 · Range: -10000 to 10000 · What It Does: Scales SDE noise. 1.03-1.07 = moderate detail/texture boost
s_noise_substep — Default: 1.0 · Range: same · What It Does: Sub-step noise scaling
noise_anchor_sde — Default: 1.0 · Range: -100 to 100 · What It Does: 1.0 = normal. Lower = grittier/more detailed. 0.0 = maximum grit
lying — Default: 1.0 · Range: -10000 to 10000 · What It Does: Downscales sigma → model thinks it's further along → produces sharper detail. 0.89-0.98 = sweet spot
lying_inv — Default: 1.0 · Range: -10000 to 10000 · What It Does: Upscales sigma → compensates for color desaturation from lying. Match to lying (e.g., lying=0.89, lying_inv=1.05-1.10)
lying_start_step — Default: 0 · Range: 0-10000 · What It Does: When lying kicks in
lying_inv_start_step — Default: 1 · Range: 0-10000 · What It Does: When lying_inv kicks in (typically 1 step after lying)
How lying works:
The model gets called with sigma * lying (smaller sigma = "I'm almost done denoising")
The model responds with sharper, more detailed predictions than it would at the real sigma
But the actual sampling happens at the real sigma, so the image structure is preserved
Side effect: lying desaturates colors → lying_inv compensates by upscaling sigma for the inverse step, which restores saturation
Recommended lying pairs:
1.0 — lying_inv: 1.0 · Effect: Off (no lying)
0.95 — lying_inv: 1.03-1.05 · Effect: Subtle detail boost
0.89 — lying_inv: 1.05-1.10 · Effect: Strong detail boost (recommended)
0.80 — lying_inv: 1.10-1.15 · Effect: Very aggressive — may artifact
0.98 — lying_inv: 1.01 · Effect: Barely noticeable — for refinement passes
s_noise vs lying: s_noise adds more noise (variation/texture). lying changes model behavior (sharpness/detail). They stack — use both for maximum effect.
Flux note: Flux is a rectified flow model with a linear noise schedule. The sigma distortion from lying is amplified more than on SDXL's cosine schedule. Values like lying=0.89 that work well on SDXL will produce severe noise artifacts on Flux (leopard-print patterns, texture corruption). Flux starting point: lying=0.97, lying_inv=1.02, s_noise=1.04. These are conservative but effective — they add detail without the artifacts that aggressive SDXL-tuned values cause.
Z-Image / Qwen-Image note: Both are rectified flow models like Flux — same conservative lying values apply. Z-Image uses the same shift=3.0 schedule; Qwen-Image's larger 20B model tolerates slightly more aggression (lying=0.95, lying_inv=1.03). Their distilled variants (ZIT, Qwen-Image Lightning) at 4-10 steps need even lighter values or no lying.
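The lying mechanism described above can be sketched in a few lines of Python. This is an illustrative toy step, not RES4LYF's actual implementation: the `model` callable, the plain Euler update, and the way `lying_inv` is handled are all simplifying assumptions.

```python
def lying_step(model, x, sigma, sigma_next, lying=0.89, lying_inv=1.07):
    """Toy sketch of the 'lying sigma' trick (hypothetical helper)."""
    # The model is queried at a *smaller* sigma than the true one, so it
    # behaves as if denoising were further along -> sharper predictions.
    denoised = model(x, sigma * lying)
    # The actual integration still uses the real sigmas, which is why the
    # image structure is preserved.
    d = (x - denoised) / sigma              # Euler direction estimate
    x_next = x + d * (sigma_next - sigma)
    # lying_inv > 1 would be applied on the inverse/noise-injection side to
    # restore saturation; that bookkeeping is omitted in this sketch.
    return x_next
```

With a dummy model that always predicts zero, the step reduces to scaling `x` by `sigma_next / sigma`, which makes the structure of the update easy to verify by hand.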
ClownOptions_Momentum — Convergence Speed
Accelerates or decelerates how fast the sampler converges to the final image.
momentum — Default: 0.0 · What It Does: Positive = faster convergence (standard sampling). Negative = faster convergence (unsampling).
Quality tip: Leave at 0.0 unless you're experimenting. Small positive values (0.1-0.3) can help lock in details faster but risk overshooting.
ClownOptions_ImplicitSteps — Iterative Polish
Adds correction iterations after each denoising step. Each step gets re-solved N times.
implicit_type — Default: bongmath · What It Does: Correction algorithm
implicit_type_substeps — Default: bongmath · What It Does: Sub-step correction algorithm
implicit_steps — Default: 0 · What It Does: Number of correction passes per step (0=off, 2-3=quality, 5+=waste)
implicit_substeps — Default: 0 · What It Does: Correction passes for sub-steps
Implicit type options:
bongmath — What It Does: Advanced sigma manipulation + implicit solve · Best For: Default — balances quality and speed
predictor-corrector — What It Does: Classic predict → correct cycle · Best For: Conservative, predictable
rebound — What It Does: Iterates with rebound (bouncing convergence) · Best For: Some models converge better this way
retro-eta — What It Does: Retroactive eta correction · Best For: SDE-heavy workflows needing eta cleanup
Quality tip: Use implicit_steps=2 with bongmath for final renders. Skip for drafts. The difference is subtle but real — cleaner edges, fewer micro-artifacts.
ClownOptions_Cycles — Unsample/Resample Loops
Runs the sampler forward, then backward, then forward again — iteratively refining the image.
cycles — Default: 0.0 · What It Does: Number of unsample→resample cycles (0=off). Internally: cycles × 2 = total passes
eta_decay_scale — Default: 1.0 · What It Does: Multiplies eta after each cycle (helps convergence over iterations)
unsample_eta — Default: 0.5 · What It Does: Eta for the unsample (reverse) pass
unsampler_override — Default: none · What It Does: Use a different sampler for unsample pass
unsample_steps_to_run — Default: -1 · What It Does: Steps for unsample pass (-1=all)
unsample_cfg — Default: 1.0 · What It Does: CFG for unsample pass (usually low)
unsample_bongmath — Default: False · What It Does: Enable bongmath for unsample pass
How it works:
Normal sampling (forward pass) generates the image
Unsample reverses a few steps (adds noise back in a controlled way)
Resample runs forward again from the noisy state
Each cycle refines details without major structural changes
Quality tip: cycles=1 with eta_decay_scale=0.8 gives a nice refinement. More cycles = more polish but slower.
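The four-phase loop above can be sketched as plain Python. The `sample` and `unsample` callables are hypothetical stand-ins for the forward and reverse passes; only the loop structure and the eta decay mirror what the node description says.

```python
def cycle_refine(sample, unsample, latent, cycles=1, eta=0.5, eta_decay_scale=0.8):
    """Sketch of the unsample -> resample refinement loop (names hypothetical)."""
    x = sample(latent, eta)                 # forward pass generates the image
    for _ in range(cycles):
        noisy = unsample(x, eta)            # reverse a few steps: controlled re-noising
        x = sample(noisy, eta)              # forward again from the noisy state
        eta *= eta_decay_scale              # calm the injected noise each cycle
    return x
```

Each iteration costs roughly two extra sampling passes, which is why more cycles mean more polish but noticeably slower runs.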
ClownOptions_Tile — Tiled Sampling
For images too large to fit in VRAM. Splits the latent into overlapping tiles, samples each separately, blends back together.
tile_width — Default: 1024 · What It Does: Tile width in pixels
tile_height — Default: 1024 · What It Does: Tile height in pixels
Advanced variant (ClownOptions_Tile_Advanced): Accepts comma-separated or multiline list of width,height pairs, allowing different tile sizes per region.
ClownOptions_SwapSampler — Algorithm Switching
Switch to a different sampler algorithm mid-run when error drops below a threshold.
sampler_name — Default: (default) · What It Does: Algorithm to swap to
swap_below_err — Default: 0.0 · What It Does: Swap when per-step error drops below this
swap_at_step — Default: 30 · What It Does: Hard swap at this step regardless of error
log_err_to_console — Default: False · What It Does: Print error values (for tuning the threshold)
Quality tip: Start with res_3m, swap to euler for the last 5 steps. Or start with euler for broad structure, swap to res_3m for detail. Enable logging first to find good swap thresholds for your model.
SharkOptions — Initialization
Controls the starting noise and alternative denoise behavior.
noise_type_init — Default: gaussian · What It Does: Type of initial noise
s_noise_init — Default: 1.0 · What It Does: Initial noise strength multiplier
denoise_alt — Default: 1.0 · What It Does: Alternative denoise — scales noise differently from sigma-slice
channelwise_cfg — Default: False · What It Does: Apply CFG per channel (can reduce color burn at high CFG)
Quality tip: channelwise_cfg=True at CFG 7+ prevents the washed-out look that high guidance sometimes causes. gaussian init is always safe; try brownian for smoother base images.
Flux note: channelwise_cfg has no effect at cfg=1.0. Leave it False for Flux — it adds unnecessary overhead when there's no guidance amplification to balance.
Z-Image Base / Qwen-Image note: These models use true CFG (3.0-5.0) — set channelwise_cfg=True to prevent color burn and washed-out highlights at higher guidance. For their distilled variants (ZIT, Qwen-Image Lightning) at cfg=1.0, leave False.
ClownOptions_Combine — Merge Options
Merges two OPTIONS dicts into one. Later values override earlier ones for the same key.
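Since OPTIONS values are passed around as dicts, the override semantics can be shown with plain Python; the keys below are illustrative, not necessarily the exact keys RES4LYF uses internally.

```python
def combine_options(options_a, options_b):
    """Merge two OPTIONS dicts; later values win for duplicate keys."""
    merged = dict(options_a)    # copy so neither input is mutated
    merged.update(options_b)    # b overrides a for any shared key
    return merged

a = {"eta": 0.5, "noise_type_sde": "gaussian"}
b = {"eta": 0.3, "s_noise": 1.05}
merged = combine_options(a, b)  # eta comes from b; everything else is kept
```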
Guide Nodes — Latent-Space Steering
Guides give the sampler a reference latent to steer toward. Massively useful for refinement, style transfer, and region-specific enhancement.
ClownGuide_Mean — Basic Guidance
The simplest guide. Points the sampler toward a target latent.
weight — Default: 0.75 · Range: -100 to 100 · What It Does: How strongly to pull toward the guide. 0 = ignored, 1.0 = strong lock
cutoff — Default: 1.0 · Range: 0.0 to 1.0 · What It Does: Disables guide when output ≈ guide (cosine similarity). Higher = guide stays active longer
weight_scheduler — Default: beta57 · Range: schedulers · What It Does: How weight changes over steps
start_step — Default: 0 · Range: 0-10000 · What It Does: First step to apply guide
end_step — Default: 15 · Range: -1 to 10000 · What It Does: Last step (-1 = all)
invert_mask — Default: False · Range: bool · What It Does: Invert the mask region
Takes: guide (LATENT) — the target, mask (optional) — where to apply
Use cases:
Refinement: Guide = original generation, weight = 0.7-0.8. Sampler enhances but won't deviate from source.
Style transfer: Guide = style reference, weight = 0.3-0.5. Picks up style colors/textures.
Face fix: Guide = original, mask = face region only, weight = 0.8. Fixes face without touching background.
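The guide pull at each step can be pictured as a weighted blend toward the reference latent, gated by an optional mask. This is a conceptual sketch, not the node's actual math:

```python
def apply_mean_guide(x, guide, weight, mask=None):
    """Pull the working latent toward the guide (simplified scalar sketch)."""
    pull = weight * (guide - x)             # direction toward the reference
    if mask is not None:
        pull = pull * mask                  # restrict the pull to masked regions
    return x + pull
```

At `weight=1.0` the result lands exactly on the guide (a strong lock); at `weight=0` the guide is ignored, matching the range described above.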
ClownGuide_FrequencySeparation — Frequency-Aware Guidance
Separates the guide and target into high/low frequency bands and controls each independently.
method — Default: median · What It Does: Frequency separation method
sigma — Default: 3.0 · What It Does: Gaussian blur radius (for gaussian method)
kernel_size — Default: 8 · What It Does: Median filter size (main control with median method)
inner_kernel_size — Default: 2 · What It Does: Inner filtering
stride — Default: 2 · What It Does: Processing stride
lowpass_weight — Default: 1.0 · What It Does: Low frequency (structure/color). 1.0 = keep as-is. <1 = sharpen, >1 = blur
highpass_weight — Default: 1.0 · What It Does: High frequency (detail/edges). >1 = sharpen, <1 = smooth
Method options: gaussian, gaussian_pw, median, median_pw
Quality tips:
Sharpen detail: highpass_weight=1.2-1.5, lowpass_weight=0.9
Smooth skin: highpass_weight=0.7-0.8, keep lowpass_weight=1.0
Overall enhancement: lowpass_weight=0.95, highpass_weight=1.1 — subtle but clean
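The split-reweight-recombine idea is easy to sketch in 1-D. The node uses gaussian or median filters on latents; this toy version uses a box blur as the lowpass purely for illustration:

```python
def frequency_reweight(signal, lowpass_weight=1.0, highpass_weight=1.0, radius=1):
    """1-D sketch: split into low/high bands, reweight each, recombine."""
    n = len(signal)
    low = []
    for i in range(n):                      # simple box blur as the lowpass
        window = signal[max(0, i - radius): i + radius + 1]
        low.append(sum(window) / len(window))
    high = [s - l for s, l in zip(signal, low)]   # residual = high frequencies
    return [lowpass_weight * l + highpass_weight * h for l, h in zip(low, high)]
```

With both weights at 1.0 the signal passes through unchanged, which is why the defaults are a no-op; raising `highpass_weight` amplifies only the residual (edges/detail).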
ClownGuide_Style — Style Transfer
Transfers the style (color distribution, texture pattern) from a reference to the output.
apply_to — Default: positive · What It Does: Which conditioning gets the style (positive/negative/denoised)
method — Default: WCT · What It Does: Style transfer algorithm
weight — Default: 1.0 · What It Does: Global strength
synweight — Default: 1.0 · What It Does: Strength on opposite conditioning (prevents CFG burn)
weight_scheduler — Default: constant · What It Does: How weight changes over steps
start_step — Default: 0 · What It Does: First step
end_step — Default: -1 · What It Does: Last step
Method options:
AdaIN — What It Does: Adaptive Instance Normalization — matches mean/variance · Quality: Fast, good for mood/atmosphere
WCT — What It Does: Whitening Color Transform — matches full color distribution · Quality: Best quality — fine texture/color control
WCT2 — What It Does: Enhanced WCT · Quality: Similar to WCT with improvements
scattersort — What It Does: Scatter-based optimal transport · Quality: Advanced, different aesthetic
none — What It Does: Disabled · Quality: —
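AdaIN, the fastest method above, just matches mean and variance. A minimal 1-D sketch (real AdaIN operates per channel on latent tensors):

```python
def adain(content, style, eps=1e-6):
    """Adaptive Instance Normalization: match content stats to style stats."""
    c_mean = sum(content) / len(content)
    s_mean = sum(style) / len(style)
    c_std = (sum((c - c_mean) ** 2 for c in content) / len(content)) ** 0.5
    s_std = (sum((s - s_mean) ** 2 for s in style) / len(style)) ** 0.5
    # Normalize the content, then rescale into the style's distribution.
    return [(c - c_mean) / (c_std + eps) * s_std + s_mean for c in content]
```

WCT goes further by whitening the full covariance, not just per-channel variance, which is why it captures fine texture and color relationships better at extra cost.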
Sigma Nodes — Schedule Manipulation
The sigma schedule is the roadmap for denoising — which noise levels to visit in which order. Manipulating sigmas gives you fine-grained control over the process.
Key Sigma Nodes for Quality
sigmas_interpolate — What It Does: Change step count while preserving schedule shape · When to Use: Want 40-step quality from a 20-step schedule
sigmas_mult — What It Does: Scale all sigmas by a factor · When to Use: Globally increase/decrease noise range
sigmas_rescale — What It Does: Remap to new min/max range · When to Use: Constrain noise to specific range
sigmas_concat — What It Does: Join two sigma sequences · When to Use: Two-phase sampling with different schedules
sigmas_split — What It Does: Split at a point · When to Use: Separate early/late phases
sigmas_pad — What It Does: Add steps at start/end · When to Use: Extend schedule without recalculating
sigmas_cleanup — What It Does: Remove near-zero/duplicate values · When to Use: Fix broken schedules
sigmas_math1/3 — What It Does: Custom expressions (a,b,c,x,y,z,s variables) · When to Use: Advanced custom curves
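A few of these operations are easy to sketch on a plain list of sigma values. These are illustrative re-implementations, not RES4LYF's code (which operates on torch tensors):

```python
def sigmas_mult(sigmas, factor):
    """Scale every sigma by a constant factor."""
    return [s * factor for s in sigmas]

def sigmas_rescale(sigmas, new_max, new_min):
    """Remap the schedule linearly onto a new [new_min, new_max] range."""
    old_max, old_min = sigmas[0], sigmas[-1]
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (s - old_min) * scale for s in sigmas]

def sigmas_interpolate(sigmas, new_steps):
    """Resample to new_steps values (>= 2), preserving the curve's shape."""
    n = len(sigmas)
    out = []
    for i in range(new_steps):
        t = i * (n - 1) / (new_steps - 1)   # position in the old schedule
        lo = int(t)
        frac = t - lo
        hi = min(lo + 1, n - 1)
        out.append(sigmas[lo] * (1 - frac) + sigmas[hi] * frac)
    return out
```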
Custom Schedulers
beta57 — Character: Balanced, optimized for RES · Best For: Default for everything
tan_scheduler — Character: Tangent curve — gentle start, steep middle, gentle end · Best For: More time on fine detail, less on rough structure
linear_quadratic_advanced — Character: Quad curve with manual control points · Best For: When you know exactly what schedule you want
constant_scheduler — Character: Flat sigma (same noise at every step) · Best For: Specialized — refinement at a single noise level
karras — Character: Concentrates steps in high-detail range · Best For: SD/SDXL models, DPM++ samplers
exponential — Character: Smooth exponential spacing · Best For: High step counts (30+)
sgm_uniform — Character: Uniform spacing · Best For: Low-denoise refinement
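The karras scheduler above follows the published rho-spaced formula from Karras et al.; the default min/max values below are the common SD defaults and may differ per model:

```python
def karras_sigmas(steps, sigma_min=0.0292, sigma_max=14.6146, rho=7.0):
    """Karras rho-spaced schedule: step density concentrates at low sigma,
    where fine detail forms."""
    max_r = sigma_max ** (1 / rho)
    min_r = sigma_min ** (1 / rho)
    # Linear spacing in sigma^(1/rho), then raised back to the rho power.
    return [(max_r + i / (steps - 1) * (min_r - max_r)) ** rho
            for i in range(steps)]
```

Higher `rho` pushes even more of the step budget toward the low-sigma (detail) end of the schedule.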
Model Patch Nodes
RES4LYF includes model-specific patches that optimize attention patterns. These go between your model loader and the sampler.
ReFluxPatcher / Advanced — Target Model: Flux · What It Does: Selectively enable/disable attention blocks
ReWanPatcher / Advanced — Target Model: Wan Video · What It Does: Sliding window attention (saves VRAM for video)
ReChromaPatcher — Target Model: Chroma · What It Does: Architecture-specific optimizations
ReLTXVPatcher — Target Model: LTX Video · What It Does: Video model patches
ReSDPatcher — Target Model: SD 1.5/2.1 · What It Does: Legacy model optimization
ReSD35Patcher — Target Model: SD 3.5 · What It Does: SD3.5-specific patches
FluxOrthoCFGPatcher — Target Model: Flux · What It Does: Orthogonal CFG (reduces artifacts at high guidance)
FluxGuidanceDisable — Target Model: Flux · What It Does: Disable/zero-out specific CLIP guidance
TorchCompileModels — Target Model: Any · What It Does: torch.compile() for speed
Part 2 — Samplers, Schedulers, Noise & Conditioning
Part 1 showed you the nodes. This part explains the algorithms behind them — what each sampler family does, how schedulers shape the noise curve, which combinations actually work, and how conditioning controls what the model sees.
Quick Start Defaults
Sampler: res_2m or res_3m
Scheduler: beta57
Noise: gaussian
Eta: 0.5
Steps: 15-25 (with RES samplers)
"Typically only 20 steps are needed with RES samplers. Far more are needed with Uni-PC
and other common samplers, and they never reach the same level of quality."
— RES4LYF README
Eta — The Randomness Slider
Eta controls how much random noise is injected back after each denoising step. After the model denoises, it adds some noise back before the next step. Eta scales that amount.
0.0 — Effect: Fully deterministic — same seed = identical image · Use Case: Reproducibility, upscale refinement
0.2-0.3 — Effect: Slight variation, very consistent · Use Case: Final renders, low-denoise passes
0.5 — Effect: Balanced creativity + consistency · Use Case: Default — good for most workflows
0.8-1.0 — Effect: High variation, more "creative" · Use Case: Exploration, trying different looks
Negative — Effect: Ultra-smooth, over-conservative · Use Case: Artifact reduction (experimental)
In practice: Lower eta = sharper, more predictable. Higher eta = softer, more varied between seeds. For upscale/refinement passes at low denoise, use 0.0-0.3 to keep the original intact.
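The eta mechanics follow the standard ancestral-step split used in k-diffusion-style samplers: each step's target sigma is divided into a deterministic part and a noise-injection part scaled by eta. A sketch, assuming that convention applies here:

```python
import math

def ancestral_sigmas(sigma, sigma_next, eta):
    """Split a step: sigma_down is the deterministic target, sigma_up is the
    amount of fresh noise injected afterward. eta=0 -> fully deterministic."""
    sigma_up = min(
        sigma_next,
        eta * math.sqrt(sigma_next ** 2 * (sigma ** 2 - sigma_next ** 2) / sigma ** 2),
    )
    # The two parts recombine in quadrature to land exactly on sigma_next.
    sigma_down = math.sqrt(sigma_next ** 2 - sigma_up ** 2)
    return sigma_down, sigma_up
```

Because the parts combine in quadrature, the total noise level at each step stays on schedule regardless of eta; only the deterministic/random mix changes.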
What the Sampler Name Suffixes Mean
_2m, _3m, _4m = Multistep — uses history from previous steps to predict the next. Faster, fewer steps needed. The number is how many previous steps it remembers.
_2s, _3s, _4s = Single-step (multi-stage) — does multiple internal calculations per step but doesn't use history. More flexible with custom schedules. The number is how many internal stages per step.
Practical: _m variants are faster and what you should use by default. _s variants are for fine-tuning or custom sigma schedules.
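The difference can be sketched with toy 2nd-order updates: a _2m-style step reuses the previous step's derivative (Adams-Bashforth flavor), while a _2s-style step spends an extra model call inside the step (Heun flavor). These are generic numerical sketches, not the actual RES formulas:

```python
def multistep_2m(x, d_curr, d_prev, h):
    """2nd-order multistep: reuse history, one derivative per step."""
    if d_prev is None:                      # first step has no history -> Euler
        return x + h * d_curr
    return x + h * (1.5 * d_curr - 0.5 * d_prev)

def singlestep_2s(x, deriv, h):
    """2-stage single step: an extra derivative evaluation, no history."""
    d1 = deriv(x)
    d2 = deriv(x + h * d1)                  # second internal stage
    return x + h * 0.5 * (d1 + d2)
```

Same order of accuracy, different cost profile: the multistep form gets its second order "for free" from history, which is why _m variants are faster per step.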
Sampler Families — What They Actually Do
The Modern Family (Use These)
All of these are exponential integrators — math solvers specifically designed for how diffusion models work. They converge in ~15-25 steps instead of the 35-50 needed by classical methods. Think of them as "native" solvers for AI image generation.
RES — What It Is: Purpose-built for rectified flow models (Flux etc.) · Character: Sharp detail, aggressive · When to Pick It: Default choice — fastest convergence
DPM++ — What It Is: Diffusion Probabilistic Model solver · Character: Safe, middle-ground · When to Pick It: Familiar from other UIs, works on any model
DEIS — What It Is: Diffusion Exponential Integrator Sampler · Character: Sometimes softer output · When to Pick It: When RES feels too sharp/contrasty
ABNORSETT — What It Is: Adams-Bashforth-Norsett (numerical math method) · Character: Conservative, structure-focused · When to Pick It: When RES adds too much fine detail
ETDRK — What It Is: Exponential Time Differencing Runge-Kutta · Character: Very stable, clean output · When to Pick It: Maximum stability — avoids artifacts
Lawson — What It Is: Lawson exponential integrator · Character: Alternative aesthetic · When to Pick It: When you want a different "look" from RES
In short: Try res_3m first. If the output is too sharp → DEIS or ABNORSETT. Too noisy → ETDRK. Otherwise stick with RES.
Specialized RES Variants
These are alternative mathematical formulations within the RES exponential family. The differences are subtle:
cox_matthews — Character: Slightly different stability profile — try if res_Xs has edge artifacts
lie — Character: Lie group integrator — different convergence pattern
krogstad — Character: Krogstad's method — alternative coefficient set
You don't need these unless you're A/B testing for a specific model. They exist for mathematical completeness.
The Classical Family (Slower but Universal)
These are general-purpose ODE solvers from numerical mathematics. They work on any model but need 2-3x more steps than the modern family.
euler — What It Is: Simplest possible — one straight-line estimate per step · Steps Needed: 30-50 · When to Use: Preview/drafts only, too crude for finals
heun — What It Is: Euler but corrects itself (predict → correct) · Steps Needed: 25-35 · When to Use: Better than euler, still simple
midpoint — What It Is: Takes one step, checks the middle, adjusts · Steps Needed: 25-35 · When to Use: Similar to heun
rk4 — What It Is: Classic 4th-order Runge-Kutta — the "gold standard" classical method · Steps Needed: 30-40 · When to Use: Reliable fallback for unknown models
ralston — What It Is: RK4 with optimized coefficients for minimum error · Steps Needed: 30-40 · When to Use: Slightly more accurate than rk4
dormand-prince — What It Is: Adaptive-precision heritage (used in scientific computing) · Steps Needed: 25-35 · When to Use: When you need proven mathematical reliability
bogacki-shampine — What It Is: Built-in error estimation · Steps Needed: 25-35 · When to Use: Self-correcting behavior
ssprk — What It Is: Strong Stability Preserving RK · Steps Needed: 30-40 · When to Use: Avoids overshooting/oscillation
In short: Use rk4_4s as a safe fallback. Use euler only for quick previews.
The DPM++ Variants Explained
dpmpp_2m — What's Different: 2-step multistep — uses previous step's data. Fast, clean.
dpmpp_3m — What's Different: 3-step multistep — smoother than 2m but slightly slower.
dpmpp_2s — What's Different: 2-stage single-step — no history, each step independent.
dpmpp_3s — What's Different: 3-stage single-step.
dpmpp_sde_2s — What's Different: SDE variant — adds stochastic noise (like built-in eta). More variation per run.
ddim — What's Different: Denoising Diffusion Implicit Model — the original method. Deterministic by default. Feels "flat" compared to modern samplers.
The Implicit Family (For Refinement Only)
These are too slow for main sampling but excellent as polish. They solve each step iteratively until it converges — like checking your work multiple times.
Gauss-Legendre — What It Is: Highest precision per computation · When to Use: Final render polish
Radau IIA — What It Is: Best implicit family for stiff problems · When to Use: Difficult models with artifacts
Lobatto — What It Is: Various endpoint handling strategies · When to Use: Specialized edge cases
How to use: Set implicit_steps to 2-3 with one of these as the implicit solver. Your main sampler does the heavy lifting, the implicit solver polishes each step.
Hybrid Samplers
Prediction + correction blend. The sampler predicts (explicit step), then corrects (implicit step) in one go.
pec423 — Character: Predict-Evaluate-Correct, 4 stages
pec433 — Character: Similar with 3-stage correction
Implicit Steps — What They Actually Do
After each main denoising step, the sampler can re-solve that step N more times until the answer converges. Like proofreading a sentence multiple times.
0 (default) — Speed Impact: None · Quality Impact: Normal · Recommended For: Everything — most workflows
2 — Speed Impact: ~40% slower · Quality Impact: Cleaner edges, fewer artifacts · Recommended For: Final renders worth polishing
3 — Speed Impact: ~60% slower · Quality Impact: Diminishing returns vs 2 · Recommended For: When 2 still shows artifacts
5+ — Speed Impact: 2x+ slower · Quality Impact: Nearly zero improvement · Recommended For: Don't bother
Only works with implicit-capable samplers (Gauss-Legendre, Radau, Lobatto families, plus diagonally implicit ones).
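The re-solve loop is easier to see on a toy problem. This is a minimal sketch of the idea, not RES4LYF's actual solver: a backward-Euler step re-solved by fixed-point iteration, where each extra pass shrinks the error until the returns diminish (the same 0-vs-2-vs-3-vs-5 pattern as the table above).

```python
# Toy sketch of the re-solve loop (not RES4LYF's actual solver):
# a backward-Euler step x_new = x + h * f(x_new), solved by
# fixed-point iteration. More passes converge on the same answer.

def f(x):
    return -x  # simple decay ODE standing in for the denoiser

def implicit_step(x, h, implicit_steps):
    x_new = x + h * f(x)              # explicit predictor (the main step)
    for _ in range(implicit_steps):   # each pass re-solves the same step
        x_new = x + h * f(x_new)      # corrector: re-evaluate at the new point
    return x_new

exact = 1.0 / 1.1  # closed-form backward-Euler result for f(x) = -x
for n in (0, 2, 3, 5):
    print(n, abs(implicit_step(1.0, 0.1, n) - exact))
```

Running it shows the error dropping sharply from 0 to 2 passes, then barely moving past 3 — which is why 5+ implicit steps is wasted compute.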
Noise Scaling Modes — What They Change
These control when during the sampling process noise gets injected. Combined with eta, they shape the character of randomness:
hard — What It Does: Full noise early, drops off fast · Visual Effect: Explores different compositions early → locks in details late. Most variation.
hard_var — What It Does: Hard with variance tracking · Visual Effect: Similar to hard, more controlled
soft — What It Does: Gentle, consistent noise throughout · Visual Effect: Conservative results. Less variation between seeds. Safer.
soft-linear — What It Does: Soft with linear fade · Visual Effect: Smooth transition from some noise to none
softer — What It Does: Even less noise than soft · Visual Effect: Maximum safety. Minimal seed-to-seed variation.
lorentzian — What It Does: Peaked — most noise in middle steps · Visual Effect: Balances early exploration with late refinement
sinusoidal — What It Does: Sine wave — noise oscillates · Visual Effect: Creative oscillation between exploring and refining
exp — What It Does: Exponential curve (can exceed 1.0) · Visual Effect: Extreme variation. Experimental.
eps — What It Does: Epsilon-based scaling · Visual Effect: Technical — matches epsilon prediction models
vpsde — What It Does: Variance-Preserving SDE · Visual Effect: Mathematically "correct" for VP diffusion
er4 — What It Does: ER4-specific scaling · Visual Effect: Specialized
none — What It Does: No noise added between steps · Visual Effect: Fully deterministic ODE (ignores eta)
Practical combos:
hard + eta=0.5 → Good default. Creative early, precise late.
soft + eta=0.3 → Safe refinement. Consistent results.
none + any eta → Deterministic regardless of eta setting.
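The curve shapes above can be sketched as simple functions of sampling progress. These formulas are illustrative stand-ins for the described shapes, not RES4LYF's internal definitions:

```python
# Illustrative noise-scale curves over sampling progress t in [0, 1]
# (0 = first step, 1 = last step). Shapes match the descriptions
# above; the exact formulas are stand-ins, not RES4LYF internals.

def hard(t):        # full noise early, drops off fast
    return (1.0 - t) ** 2

def soft(t):        # gentle, roughly constant throughout
    return 0.5

def lorentzian(t):  # peaked: most noise in the middle steps
    return 1.0 / (1.0 + ((t - 0.5) / 0.15) ** 2)

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  hard={hard(t):.2f}  lorentzian={lorentzian(t):.2f}")
```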
Schedulers
beta57 — What It Does: Custom RES4LYF scheduler (alpha=0.5, beta=0.7). Optimized for RES samplers at 15-25 steps. Use this by default.
karras — What It Does: Karras noise schedule — concentrates steps in the high-detail range. Classic choice for DPM++ variants.
exponential — What It Does: Exponential spacing — good for high step counts (30+)
normal — What It Does: Linear spacing — standard, nothing special
simple — What It Does: Uniform spacing — equal sigma gaps between steps
sgm_uniform — What It Does: Score-based Generative Model uniform — good for low-denoise refinement
+ all other ComfyUI native schedulers
Noise Types — Complete Reference
For Init Noise (noise_type_init) and SDE Noise (noise_type_sde)
Standard — Type: gaussian · Character: Bell curve randomness · When to Use: Default — always safe
Type: gaussian_backwards · Character: Reversed gaussian · When to Use: Experimental denoising
Type: brownian · Character: Random walk (smooth, correlated) · When to Use: Smoother base images
Type: uniform · Character: Flat randomness · When to Use: Even noise coverage
Type: laplacian · Character: Peaked with sharp tails · When to Use: More contrast in noise → sharper detail
Type: studentt · Character: Heavy-tailed gaussian · When to Use: More extreme outliers → dramatic variation
Spatial — Type: perlin · Character: Coherent (neighboring pixels correlated) · When to Use: Structured variation, landscape-like
Type: wavelet · Character: Frequency bands · When to Use: Controlled multi-scale noise
Colored — Type: pink (α=1) · Character: Low frequency emphasis · When to Use: Natural feel, smooth large-area variation
Type: brown (α=2) · Character: Very low frequency · When to Use: Gentle, sweeping changes
Type: white (α=0) · Character: Equal all frequencies · When to Use: Standard unstructured
Type: blue (α=-1) · Character: High frequency emphasis · When to Use: Fine grain, detailed texture
Type: violet (α=-2) · Character: Very high frequency · When to Use: Extremely fine detail
Type: ultraviolet_A/B/C (α=-3/-4/-5) · Character: Extreme high frequency · When to Use: Aggressive fine detail (experimental)
Pyramid — Type: pyramid-bicubic · Character: Multi-scale (bicubic upscale) · When to Use: Natural multi-resolution variation
Type: pyramid-bilinear · Character: Multi-scale (bilinear upscale) · When to Use: Faster pyramid
Type: hires-pyramid-* · Character: High-res pyramid variants · When to Use: Higher quality multi-scale
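The colored-noise family follows a 1/f^alpha power law: amplitude at frequency f scales as f^(-alpha/2). A quick sketch of the relative per-frequency amplitudes for the named colors, using the alpha values from the table above:

```python
# Colored noise has a 1/f**alpha power spectrum, so amplitude at
# frequency f scales as f**(-alpha / 2). Relative amplitudes per
# octave for the named colors (alpha values as in the table above).

def amplitude(freq, alpha):
    return freq ** (-alpha / 2.0)

colors = {"brown": 2, "pink": 1, "white": 0, "blue": -1, "violet": -2}
for name, alpha in colors.items():
    amps = [round(amplitude(f, alpha), 3) for f in (1, 2, 4, 8)]
    print(f"{name:>6} (alpha={alpha:+d}): {amps}")
```

Brown falls off fastest (smooth, sweeping noise), white is flat, and violet grows with frequency (extremely fine grain), matching the table's descriptions.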
Noise Modes (noise_mode_sde, mode)
When during the sampling schedule noise is strongest:
hard — Curve Shape: Full early, drops fast · Effect: Default — most variation early, locks in late
hard_var — Curve Shape: Hard + variance preserving · Effect: More mathematically correct version of hard
soft — Curve Shape: Gentle throughout · Effect: Conservative, consistent seeds
soft-linear — Curve Shape: Soft with linear decay · Effect: Smooth fade-out
softer — Curve Shape: Very gentle · Effect: Maximum consistency
lorentzian — Curve Shape: Bell-shaped peak in middle · Effect: Balanced exploration/refinement
sinusoidal — Curve Shape: Wave pattern · Effect: Oscillates between exploring and refining
exp — Curve Shape: Exponential · Effect: Can exceed 1.0 — extreme variation
eps — Curve Shape: Epsilon-based · Effect: Matches epsilon prediction models
vpsde — Curve Shape: Variance preserving SDE · Effect: Mathematically correct for VP diffusion
er4 — Curve Shape: ER4-specific · Effect: Specialized
none — Curve Shape: No scaling · Effect: Fully deterministic (overrides eta)
Standard
gaussian — What It Does: Normal bell-curve randomness. Safe default.
gaussian_backwards — What It Does: Reversed — specialized denoising pattern. Experimental.
brownian — What It Does: Random walk — each sample depends on previous. Smoother than gaussian.
uniform — What It Does: Flat randomness — all values equally likely. Less natural-looking.
laplacian — What It Does: Peaked — most noise near zero with occasional spikes. Sharper detail emphasis.
studentt — What It Does: Heavy-tailed — like gaussian but with more extreme outliers. More dramatic.
perlin — What It Does: Coherent spatial noise — neighboring pixels are correlated. Creates structured variation.
wavelet — What It Does: Frequency-based — noise at specific frequency bands
none — What It Does: No noise
Colored Noise
These affect which frequencies the noise emphasizes:
pink — Character: Emphasizes low frequencies — natural feel, smooth large-scale variation
brown — Character: Even more low-frequency — very smooth, large-area changes
blue — Character: Emphasizes high frequencies — fine-grained, detailed noise
violet — Character: Very high frequency — extremely fine detail emphasis
white — Character: Equal at all frequencies — unstructured randomness
fractal — Character: Customizable frequency balance via alpha parameter
Pyramid Noise
Multi-scale noise that adds variation at multiple resolutions simultaneously. Produces more natural-looking variation than single-scale noise:
pyramid-bicubic / pyramid-bilinear / pyramid-nearest — different upscale interpolation for the pyramid layers
hires-pyramid-* — high-resolution variants
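The construction is easy to sketch: generate noise at several resolutions, upsample each coarse layer to full size, and sum. The per-level weights and nearest-neighbor upsampling below are illustrative choices, not the node's exact parameters:

```python
import random

# Sketch of pyramid noise on a 1-D signal: noise generated at several
# resolutions, each coarse layer upsampled to full size (nearest-
# neighbor here; the node names refer to bilinear/bicubic variants)
# and summed. The per-level decay weight is an illustrative choice.

def pyramid_noise(size, levels, seed=0, decay=0.5):
    rng = random.Random(seed)
    total = [0.0] * size
    for level in range(levels):
        coarse = max(1, size >> level)              # halve resolution per level
        layer = [rng.gauss(0, 1) for _ in range(coarse)]
        weight = decay ** level
        for i in range(size):
            total[i] += weight * layer[i * coarse // size]  # nearest upsample
    return total

print([round(v, 2) for v in pyramid_noise(16, levels=4)])
```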
Presets — Copy These
Quick Draft
Sampler: res_2m | Scheduler: beta57 | Steps: 15 | Eta: 0.5 | Noise: gaussian | Mode: hard
Standard Quality (Recommended)
Sampler: res_3m | Scheduler: beta57 | Steps: 20 | Eta: 0.5 | Noise: gaussian | Mode: hard
High Quality Final
Sampler: res_3m | Scheduler: beta57 | Steps: 25 | Eta: 0.3 | Noise: gaussian | Mode: soft | implicit_steps: 2
Deterministic / Upscale Refinement
Sampler: res_2m | Scheduler: beta57 | Steps: 20 | Eta: 0.0 | Noise: gaussian | Mode: hard
Same seed always produces identical output.
Flux (Standard — No CFG)
Sampler: res_3m | Scheduler: beta57 | Steps: 25-30 | CFG: 1.0 | Eta: 0.5 | Noise: gaussian | Mode: lorentzian
DetailBoost method: sampler | SigmaScaling: lying=0.97, lying_inv=1.02, s_noise=1.04 | channelwise_cfg: False
Flux uses guidance distillation, not CFG. No negative prompt. Use conservative sigma
scaling — Flux's linear schedule amplifies lying effects more than SDXL.
Z-Image Base (True CFG)
Sampler: res_3m | Scheduler: beta57 | Steps: 28-50 | CFG: 3.0-5.0 | Eta: 0.5 | Noise: gaussian | Mode: hard
DetailBoost method: sampler | SigmaScaling: lying=0.97, lying_inv=1.02, s_noise=1.04 | channelwise_cfg: True
Z-Image Base uses true CFG with negative prompts. Same rectified flow as Flux (shift=3.0)
but with standard dual-pass guidance — channelwise_cfg=True helps at higher CFG values.
Z-Image Turbo / ZIT (Guidance-Free)
Sampler: res_2m | Scheduler: beta57 | Steps: 8-10 | CFG: 1.0 | Eta: 0.3 | Noise: gaussian | Mode: lorentzian
DetailBoost method: sampler | SigmaScaling: lying=0.98, lying_inv=1.01, s_noise=1.02 | channelwise_cfg: False
Z-Image Turbo is guidance-free (Decoupled-DMD distillation). No negative prompt, no CFG.
Very low step count — keep sigma scaling minimal to avoid artifacts in few-step sampling.
Qwen-Image (True CFG)
Sampler: res_3m | Scheduler: beta57 | Steps: 30-50 | CFG: 4.0 | Eta: 0.5 | Noise: gaussian | Mode: hard
DetailBoost method: model | SigmaScaling: lying=0.95, lying_inv=1.03, s_noise=1.04 | channelwise_cfg: True
Qwen-Image (20B MMDiT) uses true CFG with negative prompts. Larger pixel budget (1328²)
means more detail capacity — lying=0.95 is enough. Uses Qwen2.5-VL as text encoder.
Qwen-Image Distilled / Lightning (No CFG)
Sampler: res_2m | Scheduler: beta57 | Steps: 4-15 | CFG: 1.0 | Eta: 0.3 | Noise: gaussian | Mode: lorentzian
DetailBoost method: sampler | SigmaScaling: lying=0.98, lying_inv=1.01, s_noise=1.02 | channelwise_cfg: False
Distilled/Lightning variants are guidance-free. No negative prompt. Conservative settings
for few-step sampling.
Low-Denoise Refinement Pass
Sampler: res_2m | Scheduler: beta57 | Steps: 15-20 | Eta: 0.0-0.2 | Denoise: 0.2-0.3
SDXL / SD1.5
Sampler: res_2m | Scheduler: karras or beta57 | Steps: 25-40 | Eta: 0.5 | Noise: gaussian
Special Features
Denoise Behavior
Both ComfyUI's KSampler and RES4LYF use sigma-slicing — they compute a larger schedule and take the tail end. All requested steps always execute across the narrower noise range. See Latent Space, Denoise & Upscaling for details.
Implicit Refinement Types
"bongmath" — custom refinement approach
"rebound" — with CFG decay
"retro-eta" — backward eta correction
"predictor-corrector" — classic predict-then-fix
Cycle/Unsampling
Supports inversion workflows:
Rebound CFG decay
Eta decay scaling
Unsample mode for image-to-noise inversion
Conditioning — Control What the Model "Sees"
Conditioning = the prompt embedding tensors the model uses during sampling. RES4LYF gives
you tools to manipulate these embeddings directly — blending, scheduling, restricting to
specific steps, and applying them to different spatial regions.
Conditioning Strategy — When to Split Prompts
Single CLIPTextEncode works universally — it handles SDXL, SD3.5, Flux, DiT, WAN, etc.
Even Flux routes the same text to both CLIP-L and T5-XXL internally. You only need
CLIPTextEncodeFluxUnguided when you want different text per encoder (an optimization, not
a requirement).
When splitting prompts is worth it:
Regional (spatial) — Multi-subject scenes, face vs body vs background. Separate CTE per region into ClownRegionalConditioning_ABC + masks
Timestep (temporal) — Composition vs detail control. CTE "layout" at range 0.0–0.5, CTE "detail" at range 0.5–1.0, then Combine
Dual-tower (Flux only) — Fine-tune CLIP-L vs T5-XXL separately. Use CLIPTextEncodeFluxUnguided with different clip_l / t5xxl text
When splitting prompts is NOT worth it:
Separate CTE per concept into ConditioningCombine — Just concatenates tokens. A single well-written prompt does the same. Only helps if hitting 77-token CLIP-L limit on SDXL
Separate CTE per concept into ConditioningAverage — Averaging embeddings destroys information. You get a blurry middle-ground between concepts
One CTE per "setting" (lighting, mood, etc.) — Overhead for no gain. The model already parses "dramatic lighting, moody atmosphere" from a single prompt
Recommended approach per pipeline stage:
Stage 1 (simple): Single CTE with a well-structured prompt. Good enough for 90% of images
Stage 1 (multi-subject): Regional conditioning with masks — genuinely better for spatial control
Stage 1 (advanced): Timestep scheduling — composition prompt early, detail prompt late
Stage 3 (refine): Regional "sharp detail" for subject, "soft bokeh" for background
Stage 4 (face fix): ClownRegionalConditioning2 with face-specific prompt + mask
In short: Don't split prompts unless you're directing them to different places (regional) or
different times (timestep scheduling). Splitting just to recombine is busywork.
Regional Conditioning — Different Prompts for Different Areas
The most immediately useful conditioning feature. Instead of one prompt for the whole image,
assign different prompts to masked regions.
ClownRegionalConditioning_AB — 2 regions (e.g., subject vs background)
ClownRegionalConditioning_ABC — 3 regions (e.g., face vs body vs background)
ClownRegionalConditioning2 — Simplified 2-region (takes masked/unmasked)
ClownRegionalConditioning3 — Simplified 3-region (auto-computes third mask)
Connection flow:
CLIP Encode "detailed face, sharp eyes" ─→ conditioning_A
CLIP Encode "ornate armor, leather" ─→ conditioning_B
CLIP Encode "castle interior, moody" ─→ conditioning_C
Face mask from detection ─→ mask_A
Body mask (face excluded) ─→ mask_B
Everything else (auto: 1 - A - B) ─→ mask_C (or leave empty for _ABC variant)
ClownRegionalConditioning_ABC
├─ weight: 1.0 (base regional strength)
├─ region_bleed: 0.15 (soft transition at region edges)
├─ region_bleed_start_step: 0
├─ mask_type: "gradient" (smooth blending, not hard cutoff)
├─ edge_width: 0 (no extra edge padding)
└─→ CONDITIONING → sampler positive input
Key parameters:
weight — Default: 1.0 · What it does: How strongly regional conditioning applies (0 = off, 1 = full)
region_bleed — Default: 0.0 · What it does: Soft falloff at region boundaries (0 = hard edge, 0.1-0.2 = smooth)
region_bleed_start_step — Default: 0 · What it does: Which step to start bleed (later = sharper initial separation)
mask_type — Default: "boolean" · What it does: "gradient" = smooth blending. "boolean" = hard on/off
edge_width — Default: 0 · What it does: Extra blur at mask edges (in pixels)
weight_scheduler — Default: "constant" · What it does: Change weight across steps (constant, linear, sqrt, etc.)
start_step / end_step — Default: 0 / -1 · What it does: Step range where regional conditioning is active
mask_type options (for _AB variant):
gradient — smooth blending for both regions
gradient_A / gradient_B — smooth for one region, hard for the other
boolean — hard on/off for both regions
boolean_A / boolean_B — hard for one region, gradient for the other
For _ABC variant — same options plus gradient_AB, gradient_AC, gradient_BC, boolean_AB, etc.
How it works internally: The regional node creates a callback that runs during sampling,
not at node setup time. It detects your model type (Flux, SDXL, WAN, HiDream) and creates
appropriate attention masks. Both region embeddings get summed, with attention masks gating
which region influences which spatial area.
When to use: Generation stage (Stage 1) when you want spatial prompt control. Also powerful at refinement (Stage 3) — e.g., "sharp detail" for subject, "soft bokeh" for background.
Timestep Scheduling — Different Prompts at Different Steps
ConditioningSetTimestepRange — Restrict a conditioning to only part of the diffusion process.
Example: Composition first, detail later
CLIP Encode "wide landscape, dramatic sky"
→ ConditioningSetTimestepRange: start=0.0, end=0.5
→ "composition prompt" — only active first half
CLIP Encode "highly detailed, sharp focus, professional photography"
→ ConditioningSetTimestepRange: start=0.5, end=1.0
→ "detail prompt" — only active second half
Combine both → sampler positive input
start/end are percentages of total sampling (0.0 = beginning, 1.0 = end)
The model builds composition in early steps, adds detail in late steps
This mirrors how diffusion works — early steps = structure, late steps = texture
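The gating logic reduces to a simple range check per step. The tuple structure below is a hypothetical illustration; ComfyUI actually stores the start/end percentages inside the conditioning dict:

```python
# Minimal data model of timestep-ranged conditioning: each prompt
# carries a (start, end) fraction of the sampling run, and at each
# step only prompts whose range covers current progress are active.
# (Hypothetical structure for illustration, not ComfyUI's internal one.)

prompts = [
    ("wide landscape, dramatic sky", 0.0, 0.5),   # composition: first half
    ("highly detailed, sharp focus", 0.5, 1.0),   # detail: second half
]

def active_prompts(progress):
    return [text for text, start, end in prompts if start <= progress < end]

total_steps = 20
for step in (0, 5, 10, 15):
    print(step, active_prompts(step / total_steps))
```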
Conditioning Blend — Smooth Prompt Transitions
ConditioningAverage — Interpolate between two prompt embeddings.
conditioning_to: "photorealistic portrait"
conditioning_from: "oil painting portrait"
conditioning_to_strength: 0.7
→ Result: 70% photorealistic, 30% oil painting influence
strength = 0.0 → 100% from conditioning_from
strength = 1.0 → 100% from conditioning_to
Handles mismatched token lengths by zero-padding the shorter one
Blends both the main token embeddings AND the pooled (global) output
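A sketch of the blend on plain lists (standing in for the real token-embedding tensors), including the zero-padding behavior:

```python
# Sketch of an embedding average: linear interpolation, after
# zero-padding the shorter token sequence. Plain lists stand in
# for the real [tokens, dims] conditioning tensors.

def blend(cond_to, cond_from, to_strength):
    n = max(len(cond_to), len(cond_from))
    a = cond_to + [0.0] * (n - len(cond_to))      # pad shorter with zeros
    b = cond_from + [0.0] * (n - len(cond_from))
    return [to_strength * x + (1 - to_strength) * y for x, y in zip(a, b)]

photo = [1.0, 0.0, 2.0]
paint = [0.0, 1.0]                # shorter "prompt" gets zero-padded
print(blend(photo, paint, 0.7))   # 70% photo, 30% paint influence
```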
ConditioningAverageScheduler — Same blend, but the ratio changes per step.
conditioning_0: "base quality prompt"
conditioning_1: "enhanced detail prompt"
ratio: SIGMAS input (one value per step)
→ At each step, blend ratio comes from the sigma value
→ Early steps (high sigma): more base prompt
→ Late steps (low sigma): more detail prompt
Takes a SIGMAS input (from any sigma generator) as the blend schedule. Each step gets a
different blend ratio. Useful for progressive prompt transitions during sampling.
Conditioning Math — Direct Manipulation
ConditioningMultiply — Scale all prompt embeddings by a number.
multiplier: 1.3 → 30% stronger prompt influence
multiplier: 0.7 → 30% weaker
multiplier: -1.0 → inverted (used for negative conditioning tricks)
Recursively multiplies every tensor in the conditioning structure (embeddings, pooled, etc.).
ConditioningAdd — Add scaled conditioning_2 onto conditioning_1.
conditioning_1: "portrait of a woman"
conditioning_2: "detailed eyes, sharp iris"
multiplier: 0.5 → half-strength addition
→ Result: original prompt + 50% of the "eyes" emphasis
Useful for adding emphasis without re-encoding prompts. Note: modifies conditioning_1 in-place.
Orthogonal-Collinear Decomposition — Surgical Blending
ConditioningOrthoCollin — The most mathematically sophisticated blend node.
Instead of simple interpolation, it decomposes two conditionings into:
Collinear component = what's shared between both prompts (same direction)
Orthogonal component = what's unique to each prompt (perpendicular direction)
conditioning_0: "beautiful woman, detailed face"
conditioning_1: "professional photography, studio lighting"
t5_strength: 1.0
→ 1.0 = favor conditioning_0's direction
→ 0.0 = favor conditioning_1's direction
→ 0.5 = equal blend of directions
clip_strength: 1.0
→ Same control but for the global (pooled) output
When to use: When simple averaging muddles both prompts. OrthoCollin preserves the unique
aspects of each prompt while blending the shared aspects. Best for combining "subject" and
"style" prompts without losing either.
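The decomposition itself is ordinary vector projection. A sketch on plain 2-D vectors; the real node applies the same math to embedding tensors:

```python
# Orthogonal/collinear split via projection: project b onto a
# (collinear part = shared direction), subtract to get the
# orthogonal part (what is unique to b).

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def decompose(a, b):
    scale = dot(b, a) / dot(a, a)             # projection coefficient
    collinear = [scale * x for x in a]        # component of b along a
    orthogonal = [x - c for x, c in zip(b, collinear)]
    return collinear, orthogonal

a = [1.0, 0.0]          # "subject" direction
b = [2.0, 3.0]          # "style" embedding
col, orth = decompose(a, b)
print(col, orth)        # → [2.0, 0.0] [0.0, 3.0]
assert abs(dot(orth, a)) < 1e-9   # orthogonal part shares no direction with a
```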
SD3.5-Specific — Truncation Nodes
ConditioningTruncate — Caps positive conditioning at 77 tokens × 4096 dims.
ConditioningZeroAndTruncate — Zeros AND truncates negative conditioning to 154 tokens.
SD3.5M degrades badly if conditioning exceeds these limits. Apply to respective positive/negative
before the sampler:
Positive prompt → ConditioningTruncate → sampler positive
Negative prompt → ConditioningZeroAndTruncate → sampler negative
Only needed for SD3.5M. Flux, SDXL, and other models don't need this.
Flux-Specific — Dual-Tower Encoding
CLIPTextEncodeFluxUnguided — Encode separate prompts for Flux's dual text encoders.
clip_l: "portrait, studio lighting" (CLIP-L — 77 token max, global concepts)
t5xxl: "detailed description of scene..." (T5-XXL — 256+ tokens, fine detail)
Returns:
conditioning → sampler
clip_l_end → token end position (INT)
t5xxl_end → token end position (INT)
Flux uses two text encoders with different strengths. CLIP-L handles global concepts,
T5-XXL handles fine-grained detail. This node lets you independently tune what each
encoder sees.
Style Transfer via Conditioning
StyleModelApplyStyle — Apply visual style from a reference image (Flux Redux).
Load reference image
→ CLIP Vision Encode → clip_vision_output
→ Load Style Model → style_model
StyleModelApplyStyle
├─ conditioning: your text conditioning
├─ style_model: loaded Flux Redux model
├─ clip_vision_output: encoded reference image
└─ strength: 1.0 (declared but currently unused in code)
→ CONDITIONING with style embeddings injected
The style model extracts visual features from the reference image and merges them into
the conditioning's cross-attention layer. The model then generates images with similar
visual style to the reference.
Use at: Generation (Stage 1) or refinement (Stage 3) for style consistency.
Conditioning Precision
Conditioning Recast FP64 — Cast conditioning tensors to float64.
cond_0: your conditioning (required)
cond_1: second conditioning (optional)
→ Both outputs recast to float64 precision
Use before precision-sensitive operations like OrthoCollin decomposition or when chaining
multiple conditioning math operations (multiply → add → average) to prevent floating-point
drift.
Part 3 — Resolution, Latent Space & Prompt Theory
Choose Your Resolution
Every model was trained at a specific pixel budget (total pixels = width × height). Generating outside that budget degrades quality — too high causes duplicated compositions and multi-head artifacts, too low loses detail and looks soft.
The Rule: Change aspect ratio by changing dimensions, but keep total pixels near the training target.
Quick Reference
SD 1.5 — Native: 512×512 · Megapixels: ~0.26 MP · Alignment: ÷8 · Aspect Range: ~1:2 to 2:1
SDXL — Native: 1024×1024 · Megapixels: ~1.05 MP · Alignment: ÷8 (÷64 rec.) · Aspect Range: ~1:2.4 to 2.4:1
SD3.5 Large — Native: 1024×1024 · Megapixels: ~1.0 MP · Alignment: ÷16 · Aspect Range: ~1:2 to 2:1
SD3.5 Medium — Native: 1024×1024 · Megapixels: 0.25–2.0 MP · Alignment: ÷16 · Aspect Range: Wide (multi-res trained)
Flux.1 — Native: 1024×1024 · Megapixels: 0.1–2.0 MP · Alignment: ÷16 · Aspect Range: ~1:2 to 2:1+
Z-Image — Native: 1024×1024 · Megapixels: ~1.05 MP · Alignment: ÷16 · Aspect Range: Flexible (512–2048)
HiDream — Native: 1024×1024 · Megapixels: ~1.05 MP · Alignment: ÷16 · Aspect Range: Similar to Flux
Qwen-Image — Native: 1328×1328 · Megapixels: ~1.54–1.76 MP · Alignment: ÷16 · Aspect Range: 7 fixed buckets
SDXL Bucket Sizes (Training Aspect Ratios)
SDXL was trained on these exact bucket sizes at 64-pixel increments — these are the safest choices:
1024×1024 — Ratio: 1:1 · Megapixels: 1.05
1152×896 / 896×1152 — Ratio: ~4:3 / ~3:4 · Megapixels: 1.03
1216×832 / 832×1216 — Ratio: ~3:2 / ~2:3 · Megapixels: 1.01
1344×768 / 768×1344 — Ratio: ~16:9 / ~9:16 · Megapixels: 1.03
1536×640 / 640×1536 — Ratio: ~21:9 / ~9:21 · Megapixels: 0.98
Flux / Z-Image / SD3.5 / HiDream Aspect Ratios
These transformer-based models use RoPE (rotary positional embeddings) and handle variable resolutions more gracefully. Target ~1.0 MP for best quality:
1024×1024 — Ratio: 1:1 · Megapixels: 1.05
1152×896 / 896×1152 — Ratio: ~4:3 / ~3:4 · Megapixels: 1.03
1344×768 / 768×1344 — Ratio: ~16:9 / ~9:16 · Megapixels: 1.03
1536×640 / 640×1536 — Ratio: ~21:9 / ~9:21 · Megapixels: 0.98
Flux officially supports 0.1–2.0 MP, Z-Image supports 512–2048px per side, and SD3.5 Medium was progressively trained from 256 to 1440px — all three are more resolution-flexible than SDXL.
Qwen-Image Bucket Sizes (2509 / 2512)
Qwen-Image is a 20B MMDiT with its own fixed aspect ratio buckets. It runs at a higher pixel budget than other models (~1.54–1.76 MP). The "2509" and "2512" suffixes are release dates (Sept/Dec 2025), not parameter counts. Uses Qwen2.5-VL as text encoder, 16-channel VAE with 8× compression.
Qwen-Image-2512 fixed the 4:3/3:4 bucket to be cleanly ÷16 (1140→1104). Use these 2512 values:
1328×1328 — Ratio: 1:1 · Megapixels: 1.76
1664×928 / 928×1664 — Ratio: ~16:9 / ~9:16 · Megapixels: 1.54
1472×1104 / 1104×1472 — Ratio: ~4:3 / ~3:4 · Megapixels: 1.63
1584×1056 / 1056×1584 — Ratio: ~3:2 / ~2:3 · Megapixels: 1.67
Why ÷8 vs ÷16?
All models use an 8× VAE (image → latent is 8× smaller per side). Dimensions must be divisible by 8 minimum.
Transformer models (SD3, Flux, Z-Image, HiDream, Qwen-Image) also apply 2×2 patchification on the latent, so pixel dimensions must be divisible by 8 × 2 = 16.
SDXL's training buckets used 64-pixel increments — ÷64 alignment is recommended for optimal bucket matching.
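A hypothetical helper that applies these rules: pick width/height for a given aspect ratio near a model's megapixel budget, snapped to the required alignment multiple. Snapped results land near, but not always exactly on, the official bucket sizes:

```python
import math

# Hypothetical resolution picker: solve (aspect * h) * h = pixel budget
# for h, then snap both dimensions to the alignment multiple
# (16 for transformer models, 64 recommended for SDXL buckets).

def pick_resolution(aspect, target_mp, align):
    target_px = target_mp * 1_000_000
    h = math.sqrt(target_px / aspect)
    w = aspect * h
    snap = lambda v: max(align, round(v / align) * align)
    return snap(w), snap(h)

print(pick_resolution(16 / 9, 1.05, 16))   # Flux-style 16:9 → (1360, 768)
print(pick_resolution(1.0, 1.76, 16))      # Qwen-Image square → (1328, 1328)
```

Note the 16:9 result (1360×768) is close to, but not identical to, the trained bucket 1344×768 — for SDXL, prefer the exact bucket table over computed values.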
What Happens Outside the Budget
Too large (>1.5× the training MP):
Duplicated compositions, multiple heads/bodies
"Image-within-image" tiling artifacts
Coherent center, degraded edges
Too small (<0.5× the training MP):
Soft/blurry output, loss of fine detail
Oversimplified compositions
The fix for larger output: Generate at native resolution → pixel-space upscale → refine. That's what Stage 2 and 3 are for.
Latent Space, Denoise & Upscaling — How It Actually Works
Understanding what happens under the hood is essential for choosing the right denoise, step count, and upscale strategy in the pipeline stages that follow.
How Denoise Works
Denoise controls how much of the noise schedule is used.
denoise=1.0 → full noise → full denoising (complete generation)
denoise=0.5 → 50% noise added → model refines the other 50%
denoise=0.2 → 20% noise added → model barely touches the image
Under the hood (verified in ComfyUI source — comfy/samplers.py line 1148):
new_steps = int(steps / denoise) # e.g., 30 / 0.2 = 150
sigmas = calculate_sigmas(new_steps) # compute full 150-step sigma schedule
self.sigmas = sigmas[-(steps + 1):] # take last 31 values (low-noise tail)
All requested steps always execute — ComfyUI does NOT skip steps. It computes a larger sigma schedule and slices the tail end, so all 30 steps run across the narrower noise range. RES4LYF uses the same approach.
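The same arithmetic is runnable with a dummy linear schedule standing in for calculate_sigmas():

```python
# Runnable version of the slicing arithmetic above. The linear
# schedule is a stand-in for ComfyUI's calculate_sigmas().

def calculate_sigmas(n):
    # n + 1 values from sigma_max (14.0) down to 0, evenly spaced
    return [14.0 * (1 - i / n) for i in range(n + 1)]

steps, denoise = 30, 0.2
new_steps = int(steps / denoise)        # 30 / 0.2 = 150
full = calculate_sigmas(new_steps)      # 151 sigma values
sliced = full[-(steps + 1):]            # last 31 values: the low-noise tail
print(len(sliced), round(sliced[0], 2), sliced[-1])
```

All 30 steps run, but only across the bottom 20% of the noise range (here sigma 2.8 down to 0 instead of 14 down to 0).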
Why Low Denoise Degrades Quality
The Scheduler Problem: The noise curve is non-linear. Structural/compositional work happens early (high sigma), detail refinement happens later (low sigma). Below ~0.5 denoise, the model skips structural steps entirely.
VAE Round-Trip Loss: Image → VAE Encode → Latent → Add Noise → Denoise → VAE Decode. The VAE is lossy — each encode/decode cycle degrades quality. At low denoise, the model has too few steps to fix these artifacts.
Sigma Mismatch: The model was trained on a specific noise schedule. At very low denoise, starting noise levels may fall in a range where predictions are less accurate.
Steps vs Denoise
Both the standard KSampler and RES4LYF use sigma-slicing — they compute a larger sigma schedule and take only the tail end. All requested steps always execute.
With 30 steps at denoise 0.2:
ComfyUI computes sigmas for 150 total steps (30 / 0.2)
Takes the last 31 sigma values (the low-noise 20% of the schedule)
Runs all 30 steps across that narrower range
Each step covers a small sigma delta within the 20% noise range:
More steps = finer steps within the same noise range → more precise refinement
Fewer steps = coarser steps → faster but less precise
Below ~10 steps at low denoise, quality drops because each step is too coarse
The sweet spot for refinement passes is 15-25 steps at denoise 0.2-0.3
Denoise 0.2 — Steps: 15 · Sigma Range: Narrow (low noise) · Character: Quick refinement
Denoise 0.2 — Steps: 25 · Sigma Range: Narrow (low noise) · Character: Precise refinement
Denoise 0.3 — Steps: 20 · Sigma Range: Moderate · Character: Good balance
Denoise 0.5 — Steps: 25 · Sigma Range: Wide · Character: Significant rework
Latent-Space Upscaling — Why It Breaks at Low Denoise
Naive latent upscale (bilinear/bicubic/nearest) interpolates between latent vectors, but latent space is NOT spatially smooth like pixel space. The result is an off-manifold latent — a tensor that doesn't match what the model saw during training.
At denoise 0.5+, enough noise is added to push it back on-manifold. At 0.2, those interpolation artifacts survive.
The Fix — Pixel-Space Upscaling:
KSampler (denoise 1.0) → VAE Decode → Upscale Image → VAE Encode → KSampler (denoise 0.2-0.4)
This keeps latents on-manifold because the VAE encoder produces a proper latent from the upscaled image.
Model-Based Latent Upscalers (e.g., LTX) are trained neural networks that understand their specific latent space. They produce valid on-manifold latents but cannot cross models — each model family has a completely different latent space.
Interpolation Methods for Upscaling
Lanczos > Bicubic > Bilinear for sharpness.
Bilinear — Sharpness: Soft · Artifacts: None · Use Case: Fastest, fine if denoise >= 0.5
Bicubic — Sharpness: Moderate · Artifacts: Slight ringing at edges · Use Case: Good balance
Lanczos — Sharpness: Sharpest · Artifacts: Minor ringing possible · Use Case: Best for photo/realistic
For low-denoise refinement passes, use Lanczos — the sampler won't have enough steps to recover blur from bilinear.
Model-based upscalers (RealESRGAN, 4x-UltraSharp, SwinIR) are dramatically better than any interpolation method.
Turbo/Lightning/Distilled Models — Different Rules
Turbo models were distilled to converge in very few steps. Each step does the equivalent of 4-5 standard steps. Too many steps = overshooting.
Denoise 1.0 — Steps: 6-8
Denoise 0.5 — Steps: 4-6
Denoise 0.2-0.3 — Steps: 3-5
This is the opposite of standard models. Match the intended step granularity.
VAE Operations — Encode/Decode Quality
VAEEncodeAdvanced — Precision Encoding
The most important VAE node for quality. Adds deterministic seeding and flexible multi-input
handling.
image_1: your image (optional)
image_2: second image (optional)
mask: mask image (optional)
latent: existing latent (optional, for size reference)
vae: your VAE model
resize_to_input: "image_1" (auto-size all outputs to this input's dimensions)
mask_channel: "red" (which channel to extract as mask)
invert_mask: False
latent_type: "4_channels" (or "16_channels" for Cascade)
width/height: 1024/1024 (only if resize_to_input = "false")
Returns:
latent_1: encoded image_1
latent_2: encoded image_2
mask: extracted mask
empty_latent: matching empty latent
width: actual width used
height: actual height used
Why it matters: Standard VAE encode is non-deterministic — running it twice on the same
image produces slightly different latents. VAEEncodeAdvanced sets torch.manual_seed(42)
before encoding, guaranteeing identical results every run. This matters for reproducible
workflows and consistent latent channel statistics.
Use at: Any stage where you encode an image to latent (upscale → re-encode, mask extraction, img2img input).
LatentUpscaleWithVAE — The Right Way to Upscale Latents
Decode → pixel-space upscale → re-encode. Avoids the problems of pure latent-space interpolation.
latent: your latent
width: target width
height: target height
vae: your VAE model
→ Decodes to image, resizes, re-encodes
→ Preserves state_info metadata (denoised, data_prev_ for video)
Uses deterministic seed (42). Handles video latents (5D tensors) by flattening to batch
dimension, processing, and restacking. Preserves the state_info dictionary that
ClownsharKSampler uses for multi-step state tracking.
Use at: Stage 2 (upscale) as alternative to separate decode → upscale → encode chain. Simpler wiring, same result.
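The 5D video handling follows a standard pattern — flatten time into the batch axis, run any per-frame operation, restack. A sketch of that general pattern (an illustration, not RES4LYF's exact code):

```python
import numpy as np

def process_video_latent(latent, frame_fn):
    # latent: (B, C, T, H, W) — move T next to B, merge, process, restore
    b, c, t, h, w = latent.shape
    frames = latent.transpose(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
    out = frame_fn(frames)                 # any (N, C, H, W) operation
    _, c2, h2, w2 = out.shape
    return out.reshape(b, t, c2, h2, w2).transpose(0, 2, 1, 3, 4)

vid = np.random.default_rng(0).standard_normal((1, 4, 6, 16, 16))
up = process_video_latent(vid, lambda f: f.repeat(2, axis=2).repeat(2, axis=3))
```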
VAEStyleTransferLatent — Latent-Space Style Transfer
Match the visual style of a reference latent onto your generation.
method: "AdaIN" (fast) or "WCT" (high quality)
latent: your generation latent
style_ref: reference image latent (encode your style reference first)
vae: your VAE model
AdaIN (Adaptive Instance Normalization):
Normalizes content latent (subtract mean, divide by std)
Rescales to style reference's mean/std
Fast (~1ms), good for texture/color matching
Can cause color bleeding in complex scenes
WCT (Whitening + Coloring Transform):
Whitening: removes correlation between feature channels
Coloring: applies style reference's covariance structure
Uses eigendecomposition (slower, ~50ms)
Better color preservation, less bleeding
Works through VAE decoder's embedding layer
When to use: After upscale (Stage 2→3) when upscaling changes the color feel. Encode the original (pre-upscale) image as style_ref, then apply to the upscaled latent. WCT is better for faces; AdaIN is fine for landscapes.
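The WCT math can be sketched in numpy on flattened (C, N) feature matrices — this is only the core whiten-then-color transform; the real node applies it through the VAE decoder's embedding layer:

```python
import numpy as np

def wct(content, style, eps=1e-5):
    # content, style: (C, N) matrices — C channels, N spatial positions
    c = content - content.mean(axis=1, keepdims=True)
    s = style - style.mean(axis=1, keepdims=True)
    # whitening: eigendecompose content covariance, remove channel correlation
    ew, ev = np.linalg.eigh(c @ c.T / (c.shape[1] - 1))
    whiten = ev @ np.diag(np.clip(ew, eps, None) ** -0.5) @ ev.T
    # coloring: impose the style reference's covariance structure
    ew_s, ev_s = np.linalg.eigh(s @ s.T / (s.shape[1] - 1))
    color = ev_s @ np.diag(np.clip(ew_s, eps, None) ** 0.5) @ ev_s.T
    return color @ (whiten @ c) + style.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
content = rng.standard_normal((4, 1000))
style = rng.standard_normal((4, 1000)) * 2.0 + 1.0
out = wct(content, style)  # out now carries style's channel covariance
```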
Precision & Latent Manipulation — Numerical Quality Control
Precision Casting — Why It Matters
Diffusion models run in fp16 or bf16 for speed, but this loses numerical precision.
For quality-critical steps (final refinement, face fix), higher precision prevents
subtle artifacts like banding, color drift, and texture smearing.
Set Precision — Cast a single latent to fp16/fp32/fp64.
latent → Set Precision (64) → high-precision latent
Set Precision Universal — Cast everything at once (conditioning, sigmas, latent).
cond_pos, cond_neg, sigmas, latent
→ Set Precision Universal (fp64)
→ all outputs in float64
Options: bf16, fp16, fp32, fp64, passthrough
Set Precision Advanced — Returns 5 copies at different precisions simultaneously.
latent → Set Precision Advanced
├─ output_0: passthrough (original dtype)
├─ output_1: global_precision (your choice)
├─ output_2: fp16
├─ output_3: fp32
└─ output_4: fp64
When to use fp64: Final refinement pass, face fix, any operation where you're chaining multiple latent operations (upscale → normalize → match → sample). The accumulated rounding error from fp16/fp32 becomes visible as color banding or texture loss.
When NOT to use fp64: Generation from scratch (Stage 1) — fp16/bf16 is fine. The model weights themselves are fp16, so fp64 latent precision has diminishing returns for the initial generation.
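A quick illustration of why accumulated rounding matters — summing the same small value in fp16 vs fp64, where every fp16 add rounds to a coarse grid and the error compounds:

```python
import numpy as np

step = np.float16(0.001)
acc16 = np.float16(0.0)
acc64 = 0.0
for _ in range(1000):
    acc16 = np.float16(acc16 + step)  # each add rounds to the fp16 grid
    acc64 += 0.001

err16 = abs(float(acc16) - 1.0)  # visible drift (roughly 1%)
err64 = abs(acc64 - 1.0)         # negligible
```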
High-Precision Latent Creation
EmptyLatentImage64 — Create blank latents in float64.
width: 1024, height: 1024, batch_size: 1
→ 4-channel latent, 128×128 spatial (8× compression), float64
EmptyLatentImageCustom — Full control over channels, compression, and precision.
channels: "4" or "16" (4 = SD/Flux, 16 = Cascade)
mode: "sdxl" (8×), "cascade_b" (4×), "cascade_c" (custom), "exact" (1×)
precision: "fp16", "fp32", "fp64"
compression: 4-128 (only used in cascade_c mode)
Practical use: Create empty latents with EmptyLatentImage64 when your pipeline chains multiple latent operations before the sampler. The extra precision prevents accumulated rounding errors. For standard generation, the normal EmptyLatentImage (fp32) is fine.
Latent Channel Statistics — Fixing Color Shifts
One of the most underrated quality improvements. After upscaling or latent operations,
channel statistics (mean/standard deviation per channel) can drift, causing color shifts
or washed-out results.
Latent Normalize Channels — Reset channel statistics.
mode: "channels" (per-channel, not global)
operation: "normalize" (zero mean, unit variance)
Other options:
"center" → subtract mean only (preserves variance/contrast)
"standardize" → divide by std only (preserves brightness/color offset)
Latent Match Channelwise — Transfer channel statistics from a reference.
model: your model (used for latent-space preprocessing)
latent_target: the latent to fix
latent_source: reference latent (good colors/statistics)
mask_target: optional mask (only match masked region)
mask_source: optional mask (only sample stats from masked region)
→ Target latent gets source's mean/std per channel
This is AdaIN (Adaptive Instance Normalization) in latent space. Per-channel, it:
Computes target mean/std
Computes source mean/std
Normalizes target to zero-mean unit-variance
Rescales to match source's mean/std
extra_options (text field, regex-parsed):
exclude_channels=0,2 — skip specific channels
disable_process_latent — don't use model's internal latent encoder
enable_std / disable_mean — match only variance or only mean
When to use: After upscaling (Stage 2→3 transition) to prevent the color shift that happens when pixel-space upscale → VAE encode doesn't perfectly preserve latent distribution. Feed the original generation's latent as latent_source and the upscaled+re-encoded latent as latent_target.
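The per-channel steps above are easy to sketch in numpy (shapes assumed (C, H, W); the real node also runs model-specific latent preprocessing):

```python
import numpy as np

def match_channelwise(target, source, eps=1e-6):
    # transfer per-channel mean/std from source to target (AdaIN)
    t_mean = target.mean(axis=(1, 2), keepdims=True)
    t_std = target.std(axis=(1, 2), keepdims=True)
    s_mean = source.mean(axis=(1, 2), keepdims=True)
    s_std = source.std(axis=(1, 2), keepdims=True)
    normed = (target - t_mean) / (t_std + eps)  # zero mean, unit variance
    return normed * s_std + s_mean              # rescale to source statistics

rng = np.random.default_rng(0)
shifted = rng.standard_normal((4, 32, 32)) * 1.5 + 0.3  # color-shifted latent
reference = rng.standard_normal((4, 32, 32))            # original latent
fixed = match_channelwise(shifted, reference)
```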
Latent Get Channel Means — Diagnostic node. Outputs per-channel mean values as SIGMAS.
Use this to inspect whether your latent channels have drifted.
Fourier-Domain Latent Blending — Phase & Magnitude
The most advanced latent operation in RES4LYF. Uses FFT (Fast Fourier Transform) to
decompose latents into:
Phase = spatial structure, edges, composition layout
Magnitude = signal strength, color intensity, contrast
LatentPhaseMagnitude — Blend phase/magnitude from two latents independently.
latent_0_batch: your generation result
latent_1_batch: a reference latent (style reference, previous generation, etc.)
Global power controls:
phase_mix_power: 1.0 (exponent for phase blending)
magnitude_mix_power: 1.0 (exponent for magnitude blending)
Per-channel weights (0 = all from latent_0, 1 = all from latent_1):
phase_luminosity: 0.0 (channel 0 — brightness structure)
phase_cyan_red: 0.0 (channel 1 — color structure)
phase_lime_purple: 0.0 (channel 2 — color structure)
phase_pattern_structure: 0.0 (channel 3 — texture/pattern)
magnitude_luminosity: 0.0
magnitude_cyan_red: 0.0
magnitude_lime_purple: 0.0
magnitude_pattern_structure: 0.0
Practical example — keep structure, change colors:
phase_luminosity: 0.0 (keep structure from latent_0)
phase_pattern_structure: 0.0 (keep texture from latent_0)
magnitude_cyan_red: 0.8 (take colors from latent_1)
magnitude_lime_purple: 0.8 (take colors from latent_1)
→ Result: structure of latent_0, color palette of latent_1
Practical example — keep colors, change structure:
phase_luminosity: 0.7 (take layout from latent_1)
phase_pattern_structure: 0.7 (take texture from latent_1)
magnitude_luminosity: 0.0 (keep brightness of latent_0)
magnitude_cyan_red: 0.0 (keep colors from latent_0)
→ Result: composition of latent_1, colors of latent_0
Normalization flags (per input and output):
normal (default True) — Z-score normalize (subtract mean, divide by std)
stdize (default True) — Divide by std only
meancenter (default True) — Subtract mean only
These prevent magnitude scale mismatches between the two latents. Leave all True unless you
have a specific reason.
Critical: Phase/magnitude operations MUST run in float64. Float32 FFT loses >0.1 radians of phase precision, causing visible artifacts. The node converts internally, but feeding fp64 latents avoids unnecessary precision loss at the boundary.
Single-input variants:
LatentPhaseMagnitudeMultiply — Multiply phase/magnitude by channel weights (scale)
LatentPhaseMagnitudeOffset — Add to phase/magnitude (shift hue/structure)
LatentPhaseMagnitudePower — Exponentiate (non-linear compression/expansion)
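The core FFT decomposition can be sketched as follows — float64 throughout, per the precision note above. Blending phase via unit vectors rather than raw angles is my simplification for the sketch, not necessarily how the node does it:

```python
import numpy as np

def blend_phase_magnitude(lat0, lat1, phase_w=0.0, mag_w=0.0):
    # lat: (C, H, W). FFT per channel; blend magnitude and phase separately.
    f0 = np.fft.fft2(lat0.astype(np.float64))
    f1 = np.fft.fft2(lat1.astype(np.float64))
    mag = (1 - mag_w) * np.abs(f0) + mag_w * np.abs(f1)
    u0 = f0 / (np.abs(f0) + 1e-12)        # unit-magnitude phase vectors
    u1 = f1 / (np.abs(f1) + 1e-12)
    u = (1 - phase_w) * u0 + phase_w * u1
    u = u / (np.abs(u) + 1e-12)
    return np.fft.ifft2(mag * u).real     # Hermitian symmetry → real output

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 16, 16))
b = rng.standard_normal((4, 16, 16))
# structure of a (phase kept), intensity leaning toward b (magnitude mixed)
recolored = blend_phase_magnitude(a, b, phase_w=0.0, mag_w=0.8)
```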
Noise Injection — Controlled Stochasticity
LatentNoised — Add calibrated noise to a latent with full control.
latent_image: your latent
noise_type: "gaussian", "fractal", "perlin", etc. (same as sampler noise types)
noise_strength: 1.0 (linear scaling, 0 = no noise)
noise_seed: 12345 (reproducible)
normalize: "true" (rescale noise to match latent's mean/std)
noise_is_latent: False (True = treat noise as latent perturbation, not pure additive)
mask: optional (only add noise to masked region)
alpha, k: shape parameters for specific noise types
When normalize=true, noise gets rescaled to match the target latent's statistical
distribution, so a strength of 1.0 adds a meaningful amount of noise regardless of the
latent's actual value range.
When noise_is_latent=true, the noise is combined with the latent and then re-normalized.
This treats the noise as a "latent direction" rather than additive random values.
When to use: Before refinement (Stage 3) to break up over-smooth areas. Small amount of Perlin noise (strength 0.05-0.15) before a low-denoise refine pass adds natural texture variation without changing composition.
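The normalize behavior amounts to scaling raw noise by the latent's own statistics before adding it, so strength has a consistent meaning. A rough per-channel sketch (the node's exact math may differ):

```python
import numpy as np

def add_normalized_noise(latent, strength, noise_seed=0):
    # scale zero-mean noise by the latent's per-channel std, so strength=1.0
    # means "one latent-std worth of noise" regardless of the value range
    rng = np.random.default_rng(noise_seed)
    noise = rng.standard_normal(latent.shape)
    std = latent.std(axis=(1, 2), keepdims=True)
    return latent + strength * noise * std

lat = np.random.default_rng(1).standard_normal((4, 32, 32)) * 3.0
textured = add_normalized_noise(lat, strength=0.1)  # pre-refine texture pass
```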
LatentNoiseBatch_perlin — Generate spatially-coherent Perlin noise.
seed: 0, width: 1024, height: 1024, batch_size: 1
detail_level: 0.0 (-1.0 to 1.0, scales fractal octaves)
Perlin noise creates smooth, natural-looking patterns (unlike Gaussian which is pure random).
The noise goes through an inverse error function to map it to a Gaussian distribution matching
expected latent statistics. Useful as input to LatentNoised via the latent_noise input.
Mask Operations — Precision Boundaries
MaskEdge — Smart Edge Detection
Extract edge regions from masks with independent internal/external control.
mask: face detection mask
dilation: 20 (edge thickness)
mode: "percent" (relative to mask area) or "absolute" (pixels)
internal: 1.0 (scale internal edge width — inside the mask)
external: 1.0 (scale external edge width — outside the mask)
Creates a ring-shaped mask at the boundary between masked and unmasked areas. Controls:
internal = 1.0, external = 0.0 → edge only inside the mask (shrink)
internal = 0.0, external = 1.0 → edge only outside the mask (grow)
internal = 0.5, external = 1.5 → thinner inside, wider outside (smoother blend outward)
In "percent" mode, dilation is relative to the mask's area (sqrt of total pixel count).
This auto-scales edge width based on mask size — a small mask gets narrower edges, a large
mask gets wider edges.
When to use: Stage 4 (face fix) to create a feathered boundary for blending fixed regions. Use MaskEdge to create a transition zone, then composite the fixed region through it.
Better than GrowMaskWithBlur for precision work because you can control inside vs outside edge width independently. GrowMaskWithBlur grows uniformly in both directions.
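The inside/outside edge idea can be sketched in pure numpy with 4-neighbour morphology via array shifts (note: np.roll wraps at image borders, which a real implementation would handle properly):

```python
import numpy as np

def dilate(m, n):
    # grow a boolean mask by n pixels (4-neighbour, wrap-around borders)
    for _ in range(n):
        m = m | np.roll(m, 1, 0) | np.roll(m, -1, 0) | np.roll(m, 1, 1) | np.roll(m, -1, 1)
    return m

def erode(m, n):
    # erosion is dilation of the complement
    return ~dilate(~m, n)

def mask_edge(mask, internal_px, external_px):
    inner = mask & ~erode(mask, internal_px)   # ring just inside the mask
    outer = dilate(mask, external_px) & ~mask  # ring just outside the mask
    return inner | outer

mask = np.zeros((12, 12), dtype=bool)
mask[4:8, 4:8] = True
edge = mask_edge(mask, internal_px=1, external_px=1)
```

Scaling internal_px and external_px independently is exactly the internal/external control: a wider external ring biases the blend zone outward, like the internal=0.5, external=1.5 face-fix setting above.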
Prompt Structure & Token Theory
Prompt Structure — What Goes Where
Your prompt ordering matters. CLIP-based models (SDXL/SD1.5) front-load attention — early
tokens get disproportionate weight. T5-based models (Flux/SD3.5) read more uniformly but
still benefit from clear structure.
Optimal ordering (most important → least important):
SDXL / Tag-heavy models:
─────────────────────────
1. Subject "1girl, warrior, standing"
2. Subject detail "long silver hair, blue eyes, ornate plate armor"
3. Action / pose "holding sword, looking at viewer"
4. Shot / framing "upper body, from below, dynamic angle"
5. Setting "castle ruins, dramatic sunset sky, volumetric fog"
6. Lighting "rim lighting, golden hour, high contrast"
7. Style "by artgerm, oil painting"
8. Quality tags "masterpiece, best quality, highly detailed"
Flux / Natural language models:
───────────────────────────────
1. Subject + action "A battle-scarred female warrior standing atop castle ruins"
2. Subject detail "with long silver hair and piercing blue eyes, wearing ornate plate armor"
3. Setting + mood "against a dramatic sunset sky with volumetric fog rolling between broken walls"
4. Lighting "lit from behind with golden hour rim lighting"
5. Style (optional) "in the style of a cinematic oil painting with high contrast"
Why this order:
1–2 — What goes there: Subject + details · Why: Highest attention weight — model focuses here most
3–4 — What goes there: Shot + setting · Why: Still strong attention, frames the scene
5–6 — What goes there: Lighting + style · Why: Global modifiers that influence the whole image
Last — What goes there: Quality tags · Why: Work fine with low attention — they're global signals, not spatial
Common mistake — quality tags first: Putting "masterpiece, best quality" at position 1
gives peak attention weight to generic modifiers instead of your subject. The model "hears"
quality tags the loudest and your actual subject description with less emphasis.
Model-specific rules:
Z-Image Base — Quality tags?: Optional, at end · Negative prompt?: Yes — true CFG (3.0-5.0) · Best format: Natural prose sentences
Z-Image Turbo (ZIT) — Quality tags?: Skip · Negative prompt?: Not supported (guidance-free) · Best format: Natural prose sentences
Qwen-Image — Quality tags?: Optional, at end · Negative prompt?: Yes — true CFG (4.0) · Best format: Natural prose sentences
Qwen-Image Distilled — Quality tags?: Skip · Negative prompt?: Not supported (CFG=1.0) · Best format: Natural prose sentences
Flux — Quality tags?: Skip entirely (can hurt) · Negative prompt?: Not supported · Best format: Natural prose sentences
SDXL — Quality tags?: Yes, at end · Negative prompt?: Essential ("worst quality, blurry, deformed") · Best format: Tags, comma-separated
SD3.5 — Quality tags?: Optional, at end · Negative prompt?: Optional but helps · Best format: Prose works better than tags
Pony / Illustrious — Quality tags?: At start — score_9 etc. are primary classifiers · Negative prompt?: Yes · Best format: Score tags first, then subject tags
WAN (video) — Quality tags?: Skip · Negative prompt?: Minimal · Best format: Short, clear prose
Negative prompt template (SDXL):
worst quality, low quality, blurry, deformed, disfigured, extra limbs, bad anatomy,
bad hands, watermark, text, signature, cropped
Token budget awareness:
CLIP-L (SDXL, Flux): 77 tokens per chunk. Attention decays across chunks
T5-XXL (Flux, SD3.5): 256+ tokens with uniform attention — use the space
Qwen3-4B (Z-Image): Single text encoder, no dual CLIP/T5 — natural prose, generous context
Qwen2.5-VL 7B (Qwen-Image): Full VLM as text encoder — rich descriptions, very long context
If hitting limits on SDXL: move style/quality to negative ("NOT low quality") or use
timestep scheduling to split composition vs detail prompts
How Tokens Work — What You're Actually Spending
Tokens are word-pieces, not individual characters. The tokenizer (BPE) splits text into
subword chunks from its vocabulary:
cat — Tokens: cat · Count: 1
warrior — Tokens: warrior · Count: 1
battlefield — Tokens: battle + field · Count: 2
photorealistic — Tokens: photo + real + istic · Count: 3
1girl — Tokens: 1 + girl · Count: 2
, (comma + space) — Tokens: single vocabulary entry · Count: 1
Rules of thumb:
Common English words = 1 token (dog, red, standing, portrait)
Compound / uncommon words = 2–3 tokens (masterpiece = 2, ultra-detailed = 3)
~0.75 words per token on average, or ~4 characters per token
A 77-token CLIP-L chunk holds roughly 50–60 words
Commas cost nothing extra — "," is 1 token. They help CLIP separate concepts cleanly
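The rules of thumb above can be turned into a quick budget check — a heuristic only; a real tokenizer (CLIP's BPE) is the ground truth:

```python
def estimate_tokens(prompt: str) -> int:
    # average the two rules of thumb: ~0.75 words/token and ~4 chars/token
    by_words = len(prompt.split()) / 0.75
    by_chars = len(prompt) / 4
    return max(1, round((by_words + by_chars) / 2))

def fits_clip_chunk(prompt: str) -> bool:
    # does this likely fit in one 77-token CLIP-L chunk?
    return estimate_tokens(prompt) <= 77

long_prompt = " ".join(["detailed"] * 80)
```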
Real token wasters to avoid:
highly detailed — Why: 2 tokens, detailed alone works · Better alternative: detailed (1 token)
ultra-high-resolution — Why: 4+ tokens for a vague concept · Better alternative: set your resolution properly
8k, 4k, HDR — Why: 3 tokens for buzzwords the model barely understands · Better alternative: drop them
trending on artstation — Why: 4 tokens, meaningless to most models · Better alternative: specific artist name (1–2 tokens)
very very detailed — Why: repeated emphasis burns tokens, no extra effect · Better alternative: say it once
Parenthesis emphasis (detailed:1.3): The parens and colon cost ~2 extra tokens but give
real control over attention weight. Worth it for key concepts — just don't wrap every tag.
Pony / Illustrious Score Tags
These models were trained with quality score tags as primary classifiers. Unlike SDXL quality
tags (which are just weighted concepts), score tags are hard-coded training signals — the
model was explicitly trained to associate them with quality tiers.
The score scale:
score_9 — Meaning: Top tier only (specific tier, no _up)
score_8_up — Meaning: Score 8 and above
score_7_up — Meaning: Score 7 and above
score_6_up — Meaning: Score 6 and above
score_5_up — Meaning: Score 5 and above
score_4_up — Meaning: Score 4 and above (mediocre+)
score_3 / score_2 / score_1 — Meaning: Specific low tiers (no _up variants)
The _up suffix = "this tier and everything above it." Without _up = that specific tier only.
Stacking is emphasis, not redundancy:
score_9, score_8_up, score_7_up
This means: "I want 7+, prefer 8+, really aim for 9." Each tag adds attention weight toward
that tier. It's like saying "good, preferably great, ideally the best."
score_7_up alone is sufficient — it includes 7, 8, and 9. Stacking just biases toward the top.
Positive presets:
score_9, score_8_up, score_7_up — Effect: Strongly biased toward top tier
score_8_up, score_7_up — Effect: Biased toward 8+, baseline 7
score_7_up — Effect: Flat "anything 7+" — simplest, still good
score_9 — Effect: Very top tier only — can be too restrictive
Negative score tags — use bare tags, NOT _up:
Negative: score_5, score_4, score_3, score_2, score_1
This targets specific low tiers. Don't use score_5_up in negative — that means "avoid 5
and above" which conflicts with your positive asking for 7+. The overlapping range confuses
the model.
Don't put score_6 in negative — tier 6 is "decent." Pushing it negative can make output
look artificially perfect. The score_5 cutoff is the sweet spot for most use cases.
Performance Profiles
Draft — Sampler: euler · Steps: 10-15 · Speed: Very Fast · Quality: Low
Balanced — Sampler: res_2m · Steps: 20 · Speed: Fast · Quality: Excellent
Reference — Sampler: rk4_4s · Steps: 35 · Speed: Medium · Quality: Excellent
Precision — Sampler: radau_iia_7s · Steps: 30 · Speed: Slower · Quality: Very High
Luxury — Sampler: res_8s + implicit · Steps: 40 · Speed: Slow · Quality: Maximum
Things That DON'T Combine Well
Linear samplers (euler, rk4) with Flux models → exponential samplers (RES) converge 3x faster on rectified flows
Too many implicit steps (>5) → diminishing returns, wastes compute
eta > 0 with noise_mode = "none" → noise mode overrides eta (no noise added regardless)
Exotic samplers (lobatto_iiid_3s) without understanding → unpredictable results, use RES instead
Very high eta (>1.0) with low denoise → too much noise re-injected into a mostly-clean image
Troubleshooting — Conditioning, Precision & Latents
Colors shift after upscale — Cause: VAE re-encode changes distribution · Fix: Latent Match Channelwise (source = original latent)
Regional prompts bleed into each other — Cause: Hard mask edges · Fix: Increase region_bleed to 0.15-0.2, use mask_type "gradient"
Face fix has visible seam — Cause: Uniform edge blending · Fix: MaskEdge with internal=0.5, external=1.5 for outward-biased blend
Banding in gradients — Cause: fp16 precision loss · Fix: Set Precision Universal fp64 for refinement/fix stages
Style reference doesn't match — Cause: AdaIN too simple · Fix: Use WCT method in VAEStyleTransferLatent
Phase/magnitude blend artifacts — Cause: fp32 FFT precision loss · Fix: Ensure LatentPhaseMagnitude inputs are fp64
SD3.5 quality degrades — Cause: Conditioning too long · Fix: ConditioningTruncate (pos) + ConditioningZeroAndTruncate (neg)
Regional conditioning ignored — Cause: Wrong model detection · Fix: Check model type (Flux/SDXL/WAN) — regional uses model-specific attention masks
VAE encode gives different results each run — Cause: Non-deterministic encode · Fix: Use VAEEncodeAdvanced (seeds torch with 42)
Part 4 — Quality Pipeline: Generation → Upscale → Refine → Fix → Save
Pipeline Integration — Where Each Node Fits
Stage 1: Generation (Enhanced)
[Standard encoding — works with SDXL, SD3.5, DiT, etc.]
CLIPTextEncode
├─ positive: "your detailed scene description, quality tags"
└─ negative: "worst quality, blurry, deformed" (if model uses negative)
→ conditioning
[Flux only — dual text encoder]
CLIPTextEncodeFluxUnguided
├─ clip_l: "global concepts, style keywords"
└─ t5xxl: "detailed scene description with fine nuances"
→ conditioning
[Optional: Regional control for multi-subject scenes]
ClownRegionalConditioning_AB or _ABC
├─ conditioning_A: subject prompt
├─ conditioning_B: background prompt
├─ mask_A: subject mask (from previous generation or manual)
├─ mask_type: "gradient"
└─ region_bleed: 0.15
→ regional conditioning → sampler positive
[Optional: Timestep scheduling]
ConditioningSetTimestepRange (start=0.0, end=0.5) → composition prompt
ConditioningSetTimestepRange (start=0.5, end=1.0) → detail prompt
Combine → sampler positive
[For precision-critical generation]
EmptyLatentImage64 → fp64 empty latent → sampler
Set Precision Universal (fp64) → precision-cast conditioning + sigmas + latent
[For SD3.5M only]
Positive → ConditioningTruncate → sampler positive
Negative → ConditioningZeroAndTruncate → sampler negative
Stage 2→3: Upscale → Refine (Enhanced)
Stage 2 output (upscaled image)
→ VAEEncodeAdvanced (deterministic encode, resize_to_input="image_1")
→ upscaled latent
[Fix color shift from upscale]
Latent Match Channelwise
├─ latent_target: upscaled latent (color-shifted)
├─ latent_source: original latent (correct colors)
└─ model: your model
→ color-corrected upscaled latent
[Optional: Style consistency]
VAEStyleTransferLatent (method="WCT")
├─ latent: color-corrected upscaled latent
├─ style_ref: original generation latent
└─ vae: your VAE
→ style-matched latent
[Optional: Pre-refine texture injection]
LatentNoised
├─ latent_image: style-matched latent
├─ noise_type: "brownian" or "fractal"
├─ noise_strength: 0.05-0.10
└─ normalize: "true"
→ textured latent → Stage 3 sampler
Stage 4: Face/Region Fix (Enhanced)
Detection mask from Stage 4A
→ MaskEdge (dilation=25, mode="percent", internal=0.5, external=1.5)
→ edge_mask (for blend zone)
[Regional conditioning for face fix]
ClownRegionalConditioning2
├─ conditioning_masked: "detailed face, sharp eyes, smooth skin, pores"
├─ conditioning_unmasked: original prompt (or empty)
├─ mask: face detection mask
├─ mask_type: "gradient"
└─ region_bleed: 0.1
→ regional conditioning → face fix sampler positive
[High precision for face fix]
Set Precision Universal (fp64)
→ cast conditioning + latent to fp64
→ face fix sampler
After sampler output:
→ composite using edge_mask for smooth boundary blending
Full Enhanced Pipeline Summary
STAGE 1: Generate
├─ CLIPTextEncode (or CLIPTextEncodeFluxUnguided for Flux)
├─ [Optional] ClownRegionalConditioning_ABC (multi-area prompts)
├─ [Optional] ConditioningSetTimestepRange (step scheduling)
├─ [Optional] StyleModelApplyStyle (reference image style)
├─ EmptyLatentImage64 (fp64 precision)
└─→ ClownsharKSampler → generation latent
STAGE 2: Upscale
├─ VAE Decode → pixel upscale (model or bicubic)
└─ VAEEncodeAdvanced (deterministic re-encode)
STAGE 2→3 BRIDGE: Latent Correction
├─ Latent Match Channelwise (fix color shift from upscale)
├─ [Optional] VAEStyleTransferLatent WCT (style consistency)
└─ [Optional] LatentNoised (texture injection pre-refine)
STAGE 3: Refine
├─ [Optional] ConditioningAverage (blend base + detail prompts)
└─→ ClownsharKSampler (denoise 0.25-0.35)
STAGE 4: Face/Region Fix
├─ Detection → mask
├─ MaskEdge (precision boundary)
├─ ClownRegionalConditioning2 (face-specific prompt)
├─ Set Precision Universal fp64 (precision casting)
├─ InpaintCrop → ClownsharKSampler → InpaintStitch
└─ Composite using edge_mask
STAGE 5-6: Final upscale + save
The Complete Workflow
Stage 1: GENERATE → high-quality base image
Stage 2: UPSCALE → 2x resolution via pixel-space upscale
Stage 3: REFINE → detail enhancement + sampler polish
Stage 4: FIX (face/skin) → targeted region correction
Stage 5: FINAL UPSCALE → optional 2nd upscale
Stage 6: SAVE → output with metadata
Stage 1: Generation — Get the Best Base Image
The base generation determines 80% of final quality. Get this right and the rest is polish.
Sampler Setup
Option nodes (each connects directly to a ClownsharKSampler options slot):
SharkOptions → Sampler options
ClownOptions_SDE → Sampler options
ClownOptions_DetailBoost → Sampler options
ClownOptions_SigmaScaling → Sampler options
SharkOptions:
noise_type_init: gaussian
s_noise_init: 1.0
denoise_alt: 1.0
channelwise_cfg: True ← prevents color burn at higher CFG
ClownOptions_SDE:
noise_type_sde: gaussian
noise_mode_sde: hard
eta: 0.5
ClownOptions_DetailBoost:
weight: 0.3 ← subtle during generation, don't overdo it
method: model
mode: hard
start_step: 3 ← skip first 2 steps (rough structure phase)
end_step: -1 ← apply through the rest
ClownOptions_SigmaScaling:
s_noise: 1.04 ← moderate SDE noise boost
lying: 0.92 ← model produces sharper detail
lying_inv: 1.06 ← compensates color desaturation
lying_start_step: 0
lying_inv_start_step: 1
ClownsharKSampler:
sampler_name: res_3m ← highest quality exponential integrator
scheduler: beta57 ← optimized for RES
steps: 25-30 ← RES needs fewer steps
denoise: 1.0 ← full generation
cfg: 5.5-7.5 ← depends on model
sampler_mode: standard
bongmath: True
Why these values:
res_3m uses 3-point history for quadratic extrapolation — best accuracy per step
lying=0.92 tricks the model into producing ~8% more detail than it normally would
detail_boost weight=0.3 adds subtle enhancement without artifacts
channelwise_cfg prevents the washed-out look from guidance
Flux Adaptation — Stage 1
The settings above are tuned for SDXL-style models with traditional CFG. Flux uses a fundamentally different guidance mechanism and requires different settings.
Why Flux is different: Standard models (SDXL, SD1.5) use classifier-free guidance (CFG) —
the sampler runs two forward passes (conditional + unconditional) and amplifies the difference.
Flux uses guidance distillation — the guidance value is baked into the model as a learned
vector input. There is no separate negative/unconditional pass at all. This means:
CFG must be 1.0 — there is no negative conditioning to subtract, so CFG > 1 has no meaningful effect (and can hurt quality)
No negative prompt — Flux has no unconditional path. Leave negative empty or don't connect it
channelwise_cfg is irrelevant — with CFG=1.0 there's no guidance amplification to balance per-channel, so it does nothing (or adds overhead)
Sigma scaling interacts differently — Flux is a rectified flow model with linear noise schedule. Aggressive lying values that work on SDXL can produce severe noise artifacts (leopard-print patterns, texture corruption) on Flux
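The CFG arithmetic makes the first point concrete — with the standard guidance formula, cfg=1.0 reduces exactly to the conditional prediction, so the unconditional pass (and any negative prompt) contributes nothing:

```python
import numpy as np

def cfg_combine(cond, uncond, cfg):
    # classifier-free guidance: amplify the cond/uncond difference
    return uncond + cfg * (cond - uncond)

cond = np.array([1.0, 2.0, -0.5])    # conditional model prediction
uncond = np.array([0.4, 0.4, 0.4])   # unconditional prediction
at_one = cfg_combine(cond, uncond, 1.0)    # == cond: uncond cancels out
boosted = cfg_combine(cond, uncond, 7.5)   # SDXL-style amplification
```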
BFL reference settings (from black-forest-labs/flux sampling.py):
Sampler — FLUX.1-dev: Euler (first-order ODE) · FLUX.1-schnell: Euler
Steps — FLUX.1-dev: 50 · FLUX.1-schnell: 1-4
Guidance — FLUX.1-dev: 3.5 (model-internal vector, not CFG) · FLUX.1-schnell: 0.0
CFG — FLUX.1-dev: 1.0 (no CFG) · FLUX.1-schnell: 1.0
Schedule — FLUX.1-dev: Time-shifted linear (image-size dependent) · FLUX.1-schnell: Time-shifted linear
Negative prompt — FLUX.1-dev: None · FLUX.1-schnell: None
Adapted RES4LYF settings for Flux:
SharkOptions:
noise_type_init: gaussian
s_noise_init: 1.0
denoise_alt: 1.0
channelwise_cfg: False ← no CFG splitting, disable this
ClownOptions_SDE:
noise_type_sde: gaussian
noise_mode_sde: lorentzian ← less aggressive than hard, balances exploration/refinement
eta: 0.5
ClownOptions_DetailBoost:
weight: 0.3
method: sampler ← "sampler underestimates" works better than "model" on Flux
mode: hard
start_step: 3
end_step: -1
ClownOptions_SigmaScaling:
s_noise: 1.04 ← moderate SDE boost (tested, works on Flux)
lying: 0.97 ← conservative — Flux amplifies lying more than SDXL
lying_inv: 1.02 ← compensates lying desaturation
lying_start_step: 0
lying_inv_start_step: 1
ClownsharKSampler:
sampler_name: res_3m ← still best quality (also works: euler for BFL-standard behavior)
scheduler: beta57
steps: 25-30
denoise: 1.0
cfg: 1.0 ← MUST be 1.0 for standard Flux
sampler_mode: standard
bongmath: True
Flux distilled variants (e.g., Flux-dev with guidance distillation fine-tunes) may accept cfg > 1.0 — test carefully. Standard FLUX.1-dev and FLUX.1-schnell must use cfg=1.0.
Sigma scaling on Flux: Flux's linear sigma schedule amplifies lying effects more than SDXL's cosine schedule. The SDXL-tuned values (lying=0.92) will produce visible artifacts (leopard-print patterns). The values above (lying=0.97, lying_inv=1.02, s_noise=1.04) are tested and work well as a starting point. Don't go below lying=0.95 on Flux without checking for noise artifacts.
Z-Image Adaptation — Stage 1
Z-Image is a 6B S3-DiT (Single-Stream DiT) by Alibaba/Tongyi-MAI. It uses Lumina2's
NextDiT backbone in ComfyUI, with flow matching (shift=3.0) and Qwen3-4B as text encoder.
Two variants exist: Base (true CFG) and Turbo/ZIT (guidance-free).
Why Z-Image is different from Flux: Both are rectified flow models, but Z-Image Base uses
true CFG (dual forward pass — conditional + unconditional) while Flux uses guidance
distillation. This means:
Z-Image Base: cfg=3.0-5.0, negative prompts work and help quality, channelwise_cfg=True useful
Z-Image Turbo (ZIT): cfg=1.0 (guidance-free via Decoupled-DMD distillation), no negative, same rules as Flux
Sigma scaling: Same rectified flow schedule as Flux — use conservative lying values
Text encoder: Qwen3-4B (single encoder, no dual CLIP/T5) — natural prose works best
VAE: 16-channel (same as Flux), ÷16 alignment required
Z-Image Base settings:
SharkOptions:
noise_type_init: gaussian
s_noise_init: 1.0
denoise_alt: 1.0
channelwise_cfg: True ← helps at CFG 3-5
ClownOptions_SDE:
noise_type_sde: gaussian
noise_mode_sde: hard ← standard noise timing (CFG handles guidance)
eta: 0.5
ClownOptions_DetailBoost:
weight: 0.3
method: sampler ← works well on flow models
mode: hard
start_step: 3
end_step: -1
ClownOptions_SigmaScaling:
s_noise: 1.04
lying: 0.97 ← conservative — same rectified flow as Flux
lying_inv: 1.02
lying_start_step: 0
lying_inv_start_step: 1
ClownsharKSampler:
sampler_name: res_3m
scheduler: beta57
steps: 28-50
denoise: 1.0
cfg: 3.0-5.0 ← true CFG, negative prompt recommended
sampler_mode: standard
bongmath: True
Z-Image Turbo (ZIT) settings: Same as Flux adaptation above — cfg=1.0, channelwise_cfg=False, no negative prompt, 8-10 steps, lorentzian noise mode.
Qwen-Image Adaptation — Stage 1
Qwen-Image is a 20B MMDiT — the largest open-source diffusion model. It uses Qwen2.5-VL 7B as text encoder (a full VLM), 16-channel VAE, flow matching (like Flux/SD3), and runs at a higher pixel budget (~1.54-1.76 MP at 1328² native).
Why Qwen-Image is different: Uses true CFG (not guidance distillation) with negative
prompts. The VLM text encoder understands rich natural language better than CLIP+T5. At 20B
parameters it has more capacity but needs more steps and VRAM.
Qwen-Image Base: cfg=4.0, negative prompts are powerful, channelwise_cfg=True recommended
Qwen-Image Distilled/Lightning: cfg=1.0, no negative, 4-15 steps
Sigma scaling: Flow matching like Flux — conservative lying values
Resolution: Fixed buckets only (1328², 1664×928, 1472×1104, 1584×1056 + orientations)
Text encoder: Qwen2.5-VL 7B — rich prose descriptions, no token limit anxiety
Qwen-Image Base settings:
SharkOptions:
noise_type_init: gaussian
s_noise_init: 1.0
denoise_alt: 1.0
channelwise_cfg: True ← recommended at CFG 4.0
ClownOptions_SDE:
noise_type_sde: gaussian
noise_mode_sde: hard
eta: 0.5
ClownOptions_DetailBoost:
weight: 0.3
method: model ← 20B model has high capacity, model method works well
mode: hard
start_step: 3
end_step: -1
ClownOptions_SigmaScaling:
s_noise: 1.04
lying: 0.95 ← slightly more room than Flux due to larger model capacity
lying_inv: 1.03
lying_start_step: 0
lying_inv_start_step: 1
ClownsharKSampler:
sampler_name: res_3m
scheduler: beta57
steps: 30-50 ← 20B model benefits from more steps
denoise: 1.0
cfg: 4.0 ← true CFG
sampler_mode: standard
bongmath: True
Qwen-Image Distilled/Lightning: Same pattern as Z-Image Turbo — cfg=1.0, channelwise_cfg=False, no negative, 4-15 steps depending on distillation variant. Use lorentzian noise mode for few-step sampling.
Stage 2: Upscale — Pixel-Space (Not Latent)
Why pixel-space: Latent upscale creates off-manifold latents (see Latent Space, Denoise & Upscaling). Pixel upscale → VAE re-encode is safer and produces cleaner results for the refinement pass.
Workflow:
Generated Latent → VAE Decode → Upscale Image (2x) → VAE Encode → refined latent
Upscale method: Use an upscale model (4x-UltraSharp, RealESRGAN, NMKD, etc.) through ComfyUI's ImageUpscaleWithModel node, or use ImageScale with Lanczos for a simple 2x.
Key: After upscaling in pixel space, VAE-encode the upscaled image back to latent for Stage 3 refinement.
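For the simple-Lanczos route, the pixel-space step amounts to a plain Pillow resize — a minimal sketch, assuming Pillow is installed (substitute an upscale model node for better texture):

```python
from PIL import Image

def pixel_upscale(img: Image.Image, factor: int = 2) -> Image.Image:
    """Simple Lanczos upscale in pixel space.

    Decode the latent to an image first, upscale here, then VAE-encode the
    result back to a latent for the Stage 3 refinement pass.
    """
    return img.resize((img.width * factor, img.height * factor), Image.LANCZOS)
```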
Stage 3: Refine — Low-Denoise Sampler Pass
Take the upscaled latent and run a short sampling pass with low denoise to add detail at the new resolution.
Option Nodes:
ClownOptions_SDE → Sampler options
ClownOptions_DetailBoost → Sampler options
ClownOptions_SigmaScaling → Sampler options
ClownOptions_SDE:
noise_type_sde: gaussian
noise_mode_sde: hard
eta: 0.25 ← lower for refinement (preserve structure)
ClownOptions_DetailBoost:
weight: 0.5-1.0 ← stronger than generation — this is where you add detail
method: model
mode: sinusoidal ← focuses boost on middle steps
start_step: 0
end_step: -1
ClownOptions_SigmaScaling:
s_noise: 1.05
lying: 0.89 ← stronger lying for more detail at higher res
lying_inv: 1.08
ClownsharKSampler (Refinement):
sampler_name: res_2m ← 2m is fine for refinement (faster)
scheduler: beta57
steps: 15-20 ← short pass
denoise: 0.3-0.45 ← low denoise preserves the upscaled content
cfg: 4.5-6.0 ← slightly lower CFG for refinement
sampler_mode: standard
bongmath: True
Why lower denoise: At denoise 0.3-0.45, the sampler only touches the fine detail layer of the sigma schedule. It adds texture and sharpness without changing composition, colors, or structure.
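The mechanics can be sketched numerically. ComfyUI-style samplers implement partial denoise by building a longer schedule and keeping only its tail, so a low-denoise pass never sees the high-sigma composition steps. A rough illustration with a stand-in Karras-style schedule (function names and constants are mine, not RES4LYF's):

```python
def karras_like(n, sigma_max=14.6, sigma_min=0.03, rho=7.0):
    """Stand-in noise schedule: n sigmas from sigma_max down to sigma_min, plus 0."""
    ramp = [i / (n - 1) for i in range(n)]
    hi, lo = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    return [(hi + t * (lo - hi)) ** rho for t in ramp] + [0.0]

def refinement_sigmas(steps, denoise):
    """Keep only the low-sigma tail of the schedule, as denoise < 1.0 does."""
    total = max(steps, round(steps / denoise))
    return karras_like(total)[-(steps + 1):]

full = karras_like(15)           # full pass starts at sigma_max (composition)
tail = refinement_sigmas(15, 0.4)  # refinement starts far lower (texture only)
```

The refinement pass starts well below the full schedule's first sigma, which is exactly why composition survives untouched.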
Flux Adaptation — Stage 3
Same principles as Stage 1 Flux Adaptation: cfg=1.0, channelwise_cfg=False, conservative sigma scaling.
Flux-specific refinement changes:
ClownOptions_SDE:
noise_mode_sde: lorentzian ← softer than hard, better for Flux refinement
ClownOptions_DetailBoost:
method: sampler ← "sampler underestimates" consistently better on Flux
ClownOptions_SigmaScaling:
s_noise: 1.04 ← same as Stage 1
lying: 0.97
lying_inv: 1.02
ClownsharKSampler:
cfg: 1.0 ← must be 1.0 for Flux
Z-Image Adaptation — Stage 3
Same principles as Z-Image Stage 1 — true CFG for Base, guidance-free for Turbo.
Z-Image Base refinement changes:
ClownOptions_SDE:
noise_mode_sde: lorentzian ← softer for refinement, even with true CFG
ClownOptions_DetailBoost:
method: sampler ← reliable on flow models
weight: 0.25 ← slightly lighter for refinement
ClownOptions_SigmaScaling:
s_noise: 1.04
lying: 0.97
lying_inv: 1.02
ClownsharKSampler:
cfg: 3.0-4.0 ← slightly lower than Stage 1 for refinement
steps: 20-30
Z-Image Turbo (ZIT): Same as Flux Stage 3 — cfg=1.0, lorentzian, 6-8 steps.
Qwen-Image Adaptation — Stage 3
Same principles as Qwen-Image Stage 1 — true CFG with negative prompts. The large model capacity means refinement can be aggressive.
Qwen-Image Base refinement changes:
ClownOptions_SDE:
noise_mode_sde: lorentzian
ClownOptions_DetailBoost:
method: model ← 20B capacity shines in refinement
weight: 0.25
ClownOptions_SigmaScaling:
s_noise: 1.04
lying: 0.95
lying_inv: 1.03
ClownsharKSampler:
cfg: 3.5 ← slightly lower than Stage 1's 4.0
steps: 25-35
Qwen-Image Distilled: Same as Flux Stage 3 — cfg=1.0, lorentzian, 4-10 steps.
Alternative: Tiled Refinement
For very large images (3000+ px), use tiled sampling:
ClownOptions_Tile:
tile_width: 1024
tile_height: 1024
Connect to a ClownsharKSampler options slot. The sampler will process each tile separately and blend them back together.
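The tile placement can be sketched along one axis: overlapping tiles at a fixed stride, with the last tile snapped flush to the border so the whole image is covered. A hypothetical helper, not the node's actual code:

```python
def tile_starts(size, tile, overlap):
    """Start offsets for overlapping tiles along one axis; last tile hugs the edge."""
    if tile >= size:
        return [0]
    stride = tile - overlap
    starts = list(range(0, size - tile + 1, stride))
    if starts[-1] != size - tile:
        starts.append(size - tile)  # final tile flush with the border
    return starts

# A 3072-px axis with 1024-px tiles and 256-px overlap:
print(tile_starts(3072, 1024, 256))  # → [0, 768, 1536, 2048]
```

The overlap regions are where the blending happens — which is why tiles that are too small (too little context per tile) show a grid, as noted in Troubleshooting.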
Stage 4: Fix — Face, Skin, Eyes, Mouth, Teeth
This is the targeted correction stage. You detect regions accurately, create pixel-perfect masks, and re-sample just those areas.
4A: Accurate Detection & Masking
Two approaches — VLM (accurate, slower) and YOLO (fast, pre-trained classes).
Option A: Florence2 VLM Detection (Best Accuracy)
Uses a vision-language model — understands natural language prompts, detects almost anything you can describe.
SmartLML (Florence2) ← vision-language detection
task: object detection
prompt: "face" / "eyes" / "mouth" / "teeth" / "hands"
→ bounding box output
Detection to BBox ← convert detection format to bbox
→ bbox coordinates
LayerMask SAM2 Ultra ← Segment Anything from the bbox
input_image: [refined image from Stage 3]
bbox: [from detection]
→ pixel-accurate mask (not a rough blob — actual contour)
Mask to Segs ← convert mask to segments format
→ SEGS (for detailer workflows)
Flow for each region:
Image → SmartLML Florence2 ("face") → BBox → SAM2 Ultra → face_mask
Image → SmartLML Florence2 ("eyes") → BBox → SAM2 Ultra → eyes_mask
Image → SmartLML Florence2 ("mouth, teeth") → BBox → SAM2 Ultra → mouth_mask
When to use: Complex scenes, unusual angles, non-standard subjects, anything YOLO wasn't trained on.
Option B: YOLO/Ultralytics Detection (Fastest)
Uses pre-trained YOLO models — no VLM needed, runs at 30-50 FPS. From Impact Pack / Impact Subpack.
UltralyticsDetectorProvider ← loads YOLO model
model_name: "face_yolov8m.pt" (or segm variant)
→ BBOX_DETECTOR (and optionally SEGM_DETECTOR)
BboxDetectorForEach ← runs detection on image
bbox_detector: [from provider]
image: [refined image from Stage 3]
threshold: 0.5 ← confidence cutoff (lower = more detections)
dilation: 10 ← expand bbox slightly
→ SEGS (with cropped regions, masks, confidence scores)
SAMDetectorCombined (optional) ← refine bbox masks with SAM2
sam_model: [SAM2 model]
segs: [from bbox detector]
→ refined MASK (pixel-accurate from rough bbox)
Available YOLO models:
face_yolov8m.pt — Detects: Faces only · Speed: Fast · File: bbox/face_yolov8m.pt
face_yolov8m-seg.pt — Detects: Faces + instance mask · Speed: Fast · File: segm/face_yolov8m-seg.pt
person_yolov8m-seg.pt — Detects: Full person + mask · Speed: Fast · File: segm/person_yolov8m-seg.pt
yolov8m.pt — Detects: 80 COCO classes (person, car, etc.) · Speed: Fast · File: bbox/yolov8m.pt
hand_yolov8s.pt — Detects: Hands · Speed: Very fast · File: bbox/hand_yolov8s.pt
Model size variants: n (nano/fastest) → s (small) → m (medium/balanced) → l (large) → x (best accuracy)
Flow for face fix:
UltralyticsDetectorProvider("face_yolov8m.pt")
↓ BBOX_DETECTOR
BboxDetectorForEach(image, threshold=0.5, dilation=10)
↓ SEGS
(optional) SAMDetectorCombined(SAM2, SEGS) → pixel-accurate mask
↓ MASK / SEGS
[continue to Inpaint Crop or SetLatentNoiseMask]
When to use: Batch processing, real-time workflows, standard subjects (faces, hands, people). Much faster than Florence2 — no LLM inference needed.
Comparison
Speed — Florence2 (VLM): ~1-3 sec per detection · YOLO (Ultralytics): ~20-50 ms per detection
Model size — Florence2 (VLM): 1-7 GB · YOLO (Ultralytics): 36-140 MB
Flexibility — Florence2 (VLM): Any text prompt · YOLO (Ultralytics): Fixed pre-trained classes
Accuracy — Florence2 (VLM): Excellent for described objects · YOLO (Ultralytics): Excellent for trained classes
Best for — Florence2 (VLM): Complex/unusual detections · YOLO (Ultralytics): Faces, people, hands (standard)
VRAM — Florence2 (VLM): ~2-4 GB · YOLO (Ultralytics): ~200-500 MB
Requires — Florence2 (VLM): SmartLML node · YOLO (Ultralytics): Impact Pack + Impact Subpack
Recommendation: Use YOLO for faces/hands (it's what it was trained for and it's 50x faster). Use Florence2 for anything YOLO can't detect or fails on — specific objects, text regions, clothing items, etc.
Either way — feather the mask
Regardless of detection method, feather before sampling:
If using Inpaint Crop (Section 4E): The crop node handles feathering via mask_blend_pixels.
If using SetLatentNoiseMask directly (Section 4B): Feather with GrowMaskWithBlur:
mask: face_mask
grow_amount: 10-20 px ← slight expansion for context
blur_radius: 25-40 px ← soft feathered edges for seamless blending
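What grow-then-blur does can be approximated in a few lines: dilate the mask for context, then soften the edge. A rough NumPy sketch (repeated box filters stand in for the Gaussian; function and parameter names are mine, not the node's):

```python
import numpy as np

def _dilate(m):
    """One pixel of 4-neighborhood dilation via shifted maxima."""
    p = np.pad(m, 1)
    return np.max(np.stack([p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1],
                            p[1:-1, :-2], p[1:-1, 2:]]), axis=0)

def _box_blur(m):
    """3x3 mean filter with edge padding."""
    p = np.pad(m, 1, mode="edge")
    return sum(p[i:i + m.shape[0], j:j + m.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def feather_mask(mask, grow_px=15, blur_iters=8):
    m = mask.astype(np.float32)
    for _ in range(grow_px):     # grow: expand the mask to include context
        m = _dilate(m)
    for _ in range(blur_iters):  # blur: soften edges for seamless blending
        m = _box_blur(m)
    return np.clip(m, 0.0, 1.0)
```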
4B: Bridge Mask → ClownsharKSampler
The key node is SetLatentNoiseMask (built-in ComfyUI node). It embeds a mask into the latent dict as noise_mask. When the sampler receives this latent, it only denoises the masked region and preserves everything outside.
SetLatentNoiseMask
samples: [VAE-encoded upscaled image from Stage 3]
mask: [feathered face_mask from SAM2 + GrowMaskWithBlur]
→ masked_latent (LATENT with noise_mask embedded)
Then feed masked_latent directly into ClownsharKSampler as the latent_image input. The sampler will:
Only add noise to the masked region
Only denoise the masked region
Preserve everything outside the mask untouched
Blend at mask edges based on the feathering
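The per-step behavior amounts to a mask-weighted blend. A toy one-dimensional sketch (real latents are 4D tensors; the function name is mine):

```python
def masked_step(original, sampled, noise_mask):
    """Blend a sampler update into the latent only where the mask allows it.

    mask = 1.0 → take the sampler's result; mask = 0.0 → keep the original;
    fractional values at feathered edges blend the two smoothly.
    """
    return [s * m + o * (1.0 - m) for o, s, m in zip(original, sampled, noise_mask)]

print(masked_step([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.5, 0.0]))
# → [0.0, 0.5, 1.0]
```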
Full node chain for face fix:
[Refined Image] → VAE Encode → SetLatentNoiseMask(mask=face_mask) → masked_latent
↓
ClownOptions_SDE → ClownOptions_DetailBoost → ClownsharKSampler(latent_image=masked_latent)
↓
VAE Decode → fixed image
4C: Sampler Settings for Face Fix
ClownOptions_SDE:
noise_type_sde: gaussian
noise_mode_sde: soft ← softer noise for face refinement
eta: 0.2-0.3 ← low for preservation
ClownOptions_DetailBoost (face):
weight: 0.3-0.5 ← moderate — don't over-sharpen skin
method: model
mode: lorentzian ← peaked in middle steps, gentle start/end
start_step: 0
end_step: -1
ClownOptions_SigmaScaling (face):
lying: 0.95 ← gentle lying for faces (0.89 is too aggressive for skin)
lying_inv: 1.03
s_noise: 1.02 ← minimal extra noise
ClownsharKSampler (face fix):
sampler_name: res_2m
scheduler: beta57
steps: 15-20
denoise: 0.35-0.5 ← enough to fix issues, not enough to regenerate
cfg: 5.0-6.5
sampler_mode: standard
bongmath: True
prompt: "detailed face, perfect skin, sharp eyes, symmetrical features, natural skin texture"
negative: "blurry, distorted, asymmetrical, plastic skin, uncanny valley"
4D: Optional — Guided Face Fix
Add a ClownGuide_Mean to steer the face refinement toward the original:
ClownGuide_Mean:
guide: [original latent from Stage 2 — before any face changes]
weight: 0.7-0.8 ← strong guidance keeps structure
cutoff: 1.0
start_step: 0
end_step: -1
And/or ClownGuide_FrequencySeparation for skin smoothing:
method: median
kernel_size: 8
highpass_weight: 0.8 ← slightly reduce high-freq detail = smoother skin
lowpass_weight: 1.0 ← keep color/structure intact
Connect guides to the guides input on ClownsharKSampler.
4E: Better Alternative — Inpaint Crop & Stitch + ClownsharKSampler
The Inpaint Crop & Stitch node pack (comfyui-inpaint-cropandstitch) is purpose-built for this exact workflow. It handles the crop, context padding, and seamless blending automatically.
Two nodes:
Inpaint Crop — takes full image + mask → outputs cropped region (with padding/context), cropped mask (pre-feathered), and a STITCHER dict that remembers how to put it back
Inpaint Stitch — takes the processed crop + stitcher dict → seamlessly composites back into the full image
Full face-fix flow:
[Full Image] + [SAM2 face_mask]
↓
┌──────────────────────────────────┐
│ Inpaint Crop Improved │
│ context_extend_factor: 1.5 │ ← 50% padding around face for context
│ output_target_width: 768 │ ← resize crop to model-friendly size
│ output_target_height: 768 │
│ mask_blend_pixels: 32 │ ← auto-feathering for seamless edges
│ mask_fill_holes: True │
│ device_mode: GPU │ ← instant, ~5ms
└──────┬───────────┬───────────┬───┘
↓ ↓ ↓
cropped_image cropped_mask stitcher
(768×768) (feathered) (metadata dict)
↓ ↓
VAE Encode SetLatentNoiseMask
└─────┬─────┘
↓
masked_latent
↓
┌──────────────────────────────────┐
│ ClownsharKSampler │
│ sampler=res_2m, steps=15-20 │
│ denoise=0.35-0.45 │ ← light refinement
│ cfg=5.0-6.0 │
│ + options chain as below │
└──────────┬───────────────────────┘
↓
VAE Decode
↓
processed_crop (768×768)
↓
┌──────────────────────────────────┐
│ Inpaint Stitch Improved │
│ stitcher: [from crop] │
│ inpainted_image: [from decode] │
└──────────┬───────────────────────┘
↓
Final Image (original size, face refined, rest untouched, seamless blend)
Why this is better than SetLatentNoiseMask alone:
The face fills the entire crop → model works at maximum effective resolution for the face
Context padding gives the model surrounding pixels for coherent edge generation
Auto-feathered mask eliminates manual GrowMaskWithBlur tuning
Stitch handles all the coordinate math, resize-back, and blending automatically
Works identically for faces in any position/size in the image
Inpaint Crop key parameters:
context_extend_factor — Recommended: 1.3-1.5 · What It Does: How much padding around the mask (1.5 = 50% extra on each side)
output_target_width/height — Recommended: 768 or 1024 · What It Does: Resize crop to model-native resolution
mask_blend_pixels — Recommended: 28-40 · What It Does: Gaussian blur radius for edge feathering
mask_expand_pixels — Recommended: 5-10 · What It Does: Dilate mask before cropping (catch edge pixels)
mask_fill_holes — Recommended: True · What It Does: Fill small gaps in the SAM2 mask
output_padding — Recommended: "32" or "64" · What It Does: Pad to multiple (latent alignment)
device_mode — Recommended: GPU · What It Does: 30-100x faster than CPU
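The coordinate math these parameters drive can be sketched roughly: pad the bbox by the context factor, clamp to the image, then snap the crop size up to a multiple for latent alignment. A hypothetical reimplementation for illustration, not the node's source:

```python
import math

def context_crop(bbox, img_w, img_h, context_extend_factor=1.5, pad_multiple=32):
    """Expand a detection bbox by a context factor and align it for the VAE."""
    x0, y0, x1, y1 = bbox
    w, h = x1 - x0, y1 - y0
    ex = (context_extend_factor - 1.0) * w / 2.0  # extra padding per side
    ey = (context_extend_factor - 1.0) * h / 2.0
    x0, y0 = max(0, int(x0 - ex)), max(0, int(y0 - ey))
    x1, y1 = min(img_w, int(x1 + ex)), min(img_h, int(y1 + ey))
    # snap crop size up to a multiple (latent alignment), clamped to the image
    cw = min(math.ceil((x1 - x0) / pad_multiple) * pad_multiple, img_w)
    ch = min(math.ceil((y1 - y0) / pad_multiple) * pad_multiple, img_h)
    # shift the box left/up if the padded size runs past the image edge
    x0, y0 = min(x0, img_w - cw), min(y0, img_h - ch)
    return x0, y0, x0 + cw, y0 + ch
```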
4F: Complete Detection → Crop → Fix → Stitch Pipeline
Combining all the pieces — the full per-region fix chain:
[Refined Image from Stage 3]
↓
SmartLML Florence2 (task: detect, prompt: "face")
↓
Detection to BBox
↓
LayerMask SAM2 Ultra → pixel-accurate face_mask
↓
Inpaint Crop Improved (context=1.5, target=768×768, blend=32)
↓
cropped_image + cropped_mask + stitcher
↓
VAE Encode → SetLatentNoiseMask(mask=cropped_mask)
↓
[Options: SDE(soft, eta=0.25) → DetailBoost(weight=0.4, lorentzian) → SigmaScaling(lying=0.95)]
↓
ClownsharKSampler (res_2m, 15 steps, denoise=0.4, cfg=5.5)
↓
VAE Decode → Inpaint Stitch Improved
↓
[Fixed Image — face refined, everything else untouched]
Repeat for each region with different prompts:
1 — Florence2 Prompt: "face" · Crop Target: 768×768 · Denoise: 0.40 · Detail Boost: 0.4 (lorentzian) · Lying: 0.95
2 — Florence2 Prompt: "eyes" · Crop Target: 512×512 · Denoise: 0.30 · Detail Boost: 0.6 (hard) · Lying: 0.92
3 — Florence2 Prompt: "mouth, teeth" · Crop Target: 512×512 · Denoise: 0.30 · Detail Boost: 0.3 (soft) · Lying: 0.95
4 — Florence2 Prompt: "hands" (if needed) · Crop Target: 768×768 · Denoise: 0.40 · Detail Boost: 0.3 (hard) · Lying: 0.95
Stage 5: Final Upscale (Optional)
If you need a second resolution bump:
Option A: Pixel-Space Upscaler Model
Use ImageUpscaleWithModel with 4x-UltraSharp or RealESRGAN-x4plus for a clean 2x or 4x boost. No re-sampling needed — this is a pure image upscale.
Option B: SUPIR (If Available)
restoration_scale: 1.0-1.5 ← light artifact removal
cfg_scale: 3.0-4.0 ← moderate guidance
steps: 30-45
color_fix_type: Wavelet ← preserves original colors best
use_tiled_vae: True ← saves VRAM
control_scale: 0.5-0.8 ← how much restoration to apply
Use SUPIR as a polish pass, not an aggressive upscaler. Conservative settings give the best results.
Option C: Skip
If Stage 3 refinement already got you to target resolution, skip this entirely. Less processing = fewer artifacts.
Stage 6: Final Polish & Save
Sharpening
KJNodes Adaptive USM:
blur_sigma: 2.5
strength: 0.4-0.6 ← keep under 1.0 to avoid artifacts
threshold: 5 ← only sharpen above noise floor
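Conceptually, a thresholded unsharp mask only amplifies differences that exceed the noise floor. A toy per-pixel sketch (the real node operates on full images with a Gaussian blur; names are mine):

```python
def adaptive_usm(pixels, blurred, strength=0.5, threshold=5):
    """Sharpen = pixel + strength * (pixel - blur), skipping sub-threshold
    differences so flat areas and sensor noise stay untouched."""
    out = []
    for p, b in zip(pixels, blurred):
        diff = p - b
        out.append(p + strength * diff if abs(diff) > threshold else p)
    return out

print(adaptive_usm([100, 100, 200], [100, 104, 150]))
# diffs 0 and -4 fall under the threshold; only the strong edge sharpens
# → [100, 100, 225.0]
```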
Save
Standard SaveImage node with PNG/JPEG output.
Preset Configurations — Copy These
Quick Quality (RES4LYF Only, ~2 min)
GENERATE:
sampler=res_3m, scheduler=beta57, steps=25, cfg=5.5, denoise=1.0
lying=0.92, lying_inv=1.06, detail_boost weight=0.3
No upscale/refine, just save.
Balanced Pipeline (~5 min)
GENERATE:
sampler=res_3m, scheduler=beta57, steps=25, cfg=6.0, denoise=1.0
lying=0.92, lying_inv=1.06, detail_boost weight=0.3, mode=hard
UPSCALE:
pixel-space 2x (Lanczos or upscale model)
REFINE:
sampler=res_2m, steps=15, denoise=0.35, cfg=5.0
lying=0.89, lying_inv=1.08, detail_boost weight=0.5, mode=sinusoidal
SAVE: with Adaptive USM strength=0.5
Maximum Quality Pipeline (~15 min)
GENERATE:
sampler=res_3m, scheduler=beta57, steps=30, cfg=6.5, denoise=1.0
lying=0.92, lying_inv=1.06, detail_boost weight=0.3
implicit_steps=2, implicit_type=bongmath
channelwise_cfg=True
UPSCALE:
pixel-space 2x (4x-UltraSharp or RealESRGAN)
REFINE:
sampler=res_2m, steps=20, denoise=0.4, cfg=5.0
lying=0.89, lying_inv=1.08, detail_boost weight=0.8, mode=sinusoidal
eta=0.2
FACE FIX:
Florence2 "face" → BBox → SAM2 Ultra → face_mask
Inpaint Crop (context=1.5, target=768, blend=32)
VAE Encode → SetLatentNoiseMask(cropped_mask)
guide=original, guide_weight=0.8
sampler=res_2m, steps=15, denoise=0.4, cfg=5.5
lying=0.95, lying_inv=1.03, noise_mode=soft
VAE Decode → Inpaint Stitch back to full image
separate eye/mouth passes (same Florence2 → SAM2 → Crop → Sample → Stitch pipeline)
FINAL:
optional 2nd upscale (SUPIR light or upscale model)
Adaptive USM strength=0.5
Save PNG
Portrait Focus (~10 min)
GENERATE:
sampler=res_3m, steps=30, cfg=6.0
lying=0.95, lying_inv=1.03 ← softer lying for portraits
detail_boost weight=0.15 ← subtle for skin
UPSCALE:
pixel-space 2x
FACE REFINE (entire face):
Florence2 "face" → SAM2 Ultra → Inpaint Crop (768, context=1.5)
VAE Encode → SetLatentNoiseMask → ClownsharKSampler
guide_weight=0.8, denoise=0.4, steps=20
freq_sep: highpass_weight=0.75 ← smooth skin
lying=0.95, noise_mode=soft
VAE Decode → Inpaint Stitch
EYE DETAIL:
Florence2 "eyes" → SAM2 Ultra → Inpaint Crop (512, context=1.3)
denoise=0.35, detail_boost weight=0.5 ← sharpen eyes
lying=0.92 ← slightly stronger for eye detail
Inpaint Stitch back
SAVE: Adaptive USM strength=0.3 (light for portraits)
Troubleshooting — Sampling & Pipeline
Over-sharpened / crunchy — Cause: lying too low, detail_boost too high · Fix: Raise lying to 0.95+, reduce weight to 0.2
Washed-out colors — Cause: High CFG without compensation · Fix: Enable channelwise_cfg, or lower CFG
Color desaturation with lying — Cause: lying_inv too low · Fix: Increase lying_inv (if lying=0.89, try lying_inv=1.10)
Face still looks bad after fix — Cause: Denoise too low, or mask too small · Fix: Increase denoise to 0.45-0.5, grow mask by 20+ px
Seam at mask boundary — Cause: Mask not feathered enough · Fix: GrowMaskWithBlur: blur_radius=35+
Generation too smooth — Cause: No detail boost, no lying · Fix: Add detail_boost weight=0.3, lying=0.92
Generation too noisy/artifacts — Cause: Too much noise injection · Fix: Lower s_noise to 1.0, eta to 0.3, lying to 0.95
Refinement changes composition — Cause: Denoise too high · Fix: Lower to 0.25-0.35 for refinement
Tiled sampling shows grid — Cause: Tiles too small · Fix: Increase tile_width/height to 1024+