RES4LYF — Complete Reference Guide

Updated & Reordered

I built this guide with the help of Claude Opus because — let's be honest — the sheer number of sampler names, scheduler options, and obscure settings in RES4LYF is overwhelming. Every parameter feels like it could make or break your output, but none of them come with a manual. So I dug into the source code, tested combinations, and documented what actually matters. This guide is my personal reference for getting consistently good results without guessing. If it spares you the same confusion, even better.

Quick Navigation

Part 1 — The Node Ecosystem

  • Complete Node Map

  • Sampler Nodes · Option Nodes

  • Guide Nodes · Sigma Nodes

  • VAE & Latent Utility · Conditioning · Mask Nodes

  • Model Patch Nodes

Part 2 — Samplers, Schedulers, Noise & Conditioning

  • Quick Start Defaults

  • Eta — The Randomness Slider

  • Sampler Name Suffixes · Sampler Families · Implicit Steps

  • Noise Scaling Modes · Noise Types Reference

  • Schedulers · Presets · Special Features

  • Conditioning — Strategy · Regional · Timestep

Part 3 — Resolution, Latent Space & Prompt Theory

  • Choose Your Resolution — Model Budgets · Bucket Sizes · Alignment Rules

  • Latent Space, Denoise & Upscaling — How It Actually Works

  • VAE Operations — Precision Encoding · Upscale · Style Transfer

  • Precision & Latent Manipulation

  • Mask Operations

  • Prompt Structure & Token Theory — Structure · Tokens · Score Tags

  • Performance Profiles · Bad Combos

  • Troubleshooting (Conditioning/Precision/Latents)

Part 4 — Quality Pipeline

  • Pipeline Integration — Where Each Node Fits

  • Complete Workflow Overview

  • Stage 1: Generation — Sampler Setup

  • Model Adaptations: Flux · Z-Image (Base/Turbo) · Qwen-Image (Base/Distilled)

  • Stage 2: Upscale · Stage 3: Refine

  • Stage 4: Fix (Face/Region)

  • Stage 5: Final Upscale · Stage 6: Polish & Save

  • Preset Configurations · Troubleshooting (Sampling/Pipeline)


Part 1 — The Node Ecosystem

RES4LYF has 370+ nodes. This part covers the node ecosystem — all the nodes you connect to go from empty latent to final saved image. Part 2 covers the sampling algorithms and schedulers in depth; Part 4 puts it all together into a complete pipeline.


The Complete Node Map

RES4LYF has 370+ nodes. For a quality image pipeline you only need ~15-20. Here's the hierarchy:

┌─────────────────────────────────────────────────────────────┐
│                        SAMPLERS                              │
│  ClownsharKSampler  ←  the all-in-one workhorse             │
│  SharkSampler       ←  initialization/pipeline control       │
│  BongSampler        ←  dead-simple, just works               │
│  ClownSampler       ←  returns SAMPLER object (for chaining) │
└──────────────────────────┬──────────────────────────────────┘
                           │ takes OPTIONS input
┌──────────────────────────▼──────────────────────────────────┐
│                     OPTION NODES                             │
│  ClownOptions_SDE           ← noise type + eta               │
│  ClownOptions_StepSize      ← overshoot (sharpness)          │
│  ClownOptions_DetailBoost   ← detail enhancement engine      │
│  ClownOptions_SigmaScaling  ← lying sigma (the secret sauce) │
│  ClownOptions_Momentum      ← convergence acceleration       │
│  ClownOptions_ImplicitSteps ← iterative polish per step      │
│  ClownOptions_Cycles        ← unsample→resample loops        │
│  ClownOptions_Tile          ← tiled sampling for large images │
│  ClownOptions_SwapSampler   ← switch algorithm mid-run       │
│  SharkOptions               ← init noise + denoise_alt       │
│  ClownOptions_Combine       ← merge multiple OPTIONS dicts   │
└──────────────────────────┬──────────────────────────────────┘
                           │ takes GUIDES input
┌──────────────────────────▼──────────────────────────────────┐
│                      GUIDE NODES                             │
│  ClownGuide_Mean             ← latent-space reference guide  │
│  ClownGuide_FrequencySep     ← high/low frequency control    │
│  ClownGuide_Style            ← style transfer (AdaIN/WCT)    │
│  ClownGuides_Sync            ← dual masked/unmasked guides   │
│  ClownGuides_Sync_Advanced   ← + drift/lure (experimental)  │
└──────────────────────────┬──────────────────────────────────┘
                           │ takes SIGMAS input
┌──────────────────────────▼──────────────────────────────────┐
│                      SIGMA NODES (81 total)                  │
│  Schedulers      → generate initial sigma schedule           │
│  sigmas_mult     → scale all sigmas up/down                  │
│  sigmas_rescale  → remap to new range                        │
│  sigmas_interpolate → change step count preserving shape     │
│  sigmas_concat   → join two schedules                        │
│  sigmas_split    → break at a point                          │
│  sigmas_math1/3  → custom expressions                        │
│  + 70 more manipulation nodes                                │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                  VAE & LATENT UTILITY NODES                   │
│  VAEEncodeAdvanced       ← deterministic VAE encode (Part 3) │
│  LatentUpscaleWithVAE    ← decode→upscale→re-encode (Part 3) │
│  VAEStyleTransferLatent  ← latent-space style xfer  (Part 3) │
│  EmptyLatentImage64      ← fp64 empty latent        (Part 3) │
│  EmptyLatentImageCustom  ← channels/precision ctrl  (Part 3) │
│  Set Precision Universal ← cast cond+sigma+latent   (Part 3) │
│  LatentMatchChannelwise  ← fix color shifts         (Part 3) │
│  LatentNoised            ← controlled noise inject  (Part 3) │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                    CONDITIONING NODES                         │
│  ClownRegionalConditioning_ABC ← regional prompts   (Part 2) │
│  ConditioningSetTimestepRange  ← step scheduling    (Part 2) │
│  ConditioningAverage           ← blend 2 prompts    (Part 2) │
│  ConditioningOrthoCollin       ← surgical blending  (Part 2) │
│  ConditioningMultiply          ← scale strength     (Part 2) │
│  ConditioningTruncate          ← SD3.5 fix          (Part 2) │
│  CLIPTextEncodeFluxUnguided    ← Flux dual-tower    (Part 2) │
│  Conditioning Recast FP64      ← precision cast     (Part 2) │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                       MASK NODES                             │
│  MaskEdge                ← smart edge detection     (Part 3) │
│  GrowMaskWithBlur        ← ComfyUI standard         (Part 4) │
│  Mask to Segs / BBox     ← detection pipeline       (Part 4) │
└─────────────────────────────────────────────────────────────┘

Connection pattern: ClownsharKSampler has auto-growing options input slots — connect each option node directly to the sampler (one per slot).

SharkOptions          → Sampler (options slot 1)
ClownOptions_SDE      → Sampler (options slot 2)
ClownOptions_DetailBoost → Sampler (options slot 3)
ClownOptions_SigmaScaling → Sampler (options slot 4)
ClownOptions_ImplicitSteps → Sampler (options slot 5)

Add more option nodes and new slots appear automatically.


Sampler Nodes — Which One to Use

ClownsharKSampler (Recommended)

The all-in-one workhorse. Takes model, conditioning, latent, sigmas, options, guides — does everything in one node.

  • eta — Default: 0.5 · Range: -100 to 100 · What It Does: Noise injection per step (see Eta in Part 2)

  • sampler_name — Default: res_2m · Range: see list · What It Does: Algorithm (see Sampler Families in Part 2)

  • scheduler — Default: beta57 · Range: see list · What It Does: Sigma schedule shape

  • steps — Default: 30 · Range: 1-10000 · What It Does: Total denoising steps

  • steps_to_run — Default: -1 · Range: -1 to 10000 · What It Does: -1 = all steps. Set to N to run only N steps (for chaining passes)

  • denoise — Default: 1.0 · Range: -10000 to 10000 · What It Does: Sigma range to use (1.0 = full, 0.5 = half)

  • cfg — Default: 5.5 · Range: -100 to 100 · What It Does: Guidance scale. Negative = channelwise CFG

  • seed — Default: 0 · Range: 0-max · What It Does: Random seed

  • sampler_mode — Default: standard · Range: standard/unsample/resample · What It Does: Direction of sampling

  • bongmath — Default: True · Range: bool · What It Does: Enable implicit step calculation + sigma manipulation

Returns: (output_latent, denoised_latent, options_dict)

sampler_mode explained:

  • standard — normal forward denoising (noise → image). Use for generation.

  • unsample — reverse (image → noise). Used as first half of a refinement cycle.

  • resample — unsample then sample in one pass. Shortcut for refinement.

BongSampler (Simple Alternative)

If you don't want option nodes at all. Locked to RES samplers, beta57, minimal controls.

  • sampler_name — Default: res_2s_sde · What It Does: Only RES variants available

  • scheduler — Default: beta57 · What It Does: Only RES4LYF schedulers

  • steps — Default: 30 · What It Does: Total steps

  • cfg — Default: 5.5 · What It Does: Guidance scale

  • denoise — Default: 1.0 · What It Does: Sigma range

Use when: You want something that just works with zero configuration.

SharkSampler (Pipeline Control)

The pipeline-oriented sampler. Controls initialization separately from the sampling itself.

  • noise_type_init — Default: gaussian · What It Does: Type of noise to start from

  • noise_stdev — Default: 1.0 · What It Does: Initial noise strength

  • denoise_alt — Default: 1.0 · What It Does: Alternative denoise (scales noise differently than sigma-slice)

Use when: You need separate init noise control (e.g., structured noise for coherent variation).


Option Nodes — Full Breakdown

ClownOptions_SDE — Noise Control

Controls the stochastic noise injected during SDE sampling. This is separate from eta — eta scales the amount, this controls the character.

  • noise_type_sde — Default: gaussian · What It Does: Noise distribution for main steps

  • noise_type_sde_substep — Default: gaussian · What It Does: Noise distribution for sub-steps

  • noise_mode_sde — Default: hard · What It Does: When noise is strongest (see Noise Modes)

  • noise_mode_sde_substep — Default: hard · What It Does: Sub-step noise timing

  • eta — Default: 0.5 · What It Does: Main step noise amount

  • eta_substep — Default: 0.5 · What It Does: Sub-step noise amount

  • seed — Default: -1 · What It Does: Noise seed (-1 = use sampler seed)

Main vs substep: Multi-stage samplers (_2s, _3s etc.) have internal sub-steps. You can control noise separately for the main step and sub-steps. For most cases, keep them the same.

Quality tip: gaussian + hard + eta 0.5 is the safe default. Try brownian for smoother results, blue or laplacian for sharper detail.

ClownOptions_StepSize — Overshoot Control

Controls how aggressively the sampler steps through the sigma schedule.

  • overshoot_mode — Default: hard · What It Does: When overshoot is strongest

  • overshoot_mode_substep — Default: hard · What It Does: Sub-step overshoot timing

  • overshoot — Default: 0.0 · What It Does: Positive = sharper/grittier. Negative = soften/smoother.

  • overshoot_substep — Default: 0.0 · What It Does: Sub-step overshoot

Quality tip: Leave at 0.0 for generation. Try 0.05-0.15 positive overshoot for sharpening in a refinement pass, or -0.05 to -0.1 for smoothing skin.

ClownOptions_DetailBoost — The Detail Engine

This is one of the most powerful quality nodes. It detects where the model or sampler underestimates noise and injects additional detail there.

  • weight — Default: 1.0 · Range: -100 to 100 · What It Does: Positive = sharper/grittier. Negative = soften/deepen colors. This is the main knob.

  • method — Default: model · Range: 6 options · What It Does: What component to boost (see below)

  • mode — Default: hard · Range: noise modes · What It Does: When the boost is strongest in the schedule

  • eta — Default: 0.5 · Range: -100 to 100 · What It Does: Strength multiplier for the noise mode curve

  • start_step — Default: 3 · Range: 0-10000 · What It Does: First step to apply boost

  • end_step — Default: 10 · Range: -1 to 10000 · What It Does: Last step (-1 = all remaining)

Method options explained:

  • model — What It Boosts: The model's prediction error · Best For: Default — enhances overall detail and texture

  • model_alpha — What It Boosts: Model prediction in alpha channel · Best For: Models with transparency

  • sampler — What It Boosts: The sampler's integration error · Best For: Corrects sampling artifacts

  • sampler_normal — What It Boosts: Sampler error, normalized · Best For: More controlled version of sampler

  • sampler_substep — What It Boosts: Sub-step integration error · Best For: Multi-stage samplers

  • sampler_substep_normal — What It Boosts: Normalized sub-step error · Best For: Controlled sub-step correction

Quality tips:

  • Generation: weight=0.2-0.5, method=model, mode=hard, start_step=3, end_step=10 — subtle enhancement across middle steps where detail forms

  • Refinement pass: weight=0.5-1.5, method=model, mode=sinusoidal, start_step=0, end_step=-1 — stronger boost throughout

  • Skin/portrait: weight=-0.1 to -0.3, method=model, mode=lorentzian — negative weight softens skin

  • Texture/fabric: weight=1.0-2.0, method=model, mode=hard — aggressive detail

ClownOptions_SigmaScaling — Lying Sigma (The Secret Sauce)

This is the technique that makes RES4LYF special. It lies to the model about where it is in the denoising process, tricking it into producing more or less detail.

  • s_noise — Default: 1.0 · Range: -10000 to 10000 · What It Does: Scales SDE noise. 1.03-1.07 = moderate detail/texture boost

  • s_noise_substep — Default: 1.0 · Range: same · What It Does: Sub-step noise scaling

  • noise_anchor_sde — Default: 1.0 · Range: -100 to 100 · What It Does: 1.0 = normal. Lower = grittier/more detailed. 0.0 = maximum grit

  • lying — Default: 1.0 · Range: -10000 to 10000 · What It Does: Downscales sigma → model thinks it's further along → produces sharper detail. 0.89-0.98 = sweet spot

  • lying_inv — Default: 1.0 · Range: -10000 to 10000 · What It Does: Upscales sigma → compensates for color desaturation from lying. Match to lying (e.g., lying=0.89, lying_inv=1.05-1.10)

  • lying_start_step — Default: 0 · Range: 0-10000 · What It Does: When lying kicks in

  • lying_inv_start_step — Default: 1 · Range: 0-10000 · What It Does: When lying_inv kicks in (typically 1 step after lying)

How lying works:

  1. The model gets called with sigma * lying (smaller sigma = "I'm almost done denoising")

  2. Model responds with sharper, more detailed predictions than it would at the real sigma

  3. But the actual sampling happens at the real sigma, so the image structure is preserved

  4. Side effect: lying desaturates colors → lying_inv compensates by upscaling sigma for the inverse step, which restores saturation
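
Here's a minimal Python sketch of the mechanism (illustrative only — the function and names are mine, not RES4LYF's actual code):

  def lying_step(model, x, sigma, sigma_next, lying=0.89):
      # Lie to the model: it sees a smaller sigma, so it predicts
      # as if denoising were further along -> sharper detail.
      denoised = model(x, sigma * lying)
      # The Euler-style update still uses the REAL sigmas, so the
      # image structure follows the true schedule.
      d = (x - denoised) / sigma
      return x + d * (sigma_next - sigma)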

Recommended lying pairs:

  • lying: 1.0 · lying_inv: 1.0 · Effect: Off (no lying)

  • lying: 0.95 · lying_inv: 1.03-1.05 · Effect: Subtle detail boost

  • lying: 0.89 · lying_inv: 1.05-1.10 · Effect: Strong detail boost (recommended)

  • lying: 0.80 · lying_inv: 1.10-1.15 · Effect: Very aggressive — may artifact

  • lying: 0.98 · lying_inv: 1.01 · Effect: Barely noticeable — for refinement passes

s_noise vs lying: s_noise adds more noise (variation/texture). lying changes model behavior (sharpness/detail). They stack — use both for maximum effect.

Flux note: Flux is a rectified flow model with a linear noise schedule. The sigma distortion from lying is amplified more than on SDXL's cosine schedule. Values like lying=0.89 that work well on SDXL will produce severe noise artifacts on Flux (leopard-print patterns, texture corruption). Flux starting point: lying=0.97, lying_inv=1.02, s_noise=1.04. These are conservative but effective — they add detail without the artifacts that aggressive SDXL-tuned values cause.

Z-Image / Qwen-Image note: Both are rectified flow models like Flux — same conservative lying values apply. Z-Image uses the same shift=3.0 schedule; Qwen-Image's larger 20B model tolerates slightly more aggression (lying=0.95, lying_inv=1.03). Their distilled variants (ZIT, Qwen-Image Lightning) at 4-10 steps need even lighter values or no lying.

ClownOptions_Momentum — Convergence Speed

Accelerates or decelerates how fast the sampler converges to the final image.

  • momentum — Default: 0.0 · What It Does: Positive = faster convergence (standard sampling). Negative = faster convergence (unsampling).

Quality tip: Leave at 0.0 unless you're experimenting. Small positive values (0.1-0.3) can help lock in details faster but risk overshooting.

ClownOptions_ImplicitSteps — Iterative Polish

Adds correction iterations after each denoising step. Each step gets re-solved N times.

  • implicit_type — Default: bongmath · What It Does: Correction algorithm

  • implicit_type_substeps — Default: bongmath · What It Does: Sub-step correction algorithm

  • implicit_steps — Default: 0 · What It Does: Number of correction passes per step (0=off, 2-3=quality, 5+=waste)

  • implicit_substeps — Default: 0 · What It Does: Correction passes for sub-steps

Implicit type options:

  • bongmath — What It Does: Advanced sigma manipulation + implicit solve · Best For: Default — balances quality and speed

  • predictor-corrector — What It Does: Classic predict → correct cycle · Best For: Conservative, predictable

  • rebound — What It Does: Iterates with rebound (bouncing convergence) · Best For: Some models converge better this way

  • retro-eta — What It Does: Retroactive eta correction · Best For: SDE-heavy workflows needing eta cleanup

Quality tip: Use implicit_steps=2 with bongmath for final renders. Skip for drafts. The difference is subtle but real — cleaner edges, fewer micro-artifacts.

ClownOptions_Cycles — Unsample/Resample Loops

Runs the sampler forward, then backward, then forward again — iteratively refining the image.

  • cycles — Default: 0.0 · What It Does: Number of unsample→resample cycles (0=off). Internally: cycles × 2 = total passes

  • eta_decay_scale — Default: 1.0 · What It Does: Multiplies eta after each cycle (helps convergence over iterations)

  • unsample_eta — Default: 0.5 · What It Does: Eta for the unsample (reverse) pass

  • unsampler_override — Default: none · What It Does: Use a different sampler for the unsample pass

  • unsample_steps_to_run — Default: -1 · What It Does: Steps for unsample pass (-1=all)

  • unsample_cfg — Default: 1.0 · What It Does: CFG for unsample pass (usually low)

  • unsample_bongmath — Default: False · What It Does: Enable bongmath for unsample pass

How it works:

  1. Normal sampling (forward pass) generates the image

  2. Unsample reverses a few steps (adds noise back in a controlled way)

  3. Resample runs forward again from the noisy state

  4. Each cycle refines details without major structural changes

Quality tip: cycles=1 with eta_decay_scale=0.8 gives a nice refinement. More cycles = more polish but slower.
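
Structurally, each cycle is a renoise-then-denoise loop. A rough sketch, assuming sample/unsample callables (the real node also threads sigmas, guides, and bongmath through each pass):

  def refine_cycles(sample, unsample, latent, cycles=1, eta=0.5,
                    eta_decay_scale=0.8):
      x = sample(latent, eta=eta)           # forward pass: generate
      for _ in range(cycles):
          noisy = unsample(x, eta=eta)      # reverse a few steps
          eta *= eta_decay_scale            # calmer each iteration
          x = sample(noisy, eta=eta)        # forward again: refine
      return x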

ClownOptions_Tile — Tiled Sampling

For images too large to fit in VRAM. Splits the latent into overlapping tiles, samples each separately, blends back together.

  • tile_width — Default: 1024 · What It Does: Tile width in pixels

  • tile_height — Default: 1024 · What It Does: Tile height in pixels

Advanced variant (ClownOptions_Tile_Advanced): Accepts comma-separated or multiline list of width,height pairs, allowing different tile sizes per region.

ClownOptions_SwapSampler — Algorithm Switching

Switch to a different sampler algorithm mid-run when error drops below a threshold.

  • sampler_name — Default: (default) · What It Does: Algorithm to swap to

  • swap_below_err — Default: 0.0 · What It Does: Swap when per-step error drops below this

  • swap_at_step — Default: 30 · What It Does: Hard swap at this step regardless of error

  • log_err_to_console — Default: False · What It Does: Print error values (for tuning the threshold)

Quality tip: Start with res_3m, swap to euler for the last 5 steps. Or start with euler for broad structure, swap to res_3m for detail. Enable logging first to find good swap thresholds for your model.

SharkOptions — Initialization

Controls the starting noise and alternative denoise behavior.

  • noise_type_init — Default: gaussian · What It Does: Type of initial noise

  • s_noise_init — Default: 1.0 · What It Does: Initial noise strength multiplier

  • denoise_alt — Default: 1.0 · What It Does: Alternative denoise — scales noise differently from sigma-slice

  • channelwise_cfg — Default: False · What It Does: Apply CFG per channel (can reduce color burn at high CFG)

Quality tip: channelwise_cfg=True at CFG 7+ prevents the washed-out look that high guidance sometimes causes. gaussian init is always safe; try brownian for smoother base images.

Flux note: channelwise_cfg has no effect at cfg=1.0. Leave it False for Flux — it adds unnecessary overhead when there's no guidance amplification to balance.

Z-Image Base / Qwen-Image note: These models use true CFG (3.0-5.0) — set channelwise_cfg=True to prevent color burn and washed-out highlights at higher guidance. For their distilled variants (ZIT, Qwen-Image Lightning) at cfg=1.0, leave False.

ClownOptions_Combine — Merge Options

Merges two OPTIONS dicts into one. Later values override earlier ones for the same key.
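
In other words, plain dict semantics — later keys win. Roughly (a sketch, not the node's code):

  def combine_options(options_a, options_b):
      merged = dict(options_a or {})
      merged.update(options_b or {})   # b overrides a on shared keys
      return merged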


Guide Nodes — Latent-Space Steering

Guides give the sampler a reference latent to steer toward. Massively useful for refinement, style transfer, and region-specific enhancement.

ClownGuide_Mean — Basic Guidance

The simplest guide. Points the sampler toward a target latent.

  • weight — Default: 0.75 · Range: -100 to 100 · What It Does: How strongly to pull toward the guide. 0 = ignored, 1.0 = strong lock

  • cutoff — Default: 1.0 · Range: 0.0 to 1.0 · What It Does: Disables guide when output ≈ guide (cosine similarity). Higher = guide stays active longer

  • weight_scheduler — Default: beta57 · Range: schedulers · What It Does: How weight changes over steps

  • start_step — Default: 0 · Range: 0-10000 · What It Does: First step to apply guide

  • end_step — Default: 15 · Range: -1 to 10000 · What It Does: Last step (-1 = all)

  • invert_mask — Default: False · Range: bool · What It Does: Invert the mask region

Takes: guide (LATENT) — the target, mask (optional) — where to apply

Use cases:

  • Refinement: Guide = original generation, weight = 0.7-0.8. Sampler enhances but won't deviate from source.

  • Style transfer: Guide = style reference, weight = 0.3-0.5. Picks up style colors/textures.

  • Face fix: Guide = original, mask = face region only, weight = 0.8. Fixes face without touching background.

ClownGuide_FrequencySeparation — Frequency-Aware Guidance

Separates the guide and target into high/low frequency bands and controls each independently.

  • methodDefault: median · What It Does: Frequency separation method

  • sigmaDefault: 3.0 · What It Does: Gaussian blur radius (for gaussian method)

  • kernel_sizeDefault: 8 · What It Does: Median filter size (main control with median method)

  • inner_kernel_sizeDefault: 2 · What It Does: Inner filtering

  • strideDefault: 2 · What It Does: Processing stride

  • lowpass_weightDefault: 1.0 · What It Does: Low frequency (structure/color). 1.0 = keep as-is. <1 = sharpen, >1 = blur

  • highpass_weightDefault: 1.0 · What It Does: High frequency (detail/edges). >1 = sharpen, <1 = smooth

Method options: gaussian, gaussian_pw, median, median_pw

Quality tips:

  • Sharpen detail: highpass_weight=1.2-1.5, lowpass_weight=0.9

  • Smooth skin: highpass_weight=0.7-0.8, keep lowpass_weight=1.0

  • Overall enhancement: lowpass_weight=0.95, highpass_weight=1.1 — subtle but clean
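
The underlying operation is classic frequency separation. A sketch of the gaussian method on a [B, C, H, W] latent (my own minimal version, not the node's code):

  import torchvision.transforms.functional as TF

  def freq_sep_reweight(latent, sigma=3.0,
                        lowpass_weight=0.95, highpass_weight=1.1):
      kernel = int(sigma * 4) | 1          # odd kernel size for the blur
      low = TF.gaussian_blur(latent, kernel_size=kernel, sigma=sigma)
      high = latent - low                  # residual = detail/edges
      return lowpass_weight * low + highpass_weight * high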

ClownGuide_Style — Style Transfer

Transfers the style (color distribution, texture pattern) from a reference to the output.

  • apply_to — Default: positive · What It Does: Which conditioning gets the style (positive/negative/denoised)

  • method — Default: WCT · What It Does: Style transfer algorithm

  • weight — Default: 1.0 · What It Does: Global strength

  • synweight — Default: 1.0 · What It Does: Strength on opposite conditioning (prevents CFG burn)

  • weight_scheduler — Default: constant · What It Does: How weight changes over steps

  • start_step — Default: 0 · What It Does: First step

  • end_step — Default: -1 · What It Does: Last step

Method options:

  • AdaIN — What It Does: Adaptive Instance Normalization — matches mean/variance · Quality: Fast, good for mood/atmosphere

  • WCT — What It Does: Whitening Color Transform — matches full color distribution · Quality: Best quality — fine texture/color control

  • WCT2 — What It Does: Enhanced WCT · Quality: Similar to WCT with improvements

  • scattersort — What It Does: Scatter-based optimal transport · Quality: Advanced, different aesthetic

  • none — What It Does: Disabled


Sigma Nodes — Schedule Manipulation

The sigma schedule is the roadmap for denoising — which noise levels to visit in which order. Manipulating sigmas gives you fine-grained control over the process.

Key Sigma Nodes for Quality

  • sigmas_interpolate — What It Does: Change step count while preserving schedule shape (see the sketch after this list) · When to Use: Want 40-step quality from a 20-step schedule

  • sigmas_mult — What It Does: Scale all sigmas by a factor · When to Use: Globally increase/decrease noise range

  • sigmas_rescale — What It Does: Remap to new min/max range · When to Use: Constrain noise to a specific range

  • sigmas_concat — What It Does: Join two sigma sequences · When to Use: Two-phase sampling with different schedules

  • sigmas_split — What It Does: Split at a point · When to Use: Separate early/late phases

  • sigmas_pad — What It Does: Add steps at start/end · When to Use: Extend schedule without recalculating

  • sigmas_cleanup — What It Does: Remove near-zero/duplicate values · When to Use: Fix broken schedules

  • sigmas_math1/3 — What It Does: Custom expressions (a,b,c,x,y,z,s variables) · When to Use: Advanced custom curves
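
As promised above, a sketch of sigmas_interpolate-style resampling — it's just 1-D linear interpolation over the schedule (not the node's exact code):

  import torch.nn.functional as F

  def interpolate_sigmas(sigmas, new_steps):
      # Resample an existing schedule to a new step count while
      # keeping its curve shape (endpoints preserved).
      s = sigmas.view(1, 1, -1)
      out = F.interpolate(s, size=new_steps + 1, mode="linear",
                          align_corners=True)
      return out.flatten()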

Custom Schedulers

  • beta57 — Character: Balanced, optimized for RES · Best For: Default for everything

  • tan_scheduler — Character: Tangent curve — gentle start, steep middle, gentle end · Best For: More time on fine detail, less on rough structure

  • linear_quadratic_advanced — Character: Quad curve with manual control points · Best For: When you know exactly what schedule you want

  • constant_scheduler — Character: Flat sigma (same noise at every step) · Best For: Specialized — refinement at a single noise level

  • karras — Character: Concentrates steps in high-detail range · Best For: SD/SDXL models, DPM++ samplers

  • exponential — Character: Smooth exponential spacing · Best For: High step counts (30+)

  • sgm_uniform — Character: Uniform spacing · Best For: Low-denoise refinement


Model Patch Nodes

RES4LYF includes model-specific patches that optimize attention patterns. These go between your model loader and the sampler.

  • ReFluxPatcher / Advanced — Target Model: Flux · What It Does: Selectively enable/disable attention blocks

  • ReWanPatcher / Advanced — Target Model: Wan Video · What It Does: Sliding window attention (saves VRAM for video)

  • ReChromaPatcher — Target Model: Chroma · What It Does: Architecture-specific optimizations

  • ReLTXVPatcher — Target Model: LTX Video · What It Does: Video model patches

  • ReSDPatcher — Target Model: SD 1.5/2.1 · What It Does: Legacy model optimization

  • ReSD35Patcher — Target Model: SD 3.5 · What It Does: SD3.5-specific patches

  • FluxOrthoCFGPatcher — Target Model: Flux · What It Does: Orthogonal CFG (reduces artifacts at high guidance)

  • FluxGuidanceDisable — Target Model: Flux · What It Does: Disable/zero-out specific CLIP guidance

  • TorchCompileModels — Target Model: Any · What It Does: torch.compile() for speed


Part 2 — Samplers, Schedulers, Noise & Conditioning

Part 1 showed you the nodes. This part explains the algorithms behind them — what each sampler family does, how schedulers shape the noise curve, which combinations actually work, and how conditioning controls what the model sees.

Quick Start Defaults

Sampler:   res_2m or res_3m
Scheduler: beta57
Noise:     gaussian
Eta:       0.5
Steps:     15-25 (with RES samplers)

"Typically only 20 steps are needed with RES samplers. Far more are needed with Uni-PC
and other common samplers, and they never reach the same level of quality."
— RES4LYF README


Eta — The Randomness Slider

Eta controls how much random noise is injected back after each denoising step. After the model denoises, it adds some noise back before the next step. Eta scales that amount.

  • 0.0 — Effect: Fully deterministic — same seed = identical image · Use Case: Reproducibility, upscale refinement

  • 0.2-0.3 — Effect: Slight variation, very consistent · Use Case: Final renders, low-denoise passes

  • 0.5 — Effect: Balanced creativity + consistency · Use Case: Default — good for most workflows

  • 0.8-1.0 — Effect: High variation, more "creative" · Use Case: Exploration, trying different looks

  • Negative — Effect: Ultra-smooth, over-conservative · Use Case: Artifact reduction (experimental)

In practice: Lower eta = sharper, more predictable. Higher eta = softer, more varied between seeds. For upscale/refinement passes at low denoise, use 0.0-0.3 to keep the original intact.
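
Mechanically, eta splits each step into a deterministic move plus fresh noise, in the standard k-diffusion ancestral style (a sketch of the common formula; RES4LYF's solvers layer more on top):

  def ancestral_sigmas(sigma, sigma_next, eta=0.5):
      # sigma_up: how much fresh noise to inject after the step.
      # sigma_down: how far to denoise deterministically.
      sigma_up = min(sigma_next,
                     eta * (sigma_next**2 * (sigma**2 - sigma_next**2)
                            / sigma**2) ** 0.5)
      sigma_down = (sigma_next**2 - sigma_up**2) ** 0.5
      return sigma_down, sigma_up

  # eta=0.0 -> sigma_up=0: fully deterministic, seed-reproducible.
  # eta=1.0 -> maximum re-noise per step: most seed-to-seed variety.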


What the Sampler Name Suffixes Mean

  • _2m, _3m, _4m = Multistep — uses history from previous steps to predict the next. Faster, fewer steps needed. The number is how many previous steps it remembers.

  • _2s, _3s, _4s = Single-step (multi-stage) — does multiple internal calculations per step but doesn't use history. More flexible with custom schedules. The number is how many internal stages per step.

Practical: _m variants are faster and what you should use by default. _s variants are for fine-tuning or custom sigma schedules.


Sampler Families — What They Actually Do

The Modern Family (Use These)

All of these are exponential integrators — math solvers specifically designed for how diffusion models work. They converge in ~15-25 steps instead of the 35-50 needed by classical methods. Think of them as "native" solvers for AI image generation.

  • RES — What It Is: Purpose-built for rectified flow models (Flux etc.) · Character: Sharp detail, aggressive · When to Pick It: Default choice — fastest convergence

  • DPM++ — What It Is: Diffusion Probabilistic Model solver · Character: Safe, middle-ground · When to Pick It: Familiar from other UIs, works on any model

  • DEIS — What It Is: Diffusion Exponential Integrator Sampler · Character: Sometimes softer output · When to Pick It: When RES feels too sharp/contrasty

  • ABNORSETT — What It Is: Adams-Bashforth-Norsett (numerical math method) · Character: Conservative, structure-focused · When to Pick It: When RES adds too much fine detail

  • ETDRK — What It Is: Exponential Time Differencing Runge-Kutta · Character: Very stable, clean output · When to Pick It: Maximum stability — avoids artifacts

  • Lawson — What It Is: Lawson exponential integrator · Character: Alternative aesthetic · When to Pick It: When you want a different "look" from RES

In short: Try res_3m first. If the output is too sharp → DEIS or ABNORSETT. Too noisy → ETDRK. Otherwise stick with RES.

Specialized RES Variants

These are alternative mathematical formulations within the RES exponential family. The differences are subtle:

  • cox_matthews — Character: Slightly different stability profile — try if res_Xs has edge artifacts

  • lie — Character: Lie group integrator — different convergence pattern

  • krogstad — Character: Krogstad's method — alternative coefficient set

You don't need these unless you're A/B testing for a specific model. They exist for mathematical completeness.

The Classical Family (Slower but Universal)

These are general-purpose ODE solvers from numerical mathematics. They work on any model but need 2-3x more steps than the modern family.

  • euler — What It Is: Simplest possible — one straight-line estimate per step · Steps Needed: 30-50 · When to Use: Preview/drafts only, too crude for finals

  • heun — What It Is: Euler but corrects itself (predict → correct) · Steps Needed: 25-35 · When to Use: Better than euler, still simple

  • midpoint — What It Is: Takes one step, checks the middle, adjusts · Steps Needed: 25-35 · When to Use: Similar to heun

  • rk4 — What It Is: Classic 4th-order Runge-Kutta — the "gold standard" classical method · Steps Needed: 30-40 · When to Use: Reliable fallback for unknown models

  • ralston — What It Is: RK4 with optimized coefficients for minimum error · Steps Needed: 30-40 · When to Use: Slightly more accurate than rk4

  • dormand-prince — What It Is: Adaptive-precision heritage (used in scientific computing) · Steps Needed: 25-35 · When to Use: When you need proven mathematical reliability

  • bogacki-shampine — What It Is: Built-in error estimation · Steps Needed: 25-35 · When to Use: Self-correcting behavior

  • ssprk — What It Is: Strong Stability Preserving RK · Steps Needed: 30-40 · When to Use: Avoids overshooting/oscillation

In short: Use rk4_4s as a safe fallback. Use euler only for quick previews.

The DPM++ Variants Explained

  • dpmpp_2m — What's Different: 2-step multistep — uses previous step's data. Fast, clean.

  • dpmpp_3m — What's Different: 3-step multistep — smoother than 2m but slightly slower.

  • dpmpp_2s — What's Different: 2-stage single-step — no history, each step independent.

  • dpmpp_3s — What's Different: 3-stage single-step.

  • dpmpp_sde_2s — What's Different: SDE variant — adds stochastic noise (like built-in eta). More variation per run.

  • ddim — What's Different: Denoising Diffusion Implicit Model — the original method. Deterministic by default. Feels "flat" compared to modern samplers.

The Implicit Family (For Refinement Only)

These are too slow for main sampling but excellent as polish. They solve each step iteratively until it converges — like checking your work multiple times.

  • Gauss-Legendre — What It Is: Highest precision per computation · When to Use: Final render polish

  • Radau IIA — What It Is: Best implicit family for stiff problems · When to Use: Difficult models with artifacts

  • Lobatto — What It Is: Various endpoint handling strategies · When to Use: Specialized edge cases

How to use: Set implicit_steps to 2-3 with one of these as the implicit solver. Your main sampler does the heavy lifting, the implicit solver polishes each step.

Hybrid Samplers

Prediction + correction blend. The sampler predicts (explicit step), then corrects (implicit step) in one go.

  • pec423 — Character: Predict-Evaluate-Correct, 4 stages

  • pec433 — Character: Similar with 3-stage correction


Implicit Steps — What They Actually Do

After each main denoising step, the sampler can re-solve that step N more times until the answer converges. Like proofreading a sentence multiple times.

  • 0 (default) — Speed Impact: None · Quality Impact: Normal · Recommended For: Everything — most workflows

  • 2 — Speed Impact: ~+40% slower · Quality Impact: Cleaner edges, fewer artifacts · Recommended For: Final renders worth polishing

  • 3 — Speed Impact: ~+60% slower · Quality Impact: Diminishing returns vs 2 · Recommended For: When 2 still shows artifacts

  • 5+ — Speed Impact: 2x+ slower · Quality Impact: Nearly zero improvement · Recommended For: Don't bother

Only works with implicit-capable samplers (Gauss-Legendre, Radau, Lobatto families, plus diagonally implicit ones).
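
The core idea is a fixed-point re-solve of each step. A toy illustration using an implicit-midpoint iteration (bongmath itself is RES4LYF's own, more elaborate method):

  def implicit_midpoint_step(model, x, sigma, sigma_next, iters=2):
      x_next = x                                  # initial guess
      for _ in range(iters):                      # "proofread" N times
          mid_x = 0.5 * (x + x_next)
          mid_sigma = 0.5 * (sigma + sigma_next)
          d = (mid_x - model(mid_x, mid_sigma)) / mid_sigma
          x_next = x + d * (sigma_next - sigma)   # refined estimate
      return x_next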


Noise Scaling Modes — What They Change

These control when during the sampling process noise gets injected. Combined with eta, they shape the character of randomness:

  • hardWhat It Does: Full noise early, drops off fast · Visual Effect: Explores different compositions early → locks in details late. Most variation.

  • hard_varWhat It Does: Hard with variance tracking · Visual Effect: Similar to hard, more controlled

  • softWhat It Does: Gentle, consistent noise throughout · Visual Effect: Conservative results. Less variation between seeds. Safer.

  • soft-linearWhat It Does: Soft with linear fade · Visual Effect: Smooth transition from some noise to none

  • softerWhat It Does: Even less noise than soft · Visual Effect: Maximum safety. Minimal seed-to-seed variation.

  • lorentzianWhat It Does: Peaked — most noise in middle steps · Visual Effect: Balances early exploration with late refinement

  • sinusoidalWhat It Does: Sine wave — noise oscillates · Visual Effect: Creative oscillation between exploring and refining

  • expWhat It Does: Exponential curve (can exceed 1.0) · Visual Effect: Extreme variation. Experimental.

  • epsWhat It Does: Epsilon-based scaling · Visual Effect: Technical — matches epsilon prediction models

  • vpsdeWhat It Does: Variance-Preserving SDE · Visual Effect: Mathematically "correct" for VP diffusion

  • er4What It Does: ER4-specific scaling · Visual Effect: Specialized

  • noneWhat It Does: No noise added between steps · Visual Effect: Fully deterministic ODE (ignores eta)

Practical combos:

  • hard + eta=0.5 → Good default. Creative early, precise late.

  • soft + eta=0.3 → Safe refinement. Consistent results.

  • none + any eta → Deterministic regardless of eta setting.


Schedulers

  • beta57 — What It Does: Custom RES4LYF scheduler (alpha=0.5, beta=0.7). Optimized for RES samplers at 15-25 steps. Use this by default.

  • karras — What It Does: Karras noise schedule — concentrates steps in the high-detail range. Classic choice for DPM++ variants.

  • exponential — What It Does: Exponential spacing — good for high step counts (30+)

  • normal — What It Does: Linear spacing — standard, nothing special

  • simple — What It Does: Uniform spacing — equal sigma gaps between steps

  • sgm_uniform — What It Does: Score-based Generative Model uniform — good for low-denoise refinement

  • + all other ComfyUI native schedulers


Noise Types — Complete Reference

For Init Noise (noise_type_init) and SDE Noise (noise_type_sde)

  • Standard — Type: gaussian · Character: Bell curve randomness · When to Use: Default — always safe

    • Type: gaussian_backwards · Character: Reversed gaussian · When to Use: Experimental denoising

    • Type: brownian · Character: Random walk (smooth, correlated) · When to Use: Smoother base images

    • Type: uniform · Character: Flat randomness · When to Use: Even noise coverage

    • Type: laplacian · Character: Peaked with sharp tails · When to Use: More contrast in noise → sharper detail

    • Type: studentt · Character: Heavy-tailed gaussian · When to Use: More extreme outliers → dramatic variation

  • Spatial — Type: perlin · Character: Coherent (neighboring pixels correlated) · When to Use: Structured variation, landscape-like

    • Type: wavelet · Character: Frequency bands · When to Use: Controlled multi-scale noise

  • Colored — Type: pink (α=1) · Character: Low frequency emphasis · When to Use: Natural feel, smooth large-area variation

    • Type: brown (α=2) · Character: Very low frequency · When to Use: Gentle, sweeping changes

    • Type: white (α=0) · Character: Equal all frequencies · When to Use: Standard unstructured

    • Type: blue (α=-1) · Character: High frequency emphasis · When to Use: Fine grain, detailed texture

    • Type: violet (α=-2) · Character: Very high frequency · When to Use: Extremely fine detail

    • Type: ultraviolet_A/B/C (α=-3/-4/-5) · Character: Extreme high frequency · When to Use: Aggressive fine detail (experimental)

  • Pyramid — Type: pyramid-bicubic · Character: Multi-scale (bicubic upscale) · When to Use: Natural multi-resolution variation

    • Type: pyramid-bilinear · Character: Multi-scale (bilinear upscale) · When to Use: Faster pyramid

    • Type: hires-pyramid-* · Character: High-res pyramid variants · When to Use: Higher quality multi-scale

Noise Modes (noise_mode_sde, mode)

When during the sampling schedule noise is strongest:

  • hard — Curve Shape: Full early, drops fast · Effect: Default — most variation early, locks in late

  • hard_var — Curve Shape: Hard + variance preserving · Effect: More mathematically correct version of hard

  • soft — Curve Shape: Gentle throughout · Effect: Conservative, consistent seeds

  • soft-linear — Curve Shape: Soft with linear decay · Effect: Smooth fade-out

  • softer — Curve Shape: Very gentle · Effect: Maximum consistency

  • lorentzian — Curve Shape: Bell-shaped peak in middle · Effect: Balanced exploration/refinement

  • sinusoidal — Curve Shape: Wave pattern · Effect: Oscillates between exploring and refining

  • exp — Curve Shape: Exponential · Effect: Can exceed 1.0 — extreme variation

  • eps — Curve Shape: Epsilon-based · Effect: Matches epsilon prediction models

  • vpsde — Curve Shape: Variance preserving SDE · Effect: Mathematically correct for VP diffusion

  • er4 — Curve Shape: ER4-specific · Effect: Specialized

  • none — Curve Shape: No scaling · Effect: Fully deterministic (overrides eta)

Standard

  • gaussian — What It Does: Normal bell-curve randomness. Safe default.

  • gaussian_backwards — What It Does: Reversed — specialized denoising pattern. Experimental.

  • brownian — What It Does: Random walk — each sample depends on previous. Smoother than gaussian.

  • uniform — What It Does: Flat randomness — all values equally likely. Less natural-looking.

  • laplacian — What It Does: Peaked — most noise near zero with occasional spikes. Sharper detail emphasis.

  • studentt — What It Does: Heavy-tailed — like gaussian but with more extreme outliers. More dramatic.

  • perlin — What It Does: Coherent spatial noise — neighboring pixels are correlated. Creates structured variation.

  • wavelet — What It Does: Frequency-based — noise at specific frequency bands

  • none — What It Does: No noise

Colored Noise

These affect which frequencies the noise emphasizes:

  • pink — Character: Emphasizes low frequencies — natural feel, smooth large-scale variation

  • brown — Character: Even more low-frequency — very smooth, large-area changes

  • blue — Character: Emphasizes high frequencies — fine-grained, detailed noise

  • violet — Character: Very high frequency — extremely fine detail emphasis

  • white — Character: Equal at all frequencies — unstructured randomness

  • fractal — Character: Customizable frequency balance via alpha parameter (see the sketch after this list)
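
Colored noise is typically produced by shaping white noise in the frequency domain with a 1/f^alpha power law. A sketch, assuming a [B, C, H, W] latent shape (illustrative, not RES4LYF's generator):

  import torch

  def colored_noise(shape, alpha=1.0):
      # alpha > 0 -> pink/brown (low-frequency heavy),
      # alpha < 0 -> blue/violet (high-frequency heavy), 0 -> white.
      white = torch.randn(shape)
      spec = torch.fft.fft2(white)
      fy = torch.fft.fftfreq(shape[-2]).view(-1, 1)
      fx = torch.fft.fftfreq(shape[-1]).view(1, -1)
      # Clamp DC to the lowest nonzero frequency to avoid div-by-zero.
      freq = (fy**2 + fx**2).sqrt().clamp(min=1.0 / max(shape[-2:]))
      shaped = torch.fft.ifft2(spec / freq ** (alpha / 2)).real
      return shaped / shaped.std()     # back to unit variance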

Pyramid Noise

Multi-scale noise that adds variation at multiple resolutions simultaneously. Produces more natural-looking variation than single-scale noise:

  • pyramid-bicubic/bilinear/nearest — different upscale interpolation for the pyramid layers

  • hires-pyramid-* — high-resolution variants


Presets — Copy These

Quick Draft

Sampler: res_2m | Scheduler: beta57 | Steps: 15 | Eta: 0.5 | Noise: gaussian | Mode: hard

Standard Quality (Recommended)

Sampler: res_3m | Scheduler: beta57 | Steps: 20 | Eta: 0.5 | Noise: gaussian | Mode: hard

High Quality Final

Sampler: res_3m | Scheduler: beta57 | Steps: 25 | Eta: 0.3 | Noise: gaussian | Mode: soft | implicit_steps: 2

Deterministic / Upscale Refinement

Sampler: res_2m | Scheduler: beta57 | Steps: 20 | Eta: 0.0 | Noise: gaussian | Mode: hard

Same seed always produces identical output.

Flux (Standard — No CFG)

Sampler: res_3m | Scheduler: beta57 | Steps: 25-30 | CFG: 1.0 | Eta: 0.5 | Noise: gaussian | Mode: lorentzian
DetailBoost method: sampler | SigmaScaling: lying=0.97, lying_inv=1.02, s_noise=1.04 | channelwise_cfg: False

Flux uses guidance distillation, not CFG. No negative prompt. Use conservative sigma
scaling — Flux's linear schedule amplifies lying effects more than SDXL.

Z-Image Base (True CFG)

Sampler: res_3m | Scheduler: beta57 | Steps: 28-50 | CFG: 3.0-5.0 | Eta: 0.5 | Noise: gaussian | Mode: hard
DetailBoost method: sampler | SigmaScaling: lying=0.97, lying_inv=1.02, s_noise=1.04 | channelwise_cfg: True

Z-Image Base uses true CFG with negative prompts. Same rectified flow as Flux (shift=3.0)
but with standard dual-pass guidance — channelwise_cfg=True helps at higher CFG values.

Z-Image Turbo / ZIT (Guidance-Free)

Sampler: res_2m | Scheduler: beta57 | Steps: 8-10 | CFG: 1.0 | Eta: 0.3 | Noise: gaussian | Mode: lorentzian
DetailBoost method: sampler | SigmaScaling: lying=0.98, lying_inv=1.01, s_noise=1.02 | channelwise_cfg: False

Z-Image Turbo is guidance-free (Decoupled-DMD distillation). No negative prompt, no CFG.
Very low step count — keep sigma scaling minimal to avoid artifacts in few-step sampling.

Qwen-Image (True CFG)

Sampler: res_3m | Scheduler: beta57 | Steps: 30-50 | CFG: 4.0 | Eta: 0.5 | Noise: gaussian | Mode: hard
DetailBoost method: model | SigmaScaling: lying=0.95, lying_inv=1.03, s_noise=1.04 | channelwise_cfg: True

Qwen-Image (20B MMDiT) uses true CFG with negative prompts. Larger pixel budget (1328²)
means more detail capacity — lying=0.95 is enough. Uses Qwen2.5-VL as text encoder.

Qwen-Image Distilled / Lightning (No CFG)

Sampler: res_2m | Scheduler: beta57 | Steps: 4-15 | CFG: 1.0 | Eta: 0.3 | Noise: gaussian | Mode: lorentzian
DetailBoost method: sampler | SigmaScaling: lying=0.98, lying_inv=1.01, s_noise=1.02 | channelwise_cfg: False

Distilled/Lightning variants are guidance-free. No negative prompt. Conservative settings
for few-step sampling.

Low-Denoise Refinement Pass

Sampler: res_2m | Scheduler: beta57 | Steps: 15-20 | Eta: 0.0-0.2 | Denoise: 0.2-0.3

SDXL / SD1.5

Sampler: res_2m | Scheduler: karras or beta57 | Steps: 25-40 | Eta: 0.5 | Noise: gaussian

Special Features

Denoise Behavior

Both ComfyUI's KSampler and RES4LYF use sigma-slicing — they compute a larger schedule and take the tail end. All requested steps always execute across the narrower noise range. See Latent Space, Denoise & Upscaling for details.
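
A sketch of the slice, matching ComfyUI's behavior as I understand it — at denoise < 1.0 a longer schedule is computed and only its tail is kept, so all your steps land in the low-noise range:

  def slice_sigmas(scheduler, steps, denoise):
      # denoise=0.3 at 20 steps: build a ~66-step schedule,
      # then run all 20 steps across its low-noise tail.
      if denoise >= 1.0:
          return scheduler(steps)
      total_steps = int(steps / denoise)
      return scheduler(total_steps)[-(steps + 1):]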

Implicit Refinement Types

  • "bongmath" — custom refinement approach

  • "rebound" — with CFG decay

  • "retro-eta" — backward eta correction

  • "predictor-corrector" — classic predict-then-fix

Cycle/Unsampling

Supports inversion workflows:

  • Rebound CFG decay

  • Eta decay scaling

  • Unsample mode for image-to-noise inversion


Conditioning — Control What the Model "Sees"

Conditioning = the prompt embedding tensors the model uses during sampling. RES4LYF gives
you tools to manipulate these embeddings directly — blending, scheduling, restricting to
specific steps, and applying them to different spatial regions.

Conditioning Strategy — When to Split Prompts

Single CLIPTextEncode works universally — it handles SDXL, SD3.5, Flux, DiT, WAN, etc.
Even Flux routes the same text to both CLIP-L and T5-XXL internally. You only need
CLIPTextEncodeFluxUnguided when you want different text per encoder (an optimization, not
a requirement).

When splitting prompts is worth it:

  • Regional (spatial) — Multi-subject scenes, face vs body vs background. Separate CTE per region into ClownRegionalConditioning_ABC + masks

  • Timestep (temporal) — Composition vs detail control. CTE "layout" at range 0.0–0.5, CTE "detail" at range 0.5–1.0, then Combine

  • Dual-tower (Flux only) — Fine-tune CLIP-L vs T5-XXL separately. Use CLIPTextEncodeFluxUnguided with different clip_l / t5xxl text

When splitting prompts is NOT worth it:

  • Separate CTE per concept into ConditioningCombine — Just concatenates tokens. A single well-written prompt does the same. Only helps if hitting the 77-token CLIP-L limit on SDXL

  • Separate CTE per concept into ConditioningAverage — Averaging embeddings destroys information. You get a blurry middle-ground between concepts

  • One CTE per "setting" (lighting, mood, etc.) — Overhead for no gain. The model already parses "dramatic lighting, moody atmosphere" from a single prompt

Recommended approach per pipeline stage:

  • Stage 1 (simple): Single CTE with a well-structured prompt. Good enough for 90% of images

  • Stage 1 (multi-subject): Regional conditioning with masks — genuinely better for spatial control

  • Stage 1 (advanced): Timestep scheduling — composition prompt early, detail prompt late

  • Stage 3 (refine): Regional "sharp detail" for subject, "soft bokeh" for background

  • Stage 4 (face fix): ClownRegionalConditioning2 with face-specific prompt + mask

In short: Don't split prompts unless you're directing them to different places (regional) or
different times (timestep scheduling). Splitting just to recombine is busywork.


Regional Conditioning — Different Prompts for Different Areas

The most immediately useful conditioning feature. Instead of one prompt for the whole image,
assign different prompts to masked regions.

ClownRegionalConditioning_AB — 2 regions (e.g., subject vs background)
ClownRegionalConditioning_ABC — 3 regions (e.g., face vs body vs background)
ClownRegionalConditioning2 — Simplified 2-region (takes masked/unmasked)
ClownRegionalConditioning3 — Simplified 3-region (auto-computes third mask)

Connection flow:

  CLIP Encode "detailed face, sharp eyes"  ─→  conditioning_A
  CLIP Encode "ornate armor, leather"      ─→  conditioning_B
  CLIP Encode "castle interior, moody"     ─→  conditioning_C
  
  Face mask from detection                 ─→  mask_A
  Body mask (face excluded)                ─→  mask_B
  Everything else (auto: 1 - A - B)        ─→  mask_C  (or leave empty for _ABC variant)
  
  ClownRegionalConditioning_ABC
    ├─ weight: 1.0               (base regional strength)
    ├─ region_bleed: 0.15        (soft transition at region edges)
    ├─ region_bleed_start_step: 0
    ├─ mask_type: "gradient"     (smooth blending, not hard cutoff)
    ├─ edge_width: 0             (no extra edge padding)
    └─→ CONDITIONING  →  sampler positive input

Key parameters:

  • weight — Default: 1.0 · What it does: How strongly regional conditioning applies (0 = off, 1 = full)

  • region_bleed — Default: 0.0 · What it does: Soft falloff at region boundaries (0 = hard edge, 0.1-0.2 = smooth)

  • region_bleed_start_step — Default: 0 · What it does: Which step to start bleed (later = sharper initial separation)

  • mask_type — Default: "boolean" · What it does: "gradient" = smooth blending. "boolean" = hard on/off

  • edge_width — Default: 0 · What it does: Extra blur at mask edges (in pixels)

  • weight_scheduler — Default: "constant" · What it does: Change weight across steps (constant, linear, sqrt, etc.)

  • start_step / end_step — Default: 0 / -1 · What it does: Step range where regional conditioning is active

mask_type options (for _AB variant):

  • gradient — smooth blending for both regions

  • gradient_A / gradient_B — smooth for one, hard for the other

  • boolean — hard on/off for both regions

  • boolean_A / boolean_B — hard for one, gradient for the other

For _ABC variant — same options plus gradient_AB, gradient_AC, gradient_BC, boolean_AB, etc.

How it works internally: The regional node creates a callback that runs during sampling,
not at node setup time. It detects your model type (Flux, SDXL, WAN, HiDream) and creates
appropriate attention masks. Both region embeddings get summed, with attention masks gating
which region influences which spatial area.

When to use: Generation stage (Stage 1) when you want spatial prompt control. Also powerful at refinement (Stage 3) — e.g., "sharp detail" for subject, "soft bokeh" for background.

Timestep Scheduling — Different Prompts at Different Steps

ConditioningSetTimestepRange — Restrict a conditioning to only part of the diffusion process.

Example: Composition first, detail later

  CLIP Encode "wide landscape, dramatic sky"
    → ConditioningSetTimestepRange: start=0.0, end=0.5
    → "composition prompt" — only active first half
  
  CLIP Encode "highly detailed, sharp focus, professional photography"
    → ConditioningSetTimestepRange: start=0.5, end=1.0
    → "detail prompt" — only active second half
  
  Combine both → sampler positive input

  • start / end are percentages of total sampling (0.0 = beginning, 1.0 = end)

  • The model builds composition in early steps, adds detail in late steps

  • This mirrors how diffusion works — early steps = structure, late steps = texture

Conditioning Blend — Smooth Prompt Transitions

ConditioningAverage — Interpolate between two prompt embeddings.

  conditioning_to:   "photorealistic portrait"
  conditioning_from: "oil painting portrait"
  
  conditioning_to_strength: 0.7
    → Result: 70% photorealistic, 30% oil painting influence

  • strength = 0.0 → 100% from conditioning_from

  • strength = 1.0 → 100% from conditioning_to

  • Handles mismatched token lengths by zero-padding the shorter one

  • Blends both the main token embeddings AND the pooled (global) output

ConditioningAverageScheduler — Same blend, but the ratio changes per step.

  conditioning_0:  "base quality prompt"
  conditioning_1:  "enhanced detail prompt"
  ratio:           SIGMAS input (one value per step)
  
  → At each step, blend ratio comes from the sigma value
  → Early steps (high sigma): more base prompt
  → Late steps (low sigma): more detail prompt

Takes a SIGMAS input (from any sigma generator) as the blend schedule. Each step gets a
different blend ratio. Useful for progressive prompt transitions during sampling.

Conditioning Math — Direct Manipulation

ConditioningMultiply — Scale all prompt embeddings by a number.

  multiplier: 1.3  → 30% stronger prompt influence
  multiplier: 0.7  → 30% weaker
  multiplier: -1.0 → inverted (used for negative conditioning tricks)

Recursively multiplies every tensor in the conditioning structure (embeddings, pooled, etc.).
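
That is, a recursive tensor walk, roughly like this (a sketch, not the node's code):

  import torch

  def multiply_cond(obj, multiplier):
      # Scale every tensor found anywhere in the structure.
      if isinstance(obj, torch.Tensor):
          return obj * multiplier
      if isinstance(obj, dict):
          return {k: multiply_cond(v, multiplier) for k, v in obj.items()}
      if isinstance(obj, (list, tuple)):
          return type(obj)(multiply_cond(v, multiplier) for v in obj)
      return obj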

ConditioningAdd — Add scaled conditioning_2 onto conditioning_1.

  conditioning_1: "portrait of a woman"
  conditioning_2: "detailed eyes, sharp iris"
  multiplier: 0.5  → half-strength addition
  
  → Result: original prompt + 50% of the "eyes" emphasis

Useful for adding emphasis without re-encoding prompts. Note: modifies conditioning_1 in-place.

Orthogonal-Collinear Decomposition — Surgical Blending

ConditioningOrthoCollin — The most mathematically sophisticated blend node.

Instead of simple interpolation, it decomposes two conditionings into:

  • Collinear component = what's shared between both prompts (same direction)

  • Orthogonal component = what's unique to each prompt (perpendicular direction)

  conditioning_0: "beautiful woman, detailed face"
  conditioning_1: "professional photography, studio lighting"
  
  t5_strength: 1.0
    → 1.0 = favor conditioning_0's direction
    → 0.0 = favor conditioning_1's direction
    → 0.5 = equal blend of directions
  
  clip_strength: 1.0
    → Same control but for the global (pooled) output
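
The math underneath is vector projection: project one embedding onto the other to get the shared (collinear) part, subtract to get the unique (orthogonal) part. A sketch for a single tensor pair (names are mine):

  import torch

  def ortho_collin_split(a, b, eps=1e-8):
      a_flat, b_flat = a.flatten(), b.flatten()
      coef = (a_flat @ b_flat) / (b_flat @ b_flat + eps)
      collinear = coef * b           # shared direction
      orthogonal = a - collinear     # unique remainder
      return collinear, orthogonal

  # A blend can then keep the shared component once and weight each
  # prompt's orthogonal component separately, instead of averaging
  # everything into mush.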

When to use: When simple averaging muddles both prompts. OrthoCollin preserves the unique
aspects of each prompt while blending the shared aspects. Best for combining "subject" and
"style" prompts without losing either.

SD3.5-Specific — Truncation Nodes

ConditioningTruncate — Caps positive conditioning at 77 tokens × 4096 dims.
ConditioningZeroAndTruncate — Zeros AND truncates negative conditioning to 154 tokens.

SD3.5M degrades badly if conditioning exceeds these limits. Apply to respective positive/negative
before the sampler:

  Positive prompt → ConditioningTruncate → sampler positive
  Negative prompt → ConditioningZeroAndTruncate → sampler negative

Only needed for SD3.5M. Flux, SDXL, and other models don't need this.

Flux-Specific — Dual-Tower Encoding

CLIPTextEncodeFluxUnguided — Encode separate prompts for Flux's dual text encoders.

  clip_l:  "portrait, studio lighting"       (CLIP-L — 77 token max, global concepts)
  t5xxl:   "detailed description of scene..." (T5-XXL — 256+ tokens, fine detail)
  
  Returns:
    conditioning  →  sampler
    clip_l_end    →  token end position (INT)
    t5xxl_end     →  token end position (INT)

Flux uses two text encoders with different strengths. CLIP-L handles global concepts,
T5-XXL handles fine-grained detail. This node lets you independently tune what each
encoder sees.

Style Transfer via Conditioning

StyleModelApplyStyle — Apply visual style from a reference image (Flux Redux).

  Load reference image
    → CLIP Vision Encode → clip_vision_output
    → Load Style Model → style_model
  
  StyleModelApplyStyle
    ├─ conditioning:       your text conditioning
    ├─ style_model:        loaded Flux Redux model
    ├─ clip_vision_output: encoded reference image
    └─ strength: 1.0      (declared but currently unused in code)
    
    → CONDITIONING with style embeddings injected

The style model extracts visual features from the reference image and merges them into
the conditioning's cross-attention layer. The model then generates images with similar
visual style to the reference.

Use at: Generation (Stage 1) or refinement (Stage 3) for style consistency.

Conditioning Precision

Conditioning Recast FP64 — Cast conditioning tensors to float64.

  cond_0: your conditioning  (required)
  cond_1: second conditioning (optional)
  
  → Both outputs recast to float64 precision

Use before precision-sensitive operations like OrthoCollin decomposition or when chaining
multiple conditioning math operations (multiply → add → average) to prevent floating-point
drift.


Part 3 — Resolution, Latent Space & Prompt Theory

Choose Your Resolution

Every model was trained at a specific pixel budget (total pixels = width × height). Generating outside that budget degrades quality — too high causes duplicated compositions and multi-head artifacts, too low loses detail and looks soft.

The Rule: Change aspect ratio by changing dimensions, but keep total pixels near the training target.

Quick Reference

  • SD 1.5 · Native: 512×512 · Megapixels: ~0.26 MP · Alignment: ÷8 · Aspect Range: ~1:2 to 2:1

  • SDXL · Native: 1024×1024 · Megapixels: ~1.05 MP · Alignment: ÷8 (÷64 rec.) · Aspect Range: ~1:2.4 to 2.4:1

  • SD3.5 Large · Native: 1024×1024 · Megapixels: ~1.0 MP · Alignment: ÷16 · Aspect Range: ~1:2 to 2:1

  • SD3.5 Medium · Native: 1024×1024 · Megapixels: 0.25–2.0 MP · Alignment: ÷16 · Aspect Range: Wide (multi-res trained)

  • Flux.1 · Native: 1024×1024 · Megapixels: 0.1–2.0 MP · Alignment: ÷16 · Aspect Range: ~1:2 to 2:1+

  • Z-Image · Native: 1024×1024 · Megapixels: ~1.05 MP · Alignment: ÷16 · Aspect Range: Flexible (512–2048)

  • HiDream · Native: 1024×1024 · Megapixels: ~1.05 MP · Alignment: ÷16 · Aspect Range: Similar to Flux

  • Qwen-Image · Native: 1328×1328 · Megapixels: ~1.54–1.76 MP · Alignment: ÷16 · Aspect Range: 7 fixed buckets

SDXL Bucket Sizes (Training Aspect Ratios)

SDXL was trained on these exact bucket sizes at 64-pixel increments — these are the safest choices:

  • 1024×1024 · Ratio: 1:1 · Megapixels: 1.05

  • 1152×896 / 896×1152 · Ratio: ~4:3 / ~3:4 · Megapixels: 1.03

  • 1216×832 / 832×1216 · Ratio: ~3:2 / ~2:3 · Megapixels: 1.01

  • 1344×768 / 768×1344 · Ratio: ~16:9 / ~9:16 · Megapixels: 1.03

  • 1536×640 / 640×1536 · Ratio: ~21:9 / ~9:21 · Megapixels: 0.98

Flux / Z-Image / SD3.5 / HiDream Aspect Ratios

These transformer-based models use RoPE (rotary positional embeddings) and handle variable resolutions more gracefully. Target ~1.0 MP for best quality:

  • 1024×1024 · Ratio: 1:1 · Megapixels: 1.05

  • 1152×896 / 896×1152 · Ratio: ~4:3 / ~3:4 · Megapixels: 1.03

  • 1344×768 / 768×1344 · Ratio: ~16:9 / ~9:16 · Megapixels: 1.03

  • 1536×640 / 640×1536 · Ratio: ~21:9 / ~9:21 · Megapixels: 0.98

Flux officially supports 0.1–2.0 MP, Z-Image supports 512–2048px per side, and SD3.5 Medium was progressively trained from 256 to 1440px — all three are more resolution-flexible than SDXL.

Qwen-Image Bucket Sizes (2509 / 2512)

Qwen-Image is a 20B MMDiT with its own fixed aspect ratio buckets. It runs at a higher pixel budget than other models (~1.54–1.76 MP). The "2509" and "2512" suffixes are release dates (Sept/Dec 2025), not parameter counts. Uses Qwen2.5-VL as text encoder, 16-channel VAE with 8× compression.

Qwen-Image-2512 fixed the 4:3/3:4 bucket to be cleanly ÷16 (1140→1104). Use these 2512 values:

  • 1328×1328 · Ratio: 1:1 · Megapixels: 1.76

  • 1664×928 / 928×1664 · Ratio: ~16:9 / ~9:16 · Megapixels: 1.54

  • 1472×1104 / 1104×1472 · Ratio: ~4:3 / ~3:4 · Megapixels: 1.63

  • 1584×1056 / 1056×1584 · Ratio: ~3:2 / ~2:3 · Megapixels: 1.67

Why ÷8 vs ÷16?

All models use an 8× VAE (image → latent is 8× smaller per side). Dimensions must be divisible by 8 at minimum.

Transformer models (SD3, Flux, Z-Image, HiDream, Qwen-Image) also apply 2×2 patchification on the latent, so pixel dimensions must be divisible by 8 × 2 = 16.

SDXL's training buckets used 64-pixel increments — ÷64 alignment is recommended for optimal bucket matching.
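
These rules are easy to automate. A hypothetical helper (name and defaults mine) that picks dimensions for a requested aspect ratio, near a target megapixel budget, snapped to the model's alignment:

import math

def fit_resolution(aspect_w, aspect_h, target_mp=1.05, align=16):
    """Return (width, height) near target_mp megapixels, both divisible by align."""
    target_px = target_mp * 1_000_000
    width = math.sqrt(target_px * aspect_w / aspect_h)   # ideal width for this aspect
    height = width * aspect_h / aspect_w
    width = round(width / align) * align                 # snap to the alignment grid
    height = round(height / align) * align
    return int(width), int(height)

print(fit_resolution(16, 9))             # (1360, 768) ≈ 1.04 MP, ÷16 aligned
print(fit_resolution(1, 1, align=64))    # (1024, 1024) for SDXL's ÷64 buckets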

What Happens Outside the Budget

Too large (>1.5× the training MP):

  • Duplicated compositions, multiple heads/bodies

  • "Image-within-image" tiling artifacts

  • Coherent center, degraded edges

Too small (<0.5× the training MP):

  • Soft/blurry output, loss of fine detail

  • Oversimplified compositions

The fix for larger output: Generate at native resolution → pixel-space upscale → refine. That's what Stage 2 and 3 are for.


Latent Space, Denoise & Upscaling — How It Actually Works

Understanding what happens under the hood is essential for choosing the right denoise, step count, and upscale strategy in the pipeline stages that follow.

How Denoise Works

Denoise controls how much of the noise schedule is used.

  • denoise=1.0 → full noise → full denoising (complete generation)

  • denoise=0.5 → 50% noise added → model refines the other 50%

  • denoise=0.2 → 20% noise added → model barely touches the image

Under the hood (verified in ComfyUI source — comfy/samplers.py line 1148):

new_steps = int(steps / denoise)          # e.g., 30 / 0.2 = 150
sigmas = calculate_sigmas(new_steps)       # compute full 150-step sigma schedule
self.sigmas = sigmas[-(steps + 1):]        # take last 31 values (low-noise tail)

All requested steps always execute — ComfyUI does NOT skip steps. It computes a larger sigma schedule and slices the tail end, so all 30 steps run across the narrower noise range. RES4LYF uses the same approach.

Why Low Denoise Degrades Quality

The Scheduler Problem: The noise curve is non-linear. Structural/compositional work happens early (high sigma), detail refinement happens later (low sigma). Below ~0.5 denoise, the model skips structural steps entirely.

VAE Round-Trip Loss: Image → VAE Encode → Latent → Add Noise → Denoise → VAE Decode. The VAE is lossy — each encode/decode cycle degrades quality. At low denoise, the model has too few steps to fix these artifacts.

Sigma Mismatch: The model was trained on a specific noise schedule. At very low denoise, starting noise levels may fall in a range where predictions are less accurate.

Steps vs Denoise

Both the standard KSampler and RES4LYF use sigma-slicing — they compute a larger sigma schedule and take only the tail end. All requested steps always execute.

With 30 steps at denoise 0.2:

  • ComfyUI computes sigmas for 150 total steps (30 / 0.2)

  • Takes the last 31 sigma values (the low-noise 20% of the schedule)

  • Runs all 30 steps across that narrower range

Each step covers a small sigma delta within the 20% noise range:

  • More steps = finer steps within the same noise range → more precise refinement

  • Fewer steps = coarser steps → faster but less precise

  • Below ~10 steps at low denoise, quality drops because each step is too coarse

  • The sweet spot for refinement passes is 15-25 steps at denoise 0.2-0.3

  • Denoise 0.2 · Steps: 15 · Sigma Range: Narrow (low noise) · Character: Quick refinement

  • Denoise 0.2 · Steps: 25 · Sigma Range: Narrow (low noise) · Character: Precise refinement

  • Denoise 0.3 · Steps: 20 · Sigma Range: Moderate · Character: Good balance

  • Denoise 0.5 · Steps: 25 · Sigma Range: Wide · Character: Significant rework

Latent-Space Upscaling — Why It Breaks at Low Denoise

Naive latent upscale (bilinear/bicubic/nearest) interpolates between latent vectors, but latent space is NOT spatially smooth like pixel space. The result is an off-manifold latent — a tensor that doesn't match what the model saw during training.

At denoise 0.5+, enough noise is added to push it back on-manifold. At 0.2, those interpolation artifacts survive.

The Fix — Pixel-Space Upscaling:

KSampler (denoise 1.0) → VAE Decode → Upscale Image → VAE Encode → KSampler (denoise 0.2-0.4)

This keeps latents on-manifold because the VAE encoder produces a proper latent from the upscaled image.
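
In code, the round trip is just decode → resize → encode. A sketch assuming a vae object with ComfyUI-style decode/encode (images as [B, H, W, C] tensors in the 0–1 range):

import torch
import torch.nn.functional as F

def pixel_space_upscale(latent, vae, scale=2.0):
    """Sketch: decode to pixels, resize there, re-encode to a fresh
    on-manifold latent at the new size."""
    image = vae.decode(latent)                    # [B, H, W, C]
    image = image.movedim(-1, 1)                  # [B, C, H, W] for interpolate
    image = F.interpolate(image, scale_factor=scale,
                          mode="bicubic", antialias=True)
    image = image.movedim(1, -1).clamp(0.0, 1.0)  # back to [B, H, W, C]
    return vae.encode(image)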

Model-Based Latent Upscalers (e.g., LTX) are trained neural networks that understand their specific latent space. They produce valid on-manifold latents but cannot cross models — each model family has a completely different latent space.

Interpolation Methods for Upscaling

Lanczos > Bicubic > Bilinear for sharpness.

  • Bilinear · Sharpness: Soft · Artifacts: None · Use Case: Fastest, fine if denoise >= 0.5

  • Bicubic · Sharpness: Moderate · Artifacts: Slight ringing at edges · Use Case: Good balance

  • Lanczos · Sharpness: Sharpest · Artifacts: Minor ringing possible · Use Case: Best for photo/realistic

For low-denoise refinement passes, use Lanczos — the sampler won't have enough steps to recover blur from bilinear.

Model-based upscalers (RealESRGAN, 4x-UltraSharp, SwinIR) are dramatically better than any interpolation method.

Turbo/Lightning/Distilled Models — Different Rules

Turbo models were distilled to converge in very few steps. Each step does the equivalent of 4-5 standard steps. Too many steps = overshooting.

  • Denoise 1.0 · Steps: 6-8

  • Denoise 0.5 · Steps: 4-6

  • Denoise 0.2-0.3 · Steps: 3-5

This is the opposite of standard models. Match the intended step granularity.


VAE Operations — Encode/Decode Quality

VAEEncodeAdvanced — Precision Encoding

The most important VAE node for quality. Adds deterministic seeding and flexible multi-input
handling.

  image_1:         your image          (optional)
  image_2:         second image         (optional)
  mask:            mask image           (optional)
  latent:          existing latent      (optional, for size reference)
  vae:             your VAE model
  
  resize_to_input: "image_1"           (auto-size all outputs to this input's dimensions)
  mask_channel:    "red"               (which channel to extract as mask)
  invert_mask:     False
  latent_type:     "4_channels"        (or "16_channels" for Cascade)
  width/height:    1024/1024           (only if resize_to_input = "false")
  
  Returns:
    latent_1:      encoded image_1
    latent_2:      encoded image_2
    mask:          extracted mask
    empty_latent:  matching empty latent
    width:         actual width used
    height:        actual height used

Why it matters: Standard VAE encode is non-deterministic — running it twice on the same
image produces slightly different latents. VAEEncodeAdvanced sets torch.manual_seed(42)
before encoding, guaranteeing identical results every run. This matters for reproducible
workflows and consistent latent channel statistics.

Use at: Any stage where you encode an image to latent (upscale → re-encode, mask extraction, img2img input).
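
The deterministic part is a one-liner. A sketch of what the node does differently (seed value from the node; vae.encode assumed as above):

import torch

def encode_deterministic(vae, image, seed=42):
    """Sketch: fix the RNG before the VAE samples from its latent
    distribution, so the same image always yields the same latent."""
    torch.manual_seed(seed)
    return vae.encode(image)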

LatentUpscaleWithVAE — The Right Way to Upscale Latents

Decode → pixel-space upscale → re-encode. Avoids the problems of pure latent-space interpolation.

  latent: your latent
  width:  target width
  height: target height
  vae:    your VAE model
  
  → Decodes to image, resizes, re-encodes
  → Preserves state_info metadata (denoised, data_prev_ for video)

Uses deterministic seed (42). Handles video latents (5D tensors) by flattening to batch
dimension, processing, and restacking. Preserves the state_info dictionary that
ClownsharKSampler uses for multi-step state tracking.

Use at: Stage 2 (upscale) as alternative to separate decode → upscale → encode chain. Simpler wiring, same result.

VAEStyleTransferLatent — Latent-Space Style Transfer

Match the visual style of a reference latent onto your generation.

  method:    "AdaIN" (fast) or "WCT" (high quality)
  latent:    your generation latent
  style_ref: reference image latent (encode your style reference first)
  vae:       your VAE model

AdaIN (Adaptive Instance Normalization):

  • Normalizes content latent (subtract mean, divide by std)

  • Rescales to style reference's mean/std

  • Fast (~1ms), good for texture/color matching

  • Can cause color bleeding in complex scenes

WCT (Whitening + Coloring Transform):

  • Whitening: removes correlation between feature channels

  • Coloring: applies style reference's covariance structure

  • Uses eigendecomposition (slower, ~50ms)

  • Better color preservation, less bleeding

  • Works through VAE decoder's embedding layer

When to use: After upscale (Stage 2→3) when upscaling changes the color feel. Encode the original (pre-upscale) image as style_ref, then apply to the upscaled latent. WCT is better for faces; AdaIN is fine for landscapes.


Precision & Latent Manipulation — Numerical Quality Control

Precision Casting — Why It Matters

Diffusion models run in fp16 or bf16 for speed, but this loses numerical precision.
For quality-critical steps (final refinement, face fix), higher precision prevents
subtle artifacts like banding, color drift, and texture smearing.

Set Precision — Cast a single latent to fp16/fp32/fp64.

  latent → Set Precision (64) → high-precision latent

Set Precision Universal — Cast everything at once (conditioning, sigmas, latent).

  cond_pos, cond_neg, sigmas, latent
    → Set Precision Universal (fp64)
    → all outputs in float64
  
  Options: bf16, fp16, fp32, fp64, passthrough

Set Precision Advanced — Returns 5 copies at different precisions simultaneously.

  latent → Set Precision Advanced
    ├─ output_0: passthrough (original dtype)
    ├─ output_1: global_precision (your choice)
    ├─ output_2: fp16
    ├─ output_3: fp32
    └─ output_4: fp64

When to use fp64: Final refinement pass, face fix, any operation where you're chaining multiple latent operations (upscale → normalize → match → sample). The accumulated rounding error from fp16/fp32 becomes visible as color banding or texture loss.

When NOT to use fp64: Generation from scratch (Stage 1) — fp16/bf16 is fine. The model weights themselves are fp16, so fp64 latent precision has diminishing returns for the initial generation.

High-Precision Latent Creation

EmptyLatentImage64 — Create blank latents in float64.

  width: 1024, height: 1024, batch_size: 1
  → 4-channel latent, 128×128 spatial (8× compression), float64

EmptyLatentImageCustom — Full control over channels, compression, and precision.

  channels: "4" or "16"    (4 = SD1.5/SDXL, 16 = SD3.5/Flux/Cascade)
  mode: "sdxl" (8×), "cascade_b" (4×), "cascade_c" (custom), "exact" (1×)
  precision: "fp16", "fp32", "fp64"
  compression: 4-128       (only used in cascade_c mode)

Practical use: Create empty latents with EmptyLatentImage64 when your pipeline chains multiple latent operations before the sampler. The extra precision prevents accumulated rounding errors. For standard generation, the normal EmptyLatentImage (fp32) is fine.

Latent Channel Statistics — Fixing Color Shifts

One of the most underrated quality improvements. After upscaling or latent operations,
channel statistics (mean/standard deviation per channel) can drift, causing color shifts
or washed-out results.

Latent Normalize Channels — Reset channel statistics.

  mode: "channels"       (per-channel, not global)
  operation: "normalize"  (zero mean, unit variance)
  
  Other options:
    "center"      → subtract mean only (preserves variance/contrast)
    "standardize" → divide by std only (preserves brightness/color offset)

Latent Match Channelwise — Transfer channel statistics from a reference.

  model:          your model       (used for latent-space preprocessing)
  latent_target:  the latent to fix
  latent_source:  reference latent (good colors/statistics)
  mask_target:    optional mask    (only match masked region)
  mask_source:    optional mask    (only sample stats from masked region)
  
  → Target latent gets source's mean/std per channel

This is AdaIN (Adaptive Instance Normalization) in latent space. Per-channel, it:

  1. Computes target mean/std

  2. Computes source mean/std

  3. Normalizes target to zero-mean unit-variance

  4. Rescales to match source's mean/std

extra_options (text field, regex-parsed):

  • exclude_channels=0,2 — skip specific channels

  • disable_process_latent — don't use model's internal latent encoder

  • enable_std / disable_mean — match only variance or only mean

When to use: After upscaling (Stage 2→3 transition) to prevent the color shift that happens when pixel-space upscale → VAE encode doesn't perfectly preserve latent distribution. Feed the original generation's latent as latent_source and the upscaled+re-encoded latent as latent_target.
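
The four steps above are a few lines of tensor math. A minimal sketch (function name mine) for [B, C, H, W] latents:

import torch

def match_channelwise(target, source, eps=1e-6):
    """Sketch: AdaIN in latent space — give target the per-channel
    mean/std of source."""
    t_mean = target.mean(dim=(2, 3), keepdim=True)
    t_std = target.std(dim=(2, 3), keepdim=True)
    s_mean = source.mean(dim=(2, 3), keepdim=True)
    s_std = source.std(dim=(2, 3), keepdim=True)
    normalized = (target - t_mean) / (t_std + eps)   # zero mean, unit variance
    return normalized * s_std + s_mean               # rescale to source statistics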

Latent Get Channel Means — Diagnostic node. Outputs per-channel mean values as SIGMAS.
Use this to inspect whether your latent channels have drifted.

Fourier-Domain Latent Blending — Phase & Magnitude

The most advanced latent operation in RES4LYF. Uses FFT (Fast Fourier Transform) to
decompose latents into:

  • Phase = spatial structure, edges, composition layout

  • Magnitude = signal strength, color intensity, contrast

LatentPhaseMagnitude — Blend phase/magnitude from two latents independently.

  latent_0_batch:  your generation result
  latent_1_batch:  a reference latent (style reference, previous generation, etc.)
  
  Global power controls:
    phase_mix_power:      1.0   (exponent for phase blending)
    magnitude_mix_power:  1.0   (exponent for magnitude blending)
  
  Per-channel weights (0 = all from latent_0, 1 = all from latent_1):
    phase_luminosity:           0.0  (channel 0 — brightness structure)
    phase_cyan_red:             0.0  (channel 1 — color structure)
    phase_lime_purple:          0.0  (channel 2 — color structure)
    phase_pattern_structure:    0.0  (channel 3 — texture/pattern)
    
    magnitude_luminosity:       0.0
    magnitude_cyan_red:         0.0
    magnitude_lime_purple:      0.0
    magnitude_pattern_structure:0.0

Practical example — keep structure, change colors:

  phase_luminosity:        0.0   (keep structure from latent_0)
  phase_pattern_structure: 0.0   (keep texture from latent_0)
  magnitude_cyan_red:      0.8   (take colors from latent_1)
  magnitude_lime_purple:   0.8   (take colors from latent_1)
  → Result: structure of latent_0, color palette of latent_1

Practical example — keep colors, change structure:

  phase_luminosity:        0.7   (take layout from latent_1)
  phase_pattern_structure: 0.7   (take texture from latent_1)
  magnitude_luminosity:    0.0   (keep brightness of latent_0)
  magnitude_cyan_red:      0.0   (keep colors from latent_0)
  → Result: composition of latent_1, colors of latent_0

Normalization flags (per input and output):

  • normal (default True) — Z-score normalize (subtract mean, divide by std)

  • stdize (default True) — Divide by std only

  • meancenter (default True) — Subtract mean only

These prevent magnitude scale mismatches between the two latents. Leave all True unless you
have a specific reason.

Critical: Phase/magnitude operations MUST run in float64. Float32 FFT loses >0.1 radians of phase precision, causing visible artifacts. The node converts internally, but feeding fp64 latents avoids unnecessary precision loss at the boundary.
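
A sketch of the core operation in float64, per the warning above (weights and function name are illustrative; naive phase interpolation is shown for clarity, though angles wrap at ±π):

import torch

def phase_magnitude_blend(latent_0, latent_1, phase_w=0.0, mag_w=0.8):
    """Sketch: FFT both latents, blend phase and magnitude separately,
    then invert. phase_w/mag_w: 0 = all from latent_0, 1 = all from latent_1."""
    f0 = torch.fft.fft2(latent_0.to(torch.float64))
    f1 = torch.fft.fft2(latent_1.to(torch.float64))
    mag = (1 - mag_w) * f0.abs() + mag_w * f1.abs()            # color/contrast
    phase = (1 - phase_w) * f0.angle() + phase_w * f1.angle()  # structure
    blended = mag * torch.exp(1j * phase)                      # recombine
    return torch.fft.ifft2(blended).real.to(latent_0.dtype)

With the defaults shown (phase_w=0.0, mag_w=0.8), this matches the first practical example above: structure from latent_0, color palette mostly from latent_1.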

Single-input variants:

  • LatentPhaseMagnitudeMultiply — Multiply phase/magnitude by channel weights (scale)

  • LatentPhaseMagnitudeOffset — Add to phase/magnitude (shift hue/structure)

  • LatentPhaseMagnitudePower — Exponentiate (non-linear compression/expansion)

Noise Injection — Controlled Stochasticity

LatentNoised — Add calibrated noise to a latent with full control.

  latent_image:    your latent
  noise_type:      "gaussian", "fractal", "perlin", etc. (same as sampler noise types)
  noise_strength:  1.0         (linear scaling, 0 = no noise)
  noise_seed:      12345       (reproducible)
  normalize:       "true"      (rescale noise to match latent's mean/std)
  noise_is_latent: False       (True = treat noise as latent perturbation, not pure additive)
  mask:            optional    (only add noise to masked region)
  
  alpha, k:  shape parameters for specific noise types

When normalize=true, noise gets rescaled to match the target latent's statistical
distribution, so a strength of 1.0 adds a meaningful amount of noise regardless of the
latent's actual value range.

When noise_is_latent=true, the noise is combined with the latent and then re-normalized.
This treats the noise as a "latent direction" rather than additive random values.

When to use: Before refinement (Stage 3) to break up over-smooth areas. Small amount of Perlin noise (strength 0.05-0.15) before a low-denoise refine pass adds natural texture variation without changing composition.

LatentNoiseBatch_perlin — Generate spatially-coherent Perlin noise.

  seed: 0, width: 1024, height: 1024, batch_size: 1
  detail_level: 0.0  (-1.0 to 1.0, scales fractal octaves)

Perlin noise creates smooth, natural-looking patterns (unlike Gaussian noise, which is spatially uncorrelated).
The noise goes through an inverse error function to map it to a Gaussian distribution matching
expected latent statistics. Useful as input to LatentNoised via the latent_noise input.
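
The remapping step looks like this (a sketch, assuming Perlin values roughly uniform in (0, 1)):

import torch

def perlin_to_gaussian(perlin):
    """Sketch: map coherent noise in (0, 1) to a standard Gaussian via
    the inverse error function (inverse-CDF probability transform)."""
    u = perlin.clamp(1e-6, 1 - 1e-6)                 # avoid erfinv(±1) = ±inf
    return torch.special.erfinv(2 * u - 1) * (2 ** 0.5)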


Mask Operations — Precision Boundaries

MaskEdge — Smart Edge Detection

Extract edge regions from masks with independent internal/external control.

  mask:      face detection mask
  dilation:  20         (edge thickness)
  mode:      "percent"  (relative to mask area) or "absolute" (pixels)
  internal:  1.0        (scale internal edge width — inside the mask)
  external:  1.0        (scale external edge width — outside the mask)

Creates a ring-shaped mask at the boundary between masked and unmasked areas. Controls:

  • internal = 1.0, external = 0.0 → edge only inside the mask (shrink)

  • internal = 0.0, external = 1.0 → edge only outside the mask (grow)

  • internal = 0.5, external = 1.5 → thinner inside, wider outside (smoother blend outward)

In "percent" mode, dilation is relative to the mask's area (sqrt of total pixel count).
This auto-scales edge width based on mask size — a small mask gets narrower edges, a large
mask gets wider edges.
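
The ring itself is dilation minus erosion. A sketch with scipy (names mine; the real node adds the percent-mode scaling described above):

import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def mask_edge(mask, dilation=20, internal=1.0, external=1.0):
    """Sketch: ring mask at the boundary — grow outward by
    external*dilation, shrink inward by internal*dilation,
    keep the band between."""
    m = mask > 0.5
    it_out = int(round(dilation * external))
    it_in = int(round(dilation * internal))
    grown = binary_dilation(m, iterations=it_out) if it_out > 0 else m
    shrunk = binary_erosion(m, iterations=it_in) if it_in > 0 else m
    return (grown & ~shrunk).astype(np.float32)   # 1.0 inside the ring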

When to use: Stage 4 (face fix) to create a feathered boundary for blending fixed regions. Use MaskEdge to create a transition zone, then composite the fixed region through it.

Better than GrowMaskWithBlur for precision work because you can control inside vs outside edge width independently. GrowMaskWithBlur grows uniformly in both directions.


Prompt Structure & Token Theory

Prompt Structure — What Goes Where

Your prompt ordering matters. CLIP-based models (SDXL/SD1.5) front-load attention — early
tokens get disproportionate weight. T5-based models (Flux/SD3.5) read more uniformly but
still benefit from clear structure.

Optimal ordering (most important → least important):

SDXL / Tag-heavy models:
─────────────────────────
1. Subject           "1girl, warrior, standing"
2. Subject detail    "long silver hair, blue eyes, ornate plate armor"
3. Action / pose     "holding sword, looking at viewer"
4. Shot / framing    "upper body, from below, dynamic angle"
5. Setting           "castle ruins, dramatic sunset sky, volumetric fog"
6. Lighting          "rim lighting, golden hour, high contrast"
7. Style             "by artgerm, oil painting"
8. Quality tags      "masterpiece, best quality, highly detailed"

Flux / Natural language models:
───────────────────────────────
1. Subject + action  "A battle-scarred female warrior standing atop castle ruins"
2. Subject detail    "with long silver hair and piercing blue eyes, wearing ornate plate armor"
3. Setting + mood    "against a dramatic sunset sky with volumetric fog rolling between broken walls"
4. Lighting          "lit from behind with golden hour rim lighting"
5. Style (optional)  "in the style of a cinematic oil painting with high contrast"

Why this order:

  • Positions 1–2 · What goes there: Subject + details · Why: Highest attention weight — model focuses here most

  • Positions 3–4 · What goes there: Shot + setting · Why: Still strong attention, frames the scene

  • Positions 5–6 · What goes there: Lighting + style · Why: Global modifiers that influence the whole image

  • Last · What goes there: Quality tags · Why: Work fine with low attention — they're global signals, not spatial

Common mistake — quality tags first: Putting "masterpiece, best quality" at position 1
gives peak attention weight to generic modifiers instead of your subject. The model "hears"
the quality tags loudest and gives your actual subject description less emphasis.

Model-specific rules:

  • Z-Image Base · Quality tags?: Optional, at end · Negative prompt?: Yes — true CFG (3.0-5.0) · Best format: Natural prose sentences

  • Z-Image Turbo (ZIT) · Quality tags?: Skip · Negative prompt?: Not supported (guidance-free) · Best format: Natural prose sentences

  • Qwen-Image · Quality tags?: Optional, at end · Negative prompt?: Yes — true CFG (4.0) · Best format: Natural prose sentences

  • Qwen-Image Distilled · Quality tags?: Skip · Negative prompt?: Not supported (CFG=1.0) · Best format: Natural prose sentences

  • Flux · Quality tags?: Skip entirely (can hurt) · Negative prompt?: Not supported · Best format: Natural prose sentences

  • SDXL · Quality tags?: Yes, at end · Negative prompt?: Essential ("worst quality, blurry, deformed") · Best format: Tags, comma-separated

  • SD3.5 · Quality tags?: Optional, at end · Negative prompt?: Optional but helps · Best format: Prose works better than tags

  • Pony / Illustrious · Quality tags?: At start — score_9 etc. are primary classifiers · Negative prompt?: Yes · Best format: Score tags first, then subject tags

  • WAN (video) · Quality tags?: Skip · Negative prompt?: Minimal · Best format: Short, clear prose

Negative prompt template (SDXL):

worst quality, low quality, blurry, deformed, disfigured, extra limbs, bad anatomy,
bad hands, watermark, text, signature, cropped

Token budget awareness:

  • CLIP-L (SDXL, Flux): 77 tokens per chunk. Attention decays across chunks

  • T5-XXL (Flux, SD3.5): 256+ tokens with uniform attention — use the space

  • Qwen3-4B (Z-Image): Single text encoder, no dual CLIP/T5 — natural prose, generous context

  • Qwen2.5-VL 7B (Qwen-Image): Full VLM as text encoder — rich descriptions, very long context

  • If hitting limits on SDXL: move style/quality to negative ("NOT low quality") or use
    timestep scheduling to split composition vs detail prompts

How Tokens Work — What You're Actually Spending

Tokens are word-pieces, not individual characters. The tokenizer (BPE) splits text into
subword chunks from its vocabulary:

  • cat · Tokens: cat · Count: 1

  • warrior · Tokens: warrior · Count: 1

  • battlefield · Tokens: battle + field · Count: 2

  • photorealistic · Tokens: photo + real + istic · Count: 3

  • 1girl · Tokens: 1 + girl · Count: 2

  • , (comma + space) · Tokens: single vocabulary entry · Count: 1

Rules of thumb:

  • Common English words = 1 token (dog, red, standing, portrait)

  • Compound / uncommon words = 2–3 tokens (masterpiece = 2, ultra-detailed = 3)

  • ~0.75 words per token on average, or ~4 characters per token

  • A 77-token CLIP-L chunk holds roughly 50–60 words

  • Commas cost nothing extra — , is 1 token. They help CLIP separate concepts cleanly
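
To check real counts instead of guessing, run the CLIP tokenizer directly. A sketch using the Hugging Face transformers library (standard CLIP-L checkpoint):

from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

def count_tokens(prompt):
    """CLIP-L token count, excluding the start/end special tokens."""
    return len(tokenizer(prompt)["input_ids"]) - 2

print(count_tokens("battlefield"))              # 2 (battle + field)
print(count_tokens("portrait of a warrior"))    # each common word ≈ 1 token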

Real token wasters to avoid:

  • highly detailed · Why: 2 tokens, detailed alone works · Better alternative: detailed (1 token)

  • ultra-high-resolution · Why: 4+ tokens for a vague concept · Better alternative: set your resolution properly

  • 8k, 4k, HDR · Why: 3 tokens for buzzwords the model barely understands · Better alternative: drop them

  • trending on artstation · Why: 4 tokens, meaningless to most models · Better alternative: specific artist name (1–2 tokens)

  • very very detailed · Why: repeated emphasis burns tokens, no extra effect · Better alternative: say it once

Parenthesis emphasis (detailed:1.3): The parens and colon cost ~2 extra tokens but give
real control over attention weight. Worth it for key concepts — just don't wrap every tag.

Pony / Illustrious Score Tags

These models were trained with quality score tags as primary classifiers. Unlike SDXL quality
tags (which are just weighted concepts), score tags are hard-coded training signals — the
model was explicitly trained to associate them with quality tiers.

The score scale:

  • score_9 · Meaning: Top tier only (specific tier, no _up)

  • score_8_up · Meaning: Score 8 and above

  • score_7_up · Meaning: Score 7 and above

  • score_6_up · Meaning: Score 6 and above

  • score_5_up · Meaning: Score 5 and above

  • score_4_up · Meaning: Score 4 and above (mediocre+)

  • score_3 / score_2 / score_1 · Meaning: Specific low tiers (no _up variants)

The _up suffix = "this tier and everything above it." Without _up = that specific tier only.

Stacking is emphasis, not redundancy:

score_9, score_8_up, score_7_up

This means: "I want 7+, prefer 8+, really aim for 9." Each tag adds attention weight toward
that tier. It's like saying "good, preferably great, ideally the best."

score_7_up alone is sufficient — it includes 7, 8, and 9. Stacking just biases toward the top.

Positive presets:

  • score_9, score_8_up, score_7_up · Effect: Strongly biased toward top tier

  • score_8_up, score_7_up · Effect: Biased toward 8+, baseline 7

  • score_7_up · Effect: Flat "anything 7+" — simplest, still good

  • score_9 · Effect: Very top tier only — can be too restrictive

Negative score tags — use bare tags, NOT _up:

Negative: score_5, score_4, score_3, score_2, score_1

This targets specific low tiers. Don't use score_5_up in negative — that means "avoid 5
and above" which conflicts with your positive asking for 7+. The overlapping range confuses
the model.

Don't put score_6 in negative — tier 6 is "decent." Pushing it negative can make output
look artificially perfect. The score_5 cutoff is the sweet spot for most use cases.


Performance Profiles

  • Draft · Sampler: euler · Steps: 10-15 · Speed: Very Fast · Quality: Low

  • Balanced · Sampler: res_2m · Steps: 20 · Speed: Fast · Quality: Excellent

  • Reference · Sampler: rk4_4s · Steps: 35 · Speed: Medium · Quality: Excellent

  • Precision · Sampler: radau_iia_7s · Steps: 30 · Speed: Slower · Quality: Very High

  • Luxury · Sampler: res_8s + implicit · Steps: 40 · Speed: Slow · Quality: Maximum


Things That DON'T Combine Well

  • Linear samplers (euler, rk4) with Flux models → exponential samplers (RES) converge 3x faster on rectified flows

  • Too many implicit steps (>5) → diminishing returns, wastes compute

  • eta > 0 with noise_mode = "none" → noise mode overrides eta (no noise added regardless)

  • Exotic samplers (lobatto_iiid_3s) without understanding them → unpredictable results, use RES instead

  • Very high eta (>1.0) with low denoise → too much noise re-injected into a mostly-clean image


Troubleshooting — Conditioning, Precision & Latents

  • Colors shift after upscale · Cause: VAE re-encode changes distribution · Fix: Latent Match Channelwise (source = original latent)

  • Regional prompts bleed into each other · Cause: Hard mask edges · Fix: Increase region_bleed to 0.15-0.2, use mask_type "gradient"

  • Face fix has visible seam · Cause: Uniform edge blending · Fix: MaskEdge with internal=0.5, external=1.5 for outward-biased blend

  • Banding in gradients · Cause: fp16 precision loss · Fix: Set Precision Universal fp64 for refinement/fix stages

  • Style reference doesn't match · Cause: AdaIN too simple · Fix: Use WCT method in VAEStyleTransferLatent

  • Phase/magnitude blend artifacts · Cause: fp32 FFT precision loss · Fix: Ensure LatentPhaseMagnitude inputs are fp64

  • SD3.5 quality degrades · Cause: Conditioning too long · Fix: ConditioningTruncate (pos) + ConditioningZeroAndTruncate (neg)

  • Regional conditioning ignored · Cause: Wrong model detection · Fix: Check model type (Flux/SDXL/WAN) — regional uses model-specific attention masks

  • VAE encode gives different results each run · Cause: Non-deterministic encode · Fix: Use VAEEncodeAdvanced (seeds torch with 42)


Part 4 — Quality Pipeline: Generation → Upscale → Refine → Fix → Save

Pipeline Integration — Where Each Node Fits

Stage 1: Generation (Enhanced)

  [Standard encoding — works with SDXL, SD3.5, DiT, etc.]
  CLIPTextEncode
    ├─ positive: "your detailed scene description, quality tags"
    └─ negative: "worst quality, blurry, deformed"  (if model uses negative)
    → conditioning

  [Flux only — dual text encoder]
  CLIPTextEncodeFluxUnguided
    ├─ clip_l: "global concepts, style keywords"
    └─ t5xxl: "detailed scene description with fine nuances"
    → conditioning

  [Optional: Regional control for multi-subject scenes]
  ClownRegionalConditioning_AB or _ABC
    ├─ conditioning_A: subject prompt
    ├─ conditioning_B: background prompt
    ├─ mask_A: subject mask (from previous generation or manual)
    ├─ mask_type: "gradient"
    └─ region_bleed: 0.15
    → regional conditioning → sampler positive

  [Optional: Timestep scheduling]
  ConditioningSetTimestepRange (start=0.0, end=0.5)  → composition prompt
  ConditioningSetTimestepRange (start=0.5, end=1.0)  → detail prompt
  Combine → sampler positive

  [For precision-critical generation]
  EmptyLatentImage64 → fp64 empty latent → sampler
  Set Precision Universal (fp64) → precision-cast conditioning + sigmas + latent

  [For SD3.5M only]
  Positive → ConditioningTruncate → sampler positive
  Negative → ConditioningZeroAndTruncate → sampler negative

Stage 2→3: Upscale → Refine (Enhanced)

  Stage 2 output (upscaled image)
    → VAEEncodeAdvanced (deterministic encode, resize_to_input="image_1")
    → upscaled latent

  [Fix color shift from upscale]
  Latent Match Channelwise
    ├─ latent_target: upscaled latent  (color-shifted)
    ├─ latent_source: original latent  (correct colors)
    └─ model: your model
    → color-corrected upscaled latent

  [Optional: Style consistency]
  VAEStyleTransferLatent (method="WCT")
    ├─ latent: color-corrected upscaled latent
    ├─ style_ref: original generation latent
    └─ vae: your VAE
    → style-matched latent

  [Optional: Pre-refine texture injection]
  LatentNoised
    ├─ latent_image: style-matched latent
    ├─ noise_type: "brownian" or "fractal"
    ├─ noise_strength: 0.05-0.10
    └─ normalize: "true"
    → textured latent → Stage 3 sampler

Stage 4: Face/Region Fix (Enhanced)

  Detection mask from Stage 4A
    → MaskEdge (dilation=25, mode="percent", internal=0.5, external=1.5)
    → edge_mask (for blend zone)

  [Regional conditioning for face fix]
  ClownRegionalConditioning2
    ├─ conditioning_masked: "detailed face, sharp eyes, smooth skin, pores"
    ├─ conditioning_unmasked: original prompt (or empty)
    ├─ mask: face detection mask
    ├─ mask_type: "gradient"
    └─ region_bleed: 0.1
    → regional conditioning → face fix sampler positive

  [High precision for face fix]
  Set Precision Universal (fp64)
    → cast conditioning + latent to fp64
    → face fix sampler

  After sampler output:
    → composite using edge_mask for smooth boundary blending

Full Enhanced Pipeline Summary

  STAGE 1: Generate
  ├─ CLIPTextEncode (or CLIPTextEncodeFluxUnguided for Flux)
  ├─ [Optional] ClownRegionalConditioning_ABC (multi-area prompts)
  ├─ [Optional] ConditioningSetTimestepRange (step scheduling)
  ├─ [Optional] StyleModelApplyStyle (reference image style)
  ├─ EmptyLatentImage64 (fp64 precision)
  └─→ ClownsharKSampler → generation latent
  
  STAGE 2: Upscale
  ├─ VAE Decode → pixel upscale (model or bicubic)
  └─ VAEEncodeAdvanced (deterministic re-encode)
  
  STAGE 2→3 BRIDGE: Latent Correction
  ├─ Latent Match Channelwise (fix color shift from upscale)
  ├─ [Optional] VAEStyleTransferLatent WCT (style consistency)
  └─ [Optional] LatentNoised (texture injection pre-refine)
  
  STAGE 3: Refine
  ├─ [Optional] ConditioningAverage (blend base + detail prompts)
  └─→ ClownsharKSampler (denoise 0.25-0.35)
  
  STAGE 4: Face/Region Fix
  ├─ Detection → mask
  ├─ MaskEdge (precision boundary)
  ├─ ClownRegionalConditioning2 (face-specific prompt)
  ├─ Set Precision Universal fp64 (precision casting)
  ├─ InpaintCrop → ClownsharKSampler → InpaintStitch
  └─ Composite using edge_mask
  
  STAGE 5-6: Final upscale + save

The Complete Workflow

Stage 1: GENERATE          → high-quality base image
Stage 2: UPSCALE           → 2x resolution via pixel-space upscale
Stage 3: REFINE            → detail enhancement + sampler polish  
Stage 4: FIX (face/skin)   → targeted region correction
Stage 5: FINAL UPSCALE     → optional 2nd upscale
Stage 6: SAVE              → output with metadata

Stage 1: Generation — Get the Best Base Image

The base generation determines 80% of final quality. Get this right and the rest is polish.

Sampler Setup

Option nodes (each connects directly to a ClownsharKSampler options slot):

SharkOptions            → Sampler options
ClownOptions_SDE        → Sampler options
ClownOptions_DetailBoost → Sampler options
ClownOptions_SigmaScaling → Sampler options

SharkOptions:

noise_type_init:  gaussian
s_noise_init:     1.0
denoise_alt:      1.0
channelwise_cfg:  True          ← prevents color burn at higher CFG

ClownOptions_SDE:

noise_type_sde:   gaussian
noise_mode_sde:   hard
eta:              0.5

ClownOptions_DetailBoost:

weight:           0.3           ← subtle during generation, don't overdo it
method:           model
mode:             hard
start_step:       3             ← skip the early steps (rough structure phase)
end_step:         -1            ← apply through the rest

ClownOptions_SigmaScaling:

s_noise:          1.04          ← moderate SDE noise boost
lying:            0.92          ← model produces sharper detail
lying_inv:        1.06          ← compensates color desaturation
lying_start_step: 0
lying_inv_start_step: 1

ClownsharKSampler:

sampler_name:     res_3m        ← highest quality exponential integrator
scheduler:        beta57        ← optimized for RES
steps:            25-30         ← RES needs fewer steps
denoise:          1.0           ← full generation
cfg:              5.5-7.5       ← depends on model
sampler_mode:     standard
bongmath:         True

Why these values:

  • res_3m uses 3-point history for quadratic extrapolation — best accuracy per step

  • lying=0.92 tricks the model into producing ~8% more detail than it normally would

  • detail_boost weight=0.3 adds subtle enhancement without artifacts

  • channelwise_cfg prevents the washed-out look from guidance

Flux Adaptation — Stage 1

The settings above are tuned for SDXL-style models with traditional CFG. Flux uses a fundamentally different guidance mechanism and requires different settings.

Why Flux is different: Standard models (SDXL, SD1.5) use classifier-free guidance (CFG) —
the sampler runs two forward passes (conditional + unconditional) and amplifies the difference.
Flux uses guidance distillation — the guidance value is baked into the model as a learned
vector input. There is no separate negative/unconditional pass at all. This means:

  • CFG must be 1.0 — there is no negative conditioning to subtract, so CFG > 1 has no
    meaningful effect (and can hurt quality)

  • No negative prompt — Flux has no unconditional path. Leave negative empty or don't
    connect it

  • channelwise_cfg is irrelevant — with CFG=1.0 there's no guidance amplification to
    balance per-channel, so it does nothing (or adds overhead)

  • Sigma scaling interacts differently — Flux is a rectified flow model with linear noise
    schedule. Aggressive lying values that work on SDXL can produce severe noise artifacts
    (leopard-print patterns, texture corruption) on Flux

BFL reference settings (from black-forest-labs/flux sampling.py):

  • Sampler · FLUX.1-dev: Euler (first-order ODE) · FLUX.1-schnell: Euler

  • Steps · FLUX.1-dev: 50 · FLUX.1-schnell: 1-4

  • Guidance · FLUX.1-dev: 3.5 (model-internal vector, not CFG) · FLUX.1-schnell: 0.0

  • CFG · FLUX.1-dev: 1.0 (no CFG) · FLUX.1-schnell: 1.0

  • Schedule · FLUX.1-dev: Time-shifted linear (image-size dependent) · FLUX.1-schnell: Time-shifted linear

  • Negative prompt · FLUX.1-dev: None · FLUX.1-schnell: None

Adapted RES4LYF settings for Flux:

SharkOptions:

noise_type_init:  gaussian
s_noise_init:     1.0
denoise_alt:      1.0
channelwise_cfg:  False         ← no CFG splitting, disable this

ClownOptions_SDE:

noise_type_sde:   gaussian
noise_mode_sde:   lorentzian    ← less aggressive than hard, balances exploration/refinement
eta:              0.5

ClownOptions_DetailBoost:

weight:           0.3
method:           sampler       ← "sampler underestimates" works better than "model" on Flux
mode:             hard
start_step:       3
end_step:         -1

ClownOptions_SigmaScaling:

s_noise:          1.04          ← moderate SDE boost (tested, works on Flux)
lying:            0.97          ← conservative — Flux amplifies lying more than SDXL
lying_inv:        1.02          ← compensates lying desaturation
lying_start_step: 0
lying_inv_start_step: 1

ClownsharKSampler:

sampler_name:     res_3m        ← still best quality (also works: euler for BFL-standard behavior)
scheduler:        beta57
steps:            25-30
denoise:          1.0
cfg:              1.0           ← MUST be 1.0 for standard Flux
sampler_mode:     standard
bongmath:         True

Flux distilled variants (e.g., Flux-dev with guidance distillation fine-tunes) may accept cfg > 1.0 — test carefully. Standard FLUX.1-dev and FLUX.1-schnell must use cfg=1.0.

Sigma scaling on Flux: Flux's linear sigma schedule amplifies lying effects more than SDXL's cosine schedule. The SDXL-tuned values (lying=0.92) will produce visible artifacts (leopard-print patterns). The values above (lying=0.97, lying_inv=1.02, s_noise=1.04) are tested and work well as a starting point. Don't go below lying=0.95 on Flux without checking for noise artifacts.

Z-Image Adaptation — Stage 1

Z-Image is a 6B S3-DiT (Single-Stream DiT) by Alibaba/Tongyi-MAI. It uses Lumina2's
NextDiT backbone in ComfyUI, with flow matching (shift=3.0) and Qwen3-4B as text encoder.
Two variants exist: Base (true CFG) and Turbo/ZIT (guidance-free).

Why Z-Image is different from Flux: Both are rectified flow models, but Z-Image Base uses
true CFG (dual forward pass — conditional + unconditional) while Flux uses guidance
distillation. This means:

  • Z-Image Base: cfg=3.0-5.0, negative prompts work and help quality, channelwise_cfg=True useful

  • Z-Image Turbo (ZIT): cfg=1.0 (guidance-free via Decoupled-DMD distillation), no negative, same rules as Flux

  • Sigma scaling: Same rectified flow schedule as Flux — use conservative lying values

  • Text encoder: Qwen3-4B (single encoder, no dual CLIP/T5) — natural prose works best

  • VAE: 16-channel (same as Flux), ÷16 alignment required

Z-Image Base settings:

SharkOptions:

noise_type_init:  gaussian
s_noise_init:     1.0
denoise_alt:      1.0
channelwise_cfg:  True          ← helps at CFG 3-5

ClownOptions_SDE:

noise_type_sde:   gaussian
noise_mode_sde:   hard          ← standard noise timing (CFG handles guidance)
eta:              0.5

ClownOptions_DetailBoost:

weight:           0.3
method:           sampler       ← works well on flow models
mode:             hard
start_step:       3
end_step:         -1

ClownOptions_SigmaScaling:

s_noise:          1.04
lying:            0.97          ← conservative — same rectified flow as Flux
lying_inv:        1.02
lying_start_step: 0
lying_inv_start_step: 1

ClownsharKSampler:

sampler_name:     res_3m
scheduler:        beta57
steps:            28-50
denoise:          1.0
cfg:              3.0-5.0       ← true CFG, negative prompt recommended
sampler_mode:     standard
bongmath:         True

Z-Image Turbo (ZIT) settings: Same as Flux adaptation above — cfg=1.0,
channelwise_cfg=False, no negative prompt, 8-10 steps, lorentzian noise mode.

Qwen-Image Adaptation — Stage 1

Qwen-Image is a 20B MMDiT — the largest open-source diffusion model. It uses Qwen2.5-VL 7B as text encoder (a full VLM), 16-channel VAE, flow matching (like Flux/SD3), and runs at a higher pixel budget (~1.54-1.76 MP at 1328² native).

Why Qwen-Image is different: Uses true CFG (not guidance distillation) with negative
prompts. The VLM text encoder understands rich natural language better than CLIP+T5. At 20B
parameters it has more capacity but needs more steps and VRAM.

  • Qwen-Image Base: cfg=4.0, negative prompts are powerful, channelwise_cfg=True recommended

  • Qwen-Image Distilled/Lightning: cfg=1.0, no negative, 4-15 steps

  • Sigma scaling: Flow matching like Flux — conservative lying values

  • Resolution: Fixed buckets only (1328², 1664×928, 1472×1104, 1584×1056 + orientations)

  • Text encoder: Qwen2.5-VL 7B — rich prose descriptions, no token limit anxiety

Qwen-Image Base settings:

SharkOptions:

noise_type_init:  gaussian
s_noise_init:     1.0
denoise_alt:      1.0
channelwise_cfg:  True          ← recommended at CFG 4.0

ClownOptions_SDE:

noise_type_sde:   gaussian
noise_mode_sde:   hard
eta:              0.5

ClownOptions_DetailBoost:

weight:           0.3
method:           model         ← 20B model has high capacity, model method works well
mode:             hard
start_step:       3
end_step:         -1

ClownOptions_SigmaScaling:

s_noise:          1.04
lying:            0.95          ← slightly more room than Flux due to larger model capacity
lying_inv:        1.03
lying_start_step: 0
lying_inv_start_step: 1

ClownsharKSampler:

sampler_name:     res_3m
scheduler:        beta57
steps:            30-50         ← 20B model benefits from more steps
denoise:          1.0
cfg:              4.0           ← true CFG
sampler_mode:     standard
bongmath:         True

Qwen-Image Distilled/Lightning: Same pattern as Z-Image Turbo — cfg=1.0, channelwise_cfg=False, no negative, 4-15 steps depending on distillation variant. Use lorentzian noise mode for few-step sampling.


Stage 2: Upscale — Pixel-Space (Not Latent)

Why pixel-space: Latent upscale creates off-manifold latents (see Latent Space, Denoise & Upscaling). Pixel upscale → VAE re-encode is safer and produces cleaner results for the refinement pass.

Workflow:

Generated Latent → VAE Decode → Upscale Image (2x) → VAE Encode → refined latent

Upscale method: Use an upscale model (4x-UltraSharp, RealESRGAN, NMKD, etc.) through ComfyUI's ImageUpscaleWithModel node, or use ImageScale with Lanczos for a simple 2x.

Key: After upscaling in pixel space, VAE-encode the upscaled image back to latent for Stage 3 refinement.


Stage 3: Refine — Low-Denoise Sampler Pass

Take the upscaled latent and run a short sampling pass with low denoise to add detail at the new resolution.

Option Nodes:

ClownOptions_SDE        → Sampler options
ClownOptions_DetailBoost → Sampler options
ClownOptions_SigmaScaling → Sampler options

ClownOptions_SDE:

noise_type_sde:   gaussian
noise_mode_sde:   hard
eta:              0.25          ← lower for refinement (preserve structure)

ClownOptions_DetailBoost:

weight:           0.5-1.0       ← stronger than generation — this is where you add detail
method:           model
mode:             sinusoidal    ← focuses boost on middle steps
start_step:       0
end_step:         -1

ClownOptions_SigmaScaling:

s_noise:          1.05
lying:            0.89          ← stronger lying for more detail at higher res
lying_inv:        1.08

ClownsharKSampler (Refinement):

sampler_name:     res_2m        ← 2m is fine for refinement (faster)
scheduler:        beta57
steps:            15-20         ← short pass
denoise:          0.3-0.45      ← low denoise preserves the upscaled content
cfg:              4.5-6.0       ← slightly lower CFG for refinement
sampler_mode:     standard
bongmath:         True

Why lower denoise: At denoise 0.3-0.45, the sampler only touches the fine detail layer of the sigma schedule. It adds texture and sharpness without changing composition, colors, or structure.

Flux Adaptation — Stage 3

Same principles as Stage 1 Flux Adaptation: cfg=1.0,
channelwise_cfg=False, conservative sigma scaling.

Flux-specific refinement changes:

ClownOptions_SDE:
  noise_mode_sde:   lorentzian  ← softer than hard, better for Flux refinement

ClownOptions_DetailBoost:
  method:           sampler     ← "sampler underestimates" consistently better on Flux

ClownOptions_SigmaScaling:
  s_noise:          1.04        ← same as Stage 1
  lying:            0.97
  lying_inv:        1.02

ClownsharKSampler:
  cfg:              1.0         ← must be 1.0 for Flux

Z-Image Adaptation — Stage 3

Same principles as Z-Image Stage 1 — true CFG for Base,
guidance-free for Turbo.

Z-Image Base refinement changes:

ClownOptions_SDE:
  noise_mode_sde:   lorentzian  ← softer for refinement, even with true CFG

ClownOptions_DetailBoost:
  method:           sampler     ← reliable on flow models
  weight:           0.25        ← slightly lighter for refinement

ClownOptions_SigmaScaling:
  s_noise:          1.04
  lying:            0.97
  lying_inv:        1.02

ClownsharKSampler:
  cfg:              3.0-4.0     ← slightly lower than Stage 1 for refinement
  steps:            20-30

Z-Image Turbo (ZIT): Same as Flux Stage 3 — cfg=1.0, lorentzian, 6-8 steps.

Qwen-Image Adaptation — Stage 3

Same principles as Qwen-Image Stage 1 — true CFG with
negative prompts. The large model capacity means refinement can be aggressive.

Qwen-Image Base refinement changes:

ClownOptions_SDE:
  noise_mode_sde:   lorentzian

ClownOptions_DetailBoost:
  method:           model       ← 20B capacity shines in refinement
  weight:           0.25

ClownOptions_SigmaScaling:
  s_noise:          1.04
  lying:            0.95
  lying_inv:        1.03

ClownsharKSampler:
  cfg:              3.5         ← slightly lower than Stage 1's 4.0
  steps:            25-35

Qwen-Image Distilled: Same as Flux Stage 3 — cfg=1.0, lorentzian, 4-10 steps.

Alternative: Tiled Refinement

For very large images (3000+ px), use tiled sampling:

ClownOptions_Tile:
  tile_width:   1024
  tile_height:  1024

Connect to a ClownsharKSampler options slot. The sampler will process each tile separately and blend them back together.


Stage 4: Fix — Face, Skin, Eyes, Mouth, Teeth

This is the targeted correction stage. You detect regions accurately, create pixel-perfect masks, and re-sample just those areas.

4A: Accurate Detection & Masking

Two approaches — VLM (accurate, slower) and YOLO (fast, pre-trained classes).

Option A: Florence2 VLM Detection (Best Accuracy)

Uses a vision-language model — understands natural language prompts, detects almost anything you can describe.

SmartLML (Florence2)          ← vision-language detection
  task: object detection
  prompt: "face" / "eyes" / "mouth" / "teeth" / "hands"
  → bounding box output

Detection to BBox              ← convert detection format to bbox
  → bbox coordinates

LayerMask SAM2 Ultra           ← Segment Anything from the bbox
  input_image: [refined image from Stage 3]
  bbox: [from detection]
  → pixel-accurate mask (not a rough blob — actual contour)

Mask to Segs                   ← convert mask to segments format
  → SEGS (for detailer workflows)

Flow for each region:

Image → SmartLML Florence2 ("face") → BBox → SAM2 Ultra → face_mask
Image → SmartLML Florence2 ("eyes") → BBox → SAM2 Ultra → eyes_mask
Image → SmartLML Florence2 ("mouth, teeth") → BBox → SAM2 Ultra → mouth_mask

When to use: Complex scenes, unusual angles, non-standard subjects, anything YOLO wasn't trained on.

Option B: YOLO/Ultralytics Detection (Fastest)

Uses pre-trained YOLO models — no VLM needed, runs at 30-50 FPS. From Impact Pack / Impact Subpack.

UltralyticsDetectorProvider    ← loads YOLO model
  model_name: "face_yolov8m.pt"    (or segm variant)
  → BBOX_DETECTOR (and optionally SEGM_DETECTOR)

BboxDetectorForEach            ← runs detection on image
  bbox_detector: [from provider]
  image: [refined image from Stage 3]
  threshold: 0.5               ← confidence cutoff (lower = more detections)
  dilation: 10                 ← expand bbox slightly
  → SEGS (with cropped regions, masks, confidence scores)

SAMDetectorCombined (optional) ← refine bbox masks with SAM2
  sam_model: [SAM2 model]
  segs: [from bbox detector]
  → refined MASK (pixel-accurate from rough bbox)

Available YOLO models:

  • face_yolov8m.pt · Detects: Faces only · Speed: Fast · File: bbox/face_yolov8m.pt

  • face_yolov8m-seg.pt · Detects: Faces + instance mask · Speed: Fast · File: segm/face_yolov8m-seg.pt

  • person_yolov8m-seg.pt · Detects: Full person + mask · Speed: Fast · File: segm/person_yolov8m-seg.pt

  • yolov8m.pt · Detects: 80 COCO classes (person, car, etc.) · Speed: Fast · File: bbox/yolov8m.pt

  • hand_yolov8s.pt · Detects: Hands · Speed: Very fast · File: bbox/hand_yolov8s.pt

Model size variants: n (nano/fastest) → s (small) → m (medium/balanced) → l (large) → x (best accuracy)

Flow for face fix:

UltralyticsDetectorProvider("face_yolov8m.pt")
        ↓ BBOX_DETECTOR
BboxDetectorForEach(image, threshold=0.5, dilation=10)
        ↓ SEGS
(optional) SAMDetectorCombined(SAM2, SEGS) → pixel-accurate mask
        ↓ MASK / SEGS
[continue to Inpaint Crop or SetLatentNoiseMask]

When to use: Batch processing, real-time workflows, standard subjects (faces, hands, people). Much faster than Florence2 — no LLM inference needed.

Comparison

  • Speed · Florence2 (VLM): ~1-3 sec per detection · YOLO (Ultralytics): ~20-50 ms per detection

  • Model size · Florence2 (VLM): 1-7 GB · YOLO (Ultralytics): 36-140 MB

  • Flexibility · Florence2 (VLM): Any text prompt · YOLO (Ultralytics): Fixed pre-trained classes

  • Accuracy · Florence2 (VLM): Excellent for described objects · YOLO (Ultralytics): Excellent for trained classes

  • Best for · Florence2 (VLM): Complex/unusual detections · YOLO (Ultralytics): Faces, people, hands (standard)

  • VRAM · Florence2 (VLM): ~2-4 GB · YOLO (Ultralytics): ~200-500 MB

  • Requires · Florence2 (VLM): SmartLML node · YOLO (Ultralytics): Impact Pack + Impact Subpack

Recommendation: Use YOLO for faces/hands (it's what it was trained for and it's 50x faster). Use Florence2 for anything YOLO can't detect or fails on — specific objects, text regions, clothing items, etc.

Either way — feather the mask

Regardless of detection method, feather before sampling:

If using Inpaint Crop (Section 4E): The crop node handles feathering via mask_blend_pixels.

If using SetLatentNoiseMask directly (Section 4B): Feather with GrowMaskWithBlur:

mask:          face_mask
grow_amount:   10-20 px         ← slight expansion for context
blur_radius:   25-40 px         ← soft feathered edges for seamless blending

4B: Bridge Mask → ClownsharKSampler

The key node is SetLatentNoiseMask (built-in ComfyUI node). It embeds a mask into the latent dict as noise_mask. When the sampler receives this latent, it only denoises the masked region and preserves everything outside.

SetLatentNoiseMask
  samples:  [VAE-encoded upscaled image from Stage 3]
  mask:     [feathered face_mask from SAM2 + GrowMaskWithBlur]
  → masked_latent (LATENT with noise_mask embedded)

Then feed masked_latent directly into ClownsharKSampler as the latent_image input. The sampler will:

  1. Only add noise to the masked region

  2. Only denoise the masked region

  3. Preserve everything outside the mask untouched

  4. Blend at mask edges based on the feathering
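
Conceptually, each step ends with a mask-weighted composite. A sketch of that blend (not the actual sampler code):

import torch

def masked_blend(denoised, original, mask):
    """Sketch: keep the sampler's output inside the mask and the
    untouched latent outside; feathered mask values (0–1) blend
    the boundary."""
    # mask: [B, 1, H, W], matching the latent's spatial size
    return mask * denoised + (1.0 - mask) * original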

Full node chain for face fix:

[Refined Image] → VAE Encode → SetLatentNoiseMask(mask=face_mask) → masked_latent
                                                                         ↓
ClownOptions_SDE → ClownOptions_DetailBoost → ClownsharKSampler(latent_image=masked_latent)
                                                                         ↓
                                                                   VAE Decode → fixed image

4C: Sampler Settings for Face Fix

ClownOptions_SDE:

noise_type_sde:   gaussian
noise_mode_sde:   soft           ← softer noise for face refinement
eta:              0.2-0.3        ← low for preservation

ClownOptions_DetailBoost (face):

weight:           0.3-0.5        ← moderate — don't over-sharpen skin
method:           model
mode:             lorentzian     ← peaked in middle steps, gentle start/end
start_step:       0
end_step:         -1

ClownOptions_SigmaScaling (face):

lying:            0.95           ← gentle lying for faces (0.89 is too aggressive for skin)
lying_inv:        1.03
s_noise:          1.02           ← minimal extra noise

ClownsharKSampler (face fix):

sampler_name:     res_2m
scheduler:        beta57
steps:            15-20
denoise:          0.35-0.5       ← enough to fix issues, not enough to regenerate
cfg:              5.0-6.5
sampler_mode:     standard
bongmath:         True
prompt:           "detailed face, perfect skin, sharp eyes, symmetrical features, natural skin texture"
negative:         "blurry, distorted, asymmetrical, plastic skin, uncanny valley"

4D: Optional — Guided Face Fix

Add a ClownGuide_Mean to steer the face refinement toward the original:

ClownGuide_Mean:
  guide:       [original latent from Stage 2 — before any face changes]
  weight:      0.7-0.8          ← strong guidance keeps structure
  cutoff:      1.0
  start_step:  0
  end_step:    -1

And/or ClownGuide_FrequencySeparation for skin smoothing:

method:          median
kernel_size:     8
highpass_weight:  0.8            ← slightly reduce high-freq detail = smoother skin
lowpass_weight:   1.0            ← keep color/structure intact

Connect guides to the guides input on ClownsharKSampler.

4E: Better Alternative — Inpaint Crop & Stitch + ClownsharKSampler

The Inpaint Crop & Stitch node pack (comfyui-inpaint-cropandstitch) is purpose-built for this exact workflow. It handles the crop, context padding, and seamless blending automatically.

Two nodes:

  • Inpaint Crop — takes full image + mask → outputs cropped region (with padding/context), cropped mask (pre-feathered), and a STITCHER dict that remembers how to put it back

  • Inpaint Stitch — takes the processed crop + stitcher dict → seamlessly composites back into the full image

Full face-fix flow:

[Full Image] + [SAM2 face_mask]
        ↓
┌──────────────────────────────────┐
│  Inpaint Crop Improved           │
│  context_extend_factor: 1.5      │  ← 50% padding around face for context
│  output_target_width:  768       │  ← resize crop to model-friendly size
│  output_target_height: 768       │
│  mask_blend_pixels:    32        │  ← auto-feathering for seamless edges
│  mask_fill_holes:      True      │
│  device_mode:          GPU       │  ← instant, ~5ms
└──────┬───────────┬───────────┬───┘
       ↓           ↓           ↓
  cropped_image  cropped_mask  stitcher
   (768×768)     (feathered)   (metadata dict)
       ↓           ↓
  VAE Encode    SetLatentNoiseMask
       └─────┬─────┘
             ↓
       masked_latent
             ↓
  ┌──────────────────────────────────┐
  │  ClownsharKSampler               │
  │  sampler=res_2m, steps=15-20     │
  │  denoise=0.35-0.45               │  ← light refinement
  │  cfg=5.0-6.0                     │
  │  + options chain as below        │
  └──────────┬───────────────────────┘
             ↓
       VAE Decode
             ↓
       processed_crop (768×768)
             ↓
  ┌──────────────────────────────────┐
  │  Inpaint Stitch Improved         │
  │  stitcher:        [from crop]    │
  │  inpainted_image: [from decode]  │
  └──────────┬───────────────────────┘
             ↓
       Final Image (original size, face refined, rest untouched, seamless blend)

Why this is better than SetLatentNoiseMask alone:

  • The face fills the entire crop → model works at maximum effective resolution for the face (see the worked example after this list)

  • Context padding gives the model surrounding pixels for coherent edge generation

  • Auto-feathered mask eliminates manual GrowMaskWithBlur tuning

  • Stitch handles all the coordinate math, resize-back, and blending automatically

  • Works identically for faces in any position/size in the image
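
Concretely: a 250×250 px face in a 2048×2048 render becomes a 500×500 crop at context 1.5, which is then resized up to 768×768. The face is now sampled at roughly 1.5x its native pixel budget, at a resolution the model was actually trained on, and the stitcher scales the refined crop back into place afterward.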

Inpaint Crop key parameters:

  • context_extend_factor · Recommended: 1.3-1.5 · What It Does: How much padding around the mask (1.5 = 50% extra on each side; see the crop-box sketch after this list)

  • output_target_width/height · Recommended: 768 or 1024 · What It Does: Resize crop to model-native resolution

  • mask_blend_pixels · Recommended: 28-40 · What It Does: Gaussian blur radius for edge feathering

  • mask_expand_pixels · Recommended: 5-10 · What It Does: Dilate mask before cropping (catch edge pixels)

  • mask_fill_holes · Recommended: True · What It Does: Fill small gaps in the SAM2 mask

  • output_padding · Recommended: "32" or "64" · What It Does: Pad to multiple (latent alignment)

  • device_mode · Recommended: GPU · What It Does: 30-100x faster than CPU
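
As flagged above, the crop-box math these parameters drive is straightforward. A hypothetical sketch (crop_box is a made-up helper; the real node also handles image borders, aspect ratio, and resize filtering):

def crop_box(mask_bbox, context_extend_factor=1.5, pad_multiple=64):
    x0, y0, x1, y1 = mask_bbox
    w, h = x1 - x0, y1 - y0
    # context_extend_factor=1.5 -> 50% of the bbox added on each side
    ex = int(w * (context_extend_factor - 1.0))
    ey = int(h * (context_extend_factor - 1.0))
    x0, y0, x1, y1 = x0 - ex, y0 - ey, x1 + ex, y1 + ey
    # Round dimensions up to a multiple so the crop encodes cleanly to latent
    def round_up(v):
        return ((v + pad_multiple - 1) // pad_multiple) * pad_multiple
    w, h = round_up(x1 - x0), round_up(y1 - y0)
    return x0, y0, x0 + w, y0 + h  # then resized to output_target_width/height

# e.g. a 250x250 bbox -> 125 px context per side -> 500x500 region,
# padded to 512x512, resized to 768x768 for sampling; the stitcher
# remembers this box so the processed crop maps back exactly.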

4F: Complete Detection → Crop → Fix → Stitch Pipeline

Combining all the pieces — the full per-region fix chain:

[Refined Image from Stage 3]
        ↓
SmartLML Florence2 (task: detect, prompt: "face")
        ↓
Detection to BBox
        ↓
LayerMask SAM2 Ultra → pixel-accurate face_mask
        ↓
Inpaint Crop Improved (context=1.5, target=768×768, blend=32)
        ↓
  cropped_image + cropped_mask + stitcher
        ↓
  VAE Encode → SetLatentNoiseMask(mask=cropped_mask)
        ↓
  [Options: SDE(soft, eta=0.25) → DetailBoost(weight=0.4, lorentzian) → SigmaScaling(lying=0.95)]
        ↓
  ClownsharKSampler (res_2m, 15 steps, denoise=0.4, cfg=5.5)
        ↓
  VAE Decode → Inpaint Stitch Improved
        ↓
[Fixed Image — face refined, everything else untouched]

Repeat for each region with different prompts (a scriptable summary follows the list):

  1. Florence2 Prompt: "face" · Crop Target: 768×768 · Denoise: 0.40 · Detail Boost: 0.4 (lorentzian) · Lying: 0.95

  2. Florence2 Prompt: "eyes" · Crop Target: 512×512 · Denoise: 0.30 · Detail Boost: 0.6 (hard) · Lying: 0.92

  3. Florence2 Prompt: "mouth, teeth" · Crop Target: 512×512 · Denoise: 0.30 · Detail Boost: 0.3 (soft) · Lying: 0.95

  4. Florence2 Prompt: "hands" (if needed) · Crop Target: 768×768 · Denoise: 0.40 · Detail Boost: 0.3 (hard) · Lying: 0.95
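
Since the wiring is identical for every pass, the whole table reduces to data. A hypothetical driver (the detect → crop → sample → stitch calls are elided; only these values change between passes):

REGION_PASSES = [
    {"prompt": "face",         "crop": 768, "denoise": 0.40, "boost": 0.4, "mode": "lorentzian", "lying": 0.95},
    {"prompt": "eyes",         "crop": 512, "denoise": 0.30, "boost": 0.6, "mode": "hard",       "lying": 0.92},
    {"prompt": "mouth, teeth", "crop": 512, "denoise": 0.30, "boost": 0.3, "mode": "soft",       "lying": 0.95},
    {"prompt": "hands",        "crop": 768, "denoise": 0.40, "boost": 0.3, "mode": "hard",       "lying": 0.95},
]

for p in REGION_PASSES:
    # Florence2(p["prompt"]) -> SAM2 mask -> Inpaint Crop(target=p["crop"])
    # -> ClownsharKSampler(denoise=p["denoise"],
    #    detail boost p["boost"]/p["mode"], lying=p["lying"])
    # -> Inpaint Stitch
    pass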


Stage 5: Final Upscale (Optional)

If you need a second resolution bump:

Option A: Pixel-Space Upscaler Model

Use ImageUpscaleWithModel with 4x-UltraSharp or RealESRGAN-x4plus for a clean 4x boost (downscale afterward if you only need 2x). No re-sampling is involved; this is a pure pixel-space upscale.

Option B: SUPIR (If Available)

restoration_scale: 1.0-1.5     ← light artifact removal
cfg_scale:         3.0-4.0     ← moderate guidance
steps:             30-45
color_fix_type:    Wavelet      ← preserves original colors best
use_tiled_vae:     True         ← saves VRAM
control_scale:     0.5-0.8     ← how much restoration to apply

Use SUPIR as a polish pass, not an aggressive upscaler. Conservative settings give the best results.

Option C: Skip

If Stage 3 refinement already got you to target resolution, skip this entirely. Less processing = fewer artifacts.


Stage 6: Final Polish & Save

Sharpening

KJNodes Adaptive USM:
  blur_sigma:  2.5
  strength:    0.4-0.6          ← keep under 1.0 to avoid artifacts
  threshold:   5                ← only sharpen above noise floor
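
Unsharp masking is simple enough to sketch in full. A simplified version (KJNodes' adaptive variant layers its own logic on top of this; values assume images normalized to [0, 1], so threshold 5 becomes 5/255):

import torch
import torch.nn.functional as F

def unsharp(img: torch.Tensor, blur_sigma: float = 2.5,
            strength: float = 0.5, threshold: float = 5 / 255) -> torch.Tensor:
    """img: [B, C, H, W] in [0, 1]."""
    c = img.shape[1]
    k = int(blur_sigma * 6) | 1                       # odd kernel, ~6 sigma wide
    x = torch.arange(k, dtype=torch.float32) - k // 2
    g = torch.exp(-(x ** 2) / (2 * blur_sigma ** 2))
    g = (g / g.sum()).expand(c, 1, -1).contiguous()   # [C, 1, k] per-channel
    pad = k // 2
    blur = F.conv2d(img, g.view(c, 1, 1, k), padding=(0, pad), groups=c)
    blur = F.conv2d(blur, g.view(c, 1, k, 1), padding=(pad, 0), groups=c)
    detail = img - blur                               # high-frequency residual
    # Threshold gates the effect so flat/noisy regions are left untouched
    detail = torch.where(detail.abs() > threshold, detail, torch.zeros_like(detail))
    return (img + strength * detail).clamp(0.0, 1.0)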

Save

Standard SaveImage node with PNG/JPEG output.


Preset Configurations — Copy These

Quick Quality (RES4LYF Only, ~2 min)

GENERATE:
  sampler=res_3m, scheduler=beta57, steps=25, cfg=5.5, denoise=1.0
  lying=0.92, lying_inv=1.06, detail_boost weight=0.3

No upscale/refine, just save.

Balanced Pipeline (~5 min)

GENERATE:
  sampler=res_3m, scheduler=beta57, steps=25, cfg=6.0, denoise=1.0
  lying=0.92, lying_inv=1.06, detail_boost weight=0.3, mode=hard
UPSCALE:
  pixel-space 2x (Lanczos or upscale model)
REFINE:
  sampler=res_2m, steps=15, denoise=0.35, cfg=5.0
  lying=0.89, lying_inv=1.08, detail_boost weight=0.5, mode=sinusoidal
SAVE: with Adaptive USM strength=0.5

Maximum Quality Pipeline (~15 min)

GENERATE:
  sampler=res_3m, scheduler=beta57, steps=30, cfg=6.5, denoise=1.0
  lying=0.92, lying_inv=1.06, detail_boost weight=0.3
  implicit_steps=2, implicit_type=bongmath
  channelwise_cfg=True
UPSCALE:
  pixel-space 2x (4x-UltraSharp or RealESRGAN)
REFINE:
  sampler=res_2m, steps=20, denoise=0.4, cfg=5.0
  lying=0.89, lying_inv=1.08, detail_boost weight=0.8, mode=sinusoidal
  eta=0.2
FACE FIX:
  Florence2 "face" → BBox → SAM2 Ultra → face_mask
  Inpaint Crop (context=1.5, target=768, blend=32)
  VAE Encode → SetLatentNoiseMask(cropped_mask)
  guide=original, guide_weight=0.8
  sampler=res_2m, steps=15, denoise=0.4, cfg=5.5
  lying=0.95, lying_inv=1.03, noise_mode=soft
  VAE Decode → Inpaint Stitch back to full image
  separate eye/mouth passes (same Florence2 → SAM2 → Crop → Sample → Stitch pipeline)
FINAL:
  optional 2nd upscale (SUPIR light or upscale model)
  Adaptive USM strength=0.5
  Save PNG

Portrait Focus (~10 min)

GENERATE:
  sampler=res_3m, steps=30, cfg=6.0
  lying=0.95, lying_inv=1.03    ← softer lying for portraits
  detail_boost weight=0.15      ← subtle for skin
UPSCALE:
  pixel-space 2x
FACE REFINE (entire face):
  Florence2 "face" → SAM2 Ultra → Inpaint Crop (768, context=1.5)
  VAE Encode → SetLatentNoiseMask → ClownsharKSampler
  guide_weight=0.8, denoise=0.4, steps=20
  freq_sep: highpass_weight=0.75  ← smooth skin
  lying=0.95, noise_mode=soft
  VAE Decode → Inpaint Stitch
EYE DETAIL:
  Florence2 "eyes" → SAM2 Ultra → Inpaint Crop (512, context=1.3)
  denoise=0.35, detail_boost weight=0.5  ← sharpen eyes
  lying=0.92  ← slightly stronger for eye detail
  Inpaint Stitch back
SAVE: Adaptive USM strength=0.3 (light for portraits)

Troubleshooting — Sampling & Pipeline

  • Over-sharpened / crunchy · Cause: lying too low, detail_boost too high · Fix: Raise lying to 0.95+, reduce weight to 0.2

  • Washed-out colors · Cause: High CFG without compensation · Fix: Enable channelwise_cfg, or lower CFG

  • Color desaturation with lying · Cause: lying_inv too low · Fix: Increase lying_inv (if lying=0.89, try lying_inv=1.10)

  • Face still looks bad after fix · Cause: Denoise too low, or mask too small · Fix: Increase denoise to 0.45-0.5, grow mask by 20+ px

  • Seam at mask boundary · Cause: Mask not feathered enough · Fix: GrowMaskWithBlur: blur_radius=35+

  • Generation too smooth · Cause: No detail boost, no lying · Fix: Add detail_boost weight=0.3, lying=0.92

  • Generation too noisy/artifacts · Cause: Too much noise injection · Fix: Lower s_noise to 1.0, eta to 0.3, lying to 0.95

  • Refinement changes composition · Cause: Denoise too high · Fix: Lower to 0.25-0.35 for refinement

  • Tiled sampling shows grid · Cause: Tiles too small · Fix: Increase tile_width/height to 1024+

