1. What is ToMe?

Token Merging is an acceleration trick for Stable Diffusion that fuses redundant image tokens (small latent patches inside the U‑Net) whenever they carry almost the same information, then copies the result back to every merged position so the output keeps its full resolution. Because the attention blocks now have fewer tokens to process, every sampling step runs faster and peak VRAM drops a little. Crucially, ToMe never touches the CLIP text embeddings: your prompt and negative prompt are encoded exactly as before, and only the image-side tokens are merged.
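The merge-then-unmerge idea can be sketched in plain Python. This is a simplified version of ToMe's bipartite soft matching, not Forge's code: tokens at even positions are matched to their most similar odd-position token by cosine similarity, the top `int(N * ratio)` pairs are averaged, and an index map copies each merged result back to every position it absorbed. Real implementations such as the tomesd package work on batched tensors and, in the Stable Diffusion variant, pick destination tokens on a spatial grid rather than by alternating positions. The function names here are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def merge_tokens(tokens, ratio):
    """Simplified bipartite soft matching. Returns the merged tokens and,
    for every original position, the index of the merged token that now represents it."""
    n = len(tokens)
    src_idx = list(range(0, n, 2))          # set A: even positions (may be merged away)
    dst_idx = list(range(1, n, 2))          # set B: odd positions (always kept)
    r = min(int(n * ratio), len(src_idx))   # tokens to remove; at most half per step
    if r == 0 or not dst_idx:
        return [list(t) for t in tokens], list(range(n))
    # For every source token, find its most similar destination token.
    edges = []
    for i in src_idx:
        j = max(dst_idx, key=lambda d: cosine(tokens[i], tokens[d]))
        edges.append((cosine(tokens[i], tokens[j]), i, j))
    edges.sort(reverse=True)
    merged_src = {i: j for _, i, j in edges[:r]}  # the r most redundant sources
    # Average each merged source into its destination (plain mean).
    sums = {d: list(tokens[d]) for d in dst_idx}
    counts = {d: 1 for d in dst_idx}
    for i, j in merged_src.items():
        sums[j] = [a + b for a, b in zip(sums[j], tokens[i])]
        counts[j] += 1
    kept = [i for i in src_idx if i not in merged_src] + dst_idx
    out, slot = [], {}
    for k in kept:
        slot[k] = len(out)
        vec = sums[k] if k in sums else tokens[k]
        out.append([v / counts.get(k, 1) for v in vec])
    source_of = [slot[merged_src.get(p, p)] for p in range(n)]
    return out, source_of

def unmerge_tokens(merged, source_of):
    """Copy each merged token back to every position it absorbed."""
    return [list(merged[s]) for s in source_of]

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 1.0], [-1.0, 0.5]]
merged, source_of = merge_tokens(tokens, ratio=0.34)
restored = unmerge_tokens(merged, source_of)
print(len(tokens), len(merged), len(restored))  # → 6 4 6
```

The two near-duplicate pairs are averaged, so the attention step would see four tokens instead of six; after unmerging, positions 0 and 1 share the same averaged vector, which is exactly the kind of fine-detail loss that grows with ρ.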

2. Practical impact in plain words

The ratio ρ is the fraction of image tokens that get merged away, so it sets how aggressive ToMe is:

ρ = 0 (off) – the model behaves exactly as usual.
ρ ≈ 0.15 (light) – about 15 % of the tokens are fused. Expect roughly 12 % faster renders and a tiny (≈ 0.2 GB) VRAM saving; image quality is visually identical.
ρ ≈ 0.25 (balanced) – keeps three‑quarters of the tokens. This gives something like a 20 % speed‑up and frees roughly 0.4 GB. Very small text may look a bit softer, but most scenes remain crisp.
ρ ≈ 0.40 (aggressive) – only 60 % of tokens remain. Speed jumps by almost 30 % and you save about 0.7 GB, yet fine lines and small lettering start to blur and colours may look “soapy”.
ρ > 0.50 (extreme / preview only) – the network flies, but artefacts become obvious; use this solely for throw‑away thumbnails.
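To see what these ratios mean in token counts, here is a small calculator. It assumes an SD 1.x-style U‑Net whose highest-resolution attention blocks see one token per latent pixel (image size / 8 on each side) and that `int(N * ratio)` tokens are merged away, as in the reference tomesd package. In tomesd, merging is by default applied only to those highest-resolution blocks (its `max_downsample=1` default), and the convolutional layers always see every token; if Forge follows those defaults, that is one reason the speed-ups above are smaller than the token savings. The helper name is hypothetical.

```python
def token_budget(width, height, ratio, latent_factor=8):
    """Token counts at the U-Net's highest-resolution attention blocks.

    Assumption: one token per latent pixel, and r = int(N * ratio) tokens merged away.
    """
    n = (width // latent_factor) * (height // latent_factor)
    r = int(n * ratio)
    kept = n - r
    attn_fraction = (kept / n) ** 2   # self-attention score matrix is kept x kept
    return n, r, kept, attn_fraction

for rho in (0.0, 0.15, 0.25, 0.40, 0.50):
    n, r, kept, frac = token_budget(512, 512, rho)
    print(f"rho={rho:.2f}  tokens={n}  merged={r}  kept={kept}  attention matrix={frac:.0%}")
```

At 512×512 and ρ = 0.25, 1024 of 4096 tokens are merged and the self-attention score matrix shrinks to about 56 % of its size, yet the whole step gets only modestly faster because everything outside those blocks is unchanged.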

Rule of thumb: draft shots love ρ in the 0.20‑0.30 range, while final renders should stay at 0.10 or below unless you are sure the scene is forgiving.

For two‑pass workflows (High‑res Fix, or a txt2img draft refined in img2img) you can be bold on the first, low‑resolution pass (say ρ ≈ 0.30) and conservative on the second (ρ ≈ 0.05) to keep final detail.
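The same arithmetic shows why this split works. A minimal sketch, again assuming an SD 1.x-style U‑Net with one token per latent pixel at the merged blocks, and a 512×512 first pass upscaled 2× by High-res Fix (example sizes only); `merged_tokens` is a hypothetical helper.

```python
def merged_tokens(width, height, ratio):
    """Return (tokens, tokens merged away) at the highest-resolution attention blocks."""
    n = (width // 8) * (height // 8)   # assumption: one token per latent pixel
    return n, int(n * ratio)

passes = [
    ("first pass (512x512)", 512, 512, 0.30),
    ("High-res Fix pass (1024x1024)", 1024, 1024, 0.05),
]
for name, w, h, rho in passes:
    n, r = merged_tokens(w, h, rho)
    print(f"{name}: rho={rho}, {n} tokens, {r} merged, {n - r} kept")
```

The second pass carries four times as many tokens, so its self-attention is roughly sixteen times as expensive per block. That pass is also where the final fine detail is resolved, so a low ratio there protects quality while the bold first-pass setting still trims the draft.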

3. How ToMe interacts with LoRA styles

LoRA adjusts the weights of the U‑Net's attention layers (and often the text encoder), while ToMe leaves those weights untouched and never alters the text embeddings. Your chosen style is therefore still applied, but because merging averages similar image tokens, aggressive ratios smooth out the fine textures the LoRA tries to inject. In practice:

Photoreal or portrait LoRAs stay clean up to ρ ≈ 0.25.
Manga / anime styles tolerate slightly higher settings, around ρ ≈ 0.30, because line art is bolder.
Highly detailed or noisy styles (fur, brocade, baroque ornament) start to look washed‑out above ρ ≈ 0.15.
Flat cel‑shaded styles survive even ρ ≈ 0.35, as they rely on large, uniform regions anyway.
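If you stack several LoRAs, the list above can be encoded as a small lookup. This is a hypothetical helper, not a Forge feature: the style names and `capped_ratio` are made up for illustration, the ceilings are the rough values from the list, and taking the strictest ceiling among stacked styles is an assumption.

```python
# Rough upper ratios from the list above (guidance, not hard limits).
STYLE_CEILINGS = {
    "photoreal": 0.25,
    "portrait": 0.25,
    "manga": 0.30,
    "anime": 0.30,
    "detailed": 0.15,    # fur, brocade, baroque ornament
    "cel-shaded": 0.35,
}

def capped_ratio(requested, styles):
    """Clamp a requested ratio to the strictest ceiling among the active styles.

    Assumption: when LoRAs are stacked, the most fragile style decides.
    Unknown style names are ignored rather than raising an error.
    """
    ceilings = [STYLE_CEILINGS[s] for s in styles if s in STYLE_CEILINGS]
    return min([requested] + ceilings)

print(capped_ratio(0.30, ["anime", "detailed"]))  # → 0.15
```

Ignoring unknown names keeps the helper permissive; raising a `KeyError` instead would be the stricter choice if you want typos in style names to surface.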

Workflow tip: batch‑generate previews at ρ ≈ 0.30, shortlist your favourite seeds, then re‑render the winners at ρ = 0 for maximum fidelity.

4. Enabling ToMe in Forge

Go to Settings → Stable Diffusion → Optimizations → Token merging ratio (menu path as of Forge f2.0.1v1.10.1-previous-665-gae278f79).
Ensure an optimiser is active (Automatic or Flash‑Attn, not None).
Move the Token merging ratio slider to your desired value.
If you use img2img or High‑res Fix, note that each has its own token‑merging slider, so set them separately.
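If you drive Forge through its HTTP API instead of the UI, the same settings can in principle be overridden per request. A minimal sketch, assuming the server was launched with `--api` and that Forge keeps AUTOMATIC1111's `/sdapi/v1/txt2img` endpoint and option keys `token_merging_ratio` and `token_merging_ratio_hr` (plus `token_merging_ratio_img2img` for img2img); check the keys against the output of `GET /sdapi/v1/options` on your own install before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

def txt2img_payload(prompt, ratio, hr_ratio, steps=20):
    """Build a txt2img request that overrides token merging for this call only.

    Assumption: A1111-style option keys; verify them on your install.
    """
    return {
        "prompt": prompt,
        "steps": steps,
        "enable_hr": True,   # turn on High-res Fix
        "hr_scale": 2,
        "override_settings": {
            "token_merging_ratio": ratio,         # first, low-resolution pass
            "token_merging_ratio_hr": hr_ratio,   # High-res Fix pass
        },
    }

def send(payload, url="http://127.0.0.1:7860/sdapi/v1/txt2img"):
    """POST the payload as JSON and return the decoded response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = txt2img_payload("a lighthouse at dusk", ratio=0.30, hr_ratio=0.05)
print(json.dumps(payload["override_settings"]))
# send(payload)  # only with the server running and the API enabled
```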

TL;DR

Set ρ ≈ 0.25 for fast drafts, drop to ≤ 0.10 for final images. LoRA styles hold up, but the more intricate the style, the lower the ratio should be.

Token Merging (ToMe) - Cheat Sheet
