
Anima 2B - Qwen 3.5 4B Text Encoder

Updated: Mar 10, 2026

Format: SafeTensor
Type: Checkpoint Trained
Published: Mar 10, 2026
Base Model: Anima
Hash (AutoV2): EA289BE7C9

Qwen 3.5 4B Text Encoder for Anima 2B

NEW → Now supported on Forge Neo (sd-webui-forge-neo) as a native extension! See the Forge Neo install instructions below.

Installation

ComfyUI

Clone the repo into your ComfyUI custom_nodes folder:

cd ComfyUI/custom_nodes
git clone https://github.com/GumGum10/comfyui-qwen35-anima.git

Then restart ComfyUI.

Forge Neo

Clone or copy the extension into your Forge Neo extensions folder:

cd sd-webui-forge-neo/extensions
git clone https://github.com/GumGum10/sd-forge-qwen-35-encoder.git

Then restart Forge Neo. Dependencies (transformers, safetensors) install automatically on first launch.


What Is This?

A drop-in upgrade for Anima 2B's text encoder. Stock Anima ships with a small 0.6B-parameter text encoder — it works, but it struggles with complex prompts. This replaces it with a 4B-parameter encoder that understands your prompts significantly better.

The trade-off: the larger encoder needs alignment work to "speak the same language" as the diffusion model. We've done that work and ship the alignment files with this release. You just need to place files in the right folders and toggle a couple of settings.


What You Get

Pros:

  • Much better understanding of complex/long prompts (7× more parameters dedicated to reading your text)

  • Better handling of detailed scene descriptions, multiple subjects, and nuanced instructions

  • Alignment controls let you blend between raw 4B output and 0.6B-compatible output

Cons:

  • Uses more VRAM than the stock 0.6B encoder (~4GB vs ~0.6GB for the text encoder portion)

  • Slightly slower encoding (more parameters to run)

  • Alignment is an approximation — the diffusion model was trained against the 0.6B, so we're rotating the 4B's output to match. It's very good (0.96 cosine similarity) but not identical

  • This is a reverse-engineered implementation — the original author's private code may differ in subtle ways


File Placement

All files are available at: lylogummy/anima2b-qwen-3.5-4b

ComfyUI

You'll download 4 files:

ComfyUI/
├── models/
│   └── text_encoders/
│       └── qwen35_4b.safetensors          ← THE TEXT ENCODER WEIGHTS
│
└── custom_nodes/
    └── comfyui-qwen35-anima/              ← THIS CUSTOM NODE FOLDER
        ├── __init__.py                     ← (comes with the node)
        ├── calibration_params.safetensors  ← MAGNITUDE CALIBRATION
        ├── rotation_matrix.safetensors     ← ALIGNMENT ROTATION
        └── qwen35_tokenizer/              ← TOKENIZER FILES
            ├── tokenizer.json
            ├── vocab.json
            └── merges.txt

Forge Neo

You only need to download 1 file — the calibration files, alignment matrix, and tokenizer are already bundled with the extension:

sd-webui-forge-neo/
├── models/
│   └── text_encoder/
│       ├── qwen_3_06b_base.safetensors     ← STOCK 0.6B (you already have this)
│       └── qwen35_4b.safetensors            ← DOWNLOAD THIS
│
└── extensions/
    └── sd_forge_qwen35_encoder/             ← THIS EXTENSION
        ├── scripts/                         ← (comes with extension)
        ├── lib_qwen35/                      ← (comes with extension)
        ├── calibration_params.safetensors   ← (bundled)
        ├── rotation_matrix.safetensors      ← (bundled)
        └── qwen35_tokenizer/               ← (bundled)

Forge Neo note: Keep qwen_3_06b_base.safetensors selected in the top VAE/Text Encoder dropdown — its LLM adapter is still required. Do not put qwen35_4b.safetensors in that top dropdown.

Where to download each file:

qwen35_4b.safetensors (both ComfyUI and Forge Neo)

  • Download from: text_encoders/

  • Place in: ComfyUI/models/text_encoders/ or sd-webui-forge-neo/models/text_encoder/

  • What it does: the actual 4B text encoder model weights

calibration_params.safetensors + rotation_matrix.safetensors (ComfyUI only — bundled in Forge Neo)

  • Download from: calibration/

  • Place in: ComfyUI/custom_nodes/comfyui-qwen35-anima/

  • What they do: calibration scales the 4B output to match the 0.6B's magnitude per dimension; the rotation matrix rotates the 4B's concept directions to match what the adapter expects

qwen35_tokenizer/ folder (ComfyUI only — bundled in Forge Neo)

  • Download from: tokenizer/

  • Place in: ComfyUI/custom_nodes/comfyui-qwen35-anima/qwen35_tokenizer/

  • What it does: the correct tokenizer (vocab=248K, NOT the default Qwen3 tokenizer)

  • Note: auto-downloads from HuggingFace on first use if you don't place it manually
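The calibration and rotation files described above amount to two matrix operations on the encoder output. Here is a minimal NumPy sketch with toy shapes and random stand-in data — not the node's actual code; the real hidden size, matrix contents, and loading logic will differ:

```python
import numpy as np

# Hypothetical shapes: `emb` stands in for the 4B encoder output (tokens x dim),
# `R` for the alignment rotation, `scale` for the per-dimension calibration.
rng = np.random.default_rng(0)
d = 8                                              # stand-in for the real hidden size
emb = rng.standard_normal((3, d))                  # 3 tokens, d channels
R = np.linalg.qr(rng.standard_normal((d, d)))[0]   # orthonormal matrix, like a rotation
scale = rng.uniform(0.5, 2.0, size=d)              # per-dimension magnitude calibration

aligned = emb @ R            # rotation: fix the concept directions first
calibrated = aligned * scale # calibration: match the 0.6B's per-dimension magnitude
```

The order matters: the rotation realigns directions, and calibration then rescales each dimension toward the magnitudes the adapter was trained against.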


How to Use

ComfyUI

  1. Add the "Load Qwen3.5 CLIP (Anima)" node (found under loaders → Anima)

  2. Select qwen35_4b.safetensors from the dropdown

  3. Connect the CLIP output to a CLIPTextEncode node

  4. Use with your Anima 2B checkpoint as normal

Forge Neo

  1. Load an Anima 2B checkpoint

  2. Make sure qwen_3_06b_base.safetensors is in the top VAE/Text Encoder dropdown

  3. In the generation tab, expand "Qwen3.5 Text Encoder (Anima)" and enable it

  4. Select qwen35_4b.safetensors in the extension's Model File dropdown

  5. Generate as normal — the extension intercepts text encoding automatically

Recommended starting settings:

use_alignment:      ON
alignment_strength: 0.5
use_calibration:    OFF
output_scale:       1.0

That's it. Generate some images and compare against the stock 0.6B.


Tuning Guide

What the settings actually do (plain English):

use_alignment — Rotates the 4B's internal "compass" so that when it says "from the side" or "looking up", it points in the same direction the diffusion model expects. Without this, the 4B understands your prompt fine — it just communicates it in a way the diffusion model misreads.

alignment_strength (0.0 – 1.0) — The rotation (direction fix) is always on when alignment is enabled. This slider controls how much the magnitude shifts to match the 0.6B:

  • 0.0 = Directions fixed, but keep the 4B's own signal strength

  • 0.5 = Halfway blend ← start here

  • 1.0 = Fully match the 0.6B's signal strength

use_calibration — A finer-grained magnitude adjustment (per dimension instead of uniform). Can help, can also over-correct. Try it on and off and compare.

output_scale — A simple multiplier on the final output. Leave at 1.0 unless you know what you're doing.
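Put together, the four settings roughly compose like this. A minimal sketch assuming the behavior described above — `apply_settings` and every name here are hypothetical, not the extension's real API:

```python
import numpy as np

def apply_settings(emb, R, calib, target_norm, use_alignment=True,
                   alignment_strength=0.5, use_calibration=False,
                   output_scale=1.0):
    """Hypothetical combination of the four settings.

    emb: (tokens, dim) 4B encoder output; R: alignment rotation;
    calib: per-dimension calibration scale; target_norm: 0.6B-style magnitude.
    """
    out = emb
    if use_alignment:
        out = out @ R  # the direction fix is always applied at full strength
        own = np.linalg.norm(out, axis=-1, keepdims=True)
        # alignment_strength blends between the 4B's own magnitude (0.0)
        # and the 0.6B's target magnitude (1.0)
        blended = (1 - alignment_strength) * own + alignment_strength * target_norm
        out = out / own * blended
    if use_calibration:
        out = out * calib      # finer, per-dimension adjustment
    return out * output_scale  # plain multiplier, normally 1.0
```

At strength 0.0 the token norms are untouched; at 1.0 they are pulled all the way to the target, which matches the slider descriptions above.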

A suggested workflow:

  1. Generate with alignment OFF first — see what the raw 4B gives you. The text understanding will be better, but poses/viewpoints may be off.

  2. Turn alignment ON, set strength to 0.5 — generate the same prompts again. You should see better pose/viewpoint adherence while keeping the 4B's improved understanding.

  3. Adjust strength — bump it up if spatial stuff is still off, pull it back if quality degrades.

  4. Optionally enable calibration — compare on/off, keep whichever looks better for your use case.


FAQ

Q: Do I need both calibration AND alignment files? A: The alignment file (rotation_matrix.safetensors) is the most important one. Calibration is optional and supplementary. You can use alignment without calibration.

Q: Will this work with any Anima 2B checkpoint? A: Yes — any checkpoint built on Anima 2B that uses the standard text encoder pipeline.

Q: Does this need extra Python packages? A: For ComfyUI — no, everything ships with ComfyUI already. For Forge Neo — transformers and safetensors install automatically on first launch.

Q: How much extra VRAM does this use? A: The 4B encoder weights are FP8 quantized, so roughly 4GB for the text encoder. The stock 0.6B is under 1GB. Your total VRAM usage depends on your diffusion model + VAE + this.
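Those figures follow from simple parameter-count arithmetic — a weights-only estimate that ignores activations and runtime buffers:

```python
def approx_weight_gb(params, bytes_per_param):
    # Weights only: FP8 stores ~1 byte per parameter; decimal GB for round numbers.
    # Activations, buffers, and framework overhead are not included.
    return params * bytes_per_param / 1e9

print(approx_weight_gb(4e9, 1))    # 4B encoder at FP8: 4.0 GB
print(approx_weight_gb(0.6e9, 1))  # 0.6B encoder at FP8: 0.6 GB
```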

Q: Why not just scale the output by 10× instead of all this alignment stuff? A: Uniform scaling fixes the magnitude but not the directions. The 4B encodes "from the side" as a vector pointing in a completely different direction than the 0.6B. The rotation matrix fixes that. Scaling alone would be like shouting the wrong directions louder.
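A two-dimensional toy example makes this concrete. The vectors and rotation below are illustrative numbers only, not the real embedding spaces:

```python
import numpy as np

# Toy picture: the two encoders place "from the side" in different directions.
v_06b = np.array([1.0, 0.0])  # direction the 0.6B (and diffusion model) expects
v_4b = np.array([0.0, 0.2])   # the 4B's version: wrong direction, smaller magnitude

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Scaling 10x fixes the magnitude but leaves the direction unchanged:
assert np.isclose(cos(10 * v_4b, v_06b), cos(v_4b, v_06b))

# A rotation realigns the direction itself:
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # 90-degree rotation in this toy space
assert np.isclose(cos(v_4b @ R, v_06b), 1.0)  # now points the same way
```

Cosine similarity is scale-invariant, which is exactly why "shouting louder" cannot substitute for the rotation matrix.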

Q: Is this better than the stock 0.6B? A: For text understanding — yes, meaningfully. For raw out-of-the-box image quality — it depends on your alignment settings and prompts. The 0.6B has the advantage of being exactly what the model was trained against. The 4B has the advantage of actually understanding complex prompts. With alignment at 0.5, most users see comparable or better results, especially on detailed prompts where the 0.6B falls short.

Q: Can I use this with img2img? A: Yes — works for both txt2img and img2img on both ComfyUI and Forge Neo.

Q: Why does Forge Neo still need the 0.6B model loaded? A: The Anima pipeline uses a small LLM adapter that lives on the 0.6B model. This adapter converts text embeddings into the format the diffusion model expects. The 4B provides the text understanding, but the adapter (on the 0.6B) still handles the final conversion. Both models are needed.
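The data flow described in that answer can be sketched with toy stand-ins. Every function below is a placeholder for illustration, not the real API:

```python
# Placeholder components, chosen only to make the pipeline runnable:
qwen4b_encode = lambda prompt: [float(len(t)) for t in prompt.split()]  # 4B: text understanding
align = lambda emb: [0.5 * x for x in emb]                              # this extension: rotation + blend
adapter_06b = lambda emb: sum(emb)                                      # 0.6B's LLM adapter: final conversion

def encode_prompt(prompt):
    emb = qwen4b_encode(prompt)  # the 4B reads the prompt
    emb = align(emb)             # alignment makes its output adapter-compatible
    return adapter_06b(emb)      # the adapter on the 0.6B produces what diffusion expects
```

The point is only the ordering: the 4B never talks to the diffusion model directly; everything still passes through the 0.6B's adapter.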


Credits