VIBE: Visual Instruction Based Editor (ComfyUI Node & Workflow)


🎨 VIBE for ComfyUI

This is a custom node implementation of VIBE (Visual Instruction Based Editor) for ComfyUI. It allows you to edit images using natural language instructions (e.g., "make it winter", "change the dog to a cat", "remove the background").

The workflow leverages the efficient Sana1.5-1.6B diffusion model and Qwen3-VL-2B-Instruct for fast, high-quality image manipulation directly on your GPU.

📺 Video Demo

See VIBE in action! This video showcases various styles and the incredible speed of the model:

🖼️ Example Workflow

Drag and drop this image into ComfyUI to load the workflow:

vibe-workflow.png

💡 Best Practices & Pro-Tips

To get the best results from VIBE, keep these observations in mind:

1. Resolution Matters!

The VIBE model (based on Sana) is optimized for high resolutions.

  • Recommendation: For best detail preservation and anatomical correctness, use resolutions above 2.5 Megapixels (approx. 1600x1600 or higher).

  • At lower resolutions (e.g., 1024x1024), the model may struggle with fine details or produce "smooth/blurry" textures due to the VAE bottleneck.
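Since the sweet spot is above 2.5 megapixels, it can help to upscale the input before editing. The sketch below (an assumption, not part of the node: the 32-pixel snapping and the `fit_to_megapixels` helper are illustrative; check the node's actual size constraints) computes a target size that keeps the aspect ratio:

```python
import math

def fit_to_megapixels(width, height, target_mp=2.5, multiple=32):
    """Scale (width, height) up so the area is at least `target_mp`
    megapixels, keeping the aspect ratio and snapping each side up to
    a multiple of 32 (a common latent-size constraint; VIBE's exact
    requirement may differ)."""
    scale = math.sqrt(target_mp * 1_000_000 / (width * height))
    scale = max(scale, 1.0)  # never downscale an already-large image
    w = math.ceil(width * scale / multiple) * multiple
    h = math.ceil(height * scale / multiple) * multiple
    return w, h

print(fit_to_megapixels(1024, 1024))  # -> (1600, 1600)
```

Feed the resulting size into an upscale node before the VIBE editor.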

2. Avoid "Burn-in" / Degradation

If you are performing iterative editing (passing the image through the node multiple times), you might notice increased saturation or contrast artifacts.

  • Solution: Set the Seed control to randomize or increment, so that every generation uses a fresh seed.

  • Using a fixed seed repeatedly on the same image tends to accumulate model bias and artifacts. A variable seed distributes the noise differently each time, keeping the image cleaner.
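If you script iterative edits (e.g. via the ComfyUI API), the same rule applies. A minimal sketch of the increment strategy; `apply_edit` in the comment is a hypothetical stand-in for one pass through the VIBE node:

```python
def plan_seeds(num_passes: int, base_seed: int = 0) -> list[int]:
    """One distinct seed per editing pass (increment strategy),
    so no two passes reuse the same noise pattern."""
    return [base_seed + i for i in range(num_passes)]

for prompt, seed in zip(["make it winter", "add snowfall"],
                        plan_seeds(2, base_seed=42)):
    # apply_edit(image, prompt, seed=seed)  # hypothetical node call
    print(prompt, seed)
```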

3. Text-to-Image (Experimental)

While VIBE is an editor, it can technically generate images from scratch.

  • How to use: I have left an Empty Latent Image node in the workflow connected to the latent input. If you disconnect the input image, VIBE will generate from pure text.

  • Warning: The quality will be lower than dedicated T2I models (like FLUX or SDXL) because VIBE is not trained for generation from scratch. Treat this as an experimental feature for curious users.


⚡ Easy Installation (via ComfyUI Manager)

  1. Load this workflow into ComfyUI.

  2. Open ComfyUI Manager and click "Install Missing Custom Nodes".

  3. Restart ComfyUI after the installation finishes.

  4. Once reloaded, locate the VIBE Image Editor node (in Step 2) and click the "Check / Download Model" button. This will automatically download the necessary weights.

(Alternatively, you can follow the manual installation steps below.)


🛠️ Manual Installation

1. Install the Node

Navigate to your ComfyUI/custom_nodes folder and clone the repository:

cd ComfyUI/custom_nodes
git clone https://github.com/ato-zen/ComfyUI-VIBE

2. Install Dependencies

From your ComfyUI/custom_nodes folder, enter the node directory and install the dependencies:

cd ComfyUI-VIBE
pip install -r requirements.txt

📂 Model Setup

The node automatically looks for weights in: ComfyUI/models/vibe/

You need to download the weights manually as they are large.

  • Create the directory:

cd ComfyUI/models
mkdir vibe
cd vibe

  • Clone the weights:
    (Ensure git-lfs is installed and initialized: run git lfs install once)

git clone https://huggingface.co/iitolstykh/VIBE-Image-Edit

Your folder structure should look like this:

📂 ComfyUI/
└── 📂 models/
    └── 📂 vibe/
         └── 📂 VIBE-Image-Edit/
              ├── model_index.json    
              ├── 📂 scheduler/
              ├── 📂 text_encoder/
              ├── 📂 tokenizer/
              ├── 📂 transformer/
              └── 📂 vae/
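To verify the download landed where the node expects, you can run a quick check against the layout above (the root path here is an assumption; adjust it to your actual ComfyUI install location):

```python
from pathlib import Path

# Entries the VIBE-Image-Edit folder should contain, per the tree above.
EXPECTED = ["model_index.json", "scheduler", "text_encoder",
            "tokenizer", "transformer", "vae"]

def check_model_dir(model_dir: Path) -> list[str]:
    """Return the expected entries that are missing from model_dir."""
    return [name for name in EXPECTED if not (model_dir / name).exists()]

# Assumed install root -- change "ComfyUI" to wherever ComfyUI lives.
model_dir = Path("ComfyUI") / "models" / "vibe" / "VIBE-Image-Edit"
missing = check_model_dir(model_dir)
print("Model folder looks complete." if not missing
      else f"Missing from {model_dir}: {missing}")
```

An incomplete clone (e.g. git-lfs not installed, so only pointer files were fetched) is the most common cause of a broken layout.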

💻 Hardware & System Compatibility

VIBE (Sana 1.5) is a cutting-edge model that requires modern hardware features (Flash Attention 2 and Triton kernels) to function.

  • ✅ Full Support: NVIDIA RTX 30xx / 40xx and data-center A/H-series cards (Ampere, Ada, Hopper). Best performance and native BF16 support.

  • ⚠️ Partial Support: NVIDIA RTX 20xx (Turing). May work, but expect slower speeds or black-image (NaN) errors.

  • ❌ Unsupported: NVIDIA GTX 10xx (Pascal) & older. These cards lack hardware support for the required Triton kernels. If you see the error: "GET was unable to find an engine", your GPU is likely too old.

  • ❌ Unsupported: AMD or Apple Silicon (M1/M2/M3). The model is strictly tied to NVIDIA CUDA/Triton.

  • 🌐 OS Support: Linux is highly recommended. Windows users should use WSL2; native Windows support for Triton is currently unofficial and unstable.
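The tiers above roughly track CUDA compute capability (Ampere is 8.x, Turing is 7.5, Pascal is 6.x). A small sketch of that mapping; the thresholds are my reading of the list above, not something the node itself exposes:

```python
def support_tier(major: int, minor: int) -> str:
    """Map a CUDA compute capability to the support tiers above.
    Thresholds are assumptions based on the hardware list, not an
    official compatibility matrix."""
    if (major, minor) >= (8, 0):   # Ampere (RTX 30xx) and newer
        return "full"
    if (major, minor) >= (7, 5):   # Turing (RTX 20xx)
        return "partial"
    return "unsupported"           # Pascal (GTX 10xx) and older

# With PyTorch installed, query your real capability like this:
#   major, minor = torch.cuda.get_device_capability(0)
print(support_tier(8, 6))  # RTX 30xx -> full
```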

VRAM Requirements:

  • Minimum: 12GB VRAM.

  • Recommended: 24GB VRAM for high-quality 2K / 1600px+ workflows.


🐞 Known Issues & Support

If you encounter the error GET was unable to find an engine to execute this computation, it is a hardware limitation of older NVIDIA cards.

How to Report a Bug

If you are on a supported GPU and encounter issues, please check your terminal logs and open an issue on GitHub:
👉 Report Issue on GitHub

Please always include your GPU model, Operating System, and the full console error log.
