
Detection / Image to Prompt

Name: Detection / Image to Prompt
Rating: 5 (9 reviews)
Author: Rvage

Updated: Jul 21, 2026

assets

transformers detection llm lyrics yolo


i2p_detection_nodes2.json


Details

Type

Workflows

Published

Jul 21, 2026

Base Model

Other

Hash

AutoV2

2FCB9C95C2

Image-to-Prompt & Smart Detection — Workflow Guide

38 nodes · 7 groups · 16 unique node types — 92.1% Eclipse nodes

35 Eclipse nodes at the top-level
Built with ComfyUI_Eclipse custom nodes

What Is This?

This workflow is a modular playground for Image-to-Prompt (VLM) generation and Smart Detection, built almost entirely (35 of 38 nodes) with the ComfyUI_Eclipse custom node suite.

It provides an interactive interface for:

Loading single images or batching directories of files (with video frame extraction support).
Interactively selecting frames/images on the fly in the frontend using the Image Selector.
Running object detection, segmentation, and bounding-box labels using the Smart Detection node.
Querying multiple vision-language models (VLMs) and text-only LLMs concurrently using the unified Smart LM Loader (Smart LML).
Generating prompt variations, timeline structures (for video generators like Wan2.1 and LTX), and song lyrics.
Direct interactive chat with custom system prompts and dynamic routing.

The workspace is designed for modularity and wireless routing: groups are self-contained and communicate using Set/Get named channels, meaning you can toggle groups on and off via muting without breaking any visual wiring.

🌒 Unified Smart Loaders & Backends

A key strength of the ComfyUI_Eclipse suite is that it consolidates fragmented model wrappers and APIs into a few smart loaders, two of which (Smart LM Loader and Smart Detection) drive this workflow. Instead of installing and configuring a dozen different node packs for Hugging Face, GGUF, Ollama, vLLM, and Docker servers, everything is unified under a standard input interface.

1. Smart Language Model Loader (Smart LML)

Smart LM Loader [Eclipse] serves as the single entrypoint for LLMs, VLMs, and ONNX taggers. It natively handles prompt templates, system instructions, and advanced sampling (temperature, top_p, min_p, Mirostat, repetition penalties).
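Of the sampling options listed above, min_p is the least self-explanatory. The sketch below shows the general idea as it is commonly defined (keep only tokens whose probability is at least min_p times the probability of the most likely token, then renormalize). It is not the Eclipse implementation; the function name and the toy distribution are made up for illustration.

```python
def min_p_filter(probs: dict[str, float], min_p: float) -> dict[str, float]:
    """Drop tokens below min_p * (top probability), then renormalize the rest."""
    if not probs:
        return {}
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Toy next-token distribution (illustrative values only).
probs = {"cat": 0.6, "dog": 0.3, "axolotl": 0.05, "zebra": 0.05}
filtered = min_p_filter(probs, min_p=0.1)  # threshold is 0.1 * 0.6
```

With min_p=0.1 the two 0.05 tokens fall below the threshold and are removed, so only "cat" and "dog" remain as sampling candidates.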

It supports 8 distinct backends selectable directly from the node:

Transformers (Native local Hugging Face execution): Standard local VLM runs (Florence-2, Qwen2-VL, Pixtral) and text-only models. Supports FP16, BF16, and INT8/INT4 quantization.
GGUF (Local quantized weights via llama-cpp-python): Running large models locally on lower-VRAM consumer GPUs with high quant efficiency (e.g. Q8_0, Q4_K_M).
Ollama (API interface / Local Docker container): Offloading computation to a background Ollama daemon or remote Ollama server.
llama.cpp (Local Docker-based server engine): Standardized containerized CPU/GPU inference.
vLLM (Docker) (High-throughput Docker container): Offloading inference to a vLLM server container for maximum speed and structured batch requests.
vLLM (Native) (Native local Python library): Fast local inference on Linux systems with native vLLM installed.
SGLang (Docker) (High-performance container engine): Rapid structured text generation and blazing-fast decoding.
WD14 Tagger (Local ONNX-based taggers): Running image-tagging classifier models (like SwinV2 or ConvNeXt) to auto-tag anime/general images locally.

2. Smart Detection

Smart Detection [Eclipse] is a unified object detection, text grounding, and segmentation node. It acts as an adapter that unifies two major model families under a single pipeline:

Vision-Language Models (VLMs): Uses Florence-2 or Qwen-VL to run text-grounding queries (e.g., "detect face, hair, blue dress") and converts textual coordinate outputs into binary masks, cropped bounding boxes, and ComfyUI Impact Pack-compatible SEGS.
YOLO Models: Loads YOLO v8, v9, v10, and v11 models for high-speed object detection and face/object segmentation.
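The step of turning box coordinates into a binary mask can be sketched in plain Python. This is only an illustration of the idea, assuming boxes arrive as (x1, y1, x2, y2) pixel coordinates; the function name is hypothetical and the real node's output types (masks, crops, SEGS) are not reproduced here.

```python
def boxes_to_mask(width: int, height: int, boxes) -> list[list[int]]:
    """Rasterize (x1, y1, x2, y2) boxes into a height x width mask of 0/1 values."""
    mask = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Round to pixel indices and clamp to the image bounds.
        left = max(0, min(width, int(round(x1))))
        right = max(0, min(width, int(round(x2))))
        top = max(0, min(height, int(round(y1))))
        bottom = max(0, min(height, int(round(y2))))
        for y in range(top, bottom):
            row = mask[y]
            for x in range(left, right):
                row[x] = 1
    return mask


# One detected box on an 8x6 image (illustrative coordinates).
mask = boxes_to_mask(8, 6, [(1.0, 2.0, 4.0, 5.0)])
```

Boxes that extend past the image edge are clipped rather than rejected, which is usually what you want when a detector slightly overshoots.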

3. Smart Model Loader (Main)

Used in standard rendering workflows (like iGEN ONE) but registered under the same suite. It loads standard Checkpoints, UNets, CLIP text encoders, VAEs, and LoRAs. It contains a template dictionary matching filenames to expected SHA256 hashes and CivitAI AIRs. If a file is missing, it displays an interactive Download button to automatically fetch it directly into your ComfyUI models directory.
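The hash check described above can be approximated with the standard library. This is a sketch under assumptions: the function names and the three status strings are invented for illustration, and the loader's own template format and download logic are not shown.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_model(path: Path, expected_sha256: str) -> str:
    """Return 'missing', 'hash_mismatch', or 'ok' for a model file (hypothetical statuses)."""
    if not path.is_file():
        return "missing"
    if sha256_of_file(path).lower() != expected_sha256.lower():
        return "hash_mismatch"
    return "ok"
```

A "missing" result corresponds to the situation where the node would offer its Download button.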

Adding Custom Models to the Registry

To add your own custom models to the Smart LM Loader dropdowns, edit the local user registry file:

File Path: registry/user_models.json (inside the comfyui_eclipse custom node folder)

Insert your model details under the appropriate backend key (e.g. "transformers", "gguf", "ollama"):

{
  "transformers": {
    "My-Custom-VLM-Name": {
      "repo_id": "username/repo-name",
      "family": "VLM",
      "has_vision": true
    }
  },
  "gguf": {
    "My-Custom-GGUF-Name": {
      "repo_id": "username/repo-name-GGUF",
      "family": "LLM_TEXT",
      "has_vision": false,
      "quantizations": ["Q4_K_M", "Q8_0"],
      "file_pattern": "model-name-{quant}.gguf"
    }
  }
}

IMPORTANT: Model names starting with an underscore (e.g. _example_Phi-4) are treated as examples and are ignored. Ensure your custom entry keys do not have a leading underscore.
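Before restarting ComfyUI, it can help to sanity-check a registry file. The sketch below parses a registry shaped like the example above, drops underscore-prefixed example entries, and expands the GGUF file pattern. The helper names are hypothetical, and expanding {quant} with str.format is an assumption about how the loader reads file_pattern.

```python
import json

REGISTRY_TEXT = """
{
  "gguf": {
    "_example_Phi-4": {"repo_id": "username/example", "family": "LLM_TEXT"},
    "My-Custom-GGUF-Name": {
      "repo_id": "username/repo-name-GGUF",
      "family": "LLM_TEXT",
      "has_vision": false,
      "quantizations": ["Q4_K_M", "Q8_0"],
      "file_pattern": "model-name-{quant}.gguf"
    }
  }
}
"""


def active_models(registry: dict) -> dict:
    """Keep only entries whose key does not start with an underscore."""
    return {
        backend: {name: spec for name, spec in models.items() if not name.startswith("_")}
        for backend, models in registry.items()
    }


def gguf_filenames(spec: dict) -> list[str]:
    """Expand file_pattern once per listed quantization (assumed behavior)."""
    return [spec["file_pattern"].format(quant=q) for q in spec.get("quantizations", [])]


registry = active_models(json.loads(REGISTRY_TEXT))
filenames = gguf_filenames(registry["gguf"]["My-Custom-GGUF-Name"])
```

json.loads also catches the most common editing mistakes, such as trailing commas or unquoted keys, before the loader ever sees the file.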

How It Works — The Basics

Wireless Routing (Set/Get)

Rather than cluttering the canvas with messy visual noodles, the workflow uses Set/Get nodes:

SetNode: Publishes the loaded image as a named stream REF_IMAGE.
GetNode: Subscribers inside the Detection, Image to Prompt, and Timeline Prompts groups retrieve this reference image wirelessly.
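Conceptually, Set/Get behaves like a small named-channel registry. The sketch below models that idea only; the ChannelBus class is hypothetical and says nothing about how ComfyUI resolves these nodes internally.

```python
class ChannelBus:
    """Toy model of named channels: one publisher per name, any number of readers."""

    def __init__(self) -> None:
        self._channels: dict[str, object] = {}

    def set(self, name: str, value: object) -> None:
        # Publishing again under the same name replaces the previous value.
        self._channels[name] = value

    def get(self, name: str) -> object:
        if name not in self._channels:
            raise KeyError(f"no SetNode has published channel {name!r}")
        return self._channels[name]


bus = ChannelBus()
bus.set("REF_IMAGE", "loaded-image-tensor")  # placeholder standing in for image data
detection_input = bus.get("REF_IMAGE")
caption_input = bus.get("REF_IMAGE")
```

Because readers look the value up by name, removing or muting one reader leaves the publisher and the other readers untouched.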

Mode Bridges (Single Upload vs. Directory Batching)

The workflow includes a smart toggling mechanism for image sourcing:

Load Image (Metadata Pipe): For drag-and-dropping individual files from your browser/computer.
Load Batch From Folder: For batching directories of images or decoding frames directly from video files.
Mode Bridge Set & Get: Wireless switches that communicate which loading path is active. When you toggle the active path, the Any Multi-Switch routes the correct image stream to Set_REF_IMAGE automatically.
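The routing decision can be pictured as a lookup keyed by the active mode. This is a simplified sketch: the mode names "single" and "folder" and the function name are assumptions, and the real switch may select inputs differently.

```python
def any_multi_switch(active_mode: str, inputs: dict[str, object]) -> object:
    """Return the stream for the active loading path, failing loudly if it is absent."""
    if active_mode not in inputs:
        raise KeyError(f"unknown mode {active_mode!r}; expected one of {sorted(inputs)}")
    value = inputs[active_mode]
    if value is None:
        raise ValueError(f"mode {active_mode!r} is active but produced no image")
    return value


# Only the active path needs to carry data; the other may be muted (None).
inputs = {"single": "image-from-upload", "folder": None}
ref_image = any_multi_switch("single", inputs)
```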

Group-by-Group Reference

1. Load Image

The entry point of the workflow. Exposes two loading methods:

Single Image: Load Image (Metadata Pipe) [Eclipse] (py/RvImage_LoadImage_Pipe.py in the comfyui_eclipse folder).
Batch / Video: Load Batch From Folder [Eclipse] (py/RvImage_LoadBatchFromFolder.py). Supports listing folders or decoding MP4/MKV video files frame-by-frame.
Interactive Filtering: Image Selector [Eclipse] (py/RvImage_Selector.py). When batch loading a folder or decoding video, this node pauses the ComfyUI run and renders an interactive grid of thumbnails. You click the frames you want (shift-click range selection and text filtering are supported), click "Confirm", and the run resumes, outputting only the selected subset.
Routing: Uses two Mode Bridge Get and Mode Bridge Set nodes to govern whether single or batch mode is selected. The active image goes to Set_REF_IMAGE.
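The selection behavior of the Image Selector (text filtering, shift-click ranges, then outputting a subset) can be sketched as three small helpers. The names and the case-insensitive substring match are assumptions; the actual frontend logic may differ.

```python
def filter_by_text(names: list[str], query: str) -> list[int]:
    """Indices of items whose name contains the query (case-insensitive)."""
    q = query.lower()
    return [i for i, name in enumerate(names) if q in name.lower()]


def shift_click_range(anchor: int, clicked: int) -> list[int]:
    """All indices between the last plain click and the shift-click, inclusive."""
    lo, hi = sorted((anchor, clicked))
    return list(range(lo, hi + 1))


def select(items: list, indices: list[int]) -> list:
    """Return the chosen subset in original order, ignoring duplicate clicks."""
    return [items[i] for i in sorted(set(indices))]


names = ["intro_0001.png", "intro_0002.png", "scene_0003.png", "scene_0004.png", "outro_0005.png"]
picked = select(names, shift_click_range(3, 1) + filter_by_text(names, "OUTRO"))
```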

2. Detection

Performs visual segmentation and object locating:

Subscribes to REF_IMAGE via a GetNode.
Smart Detection: Uses a model (defaulting to Florence-2) to detect coordinates or segment structures.
Outputs: Renders the isolated region to Preview Image (DOM), shows the binary segmentation mask in Preview Mask, and dumps textual results (like bounding box text or OCR outputs) to Show Text.

3. Image to Prompt

Generates detailed textual descriptions from your image:

Uses two parallel Smart LM Loader nodes loaded with vision-language models (e.g. Qwen2-VL-7B-Instruct or Florence-2).
NOTE: This parallel setup with two separate nodes is for showcase/demonstration purposes only. In a typical production workflow, multiple sequential operations like captioning and tag generation can be executed inside a single loader node using its built-in multi-task system.
Executes tasks like creating detailed descriptive prompts or extracting comma-separated tagging keys.
Outputs prompts to two separate Show Text boxes.

4. Timeline Prompts (Wan / LTX)

A dedicated VLM generator for text-to-video prompt engineering:

Uses Smart LM Loader (VLM mode) to analyze REF_IMAGE.
Generates structured timeline descriptions (e.g. "0s: subject starts sitting; 2s: subject turns head") tailored for video models like Wan2.1 and LTX-Video.
Dumps the structured timelines to a Show Text box.
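If you want to post-process timeline text outside ComfyUI, the example format above is easy to parse. The sketch assumes entries look like "<seconds>s: description" separated by semicolons, as in the guide's example; real model output may deviate, so the parser raises on anything it does not recognize.

```python
import re

# One entry: a number of seconds, the letter "s", a colon, then free text.
ENTRY = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s*:\s*(.+?)\s*$")


def parse_timeline(text: str) -> list[tuple[float, str]]:
    """Split a semicolon-separated timeline into (seconds, description) pairs."""
    entries = []
    for part in text.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        match = ENTRY.match(part)
        if match is None:
            raise ValueError(f"unrecognized timeline entry: {part.strip()!r}")
        entries.append((float(match.group(1)), match.group(2)))
    return sorted(entries)


timeline = parse_timeline("0s: subject starts sitting; 2s: subject turns head")
```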

5. Prompt Variations

An LLM generator to brainstorm prompts:

Uses Smart LM Loader (text-only mode) to take a simple seed prompt (e.g. "a futuristic cyberpunk street") and expand it into 3 creative prompt variations.
Dumps the results to a Show Text box.

6. Song Lyrics

A creative text generation playground:

Uses Smart LM Loader (text-only mode) configured to compose structured song lyrics or poetry based on user-supplied themes.
Dumps the lyrics to a Show Text box.

7. Direct Chat

A general playground to communicate directly with your loaded LLM or VLM:

Exposes three pre-configured system prompts using String Multiline [Eclipse] (py/RvText_Multiline.py) nodes.
Any Multi-Switch: Routes one of the three system prompts to the loader based on your selection (V1, V2, or V3).
Smart LM Loader: Executes the conversation.
Fast Mode Toggle: Lets you customize model options (device, compile, temperature) via simple frontend buttons.
Show Text: Displays the direct chat responses.

Quick Start Guide

TIP: Ensure your Hugging Face, GGUF, or Ollama server credentials are configured in config.json if using Docker-based backends. Local GGUF and Transformers backends run out-of-the-box using models placed in models/llms/ or models/text_encoders/.

How to Get Captions from an Uploaded Image

Drag and drop your image into the Load Image node in the Load Image group.
In the Image to Prompt group, select your preferred VLM from the dropdown on the Smart LM Loader (e.g. Florence-2-large).
Click Queue Prompt.
The generated prompt and tags will appear in the Show Text nodes inside the group.

How to Detect and Mask Objects (e.g., Face/Hair)

Ensure your image is loaded in Load Image.
Locate the Detection group and enter your search query in the detection_prompt widget of the Smart Detection node (e.g. "face, hair").
Click Queue Prompt.
The node will locate the facial/hair boundaries, isolate them, and display the result in the Preview Image (DOM) and the binary mask in the Preview Mask node.

How to Run Batch Image Tagging

In the Load Image group, set the bridge settings to Folder Mode.
Enter the path to your folder in the directory widget of Load Batch From Folder [Eclipse].
Set the Image Selector to bypass if you want to automatically process all images in sequence, or leave it active to manually select files when execution pauses.
Run the queue. The workflow will process each file sequentially through the active groups.
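To preview what a folder run would pick up, a directory can be listed outside ComfyUI. This is a rough sketch: the image extension set is an assumption (the guide only names MP4/MKV for video), and the node's own ordering and filtering rules may differ.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed; adjust to your files
VIDEO_EXTS = {".mp4", ".mkv"}                    # video containers named in this guide


def list_batch_inputs(directory: str) -> tuple[list[Path], list[Path]]:
    """Return (images, videos) found directly inside the folder, sorted by name."""
    folder = Path(directory)
    if not folder.is_dir():
        raise NotADirectoryError(directory)
    files = sorted(p for p in folder.iterdir() if p.is_file())
    images = [p for p in files if p.suffix.lower() in IMAGE_EXTS]
    videos = [p for p in files if p.suffix.lower() in VIDEO_EXTS]
    return images, videos
```

Calling list_batch_inputs on your dataset folder before queuing gives a quick count of how many files a bypassed Image Selector would let through.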

Custom Node Packages Used

ComfyUI_Eclipse — Handles unified smart loaders (Smart LM Loader, Smart Detection), directory list-loading (Load Batch From Folder), visual filtering (Image Selector), wireless routing (Set/Get), and text previews (Show Text, Preview Image (DOM), Preview Mask).

Explore, test, and build beautiful language-guided pipelines. 🌒