
simple, 100% offline image captioning tool


Jan 20, 2026

(Updated: 19 days ago)


This is a simple, 100% offline image captioning tool I built to generate clean, high-quality captions for large image folders — perfect for preparing datasets for LoRA, DreamBooth, Flux fine-tunes, or any Stable Diffusion training.

No uploading to cloud services, no API keys, no rate limits. Everything runs locally on your machine using your NVIDIA GPU (tested on RTX 3060 12GB).

Why I Made This

  • Online captioners often require sharing your images (privacy issue)

  • Florence-2 setups kept hitting dependency hell (flash_attn, einops, timm, device_map errors, etc.)

  • I wanted something dead-simple: double-click → paste folder → get .txt files + ZIP

So I switched to Salesforce/blip-image-captioning-large — it's rock-solid, installs easily, gives natural captions, and runs fast on consumer GPUs.
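For readers curious what captioning with this model looks like in code, here is a minimal sketch using the Hugging Face transformers classes for BLIP. It is not copied from caption_app.py; the function names load_blip and caption_image and the max_new_tokens default are my own illustrative choices, and it assumes torch, transformers, and Pillow are installed.

```python
from pathlib import Path

MODEL_ID = "Salesforce/blip-image-captioning-large"


def load_blip(model_id: str = MODEL_ID):
    """Load the BLIP processor and model, preferring CUDA when it is available."""
    # Heavy imports live inside the function so importing this file stays cheap.
    import torch
    from transformers import BlipForConditionalGeneration, BlipProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id).to(device)
    model.eval()
    return processor, model, device


def caption_image(image_path: Path, processor, model, device: str,
                  max_new_tokens: int = 50) -> str:
    """Return a single caption for a single image file."""
    import torch
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():  # inference only, no gradients needed
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

Typical use would be to call load_blip() once at startup and then call caption_image(path, processor, model, device) for each file, so the model is only loaded a single time.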

Features

  • Gradio web UI (localhost only)

  • Auto-finds free port (7860–7870 range) if 7860 is busy

  • Optional trigger word (placed at the start or end of each caption, or left out), great for consistent LoRA tags like my_style, tok, 1girl

  • Processes whole folders in batch

  • Outputs .txt files (same name as image) + ZIP archive

  • Saves everything in the tool's folder (portable)
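The port fallback in the list above can be sketched with nothing but the standard library. This is one possible approach, not necessarily how the tool does it: find_free_port is a hypothetical helper that tries to bind each port in the 7860–7870 range on localhost and returns the first one that works.

```python
import socket
from typing import Optional


def find_free_port(start: int = 7860, end: int = 7870,
                   host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, end] that can be bound on host, else None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is already in use, try the next one
            return port  # bind succeeded; the socket is closed on exit
    return None
```

The result could then be handed to Gradio with something like demo.launch(server_name="127.0.0.1", server_port=port), which keeps the UI reachable from the local machine only. There is a small window between checking the port and Gradio binding it, which is usually harmless for a single-user desktop tool.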

Requirements

  • Windows (10/11)

  • NVIDIA GPU with CUDA support (8GB+ VRAM recommended, 3060/4060/4070/etc. work great)

  • Python 3.8+ installed (from python.org — add to PATH during install)

  • ~5–10 GB free space for first run (model + packages)
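If you want to check these requirements before the first run, a rough pre-flight script might look like the sketch below. The preflight function and its 10 GB default are assumptions for illustration, and looking for nvidia-smi on PATH is only a heuristic for "an NVIDIA driver is installed"; it says nothing about VRAM size.

```python
import shutil
import sys
from pathlib import Path


def preflight(install_dir: Path, min_free_gb: float = 10.0) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
    free_gb = shutil.disk_usage(install_dir).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(
            f"Only {free_gb:.1f} GB free in {install_dir}, want {min_free_gb:.0f}+ GB"
        )
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH (is the NVIDIA driver installed?)")
    return problems


if __name__ == "__main__":
    for problem in preflight(Path(".")):
        print("WARNING:", problem)
```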

How to Use (Super Easy for You or Your Team)

  1. Download/clone the tool from GitHub: https://github.com/Plasmaphantom/ImgCaptionerLOCAL

  2. Double-click start_captioner.bat

    • First time: creates venv + downloads ~1.5 GB (takes 5–15 min)

    • Later runs: starts in seconds

  3. Browser opens

    • Paste your images folder path (e.g. C:\dataset\my_photos)

    • (Optional) Enter trigger word (e.g. my_character)

    • Choose trigger position (Start/End/None)

    • Click "Process All Images"

  4. Wait → captions appear in the UI

    • .txt files + ZIP created in the same folder as the .bat file

    • Example: image001.jpg → image001.txt with caption like: my_character beautiful woman in fantasy armor standing on mountain
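The folder-to-captions flow in the steps above can be sketched as follows. This is an illustration under assumptions, not the code in caption_app.py: apply_trigger, caption_folder, the IMAGE_EXTS set, and the captions.zip file name are all hypothetical, and captioner stands in for whatever function turns an image path into a caption.

```python
import zipfile
from pathlib import Path
from typing import Callable, List

# Assumed list of extensions; the real tool may accept a different set.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def apply_trigger(caption: str, trigger: str, position: str) -> str:
    """Place the trigger word at the start or end of a caption, or leave it out."""
    trigger = trigger.strip()
    if not trigger or position == "None":
        return caption
    if position == "Start":
        return f"{trigger} {caption}"
    if position == "End":
        return f"{caption} {trigger}"
    raise ValueError(f"Unknown position: {position!r}")


def caption_folder(folder: Path, out_dir: Path, captioner: Callable[[Path], str],
                   trigger: str = "", position: str = "None") -> Path:
    """Write one .txt per image into out_dir and bundle them into captions.zip."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for image in sorted(folder.iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip subfolders and non-image files
        text = apply_trigger(captioner(image), trigger, position)
        txt_path = out_dir / f"{image.stem}.txt"  # image001.jpg -> image001.txt
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    zip_path = out_dir / "captions.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for txt_path in written:
            archive.write(txt_path, arcname=txt_path.name)
    return zip_path
```

Passing the captioner in as a function keeps the file handling separate from the model, so the same loop works whether captions come from BLIP or from another model.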

Speed: roughly 1–3 seconds per image on an RTX 3060 (faster with smaller batches or lower-resolution images).


Code & Customization

Full source (just 2 files):

  • caption_app.py — the Gradio + BLIP logic

  • start_captioner.bat — auto venv + install + launch

GitHub repo: https://github.com/Plasmaphantom/ImgCaptionerLOCAL

Want tweaks?

  • Switch to other models (LLaVA, JoyCaption, etc.)

  • Add progress bar for huge folders

  • Force longer/more detailed captions
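For the "longer captions" tweak, one place to start is the generation settings passed to model.generate(). The sketch below is a guess at reasonable starting values, not tuned settings from the tool; generation_kwargs is a hypothetical helper, though the parameter names themselves are standard transformers generation arguments.

```python
def generation_kwargs(detailed: bool = False) -> dict:
    """Keyword arguments for model.generate(); values are starting points, not tuned."""
    if not detailed:
        return {"max_new_tokens": 50}
    return {
        "max_new_tokens": 75,       # allow a longer caption
        "min_length": 20,           # discourage very short captions
        "num_beams": 5,             # beam search instead of greedy decoding
        "repetition_penalty": 1.3,  # reduce repeated phrases in long outputs
    }
```

It would be used as model.generate(**inputs, **generation_kwargs(detailed=True)). BLIP was trained mostly on short captions, so pushing the length up may produce repetition rather than extra detail; switching models is likely the better route when you need long descriptions.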

Let me know in the comments or open an issue on GitHub. I'm happy to push updates or review PRs!

License

MIT – use/modify/share freely.

Hope this helps someone avoid the Florence-2 pain and get captions fast & private. Drop a like/share if you find it useful! 🚀

