Simple, 100% Offline Image Captioning Tool

This is a simple, 100% offline image captioning tool I built to generate clean, high-quality captions for large image folders — perfect for preparing datasets for LoRA, DreamBooth, Flux fine-tunes, or any Stable Diffusion training.

No uploading to cloud services, no API keys, no rate limits. Everything runs locally on your machine using your NVIDIA GPU (tested on RTX 3060 12GB).

Why I Made This

Online captioners often require sharing your images (privacy issue)
Florence-2 setups kept hitting dependency hell (flash_attn, einops, timm, device_map errors, etc.)
I wanted something dead-simple: double-click → paste folder → get .txt files + ZIP

So I switched to Salesforce/blip-image-captioning-large — it's rock-solid, installs easily, gives natural captions, and runs fast on consumer GPUs.
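
For reference, here is a minimal sketch of captioning one image with this model through the Hugging Face transformers library. It is not a copy of caption_app.py: the function names load_blip and caption_image and the max_new_tokens default are my own illustrative choices, and it assumes torch, transformers, and Pillow are installed.

```python
MODEL_ID = "Salesforce/blip-image-captioning-large"  # model named in this article


def load_blip(model_id=MODEL_ID):
    """Load the BLIP processor and model (hypothetical helper).

    Heavy imports stay inside the function so this file can be imported
    before the packages are installed.
    """
    import torch
    from transformers import BlipForConditionalGeneration, BlipProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id).to(device)
    model.eval()
    return processor, model, device


def caption_image(image_path, processor, model, device, max_new_tokens=50):
    """Return a caption string for one image file (hypothetical helper)."""
    import torch
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():  # inference only, no gradients needed
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

Typical use would be to call load_blip() once, then call caption_image(path, processor, model, device) for each file. The first call downloads the model weights into the local Hugging Face cache.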

Features

Gradio web UI (localhost only)
Auto-finds free port (7860–7870 range) if 7860 is busy
Optional trigger word (placed at the start or end of each caption, or left out): great for consistent LoRA tags like my_style, tok, 1girl
Processes whole folders in batch
Outputs .txt files (same name as image) + ZIP archive
Saves everything in the tool's folder (portable)
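
The free-port fallback can be done with nothing but the standard library. The sketch below is one way to do it, not necessarily how the tool does it: find_free_port is a hypothetical helper that tries to bind each port in the 7860–7870 range on localhost and returns the first one that is not already in use.

```python
import socket


def find_free_port(start=7860, end=7870, host="127.0.0.1"):
    """Return the first port in [start, end] that can be bound on host."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port  # bind succeeded; the socket closes on exit
    raise RuntimeError(f"No free port between {start} and {end}")
```

The returned port could then be passed to Gradio, for example demo.launch(server_name="127.0.0.1", server_port=port, inbrowser=True), where demo is assumed to be the app's Gradio interface object. Binding to 127.0.0.1 keeps the UI reachable from the local machine only.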

Requirements

Windows (10/11)
NVIDIA GPU with CUDA support (8GB+ VRAM recommended, 3060/4060/4070/etc. work great)
Python 3.8+ installed (from python.org — add to PATH during install)
~5–10 GB free space for first run (model + packages)

How to Use (Super Easy for You or Your Team)

1. Download or clone the tool from GitHub: https://github.com/Plasmaphantom/ImgCaptionerLOCAL
2. Double-click start_captioner.bat
   - First time: creates a venv and downloads ~1.5 GB (takes 5–15 min)
   - Later runs: starts in seconds
3. When the browser opens:
   - Paste your images folder path (e.g. C:\dataset\my_photos)
   - (Optional) Enter a trigger word (e.g. my_character)
   - Choose the trigger position (Start/End/None)
   - Click "Process All Images"
4. Wait for the captions to appear in the UI
   - .txt files + ZIP are created in the same folder as the .bat file
   - Example: image001.jpg → image001.txt with a caption like: my_character beautiful woman in fantasy armor standing on mountain
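
The batch step above can be sketched roughly as follows. This is an illustration under assumptions, not the tool's actual code: apply_trigger and caption_folder are hypothetical helpers, the extension list and the captions.zip file name are my own choices, and caption_fn stands in for whatever function produces a caption for one image. The trigger word is joined to the caption with a single space, matching the example above.

```python
import zipfile
from pathlib import Path

# Assumed set of image extensions; adjust to your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def apply_trigger(caption, trigger="", position="Start"):
    """Place the trigger word at the start or end of the caption, or skip it."""
    caption = caption.strip()
    trigger = trigger.strip()
    if not trigger or position == "None":
        return caption
    if position == "End":
        return f"{caption} {trigger}"
    return f"{trigger} {caption}"


def caption_folder(image_dir, output_dir, caption_fn, trigger="", position="Start"):
    """Write one .txt per image, then bundle all .txt files into a ZIP."""
    image_dir, output_dir = Path(image_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # ignore non-image files
        text = apply_trigger(caption_fn(image_path), trigger, position)
        txt_path = output_dir / f"{image_path.stem}.txt"  # same name as image
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    zip_path = output_dir / "captions.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for txt_path in written:
            archive.write(txt_path, arcname=txt_path.name)
    return written, zip_path
```

Passing the captioner in as caption_fn keeps the file handling separate from the model, so the same loop would work if the model were swapped out later.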

Speed: ~1–3 seconds per image on an RTX 3060, so a 1,000-image folder takes roughly 17–50 minutes (faster with smaller batches or lower-resolution images).

Code & Customization

Full source (just 2 files):

caption_app.py — the Gradio + BLIP logic
start_captioner.bat — auto venv + install + launch

GitHub repo: https://github.com/Plasmaphantom/ImgCaptionerLOCAL

Want tweaks?

Switch to other models (LLaVA, JoyCaption, etc.)
Add progress bar for huge folders
Force longer/more detailed captions
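
For the longer-captions tweak, one option is to change the arguments passed to the model's generate() call. The sketch below only builds those arguments. generation_settings is a hypothetical helper and the specific values are untuned guesses on my part, though max_new_tokens, min_length, num_beams, and repetition_penalty are all standard generation parameters in Hugging Face transformers.

```python
def generation_settings(detailed=False):
    """Return keyword arguments for model.generate() (illustrative values)."""
    if detailed:
        return {
            "max_new_tokens": 75,       # allow a longer caption
            "min_length": 20,           # discourage very short outputs
            "num_beams": 5,             # wider beam search
            "repetition_penalty": 1.2,  # reduce repeated phrases
        }
    return {"max_new_tokens": 50, "num_beams": 3}
```

With an already-loaded BLIP model and processed image tensors named model and inputs, this would be used as model.generate(**inputs, **generation_settings(detailed=True)). Wider beams and longer outputs cost extra time per image.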

Let me know in the comments or open an issue on GitHub. I'm happy to push updates!

License

MIT – use/modify/share freely.

Hope this helps someone avoid the Florence-2 pain and get captions fast & private. Drop a like/share if you find it useful! 🚀
