This is a simple image captioning tool I built to generate clean, high-quality captions for large image folders. It runs fully offline once the first-run download has finished, which makes it a good fit for preparing datasets for LoRA, DreamBooth, Flux fine-tunes, or any Stable Diffusion training.
No uploading to cloud services, no API keys, no rate limits. Everything runs locally on your machine using your NVIDIA GPU (tested on RTX 3060 12GB).
Why I Made This
Online captioners often require sharing your images (privacy issue)
Florence-2 setups kept hitting dependency hell (flash_attn, einops, timm, device_map errors, etc.)
I wanted something dead-simple: double-click → paste folder → get .txt files + ZIP
So I switched to Salesforce/blip-image-captioning-large — it's rock-solid, installs easily, gives natural captions, and runs fast on consumer GPUs.
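For readers curious what captioning with this model looks like, here is a minimal sketch using the Hugging Face transformers classes for BLIP. It is not the code from caption_app.py: the helper names (list_images, caption_folder), the extension list, and the max_new_tokens default are my own assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, List

MODEL_ID = "Salesforce/blip-image-captioning-large"
# Assumed set of extensions; the real tool may accept a different list.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def list_images(folder: str) -> List[Path]:
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def caption_folder(folder: str, max_new_tokens: int = 50) -> Dict[str, str]:
    """Caption every image in `folder` and return {filename: caption}."""
    # Heavy imports live inside the function so list_images() can be used
    # without torch/transformers/Pillow installed.
    import torch
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_ID).to(device)
    model.eval()

    captions: Dict[str, str] = {}
    for path in list_images(folder):
        image = Image.open(path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt").to(device)
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        captions[path.name] = processor.decode(output_ids[0], skip_special_tokens=True)
    return captions
```

Calling caption_folder(r"C:\dataset\my_photos") would download the model on first use and then run from the local cache.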
Features
Gradio web UI (localhost only)
Auto-finds free port (7860–7870 range) if 7860 is busy
Optional trigger word (placed at the start, at the end, or left out), great for consistent LoRA tags like my_style, tok, 1girl
Processes whole folders in batch
Outputs .txt files (same name as image) + ZIP archive
Saves everything in the tool's folder (portable)
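The port fallback in the feature list can be sketched with the standard library alone. This is one possible way to do it, not necessarily how caption_app.py does it; find_free_port and its parameters are hypothetical names.

```python
import socket
from typing import Optional


def find_free_port(start: int = 7860, end: int = 7870,
                   host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, end] that can be bound, else None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                # If the bind succeeds, nothing else is listening on this port.
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port
    return None
```

The result can then be handed to Gradio, for example demo.launch(server_name="127.0.0.1", server_port=port, inbrowser=True), which keeps the UI on localhost. There is a small window between the check and the launch in which another program could grab the port, so treat this as a convenience and not a guarantee.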
Requirements
Windows (10/11)
NVIDIA GPU with CUDA support (8GB+ VRAM recommended; tested on RTX 3060 12GB, and comparable cards like the 4060/4070 should work too)
Python 3.8+ installed (from python.org — add to PATH during install)
~5–10 GB free space for first run (model + packages)
How to Use
Download/clone the tool from GitHub: https://github.com/Plasmaphantom/ImgCaptionerLOCAL
Double-click start_captioner.bat (the first run creates the venv and downloads ~1.5 GB, which takes 5–15 min; later runs start in seconds)
Browser opens
Paste your images folder path (e.g. C:\dataset\my_photos)
(Optional) Enter trigger word (e.g. my_character)
Choose trigger position (Start/End/None)
Click "Process All Images"
Wait → captions appear in the UI
.txt files + ZIP created in the same folder as the .bat file
Example: image001.jpg → image001.txt with caption like: my_character beautiful woman in fantasy armor standing on mountain
Speed: ~1–3 seconds per image on RTX 3060 (lower-resolution images load and decode a bit faster).
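To make the output format concrete, here is a rough sketch of how a trigger word could be attached and how the .txt files and ZIP could be written. It follows the example above (trigger and caption joined by a space), but the function names, the "Start"/"End"/"None" strings, and the captions.zip filename are assumptions, not taken from the actual source.

```python
import zipfile
from pathlib import Path
from typing import Dict


def apply_trigger(caption: str, trigger: str = "", position: str = "Start") -> str:
    """Attach `trigger` to `caption` at the chosen position ("Start", "End", "None")."""
    caption = caption.strip()
    trigger = trigger.strip()
    if not trigger or position == "None":
        return caption
    if position == "End":
        return f"{caption} {trigger}"
    return f"{trigger} {caption}"


def write_captions(captions: Dict[str, str], out_dir: str,
                   zip_name: str = "captions.zip") -> Path:
    """Write one .txt per image (same base name) and bundle them into a ZIP."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    zip_path = out / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image_name, caption in captions.items():
            # image001.jpg -> image001.txt
            txt_path = out / (Path(image_name).stem + ".txt")
            txt_path.write_text(caption, encoding="utf-8")
            archive.write(txt_path, arcname=txt_path.name)
    return zip_path
```

Some trainers expect comma-separated tags, in which case joining with ", " instead of a space may suit your dataset better.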
Screenshot of the Interface
[Screenshot: the Gradio page with a folder path entered]
Code & Customization
Full source (just 2 files):
caption_app.py — the Gradio + BLIP logic
start_captioner.bat — auto venv + install + launch
GitHub repo: https://github.com/Plasmaphantom/ImgCaptionerLOCAL
Want tweaks?
Switch to other models (LLaVA, JoyCaption, etc.)
Add progress bar for huge folders
Force longer/more detailed captions
Let me know in the comments or open an issue on GitHub. Happy to push updates!
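If you want to experiment with longer captions yourself, one option is to pass extra generation settings to the model's generate() call. The sketch below only builds the keyword arguments; the helper name and the specific values are illustrative assumptions and have not been tuned for BLIP.

```python
from typing import Any, Dict


def detailed_caption_kwargs(min_new_tokens: int = 20,
                            max_new_tokens: int = 75) -> Dict[str, Any]:
    """Build generate() settings that push the model toward longer captions."""
    if min_new_tokens > max_new_tokens:
        raise ValueError("min_new_tokens must not exceed max_new_tokens")
    return {
        "min_new_tokens": min_new_tokens,   # forces at least this many tokens
        "max_new_tokens": max_new_tokens,   # upper bound on caption length
        "num_beams": 5,                     # beam search instead of greedy decoding
        "repetition_penalty": 1.2,          # discourages repeated phrases
    }
```

With a loaded transformers BLIP model this would be used as model.generate(**inputs, **detailed_caption_kwargs()). Forcing a high minimum length can make the model pad captions with repetition or invented details, so check a few outputs before running a whole dataset.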
License
MIT – use/modify/share freely.
Hope this helps someone avoid the Florence-2 pain and get captions fast & private. Drop a like/share if you find it useful! 🚀

