Tagging always was a chore and even with WD14 or BLIP it always took a lot of manual editing to get it right. With GPT-4-Vision it got a lot easier and because I haven't seen a UI around a wrote a small Gradio wrapper for the API.
Features
Prompt Engineering: Customize the prompt for image description to get the most accurate and relevant captions.
Batch Processing: Ability to process an entire folder of images with customized pre and post prompts.
Installation
Clone repository
git clone https://github.com/42lux/CaptainCaption
Install requirements
pip install -r requirements.txt
Screenshot
Usage
Setting Up API Key: Enter your OpenAI API key in the provided textbox.
Uploading Images: In the "Prompt Engineering" tab, upload the image for which you need a caption.
Customizing the Prompt: Customize the prompt, detail level, and max tokens according to your requirements.
Generating Captions: Click on "Generate Caption" to receive the image description.
Batch Processing: In the "GPT4-Vision Tagging" tab, you can process an entire folder of images. Set the folder path, prompt details, and the number of workers for processing.
Running the Application
Run the script and navigate to the provided URL (Standard http://127.0.0.1:7860) by Gradio to access the interface.
Limitations and Considerations
The accuracy of captions depends on the quality of the uploaded images and the clarity of the provided prompts.
The OpenAI API is rate-limited, so consider this when processing large batches of images.
Internet connectivity is required for API communication.