JoyCaption Standalone 1.0 GGUF Caption Generator by Sarcastic TOFU (supports macOS, Linux & Windows, including offline use)

Archive: Other (176.19 MB)
Type: Other
Published: Feb 21, 2026
Base Model: Flux.2 Klein 4B
Hash (AutoV2): 74FE592C7B

This is a simple, portable, standalone, multiplatform Python GUI application that generates high-quality natural-language captions for images using the JoyCaption Beta One model (LLaVA-based, GGUF format). It is well suited to preparing training datasets for SD 1.5, SDXL 1.0, MagicWAN Image v2, QWEN, HunyuanImage-2.1, HiDream, KREA, Chroma, Z-Image Turbo, Z-Image Base, Flux.2 Klein and Flux.1. It runs fully locally: no internet connection is required after the initial setup. Because the main application is written in PyQt6, it runs on Mac, Linux and Windows. I wrote it so that it can run on the CPU: even without a powerful GPU, a decent processor and RAM combination, or a good amount of Unified Memory and storage on a Mac, is enough.

(I don't use a Windows machine, so I couldn't provide a batch script for Windows. If you upload this readme along with either run script, the Mac one or the Linux one, to ChatGPT or Grok and ask for an equivalent Windows batch script, it can produce one.)

The Python code and shell scripts are open source, so if you have the coding skills you can modify this app to take full advantage of a powerful AMD / Nvidia GPU or high-end Mac graphics cores. Feel free to do so and share your good work!

Features

--------

• This application uses a compact GGUF version of JoyCaption, with two model options:

- Q4_K_M (~4.6 GB) – fast, good quality (default)

- Q8_0 (~8 GB) – highest quality, slower

• Required Vision Projector (mmproj ~0.82–0.88 GB)

• Caption styles:

- Flux Natural (Detailed) ← most popular

- Flux Natural (Brief)

- SDXL / SD Tags (comma-separated prompt style)

• Content filtering modes:

- PG Mode (no sexual content)

- Vulgar/Blunt/NSFW Mode

• Optional trigger word support (e.g. ohwx, masterpiece – before or after caption)

• Image scaling options (optional):

- Do not scale (original folder + .txt)

- Scale to 512px or 1024px (short side) → saved in output/XXX_scaled/

• Single image or batch folder processing

• Clean dark terminal-style log window

• Progress bar + stop button

• macOS & Linux launch scripts (double-click friendly)
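The "short side" scaling option above comes down to a little arithmetic. Here is a minimal sketch of that calculation, not the app's actual code: the function name scaled_size is made up for illustration, and it assumes the aspect ratio is kept and that smaller images are scaled up as well (the app may handle that case differently).

```python
def scaled_size(width: int, height: int, short_side: int) -> tuple[int, int]:
    """Return (new_width, new_height) so the shorter edge equals short_side.

    Hypothetical helper: keeps the aspect ratio and rounds to whole pixels.
    """
    if width <= 0 or height <= 0 or short_side <= 0:
        raise ValueError("dimensions must be positive")
    factor = short_side / min(width, height)
    # Round to the nearest pixel and never go below 1 pixel.
    return max(1, round(width * factor)), max(1, round(height * factor))


print(scaled_size(1920, 1080, 512))   # (910, 512)
print(scaled_size(768, 1024, 1024))   # (1024, 1365)
```

Note that only the short side ends up at 512 or 1024; the long side depends on the original aspect ratio, so the results are square only for square inputs.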

Folder Structure

----------------

JoyCaption_Portable_v1.0/
├── run_linux.sh              ← double-click to launch (Linux)
├── run_mac.sh                ← double-click to launch (macOS)
├── JoyCaption_Portable.py    ← main application
├── requirements.txt
├── .venv/                    ← auto-created virtual environment
├── models/                   ← place GGUF files here
│   ├── Llama-Joycaption-Beta-One-Hf-Llava-Q4_K.gguf
│   ├── Llama-Joycaption-Beta-One-Hf-Llava-Q8_0.gguf
│   └── llama-joycaption-beta-one-llava-mmproj-model-f16.gguf (required!)
└── output/                   ← created automatically when scaling
    ├── 512_scaled/
    └── 1024_scaled/

Quick Start

-----------

1. Download models (only once, either from within the app or manually if you prefer)

From: https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main

Recommended files:

File                                                    Size       Recommended
-------------------------------------------------------------------------------
Llama-Joycaption-Beta-One-Hf-Llava-Q4_K.gguf            ~4.6 GB    Yes (fast)
Llama-Joycaption-Beta-One-Hf-Llava-Q8_0.gguf            ~8 GB      Best quality
llama-joycaption-beta-one-llava-mmproj-model-f16.gguf   ~0.88 GB   REQUIRED

Place all files into the models/ folder.

Note: you can also download the models from within the tool itself on first run.

2. Install dependencies (first run only)

Linux / macOS:

bash run_linux.sh # or run_mac.sh

The script will:

- create .venv if missing

- install requirements

- launch the GUI

3. Run the app

• Double-click run_mac.sh (macOS) or run_linux.sh (Linux)

• Or manually: source .venv/bin/activate && python JoyCaption_Portable.py
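For Windows users, or anyone who prefers not to use the shell scripts, the three steps the run scripts perform (create .venv if missing, install requirements, launch the GUI) can be sketched in Python. This is a rough equivalent under assumptions, not a copy of run_linux.sh or run_mac.sh: the function names venv_python, launch_commands and main are made up here, and the real scripts may do more (for example, checking the Python version).

```python
import subprocess
import sys
from pathlib import Path


def venv_python(root: Path) -> Path:
    """Path of the interpreter inside root/.venv for the current OS."""
    if sys.platform == "win32":
        return root / ".venv" / "Scripts" / "python.exe"
    return root / ".venv" / "bin" / "python"


def launch_commands(root: Path) -> list[list[str]]:
    """Commands to run in order: create .venv if missing, install, launch."""
    py = str(venv_python(root))
    commands = []
    if not (root / ".venv").exists():
        # Create the virtual environment with the interpreter running this script.
        commands.append([sys.executable, "-m", "venv", str(root / ".venv")])
    commands.append([py, "-m", "pip", "install", "-r", str(root / "requirements.txt")])
    commands.append([py, str(root / "JoyCaption_Portable.py")])
    return commands


def main() -> None:
    # Assumes this file is saved next to JoyCaption_Portable.py.
    root = Path(__file__).resolve().parent
    for command in launch_commands(root):
        subprocess.run(command, check=True)
```

To try it, save the code in the JoyCaption_Portable_v1.0/ folder under a name of your choice, add a call to main() at the bottom, and run it with your system Python.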

4. Usage

- Select model (Q4 is fastest)

- Choose style (Flux Natural Detailed usually best)

- Optional: PG/Vulgar mode, trigger word

- Click Single Image or Batch Folder

- Captions are saved as .txt files next to the images (or in output/ if scaling)
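To make the output convention concrete, here is a small sketch of how a caption file sits next to its image and how a trigger word can be placed before or after the caption. It only illustrates the behaviour described above and is not the app's own code: the helper names, the comma separator and the list of image extensions are all assumptions.

```python
from pathlib import Path

# Assumed set of image types; the app may accept a different list.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def caption_path(image: Path) -> Path:
    """photo.jpg -> photo.txt in the same folder."""
    return image.with_suffix(".txt")


def add_trigger(caption: str, trigger: str = "", position: str = "before") -> str:
    """Put the trigger word before or after the caption, separated by a comma."""
    caption = caption.strip()
    trigger = trigger.strip()
    if not trigger:
        return caption
    if position == "before":
        return f"{trigger}, {caption}"
    if position == "after":
        return f"{caption}, {trigger}"
    raise ValueError("position must be 'before' or 'after'")


def uncaptioned_images(folder: Path) -> list[Path]:
    """Images in folder that do not yet have a .txt caption next to them."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_EXTENSIONS
        and not caption_path(p).exists()
    )


print(add_trigger("A red fox standing in snow.", "ohwx"))  # ohwx, A red fox standing in snow.
print(caption_path(Path("dataset/fox.png")).name)          # fox.txt
```

A helper like uncaptioned_images is handy after a stopped batch run, since it lists only the images that still need a caption.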

Model Download Links (direct links)

-------------------------------------

Q4_K_M: https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/resolve/main/Llama-Joycaption-Beta-One-Hf-Llava-Q4_K.gguf

Q8_0: https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/resolve/main/Llama-Joycaption-Beta-One-Hf-Llava-Q8_0.gguf

mmproj: https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/resolve/main/llama-joycaption-beta-one-llava-mmproj-model-f16.gguf
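If you prefer to fetch the files with a script instead of a browser, the direct links above follow one pattern (repository URL + /resolve/main/ + file name). The sketch below builds those URLs, checks which files are still missing from models/, and downloads them with Python's standard library. The function names are made up for illustration, there is no resume or progress display, and the app's built-in downloader may work differently.

```python
import urllib.request
from pathlib import Path

REPO = "https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf"
MMPROJ = "llama-joycaption-beta-one-llava-mmproj-model-f16.gguf"
MODELS = {
    "Q4_K_M": "Llama-Joycaption-Beta-One-Hf-Llava-Q4_K.gguf",
    "Q8_0": "Llama-Joycaption-Beta-One-Hf-Llava-Q8_0.gguf",
}


def direct_url(filename: str) -> str:
    """Direct download link for a file in the repository."""
    return f"{REPO}/resolve/main/{filename}"


def missing_files(models_dir: Path, quant: str = "Q4_K_M") -> list[str]:
    """Files still needed: the chosen model plus the required mmproj."""
    wanted = [MODELS[quant], MMPROJ]
    return [name for name in wanted if not (models_dir / name).is_file()]


def download_missing(models_dir: Path, quant: str = "Q4_K_M") -> None:
    """Download whatever missing_files() reports into models_dir."""
    models_dir.mkdir(parents=True, exist_ok=True)
    for name in missing_files(models_dir, quant):
        target = models_dir / name
        # Download to a temporary name so an interrupted transfer
        # is never mistaken for a complete model file.
        partial = target.with_name(target.name + ".part")
        print(f"Downloading {name} ...")
        urllib.request.urlretrieve(direct_url(name), str(partial))
        partial.replace(target)
```

Calling download_missing(Path("models")) from the JoyCaption_Portable_v1.0/ folder fetches the Q4_K_M model and the mmproj file; pass quant="Q8_0" for the larger model. These are multi-gigabyte downloads, so expect them to take a while.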

Tips

----

• Q4_K_M is 2–3× faster with only a minor quality drop, so use it for most work

• Flux Natural (Detailed) gives the most natural-looking training captions

• Avoid watermarks/logos in images (prompt already forbids mentioning them)

• Trigger words like ohwx, masterpiece, best quality help with conditioning

• Scaled folders make it easy to build 512px or 1024px (short side) datasets, but you can also skip scaling and keep the captions in the existing folder

Credits

-------

• JoyCaption Beta One model by fancyfeast / community

• GGUF conversions by concedo (KoboldCpp developer)

Enjoy captioning!