This tutorial provides the workflow and helpers for captioning images with gemma3:4b as an alternative to Gemini.
It's very fast and accurate and can do NSFW with a little bit of help.
This tutorial requires Ollama to be installed and running: https://ollama.com/download/linux After that, run:
ollama pull gemma3:4b
The images in the dataset folder need to be named 1.png, 2.png, 3.png etc. I provide an sh script that does this, but if that doesn't work for you, you need to adjust it so your files end up named this way.
Download rename.txt, rename it to rename.sh and run it in your local dataset dir. It will rename all files to 1.png, 2.png etc.
Import the ollama-caption.json and install any missing nodes
Input the path to your dataset dir
Go to comfy/user/default/was-node-suite and add the path to whitelist-dirs.list
Add your trigger word to TextBox1 to have it inserted at the start of the caption
Restart ComfyUI and run the workflow -> you will get the 1.txt, 2.txt etc. files in your dataset dir
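If rename.sh doesn't work on your system, the renaming step can be roughly sketched in Python. This is a minimal sketch and not the provided script: the function name, the list of image extensions and the temporary prefix are all assumptions made for illustration. It keeps each file's extension and does not convert formats, so a .jpg stays a .jpg.

```python
from pathlib import Path

# Assumption: which extensions count as dataset images. Adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}
# Hypothetical temporary prefix used while renaming.
TMP_PREFIX = "__tmp_rename_"


def rename_sequentially(dataset_dir):
    """Rename images in dataset_dir to 1.<ext>, 2.<ext>, ... in sorted name order."""
    folder = Path(dataset_dir)
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if any(p.name.startswith(TMP_PREFIX) for p in images):
        raise ValueError("a file already uses the temporary prefix, aborting")

    # Pass 1: move everything to temporary names so that a file already
    # called e.g. 2.png is never overwritten by another file.
    temp_paths = []
    for index, path in enumerate(images, start=1):
        temp = path.with_name(f"{TMP_PREFIX}{index}{path.suffix.lower()}")
        path.rename(temp)
        temp_paths.append(temp)

    # Pass 2: give each file its final numeric name.
    final_paths = []
    for index, temp in enumerate(temp_paths, start=1):
        target = temp.with_name(f"{index}{temp.suffix}")
        temp.rename(target)
        final_paths.append(target)
    return final_paths


# Example (run on a copy of your dataset first):
# rename_sequentially("/path/to/dataset")
```

Non-image files in the folder are left alone. Try it on a copy of the dataset first, since the renaming cannot be undone.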
Pro tips:
You can set image_load_cap in Load image list from dir (inspire) to 1 and check the output in case you need to tweak the captioner system prompt (lower field).
I recommend removing watermarks so you don't get them when you generate with the LoRA. I provide a kontext-remove-watermark workflow that does this if you need it.
If you can run gemma3:12b instead of gemma3:4b, it's more accurate but requires more RAM.
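To try out a system prompt or compare gemma3:4b with gemma3:12b on a single image outside ComfyUI, you can also talk to the local Ollama server directly. The sketch below is not part of the workflow: it assumes Ollama is listening on its default address (http://localhost:11434), the system prompt is only a placeholder, and the "trigger word, caption" format is a guess at how the workflow joins the two.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Assumption: Ollama runs locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Placeholder only; paste the system prompt you are testing here.
DEFAULT_SYSTEM_PROMPT = "Describe the image in one detailed paragraph."


def build_request(image_path, model="gemma3:4b",
                  system_prompt=DEFAULT_SYSTEM_PROMPT,
                  prompt="Caption this image."):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {
        "model": model,
        "system": system_prompt,
        "prompt": prompt,
        "images": [image_b64],  # images are sent base64-encoded
        "stream": False,        # ask for one JSON reply instead of a stream
    }


def format_caption(caption, trigger_word=""):
    """Put the trigger word at the start of the caption (separator is an assumption)."""
    return f"{trigger_word}, {caption}" if trigger_word else caption


def caption_image(image_path, trigger_word="", **kwargs):
    """Send one image to Ollama and return the formatted caption."""
    body = json.dumps(build_request(image_path, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        caption = json.loads(response.read())["response"].strip()
    return format_caption(caption, trigger_word)


# Example (needs a running Ollama server with the model pulled):
# print(caption_image("1.png", trigger_word="mytrigger", model="gemma3:12b"))
```

Switching the model argument between gemma3:4b and gemma3:12b lets you compare both on the same image before running the whole dataset.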
Enjoy.
