QWEN3-8B-VL Image/Video Caption (Uncensored)

Updated: Mar 17, 2026

tool, llm, caption, uncensored, qwen

Type

Other

Published

Mar 12, 2026

Base Model

Other

Training

Steps: 2,000
Epochs: 20

Hash

AutoV2
06A5AC3FBF
Felldude

QWEN3-8B Image/Video Caption (Uncensored)

Version 2 - This version is highly attuned to NSFW content. However, due to image-only training, it may caption some videos as if they were still images.

  • This version requires 24GB or more of VRAM

  • Full finetune (NOT A LORA MERGE) of the 8B parameter model, with the vision encoder frozen

  • BF16/TF32 training. Unfortunately, due to the size of the model, Adam8bit had to be used as the optimizer.

  • Version 2 can use nearly any LLM prompt. Version 1 should use the prompt given, in whole or in part.

  • Details regarding the training of Version 1 can be read here.
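
As a rough illustration of why an 8-bit optimizer helps at this model size, the sketch below estimates the memory taken by Adam's two per-parameter state tensors. It is back-of-the-envelope arithmetic, not a record of the actual training run: the nominal 8 billion parameter count is taken from the model name, the helper name `optimizer_state_gib` is hypothetical, and the estimate ignores gradients, activations, the frozen vision encoder (which carries no optimizer state), and the small overhead of 8-bit quantization statistics.

```python
# Back-of-the-envelope estimate of Adam optimizer state memory.
# Assumption: Adam keeps two state values (first and second moment) per
# trainable parameter; 32-bit states use 4 bytes each, 8-bit states use 1.

GIB = 2 ** 30  # bytes per GiB


def optimizer_state_gib(num_params: int, bytes_per_state: int,
                        states_per_param: int = 2) -> float:
    """Return the approximate optimizer state size in GiB."""
    return num_params * bytes_per_state * states_per_param / GIB


# Nominal parameter count (assumption: "8B" means roughly 8e9 parameters).
NUM_PARAMS = 8_000_000_000

fp32_states = optimizer_state_gib(NUM_PARAMS, bytes_per_state=4)
int8_states = optimizer_state_gib(NUM_PARAMS, bytes_per_state=1)
bf16_weights = NUM_PARAMS * 2 / GIB  # BF16 weights use 2 bytes each

if __name__ == "__main__":
    print(f"BF16 weights:        {bf16_weights:6.1f} GiB")
    print(f"Adam states, 32-bit: {fp32_states:6.1f} GiB")
    print(f"Adam states, 8-bit:  {int8_states:6.1f} GiB")
```

Under these assumptions the 8-bit states take a quarter of the memory of 32-bit states, which is the kind of saving that makes a full finetune of a model this size more practical.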


Note: No image size safety check is built in. I have captioned 4K images, which are processed into a very large tensor shape; however, reducing images to 1K is recommended.
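
One possible way to downscale images before captioning is sketched below. It assumes "1K" means a longest side of 1024 pixels, which the note above does not state exactly, and it assumes Pillow is installed for the file-handling part. The function names `target_size` and `downscale_image` are hypothetical helpers, not part of the model or its tooling.

```python
# Sketch: shrink an image so its longest side is at most max_side pixels,
# preserving aspect ratio. Images that are already small are left unchanged.
# Assumption: "1K" is read as 1024 px on the longest side.


def target_size(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Compute the downscaled (width, height); never upscales."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Clamp to at least 1 px so extreme aspect ratios stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))


def downscale_image(src_path: str, dst_path: str, max_side: int = 1024) -> None:
    """Load an image with Pillow, downscale it if needed, and save it."""
    from PIL import Image  # imported lazily; requires the Pillow package

    with Image.open(src_path) as img:
        new_size = target_size(img.width, img.height, max_side)
        if new_size != (img.width, img.height):
            img = img.resize(new_size, Image.LANCZOS)
        img.save(dst_path)
```

For example, `target_size(3840, 2160)` gives `(1024, 576)`, so a 4K frame is reduced to roughly 7% of its original pixel count before it reaches the model.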

I have an Ampere series card and cannot convert this to FP8 or NF4 in high quality. If you have experience converting models with Linux and Transformer Engine, DM me.