QWEN3-4B-VL Caption GUI
This model has been fine-tuned on image-text pairs at 1024px and shows strong affinity for NSFW captioning tasks, with limited hallucination.
Full FP32 training, with blockwise conversion to FP8 (key blocks kept in FP32).
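As a rough illustration of what keeping key blocks in FP32 during a blockwise FP8 conversion could look like, here is a minimal sketch that assigns a storage precision to each parameter by name. The name patterns, the parameter names, and the `plan_precision` helper are all hypothetical; which blocks this model actually keeps in FP32 is not stated here.

```python
# Hypothetical sketch: choose a storage precision per parameter, keeping
# "key" blocks in FP32 and marking everything else for FP8 conversion.
# The patterns below are illustrative assumptions, not this model's real list.
from fnmatch import fnmatch

KEEP_FP32_PATTERNS = [
    "*norm*",    # normalization layers (assumed precision-sensitive)
    "*embed*",   # token / patch embeddings (assumed)
    "lm_head*",  # output projection (assumed)
]


def plan_precision(param_names, keep_patterns=KEEP_FP32_PATTERNS):
    """Return a dict mapping each parameter name to 'fp32' or 'fp8'."""
    plan = {}
    for name in param_names:
        keep = any(fnmatch(name, pattern) for pattern in keep_patterns)
        plan[name] = "fp32" if keep else "fp8"
    return plan


if __name__ == "__main__":
    # Example parameter names in a typical transformer layout (assumed).
    names = [
        "model.embed_tokens.weight",
        "model.layers.0.input_layernorm.weight",
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.0.mlp.down_proj.weight",
        "lm_head.weight",
    ]
    for name, precision in plan_precision(names).items():
        print(f"{precision:5s} {name}")
```

The sketch only produces a plan; the actual cast to FP8 would depend on the framework and FP8 format used, which are not specified here.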
This model has limited video captioning ability.
This model has been heavily tested and excels at most NSFW and SFW captioning tasks.


