
A Powerful 6GB Caption Model (QWEN3-4B)



QWEN3-4B-VL Caption GUI

  • This model has been fine-tuned on image-text pairs at 1024px and shows high affinity for NSFW tasks, with limited hallucination.

  • Trained in full FP32, then converted blockwise to FP8 (key blocks kept in FP32).

  • This model has limited video captioning ability.

This model has been heavily tested and excels at most NSFW and SFW captioning tasks.
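
To illustrate the blockwise FP8 conversion with key blocks left in FP32 mentioned in the feature list, here is a minimal sketch of how such a precision plan and its storage footprint could be worked out. It is not the author's conversion script: the block names, parameter counts, and the choice of which blocks count as "key" are all hypothetical, since the page does not specify them.

```python
from fnmatch import fnmatchcase

# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp32": 4, "fp8": 1}


def plan_precision(param_counts, keep_fp32_patterns):
    """Assign 'fp32' to blocks matching a keep pattern, 'fp8' to the rest."""
    plan = {}
    for name in param_counts:
        keep = any(fnmatchcase(name, pat) for pat in keep_fp32_patterns)
        plan[name] = "fp32" if keep else "fp8"
    return plan


def estimate_bytes(param_counts, plan):
    """Sum the storage size of all blocks under the given precision plan."""
    return sum(
        count * BYTES_PER_PARAM[plan[name]]
        for name, count in param_counts.items()
    )


# Hypothetical block names and parameter counts, for illustration only.
example_counts = {
    "embed_tokens": 1000,
    "layers.0.attn": 4000,
    "layers.0.mlp": 8000,
    "norm": 10,
}

# Hypothetical choice of "key blocks"; the page does not say which are kept.
example_plan = plan_precision(example_counts, ["embed_tokens", "norm"])
example_size = estimate_bytes(example_counts, example_plan)
```

Blocks kept in FP32 cost four bytes per parameter instead of one, so the final file is somewhat larger than a pure FP8 conversion of the same parameters would be.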
