Qwen-Image on ComfyUI

The pictures say it all.

If Easy Diffusion is a digital pocket camera, and SD Forge is a DSLR, then ComfyUI is an electronics construction kit to build your own camera with.

The second picture is my impression of the name.

Notes

Qwen-Image, with homebrew ComfyUI workflows based on various examples from the internet.

I actually used city96's Q4_K_M quant, but that bit depth has not been uploaded to CivitAI, so I picked the closest one. At Q4_K_M and above, there isn't much difference in the outputs. I got the tensors from HF.

More Qwen images (pun intended) to come. I have a bunch of renders, but I need to find the time to polish them into a publishable state.

ComfyUI: changing my render backend

The model was released in early August 2025. From what I could tell, it seemed promising, and since Forge hasn't added support for it (at least not yet), I decided to embark on learning the flight simulator in the room - ComfyUI.

Turns out I needed three node packages:

These plus ComfyUI's built-in nodes can run a quantized Qwen-Image comfortably.
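For orientation, here's a rough sketch of where I understand the model files go, assuming the usual ComfyUI-GGUF conventions (placeholder folders only - check each package's README for the exact paths it scans):

  ComfyUI/models/unet/           # the quantized GGUF diffusion model, loaded with the GGUF unet loader node
  ComfyUI/models/text_encoders/  # the Qwen2.5 VL text encoder (models/clip on older ComfyUI versions)
  ComfyUI/models/vae/            # the Qwen-Image VAE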

The live preview is extra important for me here, because this model is so massive (20B parameters) that it's slow. On any model that takes more than a few seconds per image, it helps a lot if you can see what the model is drawing, and cancel obviously failed renders while they're still in progress.

More on my workflow (in the traditional sense of the word) to come, once I find the time to write it down. In brief, so far I've set up txt2img, img2img, and inpaint workflows (in the ComfyUI sense of the word), plus a few more based on the separate Qwen-Image-Edit. The Edit variant is like instruct-pix2pix was in the SD 1.5 days, but smarter.

I also replicated the first three ComfyUI workflows for Illustrious, so that I won't need to switch between two apps in case I need to fix a detail that Illustrious does better. I still need to do the same for Flux.

The Qwen-Image model itself

I concur with much of the internet in saying that Qwen-Image is impressive.

Qwen-Image-Edit deserves its own post. The multi-input mode (image stitching) is particularly interesting there, as well as its potential as an inpainter. But I'll concentrate just on Qwen-Image for now.

Prompt adherence goes to eleven, as long as the concepts aren't too niche or too NSFW. The model is Apache-licensed, so community finetunes can probably eventually help extend its knowledge - but since it's been just a few weeks, the base model is all we have at the moment.

It seems the upgrade to a modern VLM (vision-language model) as the text encoder did wonders. Given how fast LLMs and VLMs have developed during the last few years, it helped a lot to simply upgrade away from an ancient architecture (T5). To be fair though, many modern LLMs are decoder-only models, which to my understanding makes them not directly applicable as text encoders for diffusion models. It's nice that the Qwen team had a modern option here (Qwen2.5 VL 7B) that can double in this role.

The text rendering capabilities seem nice, with great prompt adherence here too, but I haven't had the opportunity to play around with this feature much yet.

Image quality straight out of txt2img is next level. Qwen-Image tends to render geometrically sensible backgrounds that, if you're rendering humans, continue seamlessly behind your character. I think I've seen the infamous "PonyXL lighting bug" just once with this model, out of hundreds of renders. Mostly, if there's a light/shadow terminator on the wall, and it suddenly jumps, there is a reason in the image for that.

AI greeble is still present, but there is much less of it. This is important, because it directly cuts down on the human work needed to take a promising image to an acceptably polished state.

Qwen-Image's idea of anime style looks acceptable. Definitely stronger than base Flux, but not as exquisite as Illustrious.

A negative prompt is available. The model card suggests a default CFG scale of 4.0. This is nice, because one of the issues with Flux (especially Schnell) was that it was impossible to get rid of the forced blush on anime characters' faces. Here, you can negative-prompt it out.

That said, some negatives work better than others. For an SFW model, when drawing in anime style, Qwen-Image seems to like large-breasted women and revealing (PG-13) outfits a bit too much. Training data, but yeah. It's hard or impossible to tone these down by prompting - these are some of the particular things the model is very stubborn with. (Well, it's a Qwen - what did I expect? On the LLM side, Qwen 3 2507 tends to push back, too. There this is actually a useful feature, but that's an essay for another time.)

One small failing is that the model does not reliably mix anime and western cartoon stylistic influences in the same character, or at least I haven't yet discovered a prompt that reliably does so. This is something I'd like for my OCs.

This model is rather insensitive to the RNG seed. This can be an issue if you're making a bunch of exploratory renders, as the images in a batch can turn out very similar. I've heard that varying the details of the prompt helps, as does noise injection, but I haven't explored the latter yet. On the other hand, seed insensitivity can be a feature, too, if you explicitly prompt for what you want. This model is a precision tool that (mostly) only does what you specify.

As for a verdict, while Illustrious still wins on character-drawing skill, the strong understanding of background geometry and the excellent prompt adherence of Qwen-Image are something new to play with, at least in open-weight models. In the long run, with the right dataset and community support, this could become the next Illustrious. In the meantime, I think I'll continue exploring the current capabilities for a while.

My experiences installing ComfyUI on Linux

This was initially a bit of a headscratcher, but the official install instructions helped. Reading the README on GitHub, I was a bit worried that ComfyUI would require CUDA 12.8, but it turns out it runs just fine on other 12.x versions, at least on 12.5.

In short, to install:

  • conda create -n "comfyui" python=3.12

    • Use Miniconda, as recommended in the official instructions. This is probably the easiest way to install a venv for a Python version different from your system Python.

  • conda activate comfyui

    • IMPORTANT: This activates your comfyui venv (for the rest of the terminal session), so that any Python packages will be installed there, without polluting or breaking your system Python.

  • git clone git@github.com:comfyanonymous/ComfyUI.git

  • nvidia-smi

    • This is to check your CUDA version.

  • pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu125

    • Install PyTorch built for your CUDA version - just change the "cu125" part to match what nvidia-smi reported.

  • cd ComfyUI

  • pip install -r requirements.txt

And it's installed.

To run, activate the venv, and then python main.py. This one boots up much faster than SD Forge.
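For reference, here's roughly how a start-up looks, with the live preview mentioned earlier switched on; the flag values may differ by version, so check python main.py --help:

  conda activate comfyui
  cd ComfyUI
  python main.py --preview-method auto
  # the web UI is then at http://127.0.0.1:8188 by default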

If you want to make a one-command start script, see here how to run conda activate from a bash script.
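As a sketch, assuming Miniconda lives in its default location (~/miniconda3) and the env name used above - adjust the paths to your install (conda info --base prints the right prefix):

  #!/usr/bin/env bash
  # start-comfyui.sh - hypothetical one-command launcher
  # Source conda's shell functions so that conda activate works in a non-interactive script.
  source "$HOME/miniconda3/etc/profile.d/conda.sh"
  conda activate comfyui
  cd "$HOME/ComfyUI"                           # wherever you cloned the repo
  exec python main.py --preview-method auto "$@"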

To update to the latest version:

  • cd ComfyUI

  • conda activate comfyui

  • git pull

  • pip install -r requirements.txt
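These steps can be wrapped into a script as well. A sketch, again assuming the default Miniconda prefix; the loop at the end is extra (not covered above) - each custom node package is its own git clone and may ship its own requirements.txt:

  #!/usr/bin/env bash
  # update-comfyui.sh - hypothetical updater
  source "$HOME/miniconda3/etc/profile.d/conda.sh"
  conda activate comfyui
  cd "$HOME/ComfyUI"
  git pull
  pip install -r requirements.txt
  # Also update any custom node packages that are git clones:
  for d in custom_nodes/*/; do
      if [ -d "$d/.git" ]; then
          git -C "$d" pull
          [ -f "$d/requirements.txt" ] && pip install -r "$d/requirements.txt"
      fi
  done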
