TL;DR
As of 2026/01/22, this article is a work in progress and will be updated over the next few days. It is published early mainly so that I can have a working link to it in the image posts where the actual workflow files are.
This article collects links to my set of ComfyUI workflows that I use for generating and working with AI art. In this article, I also share some of my observations and usage tips for each model family.
The workflows are embedded in images published in a series of image posts on my channel, and linked from here. There is one post per model family.
The images are kept PG so that the content is accessible to everyone.
How to use: drop the relevant image into your ComfyUI to load the workflow. Then, in ComfyUI, click the hamburger menu ("≡") at the top left corner of the user interface (or the one in the header of the active Comfy tab), and select Save As to save the workflow into your workflows.
NOTE 2026/01/26: The layouts of these workflows were designed for ComfyUI's legacy node system, and have unintended visual overlaps when the Nodes 2.0 option in the GUI is enabled. I might release another version of the workflows for Nodes 2.0 later. When I tried Nodes 2.0, the GUI looked nicer than before, but previews consistently stopped working after a few renders, so I'll give the ComfyUI authors a bit of time before trying it again.
Introduction
As I'm writing this, it's January 2026, and it's been a hectic half a year since summer 2025. New image-generator AI models have appeared at a breakneck pace, and my time available for hobbies has not been able to keep up. I've been diving into new model families and getting a feel for what works and what doesn't, but not publishing many images. During this time, I have also switched to ComfyUI as my main visual GenAI tool.
ComfyUI is... in short, the visual GenAI community's favorite flight simulator. While those of us into actual aviation simulation may beg to differ, a typical Comfy workflow has so many knobs and buttons that I keep looking for the flap lever and the pitch trim, idly wondering what the ICAO code for today's destination in latent space is.
Seriously though, as ComfyUI is a construction kit rather than a simple tool where you just enter a prompt and press Generate, there has been a lot of workflow building happening behind the scenes. Since I like to run all my image-generator AIs inside one environment, I've been migrating my working habits from SD Forge to ComfyUI. This has meant building txt2img, img2img and inpaint workflows for each model family. While ComfyUI does provide some default workflows, there's always something I want to do differently - e.g. using GGUF quants, or having a zoomed-in inpainter like the "Masked only" mode in SD Forge.
So I'm introducing Mathemagic's ComfyUI Workflows, as my attempt to make things as simple as reasonably possible, but no simpler.
This workflow package has certain focus areas:
Txt2img, img2img and inpaint workflows for each model (where it makes sense), like the three main modes of SD Forge.
Optimized for interactive work (as opposed to batches).
KSampler with live preview, so if the composition turns out botched, you can see it immediately, and cancel without waiting for the render to finish.
Obsessive focus on fast renders with decent quality, rather than 5% more quality at the cost of 4× more render time. Generally, as few steps as I can get away with (DPM++ 2M with SGM Uniform helps a lot here), and for Qwen models, loading the Lightning accelerator LoRA.
Autosaving for each finished render, with the filename containing an ISO timestamp (yyyymmddThhmmss, in local time), RNG seed, model name, sampler name, steps, and CFG value (see the sketch after these lists).
Simple, clear, visually readable node graph layouts.
Nodes positioned to make it easy to see what connects where.
Links visible.
No unnecessary nodes.
A small amount of extra complexity is allowed to support essential features:
Zoomed-in inpainting (crop-and-stitch), similar to SD Forge.
LoRA. This is mainly to document at which point to insert LoRAs and how.
GGUF loader for anything larger than Illustrious-XL, because VRAM is always a limitation, and GGUF quants (especially the new Unsloth dynamic ones) work well in practice.
In img2img and inpaint workflows, automatic adjustment of the number of steps based on the chosen denoise level, like SD Forge does (also shown in the sketch below).
Extensive comments as Markdown note nodes, to make these workflows usable also as tutorials.
A part of this article is actually in those nodes inside the individual workflow files.
As few 3rd party node packages as reasonably possible.
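To make two of the bullets above concrete, here's a minimal Python sketch. The function names, the filename field order, and the exact rounding are my illustrative assumptions, not lifted from the workflow files:

```python
from datetime import datetime

def autosave_name(seed: int, model: str, sampler: str, steps: int, cfg: float) -> str:
    """Illustrative autosave filename: ISO local timestamp plus render metadata."""
    ts = datetime.now().strftime("%Y%m%dT%H%M%S")  # yyyymmddThhmmss, local time
    return f"{ts}_{seed}_{model}_{sampler}_{steps}steps_cfg{cfg}.png"

def scaled_steps(base_steps: int, denoise: float) -> int:
    """SD-Forge-style step scaling: a lower denoise traverses only part of the
    noise schedule, so proportionally fewer steps are needed."""
    return max(1, round(base_steps * denoise))

print(autosave_name(123456789, "qwen-image-2512", "dpmpp_2m", 8, 1.0))
print(scaled_steps(20, 0.55))  # -> 11
```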
I'm publishing the workflows embedded in image files. For models with text rendering capability, the text on the sign held by the character says which workflow is embedded in the image. Models sometimes get the text right, and sometimes produce something funny that nevertheless (mostly?) gets the message across.
In the case of Illustrious-XL, which is too small and old to generate text, one just has to guess which is which - or, more easily, look at the filenames when downloading the images.
I'm including workflows for Illustrious-XL in my set, because even though its SDXL architecture is already showing its age, Illustrious-XL is still unmatched in its sublime digital painting anime style, while also supporting the largest variety of niche topics seen in any anime model so far. In some aspects of character specification (such as hairstyles, and breast size for females), as of 01/2026, Illustrious-XL still beats newer models.
It could be said that in the past year or so, the state of the art, and how to apply it, has changed. Unlike in the SD 1.5, SDXL, PonyXL and Illustrious-XL days, when the latest model was best at everything, now different models are best at different things. It is increasingly useful to keep a copy of your old (but still serviceable) models around, in case they do some things better than recent releases.
What this of course means for the visual GenAI artist is that now there is more need than ever to jump between different checkpoints - and even model families, with wildly different prompting styles - while creating a single still-image artwork. The world would be a simpler place if there was one generalist model to rule them all, but it would have to cover every obscure anime trope (including many for R18 audiences) before there is the slightest chance of superseding the capabilities of the collection of models we already have.
So, raising a toast to variety, and without further ado, let's get into the...
Workflows
The images with the embedded workflows are in the following series of image posts, because articles do not support inline PNG images.
There is one image post per model family:
Qwen, which includes both Qwen-image and Qwen-Image-Edit
TBA - Tools: pose detector
TBA - Tools: SD prompt reader
The individual workflows are linked below, in each section.
When using these, to set values such as denoise exactly, click the field, enter a number on the keyboard, and press Enter. If you set e.g. 0.55 denoise, ComfyUI will actually use that value, even if the GUI field shows 0.6 (the actual value rounded to one decimal place), and even though clicking the right/left arrow buttons changes the value by a larger step.
As the license for all of the workflows, I choose WTFPL - feel free to do whatever you want with these workflows, including publishing modified versions.
Qwen-Image 2512
As of 01/2026, the Qwen family of models is the go-to choice for complex prompt adherence, if you have the hardware to run a 20B model without losing your sanity. The model is developed by Alibaba, the same company behind the Qwen series of LLMs, which have done well as locally hostable thinking models. The model's official GitHub page is here.
The outputs suffer way less from AI greeble than earlier models, but since the principles behind the technology have not changed, the model still does not have an actual understanding of global geometry. If you generate character art, you will still get the occasional fence or shelf whose ends do not meet behind the character. And you will still occasionally get the wrong number of fingers, if you render anything except basic portraits.
(Note that there is also Qwen-Image-Layered, which is a different model that tackles the global geometry issue like a human digital artist would - by drawing the image as a stack of separate layers. I have not tested it myself, but ComfyUI has a default workflow available.)
Qwen's text rendering is among the best, at least among open models. English and Chinese are supported (no Japanese, and no European languages with umlauts). The model will still misspell occasionally, and some words (such as "img2img" or "inpaint") it fails to spell correctly at all. Inpainting can be useful for fixing words that the model can spell, but doesn't get right 100% of the time.
Workflows are here. Direct links:
These workflows are also compatible with the original Qwen-Image (summer 2025, no version number).
Qwen-Image-Edit 2511
Qwen's sister with leet skills in photoshopping. Main open competitor to Google's proprietary Nano Banana.
Edit is especially useful when you already have a character illustration (e.g. rendered by another model), and want to render that character in different clothing, in a different pose, in a different environment, or doing a different activity. It can also generate front / side / back views, which is convenient for compiling character sheets. Since this cranks character consistency up to eleven, it should also be possible (though I haven't tried) to draw comic strips with GenAI, at least one panel at a time.
Furthermore, multiple image inputs (supported since version 2509) allow combining several given characters in the same image, and changing a character's pose based on a DWPose pose image (those colorful stick figures traditionally used with ControlNet).
Multiple inputs also allow transferring given clothing to a given character. The input can be an image of another character wearing that clothing, or a bare clothing image. You can also use Edit to generate a new character wearing given specific clothing. Open GenAI dress-up has arrived!
Workflows are here. Direct links:
Basic edit, with up to 3 input images
Inpaint edit, with a main image to edit and an optional extra reference image for the edit (can be trivially extended to 2 references, I just haven't needed that)
These workflows are also compatible with the older 2509. I'm mentioning this because some people's mileage with 2511 varies.
EDIT 2026/01/26: My color-burning issues with Qwen-Image-Edit-2511 (see the image post with the workflows) were a ComfyUI version issue. Commit 56fa7dbe380cb5591c5542f8aa51ce2fc26beedf from 7 December 2025 had the issue, but commit 7ee77ff038937bdfdbea5d603ad8d4c487c14fd6 from 25 January 2026 works fine.
Z-Image Turbo
As of 01/2026, Z-Image Turbo is another recent model that shows promise. This is 6B, vs. Qwen's 20B, so the model runs faster and uses less VRAM. The model itself is step-distilled (like Flux.1 Schnell), so it doesn't need an accelerator LoRA. It supports 8-step rendering out of the box.
This is another model release by Alibaba, but apparently by a different team. The model's official GitHub page is here. The authors have hinted that base (non-distilled) and edit versions may be upcoming, but as of this writing, those have not been released yet.
In my tests, Z-Image Turbo seems over-focused on portraits. It can be difficult to get a full body shot of a character even when you prompt for it. Mentioning "feet" in the prompt often doesn't help: the model prefers to change the view direction (e.g. looking down to show the feet), or to contort the character's pose (e.g. feet up, knees bent, when sitting on a chair), rather than moving the camera further away.
But when you can get the model to do what you want, the output looks nice, and prompt adherence is almost as good as Qwen's. Text rendering is also good, but specific words may fail. In my tests the model invariably spelled "turbo" as "tubro".
For photorealistic character renders, the word on the street is that Z-Image Turbo generates skin detail better than Qwen.
Also, I'll take the opportunity to point out, regarding one of my OCs, Liz the nerdy university student (who has probably become the unofficial mascot of my channel): out of all imaginable capabilities that an image GenAI could have, Z-Image Turbo is excellent at drawing dental braces, especially with the inpainting workflow. It's also decent at drawing nerdy glasses that are semi-opaque rather than fully opaque. So if you need such visual tropes for your nerdy OCs, this model can be useful as an inpainter.
Workflows are here. Direct links:
Chroma1-HD
Chroma is essentially a pruned-down (8.9B, down from 12B) and fine-tuned Flux.1 Schnell that attempts to undo the step distillation so that it can use CFG higher than 1. Thus, the negative prompt is available. According to the model author, besides this, the main point of the project was to create a Schnell-based checkpoint that's fine-tunable for further training.
The model is rather creative, in the sense that the same prompt can produce many different kinds of images by varying the seed, like SD 1.5 and SDXL. However, like those early models, Chroma will also generate copious amounts of slop. Depending on your use case, you may need to fish for a decent RNG seed for a while.
The word on the street is that Chroma works well for rendering photorealistic images. Its popularity, as well as the availability of a negative prompt in something more advanced than SDXL, piqued my interest, so I tested Chroma for generating illustration images. I found its capabilities to be hit or miss. Especially if you ask for both anime and scifi in the same prompt, the model seems to know only one style - a full-color pre-production sketch from a "making of" artbook, or perhaps an illustration suitable for a tabletop RPG manual. On top of that, illustrations often turn out like mediocre fanart, no matter which quality tags are in the prompt.
As its prompt format - at least when used for illustrations - Chroma accepts a hodgepodge of natural language and comma-separated booru tags. For example, you can write a paragraph of natural language, then a paragraph of booru tags, and then switch back to natural language for another paragraph.
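For illustration, a mixed-format prompt might look like this (a made-up example for this article, not taken from any of my images):

```
A nerdy university student stands in a cluttered dorm room,
holding up a cardboard sign that reads "txt2img".

1girl, solo, glasses, twintails, smile, indoors,

The lighting is warm, coming from a single desk lamp.
```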
Chroma being a Flux.1 fine-tune, it has some text rendering capability, but newer models such as Qwen and Z-Image usually do better here. But curiously, when making these workflow images, Chroma was the only model that could spell all of "txt2img", "img2img", "inpaint", and its own name.
Since Chroma is not that great at rendering illustrations, I haven't used the model much, other than to note (after extensive testing) that it's not suitable for my use cases.
Compared to Lightning-accelerated newer models, it's also slow. I like having a negative prompt, but my patience has its limits, lol.
I'm providing the workflows anyway, as they may be useful for other use cases. The txt2img one could be replaced with ComfyUI's default workflow, but a zoomed-in inpainter isn't readily available.
Workflows are here. Direct links:
Illustrious-XL
Old but still serviceable model family from 2024, based on the SDXL architecture. The model is only 3B, so it runs fine at FP16 even with just 8GB of VRAM, completing a 20-step render in a very short time. This model is small enough not to need an accelerator or step distillation.
Illustrious excels at rendering digital painting anime style characters. Unlike many models that associate the word "anime" with flat colors and spiky hair of the 1980s and 1990s, the Illustrious style looks like modern 2D CG anime made in the 2000s or later.
Unlike newer models that essentially use the input side of an LLM as the text encoder, the SDXL architecture uses classical CLIP models. It supports prompt weighting, which both SD Forge and ComfyUI expose with the syntax "(some important term:1.2)". Usually a good range for the weighting is 0.8 ... 1.2, but I've sometimes gone up to 1.6.
Anime SDXL models used to require clip_skip = 2 (empirically, the second-last CLIP layer had embeddings that yielded the best prompt adherence for those models), but with Illustrious I haven't bothered with that, and it works fine.
In SD Forge, FreeU was a nice technology for improving SDXL outputs, and SD Forge Couple gave the ability to target different regions with different prompts, allowing multiple OCs in the same image without prompt leakage between them. I haven't explored whether anything similar to these two technologies is available for ComfyUI. Nowadays, I mostly render with newer models, where default quality is fine, and prompt leakage is (at least almost) a solved problem.
Illustrious remains particularly useful for creating characters to use as inputs for Qwen-Image-Edit, especially with character designs that Qwen simply doesn't understand. These include at least women with very short hair (a pixie cut) and/or small breasts (when "anime" is also mentioned in the prompt). But if the character needs a shirt with text, then you'll additionally need a newer model (Qwen, Z-Image, or maybe Flux) to separately add the text in inpainting.
The Illustrious workflows do not use an accelerated or step-distilled model. Thus, the negative prompt is available. This is especially great for concepts that don't have a positive tag. An example: I may want "twintails" but not "low twintails", and a "high twintails" tag does not exist - so the negative prompt is the only way to say it. I would also love to be able, in newer models, to specify for some OCs that the hairstyle should have "bangs", but not "sidelocks". No such luck - but in Illustrious, the negative prompt allows you to do exactly that.
Unlike some newer models here, which prefer natural language prompts, Illustrious is an old-school anime model whose native prompt format is a comma-separated list of booru tags. You can try natural language (at least for describing things that don't have a tag), but you may get better prompt adherence with a list of tags where possible.
Also, Illustrious gives you better-quality gens if you end the positive prompt with "newest, masterpiece, best quality", and the negative prompt with "sketch, monochrome, oldest, worst quality". If your particular checkpoint recommends something else, use that instead.
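Putting the above together - booru tags, weighting, quality tags, and the negative-prompt trick - an illustrative prompt pair might look like this (made up for this article, not taken from any specific workflow):

```
Positive:
1girl, solo, twintails, (dental braces:1.2), glasses, smile,
classroom, sitting, upper body,
newest, masterpiece, best quality

Negative:
low twintails, sketch, monochrome, oldest, worst quality
```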
The model is small by 2026 standards, so one should not expect too much out of it in terms of prompt adherence for very complex prompts. But what it can do, it does well.
If you haven't used older models, you'll find a lot of example prompts on this site by searching for Illustrious.
Workflows are here. Direct links:
These workflows are also compatible with any model that uses the SDXL architecture, including base SDXL finetunes as well as PonyXL models. But note that if you use LoRAs, they are specific to each of the three types (IL, Pony, XL).
Tools
Background remover
This is useful for extracting a character from an image that has a background.
InSPyReNet is a highly accurate, fully automatic background remover. At least in my experience, it often produces more accurate results than e.g. isnet or u2net (which are offered by, among others, the rembg extension for SD Forge).
This is a really simple workflow, using this ComfyUI node. It is published mainly for completeness, as well as to raise awareness of this excellent neural background remover and of the ComfyUI node that allows using it in Comfy. I'm not affiliated with either of these.
The workflow is here.
Example input image, created with Qwen-Image-2512, with inpainting.
Foreground mask. This image contains the workflow.
Result. RGBA image, with the mask in the alpha channel. This image also contains the workflow.
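If you'd rather apply the mask outside ComfyUI, the same combination step is a few lines of PIL. A minimal sketch, with placeholder filenames:

```python
from PIL import Image

img = Image.open("input.png").convert("RGB")   # original image with background
mask = Image.open("mask.png").convert("L")     # InSPyReNet foreground mask

rgba = img.convert("RGBA")
rgba.putalpha(mask)                            # mask goes into the alpha channel
rgba.save("result.png")
```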
How to manually polish the mask
Most often, the model Just Works. But occasionally, you may need to open the image and mask in a photo editor (such as GIMP or Photoshop), and tweak the mask manually.
How to do this in GIMP:
Open the original image (with character and background).
Add a layer mask to the image. (Right-click the layer in the Layers panel to do this.)
Open the mask image produced by InSPyReNet. Select all, copy.
Go back to your original image. Click on the layer mask in the Layers panel to tell GIMP you'll be drawing to the mask (not to the image itself).
Paste. Select none (Ctrl+Shift+A). Now the layer mask should be your mask image.
Edit the mask image with your leet 'shopping skills.
Right-click the layer in the Layers panel, and convert the layer mask to the alpha channel.
Export PNG.
Pose detector
TBA
This is especially useful with Qwen-Image-Edit, which accepts these pose images natively as image inputs.
https://github.com/Fannovel16/comfyui_controlnet_aux
SD prompt reader
TBA
This is really just a minimal example, to raise awareness of this node package for those users who don't yet know about it.
There's a standalone app called SD Prompt Reader, which can dig out metadata from images generated by SD Forge. The SD prompt reader node package brings this functionality into ComfyUI.
The package also includes a Prompt Saver node, which saves a copy of your metadata in A1111 format (into the image file) so that CivitAI autodetects it when you upload the image. I haven't used this in my workflows, though.
https://github.com/receyuki/comfyui-prompt-reader-node
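For reference, this kind of metadata lives in PNG text chunks, and you can also inspect it with plain PIL. A minimal sketch - "workflow" and "prompt" are the chunk names ComfyUI uses, "parameters" is the A1111/SD Forge one, and the filename is a placeholder:

```python
import json
from PIL import Image

img = Image.open("some_render.png")
chunks = getattr(img, "text", {})  # PNG text chunks, if any

if "workflow" in chunks:           # ComfyUI: node graph as JSON
    graph = json.loads(chunks["workflow"])
    print(f"ComfyUI workflow with {len(graph.get('nodes', []))} nodes")
if "parameters" in chunks:         # A1111 / SD Forge: plain-text metadata
    print(chunks["parameters"])
```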
Meta
Not the company. In the original sense: about this article.
Which models are not covered (yet)
Qwen-Image-Layered.
As mentioned in the Qwen-Image section.
This is to my knowledge the first model that is ~guaranteed to produce sensible background geometry, every time.
Could supersede Qwen-Image (the non-edit variant) if the output quality is similar.
OTOH, Alibaba seems to go for a breadth-first approach to developing AI models, so this could be a one-off, with no guarantees of ever being developed further. Qwen-Image and especially Qwen-Image-Edit seem to be the mainline image models that get the majority of the lab's development resources.
WAN.
Flux.
I think that mainly, Qwen and Z-Image have eaten Flux.1's lunch by now. Which is fair enough - Flux.1 was among the first models that went beyond SDXL quality.
Flux.2 is an option, but unlike the previous two mentioned above, there is no clear use case where it obviously wins over what I already have.
Criteria for choosing models
Aspects that I evaluate models for:
Capabilities, relative to other models I already have.
Concepts to build OCs and scenes with. E.g. does it know what a grand piano looks like? What the interior of a scifi starship could look like? What twintails are? Can the model use a basic hairstyle as a template, and customize it if I prompt for different details?
Quality and style of illustration output. Anime, western cartoon, comics, ...
Ability to combine unrelated concepts. E.g. a retired lighthouse being launched as a rocket.
Ability to render complex scenes with a single prompt. E.g. a dimensional portal to another world, embedded into a scene that may be from a different genre (e.g. cyberpunk vs. high fantasy).
Ability to render multiple characters with a single prompt (for multiple OCs in the same image).
Object interaction, e.g. characters holding items or doing various activities.
Text rendering, e.g. for clothing or signs.
ROI: time investment in the new model as a user, versus the new capabilities it gives.
With each model, one has to learn model-specific, non-transferable skills: how to prompt it effectively, what it does well, and what it struggles with.
New models, with new capabilities, appear so fast that it only makes sense to invest time in such skills if they are likely to remain relevant six months from now.
Hence I prioritize major model families that are likely to stay in the game.
LoRA ecosystem, both in general, and for niche R18 topics.
I don't have the time to fine-tune models myself. Exploring inference to get the models to do what I want is already a major rabbit hole.
Even with capability improvements, LoRAs are still needed, and will likely always be needed. There are always specific concepts and styles that any given generalist model won't be able to render out of the box.
Misalignment between corporate/lab interests and user interests, but also the simple fact that the space of all visual ideas is large.
An example is how different models treat the keyword "anime". For Z-Image it means 1980s/1990s, while for Illustrious, 2000s - and no amount of prompting can make those models draw in each other's styles.
Model capabilities I don't evaluate for:
Ability to render existing characters from various IPs. For me, this capability is not directly useful, except for random Anna/Elsa memes.
Video rendering. Too slow for interactive work.
Photorealistic output. Not my style.
Full list of ComfyUI node packages used by these workflows
Be careful when installing ComfyUI nodes from the internet. ComfyUI is so popular that malware authors treat it as an exciting new attack vector.
Here and in the embedded workflows, each node package link goes to the node author's original GitHub repository.
For the main workflows, there are currently five dependencies in total:
GGUF loader: https://github.com/city96/ComfyUI-GGUF
Alternative GGUF node package, including a GGUF VAE loader: https://github.com/calcuis/gguf
Aspect ratio selector: https://github.com/budihartono/comfyui-aspect-ratio-presets
KSampler with live preview: https://github.com/jags111/efficiency-nodes-comfyui
Zoomed-in inpainting: https://github.com/lquesada/ComfyUI-Inpaint-CropAndStitch
The tools depend on one node package each:
Background remover: https://github.com/john-mnz/ComfyUI-Inspyrenet-Rembg
Pose detector: https://github.com/Fannovel16/comfyui_controlnet_aux
SD prompt reader: https://github.com/receyuki/comfyui-prompt-reader-node
About the attachment
The attached file presets.py is ComfyUI/custom_nodes/comfyui-aspect-ratio-presets/presets.py, to set up the available ARs and sizes for the aspect ratio / image size selector node.
Back up your original file first in case you want to restore it, then paste this one over it.
Or look through this one, and copy the presets you deem useful into yours.
Although it's technically a Python module, it's really just a plain-text configuration file.


