Z-Image Turbo Generation Guide

Now that I've had some time to experiment and test Z-Image Turbo, I wanted to share some tips, a guide of sorts. I've noticed a lot of variability in the results from this model; you can see it just by browsing Civitai. I've come to realize a few things are to blame for quality degradation.

It's a Turbo Model; 8 Steps

First things first: this is a turbo model, so it's designed to generate images in 8 steps. Leave the CFG scale at 1 and the steps at 8. You could bump that to 9 or 10 steps, but you usually won't get anything meaningful from doing so; you're better off trying a different seed if you aren't happy with the output.
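
For reference, here's a minimal sketch of what those settings look like on a KSampler node in ComfyUI's API-format workflow JSON (exported via "Save (API Format)"). The node id and the input links are placeholders from a typical workflow; the point is the steps, CFG, and seed fields.

```python
# Sketch of a KSampler node in a ComfyUI API-format workflow.
# Node id "3" and the ["4", 0]-style links are placeholders; the settings
# that matter for Z-Image Turbo are steps=8 and cfg=1.0.
ksampler_node = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42,               # change the seed rather than raising steps
            "steps": 8,               # turbo model: leave at 8
            "cfg": 1.0,               # turbo model: leave at 1
            "sampler_name": "euler",  # see the sampler/scheduler section below
            "scheduler": "normal",
            "denoise": 1.0,
            "model": ["4", 0],        # links into the rest of the workflow
            "positive": ["6", 0],
            "negative": ["7", 0],
            "latent_image": ["5", 0],
        },
    }
}
```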

Resolution Matters!

Second, the resolution matters - a lot. You want to keep your dimensions divisible by 32; this is one of the biggest things that leads to degraded quality. I created a size selector custom node for ComfyUI, which you can find here: https://civitai.com/models/2221791/hackafterdark-comfyui-custom-nodes (or search for AfterDark in the ComfyUI Manager to install it). The sizes include common film/photography aspect ratios, but they all work for Z-Image Turbo.

The native resolution for this model is, like many others, 1024x1024. This doesn't mean you can't have a dimension greater than 1024, but 1024 multiplied by 1024 works out to 1,048,576 pixels, and the resolutions you use should not exceed that pixel count. For example, if you wanted to make an image that would work well as a 32:9 monitor wallpaper, you would use 1920x512 (which is 983,040 pixels, and both dimensions divide evenly by 32). You could of course take the final image and upscale it if you were after a 5120x1440 desktop wallpaper (or higher if you have a 4K 32:9 monitor).

Your images will be significantly sharper if you stick to resolutions that divide evenly by 32. That isn't to say you can't get by otherwise, but results will vary and you may reach for some sort of detail enhancer or post-processing to deal with the imperfections that were introduced.

Fortunately, ComfyUI's image size nodes should keep your values even and divisible by 32. If you click the arrows in the size nodes, they increment by 16. Again though, if you multiply width by height and end up with more than 1,048,576 pixels, you might want to consider reducing the size.
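
If you'd like to compute sizes yourself rather than use a node, here's a minimal sketch in plain Python. It finds the largest width and height for a given aspect ratio where both dimensions are divisible by 32 and the total stays at or under 1,048,576 pixels. The function name is my own, not part of any node pack.

```python
def zimage_size(aspect_w: float, aspect_h: float,
                max_pixels: int = 1024 * 1024) -> tuple[int, int]:
    """Largest (width, height) for the given aspect ratio where both
    dimensions are divisible by 32 and width * height <= max_pixels."""
    # Ideal (non-integer) scale for this pixel budget...
    scale = (max_pixels / (aspect_w * aspect_h)) ** 0.5
    # ...then round each dimension DOWN to a multiple of 32 so the
    # total never exceeds the budget.
    width = int(aspect_w * scale) // 32 * 32
    height = int(aspect_h * scale) // 32 * 32
    return width, height

print(zimage_size(32, 9))  # (1920, 512) -- the wallpaper example above
print(zimage_size(1, 1))   # (1024, 1024)
print(zimage_size(3, 2))   # (1248, 832) -- 1,038,336 pixels, under budget
```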

Careful With LoRAs (and using too many)

Many LoRAs unfortunately aren't trained very well, because training is a bit tricky at the moment and everything is new. People are still figuring out how to caption things, all the fine settings to configure during training, dealing with adapters because we don't have the non-distilled model, etc. So it's to be expected.

As interesting sounding as some of the LoRAs are, if they take away from quality, you'll just have to wait until they get better. Sometimes you can simply reduce their strength, and in general you probably always want to reduce the model strength of the LoRAs you use - even well trained ones. I rarely use a model strength above 0.80 or 0.90 with any LoRA. I often find the sweet spot to be around 0.65 to 0.75 for a well trained LoRA. If a LoRA isn't well trained but I really like its effect/style, I'll try to keep it in use by decreasing its strength to around 0.25 to 0.35.

If you use multiple LoRAs, you generally want to keep the combined strength under 1.0. I rarely use more than two LoRAs at a time and quite often just use one.
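
To make that rule of thumb concrete, here's a tiny sketch. The 1.0 cap and the proportional scaling are just my way of expressing the guideline, not anything built into ComfyUI; you'd apply the resulting numbers as the model strengths on your LoRA loader nodes.

```python
def scale_lora_strengths(strengths: list[float], cap: float = 1.0) -> list[float]:
    """Keep the combined LoRA model strength under the cap by scaling all
    strengths down proportionally when they exceed it."""
    total = sum(strengths)
    if total <= cap:
        return strengths
    return [round(s * cap / total, 2) for s in strengths]

# Two LoRAs at 0.75 each would sum to 1.5, so scale them back:
print(scale_lora_strengths([0.75, 0.75]))  # -> [0.5, 0.5]
```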

This will likely change in the future with the non-turbo version of the model, when cfg scale can be adjusted and more steps can be used.

Obviously if you use a LoRA or prompt for a style that includes "soft focus" or something like "lomography" then you are naturally going to have something with imperfections. Do remember that imperfections can also make an image more believable or real looking.

Samplers & Schedulers

Most images you see end up using the euler or euler_ancestral sampler with the normal or beta scheduler. This combo works, but definitely explore others. Not a huge variety of combinations work with Z-Image Turbo, so it can be a long trial-and-error process - but the different combinations can make a dramatic difference. Certain combinations can even result in faster (or slower) generation times. If you want to automate the trial and error, see the sketch after the lists below.

I very rarely use euler for Z-Image anymore, though ddim/beta will produce very similar results.

My current favorite combinations are (ranked):

  1. seeds_3 / beta (many schedulers work well)

  2. ddim / kl_optimal

  3. ddim / beta

Other noteworthy combinations to try:

  • dpm_2_ancestral / sgm_uniform (or ddim_uniform)

  • res_multistep / beta

  • dpmpp_2m / beta
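
As promised above, here's a sketch for automating the trial and error. It queues the same API-format workflow once per combination through a local ComfyUI instance's /prompt endpoint. It assumes you exported workflow.json with "Save (API Format)" and that the KSampler sits at node id "3"; adjust both for your own workflow.

```python
import copy
import json
import urllib.request

# Sampler/scheduler combos from the lists above.
COMBOS = [
    ("seeds_3", "beta"),
    ("ddim", "kl_optimal"),
    ("ddim", "beta"),
    ("dpm_2_ancestral", "sgm_uniform"),
    ("res_multistep", "beta"),
    ("dpmpp_2m", "beta"),
]

with open("workflow.json") as f:  # exported via "Save (API Format)"
    base = json.load(f)

for sampler, scheduler in COMBOS:
    wf = copy.deepcopy(base)
    # Node id "3" is assumed to be the KSampler; keep the seed fixed so the
    # only difference between runs is the sampler/scheduler combo.
    wf["3"]["inputs"]["sampler_name"] = sampler
    wf["3"]["inputs"]["scheduler"] = scheduler
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": wf}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # queues the job; images land in your output dir
```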

I generally find Z-Image generated images to be a bit bright and lack contrast. This is why I created my AfterDark LoRA. However, even just by using ddim / kl_optimal, you can get a darker image with more contrast. The sampler/scheduler combination alone still isn't quite enough for my tastes though.

I almost always use my LoRA when generating images because it enhances image quality, particularly with regard to lighting, contrast, skin tones, and the separation of the foreground subject from the background. It simply makes things "pop", and almost everything I run with it ends up looking better. Not by some major degree, of course; it's not overpowered, but I find that little extra helps a lot.

Prompting Matters!

Last part of the guide here, and I'll leave you with a link to a prompting guide that I found: https://gist.github.com/illuminatianon/c42f8e57f1e3ebf037dd58043da9de32

I'll call out some highlights from it. First, the model doesn't use a negative prompt, so you can simply omit that from your workflow; having it in the pipeline doesn't do anything. However, it's important to note that Z-Image will do something if you don't tell it that it cannot. Wait, isn't that a negative prompt?! Well, yeah, kinda, but it doesn't go into the ComfyUI text encoder/CLIP as a negative prompt. You want to add those exclusion rules to the end of your positive prompt.

For example, you can add "no nudity" to the end of your prompt. What you put into the "positive" here also matters: if you want a photorealistic image, then say so in your prompt. You could say "no illustration", but you could also simply state at the beginning of your prompt that the image should be a photograph.

Z-Image, like Flux, wants detailed prompts, not tag style like Pony or SDXL.

Use the following format/template for your prompts (from the guide): [Shot & subject] + [Age & appearance] + [Clothing & modesty] + [Environment/background] + [Lighting] + [Mood] + [Style/medium] + [Technical notes] + [Safety/cleanup constraints]

You also want to pay attention to the length of the prompt; keep it to 80-250 words.
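
To make the template concrete, here's a toy sketch that assembles a prompt from those segments and checks it against the word target. The segment text is illustrative, not canonical.

```python
# One entry per template segment, in the guide's order. Values are examples.
SEGMENTS = [
    "A close-up realistic photographic portrait of a woman",       # shot & subject
    "young adult in her 20s with curly auburn hair and freckles",  # age & appearance
    "wearing a textured beige knit sweater, fully clothed",        # clothing & modesty
    "standing in a wheat field rendered as soft golden bokeh",     # environment/background
    "backlit by golden hour sun as a warm rim light",              # lighting
    "calm, warm mood",                                             # mood
    "cinematic style, shot on an 85mm portrait lens",              # style/medium
    "sharp focus on the eyes, highly detailed skin texture",       # technical notes
    "no text, no watermark, no distortion, correct anatomy",       # safety/cleanup
]

prompt = ", ".join(SEGMENTS) + "."
word_count = len(prompt.split())
print(f"{word_count} words")  # aim for 80-250; flesh out weak segments if short
print(prompt)
```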

I took the prompt guide and created a system prompt to help me generate prompts using AI. I use Google Gemini 3 for this. I can either describe what I'm after in text, or feed it an image and ask Gemini to create a Z-Image Turbo prompt for me. Since it has the context of that guide, it does a fantastic job.
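
If you'd rather script this yourself, here's a minimal sketch using the google-genai Python SDK. The model name is a placeholder for whichever Gemini model you have access to, and SYSTEM_PROMPT stands in for the full guide text; the client reads your API key from the GEMINI_API_KEY environment variable.

```python
from google import genai
from google.genai import types

# Placeholder: paste the full Z-Image Turbo prompting guide text here.
SYSTEM_PROMPT = "You write Z-Image Turbo prompts following this guide: ..."

client = genai.Client()  # reads GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder: substitute your Gemini model
    contents="enhance the following prompt for z-image turbo: ...",
    config=types.GenerateContentConfig(system_instruction=SYSTEM_PROMPT),
)
print(response.text)
```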

Again, be sure to check out my ComfyUI custom nodes as one of them includes a Gemini node with a preset for a Z-Image Turbo Prompt generator system prompt all ready to go. All you need is to put a Gemini API key in your ComfyUI settings and then open one of the included workflows.

If you happen to be a programmer and have VS Code, I started off using the Roo Code extension and created my own custom mode with that guide pasted in. This provided a more interactive chat interface (similar to a custom GPT, if you use OpenAI's ChatGPT, which I'm sure you could also set up) to help me build prompts. My ComfyUI custom node is not an interactive chat, but it will from time to time ask clarifying questions, which you'll need to append to your previous prompt to run again (since you'll have a new request/context).

Detailed Comparisons

Fair warning: This is the part of the article that's lengthy. If you simply believe everything I wrote above, great! You can go try it out. If you want to see examples, read on!

There are certainly more nuances, but if you follow the above, I promise you'll get much better results. Your images will be a lot sharper and more detailed. Remember to experiment!

Also remember you can upscale using other software. Either other ComfyUI workflows or tools like those from Topaz Labs. I have been enjoying their Bloom tool lately as it will upscale in latent space so you can let it dream a bit. You can control its creativity as it goes to upscale. You can also simply use their Gigapixel tool and have less invention there during the upscale process.

First, I'll show you an example of what I mean about contrast and how I enhance it with my LoRA. Both images below were generated using the ddim sampler with the kl_optimal scheduler. The prompt is:

4ft3rd4rk A hyper-realistic close-up portrait of a young adult woman with fair skin and wispy blonde hair, wearing an elaborate metallic masquerade mask featuring intricate gold filigree, oxidized silver details, and embedded turquoise and white gemstones. She is dressed in a vintage dark corset-style bodice with beige lace trim and layered metallic necklaces, fully clothed fashion photography. The environment is a dark, blurred neutral background that emphasizes the subject. Lighting is cinematic and soft, highlighting the texture of the metal mask and realistic skin pores, with catchlights in her piercing blue eyes. Style is 8k high-resolution photography, shot on 85mm lens, macro details, high contrast textures. No text, no watermark, no logos, no distortion, sharp focus, correct anatomy, no noise, high fidelity.

It includes phrases like "the environment is dark" and "lighting is cinematic" and "high contrast textures" for example. Other good words in your prompt for dramatic lighting may include "low-key lighting" or "rim lighting" or "spotlight" even. If you include words like "haze" or "aura" or "fog" that tends to brighten things. Sometimes "soft light" does as well. So if you're going for a darker more fashion photography style, pay close attention to the words about lighting in your prompt.

example_afterdark_lora.jpg

If it wasn't obvious, the bottom image is the one using my AfterDark LoRA, with a model strength of 0.80. Let's zoom in a little bit here so I can point something else out.

example_afterdark_lora_closeup.jpg

On the right is the image with my LoRA in use. Hopefully this is close up enough for you to see the difference in both skin tone and texture. I've often noticed Z-Image Turbo's faces having a bit of peach fuzz. On the other hand, it's very easy to end up with skin texture that is less realistic. My LoRA can cause this if applied too strongly: since it was trained on a lot of fashion photography, you can end up with a makeup-heavy or airbrushed style. Not always, but sometimes, and you can kinda see it here. I would say this is quite reasonable though, and if you wanted to balance it out, decrease the strength of the LoRA. Pretty simple adjustment. The other thing you can do is use some post-processing to add film grain (if there wasn't already film grain in the generated image).

The skin tone here is another thing to focus on. I do find Z-Image Turbo by default to sometimes have skin tones that aren't quite right. The lighting is also often somewhat flat as you can clearly see here. There's far more shadow on the subject's face when using the LoRA.

Again, ddim with kl_optimal or dpm_2_ancestral with sgm_uniform will help here. I find these sampler/scheduler combos to be consistently better than most others when it comes to contrast. It will help with separation between foreground and background even if you are not using something like my LoRA.

Do watch out for LoRAs that are trained on overly soft images or images generated with AI. You could find yourself introducing the "Flux face" problem back into your images. To be frank, I really dislike the Midjourney style. It's incredibly plastic looking and comes off fake. So if you look for LoRAs that emulate that style, hey, you might like it... but realistic it is not. I can spot those a mile away. Way too smooth, and destructive on textures like skin, lips, and eye detail.

Speaking of detail, one area that I find weak with Z-Image is hair. While we had "Flux face" with Flux, I feel like we might have "Z-Hair" with Z-Image. It often comes out too smooth and not realistic looking. Sometimes my LoRA helps here by virtue of adding contrast or shadows, but it still doesn't completely fix the issue. I usually find a different seed helps, so there's a randomness to it.

I'll try to steer clear of LoRAs for the remaining examples though. There are many LoRAs out there that I adore, but they end up completely destroying images at high strengths. The final word from me on LoRAs is simply to use them at a strength lower than 1.0, often between 0.5 and 0.75. Give it a try and compare; you'll see what I'm talking about.

Below is an example of an image size that was too large. Two things happened.

example_too_large.jpg

This image was sized at 2816x1536, which at 4,325,376 pixels is more than four times the native 1024x1024 budget. I resized it after it was generated since we don't need to look at it full size here.

The first thing to note is the artifacts introduced. If you look toward the right, you'll see it started to introduce what looks like another subject's hair, and there are weird artifacts floating in the air on that side too. This is actually pretty common for Flux and most other models, by the way.

The second thing that's harder to see without really zooming in (and this image probably isn't the best to begin with to see it) is that the image isn't as sharp. It's a bit hidden due to the lighting and soft focus nature of the photographic style, but it's more blurry than it should be.

So again, pay close attention to your image size. If you're after a certain aspect ratio, think about sizing down to reach that aspect ratio rather than increasing one of the dimensions slightly to the point it exceeds the native 1024x1024. You can always upscale afterwards.

The prompt for the image above is:

Close-up portrait of a pale redheaded woman at golden hour, warm sunlight glowing through her curly auburn hair, soft freckles, natural makeup, bokeh background, cinematic look

It's pretty short. So I fed it to my custom Gemini node and asked it to enhance the prompt. It wanted some details about clothing, environment, and age. So I added those in a second prompt/run. Here's my final prompt to Gemini through ComfyUI:

enhance the following prompt for z-image turbo: Close-up portrait of a pale redheaded woman at golden hour, warm sunlight glowing through her curly auburn hair, soft freckles, natural makeup, bokeh background, cinematic look.

---
To answer some of your questions.
Clothing: knit sweater
Age: Young adult in her 20's
Background context: wheat field

Gemini's response (note how it doesn't recall our first conversation because that's not how my node works):

Since you have already provided the specific details regarding clothing, age, and background context, I can skip the interview phase and move straight to constructing the optimized prompt for Z-Image Turbo.

I have integrated your inputs into the **S3-DiT Scaffold**, ensuring that the specific lighting effects (backlighting/glow) are emphasized and that all exclusions are handled within the text string.

Here is your detailed Z-Image Turbo prompt:

```text
A close-up realistic photographic portrait of a young adult woman in her 20s with pale skin and voluminous curly auburn hair. She is wearing a modest, textured beige knit sweater, fully clothed with a comfortable fit. The setting is an outdoor wheat field, but the background is rendered as a soft, creamy bokeh of golden stalks. The lighting is distinct golden hour sun, positioned behind the subject to create a warm rim light and a glowing effect through her hair, highlighting soft freckles and natural makeup. Cinematic style, shot on an 85mm portrait lens, high definition, sharp focus on the eyes, highly detailed skin texture. No text, no watermark, no logos, no nudity, no blurred features, no distortion, correct anatomy, clean image.
```

### Breakdown of the Construction:

*   **Subject:** Defined as "Young adult woman in her 20s" to ensure maturity and anatomical correctness.
*   **Appearance:** "Voluminous" added to the hair description to catch the light better; "natural makeup" retained.
*   **Clothing:** "Modest, textured beige knit sweater" provides tactile detail for the model to render, avoiding generic smooth surfaces.
*   **Lighting:** Specific instruction on "rim light" and "backlighting" to achieve the "sunlight glowing through hair" effect you requested.
*   **Exclusions:** Standard safety and quality exclusions (no text, no blur, correct anatomy) appended to the end to prevent artifacts.

The prompt in this case is what's inside the "text" part of the AI's response. If you prompt Gemini "return only the final text so I can use it as a ComfyUI node input" or something like that, it should remove the excess info and give you something you could directly connect to the text encoder input in a ComfyUI workflow.

Anyway, the result?

z-image_00218_.png

As you can see, this is significantly better. The prompt was better and the dimensions were better. While the 1408x768 that I used still totals more than 1024x1024 (1,081,344 pixels), it's much closer. Remember, the seed and sampler were the same; the more detailed prompt and the resolution improved the image quality by a lot. Here's the image from the original prompt at the same 1408x768 resolution (which we know doesn't result in artifacts or duplicate subjects) for comparison.

You'll notice it's not as vibrant and there's less contrast. This is because the enhanced prompt included more detail about the lighting, despite both mentioning "golden hour."

z-image_00217_.png

There's another difference here: one is more close up. This is because Gemini was following the Z-Image Turbo prompt guide's best practices around prompt format and asked me for more info. So what if I remove the "She is wearing a modest, textured beige knit sweater, fully clothed with a comfortable fit." part? Can we get closer to the original image?

Yes, you can, but remember that if you don't tell Z-Image Turbo about something, it can make its own decisions. Sometimes this resulted in the subject not wearing clothing at all (though since it was cropped, you could argue she was wearing something strapless, I suppose - the image was still PG rated of course, but that may not always be the case).

Here's the image without that bit of detail about the clothing.

z-image_00221_.png

This is perhaps a better comparison with the above (original) prompt. Still more detailed. This new image uses euler_ancestral (the same as the original image with less contrast directly above it). Remember ddim / kl_optimal? Here's what that looks like:

z-image_00222_.png

Again, no LoRAs are being used here. If you compare the last two images, the colors are a lot more saturated and there's more contrast; the only difference is the sampler/scheduler. So if you find your images looking a bit flat and lacking contrast, try a different sampler/scheduler combo. I would say this one is perhaps too saturated at this point; you can adjust that in post-processing, or try a different seed or a different sampler/scheduler combo. To bottom line it - experiment! Don't simply use euler all the time.

If you've made it this far, congrats! That was a very thorough comparison, but it still only scratches the surface because there are so many variables at play. Hopefully it was enough to illustrate the importance of your prompt and the settings.
