Lightweight Image-to-Image Workflow for Z Image Turbo (No ControlNet, WIP)

Hi again,

I wanted to share another simple Z Image Turbo workflow. This one does not use ControlNet, but can be used to achieve similar goals in a lighter, more minimal way.

Unlike the inpainting workflow I posted previously, this setup includes a solid way to control the input and output image size, which makes it much more flexible for composition and consistency testing.

This is still not ControlNet; it's more of a lightweight alternative approach I've been experimenting with. It may require some tuning (I've been getting the best results with denoise between ~0.75 and 0.90). I consider this more of a WIP, but it is functional and has been giving me interesting results. Going lower than 0.75 keeps the output really close to the original image and only allows minor updates via prompting. Using 0.75 and up strays further from the initial image and moves closer to the prompt (for things like what I am showing below, from cartoon to image).
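For anyone wondering why the denoise value behaves like this, here is a rough sketch of the step arithmetic. It is based on my understanding of how ComfyUI's KSampler treats denoise below 1.0 (it stretches the schedule to int(steps / denoise) and only runs the last `steps` of it), so treat it as an assumption, not a reference. The function name and the step count of 8 are made up for illustration.

```python
def img2img_schedule(steps: int, denoise: float) -> tuple[int, int]:
    """Return (full_schedule_length, steps_skipped) for a denoise value.

    Assumption: mirrors my reading of ComfyUI's KSampler, which builds a
    longer schedule of int(steps / denoise) steps and runs only the tail.
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    if denoise > 0.9999:
        return steps, 0  # full denoise: the whole schedule runs
    full_length = int(steps / denoise)
    return full_length, full_length - steps


# Illustrative step count only; use whatever your KSampler is set to.
for d in (0.5, 0.75, 1.0):
    full, skipped = img2img_schedule(8, d)
    print(f"denoise={d}: schedule of {full} steps, first {skipped} skipped")
```

The more of the early (noisiest) steps get skipped, the less noise is added to the encoded input image, which is why lower values stay closer to the original.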

Workflow Example (I have also tested this with ZiT LoRAs and can confirm they work with this workflow):

Also, like before, if you prefer to just drag and drop an image to load the workflow, here is an output image you can use:

I’ll attach the workflow for download, but like the previous post, I’m also including how to manually build it starting from the default ZiT Text to Image template.

Nodes to Add

Starting from the basic ZiT Text to Image workflow, add these four nodes:

Load Image
ImageScaleToMaxDimension (for controlling input/output size)
VAE Encode
Repeat Latent Batch
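To give an idea of what the ImageScaleToMaxDimension step does to the image size, here is a minimal sketch of the arithmetic as I understand it: the longer side is set to the target size and the shorter side is scaled to keep the aspect ratio. The function name and the rounding are my own assumptions, and the real node may round differently.

```python
def scale_to_max_dimension(width: int, height: int, largest_size: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals largest_size.

    Hypothetical helper for illustration; not the node's actual code.
    """
    if width <= 0 or height <= 0 or largest_size <= 0:
        raise ValueError("all dimensions must be positive")
    if width >= height:
        # Landscape or square: width becomes the target size.
        return largest_size, max(1, round(height * largest_size / width))
    # Portrait: height becomes the target size.
    return max(1, round(width * largest_size / height)), largest_size


print(scale_to_max_dimension(1920, 1080, 1024))
print(scale_to_max_dimension(768, 1024, 512))
```

Since the latent comes from the scaled image, this one value controls both the input size and the output size.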

Connections (Updated Workflow Instructions)

From the original workflow

Load VAE → VAE Encode
- Connect VAE → VAE

From the newly added nodes

Load Image → ImageScaleToMaxDimension
- Image → Image
ImageScaleToMaxDimension → VAE Encode
- Image → Pixels
VAE Encode → Repeat Latent Batch
- Latent → Samples
Repeat Latent Batch → KSampler
- Latent → latent_image
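If you drive ComfyUI through its API instead of the UI, the connections above can be sketched roughly like this. This assumes the API-format workflow JSON (nodes keyed by id, links written as [node_id, output_index]). The class names, the widget input names (`upscale_method`, `largest_size`, `amount`), the node ids, and the stand-in template are my assumptions, so check them against your own exported workflow.

```python
import copy


def wire_img2img(workflow, ksampler_id, vae_loader_id, image_name,
                 largest_size=1024, batch=1, denoise=0.8):
    """Return a copy of an API-format workflow with the four nodes wired in.

    Hypothetical helper; node ids like "img_load" are arbitrary labels.
    """
    wf = copy.deepcopy(workflow)
    wf["img_load"] = {"class_type": "LoadImage",
                      "inputs": {"image": image_name}}
    # Load Image -> ImageScaleToMaxDimension (Image -> Image)
    wf["img_scale"] = {"class_type": "ImageScaleToMaxDimension",
                       "inputs": {"image": ["img_load", 0],
                                  "upscale_method": "lanczos",
                                  "largest_size": largest_size}}
    # Scaled image -> VAE Encode (Image -> Pixels), Load VAE -> VAE Encode
    wf["img_encode"] = {"class_type": "VAEEncode",
                        "inputs": {"pixels": ["img_scale", 0],
                                   "vae": [vae_loader_id, 0]}}
    # VAE Encode -> Repeat Latent Batch (Latent -> Samples)
    wf["img_repeat"] = {"class_type": "RepeatLatentBatch",
                        "inputs": {"samples": ["img_encode", 0],
                                   "amount": batch}}
    # Repeat Latent Batch -> KSampler (Latent -> latent_image)
    wf[ksampler_id]["inputs"]["latent_image"] = ["img_repeat", 0]
    wf[ksampler_id]["inputs"]["denoise"] = denoise
    return wf


# Tiny stand-in for the template; the real one has more nodes and inputs.
template = {
    "vae": {"class_type": "VAELoader",
            "inputs": {"vae_name": "placeholder.safetensors"}},
    "sampler": {"class_type": "KSampler",
                "inputs": {"latent_image": ["empty_latent", 0], "denoise": 1.0}},
}
patched = wire_img2img(template, "sampler", "vae", "input.png")
print(patched["sampler"]["inputs"]["latent_image"])
```

The only change to the original template is that the KSampler's latent_image now comes from the encoded input image instead of the empty latent.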

Notes

This approach allows you to control the working resolution without relying on a ControlNet.
Best results so far have been with denoise roughly between 0.75 and 0.90. You can go lower as well, but the lower you go, the stricter the adherence to the input image. 0.75 seems to be the mid-point where, once you start going higher, it starts adhering to the prompt more.
This is still experimental, but functional, and useful if you want some “control-like” behavior without adding heavier systems.

Edit: I realized I uploaded this workflow with a LoRA still set from testing. I set it back to the typical LoRA that came with Z Image Turbo. The attached workflow was also slightly updated to include the ability to batch files before the KSampler. Attached is the new file.