Updated: Jun 21, 2026
Note: This is a highly experimental project.
This model was created with the aim of bridging the Z-Image Turbo and Z-Image Base models. As you may know, Z-Image uses completely different models for Turbo and Base. When two such models are released, they are usually a base version and a distilled version of it, but the two Z-Image models are fundamentally different, and compatibility between them is poor. Some degree of LoRA interoperability appears to be possible, but merging the models is almost impossible.
One difficulty is that the two models are trained in completely different ways, and another is the existence of the refiner. Unlike many other models, the meaning of the DiT layers in Z-Image differs from model to model. In typical models, the input to the DiT is produced by a common text encoder, but in Z-Image it passes through the refiner, and the refiner changes from model to model. As a result, the meaning of the numerical values stored in the DiT also changes from model to model, so the compatibility of the DiT part is low. Incidentally, in the U-Net era I also trained text encoders as part of models, so the situation was similar, but there the text encoder was only used for cross-attention and didn't have as much impact.

What I'm doing
Now, let me explain how Alkahest tries to solve this. First, I create a merged model from Turbo and Base, at a simple 1:1 ratio without thinking too much about it. Naturally, it only produces noise. Next, using that as the base model, I perform full fine-tuning (FT) with the refiner layers frozen. FT itself isn't impossible. You might feel like giving up, thinking that no images will ever appear, but if you keep going, the noise gradually decreases.
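The 1:1 merge and the frozen-refiner FT setup described above can be sketched roughly as follows. This is only an illustration, not the actual training code: weights are modeled as plain dicts of name to list of floats, the key names are hypothetical, and matching the refiner by the substring "refiner" is an assumption. In a real PyTorch setup the frozen names would correspond to parameters with requires_grad set to False.

```python
# Illustrative sketch only; helper names and key names are hypothetical.

def merge_one_to_one(turbo, base):
    """Average two weight dicts element-wise (a 1:1 merge)."""
    return {
        name: [(t + b) / 2.0 for t, b in zip(turbo[name], base[name])]
        for name in turbo
        if name in base
    }


def split_for_finetuning(weights, frozen_marker="refiner"):
    """Return (trainable, frozen) weight names; refiner weights stay fixed."""
    frozen = [n for n in weights if frozen_marker in n]
    trainable = [n for n in weights if frozen_marker not in n]
    return trainable, frozen


# Toy stand-ins for the two checkpoints (hypothetical key names).
turbo = {"layers.0.w": [1.0, 3.0], "context_refiner.0.w": [2.0]}
base = {"layers.0.w": [3.0, 5.0], "context_refiner.0.w": [4.0]}

merged = merge_one_to_one(turbo, base)
trainable, frozen = split_for_finetuning(merged)
```

The merged weights are only a starting point. As the text notes, such a model produces noise until FT has run for a while.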
Alkahest TB-1
Based on the lessons learned from the previous attempt, this model uses a merge of the Base and De-turbo models as its base model and applies full fine-tuning (FT) using Z-Image output.
Due to dataset limitations, fewer images were used than in the previous attempt. FT was performed in four stages: Stage 1 focused on images with a large number of subjects, Stage 2 on humans, Stage 3 on varied lighting, and Stage 4 on quality correction. The model eventually produces images not present in the training data, but adjustment was difficult; for example, no night scenes appeared in Stage 2, and adding night scenes resulted in only night scenes appearing.
ALKAHEST aims to produce images similar to the Z-Image base model. However, even though Z-Image Turbo and Z-Image Base use the same base model, their images differ considerably. Furthermore, biases in the training data contribute to the unique images produced. Therefore, please understand that the Turbo and Base models will not be perfectly reproduced.
When importing either model, the basic process remains the same:
(Some model - its base model) + ALKAHEST
In the previous version, this step was handled with ComfyUI's standard nodes. This time, I created a merge node specifically for ALKAHEST. Essentially, it is a node that keeps the refiner portion of model1 intact. That is the only fundamental change, but during testing I frequently ran into cases where the model differences weren't imported correctly. Upon investigation, it turned out that the difference from the base model was too small, so adding it directly had almost no effect.
The multiplier setting in ModelMergeSubtract determines how many times the difference output is multiplied, so increasing the multiplier should suffice. However, calculating the appropriate multiplier for the model under test yielded 171.1x, while the maximum multiplier for the standard node is 10.0. I could have raised the upper limit, but that would have made the UI difficult to use, so this time I also put a multiplier on the ModelMergeAdd side. If adding the difference doesn't have much effect, increase the multiplier, and if 10.0 isn't enough, use the ModelMergeAdd multiplier as well. Although it is designed to go up to 100x, in practice 20x seems to be the limit.
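The formula above, together with the two multipliers, can be sketched in plain Python. This is an illustration under assumptions, not the code of the ALKAHEST merge node: weights are modeled as dicts of name to list of floats, the helper names (subtract, add_keep_refiner, split_multiplier) and key names are hypothetical, and the refiner is identified by the substring "refiner".

```python
# Illustrative sketch only; not the actual ALKAHEST node implementation.

def subtract(model, base, multiplier=1.0):
    """(model - base) * multiplier, per weight."""
    return {
        name: [(m - b) * multiplier for m, b in zip(model[name], base[name])]
        for name in model
        if name in base
    }


def add_keep_refiner(alkahest, diff, multiplier=1.0, protect="refiner"):
    """alkahest + multiplier * diff, leaving alkahest's refiner weights untouched."""
    merged = {}
    for name, weights in alkahest.items():
        if protect in name or name not in diff:
            merged[name] = list(weights)  # keep model1's values as they are
        else:
            merged[name] = [a + multiplier * d for a, d in zip(weights, diff[name])]
    return merged


def split_multiplier(total, sub_max=10.0):
    """Split a total multiplier across the subtract side (capped) and the add side."""
    if total <= sub_max:
        return total, 1.0
    return sub_max, total / sub_max


# Toy example with hypothetical key names.
model = {"layers.0.w": [1.5], "noise_refiner.0.w": [2.0]}
model_base = {"layers.0.w": [1.0], "noise_refiner.0.w": [1.0]}
alkahest = {"layers.0.w": [0.25], "noise_refiner.0.w": [4.0]}

diff = subtract(model, model_base, multiplier=2.0)
result = add_keep_refiner(alkahest, diff, multiplier=3.0)
sub_mult, add_mult = split_multiplier(171.1)
```

With a total of 171.1x, this split puts 10.0 on the subtract side and roughly 17.11 on the add side, which stays under the practical limit of about 20x mentioned above.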
The previous version recommended high shift values, but this time the model works with the default shift value of 3.
Since there seemed to be plenty of saturation, in many of the examples, the CFG scale is set to 2.0 or higher, and negative prompts are enabled. The workflow includes "Human Deformity," which seems like it would definitely work, but as you can see, unfortunately, it doesn't always work. Also, depending on the model, a CFG scale of 1.0 may be recommended.
The following workflow demonstrates merging between a Turbo model and a Base model.

As an aid, I created LoRAs by taking the difference between each base model and ALKAHEST. These LoRAs supplement the Turbo and Base characteristics on top of ALKAHEST (they may not always produce the expected effect). To create them, ALKAHEST was specified for --base and each base model for --variant. Additional options used were:
--rank-thresh 0.995 --exclude "refiner"
For the effective strength, start from 0.1. Of course, similar adjustments can be made between other individual checkpoints.
General LoRAs should work with both Turbo and Base models, but the effective strength ranges may differ, and geometric accuracy may decrease. The Turbo/Base difference LoRAs mentioned above might help.
Another note: don't worry too much about fine-grained accuracy while experimenting; I recommend using FP8 models. It reduces stress, and you can switch to BF16 models once your trial and error is complete, if necessary.
Summary of Current Status (TB-1)
Taking the difference between a Base-type or Turbo-type model and its own base model lets you import what each has learned. Because the difference no longer depends on the base model, it can be merged on top of ALKAHEST.
This cannot be used with models that have undergone significant changes to their refiners. My recent model falls into this category due to incorrect training settings.
It can handle LoRAs from either family, for example using a Turbo LoRA with a Base model and vice versa.
The output can be stabilized using the difference LoRA from each base model. For some reason, Turbo models tend to produce anime-style images, while Base models tend to produce realistic images; please use them keeping these characteristics in mind.
TB-0
This is the model that finally produces images. I used about 22,000 images from an anime-style dataset (0.75 series) and then about 15,000 images from a realistic-style dataset (part of 0.79). I've found that the more FT I run, the less noise there is. The current stage isn't quite sufficient, but I'm testing whether both Turbo and Base models can be handled using this intermediate model.
Verification
I used ComfyUI's set of merge nodes. Prepare a Z-Image Turbo model and a Z-Image Base model. Connect the model you want to merge to model1 of ModelMergeSubtract and its base model to model2. For a Turbo model, the base is usually a De-turbo model. Next, prepare a ModelMergeAdd node and connect Alkahest to model1 and the difference output from the previous node to model2. Keep in mind that the resulting model is close to the Base type, so use a distill LoRA to stabilize the picture. The Turbo-type Re-Turbo LoRA works, but the Base-type Fun Distill is recommended.
I have tested several models; the ones used here are rayZimageBaseSFW_artshoot and beretMiXZIT_v40.
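The node wiring described above can be written down as a ComfyUI API-format workflow fragment, built here as a Python dict. Treat it as a sketch under assumptions: the class names (UNETLoader, ModelMergeSubtract, ModelMergeAdd, ModelSamplingAuraFlow) and input names reflect my understanding of the stock ComfyUI nodes, the file names are hypothetical placeholders, and the text encoder, VAE, distill LoRA, and sampler nodes are left out.

```python
# Sketch of the subtract -> add wiring in ComfyUI API format (dict of nodes).
# File names are hypothetical; sampler, VAE, text encoder, and LoRA nodes are omitted.

def build_import_workflow(model_file, base_file, alkahest_file,
                          multiplier=1.0, shift=11.0):
    def loader(name):
        return {
            "class_type": "UNETLoader",
            "inputs": {"unet_name": name, "weight_dtype": "default"},
        }

    return {
        "1": loader(model_file),     # model whose learning you want to import
        "2": loader(base_file),      # its base (for Turbo, usually a De-turbo model)
        "3": loader(alkahest_file),  # Alkahest
        "4": {  # difference = (model - base) * multiplier
            "class_type": "ModelMergeSubtract",
            "inputs": {"model1": ["1", 0], "model2": ["2", 0],
                       "multiplier": multiplier},
        },
        "5": {  # Alkahest + difference
            "class_type": "ModelMergeAdd",
            "inputs": {"model1": ["3", 0], "model2": ["4", 0]},
        },
        "6": {  # shift setting used to stabilize the image
            "class_type": "ModelSamplingAuraFlow",
            "inputs": {"model": ["5", 0], "shift": shift},
        },
    }


workflow = build_import_workflow(
    "some_turbo_model.safetensors",  # hypothetical file names
    "de_turbo.safetensors",
    "alkahest.safetensors",
)
```

Each link is a pair of source node id and output index, so node 5 takes Alkahest as model1 and the difference from node 4 as model2, matching the connection order in the text.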
To stabilize the image, set the ModelSamplingAuraFlow shift to around 10 to 12.
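To get a feel for what a value of 10 to 12 does compared with a shift of 3, the sketch below uses the time-shift formula commonly applied to flow-matching models, sigma' = shift * sigma / (1 + (shift - 1) * sigma). That ModelSamplingAuraFlow applies exactly this mapping is an assumption on my part, and the function name is hypothetical.

```python
# Sketch of the usual flow-model time shift; assumed, not taken from the node source.

def shift_sigma(sigma, shift):
    """Remap a noise level sigma in [0, 1] with the given shift value."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Compare how a mid-schedule noise level is remapped by different shift values.
for shift in (3.0, 11.0):
    remapped = [round(shift_sigma(s / 4.0, shift), 3) for s in range(5)]
    print(shift, remapped)
```

Under this formula the endpoints 0 and 1 stay fixed, while a higher shift pushes intermediate steps toward higher noise levels, so more of the sampling schedule is spent on the noisy, structure-forming part.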

Now you can import the differences from both a Turbo model and a Base model.
Initially it seemed impossible to merge the completed models, but that turned out to be a misunderstanding, and the merging experiment has now succeeded.
Conclusion
This experiment was successful in that it allowed for the creation of a bridge model between two models, but it left challenges for further development.
One point of reflection is that creating the base model by merging the De-turbo and Base models, rather than the Turbo and Base models, would have made operation easier and likely made the FT process smoother. A drawback is the need for extra processing when dealing with the Turbo base model itself, or with models that simply add a LoRA to the initial base model.
Performing the FT itself on an existing SDXL dataset was also questionable. Using the images output by Turbo and Base models would likely improve accuracy. Since the dataset consisted almost entirely of people, Alkahest showed very little background information. However, it did display some content not present in the dataset, indicating that it was beginning to recall the original model's training data. But there seemed to be too few elements to trigger further learning.
In reality, the Turbo and Base models are not yet unified. I plan to verify whether they can truly be unified in the next experiment.
Conclusion/revised
There were two errors in the previous experiment: I did not use each model's base model as the reference when taking the differences, and I drew my conclusions from those results.
When I was making the sample, I felt it was a bit strange, but after one night I realized that I was doing something completely pointless.
The reworked samples are already reflected in the article. The big news is that models with merged differences can now be merged normally. The picture below is the result of merging a Turbo model and a Base model that were each merged via Alkahest. The workflow listed at the beginning shows the setup, but in my environment (32 GB VRAM) I could not run it as is, so I saved each model once and then merged them. The final problem from the previous experiment appears to be resolved, so the next large-scale experiment that was planned will probably not be needed.
The current problem is that Alkahest's images are very different from the standard Z-Image picture. I am currently creating a dataset for the next version, so this is expected to improve, but since the pictures from Turbo and Base are quite different to begin with, some of that effect is expected to remain in the model after merging.


