ver.0.6112 Anime ZiBase
This is the final FT version tested on the ver.0.6 dataset. The colors are a little lighter than in 0.68, but this is partly due to a dataset issue. It's not necessarily better than 0.66 or 0.68, but I think it produces a stable image more easily.
Distill LoRA and output settings are surprisingly important. For stable operation, I recommend using both the 4-step version and the ver.1 version at 0.8 and outputting at 5 steps. The image stays almost the same even at 4 steps, so I recommend outputting at 4 steps and increasing the step count only for results that look promising. If you prioritize atmosphere, using ver.1 as the main LoRA and outputting at 8 steps or more will produce interesting results, though faces and other features will change slightly.
This was my first FT with Z-Image, and while I ran into various issues along the way, I was pleasantly surprised by how well the images turned out. I'm also slightly changing my Z-Image captioning strategy, and next time I'll try completely different settings.
ver.0.68 Anime ZiBase Finetuned
Due to file upload problems, the BF16 version of ver.0.68 has been uploaded as the BF16 Full version of ver.0.66.
ver.0.66 Anime ZiBase Finetuned
This is the new FT version, and it's more stable than the first one. There are two BF16 versions: I uploaded one as a trial and it went through, so I'll also be releasing the next version at the same time (it is labeled as the Full version). The next epoch was still training while I was uploading, so there was nothing I could do about that...
Using it as is is not recommended; it takes over 30 steps. Please use the 4-step LoRA. In this case, 4 steps should be fine (if details look rough in the FP8 version, use 8 steps).
You can use it as is, but since it doesn't output only anime pictures, I recommend adding "anime" to the prompt. My outputs have gradually become more stable, but compositions and poses also tend to be uniform. In that case, try the 4-step LoRA at strength 0.5 with 6-step execution. The colors will fade, the details will become rough, and the body structure will often become questionable, but making the generation more unstable seems to reflect the trained content more strongly, so there is a higher chance of getting a rough but interesting picture. Keep the Shift value low (3 or less). Normally, 4-step execution with the 4-step LoRA at 1.0 is fine, but try this when you want more variety. Most of this release's samples are 6-step outputs at 0.5 (which is why the colors are whitish).
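For quick reference, the two configurations described above side by side (values taken directly from these notes; the exact node layout depends on your setup):

```text
Standard (stable):  4-step LoRA @ 1.0, 4 steps
More variety:       4-step LoRA @ 0.5, 6 steps, Shift <= 3
                    (expect faded colors, rough details, shaky anatomy)
```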

At LoRA strength 1.0 the images carry plenty of information, arguably too much; 0.5 is closer to the model's intent. Be aware that lowering the Shift value results in more complex shadows.
For problems with washed-out colors and low contrast, see the sample ComfyUI workflow with color-adjustment nodes, attached to the image of a girl drinking tea.
ver.0.6 Anime Finetuned (test)
This is the FT version using the ver.0.6 dataset.
To be honest, based on the sample images I was expecting to release a proper version around the middle of next week, but I found some images that looked surprisingly usable even at low epochs, so I decided to release early. Uploading to CIVITAI is currently extremely slow, so only the FP8 version is available.
The samples were output in 4 steps using the 4-step LoRA with the BF16 version, but for the FP8 version, please use around 8 steps with the 4-step LoRA. As I wrote before, 8 steps with the 4-step version is recommended over the 8-step version.
The anime style is not yet stable, so I recommend adding "anime" to the prompt. However, the anime images it produces tend to be those already stored in Z-Image rather than those it learned, so I found that a light emphasis such as "[[anime]]" or "[[[anime]]]" to hint that the image is an anime picture works better. This affects shading as well as faces.

Tip: this only works in Forge Neo, but assigning a weight to an undefined word affects the entire image (e.g., image:0.5). Be careful when trying a prompt imported from somewhere else. You can also use this deliberately, e.g., image:0.9 or image:1.1. Values below 1 result in lighter colors and increased sharpness; values above 1 result in darker colors and a blurrier image.
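To illustrate the tip above, a hypothetical prompt (only the image:N syntax comes from the note; the rest of the prompt is made up):

```text
1girl, anime, garden, image:0.9    <- slightly lighter colors, sharper
1girl, anime, garden, image:1.1    <- slightly darker colors, blurrier
```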
Now that I can finally get a proper image out of a Z-Image FT, overall quality will likely improve, but overwriting Z-Image's concepts isn't necessarily the best approach; I think the intermediate versions have value of their own. Adding [[anime]] lets you fully confirm the training effect. A specific weak area at the moment is NSFW content.
However, there was no way to know whether the FT model was usable without converting it...
ver.0.8 Anime (ZiBase)
This is an anime model that was trained using ver.0.5 as a base model.
All samples use Distill LoRA.
Sampler: euler/euler a, Scheduler: simple, Steps: 7, CFG scale: 1.0, Shift: 16
When not using Distill LoRA, set Steps to 30 or higher and CFG scale to about 4.5.
This time I mainly used the 4-step LoRA with 7-step output. With the normal 8-step version, compositions look a little strange. The 8-Steps-2602_UDCAI Edit version doesn't work well even with LCM. However, although it's completely wrong usage, combining the 4-step version and the 8-Steps-2602_UDCAI Edit version both at 1.0 seemed to work best for this model.
If the step count is too low, the quality of the picture itself will drop, so adjust the number of steps to your purpose.
ver.0.6 Anime (ZiBase)
Although it includes a small amount of ver.0.5 data, most of the model was trained on new material. It supports more anime-like, delicate designs. Compared to 0.5, physical stability has improved in some areas and worsened in others. This model is at its best at low Shift values, though body structure becomes more fragile there.
This model focuses on producing varied pictures rather than stable ones, so there are unavoidably some unstable aspects.
If you want stable images, it's better to set ModelSamplingAuraFlow quite high (about 20) and raise the CFG scale. It may also help to fix the denoise value at 1.0.
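As a sketch, the "stable" configuration above in ComfyUI terms (node names as in a default install; the values are the rough suggestions from this note):

```text
ModelSamplingAuraFlow shift: ~20
CFG scale:                   high (raise from your usual value)
KSampler denoise:            1.0
```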
If you think that Z-Image or Qwen-Image images are boring, you may occasionally get better images if you try lowering the Shift value.
If you enter an existing prompt and get a shabby picture, it's probably full of pointless parentheses; delete them all. If noise appears in ComfyUI output, please update ComfyUI.
The colors seem a little paler and brighter on this model. You can adjust them with the CFG scale, but if you're worried about the image changing, try comfyui-latent-vae-tweaker. That tool was originally designed for SD1.5/SDXL, and 16-channel latent formats such as Flux.1 and Z-Image can't be edited at the latent stage, but they can be edited at the image stage. Please refer to the workflow attached to the added images.
*I've since made comfyui-latent-vae-tweaker compatible with the 16-channel latent format. I don't think the average person can control it, but it's there for reference.
ver.0.5 Anime (ZiBase)
This is a test model based on a new anime dataset. Captioning and other features were trained using conventional methods.
While the training resolution was only 1280, the image quality clearly improved over ver.0.4, so it seems that lowering the precision in the previous training test wasn't effective after all. It did use less memory, though.
This time, the base model used was a 16-step run of Z-Image Base, with a small amount of Z-Image Turbo layers mixed in. The basic settings for generating the sample images are: Sampler: DDIM, Scheduler: simple, Steps: 16, CFG: 8.0, Denoise: 0.85, ModelSamplingAuraFlow: 8.0.
I have uploaded three types of files: BF16, FP8, and GGUF Q6_K.
Sorry, something went wrong; it's fixed now.
IF YOU USE GGUF FILE, PLEASE REPLACE loader.py IN THE ComfyGGUF FOLDER.
For convenience, it has been uploaded as training data.
The included lcpp.patch and convert.py are not needed by most people; please use them only if you understand what they do. I also intend to make them compatible with Flux.2, but that hasn't been verified.
・more speed?
Although this model uses its own speed-up method to reduce the number of steps, it is not very fast. Several high-speed LoRAs have been released for Z-Image Base:
Z-Image-Fun-Lora-Distill: doesn't work
Z-Image (base) Distilled Lora | Extracted: works, but looks plastic, so add "anime" to the prompt.

The picture will change, but it may be useful in some cases.

ver.0.4 SemiReal (ZiTurbo)
This version uses exactly the same materials as the 0.4 SR Base version but has a more semi-realistic look. The base model is a mix of Z-Image Turbo and a small amount of Z-Image Base, which may slightly improve the ability to express details.
The sample images make extensive use of high CFG settings. CFG 3 is recommended, but it may fail depending on the sampler and scheduler. At the same time, image quality is controlled by the ModelSamplingAuraFlow shift value (and the denoise value). Please refer to the samples and try various options.
ver.0.4 SemiReal (ZiBase)
This model was created as a test run using Z-Image Base.
I was aiming for a semi-realistic model, but the result produces a variety of different image styles. The captioning is outdated, the resolution is only 1280 (which causes moiré), and the compute precision was kept to a minimum.
It can be used in the same way as a regular ZiB model, but I recommend setting the CFG scale higher (around 6).
I created the GGUF version in 6 bits, but since CIVITAI only supports 16/8/4, I uploaded it labeled as 8-bit for convenience.
ver.0.3 Anime
This is a new version of the anime model. Full fine-tuning has been difficult, so this version also uses LoRA. The amount of data has more than tripled. I think body stability has improved compared to the previous version, but it may still not be sufficient.
Since a wider range of images was used for training this time, it produces photographic images more easily than the previous version. Adding "anime" to the prompt is effective, but note that specifying a weight such as "anime:2" will break the image.
ver.0.2 Real
This is a realistic version of LucidDreamer. I've had difficulty adjusting the prototype, so I've decided to release it as a test version. I'm still getting used to DiT, but with Z-Image, body structures tend to distort more than with other DiT models, so combining LoRAs causes a greater loss of quality. Since finetuning isn't working properly, this can't be helped for the time being. This time, I created three models by applying, at slightly reduced strengths, the three LoRAs I had prepared for building a single checkpoint. They were meant as finetune materials, but using each one alone seems to give better results than stacking multiple LoRAs.
In terms of datasets, 1 and 2 are completely different, while 3 is a subset of 2. 1 is packed with miscellaneous elements and turned out more illustrative than I expected. 2 has a relatively large number of Western women (especially NSFW) and a fair amount of data, but you may not get especially cute girls. 3 is an Asian-style model extracted from 1. Please note that leaving things unspecified often results in a white background.
This series has been adjusted to produce a wide variety of outputs from a single prompt, so if nothing is specified, changing the seed results in larger changes in image and pose.
In terms of dataset type, it is in the same series as the anime version 0.2, so it is called ver.0.2R.
ver.0.2 Anime
I increased the number of training images. I think stability is improved compared to ver.0.0 and 0.1.
Because it was trained with a wide variety of images, the image style is not stable.
If you don't specify anything in the prompt, the image tends toward a cool(?) style; sometimes it looks more like a photo. Adding "anime" stabilizes it to an anime style. Adjust to your liking.
By the way, my fine-tuning failed. After a long time, all I got was noise...
There are still issues with the body structure and prompt tracking, but I don't think the other models are much different...
ver.0.1
I believe Z-Image is expected to be a replacement for SDXL. In ver.0.0, LoRA was applied quite heavily in that direction, adjusted to produce a variety of images similar to those in SDXL. On the other hand, that was a bit of a stretch, and it imposed significant limitations on body structure, etc.
So I gave up on producing anime-style images with the model alone and created ver.0.1, which assumes "anime:2" in the prompt.
Please add "anime" or "anime:2" to the prompt. Tags like "masterpiece" are not very effective; they occasionally help, but more often it's better to remove them.
I've also added two more materials. The body structure still looks strange, but since many elements seem to be missing from the base ZiT model, I'll have to add them little by little. Please wait for testing of the finetuned version.
ver. 0.1 produces sharper results than 0.0. The image's body stability has also been improved (though it's still not perfect).
ver.0.0
This is my first model created with Z-Image Turbo. It may be a bit difficult to handle. I created an anime-style model, but the body structure is unstable.
I haven't set a trigger word, but adding "anime" makes it relatively stable. If it's not enough, try adding "anime:2."

