Future Plans for the Kohaku-Series Models

I guess some of you are always wondering when the next update for Kohaku XL (or Kohaku vX) will come, and whether I will train models for Stable Cascade or SD3.

So here is basically what I plan to do:

Kohaku XL

Kohaku XL delta is being trained on 3.6M images, and it will be the last major version of Kohaku SDXL. I may update it with small improvements, but I don't think a KXL epsilon will ever happen.

However, if SD3 uses the same VAE as SDXL, and the "flow matching" they mentioned is used as an upscaler [1], I may release a new model trained at a lower latent resolution, which is more manageable for me...

Kohaku SD3

SD3 comes in several scales ranging from 0.8B to 8B parameters, possibly with different text encoder architectures (we assume T5, but we are not sure about its size).

Since finetuning the 8B model may require a lot of compute (VRAM is not the main problem here, since you can actually use LoKr to train a base model; I'm using LoKr to train KXL delta too), I may choose the second or third largest model to train instead. I assume it will be around 2~4B, which should be friendlier for both me and users.
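To give a rough idea of why LoKr keeps base-model training cheap, here is a minimal NumPy sketch of a Kronecker-product (LoKr-style) weight update and its trainable-parameter count. It is a simplification, not LyCORIS's actual implementation: the `split_dim` rule, the 1280x1280 layer size and the random values are hypothetical, and real LoKr has more options (e.g. low-rank decomposing the larger factor, scaling, zero initialization).

```python
import numpy as np

def split_dim(dim: int, factor: int) -> tuple[int, int]:
    """Hypothetical helper: split `dim` into (a, b) with a * b == dim,
    picking the largest divisor a that does not exceed `factor`."""
    for a in range(min(factor, dim), 0, -1):
        if dim % a == 0:
            return a, dim // a
    return 1, dim

def lokr_delta(w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    # Kronecker product: (a_out x a_in) kron (b_out x b_in) -> (a_out*b_out) x (a_in*b_in)
    return np.kron(w1, w2)

out_dim, in_dim, factor = 1280, 1280, 8   # hypothetical layer size and LoKr factor
a_out, b_out = split_dim(out_dim, factor)
a_in, b_in = split_dim(in_dim, factor)

rng = np.random.default_rng(0)
w0 = rng.standard_normal((out_dim, in_dim))   # frozen base weight (random stand-in)
w1 = rng.standard_normal((a_out, a_in))       # small trainable factor (random, shapes only)
w2 = rng.standard_normal((b_out, b_in))       # large trainable factor (full matrix here)

delta = lokr_delta(w1, w2)
w_eff = w0 + delta                            # weight actually used in the forward pass

full_params = out_dim * in_dim                # trainable params for a full finetune of this layer
lokr_params = w1.size + w2.size               # trainable params for the LoKr update
print(delta.shape, full_params, lokr_params)
```

With factor 8, the trainable update for this layer has roughly 1/64 of the parameters of a full finetune (25,664 vs 1,638,400), so gradients and optimizer states shrink accordingly, while the base weights stay frozen.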

If you want an 8B anime model, maybe wait for Pony or Animagine. They have way more compute than me XD.

But this is not a final decision: if I get sponsors for compute, I may be able to train the larger scale.

Either way, I will definitely make Kohaku SD3.

Kohaku SD1

SD1 has a larger community, and using it to refine style is still a good idea in my opinion. I may casually update it with slight improvements. For me, the Kohaku SD1 models are best used as a style refiner or together with ControlNet. I don't think I have anything else to add to the model; it already has the style I want.

Stable Cascade

Sadly, I will not train a Kohaku SC. I don't like its architecture, and its scale feels weird to me (3.58B Stage C + 0.69B text encoder + 1.56B Stage B).
Its weights are basically ruined, so finetuning it is not stable; also, you cannot run it in FP16 natively.

So in the end, I only made this FP16-fixed version as my small contribution to the SC community:
KBlueLeaf/Stable-Cascade-FP16-fixed (Hugging Face)
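As background on the FP16 point: FP16 cannot represent magnitudes above 65504, so a model whose weights produce very large activations overflows (inf/nan) when run in half precision. The NumPy sketch below uses a hypothetical, randomly scaled weight as a stand-in for a real layer. It shows the overflow and one generic mitigation: shrinking a layer that feeds into a scale-invariant normalization (plain LayerNorm here), which keeps values in range without changing the normalized output. This is an assumption-based illustration, not necessarily how the FP16-fixed checkpoint above was made.

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # largest finite FP16 value (65504)

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # LayerNorm without affine params: (almost) invariant to scaling x by a positive constant
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
# Hypothetical "blown-up" weight: outputs have a std of roughly 40000
w = (rng.standard_normal((64, 64)) * 5000).astype(np.float32)

y32 = x @ w  # FP32 reference output

with np.errstate(over="ignore", invalid="ignore"):
    # Naive half-precision run: some outputs exceed FP16_MAX and become inf/nan
    y16 = x.astype(np.float16) @ w.astype(np.float16)

# Mitigation sketch: shrink the weight by a hypothetical factor before casting,
# so outputs stay well inside the FP16 range; LayerNorm removes the scale afterwards.
scale = 1 / 16
y16_fixed = x.astype(np.float16) @ (w * scale).astype(np.float16)

print(np.isfinite(y16).all(), np.isfinite(y16_fixed).all())
print(np.abs(layer_norm(y32) - layer_norm(y16_fixed.astype(np.float32))).max())
```

The trick only works where the following operation ignores the overall scale; anywhere else the scale has to be compensated explicitly.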

Conclusion

SD1: Basically no updates

SDXL: No major updates after KXL-delta

SD3: Will focus on the 2~4B scale; 8B is still on the TODO list if I can get sponsors for compute.

SC: NO

Reference

[1] Boosting Latent Diffusion with Flow Matching, arXiv:2312.07360 (arxiv.org)

Appendix

I will write a few more articles explaining how to train base models with LyCORIS. If you are interested or have a direct question, leave a comment below.

The thumbnail and attachments are results from Kohaku XL delta at 50% training progress. It is trained from a merge of gamma rev2 and beta7, using LoKr (factor 2~8). The main purpose is to show that you can use LyCORIS to finetune a base model. (Actually, Kohaku V3/V4/V5 were all trained with LyCORIS.)

Samples of Kohaku XL delta (50% progress)

These images were generated at a resolution slightly higher than 1024x1024 (even though the model is trained at 1024x1024) and then upscaled with hires fix to 3072x1728 (or 1728x3072).
Interestingly, almost all of my models perform better at resolutions higher than the one they were trained at.
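For a sense of scale, here is the latent-space bookkeeping behind these resolutions, assuming SDXL's standard VAE (8x spatial downsampling, 4 latent channels). The helper names are hypothetical and only do arithmetic; nothing here runs a model.

```python
VAE_DOWNSAMPLE = 8   # SDXL's VAE shrinks each spatial side by 8x
LATENT_CHANNELS = 4  # SDXL latents have 4 channels

def latent_shape(width: int, height: int) -> tuple[int, int, int]:
    """Hypothetical helper: (channels, height, width) of the latent for an image size."""
    if width % VAE_DOWNSAMPLE or height % VAE_DOWNSAMPLE:
        raise ValueError("width and height must be multiples of 8")
    return (LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)

def latent_area(shape: tuple[int, int, int]) -> int:
    # Number of spatial latent positions the UNet has to process
    return shape[1] * shape[2]

train = latent_shape(1024, 1024)   # training resolution
final = latent_shape(3072, 1728)   # resolution after hires fix
ratio = latent_area(final) / latent_area(train)
print(train, final, ratio)
```

So the final 3072x1728 image corresponds to a 216x384 latent, about 5x the latent area the model saw during training.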