The end is here.
of the year that is. And with that, here's the rough roadmap for the first quarter of 2025!
Naturally Ill (Jan 1st-5th): This is a small fine-tune of Illustrious using parts of the Natural Vision dataset, with captions reworked for booru tagging. Think of it as more of an experimental model... Illustrious and its booru tagging system are not great for photorealism, but you can get some creative generations, and the character knowledge domain of Illustrious has largely been maintained.
NatViS v3.0 (Mid January - Early February): This model builds off of NatViS, with a refined dataset and the inclusion of auxiliary data to adapt it to more 'interesting' concepts that current photorealistic models struggle with.
Anti-Pony, BETA v2 (February-March 🤞): The intentionally provocatively named SDXL fine-tune for beautiful artwork and anime illustrations. The focus is on character knowledge, artist knowledge (including traditional artists), Booru+ tags/Natural Language/Short n' Punchy prompting, and not being yet another generic Danbooru anime model.
Note: If you downloaded the pre-release beta, firstly - thank you 💖. That version was trained on Illustrious as an experiment. The result: Illustrious suffers from the same issues as Pony (I'll do a write-up soon on what those issues are and why they matter). So BETA v2 is fine-tuned exclusively on SDXL. But v2 will be limited compared to models like Pony or Illustrious, specifically in the character knowledge domain. This is because fine-tuning SDXL correctly to reach convergence on those features would cost thousands of US dollars, and would potentially require a small team of volunteers or even paid team members spending countless hours on dataset creation and research & development, plus large accelerator (A100/H100) clusters, to make the model I want to make for the community. Because of this:
Revamping of Ko-Fi/Crowdfunding: My models will always be free and open-source, even if it's a detriment to achieving my goals. It's out of principle. But I'm a realist: the cost of fine-tuning the latest generation of image-synthesis models (SDXL+) is simply too high for one person without financial backing from sponsors/investors. I do what I can with what I have, and I'm truly thankful for those who have donated, with no incentive other than to show appreciation for the models I create. Now it's time to give back to those who want to, and can, support my models. The exact perks/rewards are not yet determined. I'll be uploading an open poll here on Ko-Fi and a civitai post in the coming days to get feedback. One thing that's for certain is that models will always have a public and free release. Outside of the inclusion of rewards/goals/etc., I will start being more active online; sharing progress reports, answering questions, posting guides/tips & tricks, and so on.
Dataset Toolkit: A modern, cross-platform GUI (optional CLI) desktop application for creating/cleaning/sorting/postprocessing datasets for image-generation model fine-tuning. This also includes the weights for trained multimodal model agents, and a robust modular pipeline for image captioning and hallucination correction. In addition, the software supports the OpenAI/Claude/Google and OpenRouter APIs and schemas, with model-specific configs/prompts for use with foundation models. Code will be open-source and open to PRs. (More details soon).
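To make the "modular pipeline" idea concrete, here's a minimal sketch of how swappable captioning/correction stages could be chained. All class and function names here are illustrative assumptions, not the toolkit's actual API: each stage is just a callable that transforms a record, so API backends and cleanup passes can be swapped or reordered freely.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch (names are mine, not the toolkit's). A stage is any
# callable ImageRecord -> ImageRecord; real stages might wrap an OpenAI,
# Claude, Gemini, or OpenRouter call with a model-specific prompt/config.

@dataclass
class ImageRecord:
    path: str
    caption: str = ""

Stage = Callable[[ImageRecord], ImageRecord]

def run_pipeline(record: ImageRecord, stages: List[Stage]) -> ImageRecord:
    # Apply each stage in order; stages stay independent and reorderable.
    for stage in stages:
        record = stage(record)
    return record

def stub_captioner(rec: ImageRecord) -> ImageRecord:
    # Stand-in for a multimodal captioning model.
    rec.caption = "a photo of a red apple on a wooden table, next to a banana"
    return rec

def hallucination_filter(rec: ImageRecord) -> ImageRecord:
    # Stand-in for hallucination correction: drop caption phrases that a
    # (hypothetical) tagger model did not verify as present in the image.
    verified_tags = {"apple", "table"}
    kept = [p for p in rec.caption.split(", ")
            if any(tag in p for tag in verified_tags)]
    rec.caption = ", ".join(kept)
    return rec

result = run_pipeline(ImageRecord("img_0001.png"),
                      [stub_captioner, hallucination_filter])
print(result.caption)  # -> a photo of a red apple on a wooden table
```

The design choice being sketched: because every stage shares one tiny interface, adding a new API backend or a new correction pass is just writing one more function, with no changes to the pipeline runner.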
Register-Token PR (pull request) for Kohya-ss: Register-Tokens were mentioned in the Illustrious paper, with no technical information regarding the implementation. After research and experimentation I'm fairly certain I've figured out how these registers were implemented for the unreleased version of Illustrious. I will be sharing my findings in the coming week, along with a fork of Kohya-ss/sd-scripts that implements the Tag Manipulation technique.
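For readers unfamiliar with the term: in the vision-transformer literature, "register tokens" are extra learnable embeddings appended to the token sequence that absorb global information without corresponding to any vocabulary word. Whether Illustrious does exactly this is unconfirmed (the paper gives no implementation details), but a plausible shape for the idea, applied to an SDXL-style text conditioning, is sketched below. Everything here is my own illustration, not the actual Illustrious or sd-scripts implementation.

```python
import numpy as np

# Illustrative assumption: append N learned "register" vectors to the
# 77-token prompt embedding before it reaches the UNet's cross-attention.
# The register vectors would be trained like any other parameter; here they
# are just randomly initialized to show the shapes involved.

SEQ_LEN, DIM, N_REGISTERS = 77, 2048, 4

rng = np.random.default_rng(0)
prompt_embeddings = rng.normal(size=(SEQ_LEN, DIM))           # text-encoder output
register_tokens = rng.normal(size=(N_REGISTERS, DIM)) * 0.02  # learned parameters

# Cross-attention then attends over 77 + N tokens instead of 77.
conditioning = np.concatenate([prompt_embeddings, register_tokens], axis=0)
print(conditioning.shape)  # -> (81, 2048)
```

The appeal of this kind of approach is that it changes only the conditioning sequence length, so it slots into an existing training loop without touching the UNet architecture itself.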
Look for more items to be added to the roadmap throughout January, and Happy New Year! 😊