Published | Jun 4, 2024 |
Hash | AutoV2 0D9A46F41A |
Overview
I am an artist with complete Aphantasia. That means I do not have a mind's eye. I used to think the phrase "picture it in your mind" was a figure of speech. When I discovered the vivid world inside the minds of others, I was heartbroken. Around the same time, a friend introduced me to Stable Diffusion. Here was something that could turn my mental narratives back into images. It felt like a gift.
I set out to explore the possibilities of generative AI as an assistive technology for concept visualization. I vowed to create and share resources to enable this for others. That's how this checkpoint began. For this purpose, I believe a successful checkpoint should be:
General purpose
Efficient
Accessible
I set about finding out how to define and accomplish that. I found Civitai. By looking around, I arrived at definitions for those terms, which became goals for this checkpoint.
A general-purpose checkpoint can generate images in many different styles, is based on a model that is popular and shows promise of staying popular, and works well with LoRAs.
An efficient model can generate usable images often and in few steps. Looking around at LCM and Turbo models, four steps became my target.
An accessible model is readily available at a popular destination and easy to prompt for.
I then began experimenting with block merging. I decided on the SDXL family, with the Turbo version being particularly attractive for its speed. Then I merged in the SDXL DPO U-Net to increase output quality. I was just about satisfied with the result, 1024x1024 images in four steps, when SDXL Lightning was introduced. I could not ignore it because it aligned so closely with my goals, so I postponed uploading the prior version in order to incorporate any gains SDXL Lightning could provide. That formed the basis for the first version.
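For readers unfamiliar with block merging, here is a minimal sketch of the general idea: two sets of weights are interpolated with a different ratio for each group of U-Net blocks. It is not the recipe used for this checkpoint. Plain Python lists stand in for tensors, and the key names, block prefixes, and ratios are all hypothetical.

```python
# Minimal sketch of block-weighted merging. Lists of floats stand in for
# tensors; key names, block prefixes, and ratios are hypothetical.

def block_of(key):
    """Return the coarse U-Net block group a parameter key belongs to."""
    for prefix in ("input_blocks", "middle_block", "output_blocks"):
        if key.startswith(prefix):
            return prefix
    return "other"


def block_merge(model_a, model_b, ratios, default_ratio=0.5):
    """Interpolate two state dicts as (1 - r) * A + r * B, with r chosen per block."""
    merged = {}
    for key, a_values in model_a.items():
        if key not in model_b:
            # Keep A's weights unchanged when B has no matching parameter.
            merged[key] = list(a_values)
            continue
        r = ratios.get(block_of(key), default_ratio)
        merged[key] = [(1 - r) * a + r * b for a, b in zip(a_values, model_b[key])]
    return merged


# Toy example: keep A's input blocks, average the output blocks.
model_a = {
    "input_blocks.0.weight": [1.0, 2.0],
    "output_blocks.0.weight": [4.0, 4.0],
    "extra.bias": [9.0],
}
model_b = {
    "input_blocks.0.weight": [3.0, 4.0],
    "output_blocks.0.weight": [0.0, 8.0],
}
ratios = {"input_blocks": 0.0, "output_blocks": 0.5}
merged = block_merge(model_a, model_b, ratios)
```

A ratio of 0 keeps the first model's block as it is, a ratio of 1 takes the second model's block, and values in between blend the two. Real merging tools apply the same arithmetic to tensors and usually expose finer-grained block groups.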
Version 1
Based on SDXL Lightning, version 1 performs well at the target size of 1024x1024 in four steps with DPM++ SDE Karras at CFG Scales 1 through 2.5.
Version 2
Based on SDXL Hyper and fine-tuned with a dataset of over a thousand curated images, version 2 performs well at the target size of 1024x1024.
In an efficiency improvement over version 1, great results can be achieved in three to four steps with DPM++ SDE Karras at CFG Scales 1 through 2.5.
Additionally, amazing images can be produced in five to six steps with Euler Ancestral Simple at CFG Scales 1 through 2.5.
Version 2 + PCM
This version incorporates the best parts of version 2 and a Phased Consistency Model (PCM). Generated images are similar to those from version 2, but perhaps a bit more vivid. In the same configuration, it also produced more visually pleasing results with AnimateDiff SDXL.
Like version 2, great results can be achieved in three to four steps with DPM++ SDE Karras at CFG Scales 1 through 2.5.
Additionally, amazing images can be produced in five to six steps with Euler Ancestral Simple at CFG Scales 1 through 2.5.
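The recommended settings for all three versions can be collected into a small lookup, sketched below. The sampler names, step ranges, CFG range, and image size come from the descriptions above. The `Preset` class, the version keys, and the `matching_samplers` helper are hypothetical names made up for illustration and are not part of any UI or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """One recommended sampler configuration (hypothetical helper class)."""
    sampler: str
    min_steps: int
    max_steps: int
    cfg_min: float = 1.0
    cfg_max: float = 2.5
    width: int = 1024
    height: int = 1024

    def accepts(self, steps, cfg):
        """True if the step count and CFG scale fall inside the recommended ranges."""
        return (self.min_steps <= steps <= self.max_steps
                and self.cfg_min <= cfg <= self.cfg_max)


# Version 2 and version 2 + PCM share the same recommendations.
V2_PRESETS = [
    Preset("DPM++ SDE Karras", 3, 4),
    Preset("Euler Ancestral Simple", 5, 6),
]

PRESETS = {
    "v1": [Preset("DPM++ SDE Karras", 4, 4)],
    "v2": V2_PRESETS,
    "v2+pcm": V2_PRESETS,
}


def matching_samplers(version, steps, cfg):
    """List the samplers recommended for this version at the given steps and CFG."""
    return [p.sampler for p in PRESETS[version] if p.accepts(steps, cfg)]
```

For example, `matching_samplers("v2", 5, 2.0)` returns only the Euler Ancestral Simple preset, while an empty list means the chosen step count or CFG scale falls outside every recommended range for that version.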