Published | May 18, 2023 |
Training | Steps: 231,000 Epochs: 100 |
Hash | AutoV2 5E1D7DE310 |
V3 is now live!
As always, you can find all the details, the data we used, the parameters, and code snippets on our Substack: https://followfoxai.substack.com/p/impact-of-tags-on-sd-general-model
Check out our upcoming roadmap below - lots of exciting things ahead!
About V3
Note: this might be a great base for your LORA needs. The model is very neutral, reacts to a wide range of prompt styles, and performs well across multiple image types.
We have added a subset of Booru tags to our images, so now it can react to those tags!
Tags that you should try:
Solo
- puts one character in the generated image; works quite consistently
looking at viewer
- has a strong female bias but does a good job of centering the character and making them look at the camera
outdoors
- works consistently to generate an outdoor environment or to place characters in one
blurry
- prompting with this tag alone consistently generates blurry images. When tested as a negative prompt, it brings some improvement
Blurry background
- works quite well as a positive prompt to mimic the bokeh style of MidJourney
Jewelry
- generates images of jewelry or adds jewelry to the generation
indoors
- works similarly to the outdoors tag
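As a rough illustration of how these tags might be combined with a regular prompt, here is a minimal Python sketch. The helper name `build_prompts` and the choice to append tags after a sentence-style description are our own assumptions for illustration, not something the model requires.

```python
# Tags reported above as working in V3 (stored lowercase for comparison).
KNOWN_TAGS = {
    "solo", "looking at viewer", "outdoors", "indoors",
    "blurry", "blurry background", "jewelry",
}


def build_prompts(description, tags=(), avoid_blur=True):
    """Return a (positive, negative) prompt pair.

    Hypothetical helper: appends the chosen tags to a sentence-style
    description and optionally puts "blurry" in the negative prompt.
    """
    normalized = [t.strip().lower() for t in tags]
    unknown = [t for t in normalized if t not in KNOWN_TAGS]
    if unknown:
        raise ValueError(f"tags not covered in this post: {unknown}")
    positive = ", ".join([description.strip(), *normalized])
    negative = "blurry" if avoid_blur else ""
    return positive, negative


positive, negative = build_prompts(
    "a portrait of a knight in a forest",
    tags=["Solo", "outdoors", "Blurry background"],
)
print(positive)
print(negative)
```

Other Booru tags may also work; the list here only covers the ones discussed above.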
Image Generation Recommendations
The model is versatile, and you can prompt it in almost any style. Whether it is MidJourney style prompts or anything from Civitai or Lexica, you should expect some interesting results in most cases.
Additionally, you can now experiment with the tags that we discussed above.
And finally, we highly recommend using some form of upscale method. Here are two of our favorites:
Hires. Fix
Enable Hires. Fix, set the denoising strength between 0.3 and 0.5, upscale by 1.5-2x, and use the Latent (nearest exact) or 4x-UltraSharp upscaler. The rest of the parameters are quite flexible for experimentation.
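If you drive Automatic1111 through its web API rather than the UI, the Hires. Fix settings above could be expressed roughly as follows. This is a sketch under assumptions: the field names (`enable_hr`, `hr_scale`, `hr_upscaler`, `denoising_strength`) are those of the txt2img API as we understand it, so please check them against the `/docs` page of your own installation, and make sure the upscaler string matches the name your UI lists. The helper name `hires_fix_payload` is hypothetical.

```python
def hires_fix_payload(prompt, width=512, height=512,
                      denoising_strength=0.4, hr_scale=2.0,
                      hr_upscaler="Latent (nearest-exact)"):
    """Build a txt2img request body with Hires. Fix enabled.

    Only builds the dictionary; sending it to the server is left out.
    The range checks mirror the recommendations in this post.
    """
    if not 0.3 <= denoising_strength <= 0.5:
        raise ValueError("denoising strength outside the recommended 0.3-0.5")
    if not 1.5 <= hr_scale <= 2.0:
        raise ValueError("upscale factor outside the recommended 1.5-2x")
    return {
        "prompt": prompt,
        "width": width,
        "height": height,
        "enable_hr": True,                  # turns on the Hires. Fix pass
        "hr_scale": hr_scale,               # upscale factor
        "hr_upscaler": hr_upscaler,         # or "4x-UltraSharp" if installed
        "denoising_strength": denoising_strength,
    }


payload = hires_fix_payload("a castle on a hill at sunset")
print(payload)
```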
ControlNet + Ultimate SD Upscale
Check the ControlNet tile upscale method from our previous post (link).
Upcoming Roadmap
Vodka Series:
Vodka V3 (complete) - adding tags to captions to see their impact
Vodka V4 (in progress) - addressing the ‘frying’ issue by decoupling UNET and Text Encoder training parameters
Vodka V5 (data preparation stage) - training with a new improved dataset and all prior learnings
Vodka V6 (TBD) - re-captioning the whole data to see the impact of using AI-generated captions vs. original user prompts
Vodka V7+, for now, is a parking lot for a bunch of ideas, from segmenting datasets and adjusting parameters accordingly to fine-tuning VAE, adding specific additional data based on model weaknesses, and so on.
Cocktail Series:
These models will be our mixes based on Vodka (or other future base models).
Bloody Mary V1 (complete, unreleased) - Our first mix is based on Vodka V2. Stay tuned for this one: it takes Vodka V2 from a model that generates good images with the proper effort to a model where most generations are very high quality. The model is quite flexible and interesting.
Bloody Mary V2+ (planned): nothing concrete for now except for ideas based on what we learned from V1 and improvements in Vodka base models.
Other cocktails (TBD) - we have plans and ideas for other cocktails, but nothing worth sharing yet.
LORAs, Textual Inversions, and other add-ons:
We have started a few explorations on add-on type releases to boost the capabilities of our Vodka and Cocktail series, so stay tuned for them.
Please note that we will share the posts on these explorations regardless of their success. Some will likely fail, but most importantly, we will learn from the process.
Full User Experiences and Solutions:
This is just the first hint of some of our upcoming releases. We are working on translating some of our accumulated experience and our vision into full release products. Stay tuned as we will be sharing more and more about some of our most exciting projects!
Older Versions and History of Vodka
Overview
TLDR: We are releasing Vodka_V2 by FollowFox.AI, a general-purpose model fine-tuned on an updated dataset, now from Midjourney V5.1. As usual, in this post we will share all the details of how we got there. What you should expect from the model:
We used an objectively better dataset: 2.5x larger and better cleaned.
The resulting model is quite similar to V1 but marginally better. It’s a step up but not a breakthrough-type improvement.
In the current state, we can generate some cool images with some effort.
The model is still far from effortlessly and consistently generating output at the level of MidJourney or even the top SD models.
You can read all the details about the model training process on followfox.ai (link to the post), as we embrace the open-source nature of this community. You can recreate the process, see exactly how we got here, and provide feedback and suggestions on individual aspects of the protocol.
Parameters and Workflow that Works Well for Vodka V2
There is a lot more to test here, but we will share a few observations:
Compared to V1, you can try a wider range of CFG values; anything from 3 to 7.5 can generate good output
Booru tag-only prompts do not work well since we didn’t tag the dataset
Human sentence-type description followed by adjectives and “magic words” works quite well
Almost all samplers seem to generate interesting results.
SD upscale workflow (outlined below) with tile ControlNet enhances the image quality of this model
Using EasyNegative TI (link) is recommended. “blurry” in negative prompts also helps.
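To explore the CFG range and sampler observations above systematically, one option is to enumerate a small grid of settings per prompt. The sketch below is only illustrative: the sampler names and the 1.5 step size are our own example choices, and the key names `sampler_name` and `cfg_scale` are assumptions chosen to resemble Automatic1111's parameter names.

```python
from itertools import product


def cfg_values(low=3.0, high=7.5, step=1.5):
    """Evenly spaced CFG values covering the range that worked for V2."""
    count = int(round((high - low) / step))
    return [round(low + i * step, 2) for i in range(count + 1)]


def sampler_cfg_grid(samplers, cfgs):
    """Every sampler/CFG combination, one settings dict per grid cell."""
    return [
        {"sampler_name": sampler, "cfg_scale": cfg}
        for sampler, cfg in product(samplers, cfgs)
    ]


# Example sampler names; substitute whichever samplers you have available.
samplers = ["Euler a", "DPM++ 2M Karras", "DDIM"]
grid = sampler_cfg_grid(samplers, cfg_values())
for cell in grid:
    print(cell)
```

In the Automatic1111 UI, the built-in X/Y/Z plot script produces the same kind of grid without any code.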
Upscale Workflow to Try in Automatic1111
After generating the initial image you like in the txt2img tab (we recommend doing a grid of different samplers and CFG values for each prompt to find the promising ones), send it to img2img.
Use the same prompt and sampler as in the original generation
Set the sampling steps high; in our case, we used 150 for most of the images
Set width and height to 2x the original, so a 512x512 image should become 1024x1024
Set the denoising strength to something low; we used 0.2 to 0.25.
For the CFG value, we used the (original - 0.5) formula. So if the original image was generated at 7.0, we would set it to 6.5.
ControlNet settings: enable it; for the preprocessor select “tile_resample”, and for the model select “control_v11f1e_sd15_tile”. You can also switch to the “ControlNet is more important” option. No need to adjust any other settings.
Make sure to have the “Ultimate SD upscale” extension installed. Select it from the Script dropdown, select the 4x-UltraSharp upscaler, and set tile width and height to 640x640.
Press generate, wait a bit, and you should have a decent output. You can repeat the process to reach an even higher resolution.
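The numeric parts of the workflow above (2x dimensions, CFG minus 0.5, low denoising strength, high step count) can be derived from the original generation settings. Here is a minimal sketch; the function name `upscale_settings` and the dictionary keys are hypothetical, and the ControlNet and Ultimate SD upscale options are deliberately left out because they are set in the UI as described above.

```python
def upscale_settings(prompt, sampler_name, width, height, cfg_scale,
                     denoising_strength=0.2, steps=150):
    """Derive img2img settings from the original txt2img settings.

    Follows the steps in this post: same prompt and sampler, 2x size,
    CFG lowered by 0.5, low denoising strength, high step count.
    """
    if not 0.2 <= denoising_strength <= 0.25:
        raise ValueError("this workflow used a denoising strength of 0.2-0.25")
    return {
        "prompt": prompt,                 # same prompt as the original
        "sampler_name": sampler_name,     # same sampler as the original
        "steps": steps,
        "width": width * 2,
        "height": height * 2,
        "denoising_strength": denoising_strength,
        "cfg_scale": cfg_scale - 0.5,     # the (original - 0.5) rule
    }


settings = upscale_settings("a castle on a hill", "Euler a", 512, 512, 7.0)
print(settings)
```

To repeat the process for a higher resolution, the output of one pass can be fed back in as the "original" for the next.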
Conclusions and Next Steps
We believe the model development is going in the right direction, and we will continue releasing new versions. And, of course, we will document and release every step of that journey.
For the V3 release, we already have a working hypothesis of where the blurriness and lack of details in some of the generations might be coming from, and we will try to deal with that.