Foreward and Down
I've been struggling since the new Forge WebUI update dropped earlier this month, and I've seen related issues come to the fore across the board as FLUX, the new hotness, rides its wave of well-deserved success. The issue, as always, is that the gateway to certain kinds of functions is kept by both hardware and skill.
Only one of those is usually resolved at a time. I've had to go back to basics and thoroughly test every LoRA, fixer, embed, and model for compatibility problems, one at a time, slowly culling problematic models from the pool I use in my generation techniques. While no individual model or LoRA, barring minor exceptions, has needed to be removed from my pool entirely, certain kinds of checkpoints have without question had major issues with the changes made to accommodate FLUX models.
Most notable is the way the UNet and CLIP now handle word choice and the balance of prompt word count and token allocation. This demands much stricter consideration of possible versus impossible prompt definitions when generating images. Contradictions, such as actions that cannot be done while standing or sitting, matter increasingly to how your UI and model of choice interpret your inputs.
So why does this matter?
This matters mostly in regard to image-generation errors, bricking, artifacting, and problematic-content images.
I ran some tests using the same model, same prompts, same LoRAs, same seed, and so on, comparing my older images with new ones generated on the new UI and tools. The output is different, not always radically so, but without question replication is no longer possible, and that throws out almost all, if not all, of the accumulated knowledge I've read, seen, or been told about how prompt input functions.
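For anyone wanting to run the same comparison, here's a minimal sketch of a fixed-seed A/B test using the diffusers library; the checkpoint ID, prompts, and file name are placeholders, not my actual test setup:

```python
# Minimal fixed-seed A/B sketch: pin every input, vary only the backend
# version, then diff the outputs. Assumes diffusers and an SDXL checkpoint;
# swap in whatever model/LoRA combo you are actually testing.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(1234)  # pin the seed
image = pipe(
    prompt="portrait of a knight, dramatic lighting",
    negative_prompt="lowres, blurry, extra fingers",
    num_inference_steps=30,
    generator=generator,
).images[0]
image.save("after_update_1234.png")  # compare against the pre-update render
```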
However, I have more or less discovered a few important things so far for identifying models likely to cause problems or brick images; things to look out for:
As usual, compatibility matters: make sure you are on an SDXL checkpoint, or better yet a Pony model built on top of SDXL, when using a given LoRA.
Check training robustness: if the model was not trained across at least a few checkpoints, or styles of checkpoint, it is far more likely to brick itself on some keywords.
The number of tokens/words in positive versus negative prompting is significantly more sensitive, and formerly safe CFG values might be less so (see the token-count sketch after this list).
Check showcase images that stack multiple LoRAs against the main demos for existing checkpoint bias. If the base model is doing most of the work, the LoRA isn't going to be cross-model compatible.
Weighting still has influence, but lowering it no longer reliably mutes a LoRA; some can and will brick images even at near-zero weight simply by being added to the generation pool (hence the weight-sweep sketch further down).
I've noticed a definite compatibility bias toward ComfyUI-trained models in the new back end, but it is nearly impossible to tell who is and is not training that way unless they explicitly say so.
Beware of source-image redundancy and unintended feature inclusion. Building a LoRA on a dataset whose images share too many other characteristics will taint the target concept with effects you may not want in your images, though this subject in particular needs individual experimentation.
Model limitations relative to these LoRAs are also now a factor; realism and drawn styles clash far more than they did before.
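On the token-count point above, one way to audit your prompts is to run them through the same CLIP tokenizer the SDXL family uses and compare positive and negative counts against the 77-token window. A minimal sketch, assuming the transformers package; the prompts are placeholders:

```python
# Count how many CLIP tokens each prompt actually consumes. Anything past
# the 77-token window gets chunked or truncated, depending on your UI.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

positive = "1girl, standing, detailed background, cinematic lighting"
negative = "lowres, bad anatomy, extra fingers, watermark"

for label, prompt in (("positive", positive), ("negative", negative)):
    ids = tokenizer(prompt).input_ids  # includes start/end tokens
    print(f"{label}: {len(ids)} tokens (CLIP window is 77)")
```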
The upshot is that every LoRA you get and every new model you want to try out likely needs significant adjustment in prompt craft and LoRA usage relative to before. Pony was supposed to resolve a fair amount of this, as will the FLUX CFG distillation pool over time, but that too will have its own issues.
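When re-testing each LoRA, the weight behavior mentioned above is easy to probe with a strength sweep on a fixed seed; if a LoRA distorts the image even at the near-zero end, it goes on the problem list. A minimal sketch using diffusers' LoRA loading; the LoRA file and prompt are placeholders:

```python
# Sweep one LoRA across several strengths on a fixed seed and save each
# result. Bricking at near-zero scale means the LoRA itself is the
# problem, not the weighting.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("my_lora.safetensors")  # placeholder path

for scale in (0.05, 0.3, 0.6, 1.0):  # include a near-zero weight on purpose
    generator = torch.Generator("cuda").manual_seed(42)  # same seed each pass
    image = pipe(
        "1girl, standing, park background",
        cross_attention_kwargs={"lora_scale": scale},
        generator=generator,
    ).images[0]
    image.save(f"lora_scale_{scale}.png")
```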
Flux Capacity: Forge Ahead
I found Forge fairly easy with the new update; handling Flux was just a matter of dropping in the checkpoint and VAE and you're good to go. Flux is fairly lacking on NSFW, but given the nature of it, that will fall to custom-trained merges, LoRAs, and third-party development, just like everything else in that vein. No one wants the liability, I'd imagine.
That aside, I will note a few discoveries from the new update regarding FLUX:
Choose non-source checkpoints carefully; some are broken monster merges with old SDXL/Pony material simply reposted or half-merged sloppily into the mix.
You need to source LoRAs carefully unless you have monstrously more VRAM than 12 GB, since generation times are still pretty high.
A step count of 15+ is my minimum recommendation for anything with text, more detail than basic shapes, or human-level figures.
Sampler and scheduler seem about the same as usual, but I did notice resolution matters more.
Skip hires fix unless you are targeting sizes above 1500x1500; even with ESRGAN tile scaling it's just too slow on Flux. Running your Flux generation through img2img on a separate checkpoint at a low enough denoising strength seems perfectly viable, though (see the sketch after this list).
Note that you will have issues trying to put text in more than one place, or with extremely long strings of text; keep it short.
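Pulling those notes together, here's a minimal sketch of the Flux-then-img2img flow described above, using diffusers' FluxPipeline and an SDXL img2img pass at low denoising strength. The model IDs are the stock ones and the prompts are placeholders, so treat it as a starting point, not my exact pipeline:

```python
# Render with Flux (15+ steps, explicit resolution), then refine on a
# separate SDXL checkpoint via img2img at low strength instead of hires
# fix. CPU offload is what keeps ~12 GB cards from running out of VRAM.
import torch
from diffusers import FluxPipeline, StableDiffusionXLImg2ImgPipeline

flux = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
flux.enable_model_cpu_offload()

base = flux(
    prompt='a storefront with a sign reading "FORGE", photorealistic',
    num_inference_steps=20,  # 15+ recommended above for text and detail
    height=1024,
    width=1024,
    guidance_scale=3.5,  # Flux's distilled guidance, not classic CFG
).images[0]

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
refiner.enable_model_cpu_offload()

final = refiner(
    prompt="photorealistic storefront, crisp signage",
    image=base,
    strength=0.25,  # low denoise so the Flux composition survives
).images[0]
final.save("flux_refined.png")
```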
Prompting style is also very different with Flux, as noted in its documentation; as anyone who's managed to play with the distillation slider can tell you, the words themselves don't matter as much as their meaning in the internal dictionary. Since even on my system images take a minimum of 2-3 minutes to generate at reasonable sizes, with me nuking every background process I have to shave time and avoid crashes, I've not spent too much time exploring this yet.
So where from here?
I think the next step will largely come from FLUX integration, optimizations, and how well their video model works out. If, in the next six months, their video model can sustainably do even 10 seconds of accurate guided video, every other model and process will be functionally dead overnight.
Tack on video-to-video editing and some metadata machinations, and NVIDIA stock will shoot to the moon again, as the process will crush all the much bigger paid-access generation tools out there overnight.
Even if FLUX stays as is, without serious competition from any other dev team out there right now, it will do a lot for meme warfare, and with the new face-swap ControlNet-style tools that came out this month, fun times will be had by all who can update their hardware enough to keep generation times from slogging.
Since you're here
I'd love to hear more from followers and from people stalking my other, less tame content (or my tame content), you degenerate hand-holders, about what more I should set up for my overnight runs, further big LoRA-limit tests, or in general things to keep me from getting bored and frustrated by the updates.
I think Civitai has done a good job gamifying generation: building interaction into the process of getting more content generated while the content drives traffic in its own right, and using these generations to cumulatively build more robust training data and tools internally, grinding and refining the very best from the thousands of us who contribute to, use, and enjoy the content ecosystem here.
So please do ask me questions, tell me what potentially causes certain image/prompt issues, and enjoy the endless surges of post dumps I do every day. I hope to hear from you all more while I continue to bug-fix my WebUI.