Hi everyone!
This is a quick article on how to take your SD1.5 creations to another level.
I will show you a very simple workflow that lets you control the image precisely, with prompt adherence similar to SDXL's dual CLIP, BUT IN ANY LANGUAGE*
* any language you know from the list ;)
Here is the full workflow:
The points of interest are wrapped in purple groups.
It contains two main parts:
Traditional advanced CLIP conditioning on the top half, which controls the visual appearance and style of the resulting picture.
[KILLER FEATURE, ACHTUNG] ELLA-driven conditioning on the bottom half, which replaces good old CLIP. This conditioning not only shows significantly higher prompt adherence, BUT it also SUPPORTS MANY LANGUAGES.
Important Note: I recommend increasing the number of steps, because ELLA keeps changing the image significantly right up to the last steps. It is better to go with a large number of small steps than a few big leaps.
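To get a feel for the "many small steps vs. few big leaps" idea, here is a toy sketch in Python. It is not the ComfyUI sampler: it assumes a simple evenly spaced noise schedule (real schedulers such as Karras space the levels differently), and the start and end noise levels of 14.6 and 0.03 are only illustrative values I picked for an SD1.5-like range. The function names are made up for this example.

```python
def linear_sigmas(sigma_max, sigma_min, steps):
    """Return steps + 1 noise levels, evenly spaced from sigma_max down to sigma_min.

    Assumption: a linear schedule, used only to illustrate step size.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    stride = (sigma_max - sigma_min) / steps
    return [sigma_max - i * stride for i in range(steps + 1)]


def largest_step(sigmas):
    """Biggest drop in noise level between two consecutive steps."""
    return max(a - b for a, b in zip(sigmas, sigmas[1:]))


if __name__ == "__main__":
    # Illustrative noise range only, not read from any real model config.
    for steps in (20, 40, 60):
        sigmas = linear_sigmas(14.6, 0.03, steps)
        print(f"{steps} steps -> largest single leap: {largest_step(sigmas):.3f}")
```

Doubling the step count halves the size of every leap in this toy schedule, so a conditioning that is still reshaping the image near the end gets more, gentler updates to work with.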
Important Note #2: at the time of writing there were two custom nodes for ELLA in ComfyUI. The one with the "wrapper" suffix works with different FLAN-T5 models, but it uses a custom version of the sampler and cannot be combined with CLIP conditioning. The other custom node, without "Wrapper" (shown in the screenshots), can be combined with CLIP, but I managed to get it working only with FLAN-T5-Encoder-only-bf16. I didn't try the full FLAN-T5-XL model, because it is too big.
The main ideas are captured in the notes inside the workflow. I hope you can read them from the screenshots.
If you have the latest ComfyUI, you can just take one of the latest images I posted to my CinEro SD1.5 model and drop it into ComfyUI to load the workflow.
PS: I marked the article as XXX intentionally, because ELLA is a superpower and it is your responsibility to avoid breaking the law. DON'T MISUSE or ABUSE THIS PIECE OF SOFTWARE!!!