SD 1.5 - Triple Length
I can't be the only one who misses the diversity that a single prompt could yield. Granted, it is equally frustrating to have a clear image in mind, spell out what you want, and still not get it.
a highly detailed cinematic portrait of a futuristic warrior, ultra realistic, 8k
My current training run, still in progress, is SD 1.5 with the 248-token CLIP.
Training the attention was more intensive than training the UNET, since it was a double load: comparing the 77-token context against the 248-token one, like a reverse distillation.
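To make the 77 vs 248 comparison concrete, here is a toy sketch in plain Python. It is not my training code. It assumes a single attention head, skips the learned key/value projections a real UNET has, and uses random vectors as stand-ins for the two encoder outputs. All names (cross_attention, mse, teacher_ctx, student_ctx) are made up for illustration. It only shows that cross-attention accepts either context length and that the two outputs can be compared with a distillation-style loss.

```python
import math
import random


def cross_attention(queries, context):
    """Single-head scaled dot-product attention with keys = values = context."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, context)) / total
            for i in range(dim)
        ])
    return outputs


def mse(a, b):
    """Mean squared error between two equally shaped lists of vectors."""
    count = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / count


rng = random.Random(0)
DIM = 8  # toy width; SD 1.5 text embeddings are 768 wide


def rand_vectors(n):
    return [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(n)]


latent_queries = rand_vectors(16)  # stand-in for image latent positions
teacher_ctx = rand_vectors(77)     # stand-in for the 77-token encoder output
student_ctx = rand_vectors(248)    # stand-in for the 248-token encoder output

# Both context lengths give one output vector per query, so they can be compared.
teacher_out = cross_attention(latent_queries, teacher_ctx)
student_out = cross_attention(latent_queries, student_ctx)
loss = mse(teacher_out, student_out)
```

The output size depends only on the number of queries, which is why the same UNET attention layers can take a longer context at all. Running both contexts for every step is where the double load comes from.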
The UNET pictured above is coming along nicely.
I do not have a name for the 3x length CLIP SD 1.5 yet. Input welcome.
Some of my other thoughts:
Take the diversity of SD 1.5 and use it as prompt enhancement for models that use Mistral (Ernie). In theory, you could feed the image to Mistral's VL model and have it generate a prompt from the image SD produces in the roughly 1 second it takes to generate at 512x512.
I truly think this would be faster than the PE (Prompt Enhancement).
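A rough sketch of how that loop could be wired up, in plain Python. Everything here is hypothetical: enhance_prompt, fake_sd15 and fake_vl are placeholder names, and the two stubs stand in for a real SD 1.5 pipeline and a real VL captioner, which you would pass in yourself. The timer is there so the speed claim can actually be checked against PE.

```python
import time
from typing import Callable, Tuple


def enhance_prompt(
    short_prompt: str,
    generate_image: Callable[[str], bytes],
    describe_image: Callable[[bytes], str],
) -> Tuple[str, float]:
    """Draft an image from the short prompt, then caption it into a longer prompt."""
    start = time.perf_counter()
    draft = generate_image(short_prompt)   # e.g. one 512x512 SD 1.5 sample
    detailed = describe_image(draft)       # e.g. a VL model describing the draft
    elapsed = time.perf_counter() - start
    return detailed, elapsed


# Stubs so the sketch runs on its own; swap in real models to try the idea.
def fake_sd15(prompt: str) -> bytes:
    return prompt.encode("utf-8")


def fake_vl(image: bytes) -> str:
    return "a detailed description of: " + image.decode("utf-8")


enhanced, seconds = enhance_prompt("futuristic warrior", fake_sd15, fake_vl)
```

Because SD 1.5 varies so much from seed to seed, calling this several times with the same short prompt should give several different detailed prompts to pick from.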
Let me know if you have any interest in 3x length SD 1.5


