SDXL 1.0 (Distilled/Predictive)
Full FP32 Model
Both text encoders, CLIP-G and CLIP-L, are trained and distilled.
Distilled: The CLIP models were distilled from larger text/token projections in a teacher/student setup; the goal is for the smaller model to learn the latent shape of the larger one.
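The teacher/student idea can be sketched as follows. This is a minimal illustration, not SDXL's actual training code: the dimensions, the linear student, and the plain MSE objective are all assumptions chosen to show how a smaller projection is pulled toward the teacher's latents.

```python
import numpy as np

# Hypothetical teacher/student distillation sketch on text embeddings.
# Dimensions and the linear "models" are illustrative, not SDXL's setup.
rng = np.random.default_rng(0)

TEACHER_DIM = 16   # stand-in for the larger encoder's embedding size
INPUT_DIM = 8      # stand-in for the token-projection input size

# Frozen "teacher" projection the student tries to imitate.
teacher_W = rng.normal(size=(TEACHER_DIM, INPUT_DIM))

# Trainable "student" projection, initialised randomly.
student_W = rng.normal(size=(TEACHER_DIM, INPUT_DIM))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

lr = 0.01
x = rng.normal(size=(INPUT_DIM, 32))  # a batch of token features

loss_before = mse(student_W @ x, teacher_W @ x)
for _ in range(200):
    err = student_W @ x - teacher_W @ x   # residual vs teacher latents
    grad = 2.0 * err @ x.T / x.shape[1]   # d(MSE)/d(student_W)
    student_W -= lr * grad                # step the student toward the teacher
loss_after = mse(student_W @ x, teacher_W @ x)

print(loss_after < loss_before)  # the student's latents converge on the teacher's
```

In a real distillation run the student would match the teacher across many prompts, so its embedding space ends up shaped like the larger model's even though it has fewer parameters.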
Predictive: The CLIP models received separate training in which the padding token was used as a mask, allowing the model to predict additional context when the 75-token limit is not fully used.
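To make the padding-as-mask idea concrete, the sketch below shows how a short prompt leaves unused slots in CLIP's 77-token window (start marker, up to 75 text tokens, end marker) that could serve as prediction targets. The specific token IDs and the PAD value are assumptions for illustration; the masking objective itself is only sketched, not implemented.

```python
# Hypothetical sketch: padding positions in CLIP's 77-slot token window
# treated as mask slots where extra context could be predicted.
START, END, PAD = 49406, 49407, 0  # CLIP-style BOS/EOS ids; PAD id is an assumption

MAX_TOKENS = 77  # 75 text tokens plus start/end markers

def pad_prompt(token_ids):
    """Wrap token ids with start/end and pad out to the 77-slot window."""
    seq = [START] + token_ids[: MAX_TOKENS - 2] + [END]
    seq += [PAD] * (MAX_TOKENS - len(seq))
    return seq

def mask_positions(seq):
    """Positions where a padding-as-mask objective could predict context."""
    return [i for i, t in enumerate(seq) if t == PAD]

seq = pad_prompt([320, 1125, 539])  # a short 3-token prompt
slots = mask_positions(seq)

print(len(seq))    # 77
print(len(slots))  # 72 padding slots left over for prediction
```

A short prompt leaves most of the window as padding, which is why this training only helps when the 75-token budget is not already exhausted.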