JoyCLIP has Arrived

JoyCLIP

JoyCLIP is a large-scale training project producing CLIP-L and CLIP-G text encoders for any model that uses them. The models and their benefits are listed below.

Primary Benefit

All Pony models will benefit from the base JoyCLIP for Pony. Improvement is visible on 90%+ of images for ethnicity accuracy, hand placement, and NSFW content. Where the results were neutral on prompt accuracy, I found it roughly 50/50 which image I found more visually appealing.

Realistic models will benefit most.

Secondary Benefit

The JoyCLIP Name Recognition CLIP model can be used in SDXL or Pony, but its primary target is Pony realism models. This CLIP-G model will instantly restore thousands of name triggers from Getty, IMDb, and other sources to Pony with fewer compromises than using the LAION model.

The Name Recognition CLIP in Pony will heavily degrade many Pony features and should only be used in ADetailer face inpainting, or when name recognition is a high priority.

Most Pony users should use JOY_PONY_CLIP-G and JOY_PONY_CLIP-L.
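
For diffusers users, dropping the JoyCLIP encoders into a Pony/SDXL pipeline looks roughly like the sketch below. This is a minimal example, assuming the weights are saved in standard transformers format; the checkpoint name and file paths are placeholders, not real download locations.

```python
# Minimal sketch: swapping JoyCLIP encoders into a Pony/SDXL pipeline.
# All paths below are placeholders; point them at your own files.
import torch
from diffusers import StableDiffusionXLPipeline
from transformers import CLIPTextModel, CLIPTextModelWithProjection

pipe = StableDiffusionXLPipeline.from_pretrained(
    "your/pony-realism-checkpoint",  # placeholder Pony model
    torch_dtype=torch.float16,
)

# SDXL/Pony uses two text encoders: CLIP-L (text_encoder) and
# CLIP-G (text_encoder_2). Replace both with the JoyCLIP versions.
pipe.text_encoder = CLIPTextModel.from_pretrained(
    "path/to/JOY_PONY_CLIP-L", torch_dtype=torch.float16
)
pipe.text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(
    "path/to/JOY_PONY_CLIP-G", torch_dtype=torch.float16
)

pipe.to("cuda")
pipe("photo of a woman waving, detailed hands").images[0].save("joy_test.png")
```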

Tertiary Benefit

The base CLIP-L (not the Pony CLIP-L) can be used in any model that uses CLIP-L. This includes SD 1.5, FLUX, Hunyuan, WAN, SD3, SD3.5, and others.

This model likely improves the NSFW accuracy of those models. However, the Pony CLIPs have been tested thousands of times, while the base CLIP-L has only been tested a few hundred.

Seed-to-seed results across SD 1.5, Hunyuan, WAN, and FLUX need to be reported at least 1,000, if not 10,000, times before I will consider this model an improvement there.
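
If you want to try the base CLIP-L in an SD 1.5 pipeline and report seed-to-seed results, a minimal sketch follows. It assumes the file is a plain safetensors state dict in the standard transformers key layout; the filenames and checkpoint path are placeholders.

```python
# Minimal sketch: loading the base JoyCLIP CLIP-L into SD 1.5.
# Assumes the file uses the standard transformers key layout;
# the filename and checkpoint path are placeholders.
import torch
from diffusers import StableDiffusionPipeline
from safetensors.torch import load_file

pipe = StableDiffusionPipeline.from_pretrained(
    "your/sd15-checkpoint",  # placeholder SD 1.5 model
    torch_dtype=torch.float16,
)

# Overwrite the stock CLIP-L weights in place; load_state_dict
# casts dtypes as it copies, so an fp32 file is fine here.
state_dict = load_file("JOY_CLIP-L.safetensors")
pipe.text_encoder.load_state_dict(state_dict)

pipe.to("cuda")
pipe("a cat on a windowsill").images[0].save("clip_l_test.png")
```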

I do not consider a CLIP training successful unless, out of 100 images, the new CLIP (Joy) fails seed-to-seed no more than 5 times.

A failure is a deformity, a duplicated limb, or something else majorly wrong that the old CLIP does not show on the same seed.

In that same 100 images, the new CLIP (Joy) should show major improvement on 10 to 20 images and minor improvement on 20 to 50.

In most cases (90%+), JoyCLIP improves prompt accuracy whenever accuracy is affected at all. Rarely (2% or less), the standard CLIP outperforms JoyCLIP in hand accuracy or some other visual metric.

I achieved these results on Pony; FLUX and the video models remain untested, as validating them requires thousands of generations to average.
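
If you do run those comparisons, the protocol above boils down to the sketch below: render the same prompt on the same seed through both encoder sets, then review the pairs by hand against the failure and improvement thresholds. The function and directory names here are just for illustration.

```python
# Sketch of the seed-to-seed A/B protocol: same prompt, same seed,
# one image per encoder set, saved side by side for manual review.
# Build pipe_old and pipe_joy with the loading sketches above.
import os
import torch

def render_pairs(pipe_old, pipe_joy, prompt, n_seeds=100, out_dir="ab_test"):
    os.makedirs(out_dir, exist_ok=True)
    for seed in range(n_seeds):
        for tag, pipe in (("old", pipe_old), ("joy", pipe_joy)):
            gen = torch.Generator("cuda").manual_seed(seed)
            img = pipe(prompt, generator=gen).images[0]
            img.save(os.path.join(out_dir, f"{seed:04d}_{tag}.png"))

# Review the pairs by hand, then apply the thresholds above:
# no more than 5 new failures per 100 seeds, 10-20 major and
# 20-50 minor improvements.
```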
