
Gemma CLIP
Every metric says this training should have failed. It used the Gemma vision encoder projected down to CLIP's vision size.
So that is a vision model with its hidden dimension reduced by more than half and its input resolution cut from 896x896 down to 224x224.
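To make the setup concrete, here is a minimal sketch of the kind of reduction involved, assuming a simple learned linear layer maps the Gemma vision latents down to CLIP's width and a plain resize handles the resolution drop. The dimensions, names, and layers are placeholder assumptions for illustration, not the actual training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder widths -- the post only says the reduction is more than half.
GEMMA_VISION_DIM = 1152   # assumed Gemma vision tower hidden size
CLIP_VISION_DIM = 768     # assumed CLIP vision hidden size

class GemmaToClipProjector(nn.Module):
    """Projects Gemma vision latents down to CLIP's vision width (sketch)."""
    def __init__(self, in_dim: int = GEMMA_VISION_DIM, out_dim: int = CLIP_VISION_DIM):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: (batch, num_patches, in_dim) from the Gemma vision encoder
        return self.proj(latents)

# Images also get downsampled from Gemma's 896x896 to CLIP's 224x224 input.
images = torch.rand(2, 3, 896, 896)
images_224 = F.interpolate(images, size=(224, 224), mode="bilinear", align_corners=False)

projector = GemmaToClipProjector()
gemma_latents = torch.randn(2, 256, GEMMA_VISION_DIM)
clip_sized = projector(gemma_latents)
print(images_224.shape, clip_sized.shape)  # (2, 3, 224, 224), (2, 256, 768)
```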
The latents created by the model should have been so far out of alignment that training could not possibly work. The training graph shows a model that is very unhappy.
And yet, the CLIP model retains text. In fact, it learned concepts that I definitely did not have training images of:
Dog on Fire

Cat underwater

Many other complex prompts also tested better with Gemma CLIP.
But I sat on this for a few days and questioned even releasing it. It appears to work even better than my prior work with distillation, but only when the prompt is very short.
This model does not do as well with long prompts as my distilled CLIP, which I am happy about, since that was the whole point of the distilled training.
With low token counts Gemma CLIP excels, but with long prompts the distilled CLIP far outperforms it.