Follow-up: Why replacing CLIP in ComfyUI works and simultaneously breaks Model Toolkit

This is going to be a quick article showing why Model Toolkit sees ConcoctionMix's CLIP as broken, but not in the usual way that Model Toolkit can fix. It also explains why this "breakage" doesn't matter in practice for any model whose CLIP was replaced with zer0int's CLIP files.

While reading https://rentry.org/clipfix to diagnose why Model Toolkit failed to detect the CLIP of my ConcoctionMix models, despite them working properly in both reForge and ComfyUI, I decided to check these models using https://github.com/iiiytn1k/sd-webui-check-tensors (which works on reForge, hurray). Here's what it should normally look like (on a standard SD1.5 model with no oddities):

And here's the error it threw on the model with zer0int's CLIP being "grafted" onto it:

Now, this is a very weird error. sd-webui-check-tensors only detects SDv1 CLIPs (usually SD1.4/1.5), yet it threw this error instead. Meanwhile, ConcoctionMix is a perfectly usable checkpoint in both ComfyUI and reForge, and it works without any complaints, even when it's the first model loaded. If you don't trust me, go to the model page and try it for yourself.
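
To get a feel for what "SDv1 CLIP" versus "SDv2/SDXL format" can mean at the file level, here's a minimal sketch that reads only the header of a .safetensors file (an 8-byte little-endian length followed by a JSON table of tensor names) and counts keys under text-encoder prefixes commonly found in original-format single-file checkpoints. The prefix table is an assumption based on common conventions; it is not the actual detection logic of sd-webui-check-tensors or Model Toolkit, and the helper names are hypothetical.

```python
import json
import struct

# Text-encoder key prefixes commonly seen in original-format (LDM-style)
# single-file checkpoints. ASSUMPTION: a convention, not an official spec,
# and not the exact rules used by sd-webui-check-tensors or Model Toolkit.
TEXT_ENCODER_PREFIXES = {
    "SDv1 CLIP": "cond_stage_model.transformer.",
    "SDv2 OpenCLIP": "cond_stage_model.model.",
    "SDXL encoders": "conditioner.embedders.",
}


def parse_header(raw: bytes) -> dict:
    """Parse a safetensors header: u64 little-endian length, then JSON."""
    (length,) = struct.unpack_from("<Q", raw, 0)
    header = json.loads(raw[8:8 + length])
    header.pop("__metadata__", None)  # optional metadata entry, not a tensor
    return header


def read_header(path: str) -> dict:
    """Read only the header of a .safetensors file (weights are not loaded)."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (length,) = struct.unpack("<Q", prefix)
        return parse_header(prefix + f.read(length))


def count_text_encoder_keys(keys) -> dict:
    """Count how many tensor names fall under each known text-encoder prefix."""
    counts = {name: 0 for name in TEXT_ENCODER_PREFIXES}
    for key in keys:
        for name, prefix in TEXT_ENCODER_PREFIXES.items():
            if key.startswith(prefix):
                counts[name] += 1
    return counts
```

Something like `count_text_encoder_keys(read_header("ConcoctionMix.safetensors"))` (hypothetical file name) should show a non-zero count only for "SDv1 CLIP" on a standard SD1.5 checkpoint. A checkpoint whose CLIP weights are laid out differently would show zero there, which is the kind of result that could push a checker toward an "SDv2/SDXL" guess.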

This means that the second part (checkpoint is SDv2/SDXL format) is the problem, right? I also tested Segmind Vega (a distilled SDXL model), and it throws the same error. That one makes sense, since it's an SDXL model (kinda). For ConcoctionMix... this doesn't hold up. This model was built on SD1.5 on both the UNET and VAE side, and SD1.5 LoRAs work on it with no issues. Only the CLIP is "broken" here, and even that is a lie if you use it in ComfyUI. In Model Toolkit, it's still detected as an SDv1 model:

This is a direct excerpt from the Model Toolkit GitHub page:

"For example a checkpoint that's missing its CLIP will be recognized as containing the UNET-v1-BROKEN and the VAE-v1-BROKEN models, which can be exported like normal if you want to fix the model"

So... it's a "missing" CLIP that isn't actually missing (the model still weighs in at 2GB, like every other fp16 SD1.5 model out there) and that somehow still works (it can generate images without issues). Can I call this Schrödinger's CLIP, or is that a gross misunderstanding of how Schrödinger's cat works?

In any case, whatever CLIP is in there counts as "missing" for Model Toolkit (and/or sd-webui-check-tensors), yet the model can still generate images like normal. To me, this means there is a CLIP in there, zer0int's CLIP, that isn't recognizable by the usual methods of checking CLIPs. The model itself is still compatible with everything that alters CLIP embeddings, like LoRAs or textual inversions, but trying to fit it into SDv1's structure completely breaks it.
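
One way to poke at a "Schrödinger's CLIP" like this is to diff the checkpoint's tensor names and shapes against a known-good SD1.5 checkpoint. The sketch below does that from the .safetensors headers alone. It assumes the known-good file uses the usual `cond_stage_model.transformer.` prefix for its CLIP; the function names are hypothetical, and this is a diagnostic idea, not how Model Toolkit itself decides a component is "BROKEN".

```python
import json
import struct


def read_shapes(path: str) -> dict:
    """Map tensor name -> shape from a .safetensors header (weights not loaded)."""
    with open(path, "rb") as f:
        (length,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(length))
    return {k: tuple(v["shape"]) for k, v in header.items() if k != "__metadata__"}


def compare_component(candidate: dict, reference: dict, prefix: str) -> dict:
    """Diff every tensor under `prefix` against a known-good checkpoint."""
    cand = {k: s for k, s in candidate.items() if k.startswith(prefix)}
    ref = {k: s for k, s in reference.items() if k.startswith(prefix)}
    shared = cand.keys() & ref.keys()
    return {
        "missing": sorted(ref.keys() - cand.keys()),
        "unexpected": sorted(cand.keys() - ref.keys()),
        "shape_mismatch": sorted(k for k in shared if cand[k] != ref[k]),
    }


def top_level_prefixes(shapes: dict, depth: int = 2) -> set:
    """First `depth` dot-separated parts of each key, e.g. 'model.diffusion_model'."""
    return {".".join(k.split(".")[:depth]) for k in shapes}
```

Running `compare_component(read_shapes(odd_path), read_shapes(good_path), "cond_stage_model.transformer.")` would list which expected CLIP tensors a checker can't find, and `top_level_prefixes(odd) - top_level_prefixes(good)` would hint at where the unrecognized weights actually live (paths are placeholders).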

Anyways, thanks for reading this short article about my simple discovery. If you still want a "pristine" SD1.5 model from ConcoctionMix or any other model with CLIP issues, replace its CLIP with one from a similar checkpoint that is detected correctly, or extract the UNET and put it on a model that works.
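
If you'd rather do that swap in a script than in a UI, here's a minimal sketch of the second option: keep the UNET and VAE from the odd checkpoint and take everything else, including a working CLIP, from a donor checkpoint that is detected correctly. It assumes both files are standard SD1.5-architecture checkpoints using the usual LDM key prefixes; `graft` and `graft_files` are hypothetical names, and the file I/O uses `load_file`/`save_file` from the `safetensors` package.

```python
# Prefixes of the parts kept from the odd checkpoint (UNET and VAE).
# ASSUMPTION: standard LDM-style single-file SD1.5 key names.
KEEP_FROM_TARGET = ("model.diffusion_model.", "first_stage_model.")


def graft(target: dict, donor: dict, keep: tuple = KEEP_FROM_TARGET) -> dict:
    """Donor's tensors (including its CLIP), with UNET/VAE taken from target.

    Any stray, non-standard CLIP keys in `target` are dropped, because only
    keys under `keep` are copied from it.
    """
    merged = {k: v for k, v in donor.items() if not k.startswith(keep)}
    merged.update({k: v for k, v in target.items() if k.startswith(keep)})
    return merged


def graft_files(target_path: str, donor_path: str, out_path: str) -> None:
    # Imported here so graft() can be used without torch/safetensors installed.
    from safetensors.torch import load_file, save_file

    # Note: no dtype conversion is done; if one file is fp16 and the other
    # fp32, the result will mix precisions.
    merged = graft(load_file(target_path), load_file(donor_path))
    save_file(merged, out_path)
```

Starting from the donor rather than from the odd checkpoint is deliberate: whatever unusual keys the grafted CLIP left behind simply never make it into the output.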