Your CLIP Is Forced Into FP32 - Why That Matters
Why Is CLIP Forced Into FP32?
By default, all major GUIs use the CPU for handling CLIP.
Forge users: the GUI currently forces GPU use and ignores most command-line arguments.
Why Does This Matter?
When the CLIP is loaded, it has to be up-cast to FP32 and stored in RAM.
An up-cast FP16 CLIP should have just been FP32 in the first place; you're losing precision to save roughly 200MB for CLIP-L and 1.25GB for CLIP-G.
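To put rough numbers on that trade-off: FP16 stores 2 bytes per weight and FP32 stores 4, so the extra RAM is simply the parameter count times 2 bytes. The parameter counts below are approximate public figures, my assumption rather than something stated in this article:

```python
# Back-of-the-envelope RAM cost of keeping CLIP in FP32 instead of FP16.
# Approximate parameter counts (assumption): CLIP-L text encoder ~123M,
# CLIP-G text encoder ~695M.
def extra_ram_mb(params: int, from_bytes: int = 2, to_bytes: int = 4) -> float:
    """Extra megabytes to store `params` weights at `to_bytes` instead of `from_bytes`."""
    return params * (to_bytes - from_bytes) / 1e6

print(extra_ram_mb(123_000_000))  # CLIP-L: 246.0 MB
print(extra_ram_mb(695_000_000))  # CLIP-G: 1390.0 MB (~1.4 GB)
```

These land in the same ballpark as the 200MB / 1.25GB figures above.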
What Should You Do?
Option One: Merge FP32 CLIP Models
You can merge an FP32 CLIP into your favorite model. This increases model size but ensures better accuracy across the board. - This will not decrease your iterations per second (it/s) unless you have very low system RAM.
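Conceptually, the merge is just key replacement in the checkpoint's state dict. A minimal sketch, assuming a hypothetical `cond_stage_model.` key prefix for the text encoder (the real prefix and the file loading/saving depend on the model format and whichever merge tool you use):

```python
def merge_fp32_clip(checkpoint: dict, clip_fp32: dict,
                    prefix: str = "cond_stage_model.") -> dict:
    """Replace the checkpoint's text-encoder weights with FP32 versions.

    `prefix` is a hypothetical key prefix; real checkpoints vary by format.
    """
    merged = dict(checkpoint)
    for key, tensor in clip_fp32.items():
        full_key = prefix + key
        if full_key in merged:  # only overwrite weights that already exist
            merged[full_key] = tensor
    return merged

# Toy demonstration with placeholder "tensors":
ckpt = {"cond_stage_model.embed.weight": "fp16-data",
        "unet.block0.weight": "fp16-data"}
clip = {"embed.weight": "fp32-data"}
merged = merge_fp32_clip(ckpt, clip)
print(merged["cond_stage_model.embed.weight"])  # fp32-data
```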
Option Two: Force the GUI to Use VRAM
If you have sufficient GPU VRAM, you can configure the GUI to process CLIP in VRAM. If you're going this route, I would use BF16 for your CLIP.
Forge:
--always-gpu (As of 12/30/2024, Forge acts as if this command is in use)
Comfy:
--gpu-only
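The BF16 recommendation comes down to dynamic range: BF16 keeps FP32's 8-bit exponent but only 7 mantissa bits, so it covers the same range of magnitudes as FP32, while FP16 overflows above ~65504. A quick stdlib-only sketch, emulating BF16 by truncating the FP32 bit pattern (real hardware typically rounds rather than truncates):

```python
import struct

def bf16_roundtrip(x: float) -> float:
    # BF16 is the top 16 bits of the FP32 pattern (sign, 8-bit exponent, 7-bit mantissa).
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

print(bf16_roundtrip(70000.0))  # 69632.0 - coarse, but representable in BF16
try:
    struct.pack("e", 70000.0)   # "e" = IEEE 754 half precision (FP16)
except OverflowError:
    print("FP16 overflows above ~65504")
```

In short: BF16 trades mantissa precision for FP32's exponent range, which makes it a safer choice than FP16 for activations that can grow large.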
Specific Commands for CLIP Optimization When on CPU
These commands ensure Torch doesn't take your FP32 CLIP, convert it to FP16, and up-cast it back to FP32 (yes, it can do that).
Forge:
--clip-in-fp32
Comfy:
--fp32-text-enc
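The FP32 → FP16 → FP32 round trip those flags prevent is easy to demonstrate with the standard library alone (`struct` packs `"f"` as FP32 and `"e"` as FP16): the up-cast back to FP32 is exact, but it cannot restore the bits the down-cast already threw away.

```python
import struct

def cast(x: float, fmt: str) -> float:
    # Round-trip a value through the given IEEE 754 format ("e" = FP16, "f" = FP32).
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

original = cast(0.1234567, "f")  # a weight as stored in an FP32 CLIP
downcast = cast(original, "e")   # Torch silently converts to FP16...
upcast = cast(downcast, "f")     # ...then up-casts back to FP32

print(original == upcast)   # False: precision is permanently lost
print(downcast == upcast)   # True: the up-cast itself adds nothing back
```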
Petition Your Favorite Model Creators
Petition your favorite model creators to, at minimum, make a mixed-precision model with the UNET in FP16/BF16 and the CLIP/VAE in FP32. This is simple to do and has no downside.
The CLIP model included with nearly all models on this site is FP16. This was done to reduce model size and keep everything in VRAM. However, that was before Torch updated its handling to allow auto-casting and mixed-precision models. So we can now have an FP8 UNET with an FP32 CLIP (and even mixed-precision blocks in the same model).
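Mixed precision in one file works because modern checkpoint formats record a dtype per tensor, not per file (safetensors-style). A rough sketch of the idea; the key names and element counts below are made up for illustration:

```python
# Bytes per element for a few dtypes a single checkpoint can mix.
BYTES_PER_ELEMENT = {"F8_E4M3": 1, "F16": 2, "BF16": 2, "F32": 4}

# Illustrative header: each tensor carries its own dtype and element count.
header = {
    "model.diffusion_model.block0.weight": ("F8_E4M3", 1_000_000),  # FP8 UNET block
    "cond_stage_model.embed.weight":       ("F32",       500_000),  # FP32 CLIP
    "first_stage_model.decoder.weight":    ("F32",       200_000),  # FP32 VAE
}

def total_bytes(h: dict) -> int:
    return sum(BYTES_PER_ELEMENT[dtype] * numel for dtype, numel in h.values())

print(total_bytes(header))  # 3800000 bytes, vs 6800000 if everything were FP32
```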