
Z-image / Lumina 2 / Newbie fp16 ComfyUI plugin for old Nvidia GPUs


Type: Other (tool)

Published: Jan 7, 2026

Updated: Jan 9, 2026

Base Model: Other

Hash (AutoV2): A5CB506FFA

A ComfyUI plugin that patches Lumina 2-based models to use fp16 "safely" on old Nvidia GPUs (RTX 20xx and earlier) that do not support bf16.

"Lumina 2-based models" here means Z-Image, Lumina 2, and NewBie.

These models don't work in fp16 as-is: internal layers overflow and you get NaN as output (a black or pure-noise image).

Before v0.4, ComfyUI fell back to fp32 when bf16 was unavailable, which doesn't overflow but is extremely slow (2~4x slower than fp16).

Since v0.4, ComfyUI uses fp16 by default and clamps overflows. You no longer get NaN and it's very fast, but clamping changes the output drastically (e.g. a value around 5 billion gets clamped down to the fp16 maximum of roughly 65k), and with it the quality of your final image.
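To illustrate the difference (a minimal PyTorch sketch, not ComfyUI's actual clamping code):

```python
import torch

# fp16 tops out at 65504; anything larger overflows to inf (and often NaN
# further downstream), while clamping silently pins it to the maximum instead.
x = torch.tensor([5e9, 1000.0], dtype=torch.float32)
print(x.to(torch.float16))                        # inf, 1000.

fp16_max = torch.finfo(torch.float16).max         # 65504.0
print(x.clamp(-fp16_max, fp16_max).to(torch.float16))  # 65504., 1000.
```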

This patch can handle overflows in fp16 mode "safely".

  • Can apply a linear algebra trick to the model weights ("reduce weight"), which directly prevents most (90~100%) of the overflows.

  • Automatically recomputes a layer in fp32 if an overflow/NaN is detected (see the sketch after this list).

  • No clamping. Identical output. Thus identical image.

  • Still as fast as fp16; ideally only ~5% of layers need to be recomputed in fp32.
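The fallback in the second bullet boils down to something like this (a minimal sketch assuming a plain linear layer; the plugin's real hook lives inside the patched ComfyUI code):

```python
import torch
import torch.nn.functional as F

def linear_fp16_with_fp32_fallback(x, weight, bias=None):
    # Fast path: run the layer in fp16.
    out = F.linear(x.half(), weight.half(),
                   None if bias is None else bias.half())
    # If anything overflowed to inf/NaN, recompute just this layer in fp32.
    if not torch.isfinite(out).all():
        out = F.linear(x.float(), weight.float(),
                       None if bias is None else bias.float())
    return out.to(x.dtype)
```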

To clarify, this plugin isn't uploaded to GitHub as a normal plugin because it's very "dirty": it hot/monkey-patches the ComfyUI core code. That approach is terrible from a programming perspective, but it's the simplest one I could think of. I don't want to get roasted for it.
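For reference, the general monkey-patching pattern looks roughly like this (a self-contained toy; the class below is a stand-in, not real ComfyUI code):

```python
# Toy stand-in for a ComfyUI core class; the real patch targets Lumina 2 code.
class CoreBlock:
    def forward(self, x):
        return x * 2

_original_forward = CoreBlock.forward

def patched_forward(self, x):
    # A real patch would wrap overflow handling around the original call.
    return _original_forward(self, x)

CoreBlock.forward = patched_forward   # every existing caller now uses the patch
```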

Tested on ComfyUI v0.7.


How to use:

  1. Put the .py file in ComfyUI's "custom_nodes" directory.

  2. Open it with a text editor and modify the settings.

  3. Restart ComfyUI.

  4. Add a "ModelComputeDtype" node to your workflow and set its dtype to "fp16".

The plugin patches the Lumina 2 code directly while ComfyUI is loading; it adds no nodes of its own (ModelComputeDtype is a built-in ComfyUI node). The settings take effect immediately and globally.


Note about "reduce weight":

A linear algebra trick that avoids most (~90%) of the overflows and does not change the final result. (Basically: if A × B overflows in accumulation, compute A / 32 × B × 32 instead.)
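A toy PyTorch demonstration of why the rescaling keeps the fp16 accumulation in range without changing the result (illustration only, not the plugin's actual code; the scale factor and which layers get rescaled are the plugin's business):

```python
import torch

x = torch.full((1024,), 16.0, dtype=torch.float16)   # activations
w = torch.full((1024,), 8.0, dtype=torch.float16)    # one weight column

# Naive fp16 accumulation: the running sum passes 65504 (fp16 max) -> inf.
acc = torch.tensor(0.0, dtype=torch.float16)
for xi, wi in zip(x, w):
    acc = acc + xi * wi
print(acc)                         # inf

# "Reduce weight": scale the weights down by 32 so the fp16 running sum
# stays in range, then undo the scale in fp32 afterwards.
acc = torch.tensor(0.0, dtype=torch.float16)
for xi, wi in zip(x, w / 32):
    acc = acc + xi * wi            # peaks at 4096, well within fp16 range
print(acc.float() * 32)            # 131072, the same product, no overflow
```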

The model weights are modified at load time, so loading a LoRA dynamically is not supported: once the base model weights have changed, the LoRA weight patch becomes invalid.

However, you can merge your LoRA and save a checkpoint in advance. Note that you need to disable this when making the checkpoint.

Supports fp16/bf16/scaled fp8 models. Does not support pure fp8/gguf models.

Note about Z-Image:

All overflows can be handled "safely". No clamping. Identical results.

If you enable "reduce weight", there will be no overflow at all.

Note about Lumina 2:

All overflows can be handled "safely". No clamping. Identical results.

If you enable "reduce weight", only 2 layers need to recompute (total 30)

Note about NewBie:

NewBie does NOT support fp16, just like Lumina 2. I don't know why its author claims fp16 support in diffusers; it might be a copy-paste typo.

The text embedding must be clamped: it overflows before even reaching the DiT. The clamped values appear to be outliers (~0.01%), though, and I saw no difference during testing, so results are still effectively identical.

If you enable "reduce weight", only 2 layers need to recompute (total 40)