Hey Diffusion Models, Your Colors Are Wrong

Did you know that the majority of diffusion model training code, LoRA trainers, etc. don't handle color correctly? Here's an example of what can happen:

[Image: icc_debug15.webp — side-by-side comparison of the incorrectly vs. correctly color-managed image]

The image on the left is what most data pipelines will give you; the right is how it should look. Compared to the correct version, the left one's colors are muted and washed out.

So what's going on?

Color Profiles

A color space defines what the colors in an image's data actually mean. That is, not everyone agrees on what exact shade of red the point (128, 0, 0) is, but we have established various (mutually incompatible) standards that each pin it down. Take an image encoded in one space and try to display it naively in another? You get something like the example above. Nothing wildly different, but the colors are subtly wrong.
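To see this concretely, here's a small sketch of my own (not from any particular pipeline) that interprets the triple (128, 0, 0) as sRGB and converts it into the device-independent Lab space using Pillow's ImageCms. Interpreting the same triple under a wider-gamut profile (Adobe RGB, Display P3, ...) would land on a different Lab value, i.e. a genuinely different red.

from PIL import Image, ImageCms

# Interpret (128, 0, 0) as sRGB and ask what "absolute" color that is by
# transforming it into the device-independent Lab space.
srgb = ImageCms.createProfile("sRGB")
lab = ImageCms.createProfile("LAB")
to_lab = ImageCms.buildTransform(srgb, lab, "RGB", "LAB")

px = Image.new("RGB", (1, 1), (128, 0, 0))
print(ImageCms.applyTransform(px, to_lab).getpixel((0, 0)))
# The same (128, 0, 0) interpreted under a wider-gamut RGB profile
# would map to a different Lab value, i.e. a different actual red.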

(NOTE: I picked a particularly egregious example that would stand out side-by-side. Humans are quite bad at judging color; even when the colors are way off, people often can't tell just by looking at two images side-by-side. If you do your own experiments, I recommend setting up a way to swap the wrong and correct images in-place with a click. The difference is much more obvious that way.)

Now, most images you encounter will be in what has become the lingua franca of colorspaces: sRGB. And by most I mean something like 90%. Photos are the most likely to use something other than sRGB, though it happens with drawings, digital art, etc. as well (just far less often).

Annoyingly, the best quality images have a disproportionately high likelihood of having a different colorspace. Darn those fancy photographers and artists!
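If you're curious how much of your own data this affects, here's a rough sketch that tallies the embedded ICC profiles Pillow finds (the "dataset" directory and the extension list are placeholders):

import io
from pathlib import Path
from collections import Counter
from PIL import Image, ImageCms

# Survey a dataset: which ICC profiles do the images actually carry?
counts = Counter()
for path in Path("dataset").rglob("*"):
	if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp", ".tif", ".tiff"}:
		continue
	with Image.open(path) as im:
		icc = im.info.get("icc_profile")
	if icc:
		prof = ImageCms.ImageCmsProfile(io.BytesIO(icc))
		counts[ImageCms.getProfileDescription(prof).strip()] += 1
	else:
		counts["<no embedded profile>"] += 1

for name, n in counts.most_common():
	print(f"{n:6d}  {name}")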

What To Do

Specifically with models like SDXL, Flux, etc., what you want is to make sure all the inputs to the model are sRGB; that way the outputs will always consistently be sRGB too. Which means that if an image isn't in sRGB, you want to convert it. The best place to do this is during latent encoding. Here's how most tools prepare images for latent encoding:

image = Image.open(image_path).convert("RGB")

Which looks right, but Pillow does not do any colorspace conversion for you; it just gives you the raw RGB data. The (more) correct way:

import io
from PIL import Image, ImageCms

def to_srgb(im: Image.Image, assume="sRGB"):
	srgb_cms  = ImageCms.createProfile("sRGB")
	srgb_wrap = ImageCms.ImageCmsProfile(srgb_cms)

	# 1) source profile: the embedded ICC profile if present,
	#    otherwise assume the image is already in `assume` (sRGB)
	icc_bytes = im.info.get("icc_profile")
	if icc_bytes:
		src = ImageCms.ImageCmsProfile(io.BytesIO(icc_bytes))
	else:
		src = ImageCms.createProfile(assume)

	# 2) CMYK with no embedded profile can't go through the assumed sRGB
	#    source profile, so fall back to Pillow's naive CMYK → RGB conversion.
	#    (CMYK *with* an embedded profile goes straight through profileToProfile.)
	if im.mode == "CMYK" and not icc_bytes:
		im = im.convert("RGB")

	im = ImageCms.profileToProfile(
		im, src, srgb_cms,
		outputMode="RGB",
		renderingIntent=ImageCms.INTENT_PERCEPTUAL,
		flags=ImageCms.FLAGS["BLACKPOINTCOMPENSATION"],
	)

	# keep an sRGB tag just in case
	im.info["icc_profile"] = srgb_wrap.tobytes()
	return im


image = Image.open(image_path)
try:
	image = to_srgb(image)
except Exception:
	# Fallback if we run into a problem
	image = Image.open(image_path).convert("RGB")

We just see if the image has a color profile attached and load it if it does. Then we ask PIL to convert to sRGB. If it's already sRGB, this should be a no-op. Otherwise, it handles the conversion for us. Done!
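For context, here's roughly where this sits in a training data pipeline. This is only a sketch, not any particular trainer's code: the dataset class, the 1024px sizing, and the vae line at the end are placeholders, and the normalization follows the usual SD-style convention of mapping pixels to [-1, 1].

import torch
from torchvision import transforms
from PIL import Image

preprocess = transforms.Compose([
	transforms.Resize(1024),
	transforms.CenterCrop(1024),
	transforms.ToTensor(),               # [0, 1]
	transforms.Normalize([0.5], [0.5]),  # [-1, 1], the usual SD-style range
])

class TrainImages(torch.utils.data.Dataset):
	def __init__(self, image_paths):
		self.image_paths = image_paths

	def __len__(self):
		return len(self.image_paths)

	def __getitem__(self, i):
		image = Image.open(self.image_paths[i])
		try:
			image = to_srgb(image)        # color-manage before anything else
		except Exception:
			image = image.convert("RGB")  # fallback: raw RGB, same as before
		return preprocess(image)

# ...and then something like: latents = vae.encode(batch).latent_dist.sample()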

Now I should note that this code might still not be 100% correct. Color spaces are complex and subtle things. Heck, PIL itself might make mistakes (I've found many a bug in image processing libraries). But it's at least a step up from just dumping the raw RGB into the model and hoping for the best.
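One cheap sanity check I'd suggest (the filename here is a placeholder): run an image you believe is already untagged sRGB through to_srgb and confirm it comes back (nearly) unchanged.

import numpy as np
from PIL import Image

original = Image.open("already_srgb.png").convert("RGB")  # hypothetical untagged sRGB file
converted = to_srgb(original)

diff = np.abs(
	np.asarray(original, dtype=np.int16) - np.asarray(converted, dtype=np.int16)
)
print("max per-channel difference:", diff.max())  # expect 0, or 1 from rounding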

(Just wait until you learn about the colorspaces for videos. Ugh.)
