48-bit Images: Accuracy vs. Quantization
One advantage of coming to the AI table with a deeper background in 3D, art, photography and the like is my constant, and likely pointless, goal of trying to bring 16-bit-per-channel images into AI.
Why pointless, currently? Once normalized to the standard ranges used for training, the per-channel difference between 16-bit (0-65,535) and 8-bit (0-255) values is only around 0.0025-0.0035.
That tiny variation means that even if you source only RAW image formats and correctly convert them to PNG with cv2, the effort will not affect training in any way.
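As a quick sanity check (my own sketch, not part of the original workflow), the gap can be measured directly; the random array here is just a stand-in for a real 16-bit image:

import numpy as np

# Stand-in for a 48-bit (16-bit-per-channel) image.
rng = np.random.default_rng(0)
px16 = rng.integers(0, 65536, size=(512, 512, 3), dtype=np.uint16)

norm16 = px16.astype(np.float32) / 65535.0        # 16-bit path, normalized to [0, 1]
norm8 = (px16 // 257).astype(np.float32) / 255.0  # 8-bit path (65535 / 255 = 257)

delta = np.abs(norm16 - norm8)
print(delta.mean(), delta.max())  # roughly 0.002 mean, 0.004 max on uniform data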
One approach would be to take the 48-bit source image, compute the delta against its 8-bit quantized version in 32-bit float, and amplify that delta by 10x.
Per GPT this looks like:
import numpy as np
import torch

def prepare_tensor_with_delta(img_arr, gain=10.0):
    # img_arr: HWC uint16 array (a 48-bit image, 16 bits per channel)
    arr48 = img_arr.astype(np.float32)
    tensor_48 = torch.from_numpy(arr48 / 65535.0 * 2 - 1)  # 16-bit path, scaled to [-1, 1]
    arr8 = (img_arr / 257).astype(np.uint8)  # quantize down to 8-bit (65535 / 255 = 257)
    tensor_32 = torch.from_numpy(arr8.astype(np.float32) / 255.0 * 2 - 1)  # 8-bit path, [-1, 1]
    delta = (tensor_48 - tensor_32) * gain  # amplify what quantization discards
    amplified_tensor = torch.clamp(tensor_32 + delta, -1.0, 1.0)
    amplified_tensor = amplified_tensor.permute(2, 0, 1)  # HWC -> CHW
    return amplified_tensor

Another approach would be to normalize to a wider range than [0, 1] or [-1, 1], but this would likely break any training.
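For what it's worth, a minimal sketch of that wider-range idea (the function name and the scale of 4 are my own, purely illustrative):

import numpy as np
import torch

def normalize_wide(img_arr, scale=4.0):
    # Map a 16-bit-per-channel HWC image to [-scale, scale] instead of [-1, 1].
    # scale=4.0 is arbitrary; most pretrained models expect [-1, 1] inputs,
    # which is exactly why this would likely break training.
    arr = img_arr.astype(np.float32) / 65535.0  # [0, 1]
    tensor = torch.from_numpy(arr * 2.0 - 1.0) * scale
    return tensor.permute(2, 0, 1)              # HWC -> CHW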
A third approach would be to train a GAN to amplify the differences.
Can you come up with another approach?
Is the variation in color in a 48-bit image meaningful depth, or just noise to be discarded by quantization?
Could anything be gained by training on raw pixel values?
The cover image is FLUX.2 with the math listed.
Slightly off topic, but by default image editing software like GIMP does not edit in 16-bit-per-channel mode, and turning that mode on prevents real-time editing on all but the highest-end GPUs or GPU accelerators.
Most GPUs and monitors support only 10 bits per channel rather than 16.
So why worry about colors we cannot see? The issue, even back then, was the rounding errors that happen when you multiply or divide by 4 to move between 8 and 10 bits.
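A toy example of where that bites, using the factor of 4 between 8-bit (0-255) and 10-bit (0-1023) codes; the loop is mine, just for illustration:

# Round-trip a 10-bit value through 8 bits and back.
for v10 in (1023, 1022, 1021, 1020):
    v8 = v10 // 4                 # 10-bit -> 8-bit (divide by 4)
    back = v8 * 4                 # 8-bit -> 10-bit (multiply by 4)
    print(v10, back, v10 - back)  # loses up to 3 codes per channel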






