FP16 (Half Precision) and BF16 (BFloat16) are two different floating-point formats used in computing, particularly in hardware architectures and deep learning frameworks. They are designed to strike a balance between computational efficiency and numerical range, making them suitable for tasks like neural network training and inference. Here's a comparison of FP16 and BF16 with examples:
FP16 (Half Precision): In FP16, a floating-point number is represented using 16 bits: 1 sign bit, 5 bits for the exponent, and 10 bits for the fraction (mantissa). The 10-bit mantissa gives roughly three decimal digits of precision, but the 5-bit exponent leaves a narrow dynamic range: normal values span only about 6.1e-5 to 65504, so very small values underflow to zero and very large values overflow to infinity.
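NumPy supports FP16 natively as np.float16, so these limits are easy to inspect (a quick sketch):

```python
import numpy as np

# Inspect FP16's numerical limits via NumPy's native float16 type.
info = np.finfo(np.float16)
print(info.bits)  # 16 bits total
print(info.max)   # largest finite value: 65504.0
print(info.tiny)  # smallest normal value: 2**-14 ~= 6.1e-05
print(info.eps)   # gap between 1.0 and the next value: 2**-10 ~= 0.00098
```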
BF16 (BFloat16): BF16 also uses 16 bits, but distributes them differently: 1 sign bit, 8 bits for the exponent, and 7 bits for the mantissa. The 8-bit exponent matches FP32's, so BF16 covers the same dynamic range as FP32 (roughly 1.2e-38 to 3.4e38 for normal values); in effect, BF16 is a truncated FP32. The trade-off is precision: 7 mantissa bits resolve only about two to three decimal digits.
The key difference between them is range versus precision: BF16 can represent the same very small and very large magnitudes as FP32, while FP16 trades much of that range for three extra mantissa bits of precision.
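NumPy has no native bfloat16 type, but because BF16 is just the top 16 bits of an FP32, it can be emulated with a little bit manipulation. The helper name to_bf16 below is my own, not a NumPy API:

```python
import numpy as np

def to_bf16(x):
    """Round a float32 to the nearest BF16 value (round-half-to-even)
    by keeping only the top 16 bits of its bit pattern; the result is
    returned as a regular float for easy printing."""
    bits = np.array(x, dtype=np.float32).view(np.uint32)
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return float(bits.astype(np.uint32).view(np.float32))

# FP16 overflows just past 65504, but BF16 shares FP32's huge range.
print(np.float16(100000.0))  # inf (overflow)
print(to_bf16(100000.0))     # 99840.0 (coarse, but finite)
print(np.float16(1e-8))      # 0.0 (underflow)
print(to_bf16(1e-8))         # ~1.0012e-08 (still representable)
```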
Let's use an example to illustrate the differences between FP16 and BF16:
Imagine we have a very large neural network with many layers, and we're training it using gradient descent. During each iteration, the weights of the network are updated using the gradients. Here, we'll consider a simplified scenario with just one weight parameter and its gradient.
FP16 (Half Precision):
Weight in FP16: 0.001 is stored as approximately 0.0010004, a rounding error of about 0.04%.
Gradient in FP16: 0.00001 lands in FP16's subnormal range, where precision degrades, and any gradient below about 6e-8 underflows to exactly zero.
As gradients shrink during training, FP16's narrow range makes them underflow and vanish, which can stall convergence; this is why FP16 training typically relies on loss scaling.
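A minimal NumPy sketch of how FP16 rounds a small weight and silently loses a tiny gradient:

```python
import numpy as np

weight = np.float16(0.001)
print(float(weight))   # ~0.0010004: rounded, not exactly 0.001

# FP16's smallest subnormal is 2**-24 ~= 6e-8; a gradient of 1e-8 is
# below that, so it flushes to exactly zero and the update is lost.
grad = np.float16(1e-8)
print(float(grad))               # 0.0
print(weight - grad == weight)   # True: the step silently vanished
```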
BF16 (BFloat16):
Weight in BF16: 0.001 is stored as approximately 0.0009995, a rounding error of about 0.05%, slightly coarser than FP16's.
Gradient in BF16: 0.00001 is a normal value, and gradients can keep shrinking down to about 1.2e-38 before underflowing.
In this case, BF16 stores each individual value a little less precisely than FP16, but its FP32-sized range keeps small gradients from vanishing, which usually matters more for stable convergence.
FP16 is commonly used in deep learning training and inference because modern accelerators execute FP16 arithmetic much faster than FP32; in practice it is typically combined with FP32 master weights and loss scaling (mixed-precision training) to compensate for its narrow range.
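FP16's underflow problem is usually handled with loss scaling, which can be sketched in a few lines of NumPy (the scale factor of 1024 is an arbitrary illustrative choice):

```python
import numpy as np

true_grad = 1e-8                  # too small for FP16's range
naive = np.float16(true_grad)
print(float(naive))               # 0.0: underflows when cast directly

# Scale the loss (and therefore the gradients) up before casting to
# FP16, then divide the scale back out after converting to FP32.
scale = 1024.0
scaled = np.float16(true_grad * scale)
recovered = float(scaled) / scale
print(float(scaled))    # ~1.0e-05: survives in FP16
print(recovered)        # ~1.0e-08: the gradient is preserved
```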
BF16 is becoming more popular in hardware designed for machine learning (Google TPUs introduced it, and recent NVIDIA and Intel chips support it as well). It is particularly useful when preserving the dynamic range of gradients during training is crucial for convergence, and it usually works without loss scaling.
In summary, both formats halve memory and bandwidth relative to FP32, but they make different trade-offs: FP16 offers finer precision over a narrow range, while BF16 gives up mantissa bits to keep FP32's full dynamic range. For deep learning, where numerical stability and convergence depend more on range than on the last few bits of precision, BF16 is often the more robust choice.