Sometimes the reason your model isn't learning certain details in your training images is simply that it never sees them. The VAE is lossy: it discards some information when encoding images from pixel space to latent space. This is most noticeable with small text, pupils, and other high-frequency features. To see what was lost, we can just decode the latents back to pixels. I set up a space to make this easier using the NovelAI/anything VAE. For ease of comparison, you can open the output images in new tabs.
You can also do something similar straight from the a1111 webui without extensions: go to the img2img tab and load your image, set the resizing options so the image isn't distorted, set the sampling method to DDIM, steps to 1, and denoising strength to 0, then press Generate. The output is the original image as seen by the VAE.
Image credit: nocopyrightgirl