Dataset Preparation Guide for Visual Learners

Preamble

Scientific papers should be referenced, and math should take precedence over popularity. However, anecdotal evidence has a place in AI learning. A LoRA or model with a high scientific accuracy score may not be visually appealing.

A LoRA trained to convergence may not be as flexible as one that has not been.

Your dataset may contain "bad" data that one form of training learns and another does not, and so you end up arguing for AdamW, or for a certain rank or training type.

Data Sets for Visual Learners

Sobel Edge Detection

MiDaS Depth Approximation

Posterization - 6 Color

In the examples above we see a representation of what the AI "sees" when looking at the cover image.
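As a rough illustration of how views like these can be produced, here is a minimal NumPy sketch of a Sobel edge magnitude and a six-level posterization. It works on a grayscale array rather than a color image, and the function names `sobel_magnitude` and `posterize` are illustrative only; they are not taken from whatever tool produced the figures. A MiDaS depth map needs a pretrained network and is not shown.

```python
import numpy as np


def sobel_magnitude(gray: np.ndarray) -> np.ndarray:
    """Return the Sobel gradient magnitude of a 2-D grayscale array."""
    # Pad by repeating border pixels so the output keeps the input shape.
    g = np.pad(gray.astype(np.float64), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weights 1-2-1.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) - (
        g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2]
    )
    # Vertical derivative: bottom row minus top row, weights 1-2-1.
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) - (
        g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:]
    )
    return np.hypot(gx, gy)


def posterize(gray: np.ndarray, levels: int = 6) -> np.ndarray:
    """Quantize values in 0..255 to `levels` evenly spaced gray levels."""
    g = np.clip(gray.astype(np.float64), 0, 255) / 255.0
    idx = np.minimum((g * levels).astype(int), levels - 1)
    return np.round(idx * (255.0 / (levels - 1))).astype(np.uint8)


# Synthetic stand-in for a real photo: dark left half, bright right half.
image = np.zeros((8, 8), dtype=np.uint8)
image[:, 4:] = 255
edges = sobel_magnitude(image)          # non-zero only near the step
bands = posterize(np.arange(256))       # a gray ramp reduced to 6 levels
```

To try this on a real picture, load it as a 2-D grayscale array with the image library of your choice and pass it to either function.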

It is important to note that a more visually appealing image does not equal better training data.

Enhancing Images for Learning

Take care when preparing images for ML or AI training. A few simple steps can improve results, but the wrong steps can degrade what the model learns.

On the left we see a depth approximation of the cover image; on the right is the same image after the background was removed.

Even though the background appeared to be a solid color, it was not, and it did affect the outcome of the depth map.

Difference of Depth after removing a "Solid Background"

Overlaying the two images shows the extra detail that the depth map picked up.

I do not recommend removing the background from every image in a dataset, but doing so for some images can improve them.
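The overlay comparison can also be computed rather than eyeballed. The sketch below assumes you already have two depth maps of the same size as NumPy arrays, one from the original image and one from the background-removed version. The names `normalize` and `depth_difference` are hypothetical helpers. Each map is rescaled to 0..1 first, because MiDaS-style depth is relative, so the raw values of two separate predictions are not directly comparable.

```python
import numpy as np


def normalize(depth: np.ndarray) -> np.ndarray:
    """Rescale a depth map to the 0..1 range (all zeros if it is flat)."""
    d = depth.astype(np.float64)
    span = d.max() - d.min()
    if span == 0:
        return np.zeros_like(d)
    return (d - d.min()) / span


def depth_difference(depth_with_bg: np.ndarray,
                     depth_without_bg: np.ndarray) -> np.ndarray:
    """Per-pixel absolute difference between two normalized depth maps."""
    if depth_with_bg.shape != depth_without_bg.shape:
        raise ValueError("depth maps must have the same shape")
    return np.abs(normalize(depth_without_bg) - normalize(depth_with_bg))


# Toy data standing in for real depth predictions.
with_bg = np.array([[0.0, 10.0], [20.0, 30.0]])
without_bg = np.array([[0.0, 25.0], [20.0, 30.0]])
diff = depth_difference(with_bg, without_bg)  # bright where the maps disagree
```

Bright regions in `diff` mark where removing the background changed the estimated depth.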

Upscaling: Avoid at All Costs and Learn to Identify It

The detrimental effects of upscaling can be seen in this image.

  • The same Sobel filter was applied after an AI upscale at 8x

  • Both the Sobel output and the depth map (not pictured) appear flatter and more washed out
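One way to put a number on "flat and washed out" is to compare the average gradient strength of two images of the same size. The sketch below is only an illustration under loud assumptions: it uses `np.gradient` as a simple stand-in for the Sobel filter, nearest-neighbour enlargement as a stand-in for an AI upscaler (which behaves differently), and random noise as a stand-in for image detail. `mean_gradient` and `upscale_nearest` are hypothetical names.

```python
import numpy as np


def mean_gradient(gray: np.ndarray) -> float:
    """Average gradient magnitude; lower values mean flatter content."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))


def upscale_nearest(gray: np.ndarray, factor: int) -> np.ndarray:
    """Enlarge an image by repeating each pixel `factor` times per axis."""
    return np.kron(gray, np.ones((factor, factor), dtype=gray.dtype))


rng = np.random.default_rng(0)
# "Native" image: 64x64 with independent detail at every pixel.
native = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
# "Upscaled" image: an 8x8 patch blown up 8x to the same 64x64 size.
upscaled = upscale_nearest(native[:8, :8], 8)

native_score = mean_gradient(native)
upscaled_score = mean_gradient(upscaled)  # lower: same pixel count, less detail
```

The upscaled image has as many pixels as the native one but carries only the detail of the small patch, so its score is lower. A low score alone does not prove an image was upscaled; a genuinely smooth photo scores low too.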

Key Points

  1. Removing backgrounds can increase the depth detail captured for the learned subject.

  2. Upscaled images should not be used for training. Identify the super clusters of pixels with a Retinex filter.

  3. Use a MiDaS depth map or Sobel edge detection to visually approximate the learned data.

  4. Many 4K and 8K images are just upscaled. Even if they are native 4K or 8K, some data can be lost with linear downscaling.

  5. In my experience, using "auto drawable levels" on 50% of the dataset was beneficial.

  6. In my experience, removing the background of around 50% of the images was beneficial.
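Points 5 and 6 both apply a step to roughly half of the dataset. The sketch below shows one way that could be scripted. It assumes "auto drawable levels" means a per-channel stretch of the darkest value to 0 and the brightest to 255, which may not match the exact behaviour of the tool used for this guide. `auto_levels` and `pick_half` are hypothetical names, and the file names are made up.

```python
import random

import numpy as np


def auto_levels(img: np.ndarray) -> np.ndarray:
    """Stretch each channel of an HxWxC uint8 image to the full 0..255 range."""
    out = np.empty_like(img, dtype=np.uint8)
    for c in range(img.shape[2]):
        ch = img[..., c].astype(np.float64)
        lo, hi = ch.min(), ch.max()
        if hi == lo:
            out[..., c] = img[..., c]  # flat channel: nothing to stretch
        else:
            out[..., c] = np.round((ch - lo) * 255.0 / (hi - lo)).astype(np.uint8)
    return out


def pick_half(items, seed: int = 0) -> set:
    """Return a reproducible random subset containing half of the items."""
    rng = random.Random(seed)
    items = list(items)
    return set(rng.sample(items, k=len(items) // 2))


# Hypothetical file names; replace with your own dataset listing.
names = [f"img_{i:03d}.png" for i in range(10)]
to_process = pick_half(names)

# Low-contrast toy image: channel 0 spans 50..200, the others are flat.
sample = np.zeros((2, 2, 3), dtype=np.uint8)
sample[..., 0] = [[50, 100], [150, 200]]
stretched = auto_levels(sample)
```

Fixing the seed keeps the selection stable between runs, so the same half of the dataset is processed each time.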

Repeat point - A visually appealing image does not always equal better learning.

Note: The image used was marked for CC attribution ("vecteezy"). A low-res, non-professional licensed image was used.
