Manual Image Classification Program for Large Image Datasets

Datasets downloaded for an anime, character, or concept often include low-quality, unhelpful images. To make the most of the data, it helps to sort the images into separate datasets by quality level.

With this program, you can quickly sort images into categories such as "Premium," "Medium," and "Low quality." You can then train your model on specific subsets, or lower the number of repetitions assigned to low-quality images so that they have less influence on the final result.

Additionally, you can use these classified datasets to train various models, each focused on a specific quality. This provides flexibility and options to achieve the best results according to your needs.

Proper classification has several uses: you can filter out or select higher-quality images, build balanced datasets, and control how much each quality tier contributes to training. Because each image takes a single click, hundreds of images can be classified in a matter of minutes.

That's why we created this program, which allows you to classify images into different groups or categories. When running it, one image will be displayed at a time, and you will be presented with several buttons representing the different categories. Simply click on the corresponding button to classify the image into that group. The program will take care of moving the image to the appropriate folder. Repeat this process with each image until they are all classified. It is an efficient way to organize and categorize images according to your needs.

~~~~

import os
import random
import shutil
from PIL import Image
from IPython.display import display, clear_output
from ipywidgets import widgets, VBox, HBox

# Path of the folder that contains the images
EeveelutionsCollection = "EeveelutionsCollection" #@param {type:"string"}
images_folder = f"/content/drive/MyDrive/Dataset/{EeveelutionsCollection}"

# Get the list of images in the folder
images = [os.path.join(images_folder, filename) for filename in os.listdir(images_folder) if
          filename.endswith(".jpg") or filename.endswith(".png")]

# Initialize variables
current_image = 0
buttons_layout = None

# Number of buttons (folders) to create
num_buttons = 3 #@param {type:"slider", min:0, max:10, step:1}
# Change to the desired number

# Create the subfolders and the buttons
subfolders = []
buttons = []

# List of visually varied emojis
emojis = ["😃", "😊", "🥳", "🤩", "😎", "🤔", "🌟", "🎉", "🌈", "🦄",
          "🍕", "🍦", "🎈", "🎁", "🎵", "💃", "🚀", "⚡️", "🔥", "❤️",
          "💪", "🌺", "🍓", "🐶", "🐼", "🐬", "🏀", "⛵️", "🏰", "🌌"]

for i in range(num_buttons):
    subfolder_path = os.path.join(images_folder, f"Subcarpeta{i+1}")
    subfolders.append(subfolder_path)
    os.makedirs(subfolder_path, exist_ok=True)

    # Pick a random emoji and remove it from the list so it cannot repeat
    emoji = random.choice(emojis)
    emojis.remove(emoji)

    button = widgets.Button(description=f"Button {i+1} {emoji}")
    button.on_click(lambda x, folder=subfolder_path: move_image(folder))
    buttons.append(button)

# Move the current image to the chosen subfolder
def move_image(subfolder):
    global current_image

    if current_image < len(images):
        image_path = images[current_image]
        image_filename = os.path.basename(image_path)
        new_image_path = os.path.join(subfolder, image_filename)

        shutil.move(image_path, new_image_path)
        print(f"Image moved to subfolder: {subfolder}")

        current_image += 1

    # Load the next image
    clear_output(wait=True)
    load_next_image()

# Load and display the next image
def load_next_image():
    global current_image, buttons_layout

    # Check whether there are images left to show
    if current_image < len(images):
        # Open the current image and resize it to 200x200 for display
        image_path = images[current_image]
        image = Image.open(image_path).resize((200, 200))

        # Show the image with IPython.display
        display(image)

        # Print the number of remaining images
        print(f"Remaining images: {len(images) - current_image}")

        # Refresh the buttons
        buttons_layout = HBox(buttons)
        display(buttons_layout)

    else:
        # Every image has been shown
        print("All images have been moved")

# Create the horizontal layout for the buttons
buttons_layout = HBox(buttons)

# Load the first image
load_next_image()
~~~~

Importing libraries: We import os, random, and shutil for file handling, Pillow (PIL) for opening images, IPython.display for showing them in the notebook, and ipywidgets for the buttons.

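Defining the image folder: The images_folder variable holds the path of the folder containing the images to classify. By default it is built from the EeveelutionsCollection form field and points to /content/drive/MyDrive/Dataset/ followed by that name. Adjust this path according to your needs.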

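Getting the image list: The code looks at the files directly inside the specified folder (it does not descend into subfolders), keeps those whose names end in .jpg or .png, and builds a list with the full path of each image. The check is case-sensitive, so a file named photo.JPG is skipped.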

Variable initialization: Some initial variables are set, such as the current image index and the variable for the button layout.

Getting the number of buttons: Using a slider (0 to 10), you select the number of buttons, and therefore folders, into which the images will be classified.

Creating subfolders and buttons: The subfolders where the images will be moved (Subcarpeta1, Subcarpeta2, and so on) are created inside the image folder, along with one button per subfolder. Each button label also gets an emoji picked at random from a predefined list, with no emoji used twice.
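The numbered folder names work, but descriptive names are easier to recognize later. A minimal sketch of the same step with named categories, assuming a hypothetical helper `make_categories` (not part of the original notebook) and the example category names from the introduction; `random.sample` picks distinct emojis in a single call:

```python
import os
import random
import tempfile

def make_categories(base_folder, names, emojis):
    """Create one subfolder per category name and pair it with a unique emoji.

    Hypothetical helper for illustration; returns (path, button_label) pairs.
    """
    # random.sample returns distinct items, so no emoji is used twice
    chosen = random.sample(emojis, len(names))
    categories = []
    for name, emoji in zip(names, chosen):
        path = os.path.join(base_folder, name)
        os.makedirs(path, exist_ok=True)
        categories.append((path, f"{name} {emoji}"))
    return categories

# Demo in a temporary folder standing in for images_folder
demo_root = tempfile.mkdtemp()
demo_categories = make_categories(
    demo_root, ["Premium", "Medium", "Low quality"], ["🌟", "🎉", "🌈", "🦄"]
)
for path, _label in demo_categories:
    print(os.path.basename(path))
```

Each returned label could be passed as the `description` of a `widgets.Button`, and each path bound to the button's click handler the same way the original loop does.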

Function to move the image: When a button is pressed, this function is executed. It moves the current image to the corresponding subfolder, using the shutil.move() function. Then, it increments the current image index and loads the next image.
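One thing to watch for: if the target subfolder already holds a file with the same name (for example, from an earlier run), `shutil.move()` may overwrite it. A minimal sketch of a collision-safe variant, assuming a hypothetical helper `move_without_overwrite` that appends a numeric suffix; it is an illustration, not the original function:

```python
import os
import shutil
import tempfile

def move_without_overwrite(image_path, subfolder):
    """Move image_path into subfolder, renaming it if the name is taken.

    Hypothetical variant of the core step of move_image.
    """
    stem, ext = os.path.splitext(os.path.basename(image_path))
    target = os.path.join(subfolder, stem + ext)
    counter = 1
    # Append _1, _2, ... until the target name is free
    while os.path.exists(target):
        target = os.path.join(subfolder, f"{stem}_{counter}{ext}")
        counter += 1
    shutil.move(image_path, target)
    return target

# Demo with empty placeholder files in a temporary folder
root = tempfile.mkdtemp()
dest = os.path.join(root, "Subcarpeta1")
os.makedirs(dest)
open(os.path.join(dest, "a.png"), "w").close()  # name already taken
src = os.path.join(root, "a.png")
open(src, "w").close()

moved_to = move_without_overwrite(src, dest)
print(os.path.basename(moved_to))  # a_1.png
```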

Function to load the next image: This function opens the current image, resizes it to 200×200 pixels, and shows it in the notebook with the display() function from IPython.display. It also prints the number of remaining images and displays the buttons. When no images are left, it prints a completion message instead.
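Resizing every image to a fixed 200×200 square stretches anything that is not already square, which can make quality harder to judge. A minimal sketch of computing a preview size that keeps the aspect ratio, assuming a hypothetical helper `fit_within`; the resulting tuple could be passed to `resize()` in place of `(200, 200)`:

```python
def fit_within(width, height, max_side=200):
    """Return (new_width, new_height) scaled so the longer side equals max_side.

    Hypothetical helper; uses integer math so results are deterministic.
    """
    longest = max(width, height)
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height

print(fit_within(1920, 1080))  # (200, 112)
print(fit_within(500, 1000))   # (100, 200)
```

Pillow's `Image.thumbnail()` offers similar aspect-preserving behavior for shrinking an image in place, if you prefer not to compute the size yourself.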

Button layout: A horizontal layout (HBox) is created to display the classification buttons.

Loading the first image: When the code starts, the first image is loaded and displayed to initiate the classification process.

Considerations and Precautions when using this Code:

Verify the image folder path: Make sure the images_folder variable contains the correct path to the folder containing the images you want to classify. Ensure that the folder exists and contains the images you want to process.

Check the image file extension: The code is set to search for files with the .jpg or .png extension. If your images have a different extension, you need to modify the line filename.endswith(".jpg") or filename.endswith(".png") to include the correct extension.
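A minimal sketch of a more flexible filter, assuming a hypothetical helper `list_images` and an example set of extensions that you would adjust to your data. It relies on `str.endswith` accepting a tuple and lowercases the name so that files like photo.JPG are included:

```python
import os
import tempfile

# Assumed example set; adjust to the formats in your dataset
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp")

def list_images(folder, extensions=IMAGE_EXTENSIONS):
    """Return sorted full paths of the files in folder with a matching extension."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        # endswith accepts a tuple; lower() makes the match case-insensitive
        if name.lower().endswith(extensions)
        and os.path.isfile(os.path.join(folder, name))
    )

# Demo with empty placeholder files
demo = tempfile.mkdtemp()
for name in ["a.JPG", "b.webp", "notes.txt"]:
    open(os.path.join(demo, name), "w").close()

found = [os.path.basename(p) for p in list_images(demo)]
print(found)  # ['a.JPG', 'b.webp']
```

Sorting the result also gives a stable order between runs, since `os.listdir` returns entries in arbitrary order.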

Define the desired number of buttons and subfolders: The code uses the num_buttons variable to determine the number of buttons and subfolders that will be created. Adjust this value according to your needs. Remember that the maximum allowed number is 10.

Use of emojis: The program adds an emoji to each button label (the subfolder names themselves contain no emojis). Make sure an emoji font is available in your execution environment so that the labels display correctly.

Interacting with the buttons: When running the program, one image will be displayed at a time, and the corresponding buttons will be presented. Click on the button that corresponds to the desired classification to move the image to the appropriate subfolder. Be careful when clicking, as the image movement is irreversible.
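The program as written has no undo, but one can be added by remembering where each image came from. A minimal sketch, assuming hypothetical helpers `move_and_record` and `undo_last_move` and a module-level `history` list; none of these exist in the original code:

```python
import os
import shutil
import tempfile

history = []  # stack of (original_path, new_path) pairs

def move_and_record(image_path, subfolder):
    """Move the image and remember both paths so the move can be undone."""
    new_path = os.path.join(subfolder, os.path.basename(image_path))
    shutil.move(image_path, new_path)
    history.append((image_path, new_path))
    return new_path

def undo_last_move():
    """Move the most recently classified image back; return its path or None."""
    if not history:
        return None
    original_path, new_path = history.pop()
    shutil.move(new_path, original_path)
    return original_path

# Demo with an empty placeholder file
root = tempfile.mkdtemp()
dest = os.path.join(root, "Subcarpeta1")
os.makedirs(dest)
src = os.path.join(root, "a.png")
open(src, "w").close()

move_and_record(src, dest)
restored = undo_last_move()
print(restored == src)  # True
```

To wire this into the notebook, an extra "Undo" button would call `undo_last_move()`, decrement `current_image`, and redisplay the image.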

Program completion: Once all the images have been displayed and moved to the corresponding subfolders, the program will print the message "All images have been moved." Verify that all images have been classified correctly before ending.
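A quick way to verify the result is to count the files that ended up in each subfolder. A minimal sketch, assuming a hypothetical helper `count_files_per_subfolder` that you would point at your own `images_folder`:

```python
import os
import tempfile

def count_files_per_subfolder(base_folder):
    """Return {subfolder_name: number_of_files} for each direct subfolder."""
    counts = {}
    for entry in sorted(os.listdir(base_folder)):
        path = os.path.join(base_folder, entry)
        if os.path.isdir(path):
            counts[entry] = sum(
                os.path.isfile(os.path.join(path, name)) for name in os.listdir(path)
            )
    return counts

# Demo with empty placeholder files
demo = tempfile.mkdtemp()
for folder, n_files in [("Subcarpeta1", 2), ("Subcarpeta2", 0)]:
    os.makedirs(os.path.join(demo, folder))
    for i in range(n_files):
        open(os.path.join(demo, folder, f"img{i}.png"), "w").close()
open(os.path.join(demo, "unclassified.png"), "w").close()

counts = count_files_per_subfolder(demo)
print(counts)  # {'Subcarpeta1': 2, 'Subcarpeta2': 0}
```

If the counts do not add up to the number of images you started with, some files are still unclassified in the base folder.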

Backup of the images: Before running the code, make sure you have a backup copy of your original images. The program will move the images to the subfolders, which involves a change in their location. Keep a backup to avoid data loss.
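A minimal sketch of making that backup from Python, assuming a hypothetical helper `backup_folder` and a backup location of your choosing. `shutil.copytree` refuses to write into an existing destination, which protects an earlier backup from being overwritten:

```python
import os
import shutil
import tempfile

def backup_folder(source_folder, backup_root):
    """Copy source_folder into backup_root as '<name>_backup' and return the path."""
    name = os.path.basename(os.path.normpath(source_folder))
    backup_path = os.path.join(backup_root, name + "_backup")
    # Raises FileExistsError if backup_path already exists
    shutil.copytree(source_folder, backup_path)
    return backup_path

# Demo with an empty placeholder file
work = tempfile.mkdtemp()
source = os.path.join(work, "EeveelutionsCollection")
os.makedirs(source)
open(os.path.join(source, "a.png"), "w").close()

backup_path = backup_folder(source, work)
print(sorted(os.listdir(backup_path)))  # ['a.png']
```

Run the backup before creating the classification subfolders, and keep it outside the folder being classified so it is not picked up as part of the dataset.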

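Changes in the execution environment: The code is written for Google Colab: the #@param comments are Colab form fields, and the default path assumes Google Drive is mounted at /content/drive. If you run it in a different environment, such as a local Jupyter Notebook, update images_folder and make sure Pillow, IPython, and ipywidgets are installed. If you encounter library errors, check that the installed versions are compatible with each other.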
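A minimal sketch of checking for the third-party libraries before running the program, assuming a hypothetical helper `missing_modules`. It uses `importlib.util.find_spec`, which looks a module up without importing it:

```python
import importlib.util

def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]

# Import names used by the program (PIL is provided by the Pillow package)
print(missing_modules(["PIL", "IPython", "ipywidgets"]))
```

An empty list means everything is available; any names printed need to be installed first.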

Performance considerations: If you have a large number of images, keep in mind that processing may take time and consume system resources. Make sure you have sufficient storage capacity and resources available to complete the classification process.

Remember that it is important to understand and review the code before running it. It is always recommended to perform tests on a test image set before applying it to a complete dataset.

Financial assistance: Hello everyone!

This is Tomas Agilar speaking, and I'm thrilled to have the opportunity to share my work and passion with all of you. If you enjoy what I do and would like to support me, there are a few ways you can do so:

~~Ko-fi (Dead)~~
Patreon
Buymeacoffee