Libraries: opencv-python, pillow, and keyboard
Unsplash Downloader: Download Here
(or any other method/website to gather the images. I use Unsplash.)
How to Create a Regularized Image Dataset:
Use the Unsplash Downloader to download collections of images instead of individual ones.
You can use other tools for different sites such as Reddit, Pinterest, etc.
Determine Minimum Resolution:
Place the Resolution.py file in the same folder as your images.
Run the script using
The script will provide you with the largest and smallest width and height values.
Take note of the smallest value (either width or height, whichever is smaller) for the next step.
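For readers curious about what this step involves, here is a minimal sketch of how the largest and smallest dimensions could be computed with Pillow. It is not the actual Resolution.py: the function names read_image_sizes and resolution_extremes and the list of file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of extensions; adjust to match your own collection.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def read_image_sizes(folder):
    """Return {filename: (width, height)} for every readable image in folder."""
    # Pillow is imported here so resolution_extremes() can be reused without it.
    from PIL import Image

    sizes = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        try:
            with Image.open(path) as img:
                sizes[path.name] = img.size  # (width, height)
        except OSError:
            # Unreadable or corrupt file: skip it rather than abort the scan.
            continue
    return sizes


def resolution_extremes(sizes):
    """Summarise the smallest and largest width and height in a sizes mapping."""
    if not sizes:
        raise ValueError("no images found")
    widths = [width for width, _ in sizes.values()]
    heights = [height for _, height in sizes.values()]
    return {
        "min_width": min(widths),
        "max_width": max(widths),
        "min_height": min(heights),
        "max_height": max(heights),
        # The value to note for the next step: the smallest side overall.
        "smallest_side": min(min(widths), min(heights)),
    }
```

Calling resolution_extremes(read_image_sizes(".")) from inside the image folder would return the four extremes plus the smallest side to note for the next step.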
Important: Every image in your dataset must meet the minimum resolution you want for the final images (e.g., 1024 if you want 1024x1024 output). Any image whose width or height is below this minimum has to be deleted.
The script will now ask you to enter the minimum resolution you want. Any image whose width or height is less than this value will be marked for deletion.
It will generate two files:
A .txt file that contains the names of the images to be deleted.
A .bat file that removes these images.
You can delete the .bat file and move Resolution.py out of your image folder.
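The filtering step described above can be sketched roughly as follows. This is an illustration under assumptions, not the real script: the output file names to_delete.txt and delete_images.bat are hypothetical, as are both function names, and the sizes mapping is assumed to have the form {filename: (width, height)}.

```python
from pathlib import Path


def images_below_minimum(sizes, min_resolution):
    """Return sorted names of images whose width or height is below the minimum."""
    return sorted(
        name
        for name, (width, height) in sizes.items()
        if width < min_resolution or height < min_resolution
    )


def write_deletion_files(folder, names,
                         list_name="to_delete.txt",       # hypothetical name
                         batch_name="delete_images.bat"):  # hypothetical name
    """Write a plain list of names and a Windows batch file that deletes them."""
    folder = Path(folder)
    list_path = folder / list_name
    batch_path = folder / batch_name

    list_path.write_text("\n".join(names) + "\n", encoding="utf-8")

    # Batch files expect CRLF line endings; newline="\r\n" converts each "\n".
    commands = ["@echo off"] + [f'del "{name}"' for name in names]
    with open(batch_path, "w", encoding="utf-8", newline="\r\n") as handle:
        handle.write("\n".join(commands) + "\n")

    return list_path, batch_path
```

Nothing is removed until the batch file is run, which leaves a chance to review the text list first.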
Four new folders will be created:
Place your images in the
Set Crop Dimensions:
Run Crop.py again and enter the resolution you want (e.g., 1024). This prevents the crop box from shrinking below this resolution.
The script will start. Use the scroll wheel to adjust the size of the crop box (Shift+Scroll for fine adjustments) and drag the box with the mouse.
Hit Enter to move to the next image, or press Esc to skip an image.
Use Ctrl+C in the console window to end the process.
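To make the crop box behaviour concrete, here is a small sketch of the geometry involved: a square box that never shrinks below the minimum side and never leaves the image. This is an assumption about how such a constraint could be implemented, not code from Crop.py, and clamp_crop_box is a hypothetical name.

```python
def clamp_crop_box(center_x, center_y, side, image_width, image_height, min_side):
    """Return (left, top, right, bottom) for a square crop kept inside the image.

    The requested side is limited to the range [min_side, shorter image side],
    and the box is shifted back inside the image if it would stick out.
    """
    max_side = min(image_width, image_height)
    if max_side < min_side:
        raise ValueError("image is smaller than the minimum crop size")

    # Enforce the size limits first.
    side = max(min_side, min(side, max_side))

    # Centre the box on the requested point, then push it back inside the image.
    left = round(center_x - side / 2)
    top = round(center_y - side / 2)
    left = max(0, min(left, image_width - side))
    top = max(0, min(top, image_height - side))

    return left, top, left + side, top + side
```

For example, on a 2000x1500 image with a minimum of 1024, asking for a 500-pixel box near the top-left corner yields the box (0, 0, 1024, 1024).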
Resize Images in Photoshop:
All your images now have a 1:1 aspect ratio but different resolutions.
Use a program like Photoshop to resize all cropped images to your desired resolution.
In Photoshop, go to File > Scripts > Image Processor.
Select the source folder (new cropped images) and the output folder.
Check "Save as JPEG" and "Resize to Fit."
Set the quality to 12 and specify the desired width and height (e.g., 512x512 or 1024x1024).
Uncheck any other options and click "Run."
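If Photoshop is not available, the same resize could be approximated with Pillow, which is already on the library list. The sketch below is an alternative, not part of the original workflow: fit_within and resize_folder are hypothetical names, the extension filter is an assumption, and quality=95 is a value on Pillow's own JPEG scale rather than an equivalent of Photoshop's 12.

```python
from pathlib import Path


def fit_within(width, height, max_width, max_height):
    """Scale (width, height) to fit inside the given box, keeping proportions."""
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_folder(source, output, target=1024, quality=95):
    """Resize every image in source to fit target x target and save it as JPEG."""
    # Pillow is imported here so fit_within() can be reused without it.
    from PIL import Image

    output = Path(output)
    output.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(source).iterdir()):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        with Image.open(path) as img:
            new_size = fit_within(img.width, img.height, target, target)
            # JPEG has no alpha channel, so convert to RGB before resizing.
            resized = img.convert("RGB").resize(new_size, Image.LANCZOS)
        destination = output / (path.stem + ".jpg")
        resized.save(destination, "JPEG", quality=quality)
        written.append(destination)
    return written
```

Because the cropped images are already square, fitting them inside a 1024x1024 box produces exactly 1024x1024 output.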
Duplicates Removal (Optional):
If you downloaded multiple collections, you may encounter duplicates.
Use a tool like AntiDupl to identify and remove duplicate images from your dataset.
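As a quick first pass before reaching for a dedicated tool, byte-identical copies can be found with the standard library alone. Note that this is a deliberately narrower technique than what a similarity-based tool does: it only catches files with exactly the same contents, not resized or re-encoded versions of the same picture. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder):
    """Return lists of file names in folder that share identical contents."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            groups[file_digest(path)].append(path.name)
    # Only groups with more than one member are duplicates.
    return [names for names in groups.values() if len(names) > 1]
```

Each returned group lists files with the same contents, so all but one file per group can be removed.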
That's it! You now have a dataset of regularized images ready for your project.
Important: Always back up your files before you start. Although the scripts have been tested, they may still delete files you want to keep.