Hi, here is a free Google Colab notebook to prepare your dataset.
From your Google Drive folder, it will:
Convert WebP to JPG,
Resize each image so its longer side is 1024 pixels,
Detect text watermarks (automatically, or specific words of your choosing) and blur or crop them,
Do BLIP2 captioning with a prefix of your choosing.
All of that with a Gradio web interface.
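
For anyone curious what the first two steps look like in code, here is a rough sketch using Pillow. It is not the notebook's actual code: the function names fit_longest_side and webp_to_jpg, the JPEG quality of 95, and the LANCZOS filter are my own assumptions for illustration. It also scales smaller images up to 1024, which may or may not match what the notebook does.

```python
from pathlib import Path


def fit_longest_side(width: int, height: int, target: int = 1024) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `target`, keeping the aspect ratio."""
    scale = target / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def webp_to_jpg(src: Path, dst_dir: Path, target: int = 1024, quality: int = 95) -> Path:
    """Convert one WebP file to a resized JPG in dst_dir (hypothetical helper)."""
    # Pillow is imported here so the size helper above has no third-party dependency.
    from PIL import Image

    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / (src.stem + ".jpg")
    with Image.open(src) as im:
        rgb = im.convert("RGB")  # JPEG has no alpha channel, so drop transparency
        resized = rgb.resize(fit_longest_side(*rgb.size, target), Image.LANCZOS)
        resized.save(dst, "JPEG", quality=quality)
    return dst
```

To run it over a folder, you could loop over Path(folder).glob("*.webp") and call webp_to_jpg on each file, with folder pointing at wherever your Drive is mounted.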
To browse your Google Drive, choose a folder and click "enter selected item".
Download the file, unzip it, paste it into the Colab notebook folder in your Google Drive, and run it (it is faster with a T4 GPU runtime).
I hope you will like it. If you can, you can support me here: https://ko-fi.com/photobait or with a few Buzz.
I'm working on converting AVIF and PNG as well, and on improving the captioning. I would also like to extend the watermark detection so that you can mark on one picture what to detect on the others.