Screencap Datasets: Where to Find Them?
Published at Mar 2, 2024 - edited and expanded for clarity.
Introduction:
Tired of sifting through complicated datascrapers or wading through endless code? Sometimes, all you need is a simple solution: a curated collection of hand-capped images, lovingly gathered by generous individuals online. That's exactly what we're offering – a treasure trove of screencap datasets, plus some insider tips on where to find more. As a bonus, we'll introduce you to our own capping project, as well as some hidden gems on Tumblr and Hugging Face.
Part One
Our Screencapping Journey:
We're excited to share our new screencapping project, CAPSEKAI, with you. As fans of anime and gaming, we're having a blast discovering lost and found media, and we're happy to share our finds with you. Our mission is simple: we cap what we love, and we take requests too. While not everything we cap will make it into our dataset, we're passionate about building a collection that's both fun and useful.
Where to find our caps:
You can find our caps at two locations:
Huggingface: https://huggingface.co/Capsekai
Tumblr: https://capsekai.tumblr.com/
Part Two
Locations on Tumblr
Apart from our own caps blog, there are more locations on Tumblr:
https://captasticcaps.tumblr.com/ - They have a really LARGE list and some really large datasets, including Sailor Moon and some older movies.
https://screencaps.tumblr.com/ - More or less a gallery; you'll need a "mass downloader" for this one.
https://neverscreens.tumblr.com/post/699019483965259777/masterlist-of-screencaps-pt1
https://waftingcurtains.tumblr.com/capseroo
https://hd-screencaps.tumblr.com/galleries
https://www.tumblr.com/tagged/movie%20screencaps
https://soul-eater-screencaps.tumblr.com/
Clearly, there are MANY MORE on Tumblr LOL. We're just scratching the surface.
Locations Elsewhere
https://screencapped.net/board/forumdisplay.php?fid=491
https://movie-screencaps.com/ - A warning on this one: the file sites you download from aren't dodgy, it's just that you're gonna have to fight through ads etc.
https://www.bluscreens.net/screen-captures-a-z--updates.html
https://film-grab.com/tag/screenshots/
https://www.homeofthenutty.com/
https://starwarsscreencaps.com/
https://www.outkick.com/tag/morning-screencaps/
https://www.livejournal.com/blogs/en/screencaps
Where else?
If you're feeling adventurous, you can also search for public datasets on Hugging Face, where some enthusiasts might have shared their own collections. Just remember to respect their work and don't name-drop anyone without permission!
Additionally, there are websites with public archives of video game art, such as CreativeUncut, which offer HD access and zipped archives for a minimal monthly subscription on Patreon. You can also explore other resources like sprite databases and Booru-type boards. However, be cautious when browsing these sites, as some may contain mature content. Please be aware of international and local laws regarding access to such content, and ensure you're at least 18 years old to comply with these regulations.
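For the Hugging Face search mentioned above, here is a minimal sketch of doing it from a script instead of the website. It assumes the Hub's public JSON endpoint at https://huggingface.co/api/datasets accepts `search` and `limit` query parameters and returns a list of objects with an `id` field; the function names and the search term are made up for illustration, so treat it as a starting point rather than the official way to do it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public Hub endpoint that lists datasets as JSON.
HUB_DATASETS_API = "https://huggingface.co/api/datasets"


def build_search_url(term, limit=20):
    """Build the search URL for a keyword (hypothetical helper)."""
    query = urlencode({"search": term, "limit": limit})
    return f"{HUB_DATASETS_API}?{query}"


def search_datasets(term, limit=20):
    """Return the ids of public datasets matching the keyword.

    Needs network access. Entries without an "id" field are skipped.
    """
    with urlopen(build_search_url(term, limit), timeout=30) as response:
        results = json.load(response)
    return [item["id"] for item in results if "id" in item]


if __name__ == "__main__":
    # Example keyword only; swap in whatever series you're hunting for.
    for dataset_id in search_datasets("screencaps", limit=10):
        print(f"https://huggingface.co/datasets/{dataset_id}")
```

Each printed line is a dataset page you can open in a browser to check the license and contents before downloading anything.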
Part Three: The Lowdown on Screencapping
Why not use a datascraper?
Datascrapers are powerful tools that can help you gather data quickly and efficiently. However, sometimes you may need a more specialized approach to find exactly what you're looking for. If you're searching for something rare or unique, a datascraper might not be able to deliver. That's when you need to think outside the box and explore alternative methods. You might be surprised at what you can find!
So if you're looking for something out of the ordinary, there are resources in other articles that we can eventually link to!
Why cap things yourself?
If you're curious about screencapping, you can use VLC Media Player to capture screenshots from videos. Here's a quick rundown on how to do it:
Open VLC Media Player. (Menu names vary a little between VLC versions and platforms, so treat these steps as a guide.)
Press Ctrl + P (Windows/Linux) to open Preferences; on a Mac, Preferences is under the VLC menu. Switch the settings view to "All" so the advanced options appear.
In the sidebar, click "Video", then "Filters", and tick "Scene video filter".
Expand "Filters" and select "Scene filter" to configure it: choose the image format, the directory where the caps will be saved, and the recording ratio.
The recording ratio controls how often a frame is saved. A ratio of 24, for example, saves one frame out of every 24.
Click "Save", then play the video you want to cap (you may need to restart VLC first). VLC saves frames to the chosen folder while the video plays.
Use Video > Take Snapshot, or press Shift + S (Windows/Linux), to capture additional screenshots manually.
Untick the filter when you're done, or VLC will keep capping every video you play.
Tips:
Despite its name, the Scene video filter doesn't detect scene changes. It saves frames at a fixed interval set by the recording ratio.
A lower ratio gives you more caps (and more near-duplicates); a higher ratio gives you fewer caps to sort through afterwards.
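If you'd rather not click through menus, VLC's scene filter can also be driven from the command line. Below is a minimal sketch that only builds the command; it assumes VLC is installed and reachable as `vlc` (on a Mac the binary usually lives inside the app bundle, so pass its full path), and that the `--scene-*` options behave as described in VLC's command-line help, where the filter saves one frame out of every N. The function name, file names, and default values are made up for illustration.

```python
from pathlib import Path


def build_vlc_scene_command(video, out_dir, every_n_frames=24,
                            prefix="cap", image_format="png",
                            vlc_binary="vlc"):
    """Return the argument list for a windowless VLC run that saves
    one frame out of every `every_n_frames` (hypothetical helper)."""
    if every_n_frames < 1:
        raise ValueError("every_n_frames must be at least 1")
    return [
        vlc_binary,
        str(video),
        "--video-filter=scene",              # turn on the scene filter
        "--vout=dummy",                      # don't open a video window
        "--no-audio",                        # caps only, no sound
        f"--scene-format={image_format}",    # image type for saved frames
        f"--scene-ratio={every_n_frames}",   # save 1 frame out of every N
        f"--scene-prefix={prefix}",          # file name prefix
        f"--scene-path={Path(out_dir)}",     # output folder (create it first)
        "vlc://quit",                        # close VLC when playback ends
    ]


if __name__ == "__main__":
    import shlex

    cmd = build_vlc_scene_command("episode01.mkv", "caps", every_n_frames=48)
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Printing the command first lets you check it before committing to a full playthrough of the video, since VLC still has to play the whole file to cap it.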
If you're already a Lora or model trainer, you know the importance of high-quality screenshots for training your models.
What about Copyright?
Let's be real, screencapping and ripping videos is already a gray area. But hey, it's all for research and learning, right? Just don't go after some newbie artist who just released their first animatic and has no clue about AI. Be kind, folks!
What to watch for
When searching for screencaps or trying to cap things yourself, beware of dodgy websites and rogue ads. Don't venture into the dark web, and please, for the love of god, use an adblocker! Pinterest can be useful, but watch out for lower-quality content. And as for Tumblr, be cautious when scraping for art, as some independent artists are now using tools like Nightshade to add noise to their work, which could potentially "poison" AI models. However, without knowing the training data and algorithms used in these models, it's unclear how this might affect the output. Proceed with caution!
Part Four: Resource Articles
Hw Tagger: https://civitai.com/articles/5752/hw-tagger-application-how-to-use-basic-tutorial
Sort Images: https://civitai.com/articles/5707/tool-to-sort-images-and-captions-and-more-for-datasets
GitPull: https://civitai.com/articles/1024/so-you-fucked-up-and-did-a-git-pull-in-a1111
No way to FCK up: https://civitai.com/articles/1722/the-no-way-to-fuck-it-up-this-time-guide-to-installing-auto1111-or-forge-or-the-proompt-ninja
SD Prompt Reader: https://civitai.com/articles/2584/sd-get-prompt-easy-display-for-stable-diffusion-exif-data-on-gtk-dialog
Dataset Creator: https://civitai.com/articles/3627/loradataset-creator-v2
Regional Prompter: https://civitai.com/articles/3437/using-regional-prompter-extension-in-automatic1111
Onsite Lora Trainer: https://civitai.com/articles/2175/using-civitai-the-on-site-lora-trainer
Face Cropper: https://civitai.com/articles/2147/face-cropper-tool-to-automatically-crop-the-faces-from-photos
Colab Merger: https://civitai.com/articles/1619/how-to-script-merge-flow-quickly-for-colab
Chattiori's Merger: https://civitai.com/articles/654/how-to-use-chattioris-model-merger-bismuthmix-v40-recipe
Xview: https://civitai.com/articles/2499/the-best-image-viewer-in-my-opinion
EmbLab: https://civitai.com/articles/5382/emblab-experimental-embedding-lab-extension-for-a1111-sd15
Danbooru tagging Viz: https://civitai.com/articles/5150/danbooru-tagging-visualization-for-ponyxl-autismmix
Pony Cheatsheet: https://civitai.com/articles/4829/pony-cheatsheet-new-version-linked-at-top
Score 9: https://civitai.com/articles/4248/what-is-score9-and-how-to-use-it-in-pony-diffusion
Mnemic's Lora Training: https://civitai.com/articles/2138/lora-datasets-training-data-list-civitai-dataset-guide
Ads with Invoke: https://civitai.com/articles/723/make-ai-ads-with-invokeai-easy
Lora Info Editor: https://civitai.com/articles/3595/lora-info-editor-edit-or-remove-metadata-or-lora-yuan-or-lora
YoloV8: https://civitai.com/articles/4080/training-a-custom-adetailer-model-with-yolov8-detection-model
Prompt: https://civitai.com/articles/1009/prompt-guidance-tags-to-avoid-and-useful-tags-to-include
Guy90's Dataprep: https://civitai.com/articles/91/how-to-correctly-obtain-images-for-a-dataset
Rulles: https://civitai.com/articles/75/useful-online-tools-for-datasets-and-where-to-find-data
https://civitai.com/articles/269/how-to-make-other-file-types-usable-pdfs-gifs-webps-avifs-jpegs
AsaTyr: https://civitai.com/articles/5106/tagging-listsindex-bring-order-into-chaos
Extra2AB: https://civitai.com/articles/2333/how-to-prepare-regularization-images
DonMischo: https://civitai.com/articles/3432/wip-resources-for-fantasy-art-and-creatures
Laymans Anzch Regularization: https://civitai.com/articles/3342/regularization-from-the-ai-layman-perspective
Webui Addons: https://civitai.com/articles/3289/developing-webui-addons-being-a-good-citizen
Clip Studio Tutorial: https://civitai.com/articles/5985/editing-your-outputs-with-clip-studio-rough-guide
Diffusers Conversion: https://civitai.com/articles/2756/convert-15-and-sdxl-to-diffusers
Training Loras: https://civitai.com/articles/1716/a-fresh-approach-to-sdxl-and-pony-xl-lora-training
Dataset Tools: https://civitai.com/articles/5720/dataset-tools-from-earth-and-dusk-image-and-captions-editor
Large Dataset: https://civitai.com/articles/699/large-dataset-lora-tips-and-tricks-google-colab-sd-15-optimized
No person Loras: https://civitai.com/articles/2667/quick-guide-to-no-person-style-loras
Underfitting and overfitting: https://civitai.com/articles/5467/identifying-underfitting-and-overfitting
Meta Human Creator: https://civitai.com/articles/5391/no-person-loras-metahuman-creator
Compatibility Issues with Clips on SDXL and SD 1.5: https://civitai.com/articles/4859/the-incompatibility-of-sd-15-embeds-with-sdxlpdxl (Contested: technically, in a way, they are, but not in the way you think.)
Supermerger Video: https://civitai.com/articles/3862/supermerger-tutorial-video-new-version-added
Convert Back to safetensors: https://civitai.com/articles/3551/elusive-convert-back-to-safetensors
Miro Board: https://civitai.com/articles/3190/miro-compare-your-models
Huggingface Demos: https://civitai.com/articles/2425/tutorial-free-demo-spaces-for-sd-models-on-huggingface
Online Privacy: https://civitai.com/articles/2360/protecting-personal-privacy-best-practices-for-ai-art-model-creators
Rented GPUs: https://civitai.com/articles/583/differences-between-colab-and-rented-gpus-a-short-comparison
(Outdated) Lora Training Theory: https://civitai.com/articles/1057/lora-training-theory-what-model-do-i-pick-to-train-it-on
Extension Picks: https://civitai.com/articles/2448/the-model-makers-toolkit-extensions-you-cant-miss
About & Links
About Us
We are the Duskfall Portal Crew, a DID system with over 300 alters, navigating life with DID, ADHD, Autism, and CPTSD. We believe in AI’s potential to break down barriers and enhance mental health, despite its challenges. Join us on our creative journey exploring identity and expression.
Join Our Community
Website: End Media
Discord: Join our Discord
Backups: Hugging Face
Support Us: Send a Pizza
Patreon: https://www.patreon.com/earthndusk
Community Groups:
Subreddit: Reddit
Embeddings to Improve Quality
Negative Embeddings: Use scenario-specific embeddings to refine outputs.
Positive Embeddings: Enhance image quality with these embeddings.
PLEASE for optimal depth and clarity use @Zovya's ZPDXL series.
Extensions
ADetailer: ADetailer GitHub
Usage: Use this extension to enhance and refine images, but use sparingly to avoid over-processing with SDXL.
Batchlinks: Batchlinks for A1111
Description: Manage multiple links when running A1111 locally or on a server.
Addon: @nocrypt Addon (The link is broken for now, I'll find it later. OOPS!)
Additional Extensions:
Backups for Loras on SDXL & Pony XL:
2024: https://huggingface.co/EarthnDusk/SDXL_Lora_Dump_2024/tree/main
2023: https://huggingface.co/EarthnDusk/Loras-SDXL/tree/main
Referral codes for Vast & Runpod:
VastAI: https://cloud.vast.ai/?ref=70354
Runpod: https://runpod.io/?ref=yx1lcptf