Screencap Datasets: Where to Find Them?

Published at Mar 2, 2024 - edited and expanded for clarity.

Introduction:

Tired of sifting through complicated datascrapers or wading through endless code? Sometimes, all you need is a simple solution: a curated collection of hand-capped images, lovingly gathered by generous individuals online. That's exactly what we're offering – a treasure trove of screencap datasets, plus some insider tips on where to find more. As a bonus, we'll introduce you to our own capping project, as well as some hidden gems on Tumblr and Hugging Face.

Part One

Our Screencapping Journey:

We're excited to share our new screencapping project, CAPSEKAI, with you. As fans of anime and gaming, we're having a blast discovering lost and found media, and we're happy to share our finds with you. Our mission is simple: we cap what we love, and we take requests too. While not everything we cap will make it into our dataset, we're passionate about building a collection that's both fun and useful.

Where to find our caps:

You can find our caps at two locations:

Hugging Face: https://huggingface.co/Capsekai

Tumblr: https://capsekai.tumblr.com/

Part Two

Locations on Tumblr

Apart from our own caps blog, there are plenty of other places on Tumblr:

https://captasticcaps.tumblr.com/ - They have a really LARGE list and some really large datasets, including Sailor Moon and some older movies.

https://screencaps.tumblr.com/ - More or less a gallery; you'll need a "mass downloader" for this.

https://neverscreens.tumblr.com/post/699019483965259777/masterlist-of-screencaps-pt1

https://waftingcurtains.tumblr.com/capseroo

https://hd-screencaps.tumblr.com/galleries

https://www.tumblr.com/tagged/movie%20screencaps

https://soul-eater-screencaps.tumblr.com/

Clearly, there are MANY MORE on Tumblr LOL. We're just scratching the surface.
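For gallery-style blogs that call for a "mass downloader", here is a minimal sketch of what such a tool does under the hood. It assumes you already have a list of direct image URLs (collecting those is not shown), and the function names, the `caps-downloader-sketch` user agent, and the one-second delay are all made up for illustration. A dedicated downloader will handle far more edge cases than this does.

```python
import time
from pathlib import Path
from typing import List, Optional
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# File extensions we treat as images (an assumption; extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def filename_from_url(url: str) -> Optional[str]:
    """Return the file name if the URL path ends in an image extension, else None."""
    name = Path(urlparse(url).path).name
    return name if Path(name).suffix.lower() in IMAGE_SUFFIXES else None


def download_all(urls: List[str], out_dir: str, delay_seconds: float = 1.0) -> int:
    """Download each image URL into out_dir, pausing between requests."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = 0
    for url in urls:
        name = filename_from_url(url)
        if name is None or (target / name).exists():
            continue  # skip non-images and files we already have
        request = Request(url, headers={"User-Agent": "caps-downloader-sketch"})
        with urlopen(request, timeout=30) as response:
            (target / name).write_bytes(response.read())
        saved += 1
        time.sleep(delay_seconds)  # be gentle with the host
    return saved
```

Skipping files that already exist means you can re-run it after an interrupted download without starting over.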

Locations Elsewhere

https://screencapped.net/board/forumdisplay.php?fid=491

https://movie-screencaps.com/ - A warning on this one: the file hosts you get sent to aren't dodgy, it's just that you're gonna have to fight through ads, etc.

https://fancaps.net/movies/

https://www.cap-that.com/

https://screenmusings.org/

https://screencapped.net/

https://www.bluscreens.net/screen-captures-a-z--updates.html

https://film-grab.com/tag/screenshots/

https://www.trekcore.com/

https://www.homeofthenutty.com/

https://starwarsscreencaps.com/

https://www.outkick.com/tag/morning-screencaps/

https://www.livejournal.com/blogs/en/screencaps

Where else?

If you're feeling adventurous, you can also search for public datasets on Hugging Face, where some enthusiasts might have shared their own collections. Just remember to respect their work and don't name-drop anyone without permission!
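If you'd rather search Hugging Face from a script than from the website, here is a rough sketch using only the Python standard library. It queries the Hub's public dataset listing endpoint with a search term. The term "screencaps" is only an example, the function names are our own invention, and the actual fetch needs network access.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Hub endpoint for listing datasets.
HUB_DATASETS_API = "https://huggingface.co/api/datasets"


def build_search_url(term: str, limit: int = 10) -> str:
    """Return the Hub API URL that searches public datasets for `term`."""
    return f"{HUB_DATASETS_API}?{urlencode({'search': term, 'limit': limit})}"


def search_datasets(term: str, limit: int = 10) -> List[str]:
    """Fetch the ids of matching public datasets (needs network access)."""
    with urlopen(build_search_url(term, limit), timeout=30) as response:
        results = json.load(response)
    return [entry["id"] for entry in results]


if __name__ == "__main__":
    # "screencaps" is only an example search term.
    for dataset_id in search_datasets("screencaps"):
        print(f"https://huggingface.co/datasets/{dataset_id}")
```

Each id it prints maps to a dataset page, where you can check the license and description before downloading anything.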

Additionally, there are websites with public archives of video game art, such as CreativeUncut, which offer HD access and zipped archives for a minimal monthly subscription on Patreon. You can also explore other resources like sprite databases and Booru-type boards. However, be cautious when browsing these sites, as some may contain mature content. Please be aware of international and local laws regarding access to such content, and ensure you're at least 18 years old to comply with these regulations.

Part Three: The Lowdown on Screencapping

Why not use a datascraper?

Datascrapers are powerful tools that can help you gather data quickly and efficiently. However, sometimes you may need a more specialized approach to find exactly what you're looking for. If you're searching for something rare or unique, a datascraper might not be able to deliver. That's when you need to think outside the box and explore alternative methods. You might be surprised at what you can find!

So if you're looking for something out of the ordinary, there are resources in other articles that we can eventually link to!

Why cap things yourself?

If you're curious about screencapping, you can use VLC Media Player to capture screenshots from videos. Here's a quick rundown on how to do it:

  1. Open VLC Media Player and go to Tools > Preferences (on Windows and Linux).

  2. Under "Show settings" in the bottom-left corner, select "All".

  3. In the tree on the left, expand Video > Filters and click "Scene filter".

  4. Set the image format (png or jpg), the directory path where frames should be saved, and the recording ratio. A ratio of 24 saves one frame out of every 24.

  5. Click the parent "Filters" entry and tick "Scene video filter" in the list of available filters.

  6. Click "Save", then restart VLC so the new settings take effect.

  7. Play the video. VLC saves frames to the chosen folder as the video plays. Untick the filter when you're done, or every video you watch will keep dumping frames.

  8. Use Video > Take Snapshot or press Shift + S (Windows and Linux) to capture additional screenshots manually.

Tips:

  • Despite the name, the Scene Video Filter doesn't detect scene changes. It saves frames at a fixed interval, controlled by the recording ratio.

  • A low ratio gives you lots of near-duplicate frames to sort through; a high ratio gives you fewer caps but may skip short shots. Adjust it to suit the footage.

  • If you're already a LoRA or model trainer, you know the importance of high-quality screenshots for training your models.
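If you have a lot of episodes to get through, VLC's scene filter can also be driven from the command line. The sketch below only builds the command, using VLC's `--video-filter=scene` and `--scene-*` options. It assumes `vlc` is on your PATH. The function name, the `cap_` prefix, and the file names are hypothetical. Some setups may need extra flags (for example, to turn off hardware decoding), so treat this as a starting point rather than a guaranteed recipe.

```python
import subprocess
from pathlib import Path
from typing import List


def build_vlc_scene_command(video: str, out_dir: str, every_nth_frame: int = 24,
                            image_format: str = "png") -> List[str]:
    """Build a VLC command that saves one frame out of every `every_nth_frame`."""
    return [
        "vlc",                               # assumes VLC is on PATH
        video,
        "--video-filter=scene",              # enable the scene video filter
        f"--scene-format={image_format}",    # png or jpg
        f"--scene-ratio={every_nth_frame}",  # save 1 frame out of every N
        f"--scene-path={out_dir}",           # folder the frames are written to
        "--scene-prefix=cap_",               # hypothetical file name prefix
        "--vout=dummy",                      # don't open a video window
        "--no-audio",
        "vlc://quit",                        # exit once the video ends
    ]


if __name__ == "__main__":
    # "episode01.mkv" and "caps" are placeholder names.
    out = Path("caps")
    out.mkdir(exist_ok=True)
    subprocess.run(build_vlc_scene_command("episode01.mkv", str(out)), check=True)
```

Keeping the command builder separate from the `subprocess.run` call makes it easy to print the command and check it before letting it loose on a whole folder of videos.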

Let's be real, screencapping and ripping videos is already a gray area. But hey, it's all for research and learning, right? Just don't go after some newbie artist who just released their first animatic and has no clue about AI. Be kind, folks!

What to watch for

When searching for screencaps or trying to cap things yourself, beware of dodgy websites and rogue ads. Don't venture into the dark web, and please, for the love of god, use an adblocker! Pinterest can be useful, but watch out for lower-quality content. And as for Tumblr, be cautious when scraping for art, as some independent artists are now using tools like Nightshade to add noise to their work, which could potentially "poison" AI models. However, without knowing the training data and algorithms used in these models, it's unclear how this might affect the output. Proceed with caution!

Part Four: Resource Articles

Hw Tagger: https://civitai.com/articles/5752/hw-tagger-application-how-to-use-basic-tutorial

Sort Images: https://civitai.com/articles/5707/tool-to-sort-images-and-captions-and-more-for-datasets

GitPull: https://civitai.com/articles/1024/so-you-fucked-up-and-did-a-git-pull-in-a1111

No way to FCK up: https://civitai.com/articles/1722/the-no-way-to-fuck-it-up-this-time-guide-to-installing-auto1111-or-forge-or-the-proompt-ninja

SD Prompt Reader: https://civitai.com/articles/2584/sd-get-prompt-easy-display-for-stable-diffusion-exif-data-on-gtk-dialog

Dataset Creator: https://civitai.com/articles/3627/loradataset-creator-v2

Comfy UI: https://civitai.com/articles/3304/a-guide-to-diving-into-ai-image-creation-on-comfyui-with-gpu-rentals

Regional Prompter: https://civitai.com/articles/3437/using-regional-prompter-extension-in-automatic1111

Onsite Lora Trainer: https://civitai.com/articles/2175/using-civitai-the-on-site-lora-trainer

Face Cropper: https://civitai.com/articles/2147/face-cropper-tool-to-automatically-crop-the-faces-from-photos

Colab Merger: https://civitai.com/articles/1619/how-to-script-merge-flow-quickly-for-colab

Chattiori's Merger: https://civitai.com/articles/654/how-to-use-chattioris-model-merger-bismuthmix-v40-recipe

Xview: https://civitai.com/articles/2499/the-best-image-viewer-in-my-opinion

EmbLab: https://civitai.com/articles/5382/emblab-experimental-embedding-lab-extension-for-a1111-sd15

Danbooru tagging Viz: https://civitai.com/articles/5150/danbooru-tagging-visualization-for-ponyxl-autismmix

Pony Cheatsheet: https://civitai.com/articles/4829/pony-cheatsheet-new-version-linked-at-top

Score 9: https://civitai.com/articles/4248/what-is-score9-and-how-to-use-it-in-pony-diffusion

Mnemic's Lora Training: https://civitai.com/articles/2138/lora-datasets-training-data-list-civitai-dataset-guide

Ads with Invoke: https://civitai.com/articles/723/make-ai-ads-with-invokeai-easy

Lora Info Editor: https://civitai.com/articles/3595/lora-info-editor-edit-or-remove-metadata-or-lora-yuan-or-lora

YoloV8: https://civitai.com/articles/4080/training-a-custom-adetailer-model-with-yolov8-detection-model

Prompt: https://civitai.com/articles/1009/prompt-guidance-tags-to-avoid-and-useful-tags-to-include

Guy90's Dataprep: https://civitai.com/articles/91/how-to-correctly-obtain-images-for-a-dataset

Rulles: https://civitai.com/articles/75/useful-online-tools-for-datasets-and-where-to-find-data

https://civitai.com/articles/269/how-to-make-other-file-types-usable-pdfs-gifs-webps-avifs-jpegs

AsaTyr: https://civitai.com/articles/5106/tagging-listsindex-bring-order-into-chaos

Extra2AB: https://civitai.com/articles/2333/how-to-prepare-regularization-images

DonMischo: https://civitai.com/articles/3432/wip-resources-for-fantasy-art-and-creatures

Laymans Anzch Regularization: https://civitai.com/articles/3342/regularization-from-the-ai-layman-perspective

Webui Addons: https://civitai.com/articles/3289/developing-webui-addons-being-a-good-citizen

Clip Studio Tutorial: https://civitai.com/articles/5985/editing-your-outputs-with-clip-studio-rough-guide

Diffusers Conversion: https://civitai.com/articles/2756/convert-15-and-sdxl-to-diffusers

Training Loras: https://civitai.com/articles/1716/a-fresh-approach-to-sdxl-and-pony-xl-lora-training

Dataset Tools: https://civitai.com/articles/5720/dataset-tools-from-earth-and-dusk-image-and-captions-editor

Large Dataset: https://civitai.com/articles/699/large-dataset-lora-tips-and-tricks-google-colab-sd-15-optimized

No person Loras: https://civitai.com/articles/2667/quick-guide-to-no-person-style-loras

TIs: https://civitai.com/articles/1184/make-your-own-textual-inversions-with-just-a-simple-prompt-embedding-merge-tutorial

Underfitting and overfitting: https://civitai.com/articles/5467/identifying-underfitting-and-overfitting

Meta Human Creator: https://civitai.com/articles/5391/no-person-loras-metahuman-creator

Compatibility Issues with CLIP on SDXL and SD 1.5: https://civitai.com/articles/4859/the-incompatibility-of-sd-15-embeds-with-sdxlpdxl (Contested: in a technical sense they are, but not in the way you might think.)

Supermerger Video: https://civitai.com/articles/3862/supermerger-tutorial-video-new-version-added

Convert Back to safetensors: https://civitai.com/articles/3551/elusive-convert-back-to-safetensors

Miro Board: https://civitai.com/articles/3190/miro-compare-your-models

Huggingface Demos: https://civitai.com/articles/2425/tutorial-free-demo-spaces-for-sd-models-on-huggingface

Online Privacy: https://civitai.com/articles/2360/protecting-personal-privacy-best-practices-for-ai-art-model-creators

Rented GPUs: https://civitai.com/articles/583/differences-between-colab-and-rented-gpus-a-short-comparison

(Outdated) Lora Training Theory: https://civitai.com/articles/1057/lora-training-theory-what-model-do-i-pick-to-train-it-on

Extension Picks: https://civitai.com/articles/2448/the-model-makers-toolkit-extensions-you-cant-miss



About Us

We are the Duskfall Portal Crew, a DID system with over 300 alters, navigating life with DID, ADHD, Autism, and CPTSD. We believe in AI’s potential to break down barriers and enhance mental health, despite its challenges. Join us on our creative journey exploring identity and expression.


Join Our Community

Community Groups:


Embeddings to Improve Quality

Negative Embeddings: Use scenario-specific embeddings to refine outputs.

Positive Embeddings: Enhance image quality with these embeddings.


Extensions

  • ADetailer: ADetailer GitHub

    • Usage: Use this extension to enhance and refine images, but use sparingly to avoid over-processing with SDXL.

  • Batchlinks: Batchlinks for A1111

    • Description: Manage multiple links when running A1111 locally or on a server.

    • Addon: @nocrypt Addon (The link is broken for now; I'll find it later, OOPS)

Additional Extensions:


Backups for Loras on SDXL & Pony XL:

Referral codes for Vast & Runpod:

VastAI: https://cloud.vast.ai/?ref=70354

Runpod: https://runpod.io/?ref=yx1lcptf
