⚠️This reminder/tip is urgent.⚠️
Reference: https://civitai.com/articles/3796/be-careful-of-where-you-get-your-images-for-training
No, I am not kidding. This is a very urgent reminder.
You have to be cautious about what you gather for your training dataset. Here's why:
There is a tool called Nightshade that is designed to poison training data. It makes an image appear as something different to the AI, which is meant to discourage training on artists' work without their permission. A poisoned image can look normal to a human viewer, so you may not notice it in your dataset, and it can cause distorted images to be generated. The article linked above shows what an output image can look like when the training data has been affected by this tool.
There are also artists who don't want their art to be used in training datasets, and you HAVE to respect that. If you search for "Do not use my art for ai" on DeviantArt, you will find many "No AI" posts, which is a good way to find out whether a particular artist is against their art being used in training datasets.
Not to mention, on Sketchers United, some artists don't want their art to be downloaded and only enable the download option so that people can view it at a better resolution/quality. An artist cannot really tell that you downloaded their image, though (unless you took a screenshot or recorded a video of the file being downloaded, or there are other signs that you downloaded it).
Do not gather images or videos from anti-AI platforms like Cara or Fusion for training datasets, since these platforms are against AI. Also be careful when gathering images from sources where you're not allowed to post AI art, since a source that has this rule may be against AI in general.
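As a rough illustration of the advice above, here is a minimal sketch of dropping candidate image URLs that come from platforms you have decided not to collect from. The function names and the blocklist are hypothetical, and the domain "cara.app" is my assumption for Cara's address, so verify it (and add any other platforms) yourself. Passing this check does not mean the artist permits training; it only filters out sources you already know to avoid.

```python
from urllib.parse import urlparse

def is_blocked(url, blocked_domains):
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)

def filter_sources(urls, blocked_domains):
    """Keep only the URLs whose host is not on the blocklist."""
    return [u for u in urls if not is_blocked(u, blocked_domains)]

# Hypothetical blocklist: "cara.app" is an assumed domain, check it before relying on it.
BLOCKED_DOMAINS = {"cara.app"}

candidates = [
    "https://cara.app/post/1",
    "https://example.com/image.png",
]
kept = filter_sources(candidates, BLOCKED_DOMAINS)
print(kept)
```

Matching on the exact host or a subdomain suffix avoids accidentally blocking an unrelated site whose name merely ends with the same letters.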
Some sites like X and Instagram say that they will use the art posted there to train AI models. That doesn't mean every artist who posts there grants you the right to use their art in training datasets, though. Some artists post their art there but still prohibit its use in datasets, and they may not be aware of what the platform says about AI training. You can ask them to find out whether they allow or prohibit it.
It is difficult for artists to know whether their art has been used in AI training datasets. Here is why:
It's common for people to state what a model's dataset contains in the model's description. However, descriptions can be misleading or even written as a joke, so they DO NOT confirm the actual dataset. If you see a description and immediately say something like "How dare you train this model on my art without permission", this might turn out to be a false accusation, depending on the actual dataset.
Asking about the model's dataset is not conclusive either, since the person answering can lie. Do not immediately believe any answer about it.
One common misconception is that the dataset is easy to verify from the art style that the LoRA generates. This is incorrect, because art styles can be imitated: a similar style does not prove that a specific artist's work was in the dataset.
An anti-AI filter, Glaze, is meant to prevent AI from imitating an artist's style, which also makes it harder to tell from a model's output whether their art was used in a dataset.
There's also the anti-AI filter named Nightshade, mentioned earlier, which poisons training datasets and results in a low-quality model. This also contributes to the difficulty of verification.
Metadata viewers let you view the metadata of files, but they are not reliable evidence, since metadata can be edited or corrupted. Even if the dataset captions shown in the metadata viewer are accurate, the dataset images can still be mistagged.
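To show why metadata is weak evidence, here is a minimal sketch of what a metadata viewer does, assuming the model is a .safetensors file (the source does not say which format is meant). That format starts with an 8-byte little-endian header length followed by a JSON header, and any free-form metadata sits under the "__metadata__" key. The function name is my own, and which keys appear there depends on the training tool.

```python
import json
import struct

def read_safetensors_metadata(path):
    """Return the free-form metadata dict stored in a .safetensors header."""
    with open(path, "rb") as f:
        # First 8 bytes: header size as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: the JSON header, encoded as UTF-8.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Metadata is optional, so fall back to an empty dict when it is missing.
    return header.get("__metadata__", {})

# Usage (hypothetical file name):
# metadata = read_safetensors_metadata("my_lora.safetensors")
# for key, value in metadata.items():
#     print(key, "=", value)
```

Because this is plain JSON that anyone can rewrite before uploading a file, whatever a viewer displays only tells you what the file claims, not what the model was actually trained on.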
References/External links mentioned above
https://civitai.com/articles/3796/be-careful-of-where-you-get-your-images-for-training
https://towardsdatascience.com/how-nightshade-works-b1ae14ae76c3
https://www.deviantart.com/search?q=Do+not+use+my+art+for+ai
Anti-AI platforms
The platforms listed here are known to be against AI. Usually, these platforms promise not to use art for AI. A platform that simply doesn't allow AI art will not necessarily be listed here (although anti-AI platforms usually don't allow AI-generated images). Using images from these kinds of platforms is not advised, since the platforms are anti-AI. I provide links to these anti-AI platforms so you can see which images you shouldn't use.