Long time no see! Let me update you on the latest developments:
In a nutshell:
We have launched version 1.4 of the fully automated LoRA process.
We've implemented an algorithm for extracting and clustering character images from anime videos.
For the v1.4 process, we conducted tests on both the web-based dataset (collected and cleaned from image websites) and the anime-based dataset (generated from anime video keyframes using object detection algorithms). Both datasets yielded impressive models:
For the web-based dataset, the v1.4 process produces LoRA models with significantly improved detail quality compared to previous versions, while maintaining sufficient generalization.
For the anime-based dataset, v1.4 achieves extremely high fidelity on large or massive datasets, preserving a high level of generalization.
Dataset Scale
Let's define datasets based on their scale:
1-5 images: Few-shot dataset
5-20 images: Tiny dataset
20-60 images: Small dataset
60-150 images: Medium dataset
150-350 images: Large dataset
350+ images: Massive dataset
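For reference, this classification can be written as a tiny helper; the thresholds below simply mirror the list above (boundary values are assigned to the smaller scale).

```python
def dataset_scale(num_images: int) -> str:
    """Map an image count to the dataset scale labels defined above."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    if num_images <= 5:
        return "few-shot"
    if num_images <= 20:
        return "tiny"
    if num_images <= 60:
        return "small"
    if num_images <= 150:
        return "medium"
    if num_images <= 350:
        return "large"
    return "massive"


print(dataset_scale(200))  # -> "large"
```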
As far as we know, the majority of manually trained LoRA datasets (including those widely considered to have high quality on civitai) fall within the tiny to small dataset range. In fact, collecting and processing datasets of medium size and above through manual efforts is usually impractical due to the tremendous amount of human labor required.
However, when it comes to character datasets extracted from web images or anime videos, these limitations are less significant. The main challenge lies in some characters having fewer images available online, often due to lower popularity, or in cases where characters have limited screen time in the anime, leading to a scarcity of usable images.
About the v1.4 Process
Process and Versions
It's important to clarify that the versions (such as v1.0, v1.3, etc.) associated with the models released by this account do not refer to individual model or character versions. Instead, they represent the process versions used in the automated training pipeline for the models. In simple terms, all models labeled as v1.0 were trained using the same process, and similarly, models labeled as v1.3 used a distinct but consistent process.
Let's briefly describe the currently available process versions:
v1.0 Process:
Dataset Source: Character datasets automatically collected and cleaned from various websites (including zerochan, anime-pictures, danbooru, and over a dozen more), capped at 200 images per character (many characters have more images available), with no additional augmentation.
Training Approach: NAI model training; all images resized to 640x880 for training; fixed training steps at 1500 regardless of dataset size.
Preview Images: Generated using the anything-v5 model; prompts for preview images are mainly clustered based on dataset tags, plus 1-2 general prompts.
Most of the models previously uploaded by this account belong to the v1.0 process.
v1.3 Process:
Dataset Source: Same as v1.0.
Training Approach: Trained for 12 epochs regardless of dataset size; other aspects remain the same as v1.0.
Preview Images: Generated using meinamix_v11; additional prompts for changing clothing (miko, maid, suit, yukata) and NSFW prompts to test model generalization.
This is the result of the first round of process improvements, showing some level of quality enhancement.
v1.4 Process:
This is the latest process, and the focus of this article.
To visually demonstrate the quality of LoRA models produced by previous processes, let's take a look at a few images.
The following examples roughly represent the typical quality of models produced by the v1.0 process:
Similarly, the following examples roughly represent the typical quality of models produced by the v1.3 process:
Extraction of Characters from Anime Videos
First, the v1.4 process now supports LoRA training on characters from anime videos. It involves a complete automated pipeline from the original video to a character dataset, as outlined below:
Obtain the magnet link or torrent file for the anime video resources.
Automatically download video resources to the cluster.
Automatically extract keyframes using anime video keyframe extraction techniques.
Automatically capture all characters from the keyframes using AI techniques like object detection.
Automatically clean the data.
Automatically cluster the characters based on the extracted CCIP feature vectors.
Automatically package the clustered results and upload them to Hugging Face.
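As a rough illustration of the clustering step, here is a minimal sketch. It assumes a placeholder `extract_ccip_feature()` callable that returns one CCIP-style embedding per character crop (the actual extractor is not shown here), and uses DBSCAN over pairwise cosine distances to group crops that likely depict the same character.

```python
# Minimal sketch of the character-clustering step above.
# extract_ccip_feature is a placeholder for the real CCIP feature extractor.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances


def cluster_character_crops(crop_paths, extract_ccip_feature, eps=0.25, min_samples=3):
    """Group character crops by visual identity using CCIP-style embeddings.

    crop_paths: list of image file paths (one cropped character per image).
    extract_ccip_feature: callable(path) -> 1-D numpy feature vector.
    Returns a dict {cluster_id: [paths...]}; cluster_id -1 holds "noise" crops.
    """
    features = np.stack([extract_ccip_feature(p) for p in crop_paths])
    distances = cosine_distances(features)            # pairwise distance matrix
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="precomputed").fit_predict(distances)

    clusters = {}
    for path, label in zip(crop_paths, labels):
        clusters.setdefault(int(label), []).append(path)
    return clusters
```

Each non-noise cluster then becomes one candidate character folder in the packaged dataset.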
An example of an uploaded dataset can be found here, as shown in the image below:
You can observe that nearly all character images have been extracted.
However, it's worth noting that the current clustering algorithm is still not perfect (in reality, CCIP was mainly trained on illustration data and performs less effectively on anime videos), which may result in some impurities and confusion. The packaged character data is not guaranteed to be 100% accurate. Nevertheless, this is not a major concern, as the errors are well within an acceptable range, and subsequent secondary refining can effectively eliminate them.
After this, all that remains is to perform secondary processing on the character data package and associate the index (the leftmost first column) with the character name, as demonstrated here:
v1.4 Process
Speaking of the latest v1.4 process, let's discuss the main improvements made:
Dataset:
Implemented a 3-stage cropping approach (full body - upper body - close-up of head) for characters on top of the existing dataset (see the sketch after this block).
After removing small-sized images, saved the dataset as three separate copies.
This means that for the original large datasets with 200 images, the number often increases to around 500 after expansion, forming massive datasets.
An example of a processed dataset can be found here: Dataset Example
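To make the 3-stage cropping concrete, here is a minimal sketch under stated assumptions: the three detector functions (`detect_person`, `detect_halfbody`, `detect_head`) are hypothetical placeholders for whatever detection models the pipeline actually uses, each assumed to return bounding boxes as `(x0, y0, x1, y1)` tuples.

```python
# Minimal sketch of the 3-stage cropping (full body / upper body / head),
# assuming hypothetical detectors that return (x0, y0, x1, y1) boxes.
from pathlib import Path
from PIL import Image

MIN_SIZE = 320  # crops smaller than this on either side are discarded (assumed value)


def three_stage_crops(image_path, detect_person, detect_halfbody, detect_head, out_dir):
    """Save full-body, upper-body, and head crops for every detected character."""
    image = Image.open(image_path).convert("RGB")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    stages = {
        "person": detect_person(image),      # full-body boxes
        "halfbody": detect_halfbody(image),  # upper-body boxes
        "head": detect_head(image),          # head close-up boxes
    }
    saved = []
    for stage, boxes in stages.items():
        for i, (x0, y0, x1, y1) in enumerate(boxes):
            crop = image.crop((x0, y0, x1, y1))
            if min(crop.size) < MIN_SIZE:    # drop small-sized crops
                continue
            path = out_dir / f"{Path(image_path).stem}_{stage}_{i}.png"
            crop.save(path)
            saved.append(path)
    return saved
```

With all three stages kept as separate copies, an original 200-image dataset typically expands to roughly 500 usable crops, as noted above.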
Training Approach:
Trained using the NAI model.
Clustered all images into several aspect-ratio buckets, which are used directly for training (see the bucketing sketch after this list).
Defaulted to train for 15 epochs (for the mentioned massive datasets, this means over 7000 steps, taking up to 45 minutes).
For small and medium-sized datasets, at least 3000 steps are trained (few-shot and tiny datasets are currently out of scope).
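The aspect-ratio bucketing mentioned above can be sketched as follows; the bucket resolutions listed here are illustrative assumptions, not the exact values used in the pipeline.

```python
# Rough sketch of aspect-ratio bucketing: each image is assigned to the
# training resolution whose aspect ratio is closest to its own.
from PIL import Image

# Illustrative bucket resolutions (width, height); not the pipeline's exact values.
BUCKETS = [(512, 768), (576, 704), (640, 640), (704, 576), (768, 512)]


def assign_bucket(image_path):
    """Return the (width, height) bucket closest in aspect ratio to the image."""
    with Image.open(image_path) as img:
        ratio = img.width / img.height
    return min(BUCKETS, key=lambda wh: abs(wh[0] / wh[1] - ratio))


def bucketize(image_paths):
    """Group images by their assigned bucket resolution."""
    groups = {}
    for path in image_paths:
        groups.setdefault(assign_bucket(path), []).append(path)
    return groups
```

Each training batch is then typically drawn from a single bucket, so all images in the batch share one resolution.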
Preview Images:
Still generated using MeinaMix V11.
Designed an evaluation metric for character fidelity based on CCIP, called Recognition Score (RecScore), ranging from 0.0 (not similar at all) to 1.0 (fully consistent with the dataset). It uses CCIP to compare preview images against batches of dataset images and computes a recognizability score (a minimal sketch follows below).
With RecScore, it's possible to assess the quality of models at various training steps and automatically select the best quality step.
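As a minimal sketch of how such a score could be computed (the real implementation details are not shown here), one can average a CCIP-style same-character probability between each preview image and the dataset images; `ccip_same_probability()` below is a placeholder for the actual CCIP comparison.

```python
# Minimal sketch of a RecScore-style fidelity metric.
# ccip_same_probability(a, b) is a placeholder that should return the
# probability (0.0-1.0) that images a and b depict the same character.
import numpy as np


def rec_score(preview_paths, dataset_paths, ccip_same_probability):
    """Average same-character probability between preview and dataset images."""
    scores = [
        ccip_same_probability(preview, reference)
        for preview in preview_paths
        for reference in dataset_paths
    ]
    return float(np.mean(scores))  # 0.0 = not similar at all, 1.0 = fully consistent


def best_step(previews_by_step, dataset_paths, ccip_same_probability):
    """Pick the training step whose preview images score highest."""
    return max(
        previews_by_step,
        key=lambda step: rec_score(previews_by_step[step], dataset_paths,
                                   ccip_same_probability),
    )
```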
With the multiple improvements mentioned above, confirmed enhancements include:
Significantly better facial detail fidelity, especially the quality of pupils.
Due to the introduction of the evaluation metric, it's possible to confidently use large training steps to ensure high fidelity.
For web-based LoRA, there is a significant improvement in overall quality and detail quality. For anime-based LoRA, both the character and style can be restored to the extent that they look like screenshots from videos.
At the same time, the original model's generalization ability has not decreased. It's still possible to use generic prompts for outfit changes, and there's almost no overfitting observed.
To illustrate the effects more clearly, let's look at some comparisons.
v1.0 vs. v1.4
Here are two characters under the v1.0 training process:
And here are the same characters, using the same original dataset (the dataset used for v1.4's 3-stage cropping is the same training dataset used in v1.0), under the v1.4 training process:
A significant improvement in facial detail is evident, with no observable loss of generalization. In fact, v1.4 might have a stronger generalization ability due to the use of massive datasets.
You can try the abovementioned models yourself:
v1.3 vs. v1.4
Here's an anime character trained under the v1.3 process:
And here is the same character, using the same original dataset, under the v1.4 training process:
The blurriness issue has been resolved, significant facial detail improvement is noticeable, and sufficient generalization is retained.
You can try the abovementioned model yourself:
Limitations of Existing Work
Despite the substantial improvement in automatically generated model quality, there are still some limitations in the pipeline:
Video Processing:
There are a few instances of failed character detection, indicating that the object detection model still needs further refinement.
CCIP's accuracy on anime videos can still be improved.
LoRA Training:
The issue of dataset image quality filtering remains unsolved, which might result in low-quality images entering the dataset.
The clothing clustering problem for characters is yet to be resolved, requiring the training of a contrastive learning model similar to CCIP.
RecScore has difficulty distinguishing certain characteristic characters (e.g., characters with horns in Arknights), often yielding scores close to 1.0, even when underfitting is evident.
The main function of RecScore is limited to evaluating the character fidelity of the LoRA model; there is not yet a metric that can assess the controllability or overfitting level of the model. One possible approach is to use CLIP to extract features from the generated images and compare them with the input prompts. Several issues remain to be addressed, and we plan to research and replicate relevant papers in this regard. If successful, combining the controllability metric with RecScore would allow the step that performs best in all aspects (fidelity, controllability, etc.) to be selected automatically.
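As a very rough sketch of the CLIP-based idea (not the final metric), one could score how well each generated preview matches its input prompt via the cosine similarity between CLIP image and text embeddings, for example with the `openai/clip-vit-base-patch32` checkpoint from the `transformers` library:

```python
# Rough sketch: prompt/image agreement via CLIP cosine similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def prompt_image_similarity(image_path: str, prompt: str) -> float:
    """Cosine similarity between the CLIP embeddings of an image and a prompt."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return float((image_emb @ text_emb.T).item())
```

A drop in this similarity as training progresses could then serve as a crude overfitting signal, to be combined with RecScore when selecting the best step.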
Addressing these points will be the direction of our ongoing efforts.
Please continue to follow our work.