
[Summary 2023-8-25] What work have we done before these LoRAs?


User @bruh776 brought up a very valuable question, so we decided to briefly introduce our overall work. Here is the original comment:

Hey I read the way you automated the generation of this lora, and it's cool too, though I don't understand none of what's written cuz it's all high level stuff, and I'm talking here with no knowledge whatsoever. You could web scrape from danbooru when a user inputs tags ig, like if I wanted a lora of ash from Pokémon I would input: ash(Pokémon) 1boy, and if there's a negative search bar 1girl or multiple_girls. Then after getting all the images and their tags you could train a lora with it. Does your program do the same or what does it do(can you explain in basic terms)

Here's our response:

Hello! Your overall understanding is basically correct, although what we do is more complex than that:

  1. Firstly, we need to acquire index information about characters from various games and anime. In simple terms, we need to know which characters can be trained into LoRAs; without this index, batch-producing LoRAs would be impossible. However, because there are so many games and anime, with character information scattered all over the internet, consolidating this information is quite a challenging task.

  2. Secondly, our images come from more than just danbooru. In fact, our dataset draws on nearly 20 websites, including sankaku, danbooru, pixiv, zerochan, and others. Across these sites, the tags for the same character can vary (e.g., yae miko from Genshin Impact is tagged as yae_miko on danbooru, but as yae_miko_(genshin_impact) on sankaku). Moreover, characters from different games and shows can share the same or similar names (e.g., there are numerous characters named ash, in Pokémon, Rainbow 6, and other works). Therefore, we need a reliable way to match each character in the index with its corresponding tag on each website. This step leaves very little room for error: a mistaken match (assigning a character to the wrong tag) can cause major dataset anomalies, while a failed match can leave many high-quality images out of the dataset.

  3. Thirdly, because our images come from so many websites, almost all high-quality images on the internet are captured. However, these sites also host a significant share of low-quality images, images with incorrect tags, and images containing multiple people. Even on well-known sites like danbooru, the proportion of low-quality images can be higher than most people expect, and by our rough estimate the tag error rate is close to 5%, even for critical character tags. These issues can harm LoRA training (in fact, before we solved the problem of filtering out images of the wrong character, batch-trained LoRAs were of much lower quality than what you see now). Therefore, selecting and cropping the required character images and filtering out low-quality images are both critical. We are currently working on ways to judge and filter out low-quality images, which have the potential to significantly improve LoRA quality in the future.

  4. Fourthly, the code for a1111's webUI is actually difficult to integrate into an automated LoRA training pipeline due to its deep system-level coupling. While it's undoubtedly a successful AI drawing tool for regular users, it's not suitable for bulk operations. So, we need to build a complete framework for data management, model training, model management, and model deployment to achieve automation in the middle and end stages.
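To make the tag-matching problem in step 2 more concrete, here is a toy sketch of one conservative matching rule. It is not our actual matching code: the `CharacterEntry` type, the `match_site_tag` helper, and the index and tag lists are all hypothetical (only the two yae_miko spellings come from the example in step 2). The rule: prefer a tag qualified with the character's own work, accept a bare name tag only when no other indexed character shares that name, and otherwise report no match.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class CharacterEntry:
    """Hypothetical index entry: one character from one game or anime."""
    name: str
    work: str
    aliases: list[str] = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase and turn runs of spaces/underscores into '_' ('Yae Miko' -> 'yae_miko')."""
    return re.sub(r"[\s_]+", "_", text.strip().lower())


def all_names(entry: CharacterEntry) -> list[str]:
    return [normalize(n) for n in (entry.name, *entry.aliases)]


def match_site_tag(entry: CharacterEntry, index: list[CharacterEntry],
                   site_tags: set[str]) -> str | None:
    """Return the tag `entry` uses on one site, or None if no safe match exists."""
    work = normalize(entry.work)
    # 1. A tag qualified with the character's own work, e.g. 'yae_miko_(genshin_impact)',
    #    cannot be confused with a same-named character from another work.
    for name in all_names(entry):
        qualified = f"{name}_({work})"
        if qualified in site_tags:
            return qualified
    # 2. A bare name tag, e.g. 'yae_miko', is trusted only if no other indexed
    #    character shares that name (a bare 'ash' could be several characters).
    other_names = {n for e in index if e is not entry for n in all_names(e)}
    for name in all_names(entry):
        if name in site_tags and name not in other_names:
            return name
    # 3. No safe match: skipping a site beats pulling in the wrong character.
    return None


# Made-up index and tag lists, for illustration only.
index = [
    CharacterEntry("yae miko", "genshin impact"),
    CharacterEntry("ash", "rainbow six siege"),
    CharacterEntry("ash ketchum", "pokemon", aliases=["ash"]),
]
danbooru_tags = {"yae_miko", "ash_ketchum", "ash_(rainbow_six_siege)"}
sankaku_tags = {"yae_miko_(genshin_impact)", "ash"}

for entry in index:
    print(entry.name, "|", match_site_tag(entry, index, danbooru_tags),
          "|", match_site_tag(entry, index, sankaku_tags))
```

Returning None for the two ambiguous bare `ash` cases trades recall for precision: a failed match only costs images, while a wrong match corrupts the dataset. A real matcher would need to handle many more cases than this.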

The above is an overview of the entire process. In fact, we've done a lot of preliminary work for this pipeline, including writing tens of thousands of lines of code and training over 20 AI models (mostly used for data cleaning and processing). It's a lengthy and complex process with many problems to solve. And what our team truly focuses on isn't just the quality of LoRA for one or a few characters; it's about constructing a fully automated pipeline that can produce high-quality LoRA models in batches. REPRODUCIBILITY IS THE MOST IMPORTANT THING. In the long run, this is what has the potential to bring substantial change to AIGC – something without reproducibility, no matter how exquisite, remains a fragile piece of art; while something reproducible, even if its quality is slightly lower (but still within an acceptable range), will ultimately reach numerous users. Of course, we are well aware that automation in anything can often cause displeasure for some individuals. In this regard, we can only express regret. We know that our process is not perfect at this moment, and there is still plenty of room for improvement. The classic methods for training high-quality LoRAs are also worth learning, and we hold deep respect for the work of our predecessors in any form. But we still have to say, rejecting the embrace of more efficient methods and clinging to outdated practices is their choice. The shame should be on them, not us. This is the value of our team.

We welcome those interested in our technology work to join in discussions here and to follow or contribute to our ongoing projects:

  • DeepGHS – where we store our tool models and datasets.

  • DeepGHS CyberHarem – our experimental field for bulk training character LoRAs and the core infrastructure of the aforementioned pipeline. Its relationship with DeepGHS is similar to that of a production workshop and a research center.

  • HCP-Diffusion – our training framework, developed by 7eu7d7, which offers richer functionality than webui. For people with a computer science background, the user experience is excellent. Highly recommended!!!

  • deepghs - Github – our GitHub organization that holds various project-related engineering codes. This includes:

    • dghs-imgutils – a library that integrates various anime image processing tools, now officially released on PyPI. Here is the documentation:

    • cyberharem – our main pipeline; all character LoRAs under this account are generated by it.

    • waifuc – a convenient and easy-to-use image data scraping, processing, and packaging pipeline library (currently under continuous development and iteration; it will be released when the architecture stabilizes). It contains rich functionalities; here are some excerpts from the files:

```python
# waifuc/waifuc/source/
from .anime_pictures import AnimePicturesSource
from .base import BaseDataSource
from .compose import ParallelDataSource, ComposedDataSource
from .danbooru import DanbooruSource, SafebooruSource, ATFBooruSource, E621LikeSource, E621Source, E926Source
from .derpibooru import DerpibooruLikeSource, DerpibooruSource, FurbooruSource
from .duitang import DuitangSource
from .gchar import GcharAutoSource
from .huashi6 import Huashi6Source
from .konachan import KonachanLikeSource, YandeSource, KonachanSource, KonachanNetSource, LolibooruSource, \
    Rule34LikeSource, Rule34Source, HypnoHubSource, GelbooruSource, XbooruLikeSource, XbooruSource, \
    SafebooruOrgSource, TBIBSource
from .local import LocalSource, LocalTISource
from .paheal import PahealSource
from .pixiv import BasePixivSource, PixivSearchSource, PixivUserSource, PixivRankingSource
from .sankaku import SankakuSource, PostOrder, Rating, FileType
from .wallhaven import WallHavenSource
from .web import WebDataSource
from .zerochan import ZerochanSource

# waifuc/waifuc/action/
from .align import AlignMaxSizeAction, AlignMinSizeAction, PaddingAlignAction
from .augument import RandomFilenameAction, RandomChoiceAction, BaseRandomAction, MirrorAction
from .base import BaseAction, ProcessAction, FilterAction, ActionStop
from .basic import ModeConvertAction
from .ccip import CCIPAction
from .count import SliceSelectAction, FirstNSelectAction
from .filename import FileExtAction, FileOrderAction
from .filter import NoMonochromeAction, OnlyMonochromeAction, ClassFilterAction, RatingFilterAction, FaceCountAction, \
    HeadCountAction, PersonRatioAction
from .lpips import FilterSimilarAction
from .split import PersonSplitAction
from .tagging import TaggingAction, TagFilterAction

# waifuc/waifuc/export/
from .base import BaseExporter, SaveExporter, LocalDirectoryExporter
from .huggingface import HuggingFaceExporter
from .textual_inversion import TextualInversionExporter
```
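The excerpts show how waifuc is organized: data sources, actions (including base `ProcessAction` and `FilterAction` classes), and exporters. To illustrate why that split suits batch production, here is a toy, stdlib-only version of the same source, action, exporter idea. It is not waifuc's implementation or API; `Item`, `filter_stage`, `process_stage`, `first_n_stage`, and `run_pipeline` are hypothetical names, and the `person_count` field stands in for what a detection model would provide. The two filters echo step 3 above (dropping multi-person and low-resolution images).

```python
from __future__ import annotations

import dataclasses
import itertools
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class Item:
    """Hypothetical stand-in for one scraped image plus its metadata."""
    name: str
    width: int
    height: int
    tags: list[str] = field(default_factory=list)
    person_count: int = 1  # in practice this would come from a detection model


# A stage takes a lazy stream of items and returns another lazy stream.
Stage = Callable[[Iterator[Item]], Iterator[Item]]


def filter_stage(keep: Callable[[Item], bool]) -> Stage:
    """Drop every item for which `keep` returns False."""
    return lambda items: (item for item in items if keep(item))


def process_stage(fn: Callable[[Item], Item]) -> Stage:
    """Apply `fn` to every item."""
    return lambda items: (fn(item) for item in items)


def first_n_stage(n: int) -> Stage:
    """Stop the stream after `n` items (later items are never pulled)."""
    return lambda items: itertools.islice(items, n)


def run_pipeline(source: Iterable[Item], stages: list[Stage],
                 export: Callable[[Item], None]) -> int:
    """Chain the stages over the source, hand each survivor to `export`, return the count."""
    stream: Iterator[Item] = iter(source)
    for stage in stages:
        stream = stage(stream)
    count = 0
    for item in stream:
        export(item)
        count += 1
    return count


def dedupe_tags(item: Item) -> Item:
    # dict.fromkeys keeps first occurrences in their original order
    return dataclasses.replace(item, tags=list(dict.fromkeys(item.tags)))


# Made-up items for illustration.
source = [
    Item("a.png", 1024, 768, ["1girl", "yae_miko"]),
    Item("b.png", 300, 300, ["1girl", "yae_miko"]),                    # too small
    Item("c.png", 1200, 900, ["2girls", "yae_miko"], person_count=2),  # not solo
    Item("d.png", 800, 800, ["yae_miko", "1girl", "yae_miko"]),        # duplicate tag
]
exported: list[Item] = []
kept = run_pipeline(
    source,
    [
        filter_stage(lambda it: it.person_count == 1),
        filter_stage(lambda it: min(it.width, it.height) >= 512),
        process_stage(dedupe_tags),
        first_n_stage(100),
    ],
    exported.append,
)
print(kept, [it.name for it in exported])
```

Because each stage consumes and yields a lazy stream, stages can be reordered or reused per character, and a cap like `first_n_stage` stops pulling from the source once enough images survive, which matters when the source is a slow web scraper.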