Since I am Japanese, I may not be able to use English properly. Please let me know if there are any unnatural sentences.

What is "Swim in Latent"

https://civitai.com/models/118525/swim-in-latent

Swim in Latent is an anime-style model based on StableDiffusionXL 1.0 that I released the other day.

Please try it if you like. You can download it from the link above.

How the model was trained

Step 1: Make a dataset

- Base dataset

First, I collected tens of thousands of images from danbooru.

I limited the images to those posted within the last 4 years and collected them in descending order of score as far as possible, with "rating:general" and all other ratings in a 50/50 ratio.

I used wd14tagger's swinv2 for tagging and, following wdxl, prepended "new, newest, masterpiece, best quality" to each caption. In hindsight, this may have been redundant.

This is the base dataset.

Ideally, I think the model should be trained on all of danbooru for several passes, but I compromised because I don't have that kind of development environment.
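As a rough illustration of the selection rule and caption prefix described above, here is a minimal Python sketch. It is not the script actually used for the model: the `Post` fields (`score`, `rating`, `created_at`), the helper names, and the rating code `"g"` for rating:general are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Quality tags prepended to every caption, as described in the text.
QUALITY_PREFIX = ["new", "newest", "masterpiece", "best quality"]


@dataclass
class Post:
    # Hypothetical minimal record for one image; field names are assumptions.
    id: int
    score: int
    rating: str           # assumed: "g" stands for rating:general
    created_at: datetime


def select_posts(posts, total, now, max_age_years=4):
    """Pick up to `total` recent posts: half rating:general, half other, best score first."""
    cutoff = now - timedelta(days=365 * max_age_years)
    recent = [p for p in posts if p.created_at >= cutoff]
    by_score = sorted(recent, key=lambda p: p.score, reverse=True)
    half = total // 2
    general = [p for p in by_score if p.rating == "g"][:half]
    others = [p for p in by_score if p.rating != "g"][:total - half]
    return general + others


def build_caption(tagger_tags):
    """Prepend the fixed quality tags to the tags produced by the tagger."""
    return ", ".join(QUALITY_PREFIX + list(tagger_tags))
```

If fewer high-scoring images exist in one rating group than the requested half, this sketch simply returns fewer images rather than rebalancing.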

- Dataset for FT

Next, I made a dataset for FT to unify the art style to some extent.

For this dataset, I collected about 500 images whose art style is as modern and realistic in texture as possible.

I looked for artists whose style seemed just right, listed several of them, and then collected their images.

I tagged these images with wd14tagger's swinv2 in the same way.

Step 2: Train the base model

Train the base model.

The tool used is kohya's sd-scripts, specifically sdxl_train.py.

Below are the settings that I consider most important.

optimizer: Lion
learning rate: 4e-6
lr_scheduler: cosine_with_restarts
min_snr_gamma: 5
caption_dropout_rate: 0.1
mixed_precision: bfloat16
shuffle_caption: true
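A rough sketch of how the settings above might map to command-line flags for sdxl_train.py. This is not the command actually used for the model: the flag spellings are assumptions to be checked against the output of `python sdxl_train.py --help`, and the file paths and the `accelerate launch` wrapper are placeholders. Note that sd-scripts, as far as I know, spells the bfloat16 precision value as `bf16`.

```python
import shlex


def build_train_command(model_path, train_data_dir, output_dir, learning_rate=4e-6):
    """Build an argv list for sdxl_train.py (flag names are assumptions, verify with --help)."""
    return [
        "accelerate", "launch", "sdxl_train.py",
        "--pretrained_model_name_or_path", model_path,   # placeholder path
        "--train_data_dir", train_data_dir,              # placeholder path
        "--output_dir", output_dir,                      # placeholder path
        "--optimizer_type", "Lion",
        "--learning_rate", str(learning_rate),
        "--lr_scheduler", "cosine_with_restarts",
        "--min_snr_gamma", "5",
        "--caption_dropout_rate", "0.1",
        "--mixed_precision", "bf16",
        "--shuffle_caption",
    ]


if __name__ == "__main__":
    cmd = build_train_command("sd_xl_base_1.0.safetensors", "dataset/base", "output/base")
    # Print a shell-safe version of the command for inspection before running it.
    print(shlex.join(cmd))
```

Building the command as a list keeps the base-model run and the later fine-tuning run identical except for the arguments that are meant to differ.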

Step 3: Fine-tune the base model

Next, fine-tune the base model trained in the previous step.

Training uses exactly the same settings as for the base model; only the dataset is changed.

Step 4: Fine-tune further

Finally, I narrowed the dataset for FT down further to about 100 images and trained a LoRA on it.

Below are the settings.

optimizer: Lion
learning rate: 5e-5
lr_scheduler: cosine_with_restarts
min_snr_gamma: 5
caption_dropout_rate: 0.1
mixed_precision: bfloat16
shuffle_caption: true
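For intuition about `cosine_with_restarts`, the sketch below computes the multiplier such a schedule applies to the base learning rate. It follows the commonly used "cosine with hard restarts" formula. Whether sd-scripts matches it exactly, and the step and cycle counts used here, are assumptions for illustration only.

```python
import math


def cosine_with_restarts_factor(step, total_steps, num_cycles=1, warmup_steps=0):
    """Multiplier in [0, 1] applied to the base learning rate at `step`."""
    if step < warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from 1 to 0 along a half cosine, then jumps back to 1.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


def learning_rate_at(step, base_lr, total_steps, num_cycles=1):
    """Learning rate at `step` for a given base rate, e.g. 4e-6 or 5e-5."""
    return base_lr * cosine_with_restarts_factor(step, total_steps, num_cycles)
```

With a single cycle this behaves like a plain cosine decay; restarts only appear when the cycle count is greater than one.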

Once the LoRA is trained, generate images several times to find the right ratio.

Then merge the LoRA into the model using sdxl_merge_lora.py.

That completes the model.
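What the ratio means when merging can be sketched as follows: for each layer, the LoRA's low-rank update is scaled and added to the original weight. This is a generic illustration of LoRA merging in plain Python, not the code of sdxl_merge_lora.py. The function names and the alpha / rank scaling convention are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora_weight(weight, lora_up, lora_down, alpha, ratio):
    """Return weight + ratio * (alpha / rank) * (lora_up @ lora_down)."""
    rank = len(lora_down)                # lora_down has shape (rank, in_features)
    scale = ratio * alpha / rank
    delta = matmul(lora_up, lora_down)   # shape (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

A ratio of 0 leaves the model unchanged and a ratio of 1 applies the LoRA at full strength, so trying a few values during generation before merging corresponds to the "find the right ratio" step.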

Extra

StabilityAI recently announced an official model metadata format.

When creating models from now on, it is advisable to embed this metadata in the model.

The model released this time has the following metadata embedded.

https://github.com/Stability-AI/ModelSpec

{
    "modelspec.sai_model_spec": "1.0.0.alpha",
    "modelspec.architecture": "stable-diffusion-xl-v1-base",
    "modelspec.implementation": "sgm",
    "modelspec.title": "SwimInLatent",
    "modelspec.author": "ddPn08",
    "modelspec.description": "StableDiffusionXL model fine-tuned for anime.",
    "modelspec.date": "2023-07-29",
    "modelspec.license": "CreativeML Open RAIL++-M"
}
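To check which metadata a model file carries, the file header can be read directly. The sketch below assumes the model is stored as a .safetensors file, whose layout is an 8-byte little-endian header length followed by a JSON header with an optional `__metadata__` map. The function names are hypothetical.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the string-to-string metadata map stored in a .safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))   # little-endian uint64
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})


def modelspec_fields(metadata):
    """Keep only the keys that belong to the modelspec namespace."""
    return {k: v for k, v in metadata.items() if k.startswith("modelspec.")}
```

Only the header is read, so this works on large checkpoints without loading any tensor data.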

Something like a "Swim in Latent" training workflow