
Sumeshi FLUX.1 S 🍣

Updated: Sep 23, 2024
Tags: base model, anime, girls

Type: Checkpoint Trained
Published: Sep 8, 2024

Base Model: Flux.1 S

Hash: AutoV2 175228752A
Creator: FA770

License: Apache 2.0

Note: This model is Schnell-based, but it requires a (distilled) guidance scale of 3 or 5, a CFG scale (not the guidance scale) of 3 or higher, and 20 or more steps. It must be used with the bundled clip_l_sumeshi_f1s (the 234.74 MB file in the menu on the right).

My English is terrible, so I use translation tools.

This is an experimental anime model to verify whether de-distilling Schnell and enabling CFG can work. A negative prompt functions to some extent. Since this model uses CFG, generation takes about twice as long as a regular FLUX model at the same step count. Depending on the prompt, the output can be blurry and the style unstable, perhaps because the model has not been fully trained.

24/09/23 update

Added v004G. This is a test model aimed at reducing blurriness in low-step outputs (around 20 steps) by introducing guidance. Blurriness in both bright and dark outputs has been reduced. Because training used parameters pushed to the limit to save time, response to prompts has worsened. The recommended parameters have been updated, so please refer to the Usage(v004G) section. After verification, two factors were suspected of causing the blurriness, so those areas were reinforced during training:

  • Guidance parameter: While v002E filled it with zeros, we used He initialization and did some training with FineTune and the network_args "in_dims" (a sketch follows this list). This enabled the guidance scale to function properly. Although the reason is unclear, outputs seem to be abnormal at scales other than 3 and 5.

  • Timestep sampling: Previously, discrete_flow_shift 3.2 was used, but it was suspected to be a reason for poor response at low step counts. Verification showed that not using shift and using a smaller sigmoid_scale reduced blurriness. However, insufficient training leads to noisy backgrounds, so further exploration of the hyperparameters seems necessary.
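
As a rough illustration of the He-initialization step mentioned above (a minimal sketch: the checkpoint is assumed to be a flat safetensors state dict, the file names are hypothetical, and only the two guidance_in weight matrices are re-initialized):

    import torch
    from safetensors.torch import load_file, save_file

    sd = load_file("flux1s_dedistilled.safetensors")  # hypothetical input checkpoint

    # Replace the zero-filled guidance_in weights with He (Kaiming) initialization
    # so the guidance embedder can actually train instead of staying degenerate.
    for key in ("guidance_in.in_layer.weight", "guidance_in.out_layer.weight"):
        w = torch.empty(sd[key].shape, dtype=torch.float32)
        torch.nn.init.kaiming_normal_(w)
        sd[key] = w.to(sd[key].dtype)

    save_file(sd, "flux1s_he_guidance.safetensors")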

Usage(v004G)

  • resolution: like other Flux models

  • (distilled) guidance scale: 3 or 5

  • CFG scale: 6 ~ 9, recommended 7 (scale 1 does not generate decent outputs)

  • steps: 20 ~ 30 (not around 4 steps)

Usage(v002E old)

  • resolution: like other Flux models

  • CFG scale: 3.5 ~ 7 (scale 1 does not generate decent outputs)

  • steps: 20 ~ 60 (not around 4 steps)

  • (distilled) guidance scale: 0 (does not work, since this is a Schnell-based model)

  • sampler: Euler

  • scheduler: Simple, Beta
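
For reference, here is a minimal text-to-image sketch with the v004G settings. It assumes a recent diffusers release in which FluxPipeline accepts negative_prompt and true_cfg_scale and FluxTransformer2DModel.from_single_file can load this checkpoint; the file name and prompt are hypothetical, and the bundled clip_l_sumeshi_f1s must still replace the stock CLIP-L text encoder:

    import torch
    from diffusers import FluxPipeline, FluxTransformer2DModel

    transformer = FluxTransformer2DModel.from_single_file(
        "sumeshi_flux1s_v004G.safetensors", torch_dtype=torch.bfloat16
    )
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell",
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    ).to("cuda")
    # NOTE: also load clip_l_sumeshi_f1s into pipe.text_encoder; the stock CLIP-L will not work well.

    image = pipe(
        "1girl, solo, looking at viewer, masterpiece, best quality, newest, safe",
        negative_prompt="worst quality, low quality, blurry",
        guidance_scale=3.0,      # (distilled) guidance: 3 or 5
        true_cfg_scale=7.0,      # real CFG: 6 ~ 9, 7 recommended
        num_inference_steps=20,  # 20 ~ 30 steps
    ).images[0]
    image.save("sample.png")

In GUI frontends the same numbers go to the (distilled) guidance input and the sampler's CFG scale, with Euler as the sampler and Simple or Beta as the scheduler.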

Prompt Format (from Kohaku-XL-Epsilon)

<1girl/1boy/1other/...>, <character>, <series>, <artists>, <general tags>, <quality tags>, <year tags>, <meta tags>, <rating tags>

Due to the small amount of training, the <character>, <series>, and <artists> tags are almost non-functional. Also, since training focused on girls, the model may not generate boys or other non-person subjects well. Since hakubooru was used to create the dataset, the prompt format is the same as the KohakuXL format. However, based on experiments, it is not strictly necessary to follow this format; the model interprets meaning to some extent even in natural language.
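
For example, a prompt in this format might look like the following (the tags are purely illustrative, and as noted the character/series/artist tags contribute little):

    1girl, hatsune miku, vocaloid, solo, smile, looking at viewer, outdoors, masterpiece, best quality, newest, safe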

Special Tags

  • Quality tags: masterpiece, best quality, great quality, good quality, normal quality, low quality, worst quality

  • Rating tags: safe, sensitive, nsfw, explicit

  • Date tags: newest, recent, mid, early, old

Training

  1. Dataset preparation

    I used hakubooru-based custom scripts.

    exclude tags: traditional_media, photo_(medium), scan, animated, animated_gif, lowres, non-web_source, variant_set, tall_image, duplicate, pixel-perfect_duplicate

    minimum post ID: 1,000,000
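
    hakubooru's actual API is not reproduced here; as a purely illustrative stand-in (file and field names are hypothetical, modeled on danbooru-style post dumps), the same filters could be applied like this:

    import json

    EXCLUDE = {
        "traditional_media", "photo_(medium)", "scan", "animated", "animated_gif",
        "lowres", "non-web_source", "variant_set", "tall_image", "duplicate",
        "pixel-perfect_duplicate",
    }
    MIN_POST_ID = 1_000_000  # posts below this ID are skipped

    def keep(post: dict) -> bool:
        # Drop old posts and any post carrying an excluded tag.
        if post["id"] < MIN_POST_ID:
            return False
        return EXCLUDE.isdisjoint(post["tag_string"].split())

    with open("posts.jsonl") as f:  # hypothetical dump, one JSON object per line
        dataset = [post for post in map(json.loads, f) if keep(post)]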

  2. Key addition

    I added tensors filled with zeros under the "guidance_in" keys to the Schnell model. These tensors are shaped to match the corresponding keys in Dev, as inferred from flux/src/flux/model.py. This was needed because the trainer did not work properly when the keys were missing and the model name did not include 'schnell'. Since the tensors are filled with zeros, my understanding is that guidance will not function, just as in the Schnell model. My skills are lacking and I added them rather forcefully, so I am not sure this was the correct approach.
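
    A minimal sketch of that key addition (the 3072 hidden size and the MLPEmbedder(256 -> hidden) shapes follow flux/src/flux/model.py; the file names are hypothetical):

    import torch
    from safetensors.torch import load_file, save_file

    sd = load_file("flux1-schnell.safetensors")  # hypothetical input checkpoint

    hidden = 3072  # Flux hidden_size; guidance_in is MLPEmbedder(in_dim=256, hidden_dim=hidden)
    dtype = next(iter(sd.values())).dtype
    zero_keys = {
        "guidance_in.in_layer.weight":  (hidden, 256),
        "guidance_in.in_layer.bias":    (hidden,),
        "guidance_in.out_layer.weight": (hidden, hidden),
        "guidance_in.out_layer.bias":   (hidden,),
    }
    for key, shape in zero_keys.items():
        sd.setdefault(key, torch.zeros(shape, dtype=dtype))  # zero-filled, so guidance stays inert

    save_file(sd, "flux1-schnell_guidance_in.safetensors")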

  3. Training

    Basically, training proceeds on the assumption that the more the model is trained, the more the network is reconstructed, which lifts the distillation and makes CFG usable.

    I trained using a single RTX 4090. Training used the LoRA method, with the results merged back into the checkpoint (a sketch of the merge follows the training schedule below).

    sd-scripts was used for training. The basic settings are as follows (the guidance value is set to 7, which has no particular meaning because, as mentioned earlier, guidance_in is a zero tensor):

    accelerate launch --num_cpu_threads_per_process 4 flux_train_network.py \
      --network_module networks.lora_flux --sdpa --gradient_checkpointing \
      --cache_latents --cache_latents_to_disk --cache_text_encoder_outputs \
      --cache_text_encoder_outputs_to_disk --max_data_loader_n_workers 1 \
      --save_model_as "safetensors" --mixed_precision "bf16" --fp8_base \
      --save_precision "bf16" --full_bf16 --min_bucket_reso 320 --max_bucket_reso 1536 \
      --seed 1 --max_train_epochs 1 --keep_tokens_separator "|||" \
      --network_dim 32 --network_alpha 32 --unet_lr 1e-4 --text_encoder_lr 5e-5 \
      --train_batch_size 3 --gradient_accumulation_steps 2 --optimizer_type adamw8bit \
      --lr_scheduler="constant_with_warmup" --lr_warmup_steps 100 --vae_batch_size 8 \
      --cache_info --guidance_scale 7 --timestep_sampling shift \
      --model_prediction_type raw --discrete_flow_shift 3.2 --loss_type l2 --highvram

    The following datasets were trained, in order:

    3,893 images (res512 bs4 / res768 bs2 / res1024 bs1, acc4) 1 epoch

    60,000 images (res768 bs3 acc2) 1 epoch

    36,000 images (res1024 bs1 acc3) 1 epoch

    3,000 images (res1024 bs1 acc1) 1 epoch

    18,000 images (res1024 bs1 acc3) 1 epoch

    merged into model and CLIP_L

    693 images (res1024 bs1 acc3) 1 epoch

    693 images (res1024 bs1 acc3 warmup50) 1 epoch

    693 images (res1024 bs1 acc3 warmup50) 10 epochs

    693 images (res1024 bs1 acc3 warmup50) 15 epochs

    merged into model and CLIP_L

    543 images (res1024 bs1 acc3 warmup50 --optimizer_args "betas=0.9,0.95" "eps=1e-06" "weight_decay=0.1" --caption_dropout_rate 0.1 --shuffle_caption --network_train_unet_only) 20 epochs

    merged into model and CLIP_L

    21,000 images (res1024 bs1 acc3 warmup50 timestep_sampling sigmoid sigmoid_scale2) 15 epochs

    21,000 images (res1024 bs1 acc3 warmup50 sigmoid_scale2 discrete_flow_shift3.5) 15 epochs

    merged into model and CLIP_L

    (the following runs merged only the CLIP_L part)

    3,893 images (res1024 bs2 acc1 warmup50 unet_lr5e-5 text_encoder_lr2.5e-5 sigmoid_scale2.5 discrete_flow_shift3 --network_args "loraplus_lr_ratio=8") 3 epochs

    3,893 images (res1024 bs2 acc1 warmup50 unet_lr5e-5 text_encoder_lr2.5e-5 sigmoid_scale2 discrete_flow_shift3 --network_args "loraplus_lr_ratio=8") 1 epoch

    merged into CLIP_L only

    --

    He-initialized the "guidance_in" layer

    3,893 images (full fine-tune res1024 bs2 acc1 adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" lr5e-6 warmup50 guidance_scale3.5 max_grad_norm 0.0 timestep_sampling shift discrete_flow_shift 3.1582) 1 epoch

    3,893 images (res1024 bs2 acc1 warmup50 guidance_scale1 timestep_sampling sigmoid sigmoid_scale 0.5 --network_args "in_dims=[8,8,8,8,8]") 4 epochs

    3,893 images (res512 bs2 acc1 warmup50 guidance_scale1 timestep_sampling sigmoid sigmoid_scale 0.3 --network_args "in_dims=[8,8,8,8,8]") 12 epochs

    543 images (repeats10 res512 bs4 acc1 warmup50 unet_lr3e-4 guidance_scale1 timestep_sampling sigmoid sigmoid_scale 0.3 --network_args "in_dims=[8,8,8,8,8]") 4 epochs

    merged into model and CLIP_L

    --v004G--
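
    Each "merged into model and CLIP_L" step above folds the trained LoRA back into the checkpoint, i.e. W <- W + (alpha / rank) * up @ down per module. The sketch below shows only that arithmetic; file names are hypothetical, and the base-key lookup is simplified (sd-scripts encodes module paths differently), so treat it as illustration rather than the actual merge script:

    import torch
    from safetensors.torch import load_file, save_file

    base = load_file("flux1s_base.safetensors")   # hypothetical file names
    lora = load_file("sumeshi_lora.safetensors")

    for key in [k for k in lora if k.endswith(".lora_down.weight")]:
        stem = key[: -len(".lora_down.weight")]
        down = lora[key].float()
        up = lora[stem + ".lora_up.weight"].float()
        rank = down.shape[0]
        alpha = lora.get(stem + ".alpha", torch.tensor(float(rank))).item()
        # Simplified mapping from the LoRA module name back to a base state-dict key.
        base_key = stem.replace("lora_unet_", "").replace("_", ".") + ".weight"
        if base_key in base:
            merged = base[base_key].float() + (alpha / rank) * (up @ down)
            base[base_key] = merged.to(base[base_key].dtype)

    save_file(base, "flux1s_merged.safetensors")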

Resources (License)

License

Apache 2.0

Acknowledgements

  • black-forest-labs : Thanks for publishing a great open source model.

  • kohya-ss : Thanks for publishing the essential training scripts and for the quick updates.

  • Kohaku-Blueleaf : Thanks for the extensive publication of the scripts for the dataset and the various training conditions.
