Published | Sep 8, 2024 |
Hash | AutoV2 175228752A |
Note: This model is Schnell-based, but it requires a (distilled) guidance scale of 3 or 5, a CFG scale of 3 or higher (CFG is a separate setting from the guidance scale), and 20 or more steps. It must be used together with clip_l_sumeshi_f1s (the 234.74 MB file in the menu on the right).
My English is terrible, so I use translation tools.
This is an experimental anime model for verifying whether de-distilling the model and enabling CFG works. Negative prompts work to some extent. Because this model uses CFG, generation takes about twice as long as with a regular FLUX model, even at the same number of steps. Depending on the prompt, the output can be blurry and the style unstable, perhaps because the model has not been trained enough.
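The roughly doubled generation time follows from how CFG is usually computed: each step evaluates the model once with the prompt and once with the negative (or empty) prompt, then blends the two predictions. Below is a minimal sketch of that standard blend, with plain lists standing in for model outputs. The function and variable names are illustrative and are not taken from any particular UI or from this model's code.

```python
def cfg_combine(cond, uncond, cfg_scale):
    """Blend conditional and unconditional predictions element-wise."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy stand-ins for the two model evaluations made at a single step.
cond_pred = [1.0, 2.0, 3.0]    # prediction with the positive prompt
uncond_pred = [0.5, 1.0, 1.5]  # prediction with the negative (or empty) prompt

# At scale 1 the blend collapses to the conditional prediction alone.
print(cfg_combine(cond_pred, uncond_pred, 1.0))
# Larger scales push the result away from the unconditional prediction.
print(cfg_combine(cond_pred, uncond_pred, 7.0))
```

At a CFG scale of 1 the negative prompt has no influence, which is why the scale has to be raised above 1 before CFG does anything.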
24/09/23 update
Added v004G. This is a test model that aims to reduce blurriness in low-step outputs (around 20 steps) by introducing guidance. Blurriness is reduced in both bright and dark outputs. Because it was trained with aggressive parameters to save time, response to prompts has worsened. The recommended parameters have changed, so please refer to the Usage(v004G) section. After verification, the following two factors were suspected of causing the blurriness, so training was reinforced in these areas.
Guidance Parameter: In v002E the guidance layer was filled with zeros. For v004G it was He-initialized and then trained to some extent using full fine-tuning and the network_args "in_dims". This made the guidance scale function properly. For reasons that are unclear, outputs appear abnormal at guidance scales other than 3 and 5.
Timesteps Sampling: Previously, discrete_flow_shift 3.2 was used, but it was suspected of causing the poor response at low step counts. Verification showed that dropping the shift and using a smaller sigmoid_scale reduced blurriness. However, insufficient training leaves backgrounds noisy, so further exploration of hyperparameters seems necessary.
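For readers unfamiliar with He initialization, here is a minimal sketch of the normal variant, which draws weights from a zero-mean Gaussian with variance 2 / fan_in. Whether the normal or uniform variant was used for this model is not stated, and the layer sizes below are illustrative assumptions only.

```python
import math
import random


def he_normal(fan_out, fan_in, rng):
    """Return a fan_out x fan_in matrix drawn from N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(0)
# Illustrative sizes, not the real layer dimensions.
weight = he_normal(fan_out=64, fan_in=256, rng=rng)

# Compare the empirical spread of the samples with the target std.
flat = [w for row in weight for w in row]
empirical_std = math.sqrt(sum(w * w for w in flat) / len(flat))
target_std = math.sqrt(2.0 / 256)
print(f"target std={target_std:.4f}, empirical std={empirical_std:.4f}")
```

Unlike a zero-filled layer, a layer initialized this way produces a nonzero output and nonzero gradients from the first step, so training can actually shape it.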
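The effect of these two settings on which timesteps get trained can be sketched as follows. This assumes the common formulation in which a normal draw is scaled by sigmoid_scale and squashed by a sigmoid, and the shift remaps t to shift * t / (1 + (shift - 1) * t). That is my understanding of the trainer's behavior, not a copy of its code, and the function names are hypothetical.

```python
import math
import random


def sample_timestep(rng, sigmoid_scale=1.0, shift=None):
    """Draw one timestep in (0, 1); optionally apply the flow shift."""
    t = 1.0 / (1.0 + math.exp(-sigmoid_scale * rng.gauss(0.0, 1.0)))
    if shift is not None:
        t = (shift * t) / (1.0 + (shift - 1.0) * t)
    return t


def summarize(sigmoid_scale, shift, n=20000, seed=0):
    """Return (mean, standard deviation) of n sampled timesteps."""
    rng = random.Random(seed)
    ts = [sample_timestep(rng, sigmoid_scale, shift) for _ in range(n)]
    mean = sum(ts) / n
    spread = math.sqrt(sum((t - mean) ** 2 for t in ts) / n)
    return mean, spread


for label, scale, shift in [
    ("shift 3.2, scale 1.0", 1.0, 3.2),
    ("sigmoid, scale 2.0", 2.0, None),
    ("sigmoid, scale 0.3", 0.3, None),
]:
    mean, spread = summarize(scale, shift)
    print(f"{label}: mean={mean:.3f} spread={spread:.3f}")
```

Under these assumptions, a shift above 1 moves the sampled timesteps toward the high-noise end, while a small sigmoid_scale without shift concentrates them around the middle of the range.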
Usage(v004G)
resolution: like other Flux models
(distilled) guidance scale: 3 or 5
CFG scale: 6 ~ 9, 7 recommended (a scale of 1 does not produce decent output)
step: 20 ~ 30 (not the roughly 4 steps Schnell normally uses)
Usage(v002E old)
resolution: like other Flux models
CFG scale: 3.5 ~ 7 (a scale of 1 does not produce decent output)
step: 20 ~ 60 (not the roughly 4 steps Schnell normally uses)
(distilled) guidance scale: 0 (has no effect in this version because the model is Schnell-based)
sampler: Euler
scheduler: Simple, Beta
Prompt Format ( from Kohaku-XL-Epsilon )
<1girl/1boy/1other/...>, <character>, <series>, <artists>, <general tags>, <quality tags>, <year tags>, <meta tags>, <rating tags>
Because the amount of training is small, the <character>, <series>, and <artists> tags are almost non-functional. Training also focused on girls, so the model may not generate boys or non-human subjects well. Since the dataset was created with hakubooru, the prompt format is the same as the KohakuXL format. In experiments, however, strictly following this format was not necessary, since the model interprets natural language to some extent.
Special Tags
Quality tags: masterpiece, best quality, great quality, good quality, normal quality, low quality, worst quality
Rating tags: safe, sensitive, nsfw, explicit
Date tags: newest, recent, mid, early, old
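As an illustration of the tag order above, here is a small helper that assembles a prompt and checks the special tags against the listed values. The function name, its parameters, and the example tags are hypothetical and only show the ordering. The date tags are placed in the <year tags> slot, which is an assumption.

```python
QUALITY_TAGS = ["masterpiece", "best quality", "great quality", "good quality",
                "normal quality", "low quality", "worst quality"]
RATING_TAGS = ["safe", "sensitive", "nsfw", "explicit"]
DATE_TAGS = ["newest", "recent", "mid", "early", "old"]


def build_prompt(subject, general_tags, quality, date, rating,
                 character=None, series=None, artists=(), meta_tags=()):
    """Join tags in the Kohaku-XL-Epsilon order described in this section."""
    checks = [(quality, QUALITY_TAGS, "quality"),
              (date, DATE_TAGS, "date"),
              (rating, RATING_TAGS, "rating")]
    for value, allowed, kind in checks:
        if value not in allowed:
            raise ValueError(f"unknown {kind} tag: {value!r}")
    parts = [subject]
    parts += [tag for tag in (character, series) if tag]
    parts += list(artists)
    parts += list(general_tags)
    parts += [quality, date]
    parts += list(meta_tags)
    parts.append(rating)
    return ", ".join(parts)


prompt = build_prompt("1girl", ["solo", "long hair", "smile", "outdoors"],
                      quality="best quality", date="recent", rating="safe")
print(prompt)
```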
Training
Dataset preparation
I used custom scripts based on hakubooru.
exclude tags: traditional_media,photo_(medium),scan,animated,animated_gif,lowres,non-web_source,variant_set,tall image,duplicate,pixel-perfect_duplicate
minimum post ID:1,000,000
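The filtering rule can be sketched as below. This is not hakubooru's API: posts are represented as plain dicts with hypothetical "id" and "tags" fields, and the minimum ID is assumed to be inclusive.

```python
EXCLUDE_TAGS = {
    "traditional_media", "photo_(medium)", "scan", "animated", "animated_gif",
    "lowres", "non-web_source", "variant_set", "tall image", "duplicate",
    "pixel-perfect_duplicate",
}
MIN_POST_ID = 1_000_000  # assumed inclusive


def keep_post(post):
    """Keep a post only if it is recent enough and has no excluded tag."""
    if post["id"] < MIN_POST_ID:
        return False
    return EXCLUDE_TAGS.isdisjoint(post["tags"])


# Made-up posts for demonstration only.
posts = [
    {"id": 999_999, "tags": ["1girl", "solo"]},      # dropped: ID too low
    {"id": 2_500_000, "tags": ["1girl", "lowres"]},  # dropped: excluded tag
    {"id": 3_000_000, "tags": ["1girl", "smile"]},   # kept
]
kept_ids = [p["id"] for p in posts if keep_post(p)]
print(kept_ids)
```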
Key addition
I added zero-filled tensors under the "guidance_in" keys to the Schnell model. Their shapes match the corresponding keys in Dev, as inferred from flux/src/flux/model.py. This was necessary because the trainer did not work properly when these keys were missing and the model name did not include 'schnell'. Because the tensors are filled with zeros, my understanding is that guidance will not function, just as in the Schnell model. My skills are lacking and I added the keys rather forcefully, so I'm not sure this was the correct approach.
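A sketch of this patching step is shown below. The key names and shapes are my reading of the guidance embedder in the FLUX reference code (a two-layer MLP from a 256-dimensional embedding to a hidden size of 3072) and should be treated as assumptions. The tensor factory is passed in so the sketch stays library-agnostic, and loading and saving the checkpoint is left out.

```python
HIDDEN_SIZE = 3072        # assumed transformer width
GUIDANCE_EMBED_DIM = 256  # assumed size of the guidance embedding

# Assumed key names and shapes of the Dev guidance embedder.
GUIDANCE_IN_SHAPES = {
    "guidance_in.in_layer.weight": (HIDDEN_SIZE, GUIDANCE_EMBED_DIM),
    "guidance_in.in_layer.bias": (HIDDEN_SIZE,),
    "guidance_in.out_layer.weight": (HIDDEN_SIZE, HIDDEN_SIZE),
    "guidance_in.out_layer.bias": (HIDDEN_SIZE,),
}


def add_guidance_keys(state_dict, make_zeros):
    """Return a copy of state_dict with missing guidance_in.* keys zero-filled."""
    patched = dict(state_dict)
    for key, shape in GUIDANCE_IN_SHAPES.items():
        if key not in patched:
            patched[key] = make_zeros(shape)
    return patched


# Demo with a stand-in factory that only records the requested shape.
schnell_like = {"img_in.weight": "placeholder"}
patched = add_guidance_keys(schnell_like, make_zeros=lambda shape: ("zeros", shape))
for key in sorted(patched):
    print(key, patched[key])
```

With PyTorch, the factory could be something like `lambda shape: torch.zeros(shape, dtype=torch.bfloat16)`.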
Training
The working assumption is that the more the model is trained, the more the network is restructured, the more the distillation is undone, and the more usable CFG becomes.
I trained on a single RTX 4090, using the LoRA method and merging the results into the model.
sd-scripts was used for training. The basic settings are as follows (the guidance value is set to 7, which has no particular meaning because, as mentioned earlier, the guidance tensor is all zeros).
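Merging a LoRA into a base weight is commonly done by adding the scaled low-rank product to the original matrix: W' = W + (alpha / rank) * up @ down. The sketch below shows that arithmetic on tiny lists. It is a generic illustration, not the merge script used for this model, and all names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight, lora_down, lora_up, alpha, rank, multiplier=1.0):
    """Return weight + multiplier * (alpha / rank) * (lora_up @ lora_down)."""
    scale = multiplier * alpha / rank
    delta = matmul(lora_up, lora_down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy 2x2 weight with a rank-1 LoRA.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_down = [[1.0, 2.0]]   # shape (rank, in_features)
lora_up = [[0.5], [1.0]]   # shape (out_features, rank)

merged = merge_lora(weight, lora_down, lora_up, alpha=1, rank=1)
print(merged)
```

With network_dim 32 and network_alpha 32, the alpha / rank factor is 1.0, so the low-rank update would be added at full strength under this formulation.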
accelerate launch --num_cpu_threads_per_process 4 flux_train_network.py \
  --network_module networks.lora_flux --sdpa --gradient_checkpointing \
  --cache_latents --cache_latents_to_disk \
  --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk \
  --max_data_loader_n_workers 1 --save_model_as "safetensors" \
  --mixed_precision "bf16" --fp8_base --save_precision "bf16" --full_bf16 \
  --min_bucket_reso 320 --max_bucket_reso 1536 --seed 1 --max_train_epochs 1 \
  --keep_tokens_separator "|||" --network_dim 32 --network_alpha 32 \
  --unet_lr 1e-4 --text_encoder_lr 5e-5 \
  --train_batch_size 3 --gradient_accumulation_steps 2 \
  --optimizer_type adamw8bit --lr_scheduler="constant_with_warmup" --lr_warmup_steps 100 \
  --vae_batch_size 8 --cache_info --guidance_scale 7 \
  --timestep_sampling shift --model_prediction_type raw --discrete_flow_shift 3.2 \
  --loss_type l2 --highvram
The model was trained on the following datasets, in this order.
3,893 images (res512 bs4 / res768 bs2 / res1024 bs1, acc4) 1 epoch
60,000 images (res768 bs3 acc2) 1 epoch
36,000 images (res1024 bs1 acc3) 1 epoch
3,000 images (res1024 bs1 acc1) 1 epoch
18,000 images (res1024 bs1 acc3) 1 epoch
merged into model and CLIP_L
693 images (res1024 bs1 acc3) 1 epoch
693 images (res1024 bs1 acc3 warmup50) 1 epoch
693 images (res1024 bs1 acc3 warmup50) 10 epochs
693 images (res1024 bs1 acc3 warmup50) 15 epochs
merged into model and CLIP_L
543 images (res1024 bs1 acc3 warmup50 --optimizer_args "betas=0.9,0.95" "eps=1e-06" "weight_decay=0.1" --caption_dropout_rate 0.1 --shuffle_caption --network_train_unet_only) 20 epochs
merged into model and CLIP_L
21,000 images (res1024 bs1 acc3 warmup50 timestep_sampling sigmoid sigmoid_scale2) 15 epochs
21,000 images (res1024 bs1 acc3 warmup50 sigmoid_scale2 discrete_flow_shift3.5) 15 epochs
merged into model and CLIP_L
-this training merged only CLIP-
3,893 images (res1024 bs2 acc1 warmup50 unet_lr5e-5 text_encoder_lr2.5e-5 sigmoid_scale2.5 discrete_flow_shift3 --network_args "loraplus_lr_ratio=8") 3 epochs
3,893 images (res1024 bs2 acc1 warmup50 unet_lr5e-5 text_encoder_lr2.5e-5 sigmoid_scale2 discrete_flow_shift3 --network_args "loraplus_lr_ratio=8") 1 epoch
merged into CLIP_L only
--
He initialized "guidance_in" layer
3,893 images (Full-finetuned res1024 bs2 acc1 adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" lr5e-6 warmup50 guidance_scale3.5 max_grad_norm 0.0 timestep_sampling discrete_flow_shift 3.1582 ) 1 epoch
3,893 images (res1024 bs2 acc1 warmup50 guidance_scale1 timestep_sampling sigmoid sigmoid_scale 0.5 --network_args "in_dims=[8,8,8,8,8]") 4 epochs
3,893 images (res512 bs2 acc1 warmup50 guidance_scale1 timestep_sampling sigmoid sigmoid_scale 0.3 --network_args "in_dims=[8,8,8,8,8]") 12 epochs
543 images (repeats10 res512 bs4 acc1 warmup50 unet_lr3e-4 guidance_scale1 timestep_sampling sigmoid sigmoid_scale 0.3 --network_args "in_dims=[8,8,8,8,8]") 4 epochs
merged into model and CLIP_L
--v004G--
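To get a rough sense of scale for the stages above, the number of optimizer steps can be estimated from the image count, batch size (bs), gradient accumulation (acc), epochs, and repeats. The helper below is a back-of-the-envelope estimate under the assumption that every batch is full. Aspect-ratio bucketing in a real run can change the counts slightly, and the function name is hypothetical.

```python
import math


def optimizer_steps(num_images, batch_size, grad_accum, epochs=1, repeats=1):
    """Estimate optimizer steps, ignoring bucketing effects."""
    batches_per_epoch = math.ceil(num_images * repeats / batch_size)
    return math.ceil(batches_per_epoch / grad_accum) * epochs


# A few stages from the list above.
print(optimizer_steps(60_000, batch_size=3, grad_accum=2))   # res768 stage
print(optimizer_steps(36_000, batch_size=1, grad_accum=3))   # res1024 stage
print(optimizer_steps(543, batch_size=4, grad_accum=1, epochs=4, repeats=10))
```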
Resources (License)
FLUX.1-schnell (Apache2.0)
License
Apache2.0
Acknowledgements
black-forest-labs : Thanks for publishing a great open source model.
kohya-ss : Thanks for publishing the essential training scripts and for the quick updates.
Kohaku-Blueleaf : Thanks for the extensive publication of the scripts for the dataset and the various training conditions.