I'm releasing versions 0.1, 0.3, and 0.4 of this model as a feasibility study and proof of concept. I may be able to make version 1.0 if I can somehow find more time and money.

If you would like me to pursue a Pony-based photographic SDXL model further, you can donate with a message here: https://ko-fi.com/eeb_p I can use it to build a better model. Please bear in mind that there is no promise. I might run away with the money.

(More descriptions about proof of work will be written later.)

For Dummies

Recommended Setting

Steps: 24 for normal use, 48 for higher quality
CFG scale: 3
Sampler: Euler a
Size: 896x1152
Emphasis mode: No norm

Positive Prompt

score_9, score_8_up, score_7_up, best quality, masterpiece, source_anime, [photo, irl, real, realistic, ultrarealistic, photorealistic, natural skin, detailed skin:0.5]

Negative Prompt

worst quality, low quality, normal quality, messy drawing, amateur drawing, lowres, bad anatomy, bad hands, source_furry, source_pony, source_cartoon, comic, source filmmaker, 3d, blurry, cropped
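
If you drive the web UI through its API instead of the browser, the settings above could be packed into a request body roughly like this. This is a minimal sketch, assuming the AUTOMATIC1111 web UI txt2img API field names (prompt, negative_prompt, steps, cfg_scale, sampler_name, width, height). The function name build_txt2img_payload and the choice to append your own subject tags after the recommended prompt are assumptions made for illustration. Emphasis mode is not covered here and still has to be set in the UI.

```python
POSITIVE_PREFIX = (
    "score_9, score_8_up, score_7_up, best quality, masterpiece, source_anime, "
    "[photo, irl, real, realistic, ultrarealistic, photorealistic, "
    "natural skin, detailed skin:0.5]"
)
NEGATIVE_PROMPT = (
    "worst quality, low quality, normal quality, messy drawing, amateur drawing, "
    "lowres, bad anatomy, bad hands, source_furry, source_pony, source_cartoon, "
    "comic, source filmmaker, 3d, blurry, cropped"
)


def build_txt2img_payload(subject_tags: str, high_quality: bool = False) -> dict:
    """Build a txt2img request body from the recommended settings.

    subject_tags is a hypothetical parameter: your own tags, appended
    after the recommended positive prompt (an assumption, not a rule).
    """
    return {
        "prompt": f"{POSITIVE_PREFIX}, {subject_tags}",
        "negative_prompt": NEGATIVE_PROMPT,
        "steps": 48 if high_quality else 24,  # 24 normal, 48 higher quality
        "cfg_scale": 3,
        "sampler_name": "Euler a",
        "width": 896,
        "height": 1152,
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_txt2img_payload("1girl, park, smile"), indent=2))
```

The dictionary can then be sent as JSON to the web UI's txt2img endpoint if the API is enabled on your install.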

Tips

Adjusting Style Factor

If you are getting images with a strong anime-style influence, you need to add more photographic factor.

If you are getting artifacts and corruption in strongly photographic-style images, you need to cut down the photographic factor. Simply adding and over-emphasizing photographic-style prompts does not work well either.

Lora Block Weight

Install https://github.com/hako-mikan/sd-webui-lora-block-weight .

A LoRA sometimes has a strong style effect, and you may want to limit that effect.

Use Block Weight to apply only the blocks needed to get your character, pose, or clothing.

If you have never used Block Weight, here are some ideas to start with.

Set BASE and MID to 0.
Set BASE, IN04, IN05, IN07, and MID to 0.
Set BASE, IN04, IN05, IN07, MID, OUT03, OUT04, OUT05 to 0.
Try other combinations.
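
The starting points above can be written as lbw= tags in the prompt. This is a sketch that assumes the extension's 12-value SDXL block order (BASE, IN04, IN05, IN07, IN08, MID, OUT00 to OUT05) and its lbw= syntax; check the extension README if your version differs. lbw_tag and my_lora are made-up names for illustration.

```python
# Assumed SDXL block order for sd-webui-lora-block-weight (12 values).
SDXL_BLOCKS = [
    "BASE", "IN04", "IN05", "IN07", "IN08", "MID",
    "OUT00", "OUT01", "OUT02", "OUT03", "OUT04", "OUT05",
]


def lbw_tag(lora_name: str, zeroed: set, strength: float = 1.0) -> str:
    """Return a <lora:...:lbw=...> tag with the given blocks set to 0."""
    unknown = set(zeroed) - set(SDXL_BLOCKS)
    if unknown:
        raise ValueError(f"unknown block names: {sorted(unknown)}")
    weights = ["0" if block in zeroed else "1" for block in SDXL_BLOCKS]
    return f"<lora:{lora_name}:{strength:g}:lbw={','.join(weights)}>"


if __name__ == "__main__":
    presets = [
        {"BASE", "MID"},
        {"BASE", "IN04", "IN05", "IN07", "MID"},
        {"BASE", "IN04", "IN05", "IN07", "MID", "OUT03", "OUT04", "OUT05"},
    ]
    for zeroed in presets:
        print(lbw_tag("my_lora", zeroed))
```

Each printed tag replaces the plain <lora:my_lora:1> tag in the prompt.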

Prompt Editing / LoRA start, stop, step

Some prompts and LoRAs give a strong anime style. You can use prompt editing to confine them to the early steps of image generation. This reduces the anime-style effect, but it loses the fine-grained detail of the prompt. See https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features for Prompt Editing, and https://github.com/hako-mikan/sd-webui-lora-block-weight for the LoRA start, stop, and step options.

[<anime_related_prompt>::0.3]
<lora:anime_related_lora:1:stop=18>

Some prompts and LoRAs give a strong photo style but produce artifact objects in the image. You can use prompt editing to confine them to the late steps of image generation. This reduces the creation of unwanted objects in the image, but it loses the photographic effect in the early steps of image generation.

[:photo:0.5]
<lora:photo_related_lora:1:start=8>
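
To see which sampling step a prompt-editing value such as 0.3 or 0.5 corresponds to, the rule described in the web UI wiki can be sketched as below: a value under 1 is treated as a fraction of the total steps, and a value of 1 or more as a step number. Rounding down and clamping to the total step count are my assumptions about how the web UI handles it, and switch_step is a made-up helper name.

```python
def switch_step(when: float, total_steps: int) -> int:
    """Step after which a prompt-editing switch is assumed to take effect."""
    if when < 1:
        # Fractions are read as a share of the total step count.
        step = int(when * total_steps)
    else:
        # Values of 1 or more are read as an absolute step number.
        step = int(when)
    return min(step, total_steps)


if __name__ == "__main__":
    for steps in (24, 48):
        print(
            f"{steps} steps: [x::0.3] drops x after step {switch_step(0.3, steps)}, "
            f"[:photo:0.5] adds photo after step {switch_step(0.5, steps)}"
        )
```

Under these assumptions, at the recommended 24 steps a prompt wrapped as [x::0.3] only acts during roughly the first 7 steps, so the fraction needs no change when you move to 48 steps, while absolute values such as stop=18 do.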

CD Tuner

Photorealistic Pony models tend to produce whiter, paler images.

You can use CD Tuner to adjust the colors.

https://github.com/hako-mikan/sd-webui-cd-tuner

If you have never used CD Tuner, here are some ideas to start with.

Detail 2(d2)+1.5, saturation(sat)+5
saturation(sat)+10
Detail 2(d2)+5

Versions

Version 0.1: This is a proof of concept. (deprecated)

Version 0.3: Used Lycoris for training. (deprecated)

Version 0.4: Re-selected training images and improved randoseru

Version 0.8: Merged with another model

Source

This model is derived from:

Pony Diffusion V6 XL (training base model)

https://civitai.com/models/257749/pony-diffusion-v6-xl

Ebara (training base model and merge base model)

realPony_JPDoll (merge base model)

https://civitai.com/models/420600?modelVersionId=468687

Training (v0.3)

Photographs of multiple people (all aged over 20 at the time the photos were taken) were used to train Lycoris networks on the PonyDiffusion model.

One training run aimed to capture the structural features of photographs.

One training run aimed to capture the stylistic features of photographs.

The resulting Lycoris from both training runs were merged into the Ebara model.

Result (v0.3)

The trained model successfully outputs photographic images.

The faces in the output images seem to blend all the trained faces and do not seem to resemble any one particular person in the training set. 100 output images and all training images were uploaded to Google Photos. No output image was recognized as the same person as anyone in the training images.

Problem and Next Step

Teeth are crooked. I can try to get photos of people with better-aligned teeth.

Faces are mostly the same. I can try to merge in more Pony-derived base models with more variation in faces.

The mouth is big. The eyes are big. I can try to get photos of people with smaller mouths, and try to merge in more Pony-derived base models with smaller mouths.

Some prompts and LoRAs give such a strong anime style that this model cannot turn the result into a photographic image. In those cases, you are on your own to come up with a better prompt and to control the LoRA. Refer to the Tips section.

Licenses

Refer to the licenses of the models listed under Source.

Also refer to the licensing terms and conditions on this page.

For commercial use, refer to my profile.

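Photographic-style images will not be generated unless the prompt is tuned properly.

This is a photographic SDXL model based on Pony (Ebara).

Versions 0.1, 0.3, and 0.4 are all feasibility studies and proofs of concept. I may be able to make version 1.0 if I find the time and money.

Donations are accepted at https://ko-fi.com/eeb_p , but even with a donation I cannot promise to make version 1.0.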

(More descriptions about proof of work will be written later.)

For Beginners

Recommended Setting

Steps: 24 for normal use, 48 for higher quality
CFG scale: 3
Sampler: Euler a
Size: 896x1152
Emphasis mode: No norm

Positive Prompt

score_9, score_8_up, score_7_up, best quality, masterpiece, source_anime, [photo, irl, real, realistic, ultrarealistic, photorealistic, natural skin, detailed skin:0.5]

Negative Prompt

worst quality, low quality, normal quality, messy drawing, amateur drawing, lowres, bad anatomy, bad hands, source_furry, source_pony, source_cartoon, comic, source filmmaker, 3d, blurry, cropped

Tips

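Adjusting Style

If you get anime-style images, add photographic-style prompts.

If strongly photographic images come out broken or with artifacts, tone down the photographic-style prompts.

Simply over-emphasizing photographic-style prompts does not work well either.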

Lora Block Weight

Install https://github.com/hako-mikan/sd-webui-lora-block-weight .

Some LoRAs carry a strong anime-style influence.

Use Block Weight so that only the intended character, pose, or clothing comes through.

If you are not sure how to set it, try the following to start.

Set BASE and MID to 0.
Set BASE, IN04, IN05, IN07, and MID to 0.
Set BASE, IN04, IN05, IN07, MID, OUT03, OUT04, OUT05 to 0.
Try other combinations.

Prompt Editing / LoRA start, stop, step

Some prompts and LoRAs carry a strong anime-style influence.

Enabling the prompt or LoRA only in the early steps of image generation keeps that style influence down.

In exchange, their effect on fine detail is lost.

Read the following references.

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features for Prompt Editing.

https://github.com/hako-mikan/sd-webui-lora-block-weight for LoRA start, stop, step.

[<anime_related_prompt>::0.3]
<lora:anime_related_lora:1:stop=18>

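Some prompts and LoRAs carry a strong photographic-style influence and end up creating artifacts.

Enabling the prompt or LoRA only in the late steps of image generation keeps that influence down.

This prevents unintended objects from being generated.

However, the photographic influence is lost in the early steps of generation.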

[:photo:0.5]
<lora:photo_related_lora:1:start=8>

CD Tuner

Images from Pony-based photographic models tend to come out white and pale.

You can use CD Tuner to adjust this.

https://github.com/hako-mikan/sd-webui-cd-tuner

If you are not sure how to set it, try the following to start.

Detail 2(d2)+1.5, saturation(sat)+5
saturation(sat)+10
Detail 2(d2)+5

Versions

Version 0.1: proof of concept. (deprecated)

Version 0.3: Used the Lycoris format for training (deprecated)

Version 0.4: Re-selected training images and improved randoseru

Version 0.8: Merged with another model

Source

This model is derived from:

Pony Diffusion V6 XL (training base model)

https://civitai.com/models/257749/pony-diffusion-v6-xl

Ebara (training base model and merge base model)

realPony_JPDoll (merge base model)

https://civitai.com/models/420600?modelVersionId=468687

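Training (v0.3)

Photographs of multiple subjects were trained against PonyDiffusion to make Lycoris. All subjects were 20 or older at the time the photos were taken.

150 photos related to randoseru and bloomers, and 4386 photos of Japanese women. The step count was raised for the randoseru and bloomers photos.

One Lycoris was for photographic character training.

One Lycoris was for photographic style training.

The trained Lycoris were merged into ebara.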

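Result (v0.3)

This model was able to output photographic-style images.

I uploaded 100 output images and all of the training images to Google Photos, and no person in an output image was treated as the same person as anyone in the training images.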

Licenses

Check the licenses of the models this one is derived from.

For commercial use, refer to my profile.

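How It Was Made (v0.1)

I don't know whether this will help people making photographic models or other kinds of checkpoint models, but here is how I trained this checkpoint.

Not all of the data is left, so I am checking against the logs that remain, and some details may be wrong.

Rather than something to follow exactly, I am writing this down as a starting point for people making a model for the first time.

The GPU I use is a 4070 12GB. The model was made with LoRA training plus merging only, not checkpoint fine-tuning.

I hope it serves as an example that training by feel can still give reasonable results.

For v0.3, I made Lycoris instead of LoRA with the same settings.

Preparing the Training Images

The training set is 9480 photos.

Every photo shows one woman alone. The subjects are multiple different people.

Captions were generated with WD14 captioning, with 1girl and solo removed.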
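
Removing the 1girl and solo tags from the generated captions can be done with a few lines of Python. This is only a sketch of one way to do it, not necessarily how it was done for this model: it assumes comma-separated tags in each caption, and strip_tags is a made-up helper name.

```python
def strip_tags(caption: str, unwanted=("1girl", "solo")) -> str:
    """Drop unwanted tags from a comma-separated caption string."""
    tags = [tag.strip() for tag in caption.split(",")]
    kept = [tag for tag in tags if tag and tag not in unwanted]
    return ", ".join(kept)


if __name__ == "__main__":
    print(strip_tags("1girl, solo, long hair, smile, outdoors"))
```

Applied to every .txt caption file in the training folder, this leaves only the descriptive tags.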

Character Training

I used the kohya_lora_gui-1.9.0.1 preset SDXL(PonyV6XL).xmlora as is.

For reference, the commands are below.

LoRA

accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py --pretrained_model_name_or_path "C:\SDXL\ponyDiffusionV6XL_v6StartWithThisOne.safetensors" --train_data_dir "C:\SDXL\image_study\image_sets" --output_dir "C:\SDXL\image_study\LoRA_out" --network_module "networks.lora" --xformers --gradient_checkpointing --persistent_data_loader_workers --no_metadata --cache_latents --cache_latents_to_disk --max_data_loader_n_workers 1 --enable_bucket --save_model_as "safetensors" --lr_scheduler_num_cycles 4 --mixed_precision "fp16" --learning_rate 0.0001 --resolution 1024 --train_batch_size 2 --max_train_epochs 6 --network_dim 8 --network_alpha 2 --shuffle_caption --keep_tokens 1 --save_every_n_epochs 1 --optimizer_type "Lion" --lr_warmup_steps 100 --output_name "JPGIRL" --vae "C:\SDXL\sdxl_vae.safetensors" --save_precision "fp16" --lr_scheduler "cosine_with_restarts" --min_bucket_reso 512 --max_bucket_reso 2048 --caption_extension ".txt" --seed 42 --no_half_vae

Lycoris

accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py --pretrained_model_name_or_path "C:\SDXL\ponyDiffusionV6XL_v6StartWithThisOne.safetensors" --train_data_dir "C:\SDXL\image_study\image_sets" --output_dir "C:\SDXL\image_study\LoRA_out" --network_module "lycoris.kohya" --network_args "algo=lora" --xformers --gradient_checkpointing --persistent_data_loader_workers --no_metadata --cache_latents --cache_latents_to_disk --max_data_loader_n_workers 1 --enable_bucket --save_model_as "safetensors" --lr_scheduler_num_cycles 4 --mixed_precision "fp16" --learning_rate 0.0001 --resolution 1024 --train_batch_size 2 --max_train_epochs 1 --network_dim 8 --network_alpha 2 --shuffle_caption --keep_tokens 1 --save_every_n_epochs 1 --optimizer_type "Lion" --lr_warmup_steps 100 --output_name "RBa0CharLycoris" --vae "C:\SDXL\stable-diffusion-webui-forge\models\VAE\sdxl_vae.safetensors" --save_precision "fp16" --lr_scheduler "cosine_with_restarts" --min_bucket_reso 512 --max_bucket_reso 2048 --caption_extension ".txt" --seed 42 --no_half_vae

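With this, I added and removed some training images until the faces were to my liking, and made two LoRA variants.

Style Training

I slightly modified the kohya_lora_gui-1.9.0.1 preset SDXL画風.xmlora. My thinking was that a higher network dimension (DIM) would make skin texture easier to reproduce.

Network dimension: 64

For reference, the commands are below.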

LoRA

accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py --pretrained_model_name_or_path "C:\SDXL\ponyDiffusionV6XL_v6StartWithThisOne.safetensors" --train_data_dir "C:\SDXL\image_study\image_sets" --output_dir "C:\SDXL\image_study\LoRA_out" --network_module "networks.lora" --network_args "conv_dim=4" "conv_alpha=1" --xformers --gradient_checkpointing --persistent_data_loader_workers --cache_latents --cache_latents_to_disk --max_data_loader_n_workers 1 --enable_bucket --save_model_as "safetensors" --lr_scheduler_num_cycles 4 --mixed_precision "fp16" --learning_rate 0.0001 --resolution 1024 --train_batch_size 1 --max_train_epochs 8 --network_dim 64 --network_alpha 3 --shuffle_caption --save_every_n_epochs 1 --optimizer_type "AdamW8bit" --lr_warmup_steps 250 --output_name "PonyPhotoA" --save_precision "fp16" --lr_scheduler "cosine_with_restarts" --min_bucket_reso 320 --max_bucket_reso 1536 --caption_extension ".txt" --seed 42 --network_train_unet_only --noise_offset 0.1

Lycoris (trained with ebara as the base)

accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py --pretrained_model_name_or_path "C:\SDXL\ebara_pony_1.bakedVAE.safetensors" --train_data_dir "C:\SDXL\image_study\PhotoRealistic" --output_dir "C:\SDXL\image_study\LoRA_out" --network_module "lycoris.kohya" --network_args "algo=lora" "conv_dim=4" "conv_alpha=1" --xformers --gradient_checkpointing --persistent_data_loader_workers --cache_latents --cache_latents_to_disk --max_data_loader_n_workers 1 --enable_bucket --save_model_as "safetensors" --lr_scheduler_num_cycles 4 --mixed_precision "fp16" --learning_rate 0.0001 --resolution 1024 --train_batch_size 1 --max_train_epochs 5 --network_dim 64 --network_alpha 3 --shuffle_caption --save_every_n_epochs 1 --optimizer_type "AdamW8bit" --lr_warmup_steps 250 --output_name "PonyPhotoFull" --save_precision "fp16" --lr_scheduler "cosine_with_restarts" --min_bucket_reso 320 --max_bucket_reso 1536 --caption_extension ".txt" --seed 42 --network_train_unet_only --noise_offset 0.1

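This trains skin texture and similar features.

LoRA Merge

Next, look for weights that mix the LoRAs from character training (JPGIRL.safetensors, JPGIRL2.safetensors) and the LoRA from style training (PonyPhotoA.safetensors) in a good balance. In my case, 0.34, 0.3, and 0.3 looked good.

I then mixed the LoRAs at these ratios and merged them into ebara.

As of March 2024, LoRA merging has only worked well for me with sd-scripts, so run the following commands in the sd-scripts directory.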

python ./networks/sdxl_merge_lora.py --save_precision fp16 --sd_model ebara_pony_1.bakedVAE.safetensors --save_to RunBull_a0.safetensors --models JPGIRL.safetensors --ratios 0.34
python ./networks/sdxl_merge_lora.py --save_precision fp16 --sd_model RunBull_a0.safetensors --save_to RunBull_a1.safetensors --models JPGIRL2.safetensors --ratios 0.3
python ./networks/sdxl_merge_lora.py --save_precision fp16 --sd_model RunBull_a2.safetensors --save_to RunBull_a3.safetensors --models PonyPhotoA.safetensors --ratios 0.3
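
As a rough guide to what these ratios mean: when a LoRA is merged into a checkpoint, its weight delta is usually scaled by network_alpha / network_dim before the merge ratio is applied. The sketch below computes that combined factor from the training settings shown earlier. It assumes sd-scripts follows this common convention, and effective_merge_scale is a made-up helper name, so treat the numbers as an illustration.

```python
def effective_merge_scale(ratio: float, network_alpha: float, network_dim: int) -> float:
    """Factor applied to the raw LoRA product (up @ down) when merging."""
    return ratio * network_alpha / network_dim


if __name__ == "__main__":
    # JPGIRL: --network_dim 8 --network_alpha 2, merged at ratio 0.34
    print("JPGIRL    ", effective_merge_scale(0.34, 2, 8))
    # PonyPhotoA: --network_dim 64 --network_alpha 3, merged at ratio 0.3
    print("PonyPhotoA", effective_merge_scale(0.3, 3, 64))
```

This is one reason similar merge ratios for the character and style LoRAs do not imply equally strong changes: the alpha and dim used in training scale each delta differently.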

Bonus: Checkpoint Adjustment (v0.1 only)

The finished checkpoint lacked some fine detail and looked white and hazy overall, so I applied OUT +1 with Supermerger's Adjust.

I recommend adjusting in the order OUT→Contrast→Brightness.

I added sdxl_vae.safetensors as the VAE.