Published: May 9, 2023
Training Steps: 5,000
Trigger Words: kcboho07-5000
Hash: AutoV2 D9877C70D5
This is an experiment to see if I can make a TI embedding that gets the flavour of konyconi’s BohoAI LORA.
https://civitai.com/models/51966/bohoai
Thank you to @konyconi for sharing his dataset for the excellent BohoAI LORA.
https://civitai.com/models/52697/tutorial-konyconi-style-lora
The showcase uses 2 models:
revAnimated_v122.safetensors [4199bcdd14] with clip skip = 2
avalonTruvision_v2.safetensors [a4df55d292] with clip skip = 1
This TI can produce some decent Boho pix, but it gets confused sometimes... e.g. asking for a spaceship and getting a truck. Perhaps this sort of TI needs a lot more pictures in the training dataset, with more subject variation?
---------------------------
Update 09 May 2023
Continued the training to step 4000, and then 5000.
kcboho07-4000 produces a stronger Boho style.
kcboho07-5000 is stronger again but has increased duplication/repetition, e.g. extra fingers, extra hands, duplicate cities floating in the sky.
I tried 6000 steps, but it’s even worse: overcooked.
I’ve uploaded the 4000-step version as probably the best result of this experiment.
I’ve also uploaded the 5000-step version, since it can produce nice results with careful object prompts.
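For example, something along these lines (a made-up prompt, purely to show the shape: embedding name first, then one clear subject):
<<<
kcboho07-5000 style, a photo of a lighthouse on a cliff
>>>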
---------------------------
I’ve been struggling to work out how to make a style TI...
What makes a good training dataset?
What training settings should I use in automatic1111?
How long should the TI cook for?
For my training dataset I copied konyconi’s 76 1024x1024 images to a new folder without the associated TXT files, and reduced them all to 512x512. Then I renamed them “01 aeroplane.png”, “02 city.png”, “03 tank.png” etc.
Why? Because I was trying to match what I’ve done in the past for TIs that ended up usable. The folder of reduced images is what I used in the settings below.
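If it helps anyone reproduce the prep step, here’s roughly what I mean as a script. A minimal sketch assuming Pillow; the folder paths and subject labels are placeholders, not the real ones:
<<<
from pathlib import Path
from PIL import Image

SRC = Path("bohoai_1024")   # placeholder: folder of the 1024x1024 source images
DST = Path("bohoai_512")    # placeholder: reduced copies, no TXT files
DST.mkdir(exist_ok=True)

# One short subject label per image, in source-file order (76 in total).
labels = ["aeroplane", "city", "tank"]  # etc.

for i, (src, label) in enumerate(zip(sorted(SRC.glob("*.png")), labels), start=1):
    img = Image.open(src).convert("RGB")
    img = img.resize((512, 512), Image.LANCZOS)
    img.save(DST / f"{i:02d} {label}.png")  # "01 aeroplane.png", "02 city.png", ...
>>>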
The wiki page for automatic1111 Textual Inversion is here:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Textual-Inversion
but it’s way out of date: last revised Jan 5, and I’m writing this on May 8.
I found this thread useful in parts. It’s a long read!
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/1528
Training model: v1-5-pruned.ckpt [e1441589a6]
I used this because I don’t know any better, and it’s been useful in the past. Should I be using a different model for training, or is the base SD15 the best thing to use? No idea.
Create embedding:
name: kcboho07
initialization text: boho style photo
number of vectors per token: 4
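For anyone curious what “Create embedding” actually makes: as far as I can tell, it’s just a small trainable tensor, one 768-dim vector per token for SD1.5, initialized from the token embeddings of the init text. A rough sketch with HuggingFace transformers (not A1111’s actual code, just to show the shapes):
<<<
# Sketch of the "Create embedding" step, assuming SD 1.5's CLIP text
# encoder (768-dim token embeddings). Illustrative only.
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

init_text = "boho style photo"
num_vectors = 4

# Token embeddings for the initialization text (no BOS/EOS tokens).
ids = tokenizer(init_text, add_special_tokens=False, return_tensors="pt").input_ids[0]
token_embeds = text_encoder.get_input_embeddings()(ids)  # [n_tokens, 768]

# Tile/truncate to the requested number of vectors; this small [4, 768]
# tensor is the only thing the training loop optimizes.
reps = (num_vectors // len(ids)) + 1
embedding = token_embeds.repeat(reps, 1)[:num_vectors].detach().clone()
embedding.requires_grad_(True)
print(embedding.shape)  # torch.Size([4, 768])
>>>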
Train embedding:
Embedding name: kcboho07
Embedding Learning Rate: 0.001:250, 0.0005:500, 0.00075:1000, 0.001 (a stepped schedule; see the sketch after this settings list)
Gradient Clipping: disabled
Batch size: 1
Dataset directory: wherever you’ve put it on your computer
Log directory: textual_inversion
Prompt template: minimum_style_2.txt
The template has 3 lines:
<<<
[name] style, [filewords]
[name] style, a photo of [filewords]
[name] style, an illustration of [filewords]
>>>
Width = Height = 512
Do not resize images: OFF
Max steps: 3000
Save image steps: 25
Save embedding steps: 25
Use PNG alpha channel: OFF
Save images with embedding in PNG chunks: ON
Read parameters from txt2img tab: OFF
Shuffle tags: OFF
Drop out tags: 0
Latent sampling method: deterministic
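About that learning-rate string: each “rate:step” pair applies its rate up to the given step, and the final bare rate runs to the end of training. So here: 0.001 until step 250, 0.0005 until 500, 0.00075 until 1000, then 0.001 for the rest. A minimal sketch of that behaviour, as I understand it (not the webui’s actual parser):
<<<
def lr_at(schedule: str, step: int) -> float:
    """Return the learning rate in effect at a given training step."""
    rate = 0.0
    for chunk in schedule.split(","):
        parts = chunk.strip().split(":")
        rate = float(parts[0])
        # An entry with no ":step" bound covers all remaining steps.
        if len(parts) == 1 or step <= int(parts[1]):
            return rate
    return rate  # past the last bound: stick with the final rate

sched = "0.001:250, 0.0005:500, 0.00075:1000, 0.001"
for s in (100, 300, 800, 2500):
    print(s, lr_at(sched, s))
# -> 0.001, 0.0005, 0.00075, 0.001
>>>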
Training time: about 50 minutes per 1000 steps on a 2060 (6GB).
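In case the template placeholders aren’t obvious: [name] is replaced by the embedding name, and [filewords] by text taken from the image’s filename (A1111 can also filter that through its “filename word regex” option). Roughly, per training step (a sketch of the behaviour as I understand it, not the webui’s code):
<<<
import random

TEMPLATE = [
    "[name] style, [filewords]",
    "[name] style, a photo of [filewords]",
    "[name] style, an illustration of [filewords]",
]

def training_prompt(name: str, filename: str) -> str:
    filewords = filename.rsplit(".", 1)[0]  # filename minus extension
    line = random.choice(TEMPLATE)          # one random template line per step
    return line.replace("[name]", name).replace("[filewords]", filewords)

print(training_prompt("kcboho07", "02 city.png"))
# e.g. "kcboho07 style, a photo of 02 city"
>>>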
The TI at 3000 steps does produce a Boho style, although I think it’s a bit hit-and-miss compared to the BohoAI LORA.
If anyone has suggestions about what I should be doing differently, please add a comment. Or if I’m doing anything obviously stupid! :-)