BohoTI

Verified: PickleTensor
Type: Embedding
Published: May 9, 2023
Base Model: SD 1.5
Training Steps: 5,000
Trigger Words: kcboho07-5000
Hash (AutoV2): D9877C70D5
chromesun

This is an experiment to see if I can make a TI embedding that gets the flavour of konyconi’s BohoAI LORA.

https://civitai.com/models/51966/bohoai

Thank you to @konyconi for sharing his dataset for the excellent BohoAI LORA.

https://civitai.com/models/52697/tutorial-konyconi-style-lora

The showcase uses 2 models:

revAnimated_v122.safetensors [4199bcdd14] with clip skip = 2

avalonTruvision_v2.safetensors [a4df55d292] with clip skip = 1

This TI can produce some decent Boho pictures, but it sometimes gets confused, e.g. asking for a spaceship and getting a truck. Perhaps this sort of TI needs a lot more pictures in the training dataset, with more subject variation?

---------------------------

Update 09 May 2023

Continued the training to step 4000, and then 5000.

kcboho07-4000 produces a stronger Boho style.

kcboho07-5000 is stronger again but shows more duplication/repetition, e.g. extra fingers, extra hands, duplicate cities floating in the sky.

I tried 6000 steps, but the result was even worse: overcooked.

I’ve uploaded the 4000 step version as, probably, the best result for this experiment.

Also uploaded the 5000 step version since it can produce nice results with careful object prompts.

---------------------------

I’ve been struggling to work out how to make a style TI:

What makes a good training dataset?

What training settings should I use in automatic1111?

How long should I cook the TI for?

For my training dataset I copied konyconi’s 76 images (1024x1024 each) to a new folder without the associated TXT files, and reduced them all to 512x512. Then I renamed them “01 aeroplane.png”, “02 city.png”, “03 tank.png” etc.

Why? Because I was trying to match what I’ve done in the past for TIs that ended up usable. The reduced images dataset folder is what I used in the settings below.
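The copy-and-rename step above can be sketched in Python. This is only an illustration: the original renaming was done by hand, the function name `copy_images_only` and the `labels` mapping are hypothetical, and the resize to 512x512 is left to a separate image tool.

```python
import shutil
from pathlib import Path


def copy_images_only(src_dir, dst_dir, labels):
    """Copy PNG files as 'NN label.png', leaving caption .txt files behind.

    `labels` maps a source file name to a hand-chosen subject word
    (hypothetical helper input; files without an entry keep their stem).
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Only image files are picked up, so the .txt captions are skipped.
    pngs = sorted(p for p in src.iterdir() if p.suffix.lower() == ".png")

    copied = []
    for index, path in enumerate(pngs, start=1):
        label = labels.get(path.name, path.stem)
        target = dst / f"{index:02d} {label}.png"
        shutil.copyfile(path, target)
        copied.append(target.name)
    return copied
```

Calling it with a mapping such as `{"img_a.png": "aeroplane"}` would produce names in the “01 aeroplane.png” pattern described above.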

The automatic1111 wiki page for Textual Inversion is here, but it is well out of date (last revised Jan 5; I’m writing this on May 8):

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Textual-Inversion

I found this thread useful in parts. It’s a long read!

https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/1528

Training model: v1-5-pruned.ckpt [e1441589a6]

I used this because I don’t know any better, and it’s been useful in the past. Should I be using a different model for training, or is the base SD 1.5 the best thing to use? No idea.

Create embedding:

name: kcboho07

initialization text: boho style photo

number of vectors per token: 4

Train embedding:

Embedding name: kcboho07

Embedding Learning Rate: 0.001:250, 0.0005:500, 0.00075:1000, 0.001
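To make the learning rate schedule easier to read, here is a small sketch that parses the string and reports which rate applies at a given step. It assumes the usual `rate:step` reading (use this rate until that step, with a final bare rate running to the end); the function names are hypothetical, and the exact handling of the boundary steps in the webui is an assumption.

```python
def parse_schedule(text):
    """Return (rate, end_step) pairs; end_step is None for 'until the end'."""
    schedule = []
    for part in text.split(","):
        part = part.strip()
        if ":" in part:
            rate, end = part.split(":")
            schedule.append((float(rate), int(end)))
        else:
            schedule.append((float(part), None))
    return schedule


def rate_at(schedule, step):
    """Learning rate assumed to be in effect at a 1-based training step."""
    for rate, end in schedule:
        if end is None or step <= end:
            return rate
    # Past the last listed step with no open-ended entry: keep the last rate.
    return schedule[-1][0]


SCHEDULE = parse_schedule("0.001:250, 0.0005:500, 0.00075:1000, 0.001")
```

Read this way, the schedule starts at 0.001, drops to 0.0005 after step 250, rises to 0.00075 after step 500, and returns to 0.001 from step 1000 onwards.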

Gradient Clipping: disabled

Batch size: 1

Dataset directory: wherever you’ve put it on your computer

Log directory: textual_inversion

Prompt template: minimum_style_2.txt

The template has 3 lines:

<<<

[name] style, [filewords]

[name] style, a photo of [filewords]

[name] style, an illustration of [filewords]

>>>
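A rough sketch of how the three template lines turn into training prompts. It assumes that, with no TXT caption file present, `[filewords]` is taken from the image file name with the extension and any leading digits removed; that matches my understanding of the webui but is an assumption here, and the helper names are hypothetical.

```python
import os
import re

TEMPLATE_LINES = [
    "[name] style, [filewords]",
    "[name] style, a photo of [filewords]",
    "[name] style, an illustration of [filewords]",
]


def filewords_from_filename(filename):
    """Assumed behaviour: drop the extension and leading digits/dashes."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return re.sub(r"^[-\d]+\s*", "", stem)


def expand(template_line, name, filename):
    """Fill in the [name] and [filewords] placeholders for one image."""
    return (template_line
            .replace("[name]", name)
            .replace("[filewords]", filewords_from_filename(filename)))


prompts = [expand(line, "kcboho07", "01 aeroplane.png") for line in TEMPLATE_LINES]
```

Under that assumption, “01 aeroplane.png” gives prompts such as `kcboho07 style, a photo of aeroplane`, so the file names act as short subject captions.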

Width = Height = 512

Do not resize images: OFF

Max steps: 3000

Save image steps: 25

Save embedding steps: 25

Use PNG alpha channel: OFF

Save images with embedding in PNG chunks: ON

Read parameters from txt2img tab: OFF

Shuffle tags: OFF

Drop out tags: 0

Latent sampling method: deterministic

Training time: about 50 minutes per 1000 steps on a 2060 (6 GB).
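Some back-of-envelope arithmetic from the settings above, as a sketch. The numbers come from this page (50 minutes per 1000 steps, 76 images, batch size 1, saving every 25 steps); the function names are hypothetical, and the time estimate assumes training speed stays constant.

```python
MINUTES_PER_1000_STEPS = 50  # reported timing on the 2060/6GB
DATASET_IMAGES = 76          # size of the training dataset
BATCH_SIZE = 1
SAVE_EVERY = 25              # "Save embedding steps" setting


def training_minutes(steps):
    """Estimated wall-clock minutes, assuming constant speed."""
    return steps * MINUTES_PER_1000_STEPS / 1000


def dataset_passes(steps):
    """How many times each image is seen on average."""
    return steps * BATCH_SIZE / DATASET_IMAGES


def saved_embeddings(steps):
    """Number of intermediate embedding files written during the run."""
    return steps // SAVE_EVERY


for steps in (3000, 4000, 5000):
    print(steps, training_minutes(steps), round(dataset_passes(steps), 1),
          saved_embeddings(steps))
```

For 3000 steps this works out to roughly 150 minutes and about 39 passes over the 76 images; 5000 steps is roughly 250 minutes and about 66 passes.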

The TI at 3000 steps does produce a Boho style, although I think it’s a bit hit-and-miss compared to the BohoAI LORA.

If anyone has suggestions about what I should be doing differently please add a comment. Or if I’m doing anything obviously stupid! :-)