May 9, 2023
Base Model: SD 1.5
Steps: 5,000
This is an experiment to see if I can make a TI embedding that gets the flavour of konyconi’s BohoAI LORA.

Thank you to @konyconi for sharing his dataset for the excellent BohoAI LORA.

The showcase uses 2 models:

revAnimated_v122.safetensors [4199bcdd14] with clip skip = 2

avalonTruvision_v2.safetensors [a4df55d292] with clip skip = 1

This TI can produce some decent Boho pictures, but it sometimes gets confused, e.g. asking for a spaceship and getting a truck. Perhaps this sort of TI needs a lot more pictures in the training dataset, with more subject variation?


Update 09 May 2023

Continued the training to step 4000, and then 5000.

kcboho07-4000 produces a stronger Boho style.

kcboho07-5000 is stronger again but shows more duplication/repetition, e.g. more fingers, more hands, duplicate cities floating in the sky.

Tried 6000 steps but it’s even worse - overcooked.

I’ve uploaded the 4000 step version, as it is probably the best result for this experiment.

Also uploaded the 5000 step version since it can produce nice results with careful object prompts.


I’ve been struggling to work out how to make a style TI...

what makes a good training dataset?

what training settings should I use in automatic1111?

how long to cook the TI for?

For my training dataset I copied konyconi’s 76 images (1024x1024 each) to a new folder without the associated TXT files, and reduced them all to 512x512. Then I renamed them “01 aeroplane.png”, “02 city.png”, “03 tank.png” etc.

Why? Because I was trying to match what I’ve done in the past for TIs that ended up usable. The reduced images dataset folder is what I used in the settings below.
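
The copy-and-rename step can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `copy_renamed`, the folder arguments and the `subjects` list are all made up for the example, and the resize to 512x512 is not covered here (it would need an image library).

```python
import shutil
from pathlib import Path


def copy_renamed(src_dir, dst_dir, subjects):
    """Copy the PNG files (and nothing else) from src_dir to dst_dir,
    naming them '01 <subject>.png', '02 <subject>.png', ...

    `subjects` is a hypothetical list with one short description per image,
    in the same order as the sorted source file names.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    # Only *.png is matched, so the caption TXT files are left behind.
    images = sorted(src_dir.glob("*.png"))
    if len(images) != len(subjects):
        raise ValueError("need exactly one subject per image")

    copied = []
    for index, (image, subject) in enumerate(zip(images, subjects), start=1):
        target = dst_dir / f"{index:02d} {subject}.png"
        shutil.copyfile(image, target)
        copied.append(target)
    return copied
```

Calling `copy_renamed("boho_original", "boho_512", ["aeroplane", "city", "tank"])` on a folder with three PNGs would produce “01 aeroplane.png”, “02 city.png” and “03 tank.png” in the new folder; the folder names are placeholders.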

There is an automatic1111 wiki page for Textual Inversion, but it’s way out of date: the last revision was Jan 5 and I’m writing this on May 8.

I found this thread useful in parts. It’s a long read!

Training model: v1-5-pruned.ckpt [e1441589a6]

I used this because I don’t know any better, and it’s been useful in the past. Should I be using a different model for training, or is the base SD 1.5 the best thing to use? No idea.

Create embedding:

name: kcboho07

initialization text: boho style photo

number of vectors per token: 4

Train embedding:

Embedding name: kcboho07

Embedding Learning Rate: 0.001:250, 0.0005:500, 0.00075:1000, 0.001
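
To make the learning rate schedule easier to read: each "rate:step" pair means "use this rate up to that step", and a final rate with no step applies for the rest of the run. The sketch below is a stand-alone illustration of that reading, not the webui's own code; the names `parse_schedule` and `rate_at` are invented here, and treating the boundary step itself (e.g. step 250) as still using the earlier rate is an assumption.

```python
from typing import List, Optional, Tuple

SCHEDULE = "0.001:250, 0.0005:500, 0.00075:1000, 0.001"


def parse_schedule(spec: str) -> List[Tuple[float, Optional[int]]]:
    """Split 'rate:step, rate:step, rate' into (rate, last_step) pairs.
    A rate without a step gets last_step=None, meaning 'until the end'."""
    pairs = []
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            rate, last_step = part.split(":")
            pairs.append((float(rate), int(last_step)))
        else:
            pairs.append((float(part), None))
    return pairs


def rate_at(spec: str, step: int) -> float:
    """Return the learning rate in effect at a given training step."""
    for rate, last_step in parse_schedule(spec):
        # Assumption: the boundary step still belongs to the earlier rate.
        if last_step is None or step <= last_step:
            return rate
    raise ValueError(f"schedule has no rate for step {step}")
```

Read this way, the schedule starts at 0.001, drops to 0.0005 after step 250, rises to 0.00075 after step 500, and returns to 0.001 from step 1001 onwards.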

Gradient Clipping: disabled

Batch size: 1

Dataset directory: wherever you’ve put it on your computer

Log directory: textual_inversion

Prompt template: minimum_style_2.txt

The template has 3 lines:


[name] style, [filewords]

[name] style, a photo of [filewords]

[name] style, an illustration of [filewords]
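
Here is a rough sketch of how one template line turns into a training prompt. It assumes, as far as I understand the webui's behaviour, that when an image has no TXT file next to it, [filewords] is taken from the file name with the leading number stripped. The functions `filewords_from_filename` and `expand` are invented for this illustration and are not the webui's own API.

```python
import os
import re

TEMPLATE_LINES = [
    "[name] style, [filewords]",
    "[name] style, a photo of [filewords]",
    "[name] style, an illustration of [filewords]",
]


def filewords_from_filename(path: str) -> str:
    """Assumed fallback when no caption file exists: use the file name
    without its extension and without the leading digits/dashes."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return re.sub(r"^[-\d]+\s*", "", stem)


def expand(template_line: str, name: str, path: str) -> str:
    """Fill in [name] and [filewords] for one image."""
    filled = template_line.replace("[name]", name)
    return filled.replace("[filewords]", filewords_from_filename(path))
```

For example, `expand(TEMPLATE_LINES[1], "kcboho07", "01 aeroplane.png")` gives "kcboho07 style, a photo of aeroplane", which is why the file names describe the subject of each picture.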


Width = Height = 512

Do not resize images: OFF

Max steps: 3000

Save image steps: 25

Save embedding steps: 25

Use PNG alpha channel: OFF

Save images with embedding in PNG chunks: ON

Read parameters from txt2img tab: OFF

Shuffle tags: OFF

Drop out tags: 0

Latent sampling method: deterministic

Training time: about 50 minutes per 1000 steps on an RTX 2060 (6 GB).
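
Some back-of-envelope arithmetic from the settings above, as a small sketch. The constants come from this write-up (76 images, batch size 1, 50 minutes per 1000 steps, an embedding saved every 25 steps). The function names are made up, and the "passes over the dataset" figure assumes each step uses exactly one batch of images.

```python
DATASET_IMAGES = 76
BATCH_SIZE = 1
MINUTES_PER_1000_STEPS = 50
SAVE_EMBEDDING_EVERY = 25


def training_minutes(steps: int) -> float:
    """Estimated wall-clock time, scaling the measured 50 min per 1000 steps."""
    return steps * MINUTES_PER_1000_STEPS / 1000


def dataset_passes(steps: int) -> float:
    """How many times the whole dataset is seen, assuming one batch per step."""
    return steps * BATCH_SIZE / DATASET_IMAGES


def saved_embeddings(steps: int) -> int:
    """Number of intermediate embedding files written during the run."""
    return steps // SAVE_EMBEDDING_EVERY
```

With these numbers, 3000 steps is roughly 150 minutes, a little under 40 passes over the 76 images, and 120 saved embeddings to compare; 5000 steps is roughly 250 minutes.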

The TI at 3000 steps does produce a Boho style, although I think it’s a bit hit-and-miss compared to the BohoAI LORA.

If anyone has suggestions about what I should be doing differently please add a comment. Or if I’m doing anything obviously stupid! :-)