MarblingTIXL

Verified: SafeTensor
Type: Embedding
Published: Jan 31, 2024
Base Model: SDXL 1.0
Training Steps: 500
Trigger Words: xlmrblng36-500
Hash (AutoV2): 25613074D7
chromesun

31 Jan 2024

v2.0 isn’t a better version of MarblingTIXL. Just different. v1.0 still works fine.

With the changes in kohya it turns out the way I made v1 of this TI no longer works, or at least doesn’t produce anything very useful.

Thanks to @raken for letting me know about this.

I still think there’s great potential in SDXL embeddings so I’ve done a fresh kohya_ss install (v22.6.0 at time of writing) and worked my way through various parameters/settings until I found a combo that makes a close relative of the original MarblingTIXL.

In case anyone’s interested in SDXL TIs (and I know there are at least 2 of you out there!), I’ve included my training data and kohya_ss config JSON. Possibly some notes as well if I can think of something useful.

On the upside, this TI trained faster... on the downside it’s not as consistent as the older TI. Or maybe I haven’t played with it enough. Who can tell out here on the bleeding edge?!

If anyone’s got any questions, observations, opinions or wisdom to share please stick a comment below. There doesn’t seem to be much hard info out there about how to create TI styles at the moment... I’ve read/watched lots of contradictory viewpoints. It can be done though, and I think there’s scope for better TIs than I’ve managed so far.

Competition for LoRAs? Nope, not really - LoRAs add something to a checkpoint whereas TIs leverage what’s already in the checkpoint. If I’ve understood it right, TIs let you reach areas in a checkpoint’s possibility space that it would be difficult to reach consistently. So TIs and LoRAs are different things for different purposes... that you can use together. So everyone is happy :-)

There are technical papers around (covering what a TI is, how you should train one, stuff about text encoders, etc) but I’m usually out of my depth by half-way through the first page :-(

As far as I can tell, kohya_ss is only training the first TE (text encoder) in SDXL. That’s the one from SD v1.x, so SD v1.x TIs should work in auto1111 SDXL generation, but they don’t. (Some people have reported that SD v1.x TIs do work in Comfy, but the experience seems to be variable.) The second TE (an OpenCLIP model, a larger relative of the one in SD v2.x) doesn’t seem to be trained in kohya_ss. Or maybe it’s a duplicate of TE1?

I had a go with OneTrainer (which has options for both TEs) but didn’t have any success with the few runs I tried, so I’m sticking with kohya_ss for now.

For reference, I’m using an RTX-3060 with 12GB on a reasonable PC. Current kohya_ss runs just overflow the 12GB (+ another 6GB if I’m generating samples) so it’s heavier on resources than doing a LoRA. I thought TIs would need less (or the same) resources so I’m a bit surprised. Perhaps there’s no perceived need to optimise for TIs? Yet :-)

The TI here was trained on:

sd_xl_base_1.0_0.9vae.safetensors

The showcase images were generated using:

crystalClearXL_ccxl.safetensors [0b76532e03]

i.e. a TI trained on vanilla base should work with other checkpoints.

Images are generated in a1111 v1.7.0 and I’ve used Hires.fix but no other adjusters.

The additional gallery below shows without/with pairs so you can see how the TI affects some selected prompts. Label “xlmrblnh36-500” means without, label “xlmrblng36-500” means with. I’ve done it that way to keep the two prompts as similar as possible.

If you’re interested, the Training Data zip contains all the saved TIs at 25-step intervals (with gradient accumulation of 4, that’s 100 normal steps per interval).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

NOTE: There is a problem with SDXL in the current version of automatic1111’s webui (v1.6.0). If you use a refiner checkpoint, webui forgets all your embeddings until you load a different checkpoint and then reload your original checkpoint (or restart webui). I have raised the issue with the developers:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/13117

and it has been confirmed as a bug.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

***SUMMARY***

This embedding will apply a surreal/fantasy aesthetic inspired by vintage marbled paper patterns. The effect varies from low to extreme depending on how “close” your prompt already is to this aesthetic.

The training for this TI did not include any artist works or tags.

Copy the generation data from one of the showcase images and adjust it to taste, or start with a prompt like this that should give a decent result with any seed:

award-winning Art Nouveau xlmrblng15-1300, analog realistic colour photo of a Japanese mermaid sitting on a rock in the midst of crashing waves, very detailed

checkpoint: crystalClearXL_ccxl.safetensors [0b76532e03]

sampler: DPM++ 2M Karras

steps: 40

CFG: 7

height=width=1024

and then vary the terms as you please. Try to keep between 3 and 5 words before “xlmrblng15-1300”.

The simplest prompts worth trying are this sort of thing:

cybernetic nun, xlmrblng15-1300

fantasy winter landscape, xlmrblng15-1300

but generally you’ll need more words to get interesting results.

After lots of experimentation I found I was getting my best results with prompts between 30 and 45 tokens, with no negative prompts.

I have provided some before/after image pairs in the extra galleries below.

xlmrblnh15 = without this TI

xlmrblng15 = with this TI

As you’ll see, this TI does more than simply adding marbled paper patterns :-)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

***MORE DETAIL & TRAINING INFO***

This is a TI (textual inversion) embedding that adjusts your image generations by adding marbled paper patterns, or adjusting things towards marbled paper patterns depending on your prompts. Because of the way the SDXL system works, the effect with longer/complex prompts will often be structural rather than simplistic.

It’s the SDXL successor of my MarblingTI for SD v1.5:

https://civitai.com/models/69768/marblingti

Because of all the changes in SDXL I had quite a lot of false starts (20+), but I think this new TI is more useful than the old one... at least for the surreal/illustrative stuff I like to create.

Switching from automatic1111 to kohya_ss for training was not an easy process. More on that below.

The TI is 8 vectors (i.e. it takes 8 tokens of your prompt). It is overpowered for short/simple prompts. That’s by design - I did make a few subtle versions but they were no use for the longer/complex prompts I’ve been using with SDXL. From what I understand of Stable Diffusion, 4 vectors should have been enough but I couldn’t get consistent results with 4 vectors.

The source material consists of scans/photos of vintage marbled paper that were made into several precursor TIs, that were then used to create hybrid pictures, that became the inputs for this TI.

For prompting you’ll need to front-load with 3 to 5 tokens.

i.e.

portrait of a woman, xlmrblng15-1300

rather than

xlmrblng15-1300, portrait of a woman

If you use a short/simple prompt you’ll likely just get a vintage marbled paper pattern. Fine but boring. Also, for shorter prompts the TI might add a slight green cast to images. I’m not sure why; the training images don’t have any overall cast.

Weight/emphasis: from 0.81 up to 1.33 is usable depending on prompt. I find I get more consistent results by moving the TI token rather than using weighting.
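For anyone scripting prompts, here is a minimal Python sketch of the (text:weight) emphasis syntax used in the webui, applied to the trigger word. The function name and the clamping to the 0.81 to 1.33 range quoted above are illustrative assumptions, not part of any tool.

```python
def weighted(token: str, weight: float, lo: float = 0.81, hi: float = 1.33) -> str:
    """Wrap a token in (token:weight) emphasis syntax.

    The weight is clamped to the range reported as usable for this TI;
    the clamping itself is an illustrative choice, not a webui feature.
    """
    clamped = max(lo, min(hi, weight))
    return f"({token}:{clamped:.2f})"


# Example: a slightly boosted trigger placed after a few front-loaded words.
print("portrait of a woman, " + weighted("xlmrblng15-1300", 1.2))
```

Moving the trigger earlier or later in the prompt, as described above, needs no special syntax, which may be part of why it behaves more predictably.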

All the image generation for this TI was done in automatic1111 webui v1.6.0. The only non-built-in extension I’m using is Dynamic Prompts (installed via Extensions tab). I’ve not used Hires.fix or in/outpainting or detailers or other TIs or LoRAs etc, so that you can get an idea from the showcase/gallery images about whether it’s worth your while to try this TI.

https://github.com/AUTOMATIC1111/stable-diffusion-webui

https://github.com/adieyal/sd-dynamic-prompts

I usually use the CrystalClearXL:

https://civitai.com/models/122822?modelVersionId=133832

or SDXL FaeTastic

https://civitai.com/models/129681?modelVersionId=157988

checkpoints for SDXL image generation, but this TI works with every SDXL checkpoint I tried.

Because of the way prompting works, if you want to see the effects with/without this TI then change a single letter only.

e.g.

WITH: cybernetic nun, xlmrblng15-1300

WITHOUT: cybernetic nun, xlmrblnh15-1300
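The single-letter swap can be automated when building comparison pairs. This is a rough Python sketch under the assumption that the trigger is lowercase ASCII; the helper names are made up for illustration.

```python
def control_trigger(trigger: str) -> str:
    """Change the last letter of the trigger to the next letter of the alphabet.

    The result no longer matches the embedding's filename, so it acts as a
    "without" control that tokenises almost identically to the real trigger.
    """
    for i in range(len(trigger) - 1, -1, -1):
        ch = trigger[i]
        if ch.isalpha():
            bumped = "a" if ch == "z" else chr(ord(ch) + 1)
            return trigger[:i] + bumped + trigger[i + 1:]
    raise ValueError("trigger contains no letters")


def control_prompt(prompt: str, trigger: str) -> str:
    """Return the same prompt with the trigger replaced by its control twin."""
    if trigger not in prompt:
        raise ValueError("trigger not found in prompt")
    return prompt.replace(trigger, control_trigger(trigger))


print(control_prompt("cybernetic nun, xlmrblng15-1300", "xlmrblng15-1300"))
```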

You can change the trigger word by renaming the safetensors file you downloaded. PROBLEM: if you change the trigger to a word that SDXL “knows” such as marbling, you’ll get unexpected results. Even if you stick words together like newmarbling, SDXL will pick out “new” and “marbling” and, umm, do stuff with those rather than the TI.

The name I’ve used is to tell me it’s an SDXL TI, it’s marbling (mrblng), and it’s the 1300 step iteration of version 15.


I often use an art movement at the start of my prompts, e.g. Art Nouveau, either as-is or weighted down to somewhere between 0.3 and 0.5. Jumping off pages for Art Movements:

https://en.wikipedia.org/wiki/List_of_art_movements

https://en.wikipedia.org/wiki/Periods_in_Western_art_history

If that doesn’t suit the prompt purists, try something like “award-winning illustrative” instead. For me, adding an art movement I enjoy means I don’t have to fiddle with the rest of the prompt as much to get a similar effect. My short-list of art movements is kept in a txt file in Dynamic Prompts’ wildcards folder so that I can just use __Art_Movements__ in my prompts.

As a rule I don’t use artist names except, occasionally, posthumous ones to get a very particular effect. e.g. René Lalique

https://en.wikipedia.org/wiki/Ren%C3%A9_Lalique

I’ve started using kohya_ss (v21.8.9) for TI training since it looks like automatic1111 will not be adding SDXL training to webui.

https://github.com/bmaltais/kohya_ss

There are a lot of settings/config in kohya_ss and I still don’t know what half of them mean :-( However, I’m going to give some info here that might help people wanting to train SDXL Textual Inversion styles using kohya_ss. I haven’t tried an SDXL TI object, and I can’t get LoRA training to work in kohya_ss (it either fails to start training or falls over partway through).

I can only describe the settings that worked on my own PC, but I hope it’s still relevant for similar PCs. So...

The PC I’m using is:

Nvidia 3060/12GB (not the Ti version), MSI X570 mb, Ryzen 7-2700 (8c/16t), 64GB system RAM, multiple SSDs, Win10pro.

Created a folder structure:

XLmrblng15

--img

----50_XLmrblng15 style

--log

--model
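The same layout can be created with a few lines of Python. This is only a sketch: the function name and parameters are hypothetical, and the "repeats_name class" folder naming is taken from the structure shown above.

```python
from pathlib import Path


def make_training_folders(root: Path, name: str, repeats: int,
                          class_word: str = "style") -> Path:
    """Create img/log/model folders for one training run.

    Returns the image subfolder, whose name carries the repeat count,
    e.g. "50_XLmrblng15 style".
    """
    run_dir = root / name
    img_dir = run_dir / "img" / f"{repeats}_{name} {class_word}"
    img_dir.mkdir(parents=True, exist_ok=True)
    for sub in ("log", "model"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return img_dir


# Example call (the root path is whatever suits your own machine):
# make_training_folders(Path("G:/KOHYA/TRAIN"), "XLmrblng15", 50)
```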

Training images:

I created 45 1024x1024 images and put them in the “50_XLmrblng15 style” folder. Then created a .caption file for each image. Example:

cliff with waterfall.png

cliff with waterfall.caption

Caption files are just text files so I used a simple text editor. The contents of each .caption file follow the same pattern:

xlmrblng15, cliff with waterfall

That’s the name of the TI I’m creating, a comma, a space, and the descriptive filename.
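With dozens of images, the caption files can also be generated from the filenames rather than typed by hand. A minimal sketch, assuming PNG images whose filenames are already the descriptions you want; the function name is made up for illustration.

```python
from pathlib import Path


def write_captions(image_dir: Path, token: str, image_ext: str = ".png") -> int:
    """Write "<token>, <filename without extension>" next to each image.

    "cliff with waterfall.png" gets "cliff with waterfall.caption"
    containing "<token>, cliff with waterfall".
    Returns the number of caption files written.
    """
    count = 0
    for image in sorted(image_dir.glob(f"*{image_ext}")):
        caption = f"{token}, {image.stem}"
        image.with_suffix(".caption").write_text(caption, encoding="utf-8")
        count += 1
    return count


# Example call (point it at your own image folder):
# write_captions(Path("G:/KOHYA/TRAIN/XLmrblng15/img/50_XLmrblng15 style"), "xlmrblng15")
```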

I don’t use captioning utilities.

For the stuff below, if a parameter isn’t mentioned it means I left it at default.

In the main “Textual Inversion” tab in kohya_ss:

Source model tab

Model Quick Pick = custom

Save trained model as = safetensors

Pretrained model name or path = G:/stable-diffusion-webui-master/webui/models/Stable-diffusion/SDXL/sd_xl_base_1.0_0.9vae.safetensors

SDXL model = ticked

Folders tab

Image folder = G:/KOHYA/TRAIN/XLmrblng15/img

Output folder = G:/KOHYA/TRAIN/XLmrblng15/model

Logging folder = G:/KOHYA/TRAIN/XLmrblng15/log

Model output name = xlmrblng15

Parameters (basic) tab

Token string = xlmrblng

Init word = pattern

Vectors = 8

Template = caption

Mixed precision = bf16

Save precision = bf16

Number of CPU threads per core = 1

Cache latents = ticked

Cache latents to disk = ticked

LR Scheduler = constant

Optimizer = AdamW8bit

Learning rate = 0.001

Max resolution = 1024,1024

No half VAE = ticked

Parameters (advanced) tab

VAE = G:/KOHYA/sdxl_vae.safetensors

Save every N steps = 100

Gradient checkpointing = ticked

Memory efficient attention = ticked

Max num workers for DataLoader = 4

Parameters (samples) tab

Sample every n steps = 100

Sample prompts =

an analog realistic photograph of a magnificent jug on a table with glass tumblers, very detailed, intricate, xlmrblng15 --w 1024 --h 1024

xlmrblng15, an analog realistic photograph of a magnificent English lady wearing a Victorian bathing dress, very detailed, intricate, --w 1024 --h 1024

With all the above settings, training time settled at around 6s/it. Variable because I still use the PC for other (simple!) things while kohya is doing its stuff. The xlmrblng15-1300 was produced at around 2hr10min into the run.

For the bulk of the training run, GPU RAM usage was just inside the 12GB of my 3060. However, during sample generation and saving the TI every 100 steps an extra 7GB was used (i.e. 19GB in total). That extra 7GB comes from “Shared GPU memory”, i.e. main system RAM. After the sample generations the memory usage went back to just the 12GB GPU RAM.

The slowdown when using “Shared GPU Memory” was about 10x. Bleeping bleep :-(

The sample images kohya produces are very poor compared to using the base SDXL model in automatic1111 webui. But I left them in because at least I could see if the training was heading roughly in the right direction.

Obviously your training dataset is very important. I tried lots of different combinations of generated and real images until I got a set that produced the TI on this page.

For 45 images, using a batch size of 1 (the default), the special folder name “50_XLmrblng15 style” tells kohya to process each image 50 times. 45 * 50 = 2250 steps in total. After testing the various saved TIs at 100, 200, 300 steps and so on, I decided the 1300 step one worked best.
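The step arithmetic is easy to check in Python. This sketch assumes steps per epoch are simply images times repeats divided by batch size (rounded up), which matches the 2250 figure above; it ignores gradient accumulation and bucketing, and the function names are illustrative.

```python
import math


def repeats_from_folder(folder_name: str) -> int:
    """Read the repeat count from a "<repeats>_<name> <class>" folder name."""
    prefix, _, _ = folder_name.partition("_")
    return int(prefix)


def total_steps(num_images: int, repeats: int, batch_size: int = 1,
                epochs: int = 1) -> int:
    """Steps for a run: ceil(images * repeats / batch_size) per epoch."""
    return math.ceil(num_images * repeats / batch_size) * epochs


repeats = repeats_from_folder("50_XLmrblng15 style")
steps = total_steps(num_images=45, repeats=repeats)
print(steps)  # 2250

# With "Save every N steps = 100", these are the step counts that get saved.
saved = list(range(100, steps + 1, 100))
print(1300 in saved)  # True
```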

On the Parameters-Basic tab there is an “Init word” field. I found that training was very very sensitive to what I used as the init word. In this particular case I used “pattern” which is a 1-token word as far as SDXL is concerned. Theoretically I should have been using an 8-token phrase (kohya gives a console warning if vectors =/= init tokens). For some runs I did use more tokens and got some very interesting TIs, but not what I was looking for.

Using “pattern” has a drawback: depending on your prompt to SDXL, you may get lots of repetition - like a repeating pattern on wallpaper or gift-wrap.

Using “marbling” or “paper marbling” didn’t work: compared to SDv1.x, SDXL “knows” much more about marbling. Try it in your prompts! Ask for marble/marbling/marbled things and SDXL does so much better than SDv1.x. Any time I made a TI where the init word was marbling or a conceptually close term, what I got was a TI that used SDXL’s inbuilt marbling rather than the training from my dataset. :-(

I looked into the history of marbled paper and tried some terms such as “ebru”, the Turkish version of marbled paper. That didn’t work very well either. In the end, the very wide term “pattern” gave me most of what I wanted.

Kohya_ss has the option of a “style” template on the Parameters-Basic tab. I’ve had a couple of decent results using “style” for some of my non-published SDXL TIs, but for this marbled paper one the results were not good.

Textual Inversion vs. LoRA

I’m concentrating on TIs because (a) I can’t get a LoRA to train to completion, and (b) I want to leverage the content in SDXL rather than layer more data on top of it. I’m not against LoRAs - far from it! I’m having great fun with LoRAs by konyconi and others. Getting some great wow! results :-)

But I feel more of an affinity to TIs at the moment. The way I think of it is that TIs allow you to adjust your prompts into SDXL areas that simple words can’t reach, but LoRAs add new stuff on top of SDXL that you “merge” with SDXL’s own content by way of prompts.

That’s very simplistic but I don’t want to get into a discussion about SDXL’s full sample space vs. its probability space, and what happens in a superset. This is a hobby for me, not a job :-)

Last thing I’ll mention here is that I’m using Dynamic Prompts’ wildcard system heavily. My typical prompts using this xlmrblng15-1300 TI look like this:

(__Art_Movements__:0.5) xlmrblng15-1300, mature __Nationalities__ (__Character_MF__) riding a __BW_Animals__ in a white-tinted __Landscapes__, __Metal_Color__ filigree inlay

Instantiated prompts (i.e. looking at it after Dynamic Prompts has done its thing) tend to be between 30 and 45 tokens.

When I drag a generated image into the “PNG Info” tab in automatic1111 webui, a typical result of the above dynamic prompt is 34 tokens long:

(Surrealism:0.5) xlmrblng15-1300, mature Swedish (male vampire) riding a dalmation in a white-tinted mudflats with scarlet cranes, black filigree inlay
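To show roughly what happens between the template and the instantiated prompt, here is a small Python sketch of wildcard substitution. It is not the Dynamic Prompts extension's code: the real extension reads its lists from txt files in the wildcards folder, whereas the lists below are made-up stand-ins for illustration.

```python
import random
import re

# Matches __Name__ where Name may itself contain single underscores.
WILDCARD = re.compile(r"__([A-Za-z0-9_]+?)__")


def expand(template: str, wildcards: dict[str, list[str]],
           rng: random.Random) -> str:
    """Replace each __Name__ with a random entry from wildcards[Name]."""
    def pick(match: re.Match) -> str:
        name = match.group(1)
        if name not in wildcards:
            return match.group(0)  # leave unknown wildcards untouched
        return rng.choice(wildcards[name])

    return WILDCARD.sub(pick, template)


# Hypothetical list contents; the real wildcard files are not shown here.
wildcards = {
    "Art_Movements": ["Art Nouveau", "Surrealism"],
    "Nationalities": ["Swedish", "Japanese", "Scottish", "Egyptian"],
    "Character_MF": ["male vampire", "female vampire"],
    "BW_Animals": ["dalmatian", "zebra"],
    "Landscapes": ["mudflats with scarlet cranes", "fantasy winter landscape"],
    "Metal_Color": ["black", "gold"],
}

template = ("(__Art_Movements__:0.5) xlmrblng15-1300, mature __Nationalities__ "
            "(__Character_MF__) riding a __BW_Animals__ in a white-tinted "
            "__Landscapes__, __Metal_Color__ filigree inlay")

print(expand(template, wildcards, random.Random()))
```

Passing a seeded `random.Random(seed)` instead makes a particular expansion repeatable.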

Why put things like nationalities in when SDXL pays little attention to them in longer prompts? Because SDXL is biased and will add little extras associated with nationalities. It can be things like red hair if Scottish is mentioned, pyramids if Egyptian is mentioned, or Mt Fuji if Japanese is mentioned. Works for other things too; context linking/association seems to be much heavier in SDXL than SDv1.x. Trying to control it is a pain in the butt :-(

Resolutions I use for SDXL are usually 1024x1024, 960x1344 and 1344x960. The resolutions I’ve seen suggested here and there on the net are the base 1mp (megapixel) 1024x1024 of course, and others that are as close as possible to 1mp. So if I want 1344 width I should use 768 height. I tried that and the quality of the 1344x768 images looked much lower to me than the 1024x1024 and 1344x960 ones. Also, the 1344x960 scales exactly to my 7" by 5" photo paper. So there’s that :-)
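The pixel counts and aspect ratios behind those resolutions can be worked out with a short Python sketch. It assumes "1mp" means 1024 x 1024 pixels, as in the paragraph above; the function name is illustrative.

```python
from fractions import Fraction


def describe(width: int, height: int) -> tuple[float, Fraction]:
    """Return (size in units of 1024*1024 pixels, exact aspect ratio)."""
    megapixels = width * height / (1024 * 1024)
    return megapixels, Fraction(width, height)


for w, h in [(1024, 1024), (960, 1344), (1344, 960), (1344, 768)]:
    mp, ratio = describe(w, h)
    print(f"{w}x{h}: {mp:.2f} MP, aspect {ratio.numerator}:{ratio.denominator}")
```

So 1344x960 is about 1.23 MP with an exact 7:5 ratio (hence the fit to 7" by 5" paper), while 1344x768 is about 0.98 MP at 7:4.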