Notes on LoRA Training

Foreword

This article is inspired by the article One Image Is All You Need [1] by urusan [2]. My approach is similar but differs slightly, because I am mostly interested in photorealistic images.

I was curious whether the approach described there could be applied to photorealistic images, since it was not clear to me that it could be reproduced in this way. As the results below show, the approach works well.

The experiment should also show me how to better realize my own ideas with a LoRA model in the future.

Prerequisites

I am currently using AUTOMATIC1111 on Linux together with some extensions, among them TrainTrain. With the TrainTrain extension, LoRA models can be trained conveniently from within AUTOMATIC1111.

Starting point

Based on the idea from [1] for an image of a woman, I used a slightly modified Prompt for my tests.

woman, holding red apple in the right hand, blue blouse with pocket, short blonde hair, blue eyes, sitting behind a big desk, wooden desk, one hand on the table, wall, window, cityscape, tree, front view

The Negative Prompt used was as follows.

bad anatomy, wrong proportions, papers, shelves, pictures

Along the way I extended the Negative Prompt to remove unwanted papers on the desk as well as shelves and pictures on the walls.

The result should simply be a girl sitting behind a desk and holding an apple in a nearly empty room.

After a few tries, this approach produced the following more or less photorealistic image. I made no special effort to ensure that everything is anatomically correct.

Training

As stated before, I used a single image, the one introduced in the previous section, as urusan did, and added a caption file with the following content.

woman, holding red apple in the right hand, blue blouse with pocket, short blonde hair, blue eyes, sitting behind a big desk, wooden desk, one hand on the table, wall, window, cityscape, tree, front view
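
As a minimal sketch of this step, the caption could be written next to the training image as shown here. The image file name applegirl.png and the convention of a .txt file that shares the image's base name are assumptions for illustration; the article does not state how the caption file was named.

```python
from pathlib import Path

# Caption text exactly as used for the training image.
CAPTION = (
    "woman, holding red apple in the right hand, blue blouse with pocket, "
    "short blonde hair, blue eyes, sitting behind a big desk, wooden desk, "
    "one hand on the table, wall, window, cityscape, tree, front view"
)


def write_caption(image_path: Path, caption: str) -> Path:
    """Write the caption to a .txt file that shares the image's base name.

    The .txt naming convention is an assumption, not taken from the article.
    """
    caption_path = image_path.with_suffix(".txt")
    caption_path.write_text(caption + "\n", encoding="utf-8")
    return caption_path
```

Calling write_caption(Path("train/applegirl.png"), CAPTION) would create train/applegirl.txt, provided the hypothetical train directory exists.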

In contrast to urusan, I used features of AUTOMATIC1111 and TrainTrain to train on 8 images instead of 1. To do so, I used the mirroring feature and the feature that creates a number of buckets from a given image.

The base model for the training was sd-v1-5.safetensors. The image measured 512 x 512 pixels. As Trigger Word I chose applegirl. I trained for 1000 iterations with the cosine lr scheduler and the adamw optimizer. The image buckets step was 128 and the image min length was 256. All other parameters were left at their defaults.
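
For reference, the settings named above can be collected in one place. The following is only a sketch: the field names are hypothetical and are not TrainTrain's own option names, and the bucket_sides helper merely lists the side lengths that a step of 128 and a minimum length of 256 allow up to 512. How TrainTrain actually builds its buckets, and how the 8 training images came about, is not described here and is not reproduced by this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSettings:
    """Training settings as stated in the text; field names are hypothetical."""

    base_model: str = "sd-v1-5.safetensors"
    image_size: int = 512        # training image is 512 x 512 pixels
    trigger_word: str = "applegirl"
    iterations: int = 1000
    lr_scheduler: str = "cosine"
    optimizer: str = "adamw"
    bucket_step: int = 128       # image buckets step
    min_length: int = 256        # image min length
    mirroring: bool = True       # mirroring feature was enabled

    def bucket_sides(self) -> list:
        """Side lengths from min_length up to image_size in bucket_step increments."""
        return list(range(self.min_length, self.image_size + 1, self.bucket_step))


settings = TrainSettings()
print(settings.bucket_sides())  # [256, 384, 512]
```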

Creating the LoRA model took roughly 12 minutes on my local machine.

The created model is available for testing purposes at https://civitai.com/models/823154/applegirl.

Applying the Model

First I created an image from the original Prompt and got a replica of my training image.

Now I tried to replace the apple with an orange.

Next I tried to change the apple to a cup, which worked.

Last but not least I created an image with a red ball.

So far, so good. With a little patience it is possible to replace one object, such as the apple, with other objects.
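
The object swaps above amount to small edits of the original Prompt. The sketch below generates such variants by replacing the phrase "red apple". The exact prompts used for the orange, cup and red ball images are not given in this article, so the generated strings are an assumption about how they might have looked; swap_object is a hypothetical helper.

```python
BASE_PROMPT = (
    "woman, holding red apple in the right hand, blue blouse with pocket, "
    "short blonde hair, blue eyes, sitting behind a big desk, wooden desk, "
    "one hand on the table, wall, window, cityscape, tree, front view"
)


def swap_object(prompt: str, old: str, new: str) -> str:
    """Replace the first occurrence of the held object in the prompt."""
    if old not in prompt:
        raise ValueError(f"{old!r} does not occur in the prompt")
    return prompt.replace(old, new, 1)


# One prompt variant per replacement object mentioned in the text.
variants = {
    name: swap_object(BASE_PROMPT, "red apple", name)
    for name in ("orange", "cup", "red ball")
}

for name, prompt in variants.items():
    print(f"{name}: {prompt}")
```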

Changing the Prompt

Now I changed the Prompt to

woman, holding red apple in the right hand, blue blouse with pocket, short blonde hair, blue eyes, running in the park, front view, <lora:applegirl:0.8>

This results in

Next I changed the Prompt to

woman, holding red apple in the right hand, blue blouse with pocket, short blonde hair, blue eyes, running in the park, front view, <lora:applegirl:0.5>

which results in

Last but not least, I tried a full-body shot to see whether this is possible in principle and got the following image. More effort is needed here to get a really good picture.

And now a more complicated image: a girl with an apple sitting on a car. As before, the quality has to be improved for a good image.
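
The two park prompts in this section differ only in the weight inside the LoRA tag, which AUTOMATIC1111 writes as <lora:name:weight>. A small sketch for building such prompts at different weights follows; lora_tag and with_lora are hypothetical helper names, and the weights are the two values used above.

```python
PARK_PROMPT = (
    "woman, holding red apple in the right hand, blue blouse with pocket, "
    "short blonde hair, blue eyes, running in the park, front view"
)


def lora_tag(name: str, weight: float) -> str:
    """Format a LoRA tag in the <lora:name:weight> prompt syntax."""
    return f"<lora:{name}:{weight:g}>"


def with_lora(prompt: str, name: str, weight: float) -> str:
    """Append the LoRA tag to the end of a comma-separated prompt."""
    return f"{prompt}, {lora_tag(name, weight)}"


# Build one prompt per LoRA weight tried in this section.
for weight in (0.8, 0.5):
    print(with_lora(PARK_PROMPT, "applegirl", weight))
```

Sweeping the weight this way makes it easier to compare how strongly the trained image dominates the new scene.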

Realising Different Concepts

Such a model can be used to realize very different concepts. One example is the following picture of a girl in a black outfit holding a magic wand in a magic school.

Learning Rate

The following figure shows the learning rate during training of the LoRA model for the woman holding the apple. I trained without warm-up iterations, so the learning rate follows a smooth curve.
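
The shape of such a curve can be sketched with the usual cosine annealing formula, lr(t) = min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * t / T)). The 1000 iterations come from the training settings above; the base learning rate of 1e-4 and the minimum of 0 are assumptions for illustration only, since the actual values are not stated here, and this is not necessarily the exact schedule TrainTrain implements.

```python
import math


def cosine_lr(step: int, total_steps: int = 1000,
              base_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate without warm-up.

    base_lr and min_lr are assumed values, not taken from the article.
    """
    # Clamp the step so the schedule stays flat outside [0, total_steps].
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Sample the curve at a few points of the 1000-iteration run.
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step):.2e}")
```

Without warm-up the curve starts at the base learning rate right away and decays smoothly toward the minimum, which matches the smooth shape described above.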

I am focusing more and more on this learning rate to see how I can improve model training.