A Simple Guide to Training a VEHICLE LoRA

Some users have asked me how to train a LoRA on a specific vehicle. As far as I know, there are very few LoRA tutorials about vehicles, and videos and articles on the topic are scarce, so it can take a lot of time to learn how to build a dataset and then train. I ran into many problems when I started making vehicle LoRAs. After some trial and error, I hope my rough experience can help you train your favorite vehicles, because I have seen people give up on vehicle LoRAs for lack of guidance. Before you train a vehicle LoRA, I recommend training a real-person LoRA first; that experience will make the whole process smoother. This guide only simplifies part of the process.

Special thanks to kuroya [RTX4080 16G], gesen2gee, A587 and xi_nai_wa_sha_xia. You can talk with them here (https://discord.gg/TM5d89YNwA). They were my teachers for this VEHICLE LoRA guide, and I really appreciate their help.

This guide is fairly long, so if you are in a hurry, jump straight to the Process section.

Requirements for training a VEHICLE LoRA

You should have trained at least one anime or realistic LoRA before. That experience makes training a vehicle LoRA much easier.

===========================================================================

Introduction

Some vehicles are easy to train because plenty of pictures are available. For military jets, especially older ones, pictures can be scarce or blurry. Luckily, most jets still have enough pictures to train on, but their distinctive features are hard to call out without ControlNet. In my opinion, no checkpoint other than Realistic Vision trains fighter jets and tanks that well. What follows is based mostly on jets, so it may not generalize to every vehicle, but I think the same approach can commonly be applied to VEHICLE LoRA training.

Let's take the Mig-21 as an example.

Dataset preparation

I used only about 25 images suitable for training a Mig-21. Remember that high resolution always matters when training a LoRA: chase the quality of your dataset pictures, not their number. In my experience, jets need over 45 repeats and 11 epochs. Adjust these to your own dataset rather than copying mine; even for me, the repeats, epochs and optimizer type I need vary widely from one training run to another.
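To see how these numbers add up, here is a rough sketch of the step arithmetic. It assumes a kohya-style trainer where each epoch shows every image `repeats` times; the function name is hypothetical, and aspect-ratio bucketing or gradient accumulation can make the real step count differ slightly.

```python
import math


def total_training_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Estimate optimizer steps, assuming each image is seen `repeats` times per epoch."""
    # Images seen per epoch, divided into batches (a partial last batch still counts as a step).
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# The Mig-21 numbers from this guide: 25 images, 45 repeats, 11 epochs.
print(total_training_steps(25, 45, 11))                # batch size 1 -> 12375
print(total_training_steps(25, 45, 11, batch_size=2))  # batch size 2 -> 6193
```

This also shows why high repeats and epochs make jets easy to overfit: with only 25 images, over 12,000 steps means each picture is seen almost 500 times.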

Captioning tricks

Before training, caption your images with BLIP and only add the name of the jet at the beginning or the end of each BLIP caption. Then train the first version of your LoRA. (Let's start.)
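A minimal sketch of the name-adding step, assuming your BLIP captions are already saved as one `.txt` file per image in the same folder (the layout kohya-style trainers commonly read). The function `add_trigger_word` and the folder path in the usage line are hypothetical examples, not part of any tool.

```python
from pathlib import Path


def add_trigger_word(caption_dir: str, trigger: str, at_end: bool = False, ext: str = ".txt") -> int:
    """Add the vehicle name to every caption file in caption_dir; return how many files changed."""
    changed = 0
    for path in sorted(Path(caption_dir).glob(f"*{ext}")):
        caption = path.read_text(encoding="utf-8").strip()
        # Skip captions that already mention the name, so rerunning the script is harmless.
        if trigger.lower() in caption.lower():
            continue
        # Put the name in front of the BLIP sentence, or after it when at_end=True.
        caption = f"{caption}, {trigger}" if at_end else f"{trigger}, {caption}"
        path.write_text(caption + "\n", encoding="utf-8")
        changed += 1
    return changed


# Usage (hypothetical folder): add_trigger_word("dataset/mig21", "Mig-21")
```

Putting the name first mirrors the "Mig-21, a fighter jet flying..." captions shown later in this guide; pass `at_end=True` if you prefer the name at the end.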

(PICTURE A1)

This is the first version of the LoRA, trained without any further captioning. So what is missing in the picture above compared with the Mig-21 picture below?

Both pictures were generated with the same prompt, steps and every other setting. So why are they so different?

The pictures above and below come from two Mig-21 LoRAs that share the same dataset, repeats, epochs and other settings. The only difference is that one was trained with detailed captions and the other was not.

(PICTURE A2)

By comparison, once the jet is captioned in detail, the LoRA trained on the adjusted captions reproduces the jet's details much more readily.

So, from what I have observed, well-adjusted captions are very important for fighter jets, and they may also help other types of vehicle that do not train well with plain BLIP captions.

The captions below show the difference between the two datasets, using this photo as an example.

Original caption (PICTURE A1): Mig-21, a fighter jet flying through the air with smoke and fire coming from back

Adjusted caption (PICTURE A2): Mig-21, a single seated fighter jet Mig-21 with cockpit canopy closed with a grey military camouflage is accelerating through it's afterburner at the jet tail duct with landing gear open with main wheels exposed with a air data boom similar to one string at front of the jet fighter near intake center body in the head of the jet which also with no any missiles are hanging on the wing fence, there also one dots one each fighter jet wings ,one of the horizontal stabilizer at left is block by the vertical stabilizer cause it's unseen and the jet fighter is flying in a cloudy sky. (Yes, this caption was adjusted manually.)
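If you want adjusted captions that are easier to edit between training rounds, one option (my own assumption, not the format used in the caption above) is to write each component as a separate comma-separated phrase after the vehicle name and a short base description. The helper below is hypothetical and simply joins those pieces.

```python
def build_detailed_caption(trigger: str, base: str, components: list[str]) -> str:
    """Join the vehicle name, a short base description and explicit component phrases."""
    parts = [trigger, base.strip()]
    # One phrase per component, so a single detail can be added or removed later.
    parts += [c.strip() for c in components if c.strip()]
    return ", ".join(parts)


caption = build_detailed_caption(
    "Mig-21",
    "a single-seat fighter jet flying in a cloudy sky",
    [
        "cockpit canopy closed",
        "grey military camouflage",
        "afterburner lit at the tail nozzle",
        "landing gear down with main wheels exposed",
        "air data boom on the nose above the intake center body",
        "no missiles under the wings",
    ],
)
print(caption)
```

Whether a phrase list trains better or worse than one long sentence is untested here; treat it as a convenience for editing, not as a proven improvement.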

Process

Training a vehicle LoRA is a bit time-consuming, because I suggest first training a version with only the original BLIP captions plus the vehicle name. In the second round, tag most of the components that your first version missed and retrain. Fighter jets need more epochs and repeats for a good result, which makes them very easy to overfit. (If any big boss is reading, please train a fighter-jet checkpoint.) After this process, training a vehicle should be easier for you. Cheers! A small reminder: use REALISTIC VISION as the training checkpoint.
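One way to decide what to tag in the second round is to keep a checklist of the vehicle's components, generate a few samples from the first version, and note by eye which components each sample rendered correctly. The sketch below is hypothetical: the checklist, the 50% threshold and the function name are assumptions, and the "seen" sets come from your own manual inspection, not from any automatic detector.

```python
def missing_components(expected: set[str], seen_per_image: list[set[str]], threshold: float = 0.5) -> list[str]:
    """Return checklist components rendered correctly in fewer than `threshold` of the samples."""
    n = len(seen_per_image)
    if n == 0:
        # No samples inspected yet: treat everything as still needing a caption.
        return sorted(expected)
    missing = []
    for comp in sorted(expected):
        # Count the samples where this component was drawn correctly.
        hits = sum(comp in seen for seen in seen_per_image)
        if hits / n < threshold:
            missing.append(comp)
    return missing


# Hypothetical Mig-21 checklist and four hand-checked samples from the first version.
checklist = {"air data boom", "cockpit canopy", "horizontal stabilizers", "wing fences"}
samples = [
    {"cockpit canopy"},
    {"cockpit canopy", "wing fences"},
    {"cockpit canopy", "horizontal stabilizers"},
    {"cockpit canopy"},
]
print(missing_components(checklist, samples))
# -> ['air data boom', 'horizontal stabilizers', 'wing fences']
```

The components it returns are the ones to describe explicitly in your second-round captions before retraining.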

BAD EXAMPLE IMAGES FROM MY LoRA WITHOUT DETAILED CAPTIONING

Here are some examples trained the way I described before, without detailed captions.

In my opinion, this is an excellent example.

A Mig-29 trained without detailed captioning: the horizontal stabilizers turned into two trails of smoke at the back.

A Mig-29 with three horizontal stabilizers, trained without detailed captioning.

A Mig-29 trained without detailed captioning turned into an F-16 (my bad, sorry).

An F-4 with three pilots, trained without detailed captioning.

Notes: I don't know why the logic for vehicles, especially jets, differs from other subjects; it's kind of weird. I am also not a professional and have not studied this systematically, so there may be many mistakes in this article; everything here comes from my own experience. I once heard an explanation that detailed captions separate the details in the pictures and map them to the caption, which helps the LoRA adapt to the dataset. As for planes with two jet engines, I am still working on them and have no clue how to solve them yet.

If you think I am wrong, don't hesitate to say so; you are probably right. I am not a professional, and this is just a trick I found interesting during training.