First, a caveat: these are tips I worked out for myself, and they have sped up and simplified my work with the AI. I wrote this post because someone asked for advice, although I still have a lot to learn myself.
Perhaps some beginners will find them useful. If anyone has something to add or wants to argue, write in the comments. There is a chance that a day from now some of these tips will seem wrong to me.
1. Imagine what you want to get, at least the general composition; it will be MUCH easier to write the prompt. An example of the reasoning: girl + car => how does the girl interact with the car (in it, near it, on it?) => girl inside the car (driver or passenger?) => girl driving the car (racing or regular?) => girl driving a car (where, and where to?) => girl driving a car on a highway in the middle of the desert (from what angle are we looking at her?) => "girl driving a car on a highway in the middle of the desert, view from inside the car, from the side."
The main prompt is ready. This is the most basic tip for a beginner. For many people the first step of this reasoning is also the last, because the picture is in their head immediately. Here is what came out of this prompt:
I did not describe the girl or the car, and the model did not quite understand the angle I wanted. Otherwise the result is acceptable. Then you just need to expand the prompt:
"Sexy blonde girl driving a sports car on a desert highway. She is wearing a white top and denim shorts. She is holding the steering wheel with one hand, and her other hand is out the window . Side view from the inside car."=
The prompt is written for an XL model, which understands natural-language text in this form as well as tags.
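As a rough sketch, the chain of questions from tip 1 can be turned into a small helper that assembles the prompt from the answers. The function name `build_prompt` and its parameters are my own invention for illustration; they do not belong to any tool.

```python
def build_prompt(subject, action, setting, details=(), view=""):
    """Join the answers to the composition questions into one text prompt."""
    # Who + what they are doing + where.
    sentences = [f"{subject} {action} {setting}."]
    # Extra descriptions (clothing, pose), one sentence each.
    sentences.extend(f"{d.strip().rstrip('.')}." for d in details)
    # Camera angle goes last.
    if view:
        sentences.append(f"{view.strip().rstrip('.')}.")
    return " ".join(sentences)


prompt = build_prompt(
    subject="Sexy blonde girl",
    action="driving a sports car",
    setting="on a desert highway",
    details=[
        "She is wearing a white top and denim shorts",
        "She is holding the steering wheel with one hand, "
        "and her other hand is out the window",
    ],
    view="Side view from the inside car",
)
print(prompt)
```

Filling in the arguments one by one forces you to answer each question (who, doing what, where, seen from which angle) before you generate anything.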
2. You need the right model; not all models depict everything equally well. As you can see above, the car turned out so-so, unlike the girl. This can be corrected, first of all by extending the prompt yourself, secondly with an embedding, thirdly with a LoRA. Here I extended the prompt:
"Sexy blonde girl driving a sports car on a desert highway. She is wearing a white top and denim shorts. She is holding the steering wheel with one hand, and her other hand is out the window, detailed car interior. Side view, from the inside car. extremely high quality RAW photograph, detailed background, intricate, Exquisite details and textures, highly detailed, ultra detailed photograph, warm lighting, artstation, 4k, sharp focus, high resolution, detailed skin, detailed eyes, 8k uhd, dslr, high quality, film grain, Fujifilm XT3"=
I added "detailed car interior" and one of my templates for realistic images. In fact, not all of these tags are needed here, but I am too lazy to pick only the right ones. Changing the prompt alone was enough to fix the car's interior.
But all of this is experimentation in any case, and a good result may take a dozen generations. For precise control, it is better to use ControlNet tools.
So, before you torture a model trying to get a turtle riding a dolphin, check whether it can draw each of them at all: look at the model author's description, or test it with the X/Y/Z plot script.
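If you reuse a tag template like this often, a tiny helper can append it and drop accidental duplicates. This is only a sketch: `with_tags` is a hypothetical name, and the short tag list is just a subset of the tags from the prompt above.

```python
def with_tags(base_prompt, extra_details=(), quality_tags=()):
    """Append detail and quality tags to a prompt, skipping repeated tags."""
    parts = [base_prompt.strip().rstrip(".,")]
    parts.extend(extra_details)
    parts.extend(quality_tags)
    seen, unique = set(), []
    for part in parts:
        key = part.strip().lower()  # compare tags case-insensitively
        if key and key not in seen:
            seen.add(key)
            unique.append(part.strip())
    return ", ".join(unique)


base = "Sexy blonde girl driving a sports car on a desert highway."
full = with_tags(
    base,
    extra_details=["detailed car interior"],
    quality_tags=["sharp focus", "film grain", "Sharp focus"],
)
print(full)
```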
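One way to organize such a check, sketched in plain Python: first test each subject alone, then the combined scene, for every model you are considering. The model names below are placeholders, and `capability_plan` is a hypothetical helper that only builds the list of runs; it does not generate images.

```python
def capability_plan(models, subjects, combined_prompt):
    """List test runs: each subject alone, then the full scene, per model."""
    prompts = list(subjects) + [combined_prompt]
    return [
        {"model": model, "prompt": prompt}
        for model in models
        for prompt in prompts
    ]


plan = capability_plan(
    models=["model_a", "model_b"],  # placeholder names
    subjects=["turtle", "dolphin"],
    combined_prompt="turtle riding a dolphin",
)
for run in plan:
    print(run["model"], "->", run["prompt"])
```

If a model fails on "turtle" alone, there is little point in fighting it over the combined prompt.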
3. The quality of the image itself depends on the description and the settings. Compare the images above again: after adding various high-quality tags to the prompt, you can see how the result changed. The negative prompt can also change the image greatly. I recommend doing the first generations with an empty negative prompt if you do not know exactly how the tags you put there affect the result.
More steps often improve quality, and the images at 20 and 30 steps are sometimes completely different. BUT "the more steps, the better" does not always hold; it depends on the model and the sampler used. You can look for charts comparing samplers across step counts. Some samplers, on the contrary, give better results with a small number of steps. With CFG it again comes down to the model: try higher and lower values. There is also the DynamicThresholding extension; in short, it can imitate a high CFG, thereby increasing detail (but changing the picture).
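To compare steps, samplers and CFG systematically instead of by feel, you can enumerate every combination and generate each with the same seed, so that only the settings differ. The sketch below only builds the list of settings; the sampler names and values are examples, the fixed seed is my own suggestion, and `settings_grid` is a hypothetical helper.

```python
from itertools import product


def settings_grid(samplers, steps, cfg_scales, seed=12345):
    """Every sampler/steps/CFG combination, all sharing one fixed seed."""
    return [
        {"sampler": sampler, "steps": n, "cfg_scale": cfg, "seed": seed}
        for sampler, n, cfg in product(samplers, steps, cfg_scales)
    ]


grid = settings_grid(
    samplers=["Euler a", "DPM++ 2M Karras"],  # example sampler names
    steps=[20, 30],
    cfg_scales=[5.0, 7.0],
)
for row in grid:
    print(row)
```

This is roughly the comparison an X/Y/Z plot lays out as an image grid; the list form is handy if you drive generation from your own script.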
I will also mention embeddings and LoRAs, but with them you will also have to run tests first to find the right settings, especially when you use several at the same time. I usually start with the maximum weight for a LoRA and then lower it.
Finally, choose the resolution correctly. Consider when the image should be taller than it is wide and when wider than tall; this can affect the overall look of the image.
In general, for me it all comes down to using 3-4 models whose capabilities I already roughly know, along with what I need to tell each of them to get it.
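The "start high, then lower" approach can be sketched as a list of prompt variants with decreasing LoRA weight. I am assuming the `<lora:name:weight>` prompt syntax used by the AUTOMATIC1111 web UI, a starting weight of 1.0 and a step of 0.2; the LoRA name `my_style` is a placeholder and `lora_weight_sweep` is a hypothetical helper.

```python
def lora_weight_sweep(prompt, lora_name, start=1.0, stop=0.4, step=0.2):
    """Prompt variants with the LoRA weight going from `start` down to `stop`."""
    count = int(round((start - stop) / step)) + 1
    # Compute each weight from the start value to avoid float drift.
    weights = [round(start - i * step, 2) for i in range(count)]
    return [f"{prompt} <lora:{lora_name}:{w}>" for w in weights]


variants = lora_weight_sweep("girl driving a sports car", "my_style")
for variant in variants:
    print(variant)
```

Generate each variant with the same seed and stop lowering the weight once the LoRA's effect is visible without distorting the rest of the image.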
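A small sketch for picking width and height from a desired aspect ratio while keeping the total pixel count roughly constant. The defaults are assumptions on my part: a budget of about 1024x1024 pixels (a common choice for XL models) and sides snapped to multiples of 64. Check what your model's author recommends.

```python
import math


def pick_resolution(aspect_ratio, pixel_budget=1024 * 1024, multiple=64):
    """Return (width, height) for aspect_ratio = width / height."""
    height = math.sqrt(pixel_budget / aspect_ratio)
    width = height * aspect_ratio

    def snap(value):
        # Round to the nearest allowed multiple, never below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


print(pick_resolution(1.0))     # square
print(pick_resolution(16 / 9))  # landscape, e.g. a car on a highway
print(pick_resolution(2 / 3))   # portrait, e.g. a full-height figure
```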
If anyone has more precise advice or data on how the processes work, please share in the comments.