Before diving in, I want to make clear that in this field everyone has their own recipe and workflow for building datasets and training LoRAs. There is no definitive guide: experimentation is how you find your own path. With this simple guide you will end up with a complete dataset ready to train a LoRA for your favourite character. Personally, I focus on photorealistic characters, so the tutorial is oriented in that direction, but the workflow is perfectly adaptable to other styles.
Phase 1 – Choosing the Reference Image
Everything starts with a reference image. Everyone has their own method: some use commercial models, some run local models with prompts, some use dedicated tools. The important thing is to get an image of a bust or a face; it does not need to be photorealistic. In the example below I used an image generated with Artbreeder.

The image can be square or rectangular; it does not matter. This will not be the starting image for the dataset; it is purely a visual reference.
Phase 2 – Creating the Starting Image
Once you have the reference image, you feed it into a workflow that produces a high-quality photorealistic result. The workflow is based on Flux Klein 9B, a model well suited for image editing, combined with a "consistency" LoRA you can download here:
https://huggingface.co/dx8152/Flux2-Klein-9B-Consistency
From my tests, this LoRA significantly improves coherence with the reference image and reduces much of the artificial aesthetic typical of Klein, such as plastic-looking skin and overly standardised facial features. This issue is less pronounced with Flux 2 Pro, but that is a separate discussion; here we want to do everything locally.
The workflow outputs images at 1024×1536. I find a vertical format ideal: including the upper body in the frame gives the dataset better consistency when training.
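If you retouch the image elsewhere and need to bring it back to this format, a minimal Pillow sketch can crop and resize it. The filenames are placeholders; the centering value biases the crop towards the face:

```python
from PIL import Image, ImageOps

# Crop-and-resize any image to the 1024x1536 portrait format used here.
# centering=(0.5, 0.3) keeps the crop biased towards the top of the frame.
img = Image.open("retouched.png")
fitted = ImageOps.fit(img, (1024, 1536), method=Image.Resampling.LANCZOS, centering=(0.5, 0.3))
fitted.save("starting_image.png")
```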
With this workflow you get a result like the one below:

As you can see, the result maintains a good likeness to the reference. Without the consistency LoRA the outputs tend to be far too generic.
Phase 3 – Photo Retouching
If the result satisfies you, move on to the next phase. Otherwise, here are three approaches I often use:
Light retouching in Photoshop: Use Camera Raw and masks to adjust tones for individual areas (hair, eyes, mouth, and skin tone).
Liquify filter: This lets you adjust the size and spacing of facial features. I recommend keeping changes subtle; if you push things too far, the model will tend to standardise the exaggerations during training. Think of it as a small nudge rather than a transformation.
ZIT denoising: If you are not happy with the skin quality, you can run the image through ZIT with a very low denoise value (0.1). This avoids altering the character while smoothing out some of the artefacts Klein can produce. Results vary: sometimes it helps, sometimes it does not. Experiment! A scripted version of this kind of pass is sketched right after this list.
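If you prefer scripting a low-denoise pass over running it in a ComfyUI node, here is a minimal sketch with Hugging Face diffusers. The model id is a placeholder for whatever img2img-capable checkpoint you run locally; I am not assuming ZIT itself loads through diffusers:

```python
import torch
from PIL import Image
from diffusers import AutoPipelineForImage2Image

# Placeholder checkpoint: substitute the img2img model you actually run locally.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "path/to/local-img2img-model", torch_dtype=torch.float16
).to("cuda")

image = Image.open("starting_image.png").convert("RGB")
result = pipe(
    prompt="photo of a woman, natural skin texture",
    image=image,
    strength=0.1,  # very low denoise: smooth artefacts without changing the face
).images[0]
result.save("starting_image_denoised.png")
```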

Phase 4 – Dataset Creation
Now that you have a solid starting image, it is time to build the dataset. The attached workflow can generate around 45 images of the subject in various poses and framings.
Inside the workflow there is a ComfyUI node called CR Prompt List (part of the Comfyroll suite). The prompts were selected after extensive testing and, in my opinion, offer a good balance between close-ups, half-body shots, and full-figure images.
At the top of the node you can add a prefix phrase that will be prepended to every prompt, for example "a tall woman with large breasts". This encourages the model to generate more consistent body proportions. As always, feel free to experiment; you can also leave it blank.
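For illustration, this is all the prefix mechanism amounts to. The prompt strings below are made-up stand-ins; the real list lives inside the node:

```python
prefix = "a tall woman"  # leave empty to disable the prefix

# Illustrative stand-ins for the node's actual prompt list.
prompts = [
    "close-up portrait, neutral expression, studio light",
    "half-body shot, arms crossed, looking at the camera",
    "full-figure shot, standing, plain background",
]

# The node simply prepends the prefix to every entry.
final_prompts = [f"{prefix}, {p}" if prefix else p for p in prompts]
```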
Out of 45 images, some will be duplicates and others will show obvious deformities. Those should be discarded. I prefer a bulk-generation approach: produce a large batch, then hand-pick the best results. Some images can also be flipped and reused (e.g., left and right profile).
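If you want to script the flipping step, a small Pillow sketch does it. The folder name and filename pattern are assumptions about your layout:

```python
from pathlib import Path
from PIL import Image, ImageOps

# Assumed layout: curated images live in dataset/; adjust the glob to your naming.
for path in Path("dataset").glob("*profile*.png"):
    mirrored = ImageOps.mirror(Image.open(path))  # horizontal flip
    mirrored.save(path.with_name(f"{path.stem}_flipped{path.suffix}"))
```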

Phase 5 – Captioning
To generate captions for the images I use TagGUI, a practical and powerful tool for creating easily editable captions. It uses JoyCaption as its engine and the results are very good.
Once the captions are generated, I edit them manually, removing anything superfluous. I generally keep the description of the pose, clothing, and background β everything else I want the model to learn on its own.
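When there are many captions to prune, a short script can strip recurring phrases in bulk. A sketch, assuming one .txt caption per image (the usual layout for LoRA training sets); the phrase list is purely illustrative:

```python
from pathlib import Path

# Illustrative phrases describing traits the model should learn on its own.
REMOVE = ["blue eyes", "long brown hair", "fair skin"]

for txt in Path("dataset").glob("*.txt"):
    caption = txt.read_text(encoding="utf-8")
    for phrase in REMOVE:
        caption = caption.replace(phrase, "")
    # Tidy the double spaces and stray commas left behind by the removals.
    caption = " ".join(caption.split()).replace(" ,", ",").replace(",,", ",")
    txt.write_text(caption, encoding="utf-8")
```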
For example, the caption for the photo below might read:

"Close-up photograph of a woman with a neutral expression. She is wearing a white t-shirt and looking to the right. Simple dark grey background."
That is all! I hope you find this guide useful; feel free to leave feedback.
Attached you will find both workflows: one for creating the starting image and one for generating the full dataset.
