(I planned for her to have 7 fingers, do you believe me? 😅)
Hey everyone, I’ve been curious lately - what does your workflow look like when you create AI images? I feel like a lot of us probably follow pretty similar steps, even if the tools or styles differ. I actually use two totally different processes depending on whether I’m doing anime or realistic stuff, and I’ve seen some really clever tricks out there that I’d love to try. So, genuine question: how do you do it?
Use this TL;DR as a checklist to see if you’re already familiar with all the approaches. If you want to dive deeper into any of them, check the full comments.
TL;DR of the entire comment section (as of Dec 6 2025):
1. Prompt-Centric Generation (Focus on Text Input and Iteration)
Start with basic or vague ideas, build detailed prompts gradually by adding elements like eyes, hair, clothes, scenes, or actions; test with low steps (e.g., 12-14) for speed, then increase for quality.
Use trial-and-error: Generate single images or small batches (e.g., 1-4), tweak prompts based on results, repeat until satisfied; common for both anime and realistic styles.
Employ precise prompt engineering: Spend time crafting intricate structures, including weights, indirect positioning tokens, or editing (e.g., switching tokens mid-generation for better object placement or focus).
Leverage AI assistance: Use LLMs (e.g., for expanding prompts into natural language) or chat tools to brainstorm action/posing ideas, generate sets of prompts, then refine them manually.
Incorporate tags and wildcards: Look up booru-style keywords, use quality/scene/character/clothing order, or wildcards for variety; run overnight for large batches (e.g., 25-50) and pick the best (see the wildcard sketch right after this TL;DR).
Low-effort variants: Type a prompt, generate 20-50 times at once, select one; or aim for one-shot with pre-selected LoRAs, adjusting if conflicting.
2. Multi-Stage Workflows (Layered Generation and Refinement)
Basic sequence: Txt2img for drafts (e.g., batch 2-25, 40 steps), select best, then hi-res fix (e.g., 24-35 steps, denoising 0.55-0.8) with sampler changes (e.g., Euler A to DDIM for creativity).
Advanced sampling: Use 2-3 KSamplers in sequence (e.g., advanced for controlled start/end steps like 0-20 then 20-35) for more precision over single sampler; or multi-KSampler at mid-denoise (e.g., 0.7) to correct initial outputs.
Model switching mid-process: Generate initial with one model/checkpoint for concepts, refine with another for details; or swap CLIP encoders while using generation layers from a different model.
Upscaling and detailing: Native high-resolution generation (e.g., 1024+ sides, experiment with ratios like 1080x1920), followed by iterative upscalers (e.g., ultimate or by factor of 2-3); include stabilizer add-ons for high-res stability.
Face and detail enhancement: Apply face detailer/fixer (multiple workflows, e.g., detection vs. inpainting for multi-faces); use separate models for faces; chain 2-3 detailers in a row for better results; optional unsampler for reversing generations.
Automation loops: Randomly select images from folders, auto-generate descriptions/prompts via vision models, create variants with latent fidelity (zero/medium/strong); mix conditioning tensors/latents from pairs for variety.
3. Control and Guidance Techniques (Beyond Pure Text)
Pre-generation aids: Use pose tools (e.g., for exact poses) combined with ControlNet (pose/depth maps) to guide composition instead of random results.
Drawing-based input: Sketch roughly (even poorly) in tools like digital programs or basic editors, then refine via generation tools; or draw crude elements for complex scenes and inpaint each.
Image-guided starts: Replace random noise with patterned images (underpainting) for unique compositions; or use img2img on existing gens (e.g., from one style/tool) to adapt to new styles.
Regional/masked control: Apply masked prompting for specific angles/poses; or inpaint parts to fix poses, add objects (e.g., background items without interaction), or enhance capabilities.
4. Tool-Specific and Hybrid Methods (Local/Online Platforms)
Online generators: Rely on site-based tools for txt2img/hi-res/img2img, with fast/balanced presets; generate drafts cheaply, upscale; avoid high-cost models by testing prompts first.
Local setups: Use interfaces like ComfyUI/Auto1111/Forge/SwarmUI/Draw Things; workflows with 150-300 nodes for custom sequences (e.g., advanced KSamplers + iterative upscaler); run at night for time-intensive batches.
Multi-tool chaining: Generate in one tool for initial results, switch to another for variations/quality (e.g., due to realism differences); or use mobile apps for closed-source models to save resources.
Post-processing integrations: Edit in external software (e.g., for inpainting, content-aware fill, text); add metadata editors for compatibility; use cloud services for age detection/filtering to avoid moderation issues.
5. Specialized Refinements and Fixes (Post-Generation Polish)
Inpainting/editing: Fix flaws (e.g., hands, faces) via dedicated workflows; regenerate parts for freedom (e.g., adding hard-to-prompt elements like specific object holds); use add-ons for eyes or body parts.
Final enhancers: Apply tools for polishing imperfections (e.g., at 1024+ resolution); or enhancers that work best on lower-quality inputs to boost details.
Merging for variety: Combine checkpoints/LoRAs to reduce biases (e.g., avoid default faces); use strong negatives for unwanted features or multiple descriptions to create variation around desired traits.
6. Video Generation Approaches
Image-to-video: Upload a static image, add short action prompts (auto-generated), produce 4 variants with 8 steps plus interpolation/upscaling; refine prompts/LoRAs over multiple sessions.
If an image isn't as intended, convert it directly to video to repurpose it.
7. Philosophical/Experimental Variants
Pure luck/selection: Emphasize choosing from many options over creation; generate variations per seed for fun/exploration.
Full automation: Set up systems to run indefinitely, converging on model tendencies; question uniqueness vs. others' outputs.
Directorial mindset: View process as directing (prompt manipulation) rather than artistry; anticipate future tech for thought-to-image.
Haphazard fun: Low CFG/fewer prompts for surprises; incorporate rituals/humor for mindset (e.g., clearing mind before crafting).
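Since the wildcard item in section 1 comes up again below, here is a minimal sketch of how that kind of overnight batching can look in plain Python. It assumes the common `__name__` wildcard-file convention (one option per line in files like wildcards/hair.txt); the folder layout and the generate() call are placeholders for whatever backend you actually use.

```python
import random
import re
from pathlib import Path

WILDCARD_DIR = Path("wildcards")  # e.g. hair.txt, outfit.txt, scene.txt - one option per line


def expand_wildcards(template: str) -> str:
    """Replace every __name__ token with a random line from wildcards/name.txt."""
    def pick(match: re.Match) -> str:
        options = (WILDCARD_DIR / f"{match.group(1)}.txt").read_text().splitlines()
        return random.choice([line for line in options if line.strip()])
    return re.sub(r"__(\w+)__", pick, template)


template = "masterpiece, best quality, 1girl, __hair__, __outfit__, __scene__"

# Overnight batch: expand the template N times and hand each prompt to the backend
# of your choice (a ComfyUI queue, diffusers, a web UI API, ...), then pick the best.
for i in range(50):
    prompt = expand_wildcards(template)
    print(f"[{i:02d}] {prompt}")
    # generate(prompt)  # placeholder for the actual generation call
```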
My Approaches
I’m mainly here because I want to learn from all of you, but I figured it’s only fair to share my own workflow first, so you’ve got something in return.
But feel free to skip straight to the comments and just tell me yours!
Lately, I’ve been thinking about this a lot: I don’t really feel like the artist here. I’m more like a director (or maybe a very picky manager) who explains exactly what I want to the model, keeps iterating until it gets close, and then simply chooses the best result from what it gives me.
Anyway, here’s how I generate (my account is all character-focused).
Anime-style characters (all on Civitai)
1. Rough drafting. I use the Civitai generator with batch 2 and only 12 steps, for speed. I start simple (eyes, hair) and slowly layer in clothes, accessories, etc. Usually 20-30 generations until I like the result.
2. Scene + action brainstorming. I start with batch 2 while I’m still changing the prompt a lot, then bump it to 4 or 8 once the idea is almost ready. At the end, it’s mostly just generating images with different seeds and picking favorites. Another 10-30 generations, depending on how picky I’m feeling.
3. Final polish. I send the winner through hi-res fix/ADetailer with 35 steps. If I’m lucky, I get a good result on the first try, but sometimes it takes up to 10 generations (bad hands, mostly). For anyone doing this locally, there’s a rough hi-res-fix sketch right after this list.
(I post 4 images per character, so steps 2-3 get repeated for each pose.)
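For local users: hi-res fix is essentially "generate small, upscale, then img2img over the upscale at moderate denoise". Here is a minimal sketch with Hugging Face diffusers; the SDXL checkpoint, the 832x1216 draft size, and the 0.55 denoise are illustrative assumptions, not the site’s exact settings.

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

prompt = "1girl, silver hair, detailed eyes, winter coat, snowy street"

# Pass 1: fast low-resolution draft (roughly what the 12-step drafting stage does).
txt2img = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
draft = txt2img(prompt, width=832, height=1216, num_inference_steps=12).images[0]

# Pass 2: "hi-res fix" - upscale the draft, then run img2img over it at moderate
# denoise so the model redraws details without changing the composition.
img2img = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
upscaled = draft.resize((int(draft.width * 1.5), int(draft.height * 1.5)))
final = img2img(prompt, image=upscaled, strength=0.55, num_inference_steps=35).images[0]
final.save("final.png")
```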
Realistic style (local, ILustREAL + Pony Realism)
I use 3-4 workflows across the following steps:
1. I copy the prompt from the anime version and fine-tune it only slightly - the ILustREAL model I use follows the prompt more strictly, so I had to delete (close-up:1.2), for example.
2. A txt2img workflow with batch 25 generates the first drafts. I set the step count to 40, simply because I generate locally at night, so time is not a problem.
3. I choose the best image of the 25. Sometimes I change the prompt and return to step 2, but not often, thanks to the optional step 4.
4. (Optional) Inpainting workflow - I regenerate parts of the image. That actually increases the capabilities of the model a lot and gives you so much freedom: you can fix poses or add objects. Two good examples: it’s hard to prompt that the character should hold a Christmas bauble with their mouth, or that the aquarium should be on the table in the background (spoilers from my posts in two weeks). There’s a minimal inpainting sketch right after this list.
5. Upscaling workflow. I generate 4 upscales of each image and then choose the best - usually just the one with the best hands.
6. Face detailer workflow (also called face fix). I have two versions: a simple one with face detection, and one with inpainting for when there is more than one face in the image. I use a different model for the face fix - Pony Realism - simply because I like the faces it generates.
7. AWS Rekognition for age detection. This is actually part of the face detailer step. The idea is to filter out any image where the detected age is under 18. Since I introduced this step, Civitai’s automatic scanners have never blocked my images (aside from some bugs with the prompt scanner); unfortunately, it doesn’t help with moderators. A boto3 sketch of this check also follows after the list.
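Two of these steps translate nicely into small code sketches. First, the inpainting idea from step 4: regenerate only a masked region with its own prompt. This is a generic diffusers sketch, not my actual workflow; the model name and the mask file are placeholders.

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
).to("cuda")

image = load_image("draft.png")      # the selected txt2img result
mask = load_image("mask_mouth.png")  # white where the region should be redrawn

# Prompt only for the masked region - much easier than prompting the whole scene.
result = pipe(
    prompt="holding a red Christmas bauble in her mouth",
    image=image,
    mask_image=mask,
    strength=0.9,
    num_inference_steps=40,
).images[0]
result.save("inpainted.png")
```

And the age filter from step 7, using boto3: Rekognition’s DetectFaces returns an estimated AgeRange per face. The threshold and the "reject if any face might be too young" policy below are my interpretation of the idea, not a drop-in copy of my node.

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")


def looks_adult(image_path: str, min_age: int = 18) -> bool:
    """Return False if any detected face could be younger than min_age."""
    with open(image_path, "rb") as f:
        response = rekognition.detect_faces(
            Image={"Bytes": f.read()},
            Attributes=["ALL"],  # required to get AgeRange in the response
        )
    for face in response["FaceDetails"]:
        age_range = face["AgeRange"]    # e.g. {"Low": 22, "High": 30}
        if age_range["Low"] < min_age:  # be conservative: check the low bound
            return False
    return True


if looks_adult("final.png"):
    print("keep")
else:
    print("discard before upload")
```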
Videos (both styles)
1. I upload the image to the workflow and come up with a 1-2 sentence action prompt. I have “Automatic prompt generation” turned ON at this step. I generate 4 videos per image. Again, since I run them at night, I don’t care about the time, so I use 8 steps in the sampler and have interpolation and upscaling enabled. (A rough open-source sketch of this step follows below.)
2. I’m rarely satisfied with the first generation, so I copy the autogenerated prompt from step 1 and edit it to what I feel will work best. Sometimes I add LoRAs. I repeat this for 2-3 more nights and then give up, even if I’m not 100% satisfied with the result.
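For what it’s worth, the open-source equivalent of step 1 looks roughly like the sketch below, using diffusers’ Stable Video Diffusion pipeline. One big assumption: SVD does not take a text action prompt the way the hosted tool I use does, so this only illustrates the image-to-video part with a low step count and a best-of-4 pick; interpolation and upscaling would be separate steps.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

image = load_image("final.png").resize((1024, 576))

# Generate a few candidates per image, then pick the best - same as best-of-4.
for i in range(4):
    frames = pipe(
        image,
        num_inference_steps=8,   # low step count: quality traded for speed
        decode_chunk_size=4,     # lower values reduce VRAM use during decoding
        generator=torch.manual_seed(i),
    ).frames[0]
    export_to_video(frames, f"video_{i}.mp4", fps=7)
```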
So, as you can see, my two processes are quite different, but in both of them it’s more about choosing the best option than about creating, except maybe for the first step of the anime workflow. So, I want to know: is it the same for you? What does your process look like?
Useful Articles
I also want to share some articles I found - other ideas for how the process can look:
https://civitai.com/articles/19231/my-process-of-creating-ai-generated-images
The most interesting idea so far. The author uses Posemy.art to create the exact pose they want, then uses ControlNet with the pose and a depth map to generate the scene. I think it's the closest approach to actually generating what you want instead of just generating random stuff and choosing the best.
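In code terms the idea looks roughly like the sketch below (SD 1.5 plus the OpenPose ControlNet via diffusers; the article may use different models, and the pose image is whatever you export from Posemy.art, possibly run through an OpenPose preprocessor first).

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# OpenPose ControlNet for SD 1.5; any SD 1.5 checkpoint works as the base model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

pose = load_image("posemy_export.png")  # pose skeleton exported from Posemy.art

image = pipe(
    "1girl sitting on a windowsill, reading a book, soft morning light",
    image=pose,              # the pose constrains the composition
    num_inference_steps=30,
).images[0]
image.save("posed.png")
```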
https://civitai.com/articles/22880/underpainting-for-ai-a-creative-approach-to-sdxl-composition
If you remember, the typical txt2img workflow starts from random noise to generate your image. The author proposes using another image with interesting patterns in place of that noise.
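I haven't tried it yet, but a rough approximation with standard tools is to run img2img over the patterned image at a high denoise, so the pattern only seeds the composition and palette. (The article's actual trick injects the image at the noise/latent level, which this sketch does not do.)

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

underpainting = load_image("color_blobs.png")  # rough colour patches or patterns

# High strength: nearly everything is redrawn, but the composition and palette
# still follow the underpainting instead of pure random noise.
image = pipe(
    "1girl in a neon-lit alley, rain, cinematic lighting",
    image=underpainting,
    strength=0.9,
    num_inference_steps=35,
).images[0]
image.save("underpainted.png")
```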
https://civitai.com/articles/23111/chroma-z-image-controlnet-workflow
Not sure I understood the approach correctly, but as I see it, you generate an initial image with the model that knows the concept better, and then use another model for detailing and refinement. I actually did this once when I generated an image with Grok and then used img2img to bring it into my style.
https://civitai.com/articles/22328
Scroll down to the idea of using Qwen Edit for final image polishing and fixing imperfections.
That’s it from me. Now, seriously - drop your process in the comments, even if it’s just “I type prompt, roll 50 times, pick one.”






