Phase 1: Decoupled Generation & Character Prototyping
Guided by an uploaded pose reference (OpenPose), the initial generation pass renders only the character subject. This reduces randomness and keeps background complexity from interfering with character details, so the base subject closely matches your intended appearance, attire, and posture.
Phase 2: Interactive Composition & Spatial Layout
This phase introduces a "Quick Canvas" mechanism that lets you freely move and scale the generated character within the frame. Once the position is finalized, the system automatically extracts the LineArt and Zoe Depth maps of the character at that location. These maps serve as a positional guide for the subsequent background generation, which helps avoid the common problem of character-environment scale mismatch.
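The move-and-scale step can be pictured as computing a pixel rectangle for the character inside the frame. The sketch below is only an illustration of that arithmetic, not the Quick Canvas implementation; `Placement` and `place_character` are hypothetical names, and clamping the character to the canvas is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    x: int       # left edge of the character on the canvas, in pixels
    y: int       # top edge of the character on the canvas, in pixels
    width: int   # scaled character width
    height: int  # scaled character height


def place_character(char_size, canvas_size, center, scale):
    """Return the rectangle a scaled character occupies on the canvas.

    Hypothetical helper: the real Quick Canvas node may behave differently.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    char_w, char_h = char_size
    canvas_w, canvas_h = canvas_size
    width = round(char_w * scale)
    height = round(char_h * scale)
    # Center the scaled character on the requested point.
    x = round(center[0] - width / 2)
    y = round(center[1] - height / 2)
    # Assumption: keep the character inside the frame where it fits.
    x = max(0, min(x, canvas_w - width))
    y = max(0, min(y, canvas_h - height))
    return Placement(x, y, width, height)
```

The resulting rectangle is the region from which position-specific maps (such as line art and depth) would be taken before the background is generated.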
Phase 3: Background Synthesis & Lighting Integration
Backgrounds are generated independently while the character's designated position is preserved. The Qwen Instruct model then analyzes the lighting of the composite. Through the BlendMap node, the workflow blends and color-grades the image so that character edges, shadow depth, and ambient occlusion stay consistent with the background's lighting.
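As a rough intuition for what blending and color grading do at the pixel level, here is a minimal sketch of standard alpha compositing with an optional tint toward an ambient color. It is not the BlendMap node's actual algorithm; `blend_pixel`, `ambient`, and `grade` are hypothetical names introduced for illustration.

```python
def blend_pixel(fg, bg, alpha, ambient=None, grade=0.0):
    """Composite one foreground RGB pixel over a background pixel.

    fg, bg, ambient: RGB tuples of floats in [0, 1].
    alpha: foreground coverage in [0, 1] (soft values at character edges).
    grade: how strongly to pull the foreground toward the ambient color.
    Illustrative only; the workflow's node may use a different method.
    """
    if ambient is not None:
        # Simple color grade: linear mix of the character color and ambient light.
        fg = tuple((1 - grade) * f + grade * a for f, a in zip(fg, ambient))
    # Standard "over" compositing of the graded foreground onto the background.
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(fg, bg))
```

With `alpha` below 1 along the character's outline, edge pixels take on some of the background color, which is one reason soft masks tend to hide seams better than hard cutouts.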
Phase 4: Qwen-VL Intelligent Self-Correction & Repair Loop
This phase is the core closed loop of the workflow. The system invokes Qwen-VL (a vision-language model) to scan the image for anatomical errors or logical inconsistencies, such as hand artifacts or unnatural limb postures. Qwen-VL returns specific repair instructions, which are fed back into the inpainting module for targeted structural correction.
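The inspect-then-repair cycle described above can be sketched as a bounded loop. This is a schematic under stated assumptions: `inspect` stands in for the vision-language check and `inpaint` for the inpainting module, both passed in as plain callables; neither is a real node API, and the `max_rounds` cap is an assumption added to guarantee termination.

```python
from typing import Callable, List, Tuple, TypeVar

Image = TypeVar("Image")


def repair_loop(
    image: Image,
    inspect: Callable[[Image], List[str]],
    inpaint: Callable[[Image, str], Image],
    max_rounds: int = 3,
) -> Tuple[Image, List[str]]:
    """Repeat inspect -> inpaint until no issues remain or rounds run out.

    inspect: hypothetical stand-in for the VLM check; returns repair instructions.
    inpaint: hypothetical stand-in for the inpainting step; applies one instruction.
    """
    applied: List[str] = []
    for _ in range(max_rounds):
        instructions = inspect(image)
        if not instructions:
            break  # The checker found nothing left to fix.
        for instruction in instructions:
            image = inpaint(image, instruction)
            applied.append(instruction)
    return image, applied
```

Capping the number of rounds is a design choice for the sketch: a checker that keeps reporting the same issue would otherwise loop forever.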
Phase 5: High-Res Resampling & Final Optimization
Following the logical self-check, the image enters the Ultimate SD Upscale stage. Using Tiled Diffusion and high-definition upscale models, this phase preserves the established structure and lighting while enhancing the textures of skin, hair, and environmental details, producing a high-resolution final image.
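Tiled processing splits a large image into overlapping tiles so that each one fits in memory. The following sketch shows one generic way to compute such a tile grid; it is not the Ultimate SD Upscale node's own tiling logic, and `tile_boxes`, `tile`, and `overlap` are hypothetical names and defaults.

```python
import math


def tile_boxes(width, height, tile=1024, overlap=64):
    """Return (left, top, right, bottom) boxes that cover the whole image.

    Neighboring tiles share at least `overlap` pixels so seams can be blended.
    Generic illustration; not taken from any specific node.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap

    def starts(length):
        if length <= tile:
            return [0]
        count = math.ceil((length - tile) / stride) + 1
        # Shift the last tile back so it ends exactly at the image edge.
        return [min(i * stride, length - tile) for i in range(count)]

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

Smaller tiles lower peak memory per step at the cost of more tiles to process and more seams to blend.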
Required Models & Resources
To ensure this workflow runs correctly, please download and place the following models in their respective folders:
1. Base Model & VAE
2. ControlNet Models (SDXL/Illustrious)
3. Multi-Modal & VLM (Qwen Series)
⚠️ Hardware & Setup Note
VRAM Optimization: The Qwen3-VL Loader is configured to download necessary weights automatically.
For 8GB VRAM Users: If you encounter Out-of-Memory (OOM) errors, either replace the loader or switch to a lower-bit quantization of the GGUF models to reduce memory use.
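To see why a lower-bit quantization helps, a back-of-the-envelope estimate of weight memory is parameters times bits per weight, divided by 8 to get bytes. The sketch below uses a hypothetical 8-billion-parameter model purely as an example; real GGUF formats use slightly more than their nominal bit width, and activations and caches need additional memory on top.

```python
def weight_memory_gib(param_count, bits_per_weight):
    """Approximate memory for model weights alone, in GiB.

    Ignores quantization overhead, activations, and caches, so actual
    usage will be higher than this figure.
    """
    return param_count * bits_per_weight / 8 / 1024**3


# Hypothetical 8-billion-parameter model at three nominal bit widths.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gib(8_000_000_000, bits):.1f} GiB")
```

Under these assumptions, only the lower bit widths leave the weights comfortably within an 8GB budget, which is consistent with the advice above.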