This workflow was tested on my potato notebook (i5-9300H, GTX 1050, 3 GB VRAM, 16 GB RAM) and should work on any PC with a GPU (preferably an NVIDIA one) and 16 GB of RAM. Since I use modularized workflows, this one is also built as a modular workflow. The workflow itself is fairly straightforward to understand.
Model and CLIP Set-up
The model is a Q5_K_S GGUF-quantized finetune merged with six LoRAs. The Dual CLIP Loader is set to CPU, and the rest is just the standard setup.
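If you're curious what "merged with LoRAs" means under the hood, each LoRA stores two low-rank factors that get baked into the base weights, so no extra loader nodes are needed at inference time. Here is a minimal NumPy sketch of that idea; the shapes, rank, and scale value are illustrative assumptions, not the actual values from this merge.

```python
import numpy as np

# Minimal sketch of merging one LoRA into one weight matrix.
# A LoRA stores a down-projection A (rank x d_in) and an up-projection
# B (d_out x rank); merging adds scale * (B @ A) to the base weight.
rank, d_in, d_out = 16, 768, 768          # assumed dimensions
W = np.random.randn(d_out, d_in)          # stand-in for a base model weight
A = np.random.randn(rank, d_in) * 0.01    # LoRA down-projection
B = np.random.randn(d_out, rank) * 0.01   # LoRA up-projection
scale = 0.8                               # assumed LoRA strength

W_merged = W + scale * (B @ A)            # merged weight, same shape as W
print(W_merged.shape)                     # (768, 768)
```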
Image Input and ControlNet Set-up
In the 'Image Input' Group, I resized the image to the standard SDXL resolution (1024 x 1024) and set it as the control image. In the 'ControlNet' Group, I am using the Union Promax ControlNet model with the 'Depth Anything V2' depth preprocessor. There is a 'SetUnionControlNetType' node for setting the ControlNet type, such as Depth or Canny, but the workflow seems to work just fine without it (at least for Canny and Depth).
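If you want to prepare the control image outside ComfyUI, here is a minimal Pillow sketch of the same resize step; the file names are placeholders.

```python
from PIL import Image

# Minimal sketch: scale an input image to the standard SDXL resolution
# before it goes to the depth preprocessor. File names are placeholders.
img = Image.open("input.png").convert("RGB")
img = img.resize((1024, 1024), Image.LANCZOS)  # matches the 1024 x 1024 target
img.save("control_image.png")
```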
You can find the models at the links below (a scripted download sketch follows the list):
Depth Anything V2: https://huggingface.co/Kijai/DepthAnythingV2-safetensors/tree/main and place the model in the 'custom_nodes\comfyui_controlnet_aux\ckpts\depth-anything' folder
Union Promax ControlNet: https://huggingface.co/xinsir/controlnet-union-sdxl-1.0/tree/main and place the model in the 'models\controlnet' folder
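If you prefer to script the downloads, a minimal sketch with huggingface_hub is below. The exact filenames are assumptions based on the repos linked above, so check the repo file listings before running, and adjust the local_dir paths to your ComfyUI install.

```python
from huggingface_hub import hf_hub_download

# Minimal download sketch. Filenames are assumptions -- verify them
# against the repo file listings linked above before running.
hf_hub_download(
    repo_id="Kijai/DepthAnythingV2-safetensors",
    filename="depth_anything_v2_vitl_fp16.safetensors",  # assumed filename
    local_dir=r"custom_nodes\comfyui_controlnet_aux\ckpts\depth-anything",
)
hf_hub_download(
    repo_id="xinsir/controlnet-union-sdxl-1.0",
    filename="diffusion_pytorch_model_promax.safetensors",  # assumed filename
    local_dir=r"models\controlnet",
)
```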
KSampler and VAE Decode Set-up
The sampler is currently set this way because I was testing SD3.5 Medium, but you can use your favorite sampler and scheduler for SDXL. You can also use tiled VAE decode, but I wouldn't recommend it (it takes too much time).
And that should do it! If you want to use the workflow without ControlNet, simply turn the ControlNet Group off in the group bypass node and change the Get nodes from PositiveC and NegativeC to Positive and Negative in the Sampler Group. Cheers!