Recently, I started using ComfyUI instead of Automatic1111 WebUI. And the first problem I encountered was, "How do I perform the Hires Fix like Auto1111 does?" In this article, I want to share the research I conducted to solve this problem.
First, I opened the source code of Automatic1111's WebUI on GitHub. In the processing.py file inside the modules directory, I found the StableDiffusionProcessingTxt2Img class, which contains the sample_hr_pass method. What I found is that it first decodes the samples into an image, then upscales it with the desired model, encodes it back into samples, and only after that performs the img2img pass. This was confirmed when I found the "Two Pass Txt2Img Example" article from the official ComfyUI examples.
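The order of operations above can be sketched as a tiny pipeline. The function names here are hypothetical stand-ins, not the actual WebUI API; each stub just records that its stage ran, so you can see the decode → upscale → encode → img2img sequence:

```python
# Sketch of Auto1111's hires-fix order of operations (sample_hr_pass).
# All function names are illustrative stubs, not real WebUI calls.

steps = []

def vae_decode(latents):          # latent samples -> pixel image
    steps.append("decode")
    return "image"

def upscale_with_model(image):    # e.g. an ESRGAN-style upscaler
    steps.append("upscale")
    return "big_image"

def vae_encode(image):            # pixel image -> latent samples
    steps.append("encode")
    return "latents"

def img2img_pass(latents):        # second sampler pass, low denoise
    steps.append("img2img")
    return "final_latents"

def hires_fix(latents):
    image = vae_decode(latents)
    image = upscale_with_model(image)
    latents = vae_encode(image)
    return img2img_pass(latents)

hires_fix("latents")
print(steps)  # -> ['decode', 'upscale', 'encode', 'img2img']
```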
Also, I found a very interesting YouTube video by poisenbery about an alternative method of upscaling that involves the usage of ControlNet.
From the little research I did, I found three different methods of upscaling: Latent, Non-latent, and ControlNet-assisted Latent.
In this article, I am going to use the ComfyUI workflow I made. You can find the .json file in the attachments.
Here is the txt2img part:
As a result, I get this non-upscaled 512x1024 image:
As you can see, it is already a good image (thanks to the model). But the picture still looks a bit blurry, and the eyes and hands don't look right.
Latent upscale method
The latent upscaling consists of two simple steps: upscaling the samples in latent space and performing the second sampler pass. The main issue with this method is denoising strength. Low denoising strength can result in artifacts, and high strength results in unnecessary details or a drastic change in the image.
Workflow for this method is very simple. It consists of only two nodes (excluding VAE decode). Here it is:
As shown in the image, the samples are upscaled in latent space and then fed into the sampler. The result of this method looks like this:
As you can see, the resolution and quality of an image improve, but the image also changes due to the high value of denoising strength.
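The latent step above is just interpolation applied directly to the latent "samples" tensor before the second sampler pass. Here is a toy pure-Python sketch using nearest-neighbor on a 2D grid (ComfyUI's latent upscale node works on real torch tensors and offers other filters, but the idea is the same: resizing happens in latent space, not pixel space):

```python
# Toy nearest-neighbor upscale of a latent grid, standing in for
# ComfyUI's latent upscale node. Pure Python instead of torch.

def upscale_latent_nearest(latent, factor):
    h, w = len(latent), len(latent[0])
    return [
        [latent[y // factor][x // factor] for x in range(w * factor)]
        for y in range(h * factor)
    ]

latent = [[0.1, 0.2],
          [0.3, 0.4]]  # stand-in for one latent channel
big = upscale_latent_nearest(latent, 2)
print(big[0])  # -> [0.1, 0.1, 0.2, 0.2]
```

After this resize, the second KSampler has to "repair" the interpolated latents, which is why this method needs a fairly high denoising strength.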
Non-latent upscale method
Here it is, the method I was searching for. It consists of a few steps: decode the samples into an image, upscale the image using an upscaling model, encode the image back into the latent space, and perform a second sampler pass. This lets you upscale the image while preserving the style of the model, because it works with much lower values of denoising strength.
The workflow of this method is not very simple, but it's also not very complicated. You can see it on this screenshot:
As I mentioned before, this workflow decodes the result of the text-to-image part, upscales it with the chosen model, encodes it back into the latent space, and then feeds it into the sampler. The result of this process looks like this:
As you can see, the original details were preserved. Thanks to the low denoising strength, the model's style was retained and no unnecessary details were added.
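The non-latent chain can also be written down in ComfyUI's API-style JSON format. The sketch below is illustrative: the node IDs, connections, and most input values are assumptions (including the upscale model filename), though the class_type names match the stock ComfyUI nodes as I understand them. The chain is VAEDecode → ImageUpscaleWithModel → VAEEncode → KSampler:

```python
import json

# Trimmed sketch of the non-latent upscale chain as an API-style graph.
# Node IDs and inputs are illustrative; ["8", 0] means "output 0 of node 8".
graph = {
    "8":  {"class_type": "VAEDecode",
           "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
    "10": {"class_type": "UpscaleModelLoader",
           "inputs": {"model_name": "4x-UltraSharp.pth"}},  # assumed file
    "11": {"class_type": "ImageUpscaleWithModel",
           "inputs": {"upscale_model": ["10", 0], "image": ["8", 0]}},
    "12": {"class_type": "VAEEncode",
           "inputs": {"pixels": ["11", 0], "vae": ["4", 2]}},
    "13": {"class_type": "KSampler",
           "inputs": {"latent_image": ["12", 0], "denoise": 0.4,
                      "model": ["4", 0], "positive": ["6", 0],
                      "negative": ["7", 0], "seed": 0, "steps": 20,
                      "cfg": 7.0, "sampler_name": "euler",
                      "scheduler": "normal"}},
}
print(json.dumps(graph, indent=2))
```

Note the low denoise value on the final KSampler; that is what keeps the composition and style intact.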
ControlNet-assisted latent upscale method
This is a pretty unique method of upscaling, for me. Using ControlNet together with the latent upscale method allows us to retain the original look of the image even at high values of denoising strength.
Attention! This method requires custom nodes for ComfyUI. You can download them using this link: GitHub.
The workflow for this method is demonstrated here:
This method decodes the result of the text-to-image part, extracts the line art from the image, and applies ControlNet to the positive prompt. At the same time, it upscales the samples in the latent space and feeds them into the sampler. As a result, we get this upscaled image:
This method preserves the original shapes and style. It also (for me) produces a slightly brighter result.
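The two parallel branches of this method can be sketched like this. All helper names below are hypothetical stand-ins for the corresponding nodes, and the toy values are not real tensors; the point is the data flow: the image branch builds a ControlNet hint while the latent branch is upscaled, and both meet at the second sampler:

```python
# Sketch of the ControlNet-assisted latent upscale. Helper names are
# illustrative stand-ins for the ComfyUI nodes, not a real API.

def lineart(image):                       # line-art preprocessor node
    return ("lineart", image[1])

def apply_controlnet(cond, hint, strength):
    # strengthens the positive conditioning with the line-art hint
    return {"cond": cond, "hint": hint, "strength": strength}

def upscale_latent(latent, factor):       # same idea as plain latent upscale
    return ("latent", latent[1] * factor)

first_latent = ("latent", 64)             # toy stand-ins for the first pass
first_image = ("image", 512)

hint = lineart(first_image)               # image branch
positive = apply_controlnet("positive", hint, 0.8)
big_latent = upscale_latent(first_latent, 2)   # latent branch

# The second sampler can now run at a high denoise value, because the
# ControlNet hint pins the composition to the extracted line art.
second_pass = {"latent": big_latent, "positive": positive, "denoise": 0.7}
print(second_pass["latent"])  # -> ('latent', 128)
```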
In conclusion, when transitioning from Automatic1111 WebUI to ComfyUI, the need for a "Hires Fix" arises to enhance image quality. Through research and experimentation, I explored three distinct methods of upscaling images: Latent, Non-latent, and ControlNet-assisted Latent.
Here is the final comparison between the methods:
Stable Diffusion web UI by Automatic1111 (github.com)
2 Pass Txt2Img (Hires fix) Examples (comfyanonymous.github.io)
If you wish, you can support me on my Ko-fi page.