If you find our articles informative, please follow me to receive updates. It would be even better if you could also follow our ko-fi, where there are many more articles and tutorials that I believe would be very beneficial for you!
I've been using Reference for a few weeks now, and I've noticed that many tutorials teach people how to "COPY" other people's work with it. lllyasviel's own examples seem to do the same.
But is Reference really just a copy tool?
Many people find that Reference doesn't copy an image as well as they imagined. While playing around with this new toy over the past few weeks, I discovered some completely different uses that I'd like to share.
Before that, I want to briefly discuss why you can't copy successfully.
The essence of Reference is inpainting, so it doesn't need a ControlNet model. Unlike normal inpainting, it's more like a "white box": there is a blank canvas on top and an image on the bottom, and the AI extracts references directly from the bottom image to paint on the top canvas.
However, since it's inpainting, prompts are still needed, like "masterpiece" and "best quality". So whether your Reference copy succeeds depends first on how well SD understands the reference image (so reference_adain + attn usually works better, avoiding a blank canvas).
Second, it depends on how accurately your prompts describe the reference image. But if you can already describe the reference image accurately, you might as well use txt2img.
So why use Reference? Because it can reproduce the colors and lighting of the reference image more accurately than prompts can. This leads to the three advanced uses of Reference I want to discuss today.
More Vivid Backgrounds
First, I randomly chose a photo and input it into img2img. Then, I used DeepBooru to extract keywords from that photo.
osaka night by haryarti
The initial keywords generated didn't even mention "night," so I adjusted the prompts by:
Removing some unnecessary keywords
Adding the word "night"
Increasing the weight for "night"
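The three adjustments above can be sketched as a small helper. This is only an illustration: the tag list is hypothetical (the tags DeepBooru actually returned for the photo are not listed here), and the 1.3 weight is an assumed value, written in the WebUI's (tag:weight) attention syntax.

```python
def adjust_tags(tags, remove=(), add=(), weights=None):
    """Build a prompt string from an interrogator tag list.

    tags    -- tags as extracted by an interrogator such as DeepBooru
    remove  -- tags to drop
    add     -- tags to append if they are not already present
    weights -- mapping of tag -> attention weight, rendered as (tag:weight)
    """
    weights = weights or {}
    drop = set(remove)
    kept = [t for t in tags if t not in drop]
    for t in add:
        if t not in kept:
            kept.append(t)
    parts = []
    for t in kept:
        if t in weights:
            parts.append(f"({t}:{weights[t]})")
        else:
            parts.append(t)
    return ", ".join(parts)


# Hypothetical interrogator output, for illustration only.
extracted = ["1girl", "building", "city", "scenery", "street", "watermark"]

prompt = adjust_tags(
    extracted,
    remove=["watermark"],      # drop an unnecessary keyword
    add=["night"],             # add the missing keyword
    weights={"night": 1.3},    # assumed weight; raise or lower to taste
)
print(prompt)
```

Keeping the adjustments in one place makes it easy to try different weights for "night" without retyping the whole prompt.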
The resulting images had backgrounds whose elements closely matched the original photo. I chose one for Hires fix; remember to use ADetailer to fix the face.
Let me turn off ControlNet and compare. You can see that the background generated with Reference is richer and more vivid, while the background generated without it is just the generic receding street view that the "street" keyword produces.
Of course, I don't intend to stop there. I can tweak the prompts further to generate a ninja girl instead, like this:
masterpiece, best_quality, realistic,1girl, cyberpunk, architecture, black_hair, brown_hair, building, city, japanese_clothes, (ninjia, weapon, black ninja_mask:1.3), scenery, night, star_(sky), tokoy_tower, moon,
Negative prompt: , (worst quality, low quality:1.4), [:badhandv4:1.5):27], nsfw:1.5, (armrest:1.5), big head
Steps: 24, Sampler: DPM++ 2M Karras, CFG scale: 6, Seed: 2389734597, Size: 768x512, Model hash: 5c9a81db7a, Model: 1.5_Photo_majicmixSombre_v20, Denoising strength: 0.48, Clip skip: 2, ENSD: 31337, RNG: CPU, Version: v1.2.1, ControlNet 0: "preprocessor: reference_adain, model: None, weight: 0.9, starting/ending: (0, 0.75), resize mode: Crop and Resize, pixel perfect: True, control mode: Balanced, preprocessor params: (512, 0.9, 64)", Hires upscale: 2, Hires upscaler: R-ESRGAN 4x+
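If you want to reuse settings like these, the "Key: value" part of the parameters can be read into a dictionary. The sketch below is my own minimal parser, not the WebUI's code; it assumes values contain no commas unless they are wrapped in double quotes, as the ControlNet entry is. The sample string is a shortened excerpt of the parameters above.

```python
import re

# A key, a colon, then either a double-quoted value or a run of non-comma text.
PARAM_RE = re.compile(r'\s*(\w[\w \-/]+):\s*("(?:\\.|[^\\"])+"|[^,]*)(?:,|$)')


def parse_params(line):
    """Split 'Key: value, Key: "quoted, value"' text into a dict of strings."""
    return {key: value.strip().strip('"') for key, value in PARAM_RE.findall(line)}


sample = (
    'Steps: 24, Sampler: DPM++ 2M Karras, CFG scale: 6, Size: 768x512, '
    'Denoising strength: 0.48, '
    'ControlNet 0: "preprocessor: reference_adain, model: None, weight: 0.9"'
)

params = parse_params(sample)
steps = int(params["Steps"])
denoise = float(params["Denoising strength"])
print(steps, denoise, params["ControlNet 0"])
```

All values come back as strings, so numeric settings such as Steps need an explicit conversion before use.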
Here, again, is a comparison with Reference turned off.
1. Reference is not magic; it's just inpainting, so prompts are still very important.
2. The reference_only mode can easily lead to overexposure, so the other two adaptive algorithms (reference_adain and reference_adain+attn) may work better.
3. Give the AI some freedom by:
Setting ControlNet weight to 0.9
Ending control step at 0.75
Style fidelity at 0.9
For the specifics, you can refer to the parameters in the prompts I used.
I set "Ending control step" so that the constraint ends early and the AI has more freedom in the later steps. I set "Style Fidelity" to control how much of the reference image's look I wanted to retain.
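The settings discussed here can also be written down as a ControlNet unit, which is handy if you drive the WebUI through its API. This is a sketch under assumptions: the key names (module, guidance_end, threshold_a and so on) are modelled on the sd-webui-controlnet API and should be checked against your installed version, and mapping Style Fidelity to threshold_a is my reading of the "preprocessor params: (512, 0.9, 64)" entry in the parameters. The helper name reference_unit is hypothetical.

```python
def reference_unit(module="reference_adain", weight=0.9, end=0.75, style_fidelity=0.9):
    """Build one ControlNet unit as a plain dict.

    Key names are assumptions modelled on the sd-webui-controlnet API;
    verify them against the extension version you have installed.
    """
    if not 0.0 <= end <= 1.0:
        raise ValueError("end must be between 0 and 1")
    return {
        "module": module,                # preprocessor name
        "model": "None",                 # Reference needs no ControlNet model
        "weight": weight,                # ControlNet weight
        "guidance_start": 0.0,           # starting control step
        "guidance_end": end,             # ending control step
        "threshold_a": style_fidelity,   # assumed slot for Style Fidelity
        "pixel_perfect": True,
        "control_mode": "Balanced",
        "resize_mode": "Crop and Resize",
    }


unit = reference_unit()

# Assumed placement inside a txt2img/img2img request body.
payload_fragment = {"alwayson_scripts": {"controlnet": {"args": [unit]}}}
print(payload_fragment)
```

Lowering end releases the constraint earlier, and lowering style_fidelity lets the prompt override more of the reference image's look.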
Lighting and Atmospheric Effects
As an example, I randomly chose two reference images from Nelleke to demonstrate this technique.
I wanted to generate an image of riding through a forest in the morning, with light fog and shafts of light filtering down through the trees. Describing all that intricately through prompts alone would be difficult and unlikely to produce stable outputs. However, with Reference it is simple.
masterpiece, best_quality, realistic,1girl, armor, long_floating_hair, riding_on_horse, forrest with fog, sunrise, soft_light, shadow,
Negative prompt: , (worst quality, low quality:1.4), [:badhandv4:1.5):27], nsfw:1.5, (armrest:1.5)
I chose one of the images for Hires fix, then used ADetailer to refine the face (most ControlNet models distort the image to some degree, so refining faces is often necessary).
masterpiece, best_quality, realistic,1 death_knight, armor, long_floating_hair, riding, dark forest, ice, frost, (night, moonlight:1.5) monster,
Negative prompt: , (worst quality, low quality:1.4), [:badhandv4:1.5):27], nsfw:1.5, (armrest:1.5), signature, copyright_name
By simply changing the reference photo and adjusting the prompts, I was able to generate the images above.
Let's start with this photo bathed in moonlight, which by now everyone is probably familiar with. We'll input it into Reference using reference_adain + attn, with settings: 1, 0, 0.8, 0.9.
Let's try a different reference: a terrifying forest image I found by searching for "dark forest". Slightly modifying the prompts, mainly by increasing the weight of "night" and lowering the brightness, shifts the aesthetic dramatically.
masterpiece, best_quality, absurdres, horror_(theme),1girl, (dark night:1.5), osen, ass, looking_back, in the water, wet_clothes, white_robe, dancing,
Negative prompt: bad-picture-chill-75v bhands-neg, bad_pictures, big_head,
Reference makes things possible that would be difficult or inconsistent to achieve through prompts alone. It can reproduce colors, textures and atmospheric effects from reference images in a more stable and realistic way.
However, Reference also suffers from the same issue as most ControlNet models: reduced image quality. It tends to make the output look fragmented.
The Reference algorithm seems inclined towards overexposure, especially when raising Style Fidelity, though I'm not sure if that's by design or a bug.
Reference doesn't work well with Hires fix. Applying Hires fix directly to a low-res Reference output often produces very different results, as if the Reference influence were applied after the upscale. So if you want to preserve the low-res content, the best approach is to upscale the output manually. If you don't know how to do that, it could be a topic for a future lesson.