Stable Diffusion 3.5 Large: A Short Experiment on Quality, Resolution, and Aspect Ratio

DISCLAIMER: YOU CAN GET GREAT IMAGES AT ALL THE COMMON RESOLUTIONS, BUT I FOUND IT MUCH EASIER AT SOME, HENCE THE ARTICLE - THIS IS ANECDOTAL AND NOT EXHAUSTIVE

I didn't want a long meandering article, but I'll type a little bit in here. I've seen wild fluctuations in quality from small differences in aspect ratio OR pixel count using SD 3.5, and it really does shine at just a few specific resolutions. For testing I mostly used euler/sgm_uniform at 3.5 guidance and 28 steps in ComfyUI, and I did not experiment with other schedulers or denoising curves in this trial. The prompts were straightforward and not embellished. 'A woman standing on a street in new york city' is a good example of a test prompt I was using.

A note on 'what about other systems/schedulers/etc':
I have gotten similar results with an ODE solver vs a static scheduler, and I have tested outside of ComfyUI, which doesn't use Flux's suggested sampler/scheduler (flowmatch euler discrete). I have also tried that one, using the basic diffusers library and a simple Python inference script, and gotten great results.
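
For anyone who wants to try the diffusers route, here is a minimal sketch of a single-pass run with the settings from this article (guidance 3.5, 28 steps, 768x1152). It is not the exact script I tested with: the model ID `stabilityai/stable-diffusion-3.5-large`, the bfloat16 dtype, the CUDA device, and the helper names `generation_settings` and `generate` are all assumptions for illustration.

```python
def generation_settings(width=768, height=1152, guidance_scale=3.5, steps=28):
    """Collect the settings described above as pipeline keyword arguments."""
    return {
        "width": width,
        "height": height,
        "guidance_scale": guidance_scale,
        "num_inference_steps": steps,
    }


def generate(prompt, model_id="stabilityai/stable-diffusion-3.5-large", **overrides):
    """Run one text-to-image pass. Needs torch, diffusers, and a CUDA GPU.

    The model ID, dtype, and device are assumptions, not tested settings.
    """
    # Imported lazily so generation_settings() works without the heavy dependencies.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe = pipe.to("cuda")
    # No scheduler is swapped in here, so the pipeline's own default is used.
    return pipe(prompt=prompt, **generation_settings(**overrides)).images[0]


# Example usage (downloads the model weights on first run):
# image = generate("A woman standing on a street in new york city")
# image.save("sd35_768x1152.png")
```

Passing `width=1152, height=768` or `width=1024, height=1024` to `generate` covers the other resolutions from the TLDR.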

TLDR:

"For SD 3.5 Large, 768x1152, 1152x768, and 1024x1024 tend to perform well, and btw a few smaller ones are good at earlier stages for better hands and spatial cohesion...

EDIT: SD 3.5 Medium (very brief preliminary tests): 1440x1440, 1024x1024, 768x768, and slightly weaker but still good: 768x1152, 1024x1440, 768x1024"

My tests were not exhaustive, and I'm not an authority. I guarantee that I missed some key ratios or resolutions, so take that disclaimer to heart when you read my findings.

There are a few ideal resolutions that stand way out above the rest.

There are a few specific lower resolutions that excel at spatial cohesion, realism, hands, etc., but are worse at finer details - these can be leveraged for multi-stage workflows to increase image quality while decreasing generation time and cost.

Overview

Stable Diffusion 3.5 is highly sensitive to aspect ratios and resolutions.
Minor changes can drastically affect image content, quality, and realism.
I tested many aspect ratios and resolutions to identify optimal settings for high-quality or photorealistic image generation.

Key Findings

Best Resolutions for Stable Diffusion 3.5:
- 768×1152 (0.667 aspect ratio, 884,736 pixels)
- 1152×768 (inverse of above)
- Both stood out as superior in realism and detail.
Other Notable Resolutions:
- 1024×1024: Above average with excellent fine details but lacked realistic skin tone variations.
- 832×1216 and 1216×832: Less detailed; images appeared plain or waxy.
Landscape vs. Portrait:
- Landscape images were good but slightly pixelated around edges.
- 1152×768 matched the quality of its portrait counterpart.
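
As a quick sanity check on the numbers in this section, here is a small sketch that computes aspect ratio and pixel count for each resolution listed above. The helper name `describe` is just for illustration.

```python
KEY_RESOLUTIONS = [(768, 1152), (1152, 768), (1024, 1024), (832, 1216), (1216, 832)]


def describe(width, height):
    """Return the aspect ratio (width / height, 3 decimals) and total pixel count."""
    return {
        "size": f"{width}x{height}",
        "aspect": round(width / height, 3),
        "pixels": width * height,
    }


for w, h in KEY_RESOLUTIONS:
    info = describe(w, h)
    print(f"{info['size']:>10}  aspect {info['aspect']:<6} {info['pixels']:>10,} px")
```

The 768×1152 pair comes out at 884,736 pixels, below both 832×1216 (1,011,712) and 1024×1024 (1,048,576).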

Additional Experiments

Extended Aspect Ratios:
- Tested 768×1280 (0.6) and 768×1344 (0.57).
  - 768×1280: Excellent, nearly matching 768×1152.
  - 768×1344: Slightly behind but still above baseline.
Resolution Impact:
- Lower resolutions improved hands and body realism but degraded fine details.
- 960×960: Sweet spot balancing realism and detail.
- Resolutions below 640×640: Unusable due to poor quality.
Effects of Pixel Count:
- The better-performing resolutions tended to have lower pixel counts (768×1152 is 884,736 pixels, versus 1,048,576 for 1024×1024).
- Both aspect ratio and pixel count significantly impact image quality.

My Personal Takeaways

Single-Pass Generation:
- Use 768×1152, 1152×768, or 1024×1024.
- Ideal for quick generation with high quality.
- Crop images to desired ratios if needed.
Two-Stage Generation:
- Stage 1: Generate at 768×768 or 576×864 for better cohesion and body/hands.
  - Use CFG (Classifier-Free Guidance) 3.5, 28 steps.
- Stage 2: Upscale to 1024×1024 or 1152×1152.
  - Denoise at 0.55–0.68 (start between step 9 and 13 of 28).
  - Continue with CFG 3.5.
- This is more efficient and yields better results than a single high-resolution generation.
Optimal Resolutions and Ratios:
- 1:1 Aspect Ratio: 960×960, 768×768.
- 2:3 Aspect Ratio: 768×1152, 864×1296, 624×944 (approximately 2:3), 576×864.
- Sweet spots exist where specific resolutions and aspect ratios produce superior images.
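
If you want to hunt for more sweet spots at a given ratio, a sketch like the one below can list candidate sizes to test. It makes two assumptions that are mine and not something I verified for SD 3.5: both sides are kept divisible by 16, and the search is capped at the pixel count of 1152×1152. The function name `exact_ratio_sizes` and its defaults are hypothetical.

```python
from math import gcd


def exact_ratio_sizes(ratio_w, ratio_h, multiple=16, min_short=512, max_pixels=1152 * 1152):
    """List (width, height) pairs with an exact ratio_w:ratio_h aspect ratio.

    Both sides are divisible by `multiple` (an assumed constraint), the short
    side is at least `min_short`, and width * height is at most `max_pixels`.
    """
    # Reduce the ratio first so no valid size is skipped (e.g. 4:6 becomes 2:3).
    g = gcd(ratio_w, ratio_h)
    ratio_w, ratio_h = ratio_w // g, ratio_h // g

    sizes = []
    k = 1
    while True:
        width, height = ratio_w * multiple * k, ratio_h * multiple * k
        if width * height > max_pixels:
            break
        if min(width, height) >= min_short:
            sizes.append((width, height))
        k += 1
    return sizes


for size in exact_ratio_sizes(2, 3):
    print(size)
```

With these defaults the 2:3 list includes 576×864, 768×1152, and 864×1296. 624×944 does not show up because it is only close to 2:3 (about 0.661), which did not stop it from working well in my tests.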

Implementation Strategy Suggestions

For Portrait or Landscape Images:
- Generate at 768×1152 or 1152×768.
- Crop to desired ratio post-generation.
For Square Images:
- Use 960×960 for balanced realism and detail.
- Consider two-stage process for enhanced quality.
Complex Workflows:
- Start at lower resolutions for better spatial cohesion.
- Upscale with partial denoising for improved detail.
- Utilize ControlNets (e.g., OpenPose, Depth) to maintain quality during upscaling.
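
For the crop-after-generation suggestion above, here is a minimal sketch that computes the largest centered crop for a target ratio. The function name `center_crop_box` is hypothetical, and centering is just one choice; you may want to shift the crop toward the subject instead.

```python
def center_crop_box(width, height, target_w, target_h):
    """Return (left, top, right, bottom) of the largest centered crop
    with aspect ratio target_w:target_h inside a width x height image."""
    # Compare ratios by cross-multiplying integers to avoid float rounding.
    if width * target_h > height * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        new_w, new_h = height * target_w // target_h, height
    else:
        # Source is taller than (or equal to) the target: keep full width.
        new_w, new_h = width, width * target_h // target_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


# A 768x1152 portrait cropped to 4:5, and a 1152x768 landscape cropped to 16:9.
print(center_crop_box(768, 1152, 4, 5))
print(center_crop_box(1152, 768, 16, 9))
```

The returned tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it can be passed straight through.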

My CFG Settings:
- Single-Pass: CFG 3.5–4.5, 28–40 steps.
- Two-Stage: Keep CFG at 3.5 for both stages.
My Denoising Settings:
- Optimal denoise range during upscaling: 0.55–0.68.
Avoid:
- Resolutions below 640×640 were pretty worthless, but 768×768 is a very useful starting point for multi-stage workflows.
- Resolutions above 1152×1152 don't do well in a single pass, though I got some really good results at 1152×1152 (with some random bad ones).
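
The 0.55–0.68 denoise range corresponds to starting the second stage somewhere around step 9 to 13 of a 28-step schedule. Here is a minimal sketch of that arithmetic, assuming denoise means "run only the last fraction of the schedule" and that the start step is rounded to the nearest integer. Different tools may round or interpret denoise differently, so treat the result as approximate; the function name `start_step` is hypothetical.

```python
def start_step(denoise, total_steps=28):
    """Step at which stage 2 starts when only the last `denoise`
    fraction of a `total_steps` schedule is run."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    # The first (1 - denoise) share of the schedule is skipped.
    return round(total_steps * (1.0 - denoise))


for d in (0.55, 0.60, 0.68):
    print(f"denoise {d:.2f} -> start at step {start_step(d)} of 28")
```

Higher denoise means an earlier start step, so more of the low-resolution image gets repainted in stage 2.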

Conclusion

Identifying sweet spot resolutions and aspect ratios can be helpful for optimal image generation in Stable Diffusion 3.5.
Combining lower-resolution generation with strategic upscaling can yield a good balance of realism and detail.
Stronger resolutions tend to need fewer steps and less work to produce good results, saving time and money...