Hi there,
I have a low-end computer (NVIDIA GeForce GTX 1050) that can run SDXL Lightning in a decent amount of time, and I wanted the same experience with Pony models. Unfortunately, Pony models require around 20-40 steps, and there aren't many models that work with fewer steps. So I'm creating this post to share recommendations for speeding up Pony models. Note that I'm relatively new to AI image generation, especially with Pony models, and due to hardware limitations I can't run many experiments with different checkpoints and settings (steps, CFG, ...). I strongly suggest that you:
Do your own tests (I can't afford to run them for you!).
Share your results in the comments. I will update the post with attribution.
Keep reading even if you have a powerful computer: some of these options improve performance and prompt adherence, so they might interest you too.
Note also that the images here might not be the best examples, but I wanted to keep them suitable for all audiences.
Models that already have low-step versions
Some models already offer low-step versions (please share more in the comments):
Photorealistic:
SDXL DPO-Turbo-LoRA
If you look at Pony Diffusion V6 XL, you'll see a V6 Turbo DPO merge version. I'm not 100% sure, but I think it was made by merging with this SDXL DPO-Turbo-LoRA.
Notes:
It works with Euler a on the model I tested. This sampler tends to give softer results with base models.
It doesn't seem to work with DPM++ SDE Karras, which is recommended for most photorealistic models.
I think the sampler compensates for the artifacts introduced by the LoRA.
Make sure you use the file sd_xl_dpo_turbo_lora_v1-128dim.safetensors.
I have some concerns about the license, since it seems to involve SDXL Turbo. Let me know if there is any problem.
This is probably the easiest and fastest way of doing things.
CyberRealistic Pony V6.1
The author doesn't give recommended settings, but from my experience:
Steps: 30
CFG: 7
Sampler Scheduler: DPM++ SDE
With SDXL DPO-Turbo-LoRA:
Steps: 12-16
CFG: 3-4
Sampler Scheduler: Euler a
Use (12 steps, CFG 3) to search for prompts and seeds, and (16, 4) for the best results.
On my device, generating an image with the usual settings takes ~30 minutes; with SDXL DPO-Turbo-LoRA at 16 steps it takes 'only' ~6 minutes. This is the fastest option.
https://civitai.com/images/22694360
The cover image was also generated with this LoRA: https://civitai.com/images/22731636
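The (12, 3) / (16, 4) advice above amounts to a two-pass workflow: a cheap pass over many seeds, then a better pass only for the seeds worth keeping. Below is a minimal Python sketch that plans such a batch and counts the total sampling steps. `Preset`, `plan_jobs` and `total_steps` are hypothetical helpers for illustration, not part of any GUI; actually rendering the jobs is left to whatever tool you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    steps: int
    cfg: float
    sampler: str


# Settings from this section (CyberRealistic Pony V6.1 + SDXL DPO-Turbo-LoRA).
SEARCH = Preset(steps=12, cfg=3.0, sampler="Euler a")  # prompt/seed search
FINAL = Preset(steps=16, cfg=4.0, sampler="Euler a")   # best results


def plan_jobs(seeds, keep):
    """Hypothetical helper: one cheap job per seed, then a final job per kept seed."""
    seeds = list(seeds)  # allow any iterable, since we loop over it twice
    jobs = [(seed, SEARCH) for seed in seeds]
    jobs += [(seed, FINAL) for seed in seeds if seed in keep]
    return jobs


def total_steps(jobs):
    """Sum of sampling steps across all planned jobs."""
    return sum(preset.steps for _, preset in jobs)
```

For example, searching 8 seeds and keeping one costs 8 × 12 + 16 = 112 steps, versus 128 steps if every seed were rendered at 16 steps.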
SDXL Lightning 8 steps LoRA
This is the traditional solution. I've experimented less with it; it behaves similarly to SDXL DPO-Turbo-LoRA, but with lower quality.
CyberRealistic Pony V6.1
The author doesn't give recommended settings, but from my experience:
Steps: 30
CFG: 7
Sampler Scheduler: DPM++ SDE
With SDXL Lightning 8 steps LoRA:
Steps: 12-16 (definitely not 8)
CFG: 3 (higher might produce artifacts)
Sampler Scheduler: Euler a
On my device, generating an image with the usual settings takes ~30 minutes; with SDXL Lightning 8 steps LoRA at 16 steps it takes 'only' ~10 minutes. This is slower than SDXL DPO-Turbo-LoRA.
https://civitai.com/images/22714568
Align Your Steps (AYS)
I asked on Reddit how to use Lightning LoRAs with Pony models here, and TurbTastic suggested Align Your Steps. I've implemented it in ComfyUI by following a simple YouTube tutorial; let me know if it can be applied in other GUIs.
Notes:
As of this writing, AYS is not available as a sampler on Civitai.
The number of steps might still be a bit high, but this is better than nothing.
I don't know whether the configuration depends on the checkpoint, so I've split this section by model.
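For context, AYS is a scheduler rather than a new sampler: it replaces the usual sigma (noise level) schedule with a short list of noise levels that NVIDIA optimized per model family, and other step counts are obtained by interpolating that list in log space. The sketch below shows the idea for SDXL in plain Python. The 11 reference values are the SDXL list from NVIDIA's AYS project page as I recall them (ComfyUI's AlignYourStepsScheduler node uses a very similar list), so double-check them before relying on this; the function names are my own.

```python
import math

# SDXL reference noise levels for 10 steps (11 sigmas), as published on NVIDIA's
# AYS project page (recalled here; verify before relying on them).
AYS_SDXL = [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113, 0.029]


def loglinear_interp(sigmas, num_points):
    """Resample a decreasing sigma list to num_points values, interpolating log(sigma) linearly."""
    n = len(sigmas)
    logs = [math.log(s) for s in sigmas]
    resampled = []
    for i in range(num_points):
        pos = i * (n - 1) / (num_points - 1)  # position on the original index axis
        lo = min(int(pos), n - 2)             # left neighbour, clamped to the last segment
        frac = pos - lo
        resampled.append(math.exp(logs[lo] + frac * (logs[lo + 1] - logs[lo])))
    return resampled


def ays_sdxl_sigmas(steps):
    """Sigma schedule for `steps` sampling steps: steps + 1 values, ending at 0."""
    sigmas = list(AYS_SDXL)
    if steps + 1 != len(sigmas):
        sigmas = loglinear_interp(sigmas, steps + 1)
    sigmas[-1] = 0.0  # finish at zero noise (to my knowledge, ComfyUI's node does this too)
    return sigmas
```

`ays_sdxl_sigmas(15)` gives the 16 noise levels for the 15-step setting used below. In ComfyUI, as far as I know, you get the equivalent from the AlignYourStepsScheduler node and pass its sigmas to a custom sampler node.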
CyberRealistic Pony V6.1
The author doesn't give recommended settings, but from my experience:
Steps: 30
CFG: 7
Sampler Scheduler: DPM++ SDE
With AYS:
Steps: 15
CFG: 8
Sampler Scheduler: DPM++ AYS
12 and 16 steps gave me 'less good' results than 15. CFG seems to be more flexible. On my device, generating an image with the usual settings takes ~30 minutes; with AYS at 15 steps it takes 'only' ~15 minutes. This is the slowest option.
https://civitai.com/images/22551038
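To compare the three options at a glance, the approximate timings reported above (all for CyberRealistic Pony V6.1 on my GTX 1050, against a ~30-minute baseline) can be turned into speedup factors. This is just arithmetic on those reported numbers, not a new benchmark.

```python
BASELINE_MIN = 30  # usual settings: 30 steps, CFG 7, DPM++ SDE

# Approximate minutes per image reported above for CyberRealistic Pony V6.1.
REPORTED_MIN = {
    "SDXL DPO-Turbo-LoRA (16 steps)": 6,
    "SDXL Lightning 8 steps LoRA (16 steps)": 10,
    "AYS (15 steps)": 15,
}


def speedups(baseline, timings):
    """Return (name, speedup factor, minutes saved) tuples, fastest option first."""
    rows = [(name, baseline / t, baseline - t) for name, t in timings.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, factor, saved in speedups(BASELINE_MIN, REPORTED_MIN):
    print(f"{name}: {factor:.1f}x faster, ~{saved} min saved per image")
```

With these numbers, DPO-Turbo-LoRA comes out at about 5x, the Lightning LoRA at 3x and AYS at 2x.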
Reducing iteration time and improving prompt adherence
In my tests, a higher CFG scale seemed to increase iteration time, but lowering it may reduce image quality. I advise you to use the DPO (Direct Preference Optimization) LoRA for XL and 1.5 - OpenRail++. It improves prompt adherence, so you can lower the CFG scale and still get good (not burnt) results, which reduces iteration time. The specific CFG value and iteration time depend on your model. I'm getting good results (for best quality) with:
Steps: 18
CFG: 3.5
Sampler Scheduler: DPM++ 2M SDE AYS
Do not use SDXL DPO-Turbo-LoRA with this LoRA. DPM++ 2M SDE is faster than DPM++ SDE, and results are also good.
I haven't attached an image here because I was too busy at the time of writing.
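Part of why DPM++ 2M SDE is faster than DPM++ SDE is that samplers differ in how many times they run the model per step: DPM++ SDE is a single-step second-order solver that evaluates the model twice per step, while DPM++ 2M SDE and Euler a evaluate it once. A rough cost estimate in Python follows. It only counts evaluations (not how expensive each one is) and ignores that, in k-diffusion, the final DPM++ SDE step needs just one evaluation.

```python
# Approximate model (UNet) evaluations per sampling step.
EVALS_PER_STEP = {
    "Euler a": 1,
    "DPM++ 2M SDE": 1,  # multistep: reuses predictions from previous steps
    "DPM++ SDE": 2,     # single-step 2nd order: extra evaluation at an intermediate noise level
}


def model_evals(sampler, steps):
    """Rough number of model evaluations for a run (ignores the cheaper final step)."""
    return EVALS_PER_STEP[sampler] * steps


for sampler, steps in [("DPM++ SDE", 30), ("DPM++ 2M SDE", 18), ("Euler a", 16)]:
    print(f"{sampler}, {steps} steps: ~{model_evals(sampler, steps)} model evaluations")
```

For instance, the usual 30 DPM++ SDE steps mean about 60 evaluations, while 18 steps of DPM++ 2M SDE mean 18, which is also consistent with the large gap between the usual settings and the 16-step Euler a runs above.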
Fast VAE decode
This is a ComfyUI extension; I don't know of equivalent options in other GUIs. It won't speed up image generation, only latent decoding. On my laptop, decoding an image can take around one minute, which is a waste of time if the image has bad anatomy or other major issues. The ComfyUI_FastVAEDecorder_SDXL node performs a quick decode that lets you save a smaller, low-quality image, which is enough to see whether the generation failed. Note that you might need to save the latents so you can decode the image properly afterwards.
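I don't know how ComfyUI_FastVAEDecorder_SDXL works internally. A related trick that ComfyUI itself uses for live previews (the latent2rgb preview method) skips the VAE entirely and maps each latent pixel to a colour with a tiny linear map. SDXL latents have 4 channels at 1/8 of the image resolution, so this is very cheap, but the result is small and blurry. The Python sketch below illustrates only that idea: the coefficients are hypothetical placeholders, not ComfyUI's real SDXL factors, and this is not necessarily what the FastVAEDecoder node does.

```python
# HYPOTHETICAL coefficients: placeholders, NOT the real SDXL latent-to-RGB factors.
# Row k holds the (R, G, B) contribution of latent channel k.
LATENT_RGB_FACTORS = [
    (0.35, 0.40, 0.40),
    (-0.25, 0.00, 0.10),
    (0.10, 0.10, -0.05),
    (-0.30, -0.25, -0.20),
]


def latent_to_preview(latent):
    """Cheap linear preview of a 4-channel latent.

    latent: list of 4 channels, each an H x W nested list of floats
    (for SDXL, H and W are 1/8 of the final image size).
    Returns an H x W grid of (r, g, b) byte tuples.
    """
    channels = len(latent)
    height, width = len(latent[0]), len(latent[0][0])
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            rgb = []
            for c in range(3):
                v = sum(latent[k][y][x] * LATENT_RGB_FACTORS[k][c] for k in range(channels))
                # Map roughly [-1, 1] to [0, 255] and clamp to the byte range.
                rgb.append(max(0, min(255, round((v + 1) * 127.5))))
            row.append(tuple(rgb))
        image.append(row)
    return image
```

A 1024x1024 SDXL image has a 128x128 latent, so this preview touches about 16k latent pixels with a handful of multiplications each, which is why it is practically free compared with a full VAE decode.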