
Over 50% better performance with cuDNN v8.8.1 (March 8th, 2023)

Performance Gain

From 2 it/s (old cuDNN) to 5 for 1024x

From cuDNN v8.8.0 (February 7th, 2023) to cuDNN v8.8.1 (March 8th, 2023) for 768

Both sets of DLLs (for CUDA 12.x and 11.x) work with the current automatic1111 commit 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8

cuDNN v8.8.0 (February 7th, 2023) (11 & 12 DLLs) works for Stable Tuner & kohya_ss.

I can't test v8.8.1 for training for the next 40+ hours: I have 2 training sessions running and don't have a spare local GPU.

But I can confirm that the v8.8.1 DLLs for both 12 and 11 work on Automatic.

It depends on the sampler, but most do perform better.
Euler a: I can get a 69-step image in 5 seconds
Batch of 8 at 768: 34 sec

1069 steps in 26 sec ¯\_(ツ)_/¯ 🤟 🥃

Training does work too on 8.8.0, both on the 4090 and the 3090.

Download the latest official cuDNN DLLs.

Extract the zip,
then copy and paste the DLLs into your *****\venv\Lib\site-packages\torch\lib
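
If you would rather script the copy step than drag files around, here is a minimal sketch. The function name `copy_cudnn_dlls` and the `cudnn_backup` folder are my own choices, not part of any tool, and I'm assuming the DLLs you want all start with `cudnn`. It saves the DLLs that are already in torch\lib before overwriting them, so you can roll back if something breaks.

```python
import shutil
from pathlib import Path


def copy_cudnn_dlls(extracted_bin: Path, torch_lib: Path) -> list[str]:
    """Copy every cudnn*.dll from extracted_bin into torch_lib.

    Any DLL that already exists in torch_lib is first saved to a
    'cudnn_backup' subfolder (hypothetical name) so it can be restored.
    """
    backup_dir = torch_lib / "cudnn_backup"
    copied = []
    for dll in sorted(extracted_bin.glob("cudnn*.dll")):
        target = torch_lib / dll.name
        if target.exists():
            backup_dir.mkdir(exist_ok=True)
            backup = backup_dir / dll.name
            # Keep the oldest backup so repeated runs don't lose the original.
            if not backup.exists():
                shutil.copy2(target, backup)
        shutil.copy2(dll, target)
        copied.append(dll.name)
    return copied
```

Call it with the folder that holds the extracted DLLs and with your own torch\lib path, for example `copy_cudnn_dlls(Path(r"C:\path\to\extracted\bin"), Path(r"C:\path\to\venv\Lib\site-packages\torch\lib"))`. Both paths here are placeholders.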

For Automatic it should work with the current xformers there.
I have tested with

torch: 1.13.1+cu117  •  xformers: 0.0.17.dev465

on a 4090 and a 3090.
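
To check which cuDNN your venv actually loads after swapping the DLLs, something like this should do, run with the venv's Python. The helper names are mine. The decoding assumes the cuDNN 8.x numbering, where the integer is major*1000 + minor*100 + patch, so 8.8.1 would show up as 8801.

```python
def decode_cudnn_version(code: int) -> tuple[int, int, int]:
    """Split a cuDNN 8.x version integer (major*1000 + minor*100 + patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def report_versions() -> None:
    """Print the torch, CUDA and cuDNN versions seen by this environment."""
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
        return
    print("torch:", torch.__version__)
    print("CUDA (torch build):", torch.version.cuda)
    code = torch.backends.cudnn.version()  # None when cuDNN is unavailable
    if code is None:
        print("cuDNN: not available")
    else:
        print("cuDNN: %d.%d.%d" % decode_cudnn_version(code))


if __name__ == "__main__":
    report_versions()
```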

If you are on a 4090, try changing xformers to the latest test builds:
./venv/scripts/activate
pip install xformers-0.0.17.dev465-cp310-cp310-win_amd64.whl

Has anyone tried other Torch versions on Automatic yet?

4 Answers

Tested this on a 3080 with no real difference when generating in Auto1111.

Unable to get this working in kohya_ss. I got an error about missing DLLs. It looks like the OP screenshot includes DLLs that are not present in my \lib folder.

Did anyone else get increased performance from this?

Mine was usually running at around 5 on the old CUDA. After getting version 12, it starts at 6.9 or so but then drops back down to around 5, which I find a bit weird. Barely any difference otherwise after the drop, maybe 0.3 or so.

I might give 11 a try tomorrow.

Thanks for the share anyway, worth a shot

Tested on kohya_ss training on an Nvidia RTX 3070

Before new DLL: 1.19 it/s

After new DLL: 1.37-1.42 it/s

I have now upgraded to Torch 2.0 and xformers 0.0.17rc482, and my it/s is now 2.39!!! (LoRA training)
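
For anyone comparing their own numbers, here is a quick way to turn it/s readings into a percentage gain. It's just arithmetic on the figures above and `percent_gain` is a made-up helper name. On these readings the DLL swap alone works out to roughly 15-19%, and the Torch 2.0 upgrade roughly doubles the original speed.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative speedup in percent when going from `before` it/s to `after` it/s."""
    return (after - before) / before * 100.0


baseline = 1.19  # it/s before the new DLL
for label, speed in [
    ("new DLL (low)", 1.37),
    ("new DLL (high)", 1.42),
    ("Torch 2.0 + xformers 0.0.17rc482", 2.39),
]:
    print(f"{label}: {percent_gain(baseline, speed):.1f}% faster")
```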


python: 3.10.9  •  torch: 2.0.0+cu118  •  xformers: 0.0.17rc482  •  gradio: 3.16.2

Holy shit! 1920x1080 image at 1.52 it/s - no OOM error! (8 GB VRAM, RTX 3070)!!!!

512x512 - 7.80 it/s
