Over 50% better performance with cuDNN v8.8.1 (March 8th, 2023)
Performance Gain
From 2 it/s (old cuDNN) to 5 it/s for 1024x
From cuDNN v8.8.0 (February 7th, 2023) to cuDNN v8.8.1 (March 8th, 2023) for 768
Both DLL sets (for CUDA 12.x and 11.x) work with the current AUTOMATIC1111 commit 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8.
The cuDNN v8.8.0 (February 7th, 2023) DLLs (CUDA 11 and 12) work for StableTuner and kohya_ss.
I can't test v8.8.1 with those for the next 40+ hours: I have 2 training sessions running and no spare local GPU.
But I can confirm that the v8.8.1 DLLs for CUDA 12 and 11 work on Automatic.
It depends on the sampler, but most do perform better.
Euler A: a 69-step image in 5 seconds
Batch of 8x768: 34 sec
1069 steps in 26 sec ¯\_(ツ)_/¯ 🤟 🥃
etc.
Training does work too on 8.8.0, on both the 4090 and the 3090.
Download the latest official DLLs from NVIDIA:
https://developer.nvidia.com/rdp/cudnn-download
Extract the zip,
then copy and paste the DLLs into your *****\venv\Lib\site-packages\torch\lib
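The extract-and-copy step can also be scripted. This is only a minimal sketch, not an official installer: the function name `install_cudnn_dlls`, the `cudnn_backup` folder and the `cudnn*.dll` filename pattern are my own assumptions, and you have to pass in both folders yourself (the DLL folder from the extracted zip and your venv's torch\lib). It backs up each DLL before overwriting it, so the swap can be undone.

```python
import shutil
import tempfile
from pathlib import Path


def install_cudnn_dlls(extracted_bin: Path, torch_lib: Path) -> list[str]:
    """Copy cudnn*.dll from extracted_bin into torch_lib, backing up originals.

    Returns the names of the DLLs that were copied.
    """
    # Hypothetical backup location; any folder outside the DLL search path works.
    backup_dir = torch_lib / "cudnn_backup"
    backup_dir.mkdir(exist_ok=True)

    copied = []
    for dll in sorted(extracted_bin.glob("cudnn*.dll")):  # assumed name pattern
        target = torch_lib / dll.name
        backup = backup_dir / dll.name
        if target.exists() and not backup.exists():
            # Keep only the first original, so re-running never loses it.
            shutil.copy2(target, backup)
        shutil.copy2(dll, target)
        copied.append(dll.name)
    return copied


if __name__ == "__main__":
    # Demo on throwaway folders; replace these with your real paths.
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "bin", Path(tmp) / "lib"
        src.mkdir()
        dst.mkdir()
        (src / "cudnn64_8.dll").write_bytes(b"new")
        (dst / "cudnn64_8.dll").write_bytes(b"old")
        print(install_cudnn_dlls(src, dst))
```

To roll back, copy the files from the backup folder over the ones in torch\lib. Close the web UI or trainer first, since Windows will not overwrite a DLL that is in use.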
For Automatic it should work with the xformers build you already have there.
I have tested with
torch: 1.13.1+cu117 • xformers: 0.0.17.dev465
on a 4090 and a 3090.
If you are on a 4090, try switching xformers to one of the latest test builds:
https://pypi.org/project/xformers/#history
./venv/scripts/activate
pip install xformers-0.0.17.dev465-cp310-cp310-win_amd64.whl
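After swapping DLLs or installing a different xformers build, it can help to confirm what the venv actually sees. A rough sketch, run inside the activated venv; `report_versions` is just a name I made up. It relies on `importlib.metadata` and `torch.backends.cudnn.version()`, which returns an integer (I would expect something like 8801 for v8.8.1, but treat that as an assumption and check your own output).

```python
from importlib import metadata


def report_versions(packages=("torch", "xformers")) -> dict:
    """Collect installed package versions plus the cuDNN version torch loads."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment

    try:
        import torch

        # Integer version of the cuDNN library torch loaded, or None.
        versions["cudnn"] = torch.backends.cudnn.version()
    except (ImportError, OSError):
        # OSError can show up on Windows when a DLL in torch\lib fails to load.
        versions["cudnn"] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

If `cudnn` still shows the old number after copying, the DLLs most likely went into a different venv than the one you are running.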
Has anyone tried other Torch versions on Automatic yet?
4 Answers
Unable to get this working in kohya_ss. I got an error about missing DLLs. It looks like the OP's screenshot includes DLLs that are not present in my \lib folder.
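One way to narrow down a missing-DLL error like this is to list which DLL names exist in the download but not in torch\lib, and the other way round. A small sketch under my own assumptions: `compare_dll_sets` is a made-up helper, both folder paths are yours to fill in, and it compares file names only, not versions.

```python
from pathlib import Path


def compare_dll_sets(extracted_bin: Path, torch_lib: Path) -> dict:
    """Compare DLL file names (case-insensitively) between two folders."""
    new = {p.name.lower() for p in extracted_bin.glob("*.dll")}
    old = {p.name.lower() for p in torch_lib.glob("*.dll")}
    return {
        "only_in_download": sorted(new - old),
        "only_in_torch_lib": sorted(old - new),
        "in_both": sorted(new & old),
    }
```

Anything under `only_in_download` was never part of your torch install, which would match the DLLs seen in the screenshot but absent from your folder.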
Did anyone else get increased performance from this?
Mine was usually running at 5-ish on the old CUDA. After getting version 12, it starts at around 6.9 but then drops back down to 5-ish, which I find a bit weird. After the drop there is barely any difference, maybe 0.3-ish.
I might give 11 a try tomorrow.
Thanks for sharing anyway, it was worth a shot.
Tested with kohya_ss training on an Nvidia RTX 3070
Before new DLL: 1.19 it/s
After new DLL: 1.37-1.42 it/s
I have now upgraded to Torch 2.0 and xformers 0.0.17rc482 and my it/s is now 2.39!!! (LoRA training)
WebUi:
python: 3.10.9 • torch: 2.0.0+cu118 • xformers: 0.0.17rc482 • gradio: 3.16.2
Holy shit! 1920x1080 image at 1.52 it/s, no OOM error (8 GB VRAM, RTX 3070)!!!!
512x512: 7.80 it/s
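To put the it/s figures in this thread on the same scale as the "over 50%" in the title, the relative gain is just after/before minus one. A quick sketch; `percent_gain` is my own helper name and the inputs are the numbers quoted above.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative speedup in percent when going from `before` to `after` it/s."""
    return (after / before - 1.0) * 100.0


# kohya_ss on the RTX 3070, new DLLs only
print(round(percent_gain(1.19, 1.37), 1))  # 15.1
print(round(percent_gain(1.19, 1.42), 1))  # 19.3
# same card after also moving to Torch 2.0 + xformers 0.0.17rc482
print(round(percent_gain(1.19, 2.39), 1))  # 100.8
# OP's 2 it/s -> 5 it/s
print(round(percent_gain(2.0, 5.0), 1))    # 150.0
```

So on the 3070 the DLL swap alone is worth roughly 15 to 19%, while the Torch 2.0 upgrade on top of it about doubles the original speed.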