
100% Speed boost in AUTOMATIC1111 for RTX GPUs! Optimizing checkpoints with the TensorRT Extension

This is a guide on how to use TensorRT on compatible RTX graphics cards to increase inference speed.

Caveats:

  • You will have to optimize each checkpoint in order to see the speed benefits.

  • Optimized checkpoints are unique to your system architecture and cannot be shared or distributed.

  • Optimizing a checkpoint takes a lot of disk space - roughly twice the size of the original base model. So a 2GB base model costs another 4GB of space to optimize, for a total footprint of about 6GB.

  • This technology is bleeding edge, and I'm not responsible for any issues you create as a result of following the advice in this guide.

Prerequisites:

  • An NVIDIA RTX GPU with 8GB of VRAM

  • cuDNN 8.9.4.25 for CUDA 11.x (or a newer 11.x build) installed in \venv\Lib\site-packages\torch\lib

    • Download cuDNN here: https://developer.nvidia.com/cudnn

      • Download the zip from that website, then copy the contents of the /bin/ and /lib/x64/ folders (all of the individual files) into \stable-diffusion-webui\venv\Lib\site-packages\torch\lib (see the command sketch after this list)

    • You can check your currently installed version with pip list (also shown in the sketch below)
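
For reference, here is a minimal command-line sketch of the cuDNN steps above. It assumes the webui lives at \stable-diffusion-webui and that you are in a Windows command prompt; torch.backends.cudnn.version() reports the cuDNN build PyTorch actually loads (e.g. 8904 for 8.9.4):

    rem From inside the unzipped cuDNN folder: copy the libraries into torch's lib folder
    copy bin\* \stable-diffusion-webui\venv\Lib\site-packages\torch\lib
    copy lib\x64\* \stable-diffusion-webui\venv\Lib\site-packages\torch\lib

    rem From the stable-diffusion-webui folder: check which cuDNN you have
    venv\Scripts\activate
    rem (pip list only shows cuDNN if it was installed as a pip package)
    pip list | findstr /i cudnn
    rem Ask PyTorch which cuDNN build it actually loads
    python -c "import torch; print(torch.backends.cudnn.version())"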

Instructions:

  1. Activate your venv: ./venv/scripts/activate (see the command sketch after this list)

  2. Launch the webui with ./webui-user.bat and go to the Extensions tab -> Install from URL

  3. Install from the URL: https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT

  4. Go to Settings → User Interface → Quick Settings List, add sd_unet. Apply these settings, then reload the UI.

  5. After the UI reloads, you will have a new TensorRT tab and an SD Unet dropdown next to the checkpoint dropdown.

  6. Select the checkpoint you want to optimize in the checkpoint dropdown.

  7. Select a preset - this determines which resolutions are optimized. Choosing "static" will be faster, but only works at that single resolution. I recommend choosing 512x512 - 768x768 for generating a range of resolutions with SD 1.5 checkpoints.

  8. Click "Export Engine" - watch the terminal and wait while the model is "folded" and optimized for your GPU.

  9. After the process finishes, the optimized UNet and ONNX files are output to stable-diffusion-webui/models/Unet-trt and stable-diffusion-webui/models/Unet-onnx, respectively.

  10. Now you can select the optimized engine from the SD Unet dropdown.

  11. Taste the speed!!!
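
For reference, here is a minimal command-line recap of steps 1, 2, and 9 (a sketch assuming a default Windows install at \stable-diffusion-webui; the dir commands simply confirm the exported files landed where step 9 says they should):

    rem Step 1: activate the venv
    venv\Scripts\activate
    rem Step 2: launch the webui, then install the extension from the Extensions tab
    webui-user.bat
    rem Step 9: after exporting an engine, confirm the optimized files exist
    dir models\Unet-trt
    dir models\Unet-onnx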
