
Install SageAttention 2 Notes


Oct 23, 2025

(Updated: 3 months ago)

tool guide

These are notes for installing SageAttention 2 on a local PC. If you want to install it with an embedded Python, replace every python command below with the path to the embedded Python executable.

SageAttention has poor compatibility with other packages, so make sure the versions of all installed components match each other to avoid errors.

References:

https://www.reddit.com/r/StableDiffusion/comments/1iyt7d7/automatic_installation_of_triton_and/?tl=zh-hant

https://github.com/wildminder/AI-windows-whl?tab=readme-ov-file

https://github.com/woct0rdho/triton-windows

https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/cli_args.py

Prepare the environment

1.Install Python 3.12

Go to the official Python website to download and install Python 3.12 for your system.

https://www.python.org/downloads/release/python-3120/

Remember to add Python to your PATH environment variable so you can use it in CMD later.

2.Install CUDA Toolkit 12.6

Go to NVIDIA’s official website to download and install CUDA Toolkit 12.6 for your system.

https://developer.nvidia.com/cuda-12-6-0-download-archive

3.Install MSVC Build Tools

Go to the official Visual Studio website to download and install MSVC Build Tools for your system.

https://visualstudio.microsoft.com/zh-hant/visual-cpp-build-tools/

Check the following items in the installer:

  • Desktop development with C++

  • MSVC v143 - VS 2022 C++ x64/x86 build tools

  • Windows 10/11 SDK

  • C++ CMake tools for Windows

  • Ninja (in some versions this is listed under “Individual Components”)


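After installation, add the folder that contains cl.exe to the PATH environment variable. The MSVC version number in the path depends on your installation; for example: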

C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.44.35207\bin\Hostx86\x64
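A quick way to confirm the PATH change took effect is to ask Python where it finds the compiler. This is a minimal sketch using only the standard library; the helper name missing_tools is made up for illustration, and nvcc is included on the assumption that the CUDA Toolkit from step 2 is already installed. Open a new CMD window first, since an already open window keeps the old PATH.

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # cl = MSVC compiler (step 3), nvcc = CUDA compiler (step 2)
    for tool in ("cl", "nvcc"):
        location = shutil.which(tool)
        print(f"{tool}: {location or 'NOT FOUND on PATH'}")
```

If cl is reported as not found, re-check the folder added to PATH above.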

4.Install PyTorch 2.6

Run the following commands in CMD.

pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 torchaudio==2.6.0+cu126 --index-url https://download.pytorch.org/whl/cu126

Verify the installed version.

python -c "import torch; print(torch.__version__, torch.version.cuda)"
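The three packages should all carry the same CUDA build tag. As a sketch, the installed versions can be read from the package metadata without importing torch itself; the helper names installed_version and cuda_tag are illustrative, not part of any library.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def cuda_tag(version):
    """Return the local build tag after '+', e.g. 'cu126' from '2.6.0+cu126'."""
    if version is None or "+" not in version:
        return None
    return version.split("+", 1)[1]


if __name__ == "__main__":
    for name in ("torch", "torchvision", "torchaudio"):
        version = installed_version(name)
        print(name, version or "not installed", cuda_tag(version) or "(no build tag)")
```

With the install command above, each line would be expected to show the cu126 tag; a missing or different tag suggests the package came from another index.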

5.Install Triton 3.2 Wheel

Install the required packages first.

pip install onnxruntime-gpu
pip install wheel
pip install setuptools
pip install packaging
pip install ninja
pip install "accelerate >= 1.1.1"
pip install "diffusers >= 0.31.0"
pip install "transformers >= 4.39.3"
python -m ensurepip --upgrade
python -m pip install --upgrade setuptools

Install the Triton wheel.

pip install https://github.com/woct0rdho/triton-windows/releases/download/v3.2.0-windows.post10/triton-3.2.0-cp312-cp312-win_amd64.whl

6.Double-check the environment before installing SageAttention.

Run the following commands in CMD.

python --version

where cl

nvcc --version

nvidia-smi

python -m torch.utils.collect_env

where cudart64_*.dll

pip show Triton

python -c "import torch; print(torch.__version__, torch.version.cuda)"

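When checking, make sure the CUDA version PyTorch was built against (torch.version.cuda) matches the installed CUDA Toolkit version (nvcc --version), so that the binary interface (ABI) is consistent.

The ABI (Application Binary Interface) determines how compiled C++/CUDA extension modules interact with the PyTorch core at the binary level. An extension built against one PyTorch/CUDA combination may fail to load under another.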
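The comparison can also be scripted. The sketch below parses the "release X.Y" part of the nvcc --version output and compares it with the value of torch.version.cuda. The function names are hypothetical, the sample nvcc line is only an illustration of the format, and the strict major.minor equality is an assumption that follows this guide's advice rather than a documented requirement.

```python
import re


def parse_nvcc_release(output):
    """Extract (major, minor) from nvcc --version output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", output or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_cuda_version(text):
    """Extract (major, minor) from a string such as '12.6', or None."""
    match = re.match(r"\s*(\d+)\.(\d+)", text or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def same_cuda(torch_cuda, nvcc_output):
    """True only if both versions parse and agree on major.minor."""
    torch_ver = parse_cuda_version(torch_cuda)
    return torch_ver is not None and torch_ver == parse_nvcc_release(nvcc_output)


if __name__ == "__main__":
    # Illustrative sample line; paste your own nvcc --version output here.
    sample = "Cuda compilation tools, release 12.6, V12.6.20"
    print(same_cuda("12.6", sample))
```

In practice, pass torch.version.cuda as the first argument and the captured nvcc output as the second.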

7.Install SageAttention 2.1.1

Download the matching SageAttention .whl file from

https://github.com/wildminder/AI-windows-whl?tab=readme-ov-file

After downloading, install the .whl file with pip. Replace the example path below with the path to your downloaded file.

python -m pip install D:\downloads\package-1.2.3-cp312-cp312-win_amd64.whl
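Before installing, it can help to confirm that the wheel's tags fit the running interpreter. Wheel file names follow the pattern name-version-pythontag-abitag-platformtag.whl (with an optional build tag after the version), so the tags can be read straight from the name. This is a simplified sketch with made-up helper names; pip's real compatibility logic handles more cases.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel file name into its name, version and tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    parts = filename[:-4].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError("unexpected wheel file name layout")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1],
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}


def python_tag_ok(tags, running):
    """Check a CPython tag such as 'cp312' against the wheel's tags."""
    wheel_tags = tags["python"].split(".")
    if tags["abi"] == "abi3":
        # abi3 wheels work on the tagged CPython version and newer ones
        wanted = int(running[2:])
        return any(t.startswith("cp") and t[2:].isdigit() and int(t[2:]) <= wanted
                   for t in wheel_tags)
    return running in wheel_tags


if __name__ == "__main__":
    running = f"cp{sys.version_info.major}{sys.version_info.minor}"
    tags = parse_wheel_tags("package-1.2.3-cp312-cp312-win_amd64.whl")
    print(tags, "matches this Python:", python_tag_ok(tags, running))
```

The file name only encodes the Python and platform tags reliably; the PyTorch and CUDA versions a wheel was built for still have to be read from the download page.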

After installation, verify that it was installed successfully.

pip show SageAttention

If you plan to install SageAttention 2.2, note that it only supports CUDA ≥ 12.8; therefore PyTorch ≥ 2.7 is required.

Afterward, enable it in one of two ways: pass --use-sage-attention in ComfyUI’s command-line arguments, or use the node in KJ_Nodes.

8.Supplementary materials

A list of NVIDIA GPU architecture names for each generation and their corresponding architecture numbers:

[Image: Screenshot 2025-10-23 184700.png]

  • Fermi and Kepler were deprecated starting with CUDA 9 and CUDA 11, respectively

  • Maxwell was deprecated starting with CUDA 11.6
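For reference, the mapping from architecture number (compute capability) to architecture name can be sketched as a small lookup. The table below is not taken from the screenshot; it is written from NVIDIA's publicly documented compute capability numbering, and the function name architecture_name is made up for illustration.

```python
def architecture_name(major, minor):
    """Map a compute capability (major, minor) to its architecture name."""
    # Two generations share a major number with their predecessor.
    if (major, minor) == (7, 5):
        return "Turing"
    if (major, minor) == (8, 9):
        return "Ada Lovelace"
    names = {
        2: "Fermi",
        3: "Kepler",
        5: "Maxwell",
        6: "Pascal",
        7: "Volta",
        8: "Ampere",
        9: "Hopper",
        10: "Blackwell",
        12: "Blackwell",
    }
    return names.get(major, "unknown")


if __name__ == "__main__":
    for capability in [(6, 1), (7, 5), (8, 6), (8, 9), (12, 0)]:
        print(capability, architecture_name(*capability))
```

On a machine with PyTorch installed, torch.cuda.get_device_capability() returns the (major, minor) pair for the current GPU, which can be passed to this lookup.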

9.Postscript

The official SageAttention documentation is really sparse, causing many people to run into installation issues.

I was also frustrated by all the version compatibility problems until I found AI-windows-whl on GitHub, which finally solved it. Just install the prebuilt .whl directly.

Hopefully the project will provide version-matched .whl files officially in the future.


Update 2026/01/13

I updated my environment to Python 3.12, PyTorch 2.10.0, and CUDA 13.0, and reinstalled SageAttention 2.2.

I re-downloaded and installed the corresponding .whl file, but SageAttention 2.2 shows an error: “from . import fused ImportError: DLL load failed while importing fused: The specified procedure could not be found.”

After repeatedly troubleshooting with ChatGPT, I found the issue was with the ABI: “This .pyd was built with a different PyTorch version/build options.”

Switching to woct0rdho’s Windows release wheels (ABI3 + libtorch stable ABI) and reinstalling once fixed it.

https://github.com/woct0rdho/SageAttention/releases


Below are some verification commands that can help with troubleshooting.

Find _fused*.pyd with a one-liner (replace the site-packages path with your own):

python -c "import glob,os; sp=r'C:\Users\HomePC\AppData\Local\Programs\Python\Python312\Lib\site-packages'; print(glob.glob(os.path.join(sp,'sageattention','_fused*.pyd')))"
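The one-liner above hard-codes one machine's site-packages path. As an alternative sketch, the package folder can be located through importlib without importing the package, which matters here because the import itself is what fails. The helper name find_compiled_modules is hypothetical.

```python
import glob
import importlib.util
import os


def find_compiled_modules(package, pattern="*.pyd"):
    """List files matching pattern in an installed top-level package.

    Returns None if the package is not installed. The package is only
    located, not imported, so a broken extension module does not matter.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    found = []
    for location in spec.submodule_search_locations:
        found.extend(glob.glob(os.path.join(location, pattern)))
    return sorted(found)


if __name__ == "__main__":
    result = find_compiled_modules("sageattention", "_fused*.pyd")
    print("sageattention is not installed" if result is None else result)
```

An empty list means the package folder exists but contains no matching .pyd file, which points at a wrong or incomplete wheel.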

List all files inside the sageattention package

dir /b "C:\Users\HomePC\AppData\Local\Programs\Python\Python312\Lib\site-packages\sageattention"

Check whether certain DLL dependencies exist

where msvcp140.dll
where vcruntime140_1.dll
where cudart64_13.dll
where cublas64_13.dll
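The same lookup can be done from Python, which is convenient when using an embedded Python whose environment differs from the CMD window. This sketch only walks the PATH folders (the where command also looks in the current directory), and find_on_path is a made-up helper name. Since Python 3.8, extension modules on Windows do not resolve their DLL dependencies through PATH, so a hit here is a hint that the runtime is installed rather than proof that the import will succeed.

```python
import os


def find_on_path(filename, path=None):
    """Return every PATH folder entry that contains the given file."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')
        if not directory:
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for dll in ("msvcp140.dll", "vcruntime140_1.dll",
                "cudart64_13.dll", "cublas64_13.dll"):
        print(dll, find_on_path(dll) or "NOT FOUND on PATH")
```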

Quickly rule out VC++ Runtime issues

powershell -NoProfile -Command "(Get-Item $env:windir\System32\vcruntime140_1.dll).VersionInfo.FileVersion"
powershell -NoProfile -Command "(Get-Item $env:windir\System32\msvcp140.dll).VersionInfo.FileVersion"

Verify that “_fused” can be imported

python -c "from sageattention import _fused; print('_fused OK')"

Update 2026/01/13: Post-update notes

I updated my entire environment to install and enable Comfy Kitchen.

I also updated ComfyUI to v0.8.2.

In this environment, everything runs very smoothly: SDXL, Z-Image, and Wan2.2 all work properly, with a slight performance improvement.

When updating the environment, I tried upgrading Python to 3.13, but compatibility with older plugin nodes was too poor and caused many bugs, so I decided to stick with Python 3.12.

For now, I’ll use PyTorch 2.10.0, Python 3.12, and CUDA 13.0 as the base environment for running ComfyUI.

If you’ve also installed Comfy Kitchen, you can use this command to verify that it was installed successfully.

python -s -c "import comfy_kitchen as ck; print('comfy-kitchen', ck.__version__); print('backends', ck.list_backends())"
  • You’ll see a list of available backends (e.g., cuda / triton / eager).

  • If it only shows eager (or there’s no cuda), it usually means the CUDA wheel didn’t load successfully, but it can still run (just not at full performance).
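The two cases above can be folded into a small script. This is a sketch under a stated assumption: that ck.list_backends() returns an iterable of available backend names (for example cuda, triton, eager), as the description above suggests. The real return type was not verified, and describe_backends is a hypothetical helper.

```python
def describe_backends(backends):
    """Summarize a list of backend names following the two cases above."""
    names = {str(backend).lower() for backend in backends}
    if "cuda" in names:
        return "cuda backend available"
    return "no cuda backend: it still runs, but not at full performance"


if __name__ == "__main__":
    try:
        import comfy_kitchen as ck
    except ImportError:
        print("comfy_kitchen is not installed")
    else:
        # Assumption: list_backends() yields the available backend names.
        print(describe_backends(ck.list_backends()))
```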

