Updated: Mar 10, 2025
This is a basic workflow for use with WAN video, Sage Attention, and the new TeaCache support. Basic tests at 30 steps brought my generation time down from about 17 minutes to around 6, which is a massive speed upgrade. A few people have had trouble installing it, so I thought I would post my workflow here to help out.
Note - this is the Kijai version. The models used are different from the Comfy-supported ones. Click here for a link to Kijai's walkthrough, with Hugging Face links and example workflows.
To download the TeaCache node, open a terminal in your custom_nodes folder and run:
git clone https://github.com/welltop-cn/ComfyUI-TeaCache
Sage Attention
I am aware that Sage Attention is a pain to set up, and I do not yet have a guide for it here. I can offer some advice, though: make sure your Nvidia SDK, Torch version, Python version, and Sage Attention version are all lined up and compatible with each other. This is where I had most of my issues. Start with Sage Attention's requirements, check compatibilities, and work backwards from there.
I did dig around on YouTube and found a good tutorial on getting it running (the speaker is not me). Keep in mind again that your versioning really matters: when I set it up, I had to downgrade my Nvidia SDK version so everything was compatible.
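To make the version check above easier, here is a minimal sketch that collects the versions that need to line up (Python, Torch, Sage Attention, and the CUDA toolkit). It uses only the standard library; the package names it probes (torch, sageattention) are assumptions about how they are registered with pip, and anything missing is reported rather than raising.

```python
# Sketch: print the versions that must be mutually compatible for
# Sage Attention. Standard library only; missing pieces are reported
# instead of crashing.
import importlib.metadata
import platform
import shutil
import subprocess


def version_report():
    info = {"python": platform.python_version()}
    # Package names here are assumed pip distribution names.
    for pkg in ("torch", "sageattention"):
        try:
            info[pkg] = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            info[pkg] = "not installed"
    # CUDA toolkit version, if nvcc is on PATH.
    nvcc = shutil.which("nvcc")
    if nvcc:
        out = subprocess.run([nvcc, "--version"],
                             capture_output=True, text=True).stdout
        info["cuda"] = next((line.strip() for line in out.splitlines()
                             if "release" in line), "unknown")
    else:
        info["cuda"] = "nvcc not on PATH"
    return info


if __name__ == "__main__":
    for key, value in version_report().items():
        print(f"{key}: {value}")
```

Run it inside the same Python environment ComfyUI uses, and compare the output against the requirements listed in the SageAttention repo before installing anything.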
Personal Tests
My test with this setup on an RTX 4090 + 64 GB RAM (30 steps):
100%|██████████| 30/30 [05:56<00:00, 11.89s/it]
My test using SDPA instead:
100%|██████████| 30/30 [17:27<00:00, 34.91s/it]
6 minutes versus 17.5 minutes. Almost 3x faster isn't bad.