< The TensorRT + Automatic1111 Article >
An Explainer + Guide + Exploration
(What it is, how to use it, and testing)
Written and Maintained by : michaelpstanich
Update Oct. 21, 2023 (6:00 PST) : Completed the "Technical Details and Testing" section along with adding a "Conclusion", expanded the troubleshooting section, expanded the "Generate" and "Presets" sections (including info about using TensorRT with LoRAs), added info about removing presets and issues with hires fix, clarified deterministic behavior, and fixed typos and grammatical errors.
Update Oct. 19, 2023 (18:00 PST) : Updated some bits of the article, added a Troubleshooting section with new fixes, and added the start of the Technical Details section with a data comparison.
There is now an official TensorRT extension for Automatic1111, and this is HYPER exciting news! TensorRT is an optimization scheme for Nvidia RTX graphics cards which feature AI-accelerating Tensor Cores. TensorRT offers nearly double (yes, DOUBLE) the performance, cutting render times roughly in half while drastically reducing VRAM usage! In this article we'll explore TensorRT in combination with Automatic1111: we'll cover what it is and what the caveats are, how to install and use the extension, then we'll get into troubleshooting and testing! Time for SPEED!
<<> NOTICE 1 <>>
Aside from potential troubleshooting additions, this article is effectively complete for the current release version of the TensorRT extension. I’ll still be updating this article as new information arrives and will release major article updates as new TensorRT versions are released, so remember to check back a bit after new TensorRT versions are available!
<<> NOTICE 2 <>>
The TensorRT plugin is currently buggy and quite technical. If you're not really the technically inclined type, or you're not willing to troubleshoot if something goes wrong, I strongly recommend holding off on TensorRT for now. I'll cover more of this stuff in the "Pros and Cons" section as well as throughout the rest of the article. If you run into issues make sure to check out the troubleshooting sections!
<-<>-> > > <-< -1 >-> < < <-<>->
Resources (Some of these we'll come back to in the article)
Nvidia's set-up guide - https://nvidia.custhelp.com/app/answers/detail/a_id/5487
TensorRT Github Page - https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT
Automatic1111 Github page - https://github.com/AUTOMATIC1111/stable-diffusion-webui
My Arbitrary Links!
Discord - The Broken Chatbox - https://discord.gg/h3vB7S4FEw
Donations! =^-^= - https://ko-fi.com/michaelpstanich
<-<>-> > > <-< 0 >-> < < <-<>->
What this article contains (Glossary)
< 1 > What is TensorRT? How does it work?
< 2 > Pros and Cons (Benefits and drawbacks)
< 3 > How to set-up the TensorRT Extension
< 4 > Using the TensorRT Extension
< 5 > Troubleshooting and Issues
< 6 > Technical Details and Testing
< 7 > You want more???
<-<>-> > > <-< 1 >-> < < <-<>->
What is TensorRT? How does it work?
What is TensorRT?
TensorRT is an optimization specifically for Nvidia RTX graphics cards which takes advantage of their included Tensor Cores (processing chips designed to accelerate and process AI calculations). This optimization can drastically reduce VRAM use while also speeding up gens by up to 2x! That's not a joke or exaggeration, it's REALLY that fast! The current implementation has some limitations though: it's only available on Nvidia GPUs with Tensor Cores, it requires processing for each Model you wish to use it with, and the current version has some limitations on what tools and systems it will work with.
Source : Nvidia (https://nvidia.custhelp.com/app/answers/detail/a_id/5487)
How does it work?
We won't get super technical here (check out the Nvidia provided documentation and articles for that - https://docs.nvidia.com/deeplearning/tensorrt/) but the short and easy version specific to this extension is: TensorRT will process your selected Checkpoint and create a special Unet model for it, then when you gen within the set sizes, Stable Diffusion/Automatic1111 will be able to use that Unet to speed up your generation and reduce memory usage. All this uses the CUDA and Tensor APIs and will only work on Nvidia GPUs with Tensor Cores, as it's hardware specific.
<-<>-> > > <-< 2 >-> < < <-<>->
Pros and Cons (Benefits and Drawbacks)
Pros (Positives to using TensorRT)
- Massive Speed Boost (up to 2x)
- Reduced VRAM usage
- Don't need to compromise quality for speed
- Doesn't cause artifacting/blurring like some other optimizations (from my testing so far)
Cons (Negatives/Drawbacks to using TensorRT)
- Each Model (Checkpoint) must be processed before it can run optimized by TensorRT
- Eats up drive space, as each processed model saves 1 ONNX unet per-model and 1 additional unet per-profile, per-model
- A LoRA used with TensorRT must be processed with EACH checkpoint it's used with
- Generating a model's ONNX unet (the 1 unet saved per-model) takes a while
- Changes seed (Will change output similar to other optimizations, is same-version deterministic)
- Limited compatibility with additional tools and models (Still testing what does and doesn't work properly)
- Current extension is buggy as all hecks and breaks easily/quickly
So let's talk about these benefits and drawbacks more in-depth, because it's important to know before we get into it. Aside from being Nvidia exclusive, TensorRT has some technical speed bumps along the way to greatness. It does offer a massive 2x speed boost, cutting render times nearly in half while reducing VRAM usage, however it does require more disk/storage space since each model you process creates unet models which must also be saved. Add in the initial processing time of creating the unet models just to run the optimizations (usually anywhere between 2 and 15 minutes depending on hardware and set-up) and you'll really only use this optimization on models you use frequently; it has limited application when using a large number of models or testing new models you've added to your library. Those with massive amounts of storage (like yours truly) won't be too bothered by this, however all this AI stuff takes a lot of space, and that's a major limiting factor for a lot of people.
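To put some rough ballpark numbers on it (these are just estimates using the file sizes I've seen, your models will vary): each checkpoint needs at least 1 ONNX unet plus 1 engine per profile, each around 1.5GB-2GB for SD1.5, so call it roughly 3-4GB per checkpoint with a single dynamic profile. Optimizing 10 checkpoints is already in the neighborhood of 30-40GB on top of the checkpoints themselves, before you add any extra profiles or LoRA combinations.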
TensorRT also comes with additional limitations. We'll talk more about size limitations later (you can gen any size of image your VRAM can handle, but you have to set up multiple profiles, eating up more storage space), but another major factor is how LoRAs work, and that many tools we use regularly are currently either broken with TensorRT or have major bugs and issues. For example, if you wish to speed up a gen that features a LoRA, you must process that LoRA with the checkpoint you wish to use it with, and that profile/Unet model will only work with that EXACT combination. Currently there's also no way to combine multiple LoRA models with these optimizations (though you can with the manual API), which means you're currently stuck with just 1 LoRA at a time, which sucks...
Certain extensions work just fine, but some major players like ControlNet currently have issues and may need additional patches/updates to get working properly. ControlNet (something I've tested extensively) has a strange VRAM usage error when processing the VAE above a certain resolution, sky-rocketing memory usage and basically causing the gen to halt, so it'll work at lower resolutions but may not always work. Other tools like Deforum and animation tools also have major issues, so make sure to check info and compatibility with whatever tools you intend to use before enabling TensorRT for that workflow.
The good news is, you can just choose not to use TensorRT and revert to other optimizations in the webUI: set your "SD Unet" setting to "none" and reload your checkpoint, and boom! Old optimization, but no further issues (hopefully).
<-<>-> > > <-< 3 >-> < < <-<>->
<>> How to set-up the TensorRT Extension <<>
Nvidia actually provides a simple guide here - https://nvidia.custhelp.com/app/answers/detail/a_id/5487 - though it assumes you don't have Automatic1111 installed, has some different recommendations, and skips over some details that will be useful, so we'll cover it more in-depth here.
If you don't already have Automatic1111 set-up or maybe you're somewhat new to Stable Diffusion, I highly recommend checking out my Automatic1111 Getting Started and Explainer guide playlist. It's a 4 part series which not only shows how to install and use Automatic1111 but also explains various elements so you'll be able to fully understand what it is you're doing and how to make great gens! - https://youtube.com/playlist?list=PL503HskLvxy3rBEE8zTil37WQF_gJ0kNq&si=EMe9PGNDWK0Thdwh (not really meant to be a plug, just an actually useful resource which will get you up and running)
A couple notes before we continue.
- TensorRT will consume a lot of storage space if you use it for a large number of models: it creates 1 unet per model, with 1 additional unet per-profile for said model, and another additional unet per lora+checkpoint combo. Each of these files is around 1.5GB-2GB in size for SD1.5 models.
- If you intend to use TensorRT with hires fix without utilizing 'dynamic' ranges (we'll cover this later in the article), you'll want your unet models stored on an SSD, or else the time it takes to switch unet models will slow down the gen enough to negate any speed benefit. (I currently won't be writing a section for this as dynamic ranges cover this use case, but the easiest way is just installing Automatic1111 on an SSD to begin with, or using symbolic links to 'move' the unet models so they load from an SSD; see the example right after these notes.)
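If you do want to go the symbolic link route, here's roughly what it looks like on Windows (run cmd.exe as Administrator; the paths below are purely examples, swap in your own install folder and SSD):
rem Move the engine folder to the SSD first, then link the old location to the new one
move "C:\stable-diffusion-webui\models\Unet-trt" "D:\FastSSD\Unet-trt"
mklink /D "C:\stable-diffusion-webui\models\Unet-trt" "D:\FastSSD\Unet-trt"
The same idea applies to the Unet-onnx folder if you want that on the SSD as well.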
From this point forward we'll assume you have Automatic1111 installed and running gens without issues. (Clear up any additional errors before installing TensorRT)
<>> Installation <<>
Note : Example images were taken using Automatic1111 WebUI 1.6.0 (Dark gradio/monochrome theme). If your layout looks different, just make sure you’re up-to-date, if you are on at least 1.6.0 then don’t worry about minor differences.
Installing the extension is super easy: we're just going to use the "Extensions" tab in the webUI, install using the URL (https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT), then click install. After that we enable sd_unet in our quicksettings.
a. ) So first, open up your Automatic1111 WebUI if it isn’t already up and running.
b. ) Next click the "Extensions" Tab. By default this tab should be at the end of the list, in the same section where "txt2img" and "PNG info" reside.
c. ) After the Extensions tab opens up, click the sub-tab called "Install from URL" in the same section as "Installed" and "Available"
d. ) Next navigate to the "URL for extension's git repository" field located below the tabs and enter "https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT" (you can copy-paste it in there)
d.i1 ) This is the GitHub repository for the official TensorRT Automatic1111 extension, you can follow it for the source code and use git clone/git pull like usual to manually install or update the extension should you ever find the need.
e ) Once the url is entered, click the “Install” button at the bottom of this section. Wait for all processes shown in the console/terminal to stop.
e.i1 ) After clicking install Automatic1111 should try to run the install.py script included with the extension, however if you get errors then you’ll want to check the “Troubleshooting and Issues” section to try and figure out what went wrong.
f ) Real quick we also need to enable the “SD Unet” option in our quicklist settings so we can quickly change which Unet we are using. For this click on the “Settings” tab and find “User interface” on the left side column that shows up, and add “sd_unet” to the “Quicksettings list”
f.i1 ) This adds "SD Unet" to the top of the UI for easy access and allows us to select specifically which TensorRT Unet we'll want to use. I tend to leave mine set to "Automatic" as it will generally find which Unet model to use automatically, but if you have multiple dynamic profiles or need to force a lora to be used, this option is very handy to have here. You can also choose to add the other options I have here as well if you'd like, I find them all very helpful and I recommend having them! (SD Unet can also be changed within the "Settings" tab)
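For reference, a Quicksettings list that covers the basics might look something like this (sd_model_checkpoint is the default entry; sd_vae and sd_unet are the extras I find most useful, adjust to taste):
sd_model_checkpoint, sd_vae, sd_unet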
g. ) Now if everything installed correctly,
STOP!
INTERCEPTION!!!
This is where things get tricky. Normally we'd just reload the WebUI and we'd be good, however there are a couple of bugs with Automatic1111 and the TensorRT extension which could break your install of TensorRT and prevent it from working if you happen to be affected by them, so do these steps to make your life much easier!
g.1 ) To help add some certainty and help in potential troubleshooting in the future, back up the "venv" folder located in your Automatic1111 install, I did this by making a copy and calling it "venv.bak". Also back up your config.json the same way. (You can also back up the ui-config file if you have one, but that may not be necessary.) It may be worth moving these backups outside of the Automatic1111 root folder just so they can't be touched by the webUI at all, but I'm lazy.
g.i2 ) The "venv" folder acts as a stand-alone python environment and cache folder of sorts. We're creating a backup since the install.py script from the TensorRT extension will install things into the venv folder when it runs the first time, but may fail to check for and install any missing requirements when launching or restarting the webUI. This can cause various issues, such as detecting the requirements are there but thinking they're out of date, uninstalling them, then not re-installing them but still marking them as installed. If things break, the first step would be to use a copy of your backup venv folder (delete the current venv, make a copy of venv.bak, rename the copy to venv).
g.i3 ) In addition to this you may want to make a backup of the folders "tensorrt, tensorrt_bindings, tensorrt_libs" located in Automatic1111>venv>Lib>site-packages, as these are difficult to re-acquire should they go missing. If you don't see them you can search for them to back them up (paths may change in different versions).
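If you'd rather do those backups from a command prompt, something like this works from inside your Automatic1111 folder (robocopy is built into Windows; the .bak names are just the ones I happened to use):
robocopy venv venv.bak /E
copy config.json config.json.bak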
h. ) Ok, NOW you can Reload your WebUI. The quickest and safest way is to click the "Settings" tab (in the same section as the "Extensions" tab) and click the "Reload UI" button in the top right of that tab's page. (You can technically also close and reopen the WebUI, but this can incur a bug in the current version so I'd recommend using the Reload UI button in the UI this time around)
h.i1 ) When starting up, the TensorRT extension may show a few messages in the console referring to requirement checks but as long as there’s no errors you can safely ignore these messages. If, however, you either get pop-up messages reading “The procedure entry point-” or the console displays errors, often reading “Error loading script: trt.py” and ending with “ModuleNotFoundError: No module named ‘tensorrt_bindings’” your installation may have failed and I recommend checking the “Troubleshooting and Issues” section for a proper fix so the extension works as expected.
<-<>-> > > <-< 4 >-> < < <-<>->
<>> Using the TensorRT Extension <<>
Now would be a great time to check for errors in the Console/Terminal, if you see any errors and the “TensorRT” tab isn’t showing up (should be in the same section as ”txt2img” and such) then something may have broken (check the “Troubleshooting and Issues” section for potential help). Otherwise, if you got no errors and see the tab, let’s carry on and get into the exciting part!
Note : The visuals and specific wording in the extension may change with updates so some text may not 100% align. I may try to update images as it becomes relevant if I can.
Here we are going to select a Checkpoint we want to gen with, create a “Default” Unet using TensorRT for that checkpoint, then create a custom “dynamic” profile for use with hires fix, then we’ll move to generating.. WITH SPEED!
a. ) First we should select the checkpoint we want in the top left of the UI, I'll be using my own custom mix "SpiritMix - Rival (2D Anime).safetensors". You'll notice I also have it in a folder named "SD15" where I store my SD1.5 models for webUI access. We'll cover how this is relevant a bit later.
a.i1 ) The current stable version of Automatic1111 (1.6.0) and TensorRT only support SD1.5 by default. You can technically use the DEV branch of Automatic1111 with some changes to get SDXL models working, however we won't cover that here for now, so stick to SD1.5 while following this guide. SDXL support should be officially added in the next stable release of Automatic1111.
b. ) With our desired Checkpoint selected, let’s head over to the TensorRT tab, ensure “TensorRT Exporter” sub-tab is enabled, Default is set for Preset, then click the “Export Default Engine” button (future versions of the TensorRT extension will say “Generate Default Engine”). You can check the console for the progress and details on what’s happening. This process will take anywhere between 2 minutes to 15 minutes (or maybe even longer on low-end hardware?) depending on your set-up. If you cancel it or it fails for whatever reason but refuses to start again it may have created a corrupted Unet model in the “Unet-onnx” model folder, so you’ll have to delete it and click the refresh button next to “SD Unet” before trying again.
b.i1 ) Currently you must do this for every checkpoint you use with TensorRT, otherwise if you try to export a different preset you will get an error relating to "No ONNX Unet detected". The "Default" export creates the ONNX unet that all other presets require.
b.i2 ) The ONNX export from this process is saved to Automatic1111 > Models > Unet-onnx by default. If you would like to remove an ONNX unet model, this is where to look; it should be named what you see in the checkpoint field with some special characters replaced (in this case it was called "SD15_SpiritMix - Rival (2D Anime)_316a05bb", with the numbers being a hash)
b.i3 ) Presets are saved in Automatic1111 > models > Unet-trt with the checkpoint name like before, followed by the hash and parameters used. If you create presets and wish to remove them you can delete them here, however in some cases you may also need to remove the presets from the model.json file found in the same folder, as the UI may not properly update otherwise (be VERY careful when editing the model.json file as it may break if done improperly and could lead to models not loading properly, so I recommend making a backup before editing, just in-case). The "Default" preset we need to run for the ONNX Unet model shows up in this folder as well; if you'd like to remove that default preset, this is where you'd delete it. If, while using hires fix, TensorRT is still loading multiple models even though the gen fits in a single dynamic preset, you may need to remove the offending preset.
b.i4 ) The file name and hash used to detect the Unet model also includes the folder path (in this case SD15\) so if you move or rename the file and Automatic1111 displays it differently in the "Stable Diffusion checkpoint" box, then the Unet generated will no longer work. (I have not tested if renaming the Unet model works yet, will try to remember to update this entry after testing)
b.i5 ) The GitHub page, at the time of writing, currently states "[--medvram] and [--lowvram] Have caused issues when compiling the engine and running it." so if you get ONNX errors, removing these from your command line arguments when exporting may help. I've also seen some people report that --xformers causes issues, but I haven't confirmed either of these claims myself.
b.i6 ) If you’d like to generate a Unet with a Lora, you’ll need to select the desired Checkpoint in the webUI’s “Stable Diffusion checkpoint” selection box, then switch to the “TensorRT Lora” tab. In the “LoRA Model” box select which Lora you’d like to use and click “Convert to TensorRT” which will create a new ONNX Unet to use with that exact combination of checkpoint and Lora. Remember, you have to do this for EACH lora you intend to use, and multiple loras may not currently work even if you generate multiple ONNX Unet models. There’s also a chance it will report “Please export the base model first.” and a reboot may be required to get it to work correctly. (No idea why it does this. More testing will be required but for now I’m just not using TensorRT with Loras)
c. ) This default export should support gens between 512x and 768x for SD1.5 (768x and 1024x for SDXL) with a max batch size of 4, meaning it can be used at any of those resolutions and TensorRT will kick in. At this point you can hit the refresh icon next to the "SD Unet" select box in the top quicklist bar, set "SD Unet" to either "Automatic" or the specific model's Unet, and test a generation with an image between the sizes of 512 and 768 (like 512x768, though I'd test with 768x768 to feel the SPEED) with a batch size between 1 and 4. (The Unet should be named "[TRT] model name", in this case it was "[TRT] SD15_SpiritMix - Rival (2D Anime)", notice how it still added the folder name) However, this won't work for hires fix... yet. That's next.
d. ) So what about “big picture make GPU go brrrrr”? Don’t worry, this is where Custom Presets come in! Now that we know how to generate the ONNX model required as a base, we can now define an optimized Unet model to use for any size we desire. (Just keep in mind you’re still limited by VRAM, TensorRT is impressive, but it won’t make a 4gb VRAM card magically output 8k images!) So from here let’s select a new preset in the “preset” select box, then click “Advanced Settings” to get the full options list. I’ll display my dynamic preset I created for use with my custom mix, SpiritMix - Rival, designed for generations starting at 768x min and 1024 max with 2x hires fix (outputting at 1536x min and 2048x max). Keep in mind you’ll want to change these parameters to fit your specific set-up and accommodate how much VRAM you have available.
d.i1 ) So let’s go over all these settings starting from top-to-bottom (Keep in mind some settings may change with updates, so check back for updates or if ask around about settings you don’t see here!)
> Use Static Shapes : This option makes the resolution/size and batch parameters static, meaning if you set "768x768 batch size 1" then it will only work/trigger when generating at 768x768 with a batch size of 1, and will error out if you try to use this Unet preset with anything else. Static Unets are faster than dynamic ones but come at a massive convenience and storage space cost since you'd need several to cover all possible resolutions you may use. (NOTE : If we use hires fix while having "Automatic" in SD Unet, it may try to use a static Unet for the base gen, then unload and load another Unet when doing hires fix, which will drastically slow down the generation; using an SSD for storing models does help here but it's not optimal. The current version seems to prioritize static Unet models, so I suggest creating a dynamic one like the one I show above to accommodate both base and hires fix gens, avoiding the use of static Unet models until this is fixed.)
> Batch Sizes : Set the smallest, optimal, and largest Batch Size that will work for this Unet preset. Using an increased batch size can increase speed, however this varies depending on hardware. Unless you are making this a static Unet for mass image gen testing, I recommend keeping optimal at 1. (I typically left it at 1-1-4, though setting it strictly to 1-1-1 does improve performance a small amount)
> Height/Width : Sets the smallest, optimal, and largest Height/Width that will work for this Unet preset. If you "Use static shapes" these will be merged into just Height/Width with no min/optimal/max. If you're using dynamic ("Use static shapes" is disabled) then I recommend setting min to whatever your intended lowest resolution is, and max to double your intended max base gen size. In this example I may generate 768x1024 with 768 being the lowest I will go and 1024 the highest, so my min is 768 and my max is 1024 x 2 = 2048. Optimal is a bit of guesswork, however I found my speeds were not really affected at larger resolutions (where it mattered) and setting 2048 actually decreased performance during hires fix, so I set it to the median, or double my intended min resolution, so 768 x 2 = 1536 in this case (there's a small worked sketch after this settings list). (Optimal is supposed to be where the Unet performs its best, however this hasn't really shown up in testing so far; we'll need more testing to see how it behaves and make better recommendations)
> Prompt Token Count : Sets the smallest, optimal, and largest token 'block' count that will work with this Unet model. 75 is the default block size for Stable Diffusion and min is best left there. I don't currently have a recommendation for Optimal (currently lacks testing), but I did increase the max token count to 225 (3 x 75-token blocks) so this Unet will work with larger prompts. If you're unfamiliar with Tokens, those are the word bits our prompt gets broken into so Stable Diffusion can interpret what the prompt 'says' to create; the longer your prompt, the more tokens. Your current token count for a prompt shows in the top right (it starts at 0/75, and every 75 tokens a new block is added and the second number increases by 75). If that number exceeds your set "Max prompt token count" then the Unet won't work properly.
> Force Rebuild : Forces the preset to process even if one was already detected. Useful to enable if a preset refuses to export because the extension thinks it already exists (even though it doesn't), or if a current preset stopped working/got corrupted during export.
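To make the min/optimal/max math above concrete, here's a tiny sketch of how I arrive at my dynamic profile numbers. This is just the rule-of-thumb arithmetic from this guide written out in Python, not anything the extension calculates for you; the 768/1024 values are the example from my own preset.
import math

def dynamic_profile(base_min, base_max, hires_scale=2):
    # min covers the smallest base gen, max covers the largest hires fix output,
    # optimal is my median-ish guess (double the intended min base size)
    return base_min, base_min * hires_scale, base_max * hires_scale

def max_token_count(prompt_tokens, block=75):
    # Prompts are handled in 75-token blocks, so round the longest prompt you
    # expect up to a whole number of blocks
    return math.ceil(prompt_tokens / block) * block

print(dynamic_profile(768, 1024))   # -> (768, 1536, 2048)
print(max_token_count(180))         # -> 225 (3 blocks of 75)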
e. ) With the parameters set and double-checked, we can finally click that "Export Engine" button at the bottom. This will start generating a preset we can then use. This should be much faster than when we created the "default" ONNX export, but check the console for progress and let it finish. Once it's finished we can hit the refresh button next to "SD Unet" and it'll then work for gens when we select either Automatic or that specific model's Unet selection.
f. ) Now let’s generate with TensorRT enabled! Let’s get back to the txt2img tab (TensorRT does work with img2img as well, though we won’t cover that in this article) and enter in our params. Here’s my set-up, but make sure you have SD Unet set to Automatic or your UNet model’s corresponding name which should be “[TRT] model name” with the folder added to the front (grr), mine looks like “[TRT] SD15_SpiritMix - Rival (2D Anime)”. Go ahead and set up what you want and hit Generate like normal, and get READY FOR THE SPEED! (or an error if we screwed up >.<)
f.i1 ) The resolution Height/Width set (for both base gen and hires fix) must be a multiple of 64, or else it will error out or not work properly. (I didn’t actually test this as I keep x64 as a baseline, it’s a good idea in Stable Diffusion to always request images at multiples of 64 for technical reasons)
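If you ever need to sanity-check a resolution, the requirement is just that both dimensions are multiples of 64; here's a tiny illustration of the arithmetic (nothing you need to actually run, just the idea):
def round_up_to_64(x):
    # Round a requested dimension up to the next multiple of 64
    return ((x + 63) // 64) * 64

print(round_up_to_64(1000))   # -> 1024
print(1536 % 64 == 0)         # True, so 1536 is a safe hires fix target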
f.i2 ) The Unet should show loading in the console, if it doesn’t show the loading block of text then it may have failed to apply (may even say no Unet found). In this case you’ll want to hit the refresh button next to the “SD Unet” selection and reload the checkpoint (can do this by selecting and loading another checkpoint, then selecting your intended checkpoint again)
f.i3 ) Short talk on "Deterministic Behavior" (we'll cover this in more depth in the "Technical Details and Testing" section). TensorRT does, in fact, change seed, as in, with the exact same parameters you will get a different generation with TensorRT on versus TensorRT off. The difference isn't enough to change a composition, but it is enough to change a generation in either a positive or negative way, so you can't use TensorRT to replicate old gens with the new speed. If the results have worse 'RNG' or change things in an undesirable way, you'll need to disable TensorRT to gen them. TensorRT is, however, "same version" deterministic, meaning as long as your versions, Unet, settings, and parameters are exactly the same, the gen will be the same. So if, for example, I run the parameters I posted above 100 times, I would get the same exact image 100 times even if I reboot my PC, restart the webUI, etc. However, it's also important to know that when generating the Unet model for TensorRT, the results can (and most likely will) change, meaning if you need to regenerate the Unet model for any reason the seed will then change again, so you can't reproduce the exact images from the last used Unet.
f.i4 ) The current version of TensorRT can do some weird stuff with metadata with certain extensions. When using something like the “PNG info” tab, double check that the metadata is, in-fact, the same. (I had an issue where it would double the positive prompt for some reason, I’ve since cleaned my extensions and this no longer happens. *shrug)
f.i5 ) VAE spikes still happen, you’ll notice on my image it reports 14.8GB was used, this is because the image export at the end of a process has to load in and process the gen with the VAE and this is not optimized by TensorRT in the current version. This step is unavoidable so if you’re extending beyond your normal VRAM limits you may still run into issues like the gen taking forever at the last 1-2% or giving an OOM (Out Of Memory) error and failing to export.
g. ) Now just revel in the all mighty glory of TensorRT and Tensor Core Technology! Also... kitty <3
<-<>-> > > <-< 5 >-> < < <-<>->
<>> Troubleshooting and Issues <<>
<<> NOTICE <>>
Information and issues, as well as any fixes provided, may change or have updated solutions as time goes on. Make sure you’re using the latest version of Automatic1111 and the TensorRT extension for the best possible experience!
Some fixes are being added to the TensorRT GitHub page, and they are actively working on hotfixes, so it may be worth a look! - https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT
<>> TensorRT “install.py” fails to run, TensorRT fails to install properly <<>
or
<>> TensorRT was working but has broke and won’t re-install <<>
If you try to install the plugin and see "Error running install.py for extension" with "Error code:1" at the bottom, or if your plugin seemed to install but you get "Error loading script: trt.py" ending with "ModuleNotFoundError: No module named 'tensorrt_bindings'" or similar errors at any point, this fix 'should' resolve the issue, though the issue may happen again in the future. These issues are caused by TensorRT not installing correctly or somehow messing up the requirements; if your install eventually breaks, you can re-run these steps and the environment should go back to working as before.
Your install may not have finished properly if you also get the messages reporting “The procedure entry point-” errors, so running these steps before trying another fix is a good idea.
Original fix is from here (thank you racerx2oo3!) - https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT/issues/27#issuecomment-1767570566
I'll be going over these steps a bit more in-depth and explaining each one. I'm not entirely sure precisely why the error occurs; it has something to do with not properly entering the venv (the venv folder, which stands for Virtual Environment) and maybe something to do with Python packages not being updated, but at least we can fix it.
To summarize, we need to navigate a cmd window to our Automatic1111 install and enter the virtual environment (venv), then manually install the packages the extension's install script is supposed to install, then remove the conflicting cuDNN version. So let's get to it!
a. ) First, since getting this error means you’ve tried to install the extension, we’ll need to remove the old extension files. Inside Automatic1111 > extensions we need to remove the extension folder “Stable-Diffusion-WebUI-TensorRT” (You can just delete it, we’ll re-acquire it later).
b. ) Next we’ll need to remove the venv folder found in our Automatic1111 install (You can also just rename it if you’re concerned, but we’ll be rebuilding our venv next step)
c. ) Now we want to run “webui.bat” to start up stable diffusion without any command arguments (so there’s no conflicts) to rebuild the venv. Once everything is installed and webUI tries to auto-open, showing in your browser, close both that browser tab and the console window running webUI/Stable Diffusion. (You MUST close these or else later we may fail to enter the Virtual Environment)
d. ) Let's grab the path to your Automatic1111 install since we'll need it in a moment. You can do this by navigating to your Automatic1111 folder, right clicking it, and selecting "Copy as path", or by right clicking the address bar at the top of the file browser and selecting "Copy address".
e. ) Open up the console using CMD.exe (this specifically needs to be CMD.exe and NOT PowerShell, if you don’t know the difference, just type in “cmd” into your windows search bar and it’ll show up).
f. ) We'll be using the console to install the requirements for TensorRT manually, but first we must navigate to the proper location. In the console type "cd /d " followed by the path to your Automatic1111 install that you copied earlier, then hit enter. ("cd" navigates to a directory, /d allows a possible drive change.) You should now see the path to your Automatic1111 install on the left instead of your user folder like before.
g. ) Once it shows the proper directory, we need to enter the Virtual Environment. We do this by running "activate.bat" within the environment, which sets the proper variables: type "venv\Scripts\activate.bat" into the console and hit enter. After a short pause it should show "(venv)" before the path on the left.
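For example, if your install lived at C:\stable-diffusion-webui (that path is just a placeholder, use your own), steps f and g would look like this in the console:
cd /d "C:\stable-diffusion-webui"
venv\Scripts\activate.bat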
h. ) We’re now in the Virtual Environment, so we’ll update pip, install the cudnn package, install the tensorrt package, uninstall the cudnn package (or else we’ll get those message pop-ups and compatibility issues) then exit the Virtual Environment. We’ll be adding the “--no-cache-dir” to our commands to ensure the pip/python cache doesn’t interfere. After each command let the install for each component complete before continuing to the next command, you’ll see the path from before when it’s complete. Run this list of commands in order...
python.exe -m pip install --upgrade pip
python -m pip install nvidia-cudnn-cu11==8.9.4.25 --no-cache-dir
python -m pip install --pre --extra-index-url https://pypi.nvidia.com/ tensorrt==9.0.1.post11.dev4 --no-cache-dir
python -m pip uninstall -y nvidia-cudnn-cu11
venv\Scripts\deactivate.bat
webui.bat
After running these commands the requirements should be installed, we've exited the Virtual Environment, and the webUI is launching again.
i. ) Once the webUI opens back up, head to the extensions tab and re-install the extension. (Same as the installation steps, here's the URL for convenience "https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT" ). Once it's done installing/processing, use the Reload UI button just like in the install process.
<>> When trying to generate a Preset, "ONNX not found" shows up and no Unet ever gets created / some other error shows up <<>
Tried to create a preset without generating a “Default” preset first :
If you tried to create a Preset without exporting using "Default" first, the ONNX Unet may not generate properly; you must generate using the "Default" preset first to create that ONNX Unet. After a failed preset export TensorRT can also get stuck, and you may need to restart SD/webUI by closing the console and browser tab, then reopening them.
Default/Preset still fails to generate :
You may have --lowvram or --medvram in your command line arguments; these can cause errors in Automatic1111 v1.6.0 since they change model loading in a way TensorRT doesn't expect. Remove these two command line arguments and re-launch your webUI by closing the console and re-opening it, and the default ONNX Unet and Presets should then generate properly. Once you have the Unet models generated you can re-add --lowvram or --medvram to your command line arguments, though keep in mind they do affect performance and may not be necessary with TensorRT depending on your system and settings.
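If you're not sure where those arguments live, they're normally set in webui-user.bat in your Automatic1111 folder. A hypothetical before/after might look like this (--xformers is just a stand-in for whatever other arguments you may already have):
rem Before (can break TensorRT exports on 1.6.0)
set COMMANDLINE_ARGS=--medvram --xformers
rem After (remove --medvram / --lowvram while exporting engines)
set COMMANDLINE_ARGS=--xformers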
<>> cuDNN version mis-match errors and issues <<>
By default Automatic1111 doesn’t pull the most recent cuDNN since updating cuda related resources can change seed and may have some compatibility issues on older hardware, however the TensorRT plugin (and TensorRT in general) uses a newer version of cuDNN. This mismatch can cause a number of issues, most shouldn’t affect TensorRT itself and will just affect speeds, but some may encounter errors that stop the extension from working. This mismatch may also cause error boxes to pop up when starting the webUI. (Note: This error can also be caused by missing files, though usually you get a different error before this one would get a chance to crop up)
(Potential Fix)
Updated Fix: The TensorRT plugin is intended to install a temporary version of a newer cuda library than default, however if the plugin doesn’t install correctly it does not uninstall that temporary library, which is what causes this issue (otherwise this notification wouldn’t show up). Check the troubleshooting issue above called “TensorRT “install.py” fails to run, TensorRT fails to install properly” and run those steps instead. (I’ll be leaving this fix here just in-case it becomes relevant or if people still have issues with this message showing up)
Outdated Fix:
For this error we need to acquire the same version of cuDNN that the TensorRT extension installs and uses. We can see what version installed by looking inside the extensions files, install.py to be exact, located at “Automatic1111 > extensions > Stable-Diffusion-WebUI-TensorRT > install.py”. You can open this file up in notepad and see the line “launch.run_pip("install nvidia-cudnn-cu11==8.9.4.25", "nvidia-cudnn-cu11")” (Note, version it shows may change in the future, you’ll want that version, newer versions may also work)
We can see in this example it’s cudnn11 8.9.4.25, so I will need that version in this specific case. To get that version you’ll need a Developer Account with Nvidia (don’t worry, it’s free and non-exclusive, just need to give your intent) and download the library through https://developer.nvidia.com/cudnn (make sure to check the archive link to see all versions).
Once you download the archive, extract it and enter the "bin" folder; we'll be copying these files over to our torch instance for Automatic1111.
Open another file browser, navigate to "Automatic1111 > venv > Lib > site-packages > torch > lib", and copy the cudnn files from the package we downloaded earlier into this folder, overwriting the existing ones. (Note : You can back up the previous versions of the cudnn files if you'd like; changing these files may change seed output so it's a good idea if you intend to roll back at any point, however we did make a backup of our venv already)
After these are copied over you should be able to start your webUI and the message boxes should no longer appear! (if they do, something else broke Q.Q)
<>> Models refuse to load after a WebUI Restart <<>
(Update: may be related to certain models and pruning on a specific Nvidia Driver Version)
This is a strange one, and I can’t find a direct cause, only a temporary work-around/fix. When loading up the webUI errors will begin to pop up saying “Stable diffusion model failed to load” ending with “RuntimeError: Boolean value of Tensor with more than one value is ambiguous” and then no models will load up afterwards. Disabling the TensorRT extension fixes this error, but I have another work around if you’re affected by this. (though finding the source is still better)
(Potential Fix)
Close the webUI and open the “config.json” file (Automatic1111 > config.json) in any text editor like notepad or notepad++ and delete the hash found in the setting “sd_checkpoint_hash” and delete the checkpoint name in “sd_model_checkpoint” so that they read "sd_checkpoint_hash": "", and "sd_model_checkpoint": "", respectively. Then save the config and load up the webUI again, it should skip the initial model load from cache and work as intended, just need to select your desired checkpoint before generating.
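After the edit, the two relevant lines in config.json should look roughly like this (the rest of the file stays untouched):
"sd_checkpoint_hash": "",
"sd_model_checkpoint": "",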
<-<>-> > > <-< 6 >-> < < <-<>->
<>> Determinism and Quality Comparison <<>
As mentioned in the article, TensorRT is only deterministic against its own "same version" self, meaning the results will not match the output without TensorRT, nor the output of another version, including another GPU with its own TensorRT install or even another Unet generated from the same model on the same machine ("Same-Version" also includes using the same generated Unet). With the current version of TensorRT, each install will be entirely unique, not just because of how the Unet generates but also seemingly related to what GPU the user has, including vendor, model revision, BIOS, and potentially the Nvidia driver as well. Add in that there are also other seed-changing updates like CUDA updates and altered scripts, and we basically have another "can't replicate" images situation.
Of course this isn't new and not really a big deal on the surface; we're used to seeds changing frequently, and most people are using optimizations these days that can drastically change results anyway, so TensorRT will just be another one to add to the list. However, in my opinion at least, TensorRT is the first to actually give a drastic benefit that justifies a bit of annoyance in specific cases. TensorRT is also the first one I've tested where the 'quality' of gens is comparable over a larger sample size; some RNG is always at play which can mess with hands or change artifacts, but overall the quality evens out, the images are just as sharp as without optimizations, and if I was shown 100 images with TensorRT-made images mixed in, I'd never be able to even guess a single one.
To be clear, we should never expect to replicate results exactly across different hardware and revisions; nearly anything we modify from default can change a seed, and we need to specifically set certain settings to even ensure deterministic results from the start. But it is disappointing when we can't even transfer to another machine to, say, bulk upscale to test the output as a whole. Outside of memory optimizations like XFormers or SDP-mem-attention though, most seed changes are related to stuff we have more direct control over, like software versions, and if we needed to revert to get old results we had the option, but TensorRT seems to change things in a way where that isn't currently possible. But let's get into some test results before I ramble too long.
Methodology :
To ensure my results are not from any fault or random errors on my end, I've completely uninstalled all python, pip, git, and cuda packages to ensure everything is 100% clean, then did a fresh new install of Automatic1111. I used the deterministic settings I outline in my starting guide here - https://www.youtube.com/playlist?list=PL503HskLvxy3rBEE8zTil37WQF_gJ0kNq - but the only thing different from default that affects seed is "RNG:CPU", which is set to ensure the seed is the same across all Nvidia GPU systems (GPUs from other vendors produce different seed results). These settings should be deterministic across the same webUI, Torch, and cuda versions using Nvidia GPUs as long as no memory optimizations are set. Additionally I'll be doing some speed comparisons, so for those I am using Nvidia driver version "31.0.15.4584" (545.84 Game Ready, yes I game >.>) and will be using Google Chrome as my browser, with no GPU heavy tasks in the background (just basic stuffs, like Discord, file browsers, naughty images, the standard).
The extensions installed are...
Dynamic Prompts (shouldn’t affect seed, used to import old metadata for version comparisons since it changes metadata)
ControlNet (doesn't affect seed); we'll be testing with ControlNet in some images
Regional Prompter (doesn't affect seed); we'll be testing with regional prompts in some images
Model Converter (doesn’t affect seed), used to quickly test precision differences and baked VAE issues
TensorRT (obviously)
Hardware Used (which may affect software/seed) :
AM5 platform (7700x)
Samsung 870 Evo Sata SSD for model storage (including Unet)
Asus RTX 4090 TUF ( Bios : RTX4090 VB Ver 95.02.3C.00.AS05)
I’ll be using the program “WinMerge” to create visualizations of the “Math” differences.
<>> NOTICE <<>
Performance here isn't cross-comparable to performance numbers from other people; this is NOT designed to be a benchmark of the 4090 and Stable Diffusion performance, rather we are just looking for the differential between TensorRT on/off. While certain factors like loading, parameters, etc. will affect the overall performance, according to reports I've seen and people I've spoken with, the performance delta (percentage difference) should translate directly up and down the product stack. In other words, if a 4090 gets 1.8x performance (a delta of 1.80, aka 180%) then a 3060 should see the same benefit, even if the actual iterations per second are on different scales. Note I will primarily be using the settings shown in the preset set-up section earlier for the TensorRT Unet generation.
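As a quick worked example of that delta, using the first test below: 18.49 it/s with TensorRT versus 13.39 it/s with SDP-no-mem is 18.49 / 13.39 ≈ 1.38, or roughly a 38% uplift on the base gen, and 3.87 / 2.91 ≈ 1.33 on the hires pass.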
Ok, so let's start with a very clear example where the seed has changed. This will also help explain my visual methods.
DPM++ 2M SDE Exponential
768x768 R-ESRGAN 4x+ 1536x1536
SDP-no-mem
Base Gen : 00:03 13.39 it/s
Hires Gen : 00:34 2.91 it/s
TensorRT
Base Gen : 00:02 18.49 it/s
Hires Gen : 00:25 3.87 it/s
So it's pretty clear these images are different, and I intentionally showed an example where the difference is very noticeable for example's sake. In this specific case the no-TensorRT version shown on the left did better, but as we'll see later this won't always be the case, and often the differences are not a question of quality but just the pure RNG of the gen.
Left = SDP-No-Mem > Right = TensorRT ON
But now let’s flip them around and turn on the overlay to see the “visually significant” difference. (CD Threshold: 20%)
Now you can see them vertically compared with the “visually significant” differences highlighted to see exactly what was actually changed in a mathematical way. You can see most of the comp stays the same (clearly) so most gens will come out with the same basic form and comp, however specific shapes and maybe even specific objects may change, like down in the bottom right the stone path becomes a pair of rocks. But you may not have even noticed those rocks, or how the hair on the left side of the head forms a better point on the side of the face and doesn’t have an artifact below the ear.
We’ll have more examples explaining these differences and how I came to my conclusion on quality, but let’s explain what “visually significant” difference really means. Here’s a raw difference chart with no Calculated Difference Threshold (CD Threshold).
Now this data doesn't tell us much, other than yes, the images are different. Sure, there are some blips of pixels that are the same here and there, but overall this doesn't really give us much useful info; we 'know' the images are different, we want to see what's visually different and 'how' so we can better understand how TensorRT affects our output. For these tests I used a CDT of 20%, which is fairly aggressive, but it also ensures we don't miss small details someone would easily notice even if they wouldn't matter the way they would in a photo. (Comparing AI is very different from other use cases of this technique, but we won't discuss all that here.)
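For the curious, here's a rough sketch of what such a "visually significant difference" mask boils down to, assuming two same-sized PNG gens. This is just the general idea written in Python, not WinMerge's actual algorithm, and the file names are placeholders:
from PIL import Image
import numpy as np

# Load both gens as RGB arrays (int16 so the subtraction can go negative)
a = np.asarray(Image.open("gen_sdp_no_mem.png").convert("RGB"), dtype=np.int16)
b = np.asarray(Image.open("gen_tensorrt.png").convert("RGB"), dtype=np.int16)

# Flag any pixel where some channel differs by more than ~20% of the 0-255 range
threshold = 0.20 * 255
mask = (np.abs(a - b) > threshold).any(axis=-1)

print(f"{mask.mean() * 100:.1f}% of pixels differ 'significantly'")
Image.fromarray(mask.astype(np.uint8) * 255).save("diff_mask.png")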
Now let’s do a quick upscaled comparison. I won’t be doing both for most images, instead opting for whichever provides more interesting data, but for this gen you can see how the upscaler “exaggerates” the differences but the overall quality between the images is virtually identical.
Of note, you can see how the lips specifically exaggerated their differences, and same for the rocks in the bottom right as it ‘refined’ the image. Latent mode upscalers will differ in how much of a difference this makes but for this mostly safe upscaler it doesn’t do too much.
This next comparison is tricky, because it uses ControlNet. With the current version of cuDNN, ControlNet straight up doesn't work (it gives an error), however if you update your cuDNN packages you can get ControlNet running within certain resolutions (currently there's a memory run-away bug after a certain point), which lets us compare images with ControlNet. The reason we'd want to do this is to test how the image "resolves", as in, we want to see how well lines and contours form compared between the non-optimized baseline and whatever optimization we're testing. For some memory optimizations, like XFormers, you may see minor differences in refinement for some samplers, however with TensorRT you can see the only visually significant difference is just outright noise. While this noise may be noticeable on close inspection, the actual refinement and quality of the image is identical in this case. In other words, if we lessen our reliance on "seed" and RNG for our image, the results are much closer to the deterministic non-optimized image.
DPM++ 2M SDE Exponential + ControlNet
768x1024 R-ESRGAN 4x+ 1536x2048
Speeds unrecorded, hires failed for TensorRT
Left = TensorRT - Right = SDP-no-mem
Left = TensorRT - Right = SDP-no-mem
Ok, but how does TensorRT compare to memory optimizations? Here I'll compare using SDP with memory attention (XFormers has issues/bugs with certain samplers like I alluded to before, so to be as fair as possible we're using SDP). What I find interesting in this example is that both optimizations change the images by about the same amount by volume, however TensorRT and SDP-with-mem-attention are actually much closer to each other. I feel TensorRT did better at keeping the character's colors consistent over a larger number of gens, where SDP seemed to change colors more often, but both produce a different image and this could entirely be coincidence (RNG is heavy here). TensorRT did better at keeping the comp in this instance, but my favorite image out of these 3 is probably SDP-with-mem-attention, though this is entirely down to RNG and was not always the case.
DPM++ 3M SDE Karras
512x768 R-ESRGAN 4x+ 1024x1536
SDP-no-mem (Deterministic Across Machines)
Base Gen : 00:02 18.41 it/s
Hires Gen : 00:11 3.48 it/s
SDP-with-mem (Memory Optimizations)
Base Gen : 00:02 18.18 it/s
Hires Gen : 00:11 3.61 it/s
TensorRT (Same Version Deterministic)
Base Gen : 00:01 24.32 it/s
Hires Gen : 00:09 4.27 it/s
Left = SDP-no-Mem - Center = TensorRT - Right = SDP-With-Mem-attention
Comparing SDP-no-mem with TensorRT you can see how the background changed quite a bit and the sleeves changed shape, but overall the comp is pretty close.
Left = SDP-no-mem - Right = TensorRT
Comparing SDP without mem and with mem we can see the character's clothes changed color along with some major comp changes.
Left = SDP-no-mem - Right = SDP-with-mem-attention
But now comparing TensorRT with SDP-with-mem-attention we see that they are actually much closer to each other than either is to SDP-no-mem. SDP-with-mem-attention still changed the comp more than TensorRT did, but both are perfectly acceptable in terms of quality. What's most interesting to me is how it kept certain figures that differed from SDP-no-mem, like the sleeves being the same shape, or the trees in the back left region being very similar.
Left = TensorRT - Right = SDP-with-Mem-attention
So I can hear you saying, "Oh, so TensorRT is basically the same as SDP-with-mem-attention, right?" and, well, no. While XFormers and TensorRT are same-version deterministic between sessions, SDP-with-mem-attention doesn't appear to be, meaning if you close SD and start it back up, you may get different results. Changing the VAE can also change your SDP-with-mem-attention results, for whatever reason. (Is this a bug?)
Left = SDP-no-mem with animevae.safetensor (same as Baked in VAE)
Center = SDP-with-mem-attention after restart (VAE on Automatic, using Baked in VAE)
Right = SDP-with-mem-attention with animevae.safetensor (same as Baked in VAE)
Surely TensorRT doesn’t have anything silly like that, right? Wrong, it’s called “file name.”
Yeah, that’s right, I said FILE NAME!
Short story: before testing I found one of my models was having some issues loading, so I went ahead and started using a new unpruned version with a properly baked-in VAE. I originally merged it onto the HDD I use for mixing models, then copied it over to my SSD for actually running gens, but I ran into something interesting. Generating the TensorRT Unet models is not actually deterministic, so when I generated a new Unet model for the different versions of the model in their different locations, I got different results for each one, including when I copied the model from my hard drive over to my SSD for proper loading and generating. So why not just rename the Unet? Well, it uses a hash which includes both the file name AND the folder! This means it would not load the Unet even if I moved the file and renamed the Unet to match. I also tried renaming the file with a new hash, but that just resulted in the Unet not working at all and breaking the model.json.
All that is rather frustrating, and completely arbitrary. Why, in any scenario, would you ever use the FILE NAME and PATH as part of its hash or as a hard coded identifier?
For a sanity check I did make sure the 2 models I was loading and using were, in fact, 100% the same. They are, and if I move them around I get the same results with SDP-no-mem as its deterministic behavior carries over; both SDP-with-mem-attention and TensorRT did not. It appears, for whatever reason, the file name matters.
Anyways... here’s some comparisons.
Left = Broken VAE - Center = From SD15 Folder - Right = From external drive
As you can see, these turned out completely different, yet all are using TensorRT with the same prompt and same settings. They are relatively similar, but it's clear TensorRT in its current form has some sort of bug or oversight that uses all the data instead of only the relevant data when creating the Unet used for optimization. Maybe that's by design and it's something internal doing it, but I doubt this couldn't be fixed if they really tried, and I really hope they do! I also tried restarts and reboots to see if the gens changed, but they were same-version deterministic; it's just that same-version also refers to file location and name...
But I got distracted; this comparison was originally to show Regional Prompter in Latent mode, which is notoriously slow and nearly unbearable. Does TensorRT actually fix this aspect? Yes, and this is where that "up-to 2 times speed!" comes from for RTX 4090 owners. Up until now most workloads were already fast with a 4090 and didn't 'really' push the 4090 like they would lower end hardware, but Regional Prompter's latent mode is heavy once you start getting into high latent upscales, and the difference here is VERY noticeable, rubbing right against that 2x speed! (I like TensorRT on this one)
DPM++ 2M SDE Exponential
768x1024 Latent (nearest) 1536x2048
SDP-no-mem
Base Gen : 00:13 3.76 it/s
Hires Gen : 01:46 2.13 s/it (0.47 it/s)
TensorRT
Base Gen : 00:09 5.25 it/s
Hires Gen : 00:53 1.08 s/it (0.92 it/s)
Left = SDP-no-mem - Right = TensorRT
Left = TensorRT - Right = SDP-no-mem
I've been mostly using K Samplers since those are what work best for my models, but what about Heun, Euler, or even ancestral samplers??? It's mostly the same story visually, though the speeds are interesting: when using the latent upscaler the speed difference actually wasn't as substantial as before. (Not sure why that is, but Heun was already super fast to begin with)
Heun
512x512 Latent 1024x1024
SDP-no-mem
Base Gen : 00:03 14.10 it/s
Hires Gen : 00:12 4.05 it/s
TensorRT
Base Gen : 00:02 24.78 it/s
Hires Gen : 00:10 4.78 it/s
Left = TensorRT - Right = SDP-no-mem
Left = SDP-no-mem - Right = TensorRT
One thing to note before we get into my thoughts, and that's memory usage. You may have noticed I stated at the start that TensorRT can decrease memory usage, and that is 'sort of' true, however the amount it saves varies heavily depending on the Unet models you generate (dynamic vs static, dynamic size, model size, etc.), and the preset I use doesn't decrease memory usage much for a single gen. When using it for batches it decreases quite a lot, and if I use static the drop is drastic (I was able to gen 4 768x1024 images with a 2x latent upscale to 1536x2048 at once using batch size! Took just over 3 minutes, but it did it!), however there are issues using static, such as the loading required between the base gen and hires fix.
<>> Conclusion <<>
But what are my overall thoughts given our findings? Well, TensorRT is really promising and I think I may consider using it to produce large latent upscaled images, and with SDXL it'll be much appreciated, but there needs to be some fixes in the pipeline for TensorRT to really become a standard for Nvidia GPU users, and this is also why it wasn't all the rage before the extension. Currently TensorRT is still far too technical for the average user; the extension helps a ton and got me to finally deep dive into it, but overall it's still too 'complicated' through both UX and technical hurdles. The extension itself is also far too buggy and inconsistent at the moment; if it was less buggy it would be easier to recommend. The drive space requirements also make many of the benefits difficult to appreciate if you're low on disk space.
But what about the tech and its overall usefulness? As we've seen here, the results are good and there are clear advantages over the optimizations that currently exist, and speed could be even better if we had compared using static shapes and optimized the Unet to our specific tests, but I wanted to keep this as "real world" as possible, using it like I would use any other optimization. And maybe that's a fault on my end, maybe that's not what TensorRT is really for, but I don't think it CAN'T be that. It has same-version deterministic results, which is better than what most people already use, the quality is very consistent and doesn't appear to "ruin" results, and the performance boost is truly impressive! But ultimately it's just too buggy and technical in its current state to really be useful for your average casual user.
For us power users that love playing with the tech and want to push our hardware to its absolute limits, this tech appears to be THE future. For now though, it’s just a buggy extension most people won’t have the patience to put up with, even if they’d be rewarded for doing so.
<-<>-> > > <-< 7 >-> < < <-<>->
Hey, so like, this takes a lot of time and effort and I can’t always find the time to do stuff like this. But if you like this work and would like to see more like it as well as support my other projects and our awesome community, why not consider supporting me over on Ko-Fi? With more funding I could dedicate more time to awesome projects like this!
Donations! =^-^= - https://ko-fi.com/michaelpstanich
You can also check out our community discord, The Broken Chatbox!
Discord - The Broken Chatbox - https://discord.gg/h3vB7S4FEw
Thank you for reading through, and I hope this all helps! If you have questions, advice, corrections, etc., don't be afraid to leave a comment or contact me over on Discord!
Special Thanks to :
Divine over on the CivitAI Discord (I think their name on CivitAI is xumi64776? I forgot to ask >.< Helped me debug and experiment with fixes for the TensorRT Extension)
Eurotaku on both CivitAI and the Discord (Confirmed --lowvram --medvram issue)
Also thanks to everyone helping me learn more about TensorRT and making suggestions! Let's help each other make great stuffs!