Introduction
Disclaimer: I am not an expert, and a lot of this knowledge came from trial and error, since there weren't any proper, complete guides on how to do this prior to writing. This guide is based around a Windows install and use case. Please tell me if I am missing something or if a step should be added.
AnimateDiff in AUTOMATIC1111 with Stable Diffusion is possible. It is similar to ComfyUI, but instead of a workflow it is driven primarily by extensions to the WebUI. For the remainder of this guide I will refer to AUTOMATIC1111 as WebUI.
System Requirements
If you're already running WebUI locally, you are probably good to go. At a minimum, an RTX 3080 or better is recommended. The more VRAM you have access to, the better: more VRAM will generate faster or give you the capability of using higher resolutions.
Getting Started
1. Have WebUI installed. This is not a guide for WebUI installation; ensure you already have it working with checkpoints. Running with --xformers is highly recommended.
2. ffmpeg - you need this installed and added to PATH in order for GIFs to be generated and stitched together. https://www.gyan.dev/ffmpeg/builds/ffmpeg-git-full.7z will download the file you need for Windows. Extract the archive; inside you should see a few folders, and make note of bin. On your C: drive (you can use any file location, but this is typically easiest), create a folder named bin. Copy the contents of bin from the download into the folder you created on your C: drive. Press the Windows key and type "Edit the System Environment Variables" (it is part of Control Panel). Click on Environment Variables; a new window will appear. Double-click on Path under the Variables column near the top; another window will appear. Click the New button and enter C:\bin (or the location of the bin files from ffmpeg). Hit OK to close the window and OK again to close the other window. You can confirm ffmpeg is picked up from PATH with the small sketch after this list.
3. ControlNet - I spent a lot of time wondering why my video generations weren't following the video I was providing in AnimateDiff; it was because I didn't have ControlNet set up. For some reason I was unable to find ControlNet under the Available extensions tab in WebUI, so the next best solution is to install from URL: https://github.com/Mikubill/sd-webui-controlnet (installation instructions are provided at that link). Additionally, you will need to download the ControlNet models, which are available at https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main. Download every file that ends in .pth and place it in the models folder of the ControlNet extension. That file location should look something like this: stable-diffusion-webui\extensions\sd-webui-controlnet\models.
4. Deforum - the install process is similar to ControlNet. Installation instructions are at https://github.com/deforum-art/sd-webui-deforum. You can either have WebUI do the install for you or download the repository and place it in your extensions folder.
5. AnimateDiff - https://github.com/continue-revolution/sd-webui-animatediff has the required download. Download the file and unzip it; it should have a name like sd-webui-animatediff-master or something similar. Copy that folder and paste it into the extensions folder of WebUI. You will also need motion modules for this extension; they are basically checkpoints specific to AnimateDiff. https://huggingface.co/guoyww/animatediff/blob/main/mm_sd_v15_v2.ckpt is simple and seems to get me the best basic results; you can also use https://huggingface.co/CiaraRowles/TemporalDiff/blob/main/temporaldiff-v1-animatediff.ckpt. There are plenty of other options at https://huggingface.co/guoyww/animatediff/tree/main if you want a specific movement. Place the checkpoints in the model folder of the extension; the file location should look something like this: AI\stable-diffusion-webui\extensions\sd-webui-animatediff-master\model
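Before moving on, if you want to sanity-check the file side of this setup from the command line, here is a minimal Python sketch. It is my own helper, not part of any extension, and it assumes the default folder names used above, so adjust WEBUI to match your install:

    import shutil
    import subprocess
    from pathlib import Path

    # Assumption: WebUI lives here and the extension folders use the names from this guide.
    WEBUI = Path(r"C:\AI\stable-diffusion-webui")

    # ffmpeg should resolve through PATH after the environment variable step above.
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg:
        subprocess.run([ffmpeg, "-version"], check=True)
    else:
        print("ffmpeg NOT found on PATH - recheck the C:\\bin entry")

    # The ControlNet and AnimateDiff model folders should exist and contain files.
    for folder in [
        WEBUI / "extensions" / "sd-webui-controlnet" / "models",
        WEBUI / "extensions" / "sd-webui-animatediff-master" / "model",
    ]:
        if folder.exists():
            print(folder, "->", len(list(folder.iterdir())), "file(s)")
        else:
            print(folder, "-> MISSING")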
Let's also double-check everything inside WebUI. Ensure your Extensions tab shows at least these items: Deforum, AnimateDiff, and ControlNet. Just to be safe, restart WebUI by closing the command prompt running it and launching it again.
Settings
There are a few settings you need to ensure are turned on for things to work properly.
First, go to the Settings tab in WebUI and scroll down to AnimateDiff. You need to enter the path to where the motion-module checkpoints are. You've been in this folder already; you just need to tell WebUI where to look for this extension's checkpoints. The file location should look something like this: AI\stable-diffusion-webui\extensions\sd-webui-animatediff-master\model. There are additional options you can change for quality or optimization. gifsicle is exclusive to Linux, so don't enable any options related to it if you are on Windows.
Next, go to Optimizations and ensure "Pad prompt/negative prompt to be same length" is enabled. If it is not enabled, the GIFs created will use different seeds and you will end up with two of them, one for the positive prompt and one for the negative.
Generating an animated image
Let's start with a basic animation without ControlNet.
Pick a Stable Diffusion checkpoint to use. Typically, less realistic checkpoints tend to work better; DreamShaper, DarkSushi, and anime/art-themed checkpoints work well. For this example I will be using DarkSushiMix, the Euler a sampling method, and 26 sampling steps at 512x512.
Click the menu area for AnimateDiff and select a motion module; I have just the basic mm_sd_v15_v2.ckpt loaded. Save format should be GIF and PNG; you can try the other options if you have downloaded their dependencies and set them up on PATH in Windows. Ensure you tick the Enable AnimateDiff checkbox. Set the number of frames to generate and the FPS: 80 frames at 16 FPS nets you a 5-second animation, and you can do the math for any other combination of frames and FPS. If you set the frame count to 0, the extension decides on its own how many frames to generate based on the context batch size.
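If you'd rather not do the math in your head, the relationship is just frames divided by FPS (a trivial Python sketch of the arithmetic, nothing extension-specific):

    def gif_duration_seconds(frames: int, fps: int) -> float:
        # Playback length of the stitched GIF.
        return frames / fps

    print(gif_duration_seconds(80, 16))   # 5.0  - the example above
    print(gif_duration_seconds(180, 8))   # 22.5 - the walking-woman example further down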
Here is an example of the prompt and settings I was using, for reference.
The output from this was quite jumbled and not very cohesive, as you can see here: https://civitai.com/images/3636926 - a bunch of cars moving around and a lady morphing through them trying to open them all. The upside is that the generation is relatively quick: a single image may only take 3 seconds to generate, but generating 80 of them and stitching them together takes my computer about 4 minutes. Depending on the prompt, it is possible to get decent results without ControlNet, like this one: https://civitai.com/images/3636623, where you can clearly see a woman walking and turning her head. The settings for that generation were 180 frames at 8 FPS, and it took about 10 minutes to generate.
Additionally, you can use the img2img tab to use an image as the starting frame for an animation. The rest of the process is the same when using img2img.
To get a smoother output even though it is only generating at 16 FPS, you can use FILM frame interpolation. This puts intermediate frames between every generated frame to greatly increase the smoothness of the animation, and it does a really good job of it. The Interp X setting controls how many frames are interpolated: if you generated 16 frames and have Interp X set to 10, the final GIF will be 160 frames. This can dramatically increase the runtime of your animation.
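To make that concrete, here is the same arithmetic as a quick Python sketch (the function and parameter names are mine, loosely mirroring the UI labels; they are not part of the extension):

    def film_output(generated_frames: int, interp_x: int, fps: int):
        # FILM multiplies the frame count; at the same FPS the GIF also plays longer.
        total_frames = generated_frames * interp_x
        return total_frames, total_frames / fps

    frames, seconds = film_output(16, 10, 16)
    print(frames, seconds)   # 160 frames, 10.0 seconds at 16 FPS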
Something to keep in mind is that the context batch size determines how cohesive your animation is going to be. With a context batch size of 16, every 16 frames the prompt is reevaluated and the generation tries to follow the context of the prompt (the model is trained on 16; you can use other settings, but don't expect the results to be satisfactory every time). So if your prompt is very busy and you want to generate something longer than about 2 seconds (more than 16 frames at 8 FPS with a context batch size of 16), you should expect to see some changes throughout the animation. You can see that in my previous example with the woman walking: when her clothes change color, that is the point at which the context of the prompt is being reevaluated. If you are a stickler for attention to detail, this is something to keep in mind, as it affects the cohesiveness of the animation.
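Under this simplified "reevaluated every N frames" view (my own illustration in Python, not the extension's actual scheduling code), you can estimate where those transitions may land:

    def context_windows(total_frames: int, context_batch_size: int = 16):
        # Frame ranges covered by each context window under the simplified view above.
        return [(start, min(start + context_batch_size, total_frames) - 1)
                for start in range(0, total_frames, context_batch_size)]

    # The walking-woman example: 180 frames -> 12 windows, so roughly 11 points
    # where the prompt context is refreshed and details (like clothing color) may shift.
    print(context_windows(180))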
Generation with ControlNet
ControlNet tries to keep the generation within guidelines that you set, and it is pretty good at doing so. With ControlNet we can upload a video to reference in the AnimateDiff section. Grab your favorite TikTok or other short video that you want to use as a basis. Then go down to the ControlNet section, where there are three units: Unit 0, 1, and 2. There's no need to upload an image to ControlNet since you've already provided a video in AnimateDiff. Click the Enable checkbox for ControlNet under the upload area. For control type, use whatever you like; I've found lineart puts out pretty quick generations, while tile/blur and OpenPose take much longer. Feel free to play with the other control options as well to find what works best for what you are trying to achieve with your animation. And that's pretty much it: tell it to generate and grab a meal to eat. So far, in my experience, using ControlNet takes much, much longer to generate an animation - hours instead of minutes. But the upside is that it follows the video you provide as a guide very well, and the results are far better than without ControlNet.
For example, I used this video as a reference in AnimateDiff:
Then I enabled ControlNet with the lineart model, and this is the output after about 20 minutes: https://civitai.com/images/3640472. It isn't perfect, but you get the idea: it tries to maintain the essential shapes and combine them with your prompt. Different SD 1.5 models and prompts can get you better results with lots of tweaking and patience.
It is possible to create animations with SDXL as well; this is the extension you will need for it: https://github.com/guoyww/AnimateDiff/tree/sdxl
I have not tried making any with SDXL yet, but the install and usage process should be much the same.
Great thanks to @lucyhawt for sharing information about how to get smoother generations in one of their posts, and to @impactframes for information about how to get things started in his YouTube video.