Type | Workflows |
Stats | 2,461 |
Reviews | (180) |
Published | Oct 31, 2024 |
Base Model | |
Hash | AutoV2 D15A9CA2D1 |
11/09/24 - I've updated the THUDM/CogVideoX-5bI2V workflow to include an additional workflow that should be easier to follow, but I've kept the original in there as well in case you prefer it. Still trying to figure out how to make the CogVideoXFun sampler ones work on the latest version.
11/08/24 - The workflows for CogVideoX_5b_fun_GGUF_Q4_0 and Kijai/CogVideoX-Fun-5b use the CogVideoXFun Sampler, which received an update that breaks them. If you're experiencing this issue, I've gotten it to work by doing the following:
Open terminal in: ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper
Run the following command: git checkout 0e8f814
This will revert CogVideoXWrapper to a version prior to the recent update, restoring compatibility with the workflows.
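If you'd rather script the revert than type it by hand, here is a minimal Python sketch of the same two steps. The commit hash and folder names come from the note above; the function names and the ComfyUI root path you pass in are assumptions for illustration, and it expects git to be available on your PATH.

```python
import subprocess
from pathlib import Path

# Commit hash taken from the note above.
PINNED_COMMIT = "0e8f814"


def checkout_command(commit: str = PINNED_COMMIT) -> list[str]:
    """Return the git command that pins the wrapper to the given commit."""
    return ["git", "checkout", commit]


def pin_wrapper(comfyui_root: str) -> None:
    """Run the checkout inside custom_nodes/ComfyUI-CogVideoXWrapper."""
    repo = Path(comfyui_root) / "custom_nodes" / "ComfyUI-CogVideoXWrapper"
    # Fail early if the wrapper was not installed as a git clone.
    if not (repo / ".git").exists():
        raise FileNotFoundError(f"No git checkout found at {repo}")
    subprocess.run(checkout_command(), cwd=repo, check=True)
```

Call it with your own install location, for example `pin_wrapper(r"C:\ComfyUI")` (a hypothetical path). Restart ComfyUI afterwards so the reverted nodes are loaded.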
These are all IMG2VID workflows:
CogVideoX_5b_fun_GGUF_Q4_0 - for lower VRAM - portrait and landscape aspect ratio
Kijai/CogVideoX-Fun-5b - more "creative" results - portrait and landscape aspect ratio
THUDM/CogVideoX-5bI2V - more demanding on system - more detailed and realistic movement - landscape aspect ratio only
If you're getting just a slow zoom in on a static image, here are the most likely issues:
The prompt - Prompting is very important. This custom GPT can help: just drop in your image and tell it what you want, and it can adjust the prompt based on your feedback. I did this a lot while figuring all of this out and had it update itself with notes on what works, so it's very helpful when you're new to prompting for this.
The seed - I always try 2 different seeds before changing the prompt. I've found the seed to be really influential, but it's less likely than the prompt to be the issue. Also, if a previous seed worked well for a certain kind of motion (e.g. lots of characters walking), it's worth going back to that seed when you want that kind of motion again.
The image - Some images are just stubborn, especially ones that look like still photographs or graphic novel illustrations. Try to avoid words like "painting" or "illustration" in the prompt, as the model seems to strongly associate words like that with stillness.
Aspect ratio - I've found that the further the image deviates from a 3:2 aspect ratio, the more likely the result ends up stiff. I originally tried a bunch at 2:3 and would only get slow zooms on still images. Once I shortened it a bit (the preset portrait resolutions in the GGUF/Kijai workflows), I started getting movement.
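Two of the checks above (still-associated words and aspect ratio) are easy to automate before you queue a generation. This is a rough Python sketch, not part of the workflows: the function names are made up for illustration, the word list only contains the two words named above, and the log-scale distance is just one assumed way to measure "how far from 3:2".

```python
import math
import re

# The two words named above; extend this set from your own results.
STILL_WORDS = {"painting", "illustration"}
TARGET_RATIO = 3 / 2  # width:height that the text reports as moving most reliably


def find_still_words(prompt: str) -> list[str]:
    """Return the still-associated words present in the prompt, sorted."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return sorted(set(words) & STILL_WORDS)


def aspect_deviation(width: int, height: int, target: float = TARGET_RATIO) -> float:
    """Log-scale distance of width/height from the target ratio (0.0 = exact match)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return abs(math.log((width / height) / target))
```

For example, `aspect_deviation(480, 720)` (a 2:3 portrait) comes out larger than `aspect_deviation(480, 600)` (a shortened portrait), which matches the experience described above. These sizes are only illustrative, not the preset resolutions from the workflows. Note that the word check matches whole words only, so plurals like "paintings" would need to be added to the set.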
11/04/24 - I've just added a version that includes two workflows using different models that can also do portrait aspect ratio. One of them, the GGUF workflow, is specifically for those of you with less VRAM. With it, my VRAM usage was about 20-30% of what I normally see with the THUDM/CogVideoX-5b-I2V version, so hopefully those of you who couldn't get it to work before will be able to now.
I'm also pleased with this version's output. I've included two examples from the GGUF version, one in each aspect ratio. They're labeled if you hover over the box under 'ComfyUI': the one with the axolotls in brown outfits preparing desserts, and the one where one is stirring a stew.
Here is a custom GPT that can help with the prompts. You can read more about it here.
This is the result from using the GPT and this workflow.
I've included notes that I think might be helpful, and the workflow is organized the way I prefer, with the commonly used inputs grouped together. Everything that I copy into the generation info is also placed close together to make that easier.
When it comes to CogVideo, the prompting is really important. I've experimented a lot with prompting, but almost always at CFG 8 and 50 steps.
There are specific things you can do when going for a certain motion or working with a certain art style. I figured it would be helpful to put together a collection of good generations in a variety of art styles and ideas so you can see examples. All of the posts include the generation data and are linked here.
Sometimes the seed can mess with what you're trying to achieve. My approach is that if I can't get any movement with 2 different seeds, then there is an issue with the prompt. Rerolling the same seed with a targeted negative prompt to get rid of an undesirable aspect is also effective.
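The two-seed rule can be written down as a tiny loop. This is only a sketch of the decision logic: `render` is a hypothetical callable that you would supply yourself (it runs one generation for a seed and returns whether you judged the result to have real movement), and nothing here talks to ComfyUI.

```python
from typing import Callable, Optional


def first_moving_seed(
    render: Callable[[int], bool],
    seeds: list[int],
    max_tries: int = 2,
) -> Optional[int]:
    """Try up to max_tries seeds; return the first one where render reports motion."""
    for seed in seeds[:max_tries]:
        if render(seed):
            return seed
    # No movement on any tried seed: treat it as a prompt problem, not a seed problem.
    return None
```

A `None` result is the signal to go back and rework the prompt instead of rerolling again.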
Below are some general tips about the prompting that I hope will be useful:
1. Positive Prompt Setup: Building a Movement-First, Detailed Description
The positive prompt is where you define the scene, characters, and environment. In animation, leading with action-oriented, descriptive language keeps the scene fluid and engaging. Below are steps and examples to structure a strong positive prompt.
A. Define the Animation Style Clearly
Use movement-focused language to specify the animation style without suggesting it’s a still image. Avoid terms like "painting" or "illustration," which may result in static outputs.
Incorporate style cues that suggest action to reinforce a sense of movement and vibrancy.
Examples:
Anime Scene:
"vibrant anime-style animation of playful creatures in a magical landscape"
Fantasy Action Scene:
"high-intensity fantasy animation with dramatic lighting and dynamic movement"
Surreal Setting:
"whimsical animated sequence with surreal creatures in a colorful, candy-themed world"
B. Emphasize Character Movements and Interaction with the Environment
Describe each character's movement to make the scene dynamic. Lead with action verbs and specific descriptors to bring the characters to life.
Examples of Action Verbs: “holding,” “jumping,” “walking,” “swinging,” “bouncing,” “twirling”
Use modifiers to add personality to each action, creating more natural movements.
Examples of Descriptive Modifiers: “gripping tightly,” “bouncing excitedly,” “swaying gently,” “snarling menacingly”
Examples:
Action Scene:
"The knight shifts her stance, gripping her sword tightly, her cloak billowing with each movement."
Playful Scene:
"An axolotl bounces on a candy ball, laughing with delight, while another grabs a giant candy cane with a playful grin."
C. Add Environmental and Lighting Effects for Depth
Describe environmental elements that are in motion, as this enhances immersion and prevents a static feel.
Examples: “trees swaying,” “fog drifting,” “sunlight casting shifting shadows,” “water rippling”
Use lighting and shadow effects to add realism, especially for scenes with sunlight or complex lighting.
Examples of Lighting Terms: “reflections on water,” “shadows moving,” “sunlight filtering through the trees”
Examples:
Forest Scene:
"Sunlight filters through the trees, casting shifting shadows, while a light mist drifts along the forest floor."
Candy Land Scene:
"Candy canes sway, and the chocolate river flows gently, with sunlight casting cheerful reflections on the water."
D. Add Cinematic Camera Movements
Describe the camera’s movement to maintain visual flow and prevent the model from interpreting the scene as a static, zoomed-in frame.
Examples: “camera panning across the scene,” “camera circling around the characters,” “camera following the character’s movements”
Examples:
Dramatic Scene:
"The camera slowly pans across the intense standoff, capturing the characters’ expressions and movements."
Playful Scene:
"The camera follows the bouncing axolotl to capture each joyful movement, adding energy to the scene."
2. Negative Prompt Setup: Filtering Out Static Elements and Artifacts
The negative prompt filters out static imagery, visual artifacts, and unintended details that can interrupt animation flow. This section is key to ensuring fluid motion and maintaining a high-quality aesthetic.
A. Combat Static and Still Imagery
Avoid static frames by specifying terms that discourage still images, zoom-only frames, or frozen characters.
Anti-Static Terms:
“still image,” “static shot,” “motionless figure,” “no movement,” “frozen scene,” “rigid pose,” “no action,” “zoom-in”
Why It Works: These terms help prevent the model from interpreting the scene as a single frame, maintaining an expectation of motion from every part of the scene.
B. Anti-Stiffness for Characters and Objects
Prevent characters and objects from appearing stiff by discouraging any rigid postures or frozen limbs.
Anti-Stiffness Terms:
“stiff character,” “rigid figure,” “no leg movement,” “frozen posture,” “static pose,” “motionless limbs,” “no limb movement”
Why It Works: By targeting body-specific terms like “no leg movement” or “motionless limbs,” you help keep characters’ bodies fluid and lifelike, especially for action-focused scenes.
C. Artifact and Quality Control Terms
Remove visual artifacts like pixel noise, color bleeding, and jerky motion that could disrupt the clean aesthetic of the animation.
Artifact Control Terms:
“jerky movements,” “pixel noise,” “color bleed,” “blurry edges,” “flickering light,” “excessive blur,” “overexposed lighting”
Why It Works: Including terms like “pixel noise” and “color bleed” targets issues that often appear in lower-quality outputs, helping you achieve a professional, polished look.
D. Tips for Effective Negative Prompts
Be specific about what you don’t want, especially with body parts and visual clarity.
Combine static terms with artifact terms for balanced control.
Example Combination: “still image, no movement, jerky movements, pixel noise” filters both stillness and artifacts effectively.
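If you keep the term groups from sections A to C in separate lists, combining them is a one-liner. Below is a small Python sketch under that assumption: the list contents are copied from the text above, while the variable and function names are hypothetical. It lowercases terms and drops duplicates so that mixing groups doesn't repeat a term.

```python
ANTI_STATIC = ["still image", "static shot", "motionless figure", "no movement",
               "frozen scene", "rigid pose", "no action", "zoom-in"]
ANTI_STIFFNESS = ["stiff character", "rigid figure", "no leg movement", "frozen posture",
                  "static pose", "motionless limbs", "no limb movement"]
ARTIFACTS = ["jerky movements", "pixel noise", "color bleed", "blurry edges",
             "flickering light", "excessive blur", "overexposed lighting"]


def build_negative_prompt(*groups: list[str]) -> str:
    """Join term groups into one comma-separated prompt, keeping first-seen order."""
    seen: set[str] = set()
    terms: list[str] = []
    for group in groups:
        for term in group:
            key = term.strip().lower()
            if key and key not in seen:
                seen.add(key)
                terms.append(key)
    return ", ".join(terms)
```

For instance, `build_negative_prompt(ANTI_STATIC[:2], ARTIFACTS[:2])` gives "still image, static shot, jerky movements, pixel noise". Starting with a short combination and adding groups only when a problem shows up keeps the negative prompt targeted.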
Example Prompt for Quick Reference
To bring all these principles together, here’s a sample prompt setup using both positive and negative prompts:
Positive Prompt:
A vibrant anime-style animation of playful axolotl creatures in a colorful candy land. One axolotl creature holds a giant candy cane with a joyful grin, while another bounces excitedly on a candy ball, laughing with delight. Candy canes sway slightly, and the chocolate river flows gently, adding whimsy to the scene. Sunlight casts a cheerful glow across the candy land, with flowers bobbing in the grass and trees swaying gently in the background. The camera pans across the scene, capturing each joyful movement and the vibrant landscape.
Negative Prompt:
still image, static shot, motionless figure, rigid pose, no action, frozen scene, zoom-in, jerky movements, pixel noise, color bleed, blurry edges, flickering light, unnatural shadows, static background.
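The sample positive prompt above follows the order from section 1: style first, then character actions, then environment, then camera. If you want to reuse that order across many images, a minimal Python sketch could look like this. The function name and parameters are hypothetical, and it only tidies and joins the text you give it.

```python
def build_positive_prompt(
    style: str,
    actions: list[str],
    environment: list[str],
    camera: str,
) -> str:
    """Join the four parts in order, one sentence per part."""
    sentences = []
    for part in [style, *actions, *environment, camera]:
        text = part.strip().rstrip(".")
        if text:
            # Capitalize the first letter and end every part with a single period.
            sentences.append(text[0].upper() + text[1:] + ".")
    return " ".join(sentences)
```

Keeping the four parts as separate inputs makes it easy to swap just the camera line or a single action between rerolls while leaving the rest of the prompt untouched.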
Final Tips for Success
Experiment with action verbs in the positive prompt. The more specific the action, the more likely you are to get a dynamic result.
Refine negative prompts based on results. If something still appears static or jerky, add more targeted anti-still terms or artifact filters.
Adjust settings as needed. 50 steps and a CFG scale of 8 work well for most scenes, but you can adjust both depending on the complexity of the scene.