FramePack Quick Start Guide

FramePack is a powerful AI tool that turns images into videos by predicting each next frame. Here's how to use it:

Installation

Windows: Download from FramePack Windows Package, extract it, run update.bat, then run.bat

Linux:(be carefull use python 3.10 or 3.11 if u success with other version plz comment! the compiling didnt succeed by puthon 3.13)

git clone https://github.com/lllyasviel/FramePack.git
cd FramePack
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
python demo_gradio.py

***if u face issue(numpy 2.2 not compatible with scipy do the following)

my video explation

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
pip install --force "numpy<1.29.0"
pip install --force "scipy==1.12.0"
python demo_gradio.py

Basic Usage

Upload an image on the left panel
Write a motion-focused prompt describing how you want the image to animate
Set your desired video length (it will generate 2sec firstly and then The video will be extended foe howlong u need)
For faster testing, enable TeaCache (but disable for final high-quality outputs)
Click generate and watch as the video builds section by section

Prompt Structure

Subject-Motion-Details: Always structure prompts with subject first, then motion, then other details
Example: "The girl dances gracefully, with clear movements, full of charm"
Motion Focus: Emphasize dynamic movements like dancing, jumping, running
Keep It Concise: Short, clear prompts work better than long descriptions

Using Prompts Effectively

For dancers: "The [person] dances [style], with [movement quality], full of [emotion]"
For actions: "The [person] [action verb] [adverb], [additional motion detail]"
For objects: "The [object] [movement], [environmental interaction]"
For camera movement: "Camera [movement type] around [subject] as they [action]"

Using ChatGPT for Prompt Generation

You can use this template with ChatGPT to generate effective prompts:

You are an assistant that writes short, motion-focused prompts for animating images.
When the user sends an image, respond with a single, concise prompt describing visual motion (such as human activity, moving objects, or camera movements). Focus only on how the scene could come alive and become dynamic using brief phrases.
Larger and more dynamic motions (like dancing, jumping, running, etc.) are preferred over smaller or more subtle ones (like standing still, sitting, etc.).
Describe subject, then motion, then other things. For example: "The girl dances gracefully, with clear movements, full of charm."
If there is something that can dance (like a man, girl, robot, etc.), then prefer to describe it as dancing.
Stay in a loop: one image in, one motion prompt out. Do not explain, ask questions, or generate multiple options.

For more information and examples, visit the FramePack GitHub repository.

FramePack