I've received a few requests to document how I generate my videos, so I wanted to whip up a quick guide for folks.
This guide provides a script intended to run on a UNIX-like system such as Linux, macOS, or WSL2. It presumes that you have python3, pip, imagemagick, and ffmpeg installed.
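If you'd like to verify those prerequisites up front, a quick check along these lines works in any bash shell. This is just a sketch; the check_deps helper is mine, not part of the script below:

```shell
#! /usr/bin/env bash
# Hypothetical helper: report any executables missing from PATH.
check_deps() {
local _cmd _rc=0;
for _cmd in "$@"; do
if ! command -v "$_cmd" > /dev/null 2>&1; then
echo "missing: $_cmd" >&2;
_rc=1;
fi
done
return "$_rc";
}

# `convert' is the CLI that the imagemagick package provides.
check_deps python3 pip convert ffmpeg git || echo "install the tools above first" >&2;
```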
I have provided a wrapper script for CogVideoX here which improves the UX of the upstream CLI tool. It will take care of cloning CogVideoX and installing its python dependencies, so all you need to do is run the provided script. It also takes care of selecting the appropriate model for an image based on its resolution.
Generating Input Images
We will start by creating input images, but as we do this we'll want to pay close attention to their resolution.
If you'd like fast video generation, stick to either 720x480 or 480x720 images. These use CogVideoX 1.0 and, in my experience, take ~3 minutes to generate.
If you'd like higher resolution video you can use 768x1360, 1360x768, or 768x768 images, which use CogVideoX 1.5. Other resolutions also work as long as one edge is exactly 768 and the other is between 768 and 1360 and divisible by 16 ( e.g. 768x1344 or 1344x768 ). In my experience these take ~15 minutes to generate.
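To sanity-check a resolution against these rules before burning GPU time on it, the constraints can be encoded in a few lines of bash. A sketch; the cog_size_ok helper is hypothetical and not part of the script below:

```shell
#! /usr/bin/env bash
# Hypothetical helper: does WIDTHxHEIGHT match a supported CogVideoX size?
# Succeeds for 720x480/480x720 (1.0) and for valid 1.5 sizes.
cog_size_ok() {
local _w="$1" _h="$2" _min _max;
if [[ "$_w" -lt "$_h" ]]; then _min="$_w"; _max="$_h"; else _min="$_h"; _max="$_w"; fi
# CogVideoX 1.0: exactly 720x480 in either orientation.
if [[ "$_min" -eq 480 ]] && [[ "$_max" -eq 720 ]]; then
return 0;
fi
# CogVideoX 1.5: short edge exactly 768, long edge 768-1360 and divisible by 16.
[[ "$_min" -eq 768 ]] && [[ "$_max" -le 1360 ]] && [[ "$(( _max % 16 ))" -eq 0 ]];
}

cog_size_ok 768 1344 && echo "768x1344 ok";
```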
When selecting images try to avoid any that contain hands or feet since CogVideo struggles to animate these. Pay very close attention to details and err on the side of rejecting images - any deformed aspects of the image will only be exaggerated when they become animated.
img2video Prompts
In my experience, the simpler these prompts are, the better. For example, a great starting point for an image of 1girl is simply "A beautiful woman".
You can extend this with simple camera movements, like "A beautiful woman. The camera moves towards her.", but keep them short.
Mentioning details about the image in the prompt seems to cause those parts of the image to be animated; for example, "A beautiful woman with black hair." might cause her hair to flow.
Getting Good Results
Frankly, the real trick to good results is trial and error. Each video that I post generally took 2-8 iterations to get right. I usually invoke the script in a loop and leave it running overnight or while I'm away from my machine.
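That overnight loop can be sketched like this. It assumes img2video (the script below) is on your PATH; the generate_batch wrapper is mine, not part of the script:

```shell
#! /usr/bin/env bash
# Sketch: run img2video several times to collect candidate videos.
# Each run picks a fresh random seed, so output filenames won't collide.
generate_batch() {
local _runs="$1" _img="$2" _i;
shift 2;
for _i in $(seq 1 "$_runs"); do
# `|| true' keeps the batch going even if one generation fails.
img2video "$@" "$_img" || true;
done
}

if command -v img2video > /dev/null 2>&1; then
generate_batch 8 foo.png -p "A beautiful woman. The camera moves towards her.";
fi
```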
The Script
Save this script to a file named img2video, and make it executable with chmod +x img2video. When using the script you should almost always provide a prompt, e.g. img2video -p "Some prompt..." foo.png, since the default is just "A beautiful woman".
#! /usr/bin/env bash
# ============================================================================ #
# Generate a video from an image.
#
# USAGE: img2video [OPTIONS...] IMG-PATH
#
#
# ---------------------------------------------------------------------------- #
set -eu;
set -o pipefail;
# ---------------------------------------------------------------------------- #
_as_me="img2video";
_version="0.1.0";
_usage_msg="USAGE: $_as_me [OPTIONS...] IMG-PATH
Generate a video from an image.
";
_help_msg="$_usage_msg
OPTIONS
-o,--output FILE Output video file. ( default: based on input file )
-S,--steps N Number of inference steps. ( default: 20 )
-s,--seed N Random seed (positive integer). ( default: random )
-p,--prompt TEXT Prompt text. ( default: 'A beautiful woman' )
-h,--help Print help message to STDOUT.
-u,--usage Print usage message to STDOUT.
-v,--version Print version information to STDOUT.
ENVIRONMENT
GREP Command used as \`grep' executable.
REALPATH Command used as \`realpath' executable.
FFMPEG Command used as \`ffmpeg' executable.
CONVERT Command used as \`convert' executable.
PYTHON3 Command used as \`python3' executable.
PIP Command used as \`pip' executable.
MKTEMP Command used as \`mktemp' executable.
GIT Command used as \`git' executable.
";
# ---------------------------------------------------------------------------- #
usage() {
if [[ "${1:-}" = "-f" ]]; then
echo "$_help_msg";
else
echo "$_usage_msg";
fi
}
# ---------------------------------------------------------------------------- #
#@BEGIN_INJECT_UTILS@
: "${GREP:=grep}";
: "${REALPATH:=realpath}";
: "${FFMPEG:=ffmpeg}";
: "${CONVERT:=convert}";
: "${PYTHON3:=python3}";
: "${PIP:=pip}";
: "${MKTEMP:=mktemp}";
: "${GIT:=git}";
# ---------------------------------------------------------------------------- #
# Repo information
: "${XDG_CACHE_HOME:=$HOME/.cache}";
: "${REPO_DIR:=$XDG_CACHE_HOME/img2video/cogvideo}";
: "${COG_URL:=https://github.com/THUDM/CogVideo.git}";
: "${COG_REV:=2fdc59c3ce48aee1ba7572a1c241e5b3090abffa}";
: "${DIFFUSERS_URL:=https://github.com/huggingface/diffusers.git}";
: "${DIFFUSERS_REV:=b5fd6f13f5434d69d919cc8cedf0b11db664cf06}";
# ---------------------------------------------------------------------------- #
declare -a TMPFILES;
TMPFILES=();
cleanup() {
# Guard the expansion: under `set -u' an empty array errors on older bash.
if [[ "${#TMPFILES[@]}" -gt 0 ]]; then
rm -f "${TMPFILES[@]}";
fi
}
trap cleanup EXIT;
# ---------------------------------------------------------------------------- #
while [[ "$#" -gt 0 ]]; do
case "$1" in
# Split short options such as `-abc' -> `-a -b -c'
-[^-]?*)
_arg="$1";
declare -a _args;
_args=();
shift;
_i=1;
while [[ "$_i" -lt "${#_arg}" ]]; do
_args+=( "-${_arg:$_i:1}" );
_i="$(( _i + 1 ))";
done
set -- "${_args[@]}" "$@";
unset _arg _args _i;
continue;
;;
--*=*)
_arg="$1";
shift;
set -- "${_arg%%=*}" "${_arg#*=}" "$@";
unset _arg;
continue;
;;
-o|--output)
if [[ "$#" -lt 2 ]]; then
echo "$_as_me: option '$1' requires an argument" >&2;
exit 1;
fi
OUTFILE="$2";
shift;
;;
-S|--steps)
if [[ "$#" -lt 2 ]]; then
echo "$_as_me: option '$1' requires an argument" >&2;
exit 1;
fi
STEPS="$2";
shift;
;;
-s|--seed)
if [[ "$#" -lt 2 ]]; then
echo "$_as_me: option '$1' requires an argument" >&2;
exit 1;
fi
SEED="$2";
shift;
;;
-p|--prompt)
if [[ "$#" -lt 2 ]]; then
echo "$_as_me: option '$1' requires an argument" >&2;
exit 1;
fi
PROMPT="$2";
shift;
;;
-u|--usage) usage; exit 0; ;;
-h|--help) usage -f; exit 0; ;;
-v|--version) echo "$_version"; exit 0; ;;
--) shift; break; ;;
-?|--*)
echo "$_as_me: Unrecognized option: '$1'" >&2;
usage -f >&2;
exit 1;
;;
*)
if [[ -z "${IMG:-}" ]]; then
IMG="$1";
else
echo "$_as_me: Unexpected argument '$1'" >&2;
usage -f >&2;
exit 1;
fi
;;
esac
shift;
done
# ---------------------------------------------------------------------------- #
if [[ -z "${IMG:-}" ]]; then
echo "$_as_me: missing IMG-PATH argument" >&2;
usage >&2;
exit 1;
fi
# ---------------------------------------------------------------------------- #
# Set fallbacks
: "${SEED:=$RANDOM}";
: "${STEPS:=20}";
: "${PROMPT:=A beautiful woman}";
if [[ -z "${OUTFILE:-}" ]]; then
OUTFILE="${IMG%.png}_$SEED.mp4";
fi
# ---------------------------------------------------------------------------- #
if [[ -e "$OUTFILE" ]]; then
echo "$_as_me: output file '$OUTFILE' already exists" >&2;
exit 1;
fi
# ---------------------------------------------------------------------------- #
rotate_image() {
local _img _angle _tmpfile;
case "$1" in
-c|--cclock) _angle="-90"; shift; ;;
*) _angle="90"; ;;
esac
_img="$1";
_tmpfile="$( $MKTEMP; )";
TMPFILES+=( "$_tmpfile" );
$CONVERT "$_img" -rotate "$_angle" "$_tmpfile";
mv "$_tmpfile" "$_img";
}
# ---------------------------------------------------------------------------- #
rotate_video() {
local _vid _angle _tmpfile;
case "$1" in
-c|--cclock) _angle="cclock"; shift; ;;
*) _angle="clock"; ;;
esac
_vid="$1";
# `ffmpeg' infers the output container from the file extension, so give the
# temporary file a `.mp4' suffix; track both files for cleanup.
_tmpfile="$( $MKTEMP; )";
TMPFILES+=( "$_tmpfile" "$_tmpfile.mp4" );
$FFMPEG -i "$_vid" -vf "transpose=$_angle" "$_tmpfile.mp4";
mv "$_tmpfile.mp4" "$_vid";
}
# ---------------------------------------------------------------------------- #
get_image_size() {
local _img;
_img="$1";
# `null:' is ImageMagick's no-op output, so nothing is written to disk.
$CONVERT "$_img" -print "%w %h\n" null:;
}
get_image_width() {
local _img;
_img="$1";
$CONVERT "$_img" -print "%w\n" null:;
}
get_image_height() {
local _img;
_img="$1";
$CONVERT "$_img" -print "%h\n" null:;
}
# ---------------------------------------------------------------------------- #
max() {
local _a _b;
_a="$1";
_b="$2";
if [[ "$_a" -gt "$_b" ]]; then
echo "$_a";
else
echo "$_b";
fi
}
min() {
local _a _b;
_a="$1";
_b="$2";
if [[ "$_a" -lt "$_b" ]]; then
echo "$_a";
else
echo "$_b";
fi
}
# ---------------------------------------------------------------------------- #
pick_model() {
local _img _width _height _max _min;
_img="$1";
_width="$( get_image_width "$_img"; )";
_height="$( get_image_height "$_img"; )";
_max="$( max "$_width" "$_height"; )";
_min="$( min "$_width" "$_height"; )";
if [[ "$_max" -eq 720 ]] && [[ "$_min" -eq 480 ]]; then
echo "THUDM/CogVideoX-5b-I2V";
elif [[ "$_min" -eq 768 ]] && [[ "$_max" -le 1360 ]]; then
echo "THUDM/CogVideoX1.5-5b-I2V";
else
echo "$_as_me: unsupported image size $_width x $_height" >&2;
exit 1;
fi
}
# ---------------------------------------------------------------------------- #
needs_rotate() {
local _img _width _height;
_img="$1";
_width="$( get_image_width "$_img"; )";
_height="$( get_image_height "$_img"; )";
if [[ "$_width" -eq 480 ]] && [[ "$_height" -eq 720 ]]; then
return 0;
else
return 1;
fi
}
# ---------------------------------------------------------------------------- #
DID_ROTATE=0;
IMG_MROT="$IMG";
if needs_rotate "$IMG"; then
DID_ROTATE=1;
IMG_MROT="$( $MKTEMP; ).png";
# Track the bare `mktemp' file too so it gets cleaned up.
TMPFILES+=( "${IMG_MROT%.png}" "$IMG_MROT" );
cp "$IMG" "$IMG_MROT";
rotate_image "$IMG_MROT";
fi
# ---------------------------------------------------------------------------- #
MODEL="$( pick_model "$IMG"; )";
# ---------------------------------------------------------------------------- #
declare -a common_flags v1_flags v1_5_flags;
common_flags=(
'--model_path' "$MODEL"
'--image_or_video_path' "$IMG_MROT"
'--output_path' "$OUTFILE"
'--generate_type' 'i2v'
'--num_inference_steps' "$STEPS"
'--seed' "$SEED"
'--prompt' "$PROMPT"
);
v1_flags=(
'--num_frames' '49'
'--fps' '8'
'--width' '720'
'--height' '480'
);
v1_5_flags=(
'--num_frames' '81'
'--fps' '16'
'--width' "$( get_image_width "$IMG"; )"
'--height' "$( get_image_height "$IMG"; )"
);
# ---------------------------------------------------------------------------- #
declare -a flags;
flags=( "${common_flags[@]}" );
case "$MODEL" in
THUDM/CogVideoX-5b-I2V)
flags+=( "${v1_flags[@]}" );
;;
THUDM/CogVideoX1.5-5b-I2V)
flags+=( "${v1_5_flags[@]}" );
;;
*)
echo "$_as_me: unsupported model '$MODEL'" >&2;
exit 1;
;;
esac
# ---------------------------------------------------------------------------- #
# Setup repo if it doesn't exist
if [[ ! -d "$REPO_DIR" ]]; then
mkdir -p "${REPO_DIR%/*}";
$GIT clone "$COG_URL" "$REPO_DIR";
( cd "$REPO_DIR"; $GIT checkout "$COG_REV"; );
fi
# Setup diffusers if it doesn't exist
if [[ ! -d "$REPO_DIR/diffusers" ]]; then
$GIT clone "$DIFFUSERS_URL" "$REPO_DIR/diffusers";
( cd "$REPO_DIR/diffusers"; $GIT checkout "$DIFFUSERS_REV"; );
fi
# Setup Virtual Environment
if [[ ! -d "$REPO_DIR/.venv" ]]; then
$PYTHON3 -m venv "$REPO_DIR/.venv";
source "$REPO_DIR/.venv/bin/activate";
$PIP install -r "$REPO_DIR/requirements.txt";
$PIP uninstall -y diffusers;
$PIP install -e "$REPO_DIR/diffusers";
else
source "$REPO_DIR/.venv/bin/activate";
fi
# ---------------------------------------------------------------------------- #
echo "$_as_me: Generating '$OUTFILE' with $MODEL" >&2;
$PYTHON3 "$REPO_DIR/inference/cli_demo.py" "${flags[@]}";
# ---------------------------------------------------------------------------- #
if [[ "$DID_ROTATE" -eq 1 ]]; then
rotate_video --cclock "$OUTFILE";
fi
# ---------------------------------------------------------------------------- #
#
#
#
# ============================================================================ #
More to come, but this'll do for now.