
LTX Image to video Tips


This covers basic and advanced tips, plus any other info I can add that will help others use LTX.

I will add to this when I have time.

I use my workflow.

https://civitai.com/models/1072696/ltx-image-to-video

However, this is not the only way to make LTX videos. Others have good workflows that may work for you. Most do similar things, just differently.

LTX IMAGE to VIDEO with STG, CAPTION & CLIP EXTEND workflow

LTX IMAGE to TEXT to VIDEO with STG workflow

Search for others.

Text to video and video to video with LTX are really not good.

Image to video, however, works quite well with LTX on minimal hardware. Smaller videos can take as little as 6 GB of VRAM and still look good. 384x288 upscaled by 2 can still make a good video.

Sometimes the seed is the problem. Changing it can help when you get immediate scrolling or just odd results.


Vram Issues

I have read many comments on many LTX flows from people having issues running things on much better cards than mine (a 3060 12 GB). I don't use more than 8 GB of VRAM running LTX, and sometimes as little as 5.9 GB.

The following node can help lower the VRAM needed for this flow, or any flow, to work well.

https://gist.github.com/city96/30743dfdfe129b331b5676a79c3a8a39

This node offloads CLIP to the CPU and system RAM instead of having it fight the LTX model for room in your VRAM. I went from using 91% of my VRAM at 512x768 to using between 53% and 61%. The text encode of the prompt is slightly slower, but the render is just as fast, and faster if you were maxing out your VRAM before.

This saves 6 GB of VRAM straight away, as CLIP is put into RAM.

This node is placed just after the CLIP loader.


Sizes

LTX works best with 4:3 aspect ratios, both tall and wide. It can render outside 4:3, but motion gets skewed, which shows mostly in hand, arm, and face movements.

These are the sizes that I found work best (at least with my flow):

384x288 - 2x upscale

512x384 - 1.5x to 2x upscale

640x480 - 1.6x upscale

768x576 - 1.5x upscale

1024x768 - no upscale; you can try 1.5x, but it tends to blur

These are all 4:3, and the upscale factors are compatible with LTX sizes, so the output won't be auto-cropped.
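As a rough check of the sizes listed above, the sketch below multiplies each base size by its upscale factor and tests whether the result is still 4:3 (or 3:4 for tall videos) with both sides divisible by 32. The divisibility-by-32 rule is an assumption about what "compatible with LTX sizes" means here, and `upscaled` and `is_clean` are hypothetical helper names, not part of any workflow.

```python
from fractions import Fraction

# Base sizes and upscale factors copied from the list above.
SIZES = [
    (384, 288, 2.0),
    (512, 384, 1.5),
    (512, 384, 2.0),
    (640, 480, 1.6),
    (768, 576, 1.5),
    (1024, 768, 1.0),
]

# Assumption: LTX wants width and height to be multiples of 32.
MULTIPLE = 32


def upscaled(width, height, factor):
    """Return the upscaled (width, height), rounded to whole pixels."""
    return round(width * factor), round(height * factor)


def is_clean(width, height):
    """True if the size is 4:3 or 3:4 and both sides are multiples of MULTIPLE."""
    ratio = Fraction(width, height)
    right_shape = ratio in (Fraction(4, 3), Fraction(3, 4))
    return right_shape and width % MULTIPLE == 0 and height % MULTIPLE == 0


for w, h, f in SIZES:
    uw, uh = upscaled(w, h, f)
    print(f"{w}x{h} x{f} -> {uw}x{uh} clean={is_clean(uw, uh)}")
```

Under that assumption, every pairing in the list lands on a clean size, while something like 500x375 is 4:3 but would fail the multiple-of-32 check.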


Prompting

Prompting is the most important part of getting what you want from LTX, and most of the time you will be guessing what it wants.

Prompts require 3 things to be effective.

  1. A clear subject for the video to focus on: a person, car, bunny, whatever.

    • LTX seems to determine the subject of the video by the number of tokens you use to describe it.

    • This means you can overprompt backgrounds and get scrolling effects. Whatever your prompt spends the most words describing, the video will focus on. If you describe the waves and sunset more than the subject you want, the focus will shift to the waves and sunset.

  2. A clear motion that LTX understands.

    • Getting the motion you want is tricky because LTX does not understand complex movements. Dancing and the like never goes well. Simple motions are understood: waving, walking, running, playing with hair, smiling, blinking. Getting someone to do a specific thing comes down to whether the model understands what you want.

    • Motions should be mentioned ONCE only. LTX understands timing somewhat, so mentioning walking at the start of the prompt and again at the end will cause the walk to change mid-video. The subject walks as they should, then suddenly turns to walk in another direction, because the second "walking" kicks in and the model assumes it cannot mean the same thing as the first. This can be used to your advantage to time events, but it's tricky.

  3. Clear subject details that may or may not be in the ref image.

    • Subject details are part of setting the clear main subject, but they also help when you are working with a shot that may scroll, pan, etc. LTX does not know what is beyond the image. It can only replicate what is there and what you tell it in the prompt. If pants are not showing in the image, what will it draw if it zooms out to reveal them? You need to include that in the prompt. The same goes for background contents when scrolling: if the character walks off the background, what does it draw?
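The first two points above can be checked roughly before rendering. The sketch below is only an illustration: it uses word counts as a stand-in for token counts (an assumption, since the real tokenizer will split text differently), and `focus_share` and `repeated_motions` are hypothetical helpers, not part of any LTX workflow.

```python
import re


def focus_share(sections):
    """Return each prompt section's share of the total word count.

    Words are a crude proxy for tokens (assumption).
    """
    counts = {name: len(text.split()) for name, text in sections.items()}
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty prompts
    return {name: count / total for name, count in counts.items()}


def repeated_motions(prompt, motion_words):
    """Return the motion words that appear more than once in the prompt.

    Only exact word matches are counted, so "walk" and "walking" are
    treated as different words.
    """
    words = re.findall(r"[a-z]+", prompt.lower())
    return [m for m in motion_words if words.count(m.lower()) > 1]


sections = {
    "subject": "A woman with long red hair wearing a green jacket and blue jeans",
    "motion": "walking toward the camera",
    "background": "a beach at sunset",
}
for name, share in focus_share(sections).items():
    print(f"{name}: {share:.0%}")

print(repeated_motions(
    "She is walking on the beach, then walking away",
    ["walking", "waving"],
))
```

If the background section ends up with the largest share, that is a hint the prompt may pull focus away from the subject; a motion word reported as repeated is a hint the motion may change mid-video.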


Cropping

This is important because some crops for people don't work for some animations and the crop can often determine the outcome of a video.

Cropping your base image can really help LTX.

  1. Walking

    • Full body works, but the feet should be clearly visible, with a clear path in the direction the subject is planned to walk.

    • Half body, cropped at the crotch or stomach, works.

    • A knee crop does not work well; it will scroll to full or half body for the animation.

    • The direction they walk is based on the direction they are facing in the photo, and the camera can be partially controlled by how centered the subject is in the start frame. The more centered they are, the more likely they will walk or look straight forward. If off center, they may "walk by the camera" and not at it.

    • If there is ever a problem, cropping in one direction by even 10 can change your outcome.

    • LTX tries to match the videos it was trained on, so walking always defaults to full or half body unless you play with the initial crop to find an angle that LTX likes.
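To experiment with crops like the ones described above, a small helper can compute the crop rectangle before you touch the image. This is a minimal sketch: `crop_box` is a hypothetical function, and `shift_x` is measured in pixels, which is an assumption because the tip above does not give a unit for "10".

```python
def crop_box(img_w, img_h, ratio_w=4, ratio_h=3, shift_x=0):
    """Return (left, top, right, bottom) for the largest ratio_w:ratio_h crop.

    The crop is centered, then moved horizontally by shift_x pixels and
    clamped so it never leaves the image.
    """
    if img_w * ratio_h > img_h * ratio_w:
        # Image is wider than the target ratio: keep full height, trim width.
        crop_h = img_h
        crop_w = img_h * ratio_w // ratio_h
    else:
        # Image is taller than (or equal to) the target: keep full width.
        crop_w = img_w
        crop_h = img_w * ratio_h // ratio_w

    left = (img_w - crop_w) // 2 + shift_x
    left = max(0, min(left, img_w - crop_w))  # clamp inside the image
    top = (img_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


print(crop_box(1920, 1080))              # centered 4:3 crop
print(crop_box(1920, 1080, shift_x=10))  # same crop nudged 10 px to the right
print(crop_box(1080, 1920, 3, 4))        # tall 3:4 crop
```

The returned tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it can be passed to it directly if you use Pillow to prepare the start image.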

Controlling zoom with crop

This can change where the prompt is generated from.

  • Nothing enabled uses the full main image.

    • This can force a zoom out to the full perspective when the start image is still cropped.

  • Tag cropped ref crops the main image to the aspect of the video and tags that.

  • Tag image 1 tags the cropped image ref 1 (useful to hold perspective to just the image)

  • Tag image 1 tags the cropped image ref 1 (useful when using 2 images)


Noise

Noise is not real noise. It's compression artifacts. These appear when you save an image in a lossy format, as videos and many web image formats do. Compression saves space but can leave artifacts in the image, and they get worse the more times an image has been re-saved. Chances are that an image from Google already has A LOT of compression, in which case adding noise is not always needed.

LTX was trained on videos with compression. It learned how compression forms on moving images and uses this to guide the animation.

You DON'T want to use noise if you do not need to. As I said, it is compression artifacts, and the more you add, the worse your output can get. Most images have already been compressed somehow, so adding more may not be needed.

My flow offers 2 noise options that do the same thing differently. (They may do the same thing with different CRF levels; I don't know.) Both add compression artifacts to the image to elicit movement. I don't know if other flows have both options.

If your prompt is right, you may not need any noise to get motion.

Too much noise

You will get blocky faces during quick movements.

Too little noise

You will get slow motion, or blocky limb movements when the subject attempts to move. You may also get scrolling, people turning around, or a change of scene.

The number of steps changes how much noise is needed: you usually need less with more steps. The length of the video also changes this.

In the 20-42 range, less noise is needed when not inserting latents.

Noise levels and what it does

  • 20-30: This can both speed up and slow down videos, depending on how much noise was originally in the image. Adding 20-30 more or less replaces the original noise. If the image animates with no noise but it's not going well, adding a low amount can help. This has little effect on output quality.

  • 30-38: This speeds things up and increases the amount of motion you get. This level is usually needed for things like walking, running, etc. The higher you go, the more body physics you can get, but going above 35 tends to cause distortions in the output when there is fast movement.

  • 38+: This can really speed things up, and I have yet to go above 42 for an image. The reason I went that high was an attempt to get breast physics to work, and NO, it distorts too much.
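The bands above can be written down as a small lookup for quick reference while tuning. This is only a sketch of the notes in this section: `noise_band` and `distortion_risk` are hypothetical helpers, and because the listed ranges overlap at 30 and 38, the exact boundaries used here are an assumption.

```python
def noise_band(level):
    """Map a noise level to the band described in the notes above.

    Boundaries at 30 and 38 are an assumption, since the listed ranges overlap.
    """
    if level < 20:
        return "below the documented range"
    if level <= 30:
        return "20-30: can speed up or slow down motion, little effect on quality"
    if level <= 38:
        return "30-38: faster, more motion (walking, running, body physics)"
    return "38+: very fast, heavy distortion likely"


def distortion_risk(level):
    """The notes report distortion on fast movement above 35."""
    return level > 35


for level in (0, 25, 34, 36, 42):
    print(level, "|", noise_band(level), "| distortion risk:", distortion_risk(level))
```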

IMO noise can cause you to misunderstand where the problems are.

Noise can FORCE movement even when it doesn't know what to actually do.

Noise forces movement regardless of your prompt. Most images will animate without noise as long as the prompt has keywords the model understands. For example, sad/crying has NO effect (deadpan, no movement), while giggling has a large effect (hearty laugh with body movement). Adding noise to sad/crying is not going to make the person cry.

They will move, but the model does not understand what you want, so it does anything (pan, scroll, distort, talking, etc.). This confuses the process. Because of this I do not recommend adding noise until you know what the prompt is going to do with the image. Using no noise lets you pinpoint which terms you need to get movement and which terms the model just does not understand.

If noise is added, it will move, and you won't know why, and it may not move how you are telling it to. LTX understands SOME things. Other things it ignores ENTIRELY, and with noise you can't really know what is causing a movement.

Scene changes or subject changes can mean there is not enough noise: if LTX can't find enough noise to do the motion you requested, it falls back to plain text to video with your prompt.

Blur can act as noise. I found that blurring in one direction works best.

x = left right blur, y = up down blur

0.25-0.75 seems to give an effect similar to noise.

This seems to work decently as a replacement for noise in some cases.


Perturbed attention

I don't know if the other flows have this, but it can be added.

Perturbed attention allows you to scale the attention layers. You can scale them all at once, or one at a time.

Scale is how strongly to scale the attention layer. Rescale is the fraction of the render at which you stop using your new scale and return to the default scale of 1. For example, rescale 0.3 stops at 30%.
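The scale and rescale description above can be pictured as a per-step schedule. The sketch below only models that description and is not the node's actual code: `attention_scale` is a hypothetical function, and treating rescale 0 as "never return to the default" is an assumption.

```python
def attention_scale(step, total_steps, scale, rescale):
    """Scale applied to the layer at a given 0-based step.

    Models the description above: use `scale` until `rescale` (a fraction
    of the render) is reached, then fall back to the default of 1.0.
    Assumption: rescale 0 means the scale is kept for the whole render.
    """
    if rescale <= 0:
        return scale
    progress = step / total_steps
    return scale if progress < rescale else 1.0


# Example: 30 steps, scale 1.2, rescale 0.4.
schedule = [attention_scale(s, 30, 1.2, 0.4) for s in range(30)]
print(schedule)
```

With these example numbers the first 12 steps use 1.2 and the remaining 18 use 1.0, which matches the "stops at 40%" reading of rescale.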

The layer range seems to be 8-14, but I have only found 2 layers useful:

  1. 14 Fluidity - Seems to help make motions more fluid.

    • 1.3 scale and 0 rescale is the default. It can go up to 3 if needed, with rescale to stop at a point.

    • I usually have this one enabled.

    • This can cause unwanted movement, e.g. turning around when you want a static shot.

    • This works with noise to make the movement more lifelike, and it can elicit movement slightly on its own.

    • Raising this seems to increase your prompt weight. If the prompt is good it helps get movement; if it's bad it can make the movement uncontrolled.

  2. 10 Coherence - Seems to help hold the subject to its form.

    • 1.2 scale and 0.4 rescale is the default.

    • Too strong burns the video. Rescale helps back off at the end to stop the burn. I use this about 1/2 to 3/4 of the time.

    • This can keep videos from scrolling or panning too much, but it can also stop the video from being able to move where it wants to go. It seems to act as the strength of the image ref: too strong and it won't animate.

    • Using this with 2 images seems bad, or it needs rescale set higher.

    • This works against noise, trying to hold the image to its original. If this is enabled, more noise may be needed.

After taking a gander at other LTX workflows:

Layer 14 with 1.0 scale and 0.25 or 0.5 rescale works well for some images.

Tips

TBD

Problems/Solutions

  1. Stalling (the video stalls and begins moving partway in)

    • Can be caused by perturbed 14 rescale being set to a percent; that percent is usually where the stall stops. Rescale 0 may allow the video to start properly.

    • Can be caused by the inserted latents if you are adding the images to the latent batch. The first 8 frames are not empty and contain the image. This can leave the first 8-20 frames unable to move.

    • Can be caused by too little noise. As the video progresses it may be able to add its own noise, and it then picks up that noise to start the animation partway through.

    • The seed can cause this, but it's less likely if movement starts.

  2. Text

    • This can be the seed. It's the first thing I try.

    • 90% of the time this is caused by the prompt. I am sure this model was trained on commercials. A dog video turned into a puppy chow commercial, text and all. I even got a price.

    • This can be too little noise, but I think if you get text the movement is working; the model just does not know what to do.

  3. Scrolling and panning

    • This can be the seed. Seeds control the base animation that comes out. Other things can change this, but if the seed wants to scroll, it will scroll no matter what you change.

    • This can be too little noise in the image. This is rare, because to scroll or pan the video is already moving.

    • This is usually the prompt. The action you have prompted is unknown to LTX. If the only motion in the prompt is something LTX does not understand, it does whatever movement it feels like. Prompting something like waving to the viewer should stop the scroll and focus on the wave.
