What does this workflow do?
This workflow takes your input image, crops/resizes it if needed to the ideal Cosmos render size, then automatically creates an appropriate prompt for Cosmos to work its magic! The result will be a hopefully-amazing video that your family can cherish for generations.
This process depends on both Florence (to automatically describe the image) and an LLM (to turn that description into a video prompt).
Further instructions and links are included in the workflow.
Operation is extremely simple after initial setup (model load/LLM configuration):
1. Load an input image.
2. Queue prompt. Really, that's it. Every other setting should be good to go.
Have fun, and I look forward to seeing your creations!
Setting expectations: This model is HEAVY. Using the 7B model with the included (optional) optimizations, generating a 121-frame video at 1280x704 takes me about 15 minutes and about 20 GB of VRAM on a 4090.
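For anyone curious what the crop/resize step amounts to, here is a minimal sketch of the geometry. It is not the workflow's actual node code. It assumes the target is the 1280x704 size mentioned above and that the crop is centered, and the function name `center_crop_box` is made up for illustration.

```python
def center_crop_box(width, height, target_w=1280, target_h=704):
    """Return (left, top, right, bottom) for the largest centered region
    of a width x height image that matches the target aspect ratio.

    The 1280x704 default is an assumption taken from the render size
    quoted in this description, not a value read from the workflow.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")

    # Compare aspect ratios by cross-multiplying to stay in integers.
    if width * target_h > height * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = height
        crop_w = height * target_w // target_h
    else:
        # Source is taller than (or equal to) the target: keep full width,
        # trim top and bottom.
        crop_w = width
        crop_h = width * target_h // target_w

    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


if __name__ == "__main__":
    # A 1920x1080 input is slightly taller than 1280x704, so a thin strip
    # is trimmed from the top and bottom before scaling down.
    print(center_crop_box(1920, 1080))
```

With an image library such as Pillow, the returned box could be passed to `Image.crop(box)` and the result scaled with `Image.resize((1280, 704))`. The workflow handles this for you, so this is only to show what "crop/resize if needed" means in practice.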