Edit 2: Slop.Club was recently updated so it can do landscape aspect ratio video, and below are two videos made entirely with the method described in this article (for free!)

I tried to embed the video I'll be referencing, but unfortunately YouTube treats it as a short, which messes with the embedding on this site. Here is a link to the video post.

Edit: Shortly after writing this article, I made a submission to the Vidu video contest in exactly the same way, except that I used Vidu on the Civitai onsite generator for img2vid.

Two more examples using vidu onsite generation that aren't shorts that I can embed:

link to submission

Tools

I'll first list all the tools I use. If something costs money, I'll also list an alternative after the | that would still be serviceable, in my opinion.

Image generation: I do it locally with ComfyUI | Onsite generation on Civitai
LLM: ChatGPT (I pay for a subscription) | The free tier, but you won't get access to the advanced models and you'll have limited uses for uploading images and custom GPTs
Video Editing: I'm using a different popular video editing software here | DaVinci Resolve is free for up to 1080p video, I believe
Animation: Slop.Club
Upscaling: Topaz Video AI | There are methods of upscaling with ComfyUI, and maybe other free methods I don't know about, or you can just live with a lower resolution
Music: Suno

I don't get paid by any of these companies or have any partnership deals (although if any of these companies are reading and want to hook me up, just DM me 😂). I will say I particularly like Topaz Video AI because they let you buy the software outright and don't force you into a subscription (there is an optional one if you want to keep receiving software updates and new models after the first year).

Slop.Club has been a huge boon for me because of its generous daily limits for img2vid and because it has significantly less censorship than any other service I've tried so far. (I'm very anti-censorship, but I haven't tried anything explicit with it since that's not the type of stuff I make, so I don't know what the limits are.) I've also found that it's more consistent at animating "2D" images than the paid services I've tried before.

Video Length

One of the things I noticed is that videos that autoplay with default user settings perform significantly better than those that do not, which makes total sense. Initially I was wondering why my earlier "high effort" video posts performed so poorly when I had spent hours on them.

My understanding at this time is that in order for a video to autoplay, it needs to be <30 seconds in duration and <30 MB in size. As long as it is <30 seconds, even if it's just over 29 seconds, that should be fine.

It's a nice framework to work within. I can usually go from an idea with nothing prepared to a finished video within 2 hours.

For the file size, just reduce the bitrate until the file is under 30 MB. Usually adaptive medium bitrate is enough for a video like this, but it doesn't look that bad even if you have to use adaptive low bitrate during the video export.
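
As a rough sanity check for the two limits above, here is a minimal Python sketch that tests a clip against them and works out a bitrate budget for a given duration. The function names, the 128 kbps audio figure, the 5% safety margin, and treating 1 MB as 1,000,000 bytes are all assumptions for illustration, not anything the site documents.

```python
AUTOPLAY_MAX_SECONDS = 30.0  # limit as described above (video must be under this)
AUTOPLAY_MAX_MB = 30.0       # limit as described above (file must be under this)


def fits_autoplay(duration_s: float, size_bytes: int) -> bool:
    """Return True if a clip is under both limits (assumes 1 MB = 1,000,000 bytes)."""
    return (duration_s < AUTOPLAY_MAX_SECONDS
            and size_bytes < AUTOPLAY_MAX_MB * 1_000_000)


def max_video_bitrate_kbps(duration_s: float,
                           audio_kbps: float = 128.0,
                           safety: float = 0.95) -> float:
    """Highest average video bitrate (kbps) that keeps the file under the size limit.

    audio_kbps and safety are assumed values: the audio track and container
    overhead also count toward the file size, so they are subtracted here.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total_kilobits = AUTOPLAY_MAX_MB * 1_000_000 * 8 / 1000
    budget_kbps = total_kilobits * safety / duration_s
    return max(budget_kbps - audio_kbps, 0.0)


if __name__ == "__main__":
    # A clip just under the duration limit
    print(round(max_video_bitrate_kbps(29.5)), "kbps for video")
```

Under these assumptions, a 29.5 second clip has room for roughly 7.6 Mbps of video, so if an export preset averages well below that, the file should come in under the size limit.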

The Process

Images

I usually just start with a funny image or two, or just an idea, and then work around that. Working within 30 seconds, all I need is one idea I like, and then I just generate other images to fill it out. Here is the post that contains all of the images with generation data that I used for this video. Below are the two images I started with:

Link to post

Animation

I use a custom GPT (not made by me) to write the prompts because I'm lazy, but I've found that Slop.Club also works pretty well with simple text prompts. As of the time of writing, I believe that for img2vid the images have to be portrait aspect ratio, but that may have been updated without me knowing about it.

I just drop the images into the chat and I don't even bother writing a prompt with it, unless I want something specific, which I did not do for any of the animations I used for this video.

I just upload the image with the prompt I got from the GPT and hit 'create video'.

For a 30 second video with clips of ~5 seconds each, 6-7 clips is usually sufficient, and I can always make a new one if none of the clips really work out.
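
The clip math above is simple enough to do in your head, but here it is as a small Python sketch in case you work with other lengths. The function name and the default of one spare clip are assumptions for illustration.

```python
import math


def clips_needed(target_s: float, clip_s: float, spares: int = 1) -> int:
    """Number of clips to generate: enough to fill the target, plus spares."""
    if target_s <= 0 or clip_s <= 0:
        raise ValueError("target_s and clip_s must be positive")
    return math.ceil(target_s / clip_s) + spares


if __name__ == "__main__":
    print(clips_needed(30, 5))            # fill 30 seconds with 5 second clips, plus one spare
    print(clips_needed(30, 5, spares=0))  # bare minimum with no spare
```

The spare is there because not every generated clip will be usable, which matches the 6-7 range mentioned above.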

Then I upscale the videos with Topaz Video AI. Even if the resolution is fine, I do it anyway for video clips that are more static, so that I can zoom in/out, or zoom in and pan left/right, without losing resolution. This is the "Ken Burns effect", which apparently is when you move the camera around so things look less boring, according to ChatGPT (I don't know anything about video editing).

Topaz also allows me to generate extra frames for slowing down the video, which sometimes I use, but not for this example.

This isn't strictly needed, but it makes everything look nicer. I typically upscale to double the resolution I get from Slop.Club and do a double enhancement pass for these videos.
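
The reason upscaling gives room for zooming and panning can be sketched with a little arithmetic: a zoom crops the source, and the crop only stays sharp while it still has at least as many pixels as the timeline. The function name and the example resolutions below are hypothetical (they are not Slop.Club's actual output sizes), so treat this as a sketch rather than a rule.

```python
def max_zoom_without_upsampling(source_w: int, source_h: int,
                                timeline_w: int, timeline_h: int) -> float:
    """Largest zoom factor at which the cropped source still covers the
    timeline resolution pixel for pixel (1.0 means no headroom at all)."""
    if min(source_w, source_h, timeline_w, timeline_h) <= 0:
        raise ValueError("all dimensions must be positive")
    return min(source_w / timeline_w, source_h / timeline_h)


if __name__ == "__main__":
    # Hypothetical portrait clip at 480x832, edited on a 480x832 timeline
    print(max_zoom_without_upsampling(480, 832, 480, 832))
    # The same clip after a 2x upscale to 960x1664
    print(max_zoom_without_upsampling(960, 1664, 480, 832))
```

In this example the original clip has no headroom, while the 2x upscale lets you zoom in up to 2x before the editor has to stretch pixels.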

Music

I go back to ChatGPT. If you have a subscription, I would use GPT-4.5, as I find that it is the best for this. I give it the images and tell it specific things I want, and I say to put the things I want within the pre-chorus and the chorus, since we're working with 30 seconds. I also tell it to phonetically spell out any acronyms or letters, because there's nothing worse than getting a banger song out of Suno only for it to fuck up pronouncing something and be unusable. This happened to be in an earlier chat where I had it search online to find out how best to write the Suno style prompt and made it work within the 200 character limit.
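
Before pasting anything into Suno, the two checks described above (acronyms that still need spelling out, and the style prompt length) can be done with a quick script. This is a minimal sketch: the function names are made up, the all-caps pattern is only a heuristic for spotting acronyms, and the 200 character figure is the limit mentioned above.

```python
import re

STYLE_PROMPT_LIMIT = 200  # character limit for the style prompt, as mentioned above


def find_acronyms(lyrics: str) -> list[str]:
    """Return all-caps tokens of two or more letters, in first-seen order,
    without duplicates. These are candidates for phonetic spelling."""
    seen: dict[str, None] = {}
    for token in re.findall(r"\b[A-Z]{2,}\b", lyrics):
        seen.setdefault(token, None)
    return list(seen)


def style_prompt_overflow(style_prompt: str) -> int:
    """Number of characters over the limit (0 if the prompt fits)."""
    return max(len(style_prompt) - STYLE_PROMPT_LIMIT, 0)


if __name__ == "__main__":
    lyrics = "My GPU is melting, my GPU and my AI"  # hypothetical lyric line
    print(find_acronyms(lyrics))
    print(style_prompt_overflow("synthwave, driving bass, female vocals"))
```

Anything the first function returns is worth sending back to the GPT to be spelled out phonetically before you spend a generation on it.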

I just paste the stuff into Suno and then listen for 30 second segments to see if I got what I want. Sometimes I go back to the GPT and tell it to make adjustments to the lyrics (not that often) or to adjust the style prompt (also not often, but more often than the lyrics).

Video Editing

Next I have GPT create an .srt file for me, which is just a .txt file saved with the .srt extension and with specific formatting that allows it to be interpreted as a subtitle track by video editing software.

Somehow it messed this up the first time and only gave me the first 3 lines. From my experience, being nice to an LLM actually yields worse results (at least with GPT). If you really can't help yourself and it did a really good job, only allow it a backhanded compliment. In this case, it did not earn even that for something this easy.
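
For reference, the .srt formatting mentioned above is just numbered blocks: an index, a line with start and end times as HH:MM:SS,mmm separated by " --> ", the subtitle text, and a blank line between blocks. Here is a minimal Python sketch that writes that format and counts the cues in a file, which is a quick way to catch a truncated result like the one above. The function names and the example lyric lines are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) cues."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)  # blank line between blocks


def count_cues(srt_text: str) -> int:
    """Count subtitle cues by counting timing lines."""
    return sum(1 for line in srt_text.splitlines() if " --> " in line)


if __name__ == "__main__":
    cues = [(0.0, 2.5, "First lyric line"), (2.5, 5.0, "Second lyric line")]
    text = to_srt(cues)
    print(text)
    print(count_cues(text) == len(cues))
```

If the cue count is lower than the number of lyric lines you gave it, the file is incomplete and it's worth asking again.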

Now that I have the audio track, video clips, and an .srt file, I like to cut out the <30 second segment, then align the subtitle track to the audio track. This is really helpful even if you don't plan on using subtitles, as it helps you keep track of where everything is, and often (but not always) the transitions will happen around when a new line starts. If you were planning on using subtitles, then this beats having to copy and paste them line by line.
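
I do the alignment by hand in the editor, but if the subtitle timings are relative to the full song, the same shift can be sketched in Python before importing. This assumes cues are (start, end, text) tuples in seconds and that you know where in the song your segment starts; the function name and the example numbers are hypothetical.

```python
def shift_cues(cues: list[tuple[float, float, str]],
               segment_start: float,
               segment_length: float) -> list[tuple[float, float, str]]:
    """Re-time cues so that segment_start becomes 0, trim cues that straddle
    the segment edges, and drop cues that fall entirely outside it."""
    shifted = []
    for start, end, text in cues:
        new_start = max(start - segment_start, 0.0)
        new_end = min(end - segment_start, segment_length)
        if new_end > new_start:
            shifted.append((new_start, new_end, text))
    return shifted


if __name__ == "__main__":
    # Hypothetical timings from the full song, keeping 29.5 seconds from 1:00
    song_cues = [(40.0, 44.0, "verse line"),
                 (62.0, 66.0, "pre-chorus line"),
                 (88.0, 95.0, "chorus line")]
    for cue in shift_cues(song_cues, segment_start=60.0, segment_length=29.5):
        print(cue)
```

The first cue is dropped because it ends before the segment starts, and the last one is trimmed so it doesn't run past the end of the video.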

I will also have GPT figure out the best hex codes for the colors of the subtitles so that they are both readable and have some thematic significance or something. I like the o3 model for this one (for advanced reasoning). I already had a font in mind for this one, but if I don't, then I tell it what I'm making and what I want and have it look up fonts for me on dafont.com so I can download the font I want for free. I also make it justify its choices to me so it doesn't give me garbage recommendations.
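
If you want a second opinion on whether the suggested hex codes are actually readable, one common yardstick is the WCAG contrast ratio between two colors. To be clear, this is not what the GPT does and it isn't part of my process; it's a minimal sketch of the published WCAG formula, and the example colors are hypothetical. For subtitles over video, a sensible comparison is the text color against its outline or a typical background color from the clip.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB' or 'RRGGBB'."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError("expected a 6-digit hex color")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    l1, l2 = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # Hypothetical subtitle color against a hypothetical dark outline
    print(round(contrast_ratio("#FFD166", "#1B1B1B"), 2))
```

WCAG treats 4.5:1 as the minimum for normal text and 3:1 for large text, so a pair that scores below those is probably going to be hard to read at a glance.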

And that's it! You can mostly accomplish what I'm doing without paying a cent. The biggest downsides of doing it 100% free would be lower video resolution and maybe lower quality responses from the LLM, but it should still be enough to make a great video.

If you like the short videos I make, I have a collection of all the shorter ones here

I also have longer videos I made for Project Odyssey Season 2. They aren't as good because they were the first videos I did, but I haven't made anything nearly as long since Project Odyssey, so each one is more ambitious in scope and length.

I've been wanting to write this because this is the type of content I want to see more of, and I think that most people don't realize how attainable it is for them to make it themselves. Almost everything I know about video editing is from ChatGPT (lol) and from advice I got from KFTiger when I collaborated with him for Project Odyssey Season 2.

See you next time I write an article with no proofreading and in a constant stream of consciousness! ✌️