Maybe this isn't very relevant, but I could barely find anything like it.
Well, it's just an FLF2V workflow based on lightx2v (6 steps) and the 14B models, which can take three images and consistently turn them into a short clip.
Works well with a bunch of similar-looking pictures.
I'm thinking about using more than 3 pics, but... I tried merging clips together with avidemux and got a nice result (1->2->3, 3->4->5 and so on, same seed; for a loop, 3->4->1 or 3->2->1 depending on the pics). I got a 40-second clip with visible but not critical transitions. But the last part came out not very good, so I remade it with a different seed, and then it hit me: if I had put 12 pics into one workflow, I would have just wasted time because of one bad last clip. So I think it's better to make 3-pic clips and merge them elsewhere.
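
For anyone who wants to plan the chain before rendering, here is a rough Python sketch of the overlapping triplets described above (1->2->3, 3->4->5, and 3->4->1 to close a loop). It only prints which pictures go into which clip; it does not talk to the workflow or to avidemux. The function name `triplet_windows` and the odd/even rule are my own assumptions, and the 3->2->1 style of loop is not covered.

```python
def triplet_windows(n_images: int, loop: bool = False) -> list[tuple[int, int, int]]:
    """Return 1-based (first, middle, last) picture numbers for each clip.

    Neighbouring clips share one picture: the last frame of one clip
    is the first frame of the next one.
    """
    if n_images < 3:
        raise ValueError("need at least 3 pictures")
    if loop:
        # Assumption: a closing clip like 3->4->1 needs an even picture count.
        if n_images % 2 != 0:
            raise ValueError("loop mode expects an even number of pictures")
        starts = range(0, n_images, 2)
    else:
        # Without a loop, 1->2->3, 3->4->5, ... fits exactly for odd counts.
        if n_images % 2 == 0:
            raise ValueError("non-loop mode expects an odd number of pictures")
        starts = range(0, n_images - 2, 2)
    # The modulo wraps the last clip of a loop back to picture 1.
    return [tuple((s + k) % n_images + 1 for k in range(3)) for s in starts]


if __name__ == "__main__":
    for clip in triplet_windows(4, loop=True):
        print("->".join(str(i) for i in clip))
```

Each triplet is rendered as its own short clip, so a bad one can be redone with a different seed without touching the rest, and the clips are then merged in order.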