I didn't have high hopes for this working on the first try, but it turned out OK.
With somewhat better prompting, someone could probably get results that look more like the music video.
Trained on 11 short videos with very simple captions. The more detailed the training captions, the more you have to put in the prompt to get what you want.
Start the prompt with "takeonme style", for example: "takeonme style video of two cars racing on a race track"
A strength between 1 and 1.2 seems to work well.
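If you generate from a script, a small helper can keep the trigger phrase and strength consistent across runs. This is a hypothetical sketch, not tied to any particular UI or pipeline: `build_prompt` and `check_strength` are made-up names, and how the strength value actually gets passed to the model depends on the tool you use.

```python
import warnings

# Trigger phrase from the notes above.
TRIGGER = "takeonme style"

# Strength range suggested in the notes; other values may still work.
MIN_STRENGTH, MAX_STRENGTH = 1.0, 1.2


def build_prompt(description: str) -> str:
    """Prepend the trigger phrase unless the prompt already starts with it."""
    description = description.strip()
    if description.lower().startswith(TRIGGER):
        return description
    return f"{TRIGGER} {description}"


def check_strength(strength: float) -> float:
    """Return the strength unchanged, warning if it is outside the suggested range."""
    if not MIN_STRENGTH <= strength <= MAX_STRENGTH:
        warnings.warn(
            f"strength {strength} is outside the suggested "
            f"{MIN_STRENGTH} to {MAX_STRENGTH} range"
        )
    return strength


if __name__ == "__main__":
    print(build_prompt("video of two cars racing on a race track"))
    print(check_strength(1.1))
```

The helper only warns instead of clamping, since the 1 to 1.2 range is a rough recommendation rather than a hard limit.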