TL;DR
I'll show you how to make a dialogue scene with close-up and reverse shots using a single source image, the cinematic hard cut LoRA, and a few prompts.
Video in question: https://civitai.com/images/108392606 (imagine if you could link a video on civitai to an article!)
Background
I've been dabbling with Wan for trailer-style filmmaking and had trouble getting good, proper cuts. Wan always seems to want either a dissolve transition or to wrangle the camera into the new position. So I decided to train a LoRA on a few clips and see if it worked better.
This example is a proof of concept I put together to show that the LoRA can keep characters consistent across cuts.
Let's get to it
I started off with a clip I had generated for the showcase of my Ancient Rome lora. It's well suited for a dialogue scene.

I decided on a simple script:
1. Mid shot with both characters in view. The man is talking.
2. Close-up shot on the man talking.
3. Reverse shot on the woman replying.
4. Back to mid shot of both characters.
That makes four shots, but the last one reuses the first (which I already had), so I only needed two new shots.
First shot - Closeup on man
I took a frame dump from the beginning of the clip.
I added it as the initial frame in a Wan I2V workflow in ComfyUI.
I used the prompt: "a mid-shot of a man and a woman in roman attire are having a discussion in a bustling market scene. the camera makes a hard cut to a close up view of the man as he speaks."
The first sentence describes the initial scene. The second sentence is the trigger for the cut.
After generating, I ended up with a clip where the two talk for a bit, followed by the cut to a close-up. Wan could already handle this kind of shot quite well without the LoRA, but it would zoom in instead of cutting.
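If you'd rather script the frame dump than grab it by hand, ffmpeg can save the first frame of a clip. This is only a sketch of one way to do it, not how the frame above was made. It assumes ffmpeg is installed, and the file names are made up for illustration.

```python
import shlex


def first_frame_command(clip_path, image_path, timestamp="0"):
    """Build an ffmpeg command that saves a single frame from clip_path."""
    return [
        "ffmpeg",
        "-ss", str(timestamp),  # seek to the timestamp (0 = start of the clip)
        "-i", clip_path,        # source clip (hypothetical file name below)
        "-frames:v", "1",       # write exactly one video frame
        "-y",                   # overwrite the output image if it exists
        image_path,
    ]


# Hypothetical file names, swap in your own.
cmd = first_frame_command("rome_midshot.mp4", "rome_first_frame.png")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The saved image is what goes into the image input of the I2V workflow.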

Second shot - Reverse close up on woman
This is one shot where the LoRA helps.
I used the same initial image, but with this prompt:
"a mid-shot of a man and a woman in roman attire are having a discussion in a bustling market scene. the camera makes a hard cut to a reverse over-the-shoulder shot of the man, framing the woman"
I guess you can figure out what I ended up with.
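Both prompts share the same scene description and only differ in the sentence that triggers the cut, so they can be put together with a small helper. This is a minimal sketch: `hard_cut_prompt` is a made-up name, not part of Wan or ComfyUI, and it always ends the prompt with a period.

```python
def hard_cut_prompt(scene, cut_target):
    """Join the scene description with the 'hard cut' trigger sentence."""
    scene = scene.strip().rstrip(".")
    cut_target = cut_target.strip().rstrip(".")
    return f"{scene}. the camera makes a hard cut to {cut_target}."


# Scene description used for both shots in this article.
SCENE = (
    "a mid-shot of a man and a woman in roman attire are having "
    "a discussion in a bustling market scene"
)

closeup_prompt = hard_cut_prompt(SCENE, "a close up view of the man as he speaks")
reverse_prompt = hard_cut_prompt(
    SCENE, "a reverse over-the-shoulder shot of the man, framing the woman"
)

print(closeup_prompt)
print(reverse_prompt)
```

Keeping the scene sentence identical between shots means only the cut target changes from one generation to the next.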

Conclusion
As you can see, the consistency is quite good. Bear in mind that these clips are really low-res (352p), so if the model is struggling anywhere, it could be due to that. The initial shot also doesn't show much detail of the woman.
From these shots, you can generate any number of dialogue shots (let's keep lip sync out of this, for now). The close-ups can also be used to have the characters do new things, moving them out of the current scene. Try "cut to". Sometimes it works.
Editing
Some editing was inevitable.
I took my three shots and threw them into OpenShot (a free video editor that's easy to get started with).
I removed the initial mid-shot section from the two close-up shots.
I kept part of the initial clip as the start of the video, added my two close-ups, and then finished with another section of the first clip.
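The same edit can be written down as a small cut list, which helps when planning where each section lands on the timeline. This is a rough sketch only: the file names and all timings are hypothetical, not the values from my actual edit.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str   # clip file the section is taken from
    start: float  # in-point within the source clip, in seconds
    end: float    # out-point within the source clip, in seconds

    @property
    def duration(self):
        return self.end - self.start


def layout(segments):
    """Place segments back to back and return (timeline_position, segment) pairs."""
    position = 0.0
    placed = []
    for seg in segments:
        placed.append((position, seg))
        position += seg.duration
    return placed


# Hypothetical timings: the close-up clips skip their mid-shot lead-in.
timeline = [
    Segment("midshot.mp4", 0.0, 2.0),        # opening mid shot
    Segment("closeup_man.mp4", 3.0, 5.0),    # starts after the hard cut
    Segment("reverse_woman.mp4", 3.0, 5.0),  # starts after the hard cut
    Segment("midshot.mp4", 2.0, 4.0),        # back to the mid shot
]

for position, seg in layout(timeline):
    print(f"{position:4.1f}s  {seg.source}  [{seg.start}-{seg.end}]")

total = sum(seg.duration for seg in timeline)
print(f"total: {total}s")
```

The in-points on the two close-up clips are where the mid-shot section gets trimmed off, so each one starts right on the cut.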
