Updates / Tips / Vidu Q2Pro and Q3

It's been a crazy few weeks, and I've had a bunch of unrelated stuff I've been wanting to write about for some time now. So many things: the Great Civitai Meme War of 2026, recent contracts, new processes I've been using, new creative partner opportunities, and the Chroma Awards.


Vidu Q2Pro / Q3

Full disclosure: I am a creative partner with Vidu. I've enjoyed using it since joining, but I want to make my relationship with them clear. I am genuinely excited about the quality and audio support of their new models. At the time of writing, these don't seem to be available for onsite generation on Civitai.

Here is my Vidu invite link. A new account gets 120 credits.

Here are some full videos.

Vidu Q3

Vidu Q2pro

Although I do have a relationship with Vidu, I feel like I'd be pretty excited about Q3 even if I didn't. I've been longing for a Sora2 alternative, as its "safety" guardrails make it more and more unusable by the day.

Vidu Q3 can do the following:

  • up to 16 seconds in duration

  • sound / voice from direct generation

  • supports English, Chinese (even Cantonese!), and Japanese

I don't know whether it handles dialects of Chinese other than Mandarin and Cantonese, because those are the only two I'm familiar with.

Vidu Q2pro is a lot like Vidu Q2 Ref2Vid, with better prompt adherence.

Here are some Generations from Q3 without any editing:

Img2Vid Prompt:

Shot 1: Low-angle wide shot of the battlefield. The massive dieselpunk mechs are slowly advancing, their heavy legs churning up the dirt. The lead mech in the foreground fires its gatling gun with a continuous, rhythmic chug-chug-chug sound and visible muzzle flashes. Smoke billows from its exhaust stacks. Distant explosions bloom in the orange sky. Shot 2: Medium shot, focusing on the middle ground. A large explosion erupts near the line of mechs with a deep, resounding boom, sending debris and dirt flying through the air. The mechs are momentarily obscured by the smoke but continue their relentless advance. The sound of their heavy footsteps is a constant, low thud. Shot 3: High-angle wide shot, looking down at the trench. Fallen soldiers lie in the foreground. In the distance, the mushroom cloud from a tactical nuke is slowly rising into the upper atmosphere, casting a sickly yellow light over the entire scene. The sounds of battle—gunfire, explosions, and mechanical whirring—fade into a low, ominous ambient rumble.

Img2Vid Prompt:

Shot 1: Wide-angle shot on the deck of a wooden pirate ship in the middle of a chaotic naval engagement. Axolotl sailors in 18th-century naval uniforms scramble across the deck, pulling ropes and swabbing cannons amidst thick plumes of orange fire and black smoke. The audio is dominated by the rhythmic creak of the ship's timbers and the roar of a nearby vessel exploding in the background. Shot 2: Medium close-up on the Axolotl Captain in his gold-trimmed coat and bicorne hat. He points dramatically toward the enemy ship, his mouth moving in perfect synchronization as he shouts: Steady, you lot! Aim for the waterline and let 'em have it!. Shot 3: Tight close-up on a heavy iron cannon as it recoils with a massive, window-shaking BOOM. A flash of white-hot light illuminates the deck, and the Captain can be heard laughing over the chaos: That's the ticket! Send 'em to the locker!. Style: Cinematic oil painting with heavy impasto brushstrokes, dramatic chiaroscuro lighting, and high-fidelity particle effects for the smoke and embers.

Prompt:

Shot 1: Mid-shot of an anime girl in a sailor uniform and white shutter shades in a dark hallway. A driving electronic track with a heavy, pulsating bass beat kicks in. She is locked in an intense fight with a man in a black business suit and sunglasses. She fluidly dodges and weaves under his katana swings, the sound of the bass syncing with the metallic clack of her parries. With a sudden burst of speed, she delivers a finishing slash, and the man falls out of frame. Shot 2: A new man in a suit charges from the shadows, swinging his katana vertically. The camera performs a dramatic cinematic zoom onto the girl's face—her expression is cold and determined. The swords collide with a resonant, vibrating ching that echoes over the bass beat. Shot 3: The girl engages the new attacker in a rapid-fire exchange of strikes. She suddenly ducks low, sweeping under his guard, and impales him with a powerful forward thrust. The audio features a wet shing sound as she quickly withdraws the blade. He collapses to the floor. Shot 4: A third man appears at the far end of the hallway, drawing his sword. The girl snaps into a low, aggressive combat stance, pointing her blade toward him as the electronic music reaches a crescendo. Style: High-octane action anime, crisp lines, cinematic lighting, 1080p HD, 16:9 aspect ratio.

T2V Prompt (it can make the character look like the actor, but I intentionally didn't mention him because of Civitai's TOS):

The camera follows Harry Potter in a frantic, handheld tracking shot as he sprints through a dark, torch-lit Hogwarts corridor. His robes are torn and singed. He slides across the stone floor, diving behind a massive marble pillar just as a jet of green light shatters the masonry next to his head. Harry draws a matte-black Glock 17 from his belt. The scene cuts to a tight close-up of Harry’s face, sweat-streaked and intense. He leans out and fires three rapid shots, the muzzle flashes strobing against the dark walls. He yells, "Expelliarmus this, you bastards!" as two masked Death Eaters are thrown backward by the gunfire, their wands clattering to the floor. Harry breaks into a tactical run, charging toward the remaining enemies. The camera moves alongside him as he fires while moving. He shouts, "You’re out of spells, Malfoy!" and takes down a wizard on a balcony with a precise shot. Harry slows to a walk, smoke curling from the barrel. He stares into the shadows at the end of the hall and growls, "Magic is a crutch. Lead is a permanent solution." He raises the gun, and the screen cuts to black as the muzzle flashes one last time.

Img2Vid Prompt:

cinematic anime of a group of axolotls approaching a busy restaurant and the server greets them, then leads them to a table as it zooms in and tracks forward
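Several of the prompts above follow the same pattern: numbered "Shot N:" descriptions, optionally followed by a "Style:" line. As a rough sketch of how that pattern could be assembled programmatically, here is a small helper. The function name `build_shot_prompt` and its parameters are hypothetical, made up for illustration; they are not part of Vidu or any other tool.

```python
def build_shot_prompt(shots, style=None):
    """Join shot descriptions into one 'Shot 1: ... Shot 2: ...' prompt string.

    Hypothetical helper: it only mirrors the text layout of the prompts
    in this article and does not call any generation service.
    """
    if not shots:
        raise ValueError("at least one shot description is required")

    parts = []
    for number, description in enumerate(shots, start=1):
        text = description.strip()
        # End each shot with sentence punctuation so shots do not run together.
        if not text.endswith((".", "!", "?")):
            text += "."
        parts.append(f"Shot {number}: {text}")

    if style:
        parts.append(f"Style: {style.strip()}")

    return " ".join(parts)


if __name__ == "__main__":
    print(
        build_shot_prompt(
            [
                "Low-angle wide shot of the battlefield",
                "Medium shot, focusing on the middle ground.",
            ],
            style="Cinematic oil painting",
        )
    )
```

Keeping the shots in a list makes it easy to reorder or drop one and regenerate the prompt text.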



Creating References with Nano Banana

I can only speak to how well this works for Vidu because I haven't had a chance to test it with other video reference models, but I expect it works well with them too. The reference sheets are also really useful to feed back into Nano Banana for image generation to keep the character consistent. First, an example:

The original Image:

116556241.png

Reference output:

freepik__only-the-elf-girl-character-sheet-reference-of-her__3906.png

Prompt: character sheet reference of her with no text, showing her from front, side, and rear view, with one close-up of her on the right, no background, black background, high quality studio photography

The prompt for a character is pretty simple. You can add wording to ensure certain props are present in every view, or ask for different poses, and adjust accordingly if the style isn't realistic. I only include the wording about studio photography to ensure it doesn't drift into a 3D-rendering kind of style when that's not what I'm going for. It will generally work without that wording, like so:

freepik__no-background-black-background-multiple-views-of-t__98578.png
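To make those variations concrete, here is a minimal sketch that rebuilds the character sheet prompt with the optional parts switched on or off. The function name, its parameters, and the exact props wording ("visible in every view") are my own assumptions for illustration; only the default output matches the prompt quoted above.

```python
def character_sheet_prompt(props=(), studio_photography=True):
    """Build the character sheet prompt, with optional props and style wording.

    Hypothetical helper: the props phrasing is an assumed wording,
    not something taken from the original prompt.
    """
    parts = [
        "character sheet reference of her with no text",
        "showing her from front, side, and rear view",
        "with one close-up of her on the right",
    ]
    if props:
        # Assumed wording for keeping props present across all views.
        parts.append("with " + " and ".join(props) + " visible in every view")
    parts += ["no background", "black background"]
    if studio_photography:
        # Included to steer away from a 3D-rendered look.
        parts.append("high quality studio photography")
    return ", ".join(parts)


if __name__ == "__main__":
    print(character_sheet_prompt())
    print(character_sheet_prompt(props=["her bow"], studio_photography=False))
```

With the defaults, the result is the same text as the prompt shown earlier; dropping `studio_photography` gives the shorter variant.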

There was a trend I saw on Twitter about creating "contact sheets", which are really useful for scenes. They make very good references for Ref2Vid, and they are also very useful for creating start frames for a scene: you can feed the sheet back in as an input to extract a specific angle, which lets you mix up a long scene by approaching it from different angles. In the example below, the original image is identical to the center image. If you want the panels labeled, you can remove the wording in the prompt about not having any text:

freepik__a-photorealistic-9panel-cinematic-contact-sheet-ar__98579.jpg

Prompt:

A photorealistic 9-panel cinematic contact sheet arranged in a 3x3 grid, based strictly on the visual elements of the provided input image.

NEGATIVE CONSTRAINTS (NO TEXT):

The final image must contain absolutely NO text, NO labels, NO captions, NO watermarks, NO UI elements, and NO alphanumeric characters anywhere on the grid or within the panels. It is purely photographic content.

GLOBAL SCENE CONSTRAINTS (FROZEN MOMENT):

Every single panel in the 3x3 grid depicts the exact same subjects, wearing the exact same clothes, in the exact same frozen poses, within the exact same environment, lighting condition, and color grading as the reference image. The scene does not change; only the camera's perspective and focal length shift between panels.

GRID DESCRIPTION (VISUAL CONTENT ONLY):

Top Row (Left to Right):

The first panel (top-left) is an extreme wide-angle landscape shot, showing the subjects as tiny figures in the vast environment. The second panel (top-middle) is a full-body long shot of the subjects anchored in their surroundings. The third panel (top-right) is a medium-long shot framing the subjects from the knees up.

Middle Row (Left to Right):

The fourth panel (middle-left) is a medium shot framed from the waist up. The fifth panel (center) is a medium close-up framed from the chest up. The sixth panel (middle-right) is a tight close-up focusing on the face or main feature of the subject.

Bottom Row (Left to Right):

The seventh panel (bottom-left) is an extreme macro detail shot focusing intently on a specific texture or small feature of the subject. The eighth panel (bottom-middle) is an extreme vertical nadir angle looking straight 90-degrees up from beneath the subjects; they tower over the lens against the ceiling/sky, and crucially, no part of the ground, floor, or feet is visible in this frame. The ninth panel (bottom-right) is a high-angle bird's-eye view looking steeply down onto the subjects from above.


I find these contact sheet references work really well with Ref2Vid because they give so much more context with just one image. I've also extracted panels to use as inputs for first and last frames, which keeps things really consistent.
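If you would rather crop a panel out of the sheet directly than feed the sheet back into the image model, the arithmetic is simple. The sketch below computes the pixel box for each panel, assuming the grid is evenly spaced with no borders or gutters, which generated sheets will not always respect. The function name is hypothetical.

```python
def contact_sheet_boxes(width, height, rows=3, cols=3):
    """Return (left, upper, right, lower) boxes for each panel, row by row.

    Assumes an evenly divided grid with no gutters between panels.
    Index 0 is the top-left panel; index 4 is the center of a 3x3 grid.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")

    boxes = []
    for row in range(rows):
        for col in range(cols):
            # Integer division keeps edges aligned even when the size
            # is not an exact multiple of the grid.
            left = col * width // cols
            upper = row * height // rows
            right = (col + 1) * width // cols
            lower = (row + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


if __name__ == "__main__":
    for index, box in enumerate(contact_sheet_boxes(3000, 1500), start=1):
        print(f"panel {index}: {box}")
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so a box can be passed to it as-is. A cropped panel has only a ninth of the sheet's pixels, so it may need upscaling before use as a first or last frame.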


Exciting Developments

I've been busy the past few months. I got to work on some contracts for some big names. It's all under NDA, but I know one of them I'm eventually going to be able to talk about 😁. I can't believe this all started from getting annoyed that Microsoft censored an image I wanted to make of axolotls robbing a bank.

If I had to give any advice, I would say to figure out what you enjoy making, and to always be trying to improve it. I think it's better to keep refining what you enjoy doing than to chase wins in competitions. I've seen people get completely crushed when the video they made didn't win a competition. I always try to approach making videos from the perspective that I'd be happy with it even if it weren't part of a competition. This space we operate in has a lot of subjectivity, and just from a statistical perspective, any given person is unlikely to win or get an honorable mention. If you make good stuff and keep improving, eventually you will be noticed, is what I like to believe.

Speaking of subjectivity, I managed to get one honorable mention for The Chroma Awards, for the video I was the least enthusiastic about making. The title sums up my attitude when making it 😂

I just wanted to use Illustrious to make something involving a magic circle and space using r-rain's style lora: wwafr

No character lora, no consistency, and no direction. I posted a WIP version on Twitter because I thought I couldn't take it any further, since it didn't make any sense or have the usual level of consistency I strive for. It ended up being my most interacted-with tweet at that time (I have no followers on Twitter lol). So I decided to just finish it and ship it, and it is the one that got me an honorable mention 😂

If you want to see all of my submissions for The Chroma Awards, I have them all embedded in an article here.

I've also since become a creative partner for:

  • Veespark | A great service that has functionality for creating storyboards (helpful to me because I just usually wing everything without a plan)

  • floyo.ai | ThinkDiffusion's platform where people create ComfyUI workflows that can be run in their browser with API integration

  • pollo.ai | Generation platform with a lot of template integrations


The Great Civitai Meme War of 2026

What a beautiful piece of art. I want this painted on the side of my house.

Credit: UnstableGen | Link to original post

I just wanted to acknowledge this happened. Pretty crazy to see the whole community come together like that (RIP the fallen). I don't want to get too deep into it here, but I was able to archive a lot of the events that have been taken down. Maybe I'll have to make a YouTube video essay about it 😂.


Anyways, that's all for now. If you like my stuff and want to support me, you can use my Vidu Q3 code if you don't already have an account, and/or follow me on YouTube / Twitter. Until next time ✌️
