Retro 90's Anime / Golden Boy Style Lora Wan 14B T2V / I2V

Updated: Jul 1, 2025

Tag: style

Verified: SafeTensor

Type: LoRA


Published: Jul 1, 2025

Base Model: Wan Video 14B t2v

Training

Steps: 51,000
Epochs: 504

Usage Tips

Strength: 1

Trigger Words

GoldenBoyStyle


Hash (AutoV2): 43F8348A30

About this version

Please consider donating or subscribing on my Ko-fi here.

(All funds go right back into making more LoRAs.)

V2, what's new:

- Trained on an additional 78 videos, up to 51K steps (previously 25K steps).

- The Golden Boy style comes out more consistently and with more detail than before.

- In theory the videos should help with motion; IMO there is no reason to use V1 anymore.

- Try the new example workflow for my setup. I have switched from 16/32 fps to 12/24 fps videos to more closely match that anime look.

- I have also updated my training process write-up below if you want to read how I did it in detail.

What is this LoRA?

This is a style LoRA used to recreate the style of the 1995 anime series "Golden Boy". The series has beautiful mid-90s matte-painting-style backgrounds, which came out great in the LoRA, and the way they draw the girls is awesome and really represents the art style of the period's raunchy comedies. But if you just want a mid-90's retro anime look in Wan, use this LoRA too; it's really great at doing older-style anime in general, and it is perfect for detailed environmental shots. It's captioned on bikes, cars, delicious-looking food, garbage, etc., not just people. It's trained on the T2V model and therefore should also work for I2V.

Trigger word: GoldenBoyStyle

(You do not need to add any other anime or animation style descriptions to the prompt; it produces the style without any other prompting. It really is amazing.)

All the main characters from the show (honestly, almost every character) are in the training data, some more than others. The blonde woman (Madame President) comes out pretty much any time you mention a blonde woman. If you describe any character from the show, it will probably generate them accurately. The main character Kentaro Oe will also come out if prompted, but only by description, not by name. The silly faces the characters make are also in the training data. There are naked breasts in the training data, but no lower genitals.

Recommended Settings

It runs just fine on the default Wan workflow and retains that real nostalgic retro animation style, but I recommend mixing this LoRA with the following optimization LoRAs (a rough non-ComfyUI sketch of the stack follows after these notes):

  1. This LoRA (Golden Boy Style) (strength 1.0)

  2. Wan2.1-Fun-14B-InP-MPS (strength 1.0)

  3. Wan21_T2V_14B_MoviiGen_lora_rank32_fp16 (strength 0.5)

  4. Wan21_T2V_14B_lightx2V_cfg_step_destill_lora_rank32 (strength 0.8 or 1.0)

Warning: do NOT use TeaCache or SLG if you are using the above LoRAs together. To avoid OOM on my 3090, I block swap 15 for higher resolutions.

Also use NAG by Kijai for the negative prompt. I recommend adding "slow motion" to the negative if you are using the LoRAs above.
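Not part of my setup (my setup is the attached ComfyUI workflow), but if you would rather script the same LoRA stack, here is a rough sketch using diffusers' Wan 2.1 support. The repo id, file names, and whether these exact safetensors load cleanly in diffusers are assumptions to check yourself; NAG is not included since that is a ComfyUI node, and sampler settings should follow the attached workflow.

import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Assumes a recent diffusers release with Wan 2.1 support; the repo id below is a placeholder to verify.
pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # the 14B model does not fit fully on a 24GB card in bf16

# Illustrative local file names; point these at the files you actually downloaded.
pipe.load_lora_weights("GoldenBoyStyle_v2.safetensors", adapter_name="golden_boy")
pipe.load_lora_weights("Wan2.1-Fun-14B-InP-MPS.safetensors", adapter_name="mps")
pipe.load_lora_weights("Wan21_T2V_14B_MoviiGen_lora_rank32_fp16.safetensors", adapter_name="moviigen")
pipe.load_lora_weights("Wan21_T2V_14B_lightx2V_cfg_step_destill_lora_rank32.safetensors", adapter_name="lightx2v")
pipe.set_adapters(
    ["golden_boy", "mps", "moviigen", "lightx2v"],
    adapter_weights=[1.0, 1.0, 0.5, 0.8],  # strengths from the list above
)

frames = pipe(
    prompt="GoldenBoyStyle. Interior setting. A young man with short dark hair ...",
    negative_prompt="slow motion",
    height=480,     # 480x832 test resolution mentioned below; swap if you prefer the other orientation
    width=832,
    num_frames=81,  # placeholder frame count; match your workflow
).frames[0]
export_to_video(frames, "test_480x832.mp4", fps=12)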

For testing I run at 480x832 resolution; when I find something I like, I run it again at 720x1280 (with no upscale, and 12 fps interpolated to 24 fps).
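The 12 fps to 24 fps step happens inside my workflow. If you want to interpolate a finished clip outside ComfyUI instead, one option (not what the workflow uses) is ffmpeg's motion interpolation filter; a minimal sketch, with placeholder file names:

import subprocess

def interpolate_to_24fps(src: str, dst: str) -> None:
    # Motion-interpolate a 12 fps render up to 24 fps with ffmpeg's minterpolate filter.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-vf", "minterpolate=fps=24:mi_mode=mci", dst],
        check=True,
    )

interpolate_to_24fps("goldenboy_test_12fps.mp4", "goldenboy_test_24fps.mp4")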

I have attached a sample workflow and the captions from the training data in a zip. So simply download the LoRAs listed above, put them in your lora folder, and use my workflow.

By the way, I haven't tested I2V; everything I post here is T2V. So give I2V a try.

Dataset

A gargantuan 368 screen captures taken directly from the show, plus 79 video clips. I took the screenshots while rewatching with VLC and made the clips by hand with HandBrake. The images are 768x576 (the original resolution of the upscaled releases) and the clips are 288x384.
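I did all of this by hand, but if you are putting together a similar dataset, a quick sanity check that everything ended up at the expected resolutions might look like the sketch below (a hypothetical helper, assuming Pillow and OpenCV; the paths are placeholders):

from pathlib import Path

import cv2
from PIL import Image

IMAGE_DIR = Path("/data/trainingstuff/train_images")  # placeholder paths
VIDEO_DIR = Path("/data/trainingstuff/train_videos")

for img_path in sorted(IMAGE_DIR.glob("*.png")):
    with Image.open(img_path) as im:
        if im.size != (768, 576):  # screenshots should be 768x576
            print(f"unexpected image size {im.size}: {img_path.name}")

for vid_path in sorted(VIDEO_DIR.rglob("*.mp4")):
    cap = cv2.VideoCapture(str(vid_path))
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    cap.release()
    if set(size) != {288, 384}:  # clips should be 288x384 (either orientation)
        print(f"unexpected clip size {size[0]}x{size[1]}: {vid_path.name}")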

I broke the video dataset up into 5 groups based on the number of target frames for each bucket, and I converted each clip to 16 fps (16, 24, 32, 40, and 48 frame buckets). Some of the clips are longer, but diffusion-pipe automatically picks up the first N frames of each clip in a group. I adjusted my dataset TOML to look like this (a helper sketch for sorting clips into these folders follows after the config). I didn't want to resize the videos myself because I can't do it without creating quality issues for some reason, so I leave it up to diffusion-pipe by specifying the resolution, and let it select the target frames using the frame buckets. (Now that I think about it, does the target_frames argument even do anything in diffusion-pipe, with frame_buckets handling that for us?)

#etc...

frame_buckets = [1,16,24,32,48]

#etc...

[[directory]]
path = '/data/trainingstuff/train_videos/16_frames'
resolutions = [[288,384]]
num_repeats = 1
target_frames = [16]

[[directory]]
path = '/data/trainingstuff/train_videos/24_frames'
resolutions = [[288,384]]
num_repeats = 1
target_frames = [24]

[[directory]]
path = '/data/trainingstuff/train_videos/32_frames'
resolutions = [[288,384]]
num_repeats = 1
target_frames = [32]

[[directory]]
path = '/data/trainingstuff/train_videos/48_frames'
resolutions = [[288,384]]
num_repeats = 1
target_frames = [48]
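I sorted the clips into these folders by hand, but the same idea can be scripted. A rough sketch (not the script I used; paths are placeholders) that re-encodes each clip to 16 fps and drops it into the folder for the largest bucket it can fill:

import shutil
import subprocess
from pathlib import Path

import cv2

SRC = Path("/data/trainingstuff/raw_clips")   # placeholder source folder
DST = Path("/data/trainingstuff/train_videos")
BUCKETS = [48, 32, 24, 16]                    # frame buckets matching the folders above, largest first

for clip in sorted(SRC.glob("*.mp4")):
    # Re-encode to 16 fps, as described above.
    tmp = DST / "_tmp" / clip.name
    tmp.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(["ffmpeg", "-y", "-i", str(clip), "-r", "16", str(tmp)], check=True)

    # Count frames and pick the largest bucket the clip can fill; diffusion-pipe uses the first N frames.
    cap = cv2.VideoCapture(str(tmp))
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    bucket = next((b for b in BUCKETS if n_frames >= b), None)
    if bucket is None:
        print(f"skipping {clip.name}: only {n_frames} frames")
        continue

    out_dir = DST / f"{bucket}_frames"
    out_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(tmp), str(out_dir / clip.name))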

Training Info

Model: the default 14B T2V model from Wan (so this will also work as an I2V model).

LR 2e-5, transformer dtype float8, save_dtype bfloat16, blocks_to_swap 8

Repeats: for the first 10 or so epochs, 5 repeats; for the next 20-30 epochs, 3 repeats; then 1 repeat after that. V2 is 1 repeat only.

Steps: 51K (I trained a few thousand more but noticed some issues, so I reverted to the 51K-step epoch).

Around epoch 360 is where you can see I added the video dataset. I was worried improvement would flatten out, but it actually started to cave down dramatically again around epoch 440. I ran some tests, and after around 500 epochs I saw motion issues on a few epochs, so I went back until I no longer got the same weird blur when something fast happens. Epoch 504 had no such issue, so I decided to stop there for now.

I captioned the video data by feeding the videos, together with the prompt below, into Google Gemini 2.5 Pro via AI Studio. I fed them in batches of 10; interestingly enough, I didn't have to re-prompt it, and it handled the videos with no problems, though I did have to go through and touch up the captions very slightly. I also gave it the Wan paper for good measure (the PDF from the official Wan Hugging Face). (A scripted sketch of this batching is included after the sample prompts below.)

You are an advanced image captioner for WAN AI video generation models. Your goal is to create vivid, cinematic, highly detailed captions for training LoRAs on the Wan 14B model with diffusion-pipe, so your captions follow Wan's syntax. Our goal this time is to create a style LoRA for the classic anime series "Golden Boy". You will be fed screen captures from the show. Never use any character names; describe each caption purely generically so that in training it picks up the style of the way things are created. Do not use words like "or" when describing; be precise and choose the description you think is closest. Do not refer to the subject as "the subject"; state simply "a man wearing" or "a woman in a car" etc. Refer to an adult male as "a man" and an adult woman as "a woman"; you can use modifiers like "young woman" or "girl", but let's not use "male" or "female". Also be precise; don't say "appears to be" etc.

Prompt Rules:

Every prompt must begin with: "GoldenBoyStyle".

Use clear, simple, direct, and concise language. No metaphors, exaggerations, figurative language, or subjective qualifiers (e.g., no "fierce", "breathtaking").

Our purpose is to describe everything in the image, with special attention to describing the people whenever they are present. Describe each individual piece of clothing including the colors and positions. We want a standard description of their appearance and usual clothes, but at the same time we need to describe the environment as that is part of the style as well.

Describe what is in the image, not what the image is. For example, "A photo depicting a cosplay of..." is wrong; just say "Live action Bowsette..." and then describe the image.

When an exaggerated or "chibi" face or depiction is shown, make sure to note it in the caption. Let's be uniform in our word choices when possible.

Prompt length: 80–200 words.

Follow this structure: Scene + Subject + Action + Composition + Camera Motion (video only)

Scene (environment description)
Establish environment type: urban, natural, surreal, etc. Include time of day, weather, visible background events or atmosphere. Only describe what is seen; no opinions or emotions.

Subject (detailed description)
Describe only physical traits, appearance, outfit. Use vivid but minimal adjectives (no occupations like "biker", "mechanic", etc.) No excessive or flowery detail.

Action (subject and environment movement)
Specify only one clear subject and/or environmental interaction. Describe only what can be seen in 5 seconds.

Composition and Perspective (framing)
Choose from: Close-up | Medium shot | Wide shot | Low angle | High angle | Overhead | First-person | FPV | Bird’s-eye | Profile | Extreme long shot | Aerial

Motion (cinematic movement) (only used when describing video sources)
Use: Dolly in | Dolly out | Zoom-in | Zoom-out | Tilt-up | Tilt-down | Pan left | Pan right | Follow | Rotate 180 | Rotate 360 | Pull-back | Push-in | Descend | Ascend | 360 Orbit | Hyperlapse | Crane Over | Crane Under | Levitate

Describe clearly how the camera moves and what it captures. Focus on lighting, mood, particle effects (like dust, neon reflections, rain), color palette if needed. Be visually descriptive, not emotional. Keep each motion or camera movement concise — each representing about 5 seconds of video. Maintain a strong visual "Teen Titans" animation aesthetic: bold, vibrant, energetic, fluid animation feeling.

Use simple prompts, like you're instructing a 5-year-old artist, but follow Wan principles for syntax and wording so the LoRA can be properly trained with the caption data you're creating. Reference the attached images and caption them. Format the captions as a prompt, so we don't need the labels of scene, subject, action, etc. in the captions themselves. For example (from the Raven LoRA we captioned in the past):

Raven, with pale lavender skin and her short, dark purple angular hair, is shown in a yoga pose resembling an upward-facing dog. A small, dark purple bowtie is at her neck, and white cuffs are on her wrists. Tall, dark purple bunny ears are perched on top of her head. Her hands are raised on either side of her head, against a plain white background. A red gem is on her forehead. She wears her black long-sleeved leotard, a gold-colored belt with visible red gems, and dark blue cuffs with gold and red circular details on her wrists. Her body is arched, supported by her arms straight down to the floor and the tops of her bare feet. Her head is lifted, looking forward and slightly upwards with a surprised or inquisitive expression, her mouth slightly open. The camera is at waist height or lower, looking up at Raven in a semi-profile view.

Sample Prompt:
GoldenBoyStyle. Interior setting. A young man with short dark hair, a red baseball cap backwards, wears a light green t-shirt. His face has an extreme comedic expression of lecherous excitement, with wide, crazed eyes, a broad, toothy grin, and prominent red blush marks on his cheeks. He is holding an open, dark brown notebook with a white pen, writing intently. Close-up shot, focusing on his exaggerated facial expression.

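I did the captioning by hand in AI Studio as described above. If you would rather script the same batching against the Gemini API, a rough sketch with the google-generativeai SDK could look like this (the SDK surface and model id are assumptions to double-check, and the prompt string stands in for the full prompt above):

import time
from pathlib import Path

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro")

CAPTION_PROMPT = "...the full captioning prompt from above..."

def wait_until_active(f):
    # Uploaded videos are processed asynchronously; poll until they are ready to use.
    while f.state.name == "PROCESSING":
        time.sleep(5)
        f = genai.get_file(f.name)
    return f

clips = sorted(Path("/data/trainingstuff/train_videos").rglob("*.mp4"))
for start in range(0, len(clips), 10):  # batches of 10, as described above
    batch = clips[start:start + 10]
    files = [wait_until_active(genai.upload_file(path=str(p))) for p in batch]
    response = model.generate_content(files + [CAPTION_PROMPT])
    print(response.text)  # captions for this batch; touch them up by hand afterwards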

Big Thanks

As always, seruva19's Ghibli, Red Line, and now his banging Princess Kaguya LoRA posts, along with their training data, have been a constant inspiration and source of knowledge for me. I owe a lot to his openness in sharing his process and data.

The Banodoco Discord for always answering my questions on training.

Kijai for his amazing nodes and advice on using them.