Updated: Feb 27, 2026
All cover images are directly from the model, at 1024px, with no upscale and no face/hands inpainting fixes. Workflow included.
Pinned note:
(2/25/2026): Why does v0.12 have low contrast and saturation?
v0.12 uses an adjusted distillation method. The lower contrast is the model's own choice during distillation, not mine: it is the model's sweet spot. At this contrast level it knows how to generate better (more stable and detailed) images.
FYI, the base model of Anima, aka Nvidia Cosmos Predict 2, is a model for industrial robotics, not a model for aesthetics.
(2/17/2026): This finetune series will probably not be updated.
Anima is a wonderful model, but it has a very restrictive license.
I'm fine with dual licenses (non-commercial + commercial). We all know that training a model needs lots of $. A commercial license is necessary. Commercial means $, and $ means a better model.
I didn't expect that they would keep the right to "sell" your "non-commercial Derivatives". You don't even have the right to keep your "non-commercial Derivatives" non-commercial (copyleft), because they reserve the right to apply their commercial license to your "non-commercial Derivatives".
Personal opinion: that's a little bit greedy. Unfortunately, it's too restrictive for my personal situation.
So, this model will not be further finetuned.
Many models are coming up, and it's still too early to say which is the best. E.g. Chroma2, which should be Apache 2.0 and is based on Flux Klein 4B, a much better base than Nvidia Cosmos Predict 2.
RDBT [Anima]
Finetuned circlestone-labs/Anima. Experimental, but works.
Trained as a LoRA, for better training and distribution efficiency.
CFG distilled to further improve quality and stability.
Dataset:
Every image in the dataset is handpicked; only top-quality masterpieces.
Contains common enhancements such as hands, clothes, lighting, backgrounds, etc.
No glossy AI slop in the dataset. Glossy AI images are polluting the world, but not on my watch.
Captions are natural language from Gemini and are very comprehensive.
Usage:
You must load the LoRA with strength 1 on its base model.
If you don't know what this means, or which one is the right "base model", you can download and use this full fp16 model, which has this LoRA merged in: https://civitai.com/models/2356447.
Prefer natural-language prompts. Prompt structure: style, subject, action, background.
Omit all the quality tags. You don't need those tags. The average output quality of this model should be higher than so-called "masterpiece".
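To make the prompt structure above concrete, here is a tiny sketch that assembles a prompt in the recommended order. The example wording is hypothetical, not taken from this model's training captions.

```python
# Illustrative only: building a natural-language prompt in the
# recommended order (style, subject, action, background).
def build_prompt(style: str, subject: str, action: str, background: str) -> str:
    """Join the four parts in the recommended order, comma-separated."""
    return ", ".join([style, subject, action, background])

prompt = build_prompt(
    style="watercolor anime illustration",
    subject="a girl with silver hair",
    action="holding an umbrella in the rain",
    background="neon-lit city street at night",
)
print(prompt)
```

Note there are no quality tags anywhere in the prompt, per the advice above.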
Recommended settings:
"Euler a" sampler.
CFG scale 1 (1~2 also works).
20~30 steps.
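The recommended settings above can be collected into a minimal sketch. The key names here are generic placeholders, not tied to any particular UI's API (e.g. "Euler a" is often spelled "euler_ancestral" elsewhere); map them to whatever frontend you use.

```python
# Recommended sampling settings from this page, as a plain dict.
# Key names are illustrative placeholders, not an official API.
RECOMMENDED = {
    "sampler": "Euler a",
    "cfg_scale": 1.0,      # 1~2 also works
    "steps": 25,           # anywhere in the 20~30 range
    "lora_strength": 1.0,  # the LoRA must be loaded at strength 1
}

def validate(settings: dict) -> bool:
    """Check a settings dict against this page's recommendations."""
    return (
        settings["sampler"] == "Euler a"
        and 1.0 <= settings["cfg_scale"] <= 2.0
        and 20 <= settings["steps"] <= 30
        and settings["lora_strength"] == 1.0
    )

print(validate(RECOMMENDED))  # True
```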
Why LoRA?
I don't have millions of images.
I can save VRAM, and you can save 98% of storage.
Why CFG distilled?
TLDR: distillation can improve quality. Distillation can "purify" the model and solve many problems.
That's why distilled models have become the mainstream. We have seen that people prefer z-image-turbo and Flux Klein to their base versions. Yes, distillation means losing knowledge, but you get much better images, and distilled models are much faster.
For example, this is what a 30-step sampling process looks like. You can find more examples and the workflow in the cover images.
Top: RDBT v0.12.
Bottom: Anima preview at CFG 4. The latent bounces back and forth and sometimes over/underflows; the image is a little bit deformed.

Restrictions:
Making merges using this model is not allowed. FYI, this model was trained with a latent watermark.
Update log
f = finetuned, d = cfg distilled.
Versions below are based on "anima preview":
Recommended:
(2/19/2026) v0.12fd:
Better stability and details, extended dataset.
(2/12/2026) v0.6d:
CFG distilled only, no finetuning. Cover images use Animeyume v0.1.
Old:
(2/3/2026) v0.2fd:
Speedrun attempt, mainly for testing the training script. Limited training dataset: only "1 person" images plus a little bit of "furry". But it works, and way better than I expected.

