Anima - JSON+English

Name: Anima - JSON+English
Rating: 5 (32 reviews)
Author: AbstractPhila

Updated: Jul 5, 2026

tool

tags modifier multi-concept json plain english

Download

1 variant available

bf16 SafeTensor

json_brent_90k_e2.safetensors

BF16, good balance • 264.36 MB

Verified: 12 days ago

Details

Type: LoRA
Reviews: Positive (9)
Published: Jul 5, 2026
Base Model: Anima
Training: 16,500 steps, 2 epochs
Usage Tips: Strength 1
Hash (AutoV2): E9893ACCA2

AbstractPhila

Joined Feb 18, 2023

License:

Anima

The Anima Model is licensed by CircleStone Labs LLC. Copyright CircleStone Labs LLC. IN NO EVENT SHALL CIRCLESTONE LABS LLC BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.

Built on NVIDIA Cosmos

What is this?

A tool for using JSON with Anima. The model does not require JSON, but JSON prompts give added control, and the LoRA also handles many plain English prompting cases that were quite weak or non-existent before.

The trigger is NOT the exact token "JSON"; it is literal JSON written in string form in the prompt.

Prompt Directly

Use JSON > ENGLISH > BOORU.

You will get the best yield in this order. If you get hallucinations, you can swap booru for English.

The model was trained with both English and booru JSON, so the processing should be okay.

90k Brent E1+E2 1.0

This is a temporary version that will be replaced with the full 1.0 train. All epochs are available on Hugging Face.

https://huggingface.co/AbstractPhil/anima-90k

This is only the VLM half, which ran for only about 1 epoch. The plan is 2 epochs of VLM captions and 1 epoch of AnimeTIMM captions. That should be enough. The final version will be uploaded tonight.

Have fun.

Epoch 2 Release

This version is stronger and more capable while still retaining the majority of the original model. It is more robust than v1 and better at plain English.

Epoch 3 Time Stage

Epoch 3 is roughly 375,000 samples and applies the full subject bucketing system to the AnimeTIMM captions only. This setup has shown the most robust results with this model, while still learning the plain English associations needed to use more of Qwen than before.

This will take roughly 74 hours, so by next weekend I'll have everything worked out for a full ComfyUI release.
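For a rough sense of scale, the two figures above imply the following average throughput. This is plain arithmetic on the approximate numbers quoted in the text, not a measured benchmark.

```python
samples = 375_000  # approximate epoch 3 size quoted above
hours = 74         # approximate wall-clock estimate quoted above

samples_per_hour = samples / hours
samples_per_second = samples_per_hour / 3600

print(f"{samples_per_hour:,.0f} samples/hour")      # 5,068 samples/hour
print(f"{samples_per_second:.2f} samples/second")   # 1.41 samples/second
```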

10k Brent V0.5

{
  "subjects": [
    {
      "name": "subject's name here",
      "attributes": ["attributes", "go", "here however you want to divide them"],
      "actions": ["actions go here", "in english or broken sequences"]
    }
  ],
  "setting": "supports settings"
}

Down here, reinforce the system with plain English like this; explain the system and situation.

1girl, here, do, the, booru, tags, like how, you, would,

The JSON probably doesn't need to be perfect; you can likely jank it, and the model will not care whether the JSON is valid.

Add up to 8 subjects. Bounding boxes are not supported yet, semantic offset is partially working, and associative offset is partially functional.
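The prompt layout described above (literal JSON first, then plain English, then booru tags) can be assembled with a small helper. This is a minimal sketch, not part of the release: `build_prompt` and the example subject are hypothetical, and the only things taken from the text are the field names, the ordering, and the 8-subject limit.

```python
import json

MAX_SUBJECTS = 8  # limit stated in the text above


def build_prompt(subjects, setting, english="", booru_tags=()):
    """Assemble a prompt as literal JSON, then plain English, then booru tags."""
    if len(subjects) > MAX_SUBJECTS:
        raise ValueError(f"at most {MAX_SUBJECTS} subjects are supported")
    payload = {
        "subjects": [
            {
                "name": s["name"],
                "attributes": list(s.get("attributes", [])),
                "actions": list(s.get("actions", [])),
            }
            for s in subjects
        ],
        "setting": setting,
    }
    # The JSON goes into the prompt as a plain string.
    parts = [json.dumps(payload, indent=2)]
    if english:
        parts.append(english.strip())
    if booru_tags:
        parts.append(", ".join(booru_tags))
    return "\n\n".join(parts)


# Hypothetical example values, for illustration only.
prompt = build_prompt(
    subjects=[{"name": "knight", "attributes": ["silver armor"], "actions": ["raising a sword"]}],
    setting="castle courtyard at dusk",
    english="A knight in silver armor raises a sword in a castle courtyard at dusk.",
    booru_tags=["1girl", "armor", "sword", "castle"],
)
print(prompt)
```

Using `json.dumps` always produces valid JSON, which is stricter than the model appears to need, but it avoids accidental typos in longer multi-subject prompts.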

For now, attributes hallucinate unless they are reinforced with booru tags.

In this version, the higher the strength, the more heavily the Qwen biases come through.

Strengths

Handles low-step or high-step models fairly well. Reduce the strength for low steps and you'll still get some use out of the JSON.

Weaknesses

Attributes hallucinate. Actions hallucinate. Names are pretty good.

1k Brent (Preview)

Similar format to V0.5.

Booru tags are MORE critical. Different biases.

Weaknesses

Strong, but it biases toward a different array of images. The array is smaller and more rigid.

Text has problems; increase strength to the negative if the problems are large.

Brent 10k V0.5 Release

Fully revamped trainer: a forked diffusion-pipe with a considerably faster parquet processing pipeline, used instead of the anima trainer.

https://github.com/AbstractEyes/diffusion-pipe/tree/feat/parquet-hf-dataset-backend

https://huggingface.co/datasets/AbstractPhil/diffusion-pretrain-set-ft1

10,000 images instead of 1000.

I ran too many epochs; however, the balanced train allows the model to operate at lower strength. The next run will have considerably more images, higher image diversity, a better character controller, a higher-complexity yield for JSON capability, and much more complex JSON prompts.

Subject Bucketing upgrade

The bucketing system is very fast and uses a shared grab-bag for buckets, which reduces prep time and still produces more images than the model can ingest on 4 GPUs. The parquet pipeline processes images considerably faster and still handles aspect-ratio (AR) bucketing at high speed, because the parquet system draws samples with a random grab-bag.
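As an illustration of the general idea only, random grab-bag batching over aspect-ratio buckets might look like the sketch below. It is not the forked trainer's code: the bucket list, `nearest_bucket`, and `grab_bag_batches` are all hypothetical, and the real pipeline reads from parquet rather than an in-memory list.

```python
import random
from collections import defaultdict

# Hypothetical aspect-ratio buckets (width / height); not taken from the trainer.
AR_BUCKETS = [0.5, 0.75, 1.0, 1.33, 2.0]


def nearest_bucket(width, height):
    """Return the bucket aspect ratio closest to the image's own ratio."""
    ratio = width / height
    return min(AR_BUCKETS, key=lambda b: abs(b - ratio))


def grab_bag_batches(images, batch_size, seed=0):
    """Yield (bucket, batch) pairs, picking a random non-empty bucket each time.

    `images` is a list of (image_id, width, height) tuples.
    """
    rng = random.Random(seed)
    bags = defaultdict(list)
    for image_id, width, height in images:
        bags[nearest_bucket(width, height)].append(image_id)
    for bag in bags.values():
        rng.shuffle(bag)
    while bags:
        bucket = rng.choice(list(bags))
        bag = bags[bucket]
        batch, bags[bucket] = bag[:batch_size], bag[batch_size:]
        if not bags[bucket]:
            del bags[bucket]  # bucket exhausted, stop drawing from it
        yield bucket, batch


images = [(i, 512 + 64 * (i % 5), 768) for i in range(10)]
batches = list(grab_bag_batches(images, batch_size=4))
```

Every batch contains images from a single bucket, so they can share one resolution, while the order in which buckets are visited stays random.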

Improved Cache

The original caching system is much improved: it was converted to parquet processing, which easily kept the 4 A40 GPUs at 100% utilization.

More Data

A much larger train of 10,000 dual-prompted images. Repeats are based on both the buckets and how frequently each bucket's subject is selected.
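One simple way to derive repeats from subject frequency is to give rarer subject buckets more repeats, up to a cap. The sketch below is an assumption made for illustration; the formula, the cap, and the name `bucket_repeats` are hypothetical and are not the rule used in this train.

```python
from collections import Counter


def bucket_repeats(subject_per_image, max_repeats=4):
    """Give rarer subject buckets more repeats, capped at `max_repeats`."""
    counts = Counter(subject_per_image)
    largest = max(counts.values())
    return {
        subject: min(max_repeats, max(1, round(largest / count)))
        for subject, count in counts.items()
    }


# Hypothetical subject labels, one per image.
repeats = bucket_repeats(["cat"] * 8 + ["dog"] * 4 + ["fox"] * 1)
print(repeats)  # {'cat': 1, 'dog': 2, 'fox': 4}
```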

Suggested Use

I suggest a reduced strength, which still keeps the LoRA's effect without introducing the Qwen biases as strongly.

I've included trigger prompt assistance for using the built-in subject format.

Brent 1k (PREVIEW) Release

https://github.com/AbstractEyes/anima-trainer

Trained with the same trainer that Anima was originally trained with, diffusion-pipe, snapped together with a new dataset organization system so I could run it on either Runpod or in notebooks.

https://huggingface.co/datasets/AbstractPhil/diffusion-pretrain-set-ft1

These are 1k images randomly sampled and subject-bucketed from the 80k image dataset "qwen_90k", which will be trained next.

https://huggingface.co/AbstractPhil/Qwen3.5-0.8B-json-captioner

Each image was captioned twice: once through the VLM's ViT, which outputs JSON directly, and once by a variant of the AnimeTIMM ViT, whose captions were then processed into JSON as well.

12 epochs on the VLM JSON captions, then the same images went back in for 8 more epochs with the AnimeTIMM JSON. These are the results of subject-bucketing with JSON.

More specifically

https://huggingface.co/blog/AbstractPhil/subject-bucketing

This is a subject-bucket trained JSON finetune.

The specific targets are meant, experimentally, to give finetunes better accuracy and more fidelity while also training a proof of concept for subject-bucketing.

TLDR Subject Bucketing

Dataset balancing. Normally you end up with a series of problems from finetunes: breakpoints, kinks, distortions, faults, and so on.

This is meant as an experiment to solve those exact problems. By finetuning a model with JSON, you provide a form of differentiated perspective to the AI. By grouping subjects into a more complex paradigm, as described in the article, the differentiation becomes robust.

A little longer, still short.

Each token separator is another format of language that Qwen already understands and recognizes. The more you combine in sequence, the more Qwen will understand this process, which provides more usable structure to the diffusion system.

The more robust and orderly the encodings provided to the diffusion system, combining differentiated lesser-used tokens with more commonly used tokens, the more useful the training outcomes.

Why?

The smaller-scale non-bucketed variants were successful, so it's time to train the real thing: the tool itself, and what the tool yields.

Now the first 1k image train for the direct tool has been successful. The results are promising and powerful. This merits a full scale-up in training.