BigLiminal-JSON: teaching a LoRA to think in JSON - and why the corruption is the point

This is a writeup for BigLiminal-JSON, an experimental concept LoRA. It is not a clean, polished release and I'm not going to pretend it is. It's janky. It faulted in a few places, and some base-model variants behave worse than others. That's fine: for this particular concept, the jank is doing real work. Here's the whole story: the dataset format, the training process, and the Qwen pipeline that made it possible.

The premise

Most text-to-image training pairs an image with a flat caption. You write a sentence, the model learns to associate the whole blob with the picture. It works, but the relationships between things in the scene stay implicit — the model has to infer that "rows of houses along a street" means houses are the repeated subject and street is the thing they line up along.

BigLiminal-JSON was trained on captions that make those relationships explicit. Every training image got a structured, JSON-aligned caption describing subjects, their attributes, the actions, and the setting. Something like:

{"subjects":[{"name":"houses","attributes":["yellow","rowed"]},{"name":"street","attributes":["asphalt"]},{"name":"grassy lanes","attributes":[]},{"name":"sidewalk","attributes":[]},{"name":"sky","attributes":["light"]}],"actions":["rowed in a line","street extends into the distance"],"setting":"outdoor"}

The plain-English version of that is just: a suburban neighborhood street extending into the distance, rows of houses along each side with sidewalks and mailboxes.

The idea is that JSON forces subject association to the surface. The model isn't learning a sentence — it's learning which tokens belong to which subject, and which subjects relate to which actions. The links between things become first-class.
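To make the structure concrete, here is a minimal Python sketch that parses the example caption above and pulls out which attributes belong to which subject. The function names (validate_caption, subject_bindings) and the required-key check are my own illustration, not part of the training code. The three top-level keys are taken from the example, and treating them as mandatory is an assumption.

```python
import json

# Top-level keys seen in the example caption; treating all three as
# required is an assumption made for this sketch.
REQUIRED_KEYS = {"subjects", "actions", "setting"}

EXAMPLE = (
    '{"subjects":[{"name":"houses","attributes":["yellow","rowed"]},'
    '{"name":"street","attributes":["asphalt"]},'
    '{"name":"grassy lanes","attributes":[]},'
    '{"name":"sidewalk","attributes":[]},'
    '{"name":"sky","attributes":["light"]}],'
    '"actions":["rowed in a line","street extends into the distance"],'
    '"setting":"outdoor"}'
)


def validate_caption(raw: str) -> dict:
    """Parse a JSON caption and check it has the shape of the example."""
    caption = json.loads(raw)
    if not isinstance(caption, dict):
        raise ValueError("caption must be a JSON object")
    missing = REQUIRED_KEYS - set(caption)
    if missing:
        raise ValueError(f"caption is missing keys: {sorted(missing)}")
    for subject in caption["subjects"]:
        if not isinstance(subject.get("name"), str):
            raise ValueError("every subject needs a string 'name'")
        if not isinstance(subject.get("attributes"), list):
            raise ValueError("every subject needs an 'attributes' list")
    return caption


def subject_bindings(caption: dict) -> dict:
    """Map each subject name to the attributes attached to it."""
    return {s["name"]: list(s["attributes"]) for s in caption["subjects"]}


if __name__ == "__main__":
    for name, attrs in subject_bindings(validate_caption(EXAMPLE)).items():
        print(name, "->", attrs)
```

In a flat caption, "yellow" is just another word in the sentence. Here it is attached to "houses" and nothing else, which is the association the training captions are meant to surface.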

The Qwen pipeline

You don't want to hand-author thousands of JSON captions. So the captions were generated by a small finetuned converter: qwen3.5-0.8b-task_1-lora-v2, a LoRA whose entire job is to take plain English and emit the JSON structure above.

The flow:

1. Source image gets a plain-language description.
2. The Qwen converter rewrites that description into subject-associated JSON.
3. The JSON becomes the training caption for the diffusion LoRA.
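The flow above can be sketched in a few lines of Python. This is a rough outline under stated assumptions, not the actual pipeline: stub_converter is a hypothetical placeholder standing in for the Qwen converter LoRA (the real one is a language model call), and build_captions, including its choice to set aside invalid JSON rather than train on it, is my own illustration.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple


def stub_converter(description: str) -> str:
    """Hypothetical stand-in for the Qwen converter LoRA.

    The real converter is a language model. This stub only wraps the
    description in the caption shape so the flow can run end to end.
    """
    return json.dumps(
        {
            "subjects": [{"name": description, "attributes": []}],
            "actions": [],
            "setting": "unknown",
        }
    )


def build_captions(
    descriptions: Iterable[Tuple[str, str]],
    convert: Callable[[str], str],
) -> Tuple[Dict[str, str], List[str]]:
    """Turn (image name, plain description) pairs into JSON captions.

    Returns the accepted captions plus the names of images whose
    converter output was not valid JSON.
    """
    captions: Dict[str, str] = {}
    rejected: List[str] = []
    for image_name, description in descriptions:
        raw = convert(description)
        try:
            json.loads(raw)  # keep only output that parses as JSON
        except json.JSONDecodeError:
            rejected.append(image_name)
            continue
        captions[image_name] = raw
    return captions, rejected


if __name__ == "__main__":
    pairs = [("street.png", "a suburban street extending into the distance")]
    captions, rejected = build_captions(pairs, stub_converter)
    print(captions["street.png"])
    print("rejected:", rejected)
```

Passing the converter in as a function keeps the loop independent of the model behind it, so pointing the same loop at a different source set only means changing the list of pairs.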

That converter is the reusable piece. The diffusion LoRA is downstream of it — swap the source set, rerun the converter, and you get a fresh JSON-captioned dataset for whatever concept you want.

What it actually learned

It learned to think in JSON. You don't need to understand why it routes the way it does — what matters is that the JSON taught it to bind tokens together along subject-association pathways.

A lot of those pathways are genuinely useful. A lot of them are corrupted, or collapse causally — a subject's attributes bleed into the wrong place, depth runs away from itself, geometry repeats past the point of sense.

For most concepts that would be a bug. For liminality it is the entire point. Liminal spaces are supposed to feel structurally wrong — endless, repeating, depth that doesn't resolve, familiar objects that have lost their reason for being there. A clean conditioning model fights that. A model whose subject-association pathways are slightly broken leans straight into it. The corruption produces the unease.

Using it

Talk to it in plain English. You do not need to write JSON to prompt it — the JSON was a training-time representation, not a usage requirement. Just describe what you want. Multilingual works too, if the base knows the language.

The more surreal your request, the more this LoRA can accommodate it relative to the stock base model. Two prompts that worked well:

an endlessly deep gigantic hallway at the top floor of a hotel with open doors along each wall growing smaller as they get further away. walls, ceiling, lights, dimly lit

liminal, a depth of field identical suburban house line at night inside of a hangar bay. Vanishing point in the distance. Depth of field. Pitched black outside the visibility of the housing lights. Each house is vividly represented and identical to the others. high fidelity, absurd resolution

Both produced images that hold the depth and repetition the prompt asks for, while the original base model behaved noticeably differently on the same text. The endless hallway and the identical-house grid are exactly the kinds of scene where the subject-association training shows up.

Honest caveats

These are LoRAs, not full finetunes. They land at "okay and interesting," not "perfect."
Behavior varies across the base-model variants (Klein-9b, ZIT, Ernie, HiDream O1, Klein-4b). Some faulted harder than others.
It's hard to say precisely what the JSON taught, because liminality is an obscure target to measure against. The effect is real but not cleanly attributable.
A larger pretrain dataset is in preparation: diffusion-pretrain-set-ft1.

If you generate something good with it, post it — the most interesting outputs from this thing are the ones nobody planned.

BigLiminal - Json - 5 models.