AI Images Converge on 12 Default Styles — And the "Physics Layer" That Breaks the Pattern

By TextureLoRALab (Shitsukan)

When you generate an image without specific direction, you are not tapping into infinite possibility. You are drawing from a funnel: a model can only output what it has learned.

Recent research confirms what many artists have felt intuitively: AI image generation is not random. It converges on roughly 12 dominant visual patterns, among them dramatic three-point lighting, ultra-high-definition rendering, saturated color palettes, and the distinctive glossy smoothness of a "generic high-quality render." These patterns appear regardless of the model and regardless of the base prompt.

This is not a bug. It is the shape of the training data made visible.
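The article does not say how the cited research arrived at its 12 patterns. A common way to run this kind of analysis is to embed many generated images with an image encoder and cluster the embeddings. The sketch below assumes you already have one embedding vector per image (the encoder is not shown) and uses plain k-means with a deliberately simple, deterministic initialization. `kmeans` is a hypothetical helper, not code from that research.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain k-means over embedding vectors.

    Initial centroids are simply the first k points, which keeps this
    illustration deterministic; real analyses use better seeding.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real image-encoder output.
    toy = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (9.8, 10.1), (0.1, 0.4), (10.2, 9.9)]
    labels, _ = kmeans(toy, k=2)
    print(Counter(labels))  # cluster sizes
```

On real embeddings you would set `k=12` to mirror the article's count and compare cluster sizes; if a few clusters absorb most images from unrelated prompts, that is the convergence described here. A library implementation with better seeding (such as k-means++) would be the practical choice.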

The implication is uncomfortable for AI artists. If you leave your prompt at the default layer — "warrior in armor," "sunset landscape" — you get statistically average output. And so does everyone else. Your image and mine share the same gloss, the same detail density, the same uncanny smoothness.

The industry response has been predictable: stack specification layers on top.

Layer 1: Style Specification

The first intervention is specifying materials. Not "painting" but "oil painting on linen." Not "ink" but "ink wash on rice paper." As zsky.ai's research confirms, specifying materials produces dramatically more distinctive output than leaving them unspecified.

This works because you are adding vocabulary that pushes the model away from the statistical center. Say "screen print" and you pull toward a sub-cluster with lower saturation, visible dot patterns, and flatter rendering. Say "watercolor" and you get bleed edges and color mixing.
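One way to make "pushing away from the statistical center" measurable: embed a batch of images generated from a bare prompt, treat the centroid of those embeddings as the default center, and measure how far images from a more specific prompt land from it. This is a hypothetical sketch; it assumes embeddings come from some image encoder (not shown), and the helper names are illustrative.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def distance_from_default(batch, default_batch):
    """Mean cosine distance of `batch` embeddings from the default-prompt centroid."""
    center = centroid(default_batch)
    return sum(cosine_distance(v, center) for v in batch) / len(batch)


if __name__ == "__main__":
    # Toy vectors: the "default" batch points one way, the specific batch another.
    default = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.1)]
    specific = [(0.1, 1.0), (0.2, 0.9)]
    print(distance_from_default(default, default))
    print(distance_from_default(specific, default))
```

A larger distance for the specific prompt would be consistent with the claim above. The number only measures how different the output is, not whether it is any better.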

Most LoRAs operate at this layer. Style LoRAs, aesthetic LoRAs, art movement LoRAs: they teach the model to prefer certain visual characteristics by adding small, trained low-rank updates to selected weight matrices. This is useful, and it does get you away from the 12 default patterns.
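For readers who want the mechanics: a LoRA leaves a pretrained weight matrix W (d × k) frozen and learns two small matrices, B (d × r) and A (r × k), with rank r much smaller than d and k. The merged weight is W + (α/r)·BA, the scaling used in the standard LoRA formulation; which layers receive adapters depends on the trainer. The pure-Python sketch below shows the merge and why the adapter is cheap to train. The function names are illustrative, and real trainers do this with tensor libraries.

```python
def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(W, B, A, alpha):
    """Return W + (alpha / r) * (B @ A), with B of shape d x r and A of shape r x k."""
    r = len(A)  # rank = number of rows in A
    scale = alpha / r
    delta = matmul(B, A)
    return [
        [w + scale * dw for w, dw in zip(w_row, dw_row)]
        for w_row, dw_row in zip(W, delta)
    ]


def trainable_params(d, k, r):
    """Parameters updated by full fine-tuning of W vs. by a rank-r LoRA on W."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    # A 768 x 768 projection with a rank-8 adapter (sizes chosen for illustration).
    full, lora = trainable_params(768, 768, 8)
    print(full, lora)
```

For a 768 × 768 projection and r = 8, the adapter trains 12,288 parameters instead of 589,824. That small footprint is what makes LoRAs easy to train and share.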

But there is a constraint: you are working within the model's existing statistical vocabulary. Anyone with access to the same LoRA, or the same prompt words, can reach the same visual destination. Style is copyable, and legal protection for AI-generated images is unsettled at best.

Layer 2: Composition and Technical Direction

The second intervention is specific composition and technique. ControlNet guidance. References to specific photographers or cinematographers. Angle and framing instructions. Camera type. Lighting ratio specifications.

This layer constrains through structure rather than aesthetic vocabulary. Instead of just "oil painting," you say "oil painting, shot from above, 35mm Kodachrome reference, single light source, shadows falling left." You are binding the output more tightly to spatial relationships associated with those technical choices in the training data.
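The layering in that example can be made explicit by keeping each kind of direction in its own field and joining them only at the end, which makes it easy to vary one layer while holding the others fixed. `PromptSpec` is a hypothetical helper, not part of any generation tool's API; the tools themselves just receive the rendered text.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Keeps each layer of direction separate until the prompt is rendered."""

    subject: str
    style: list[str] = field(default_factory=list)        # Layer 1: style and material vocabulary
    composition: list[str] = field(default_factory=list)  # Layer 2: framing, camera, lighting

    def render(self) -> str:
        parts = [self.subject, *self.style, *self.composition]
        # Drop empty entries so optional layers can be left blank.
        return ", ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    spec = PromptSpec(
        subject="warrior in armor",
        style=["oil painting on linen"],
        composition=["shot from above", "single light source", "shadows falling left"],
    )
    print(spec.render())
```

However carefully it is structured, the rendered result is still just text, so anyone who sees the prompt can reuse it.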

This is more resistant to casual copying than style alone. But it is still a prompt-layer intervention — working within the space of "telling the model which words and concepts to combine."

Layer 3: Material Physics

And then there is a layer almost nobody is manipulating: the behavior of actual materials.

Material physics is not "make it look like oil painting." It is "teach the model how oil paint behaves on linen canvas." How paint beads and pools when a brush passes through it. How it cracks as it dries. How light passes through layered transparent glazes. How the substrate's weave accepts, resists, and holds the material.

Material physics is not "gold color." It is "the fracture pattern of gold leaf specific to the kiribaku technique." The distinctive stress lines that form when leaf is pressed at precise angles, how those fractures catch light, how they differ from accidental tearing or oxidation.

Keith Edwards, analyzing contemporary AI output, documented this gap: "Algorithms are not great at texture — always either too smooth or too noisy, with no sense that there is actual thought behind where the detail goes." The problem is not aesthetic. It is epistemological. The model does not understand what physical materials do.
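The "too smooth or too noisy" half of that criticism can be roughed out numerically. A hypothetical sketch: compute the mean absolute difference between neighboring pixels in a grayscale patch (values from 0 to 1) and compare it with a band measured on real photographs of the same material. The `low` and `high` thresholds are placeholders you would calibrate yourself. Note what this does not capture: one global number says how much detail there is, not where it goes, which is the deeper half of the quote.

```python
def mean_abs_gradient(patch):
    """Average absolute difference between horizontally and vertically adjacent pixels."""
    h, w = len(patch), len(patch[0])
    diffs = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                diffs.append(abs(patch[y][x + 1] - patch[y][x]))
            if y + 1 < h:
                diffs.append(abs(patch[y + 1][x] - patch[y][x]))
    return sum(diffs) / len(diffs) if diffs else 0.0


def classify_detail(patch, low, high):
    """Compare a patch's detail level with a band calibrated on real material photos."""
    g = mean_abs_gradient(patch)
    if g < low:
        return "too smooth"
    if g > high:
        return "too noisy"
    return "in range"


if __name__ == "__main__":
    flat = [[0.5] * 4 for _ in range(4)]                         # no texture at all
    checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]  # maximal pixel noise
    for name, patch in (("flat", flat), ("checker", checker)):
        print(name, classify_detail(patch, low=0.02, high=0.3))
```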

This is the layer where texture LoRAs operate. Not through prompts, not through style vocabulary, but through training on photographs of actual materials, shot across lighting conditions, magnifications, aging states, and damage patterns, so that the statistical signature of real physical behavior is encoded into the model's weights.
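A capture plan that spans lighting, magnification, aging, and damage is essentially a grid, and gaps in that grid are gaps in what the LoRA can learn. The following is a hypothetical sketch of a coverage check over a dataset manifest; the axis names and values are placeholders, not the author's actual capture protocol.

```python
from itertools import product


def missing_conditions(samples, axes):
    """List every combination of capture conditions that has no photo yet.

    samples: dicts describing each photo, e.g. {"lighting": "raking", ...}
    axes:    maps each condition name to the values you intend to cover
    """
    names = list(axes)
    seen = {tuple(s[name] for name in names) for s in samples}
    return [combo for combo in product(*(axes[n] for n in names)) if combo not in seen]


if __name__ == "__main__":
    # Placeholder axes and values for illustration only.
    axes = {
        "lighting": ("raking", "diffuse", "backlit"),
        "magnification": ("1x", "5x"),
        "aging": ("fresh", "aged"),
        "damage": ("none", "scratched"),
    }
    manifest = [
        {"lighting": "raking", "magnification": "1x", "aging": "fresh", "damage": "none"},
        {"lighting": "diffuse", "magnification": "5x", "aging": "aged", "damage": "scratched"},
    ]
    gaps = missing_conditions(manifest, axes)
    print(len(gaps), "combinations still need photos")
```

Samples may carry extra keys such as file paths or captions; only the named axes are compared.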

When you transplant real gold leaf, real kintsugi repairs, real hamon steel patina into AI, you are rewriting the model's understanding of how those materials look.

Why Layer 3 Has Been Overlooked

Here is why this matters for differentiation.

Style can be copied with a prompt word. If a LoRA teaches "dramatic Caravaggio lighting," anyone who knows that can add it to their prompt. The knowledge becomes public vocabulary.

Composition can be copied with ControlNet. Once a technique is documented, it is reproducible.

Material physics requires knowledge about materials that does not exist in the model's training data. You cannot prompt "precise kiribaku fracture pattern" — the model has no statistical foundation for that pattern. Natural photographs of kintsugi do not capture the technical coherence that comes from understanding actual material decomposition.

But someone who studied Japanese painting in high school, spent time in conservation labs, and handled real materials in workshops can look at those fracture patterns and distinguish which are physically plausible and which are hallucinations. That makes it possible to build training datasets that encode genuine material knowledge, not just visual similarity.

The knowledge transplanted into the LoRA — the physical characteristics — is sticky. It cannot be copied by adding words to a request. It requires the ground-level work of understanding materials at the level of physical behavior, not visual appearance.

The article "Why AI Doesn't Know the Weight of Paint" digs deeper into this gap. The model sees surfaces but not substrates. It sees light reflection but not material loading.

Practical Implications

If you want your AI art to escape the 12 default patterns, the options are:

Specify style (everyone is doing this now)
Add composition constraints (this is growing)
Or operate at the physics layer and become genuinely difficult to replicate

Output becomes distinctive not because you chose better prompt words — prompts are infinitely copyable — but because the weights you are using encode material understanding that took years of study to acquire. A texture LoRA is not a style applied to a model. It is material knowledge encoded into a model.

Models still hallucinate. The physics layer does not solve that. But it constrains hallucination toward physical plausibility. Cracks form where cracks should form. Light hits where light should hit. Detail density follows the logic of how materials degrade and age, not the logic of "more detail looks more premium."

Whether that constitutes durable differentiation is something each person should judge based on their own work. But material understanding encoded in weights is harder to copy than prompts or styles.

The 3-Distance Method documents one approach to building this kind of LoRA systematically — starting from macro photography of real materials, understanding the distance between physical reality and digital representation, and training to minimize that distance.

Models Referenced

SHIFUKU Gold Leaf v1 — Initial experiment encoding gold leaf fracture behavior
SHIFUKU Gold Leaf v2 — Refined training based on kiribaku and traditional application techniques
SHIFUKU Kintsugi — Physics of ceramic repair, lacquer behavior, gold line optics
SHIFUKU Hamon Steel (Beta) — Differential hardening patterns, oxidation layer variation, aging patina progression

All trained on macro photography of real materials using the 3-Distance Method to minimize hallucination at the physics layer.

Related:

Why AI Doesn't Know the Weight of Paint
Style LoRA vs Texture LoRA — They Solve Different Problems
Training Texture LoRAs from Real Materials: A 3-Distance Method