Video (entire Patreon is free and public, no account needed):
https://www.patreon.com/posts/118649243
Model:
https://civitai.com/models/1066251/yolkheads-albums?modelVersionId=1196728
NotebookLM discussion (for auditory learners): https://notebooklm.google.com/notebook/410d9763-4847-4a5d-812b-a19aa9cc1ffc/audio
Below is an article for those who don't want to watch the video, are unfamiliar with some of the concepts covered, or simply prefer reading:
Combining Astigmatism and Pink Concrete: A Holistic Strategy for Tackling Overfitting in AI
1. Introduction
In AI development—whether text- or image-based—models often become overfit to specific patterns, references, or iconic imagery. From reproducing the “Mona Lisa” in every prompt containing “Mona,” to recycling the same saturated color palette for “The Kiss,” overfitting leads to repeated, rigid outputs. These narrow “sinkholes” in the model's conceptual map degrade performance even in unrelated tasks (like drawing hands or faces), causing “CFG burn-in” or awkward visuals.
This article introduces two interlinked approaches to address overfitting:
Astigmatism: A semantic shift strategy that uses positive and negative LoRA (Low-Rank Adaptation) to break the model’s obsession with iconic references.
Pink Concrete: A specific workflow that merges multiple improved checkpoints, guided by user feedback and the 50% Rule, to converge on strong global features while “filling in” overfit sinkholes.
Along the way, we’ll also explore the 50% Rule and how repeated merges (plus human preference as “natural selection”) push model performance beyond any single checkpoint’s limitations.
2. Overfitting: Why It Matters and How It Shows Up
2.1 The Sinkhole Problem
In image models like SDXL or Flux, you might notice:
Typing “Mona Lisa” yields almost the exact same painting every time.
“The Kiss” always spawns a Gustav Klimt-inspired layout, ignoring the possibility of two random people kissing.
“Abbey Road” forcibly references the Beatles album cover, even if you just want a quiet street near a monastery.
These are overfit pockets—also nicknamed “sinkholes.” Once a model “falls into” them, it struggles to interpret the prompt any other way, overshadowing broader concepts like “woman’s face” or “people crossing a road.” Worse, these pockets can distort everything from color balance to anatomy, spilling into prompts that never mention the famous reference at all.
2.2 Hidden Costs of Overfitting
Poor Generalization: The model fixates on a single outcome (e.g., a famous painting style) instead of exploring varied possibilities.
Distorted Subcomponents: Overfit regions can warp smaller details like hands or facial geometry.
CFG Burn-In: Even at low to moderate CFG (Classifier-Free Guidance) levels, certain references dominate the output, ignoring user prompts.
To solve this, we need a method to “unstick” or shift the model away from these iconic sinkholes. That’s where Astigmatism and Pink Concrete come in.
3. Astigmatism: Shifting the Model’s Focus
Astigmatism is a metaphor borrowed from vision problems, where adjusting the lens or focus can bring clarity. In AI image models, it refers to:
Identifying Overfit Terms: Track prompts like “Mona Lisa,” “Abbey Road,” or “The Kiss” that produce repetitive outputs.
Semantically Shifting Those Terms: Train the model (using LoRA) to see these prompts in alternative ways.
3.1 Positive + Negative LoRA
LoRA (Low-Rank Adaptation) allows you to fine-tune large models without retraining every parameter. For Astigmatism, we use a pair of LoRAs:
Positive LoRA
Goal: Broaden or re-route the concept from its iconic meaning.
Example:
“Mona Lisa” → Moaning Woman (exploring the notion that “Mona” might relate to “moan,” thus detaching the model from the Da Vinci painting).
“The Kiss” → people kissing in many contexts (walls, pets, random everyday moments).
Effect: The model becomes more flexible, seeing the phrase as a dynamic prompt rather than a fixed icon.
Negative LoRA
Goal: Suppress the model’s existing overfit imagery.
Method: Train on the model’s own repetitive outputs—those same “Mona Lisa” or “Kiss” templates—so it “learns” to avoid or reduce them.
Effect: Think of it like partial ablation: you’re not deleting weights outright, but pushing the model away from that locked path.
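At inference time, one simple way to realize this pairing is to load both LoRAs and give the negative one a negative weight. Here's a minimal sketch assuming diffusers-style LoRA loading; the file names and adapter weights are hypothetical, and whether a negative adapter scale cleanly captures the "push away" effect can vary by backend (merging the negative LoRA into the checkpoint at negative strength is an alternative route):

```python
# A minimal sketch, assuming diffusers' LoRA loading; file names and
# adapter weights below are hypothetical and need tuning per model.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("astigmatism_positive.safetensors", adapter_name="positive")
pipe.load_lora_weights("astigmatism_negative.safetensors", adapter_name="negative")

# The positive LoRA pulls toward the broadened concept; the negative weight
# on the second adapter pushes away from the memorized iconic output.
pipe.set_adapters(["positive", "negative"], adapter_weights=[0.8, -0.6])

image = pipe("Mona Lisa").images[0]  # ideally no longer collapses to the Louvre painting
image.save("mona_lisa_unstuck.png")
```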
3.2 Benefits and Surprising Side Effects
Less Iconic Lock-In: Prompting “Mona Lisa” no longer yields the same painting.
Better Anatomy & Faces: Eliminating overfit pockets often cascades into fewer bizarre artifacts in hands, faces, or complex backgrounds.
General Quality Boost: Freed from confining references, the model can adapt more flexibly to new styles or subject matters.
4. Pink Concrete: Breaking Overfits Through Iterative Merges
While Astigmatism is a direct approach to re-label or suppress iconic terms, Pink Concrete is an overarching workflow that addresses overfit pockets by merging multiple improved model checkpoints. It leverages both community feedback (as “natural selection”) and the pigeonhole principle to converge on the best features.
4.1 The Workflow Summarized
Gather Overfit Outputs: Prompt the base or partial-fine-tuned model with known problematic references (“Mona Lisa,” “The Kiss,” “Abbey Road”).
Train Negative LoRA: Use these repetitive outputs as training data for a LoRA that explicitly represses the stuck patterns.
Create Positive Alternatives: For the same prompts, produce re-imagined or literal interpretations (e.g., a wide range of “kissing” scenarios for “The Kiss”) to train a second LoRA.
Combine & Test: Merge negative + positive LoRAs into a single checkpoint, then test on both the iconic prompts and general anatomy or style prompts.
At scale, multiple such merges are shared with the community, who naturally “select” the merges that yield better images. Over generations of combining these popular merges—each individually above 50% “successful”—you reach a final checkpoint that rarely falls into overfit sinkholes.
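The first step is easy to automate: hold the prompt fixed, vary only the seed, and save everything. If the outputs barely change across seeds, you've confirmed a sinkhole, and the saved images double as the Negative LoRA's training set. A minimal sketch, assuming an SDXL-style pipeline via diffusers:

```python
import os

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "Mona Lisa"  # a known overfit trigger
os.makedirs("negative_dataset", exist_ok=True)

# Fixed prompt, varied seeds: near-identical outputs confirm the sinkhole,
# and the images become training data for the Negative LoRA.
for seed in range(32):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"negative_dataset/mona_lisa_{seed:03d}.png")
```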
5. The 50% Rule: Why It All Works
5.1 Coin Box Analogy
Imagine you have a box full of coins. Each coin is biased to land on heads 51% of the time—slightly better than a fair coin. Flip just one or two coins, and you might still lose. But flip a large number of these slightly biased coins, and your chances of getting mostly heads jump dramatically.
In AI:
Single Sub-Model or Single Prompt: ~51% chance it might succeed.
Multiple Sub-Models or Prompts: Combining them (“stacking” or “merging”) compounds the likelihood of success.
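The arithmetic behind the analogy is easy to verify with nothing but the standard library:

```python
from math import comb

def p_majority_heads(n: int, p: float = 0.51) -> float:
    """Probability that strictly more than half of n independent biased coins land heads."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))

# One coin is barely better than chance, but the edge compounds:
for n in (1, 11, 101, 1001):
    print(f"{n:>5} coins -> P(majority heads) = {p_majority_heads(n):.3f}")
# Climbs from 0.51 with a single coin to roughly 0.74 at a thousand coins,
# and keeps heading toward 1 as n grows.
```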
5.2 How It Ties to Model Stacking
Prompt Layering
If each rephrased prompt or instruction has a >50% chance of helping the model find the right output, then using several of them (assuming they are roughly independent) pushes you closer to 100% success.
E.g., “Summarize the text,” then “Highlight main points,” then “Explain key takeaways”—each nudge is a biased coin.
Model Merging
Each sub-checkpoint that consistently improves faces, hands, or color balance beyond baseline is another 51% coin.
Merge them iteratively: the final checkpoint becomes more likely to have “mostly heads” (i.e., mostly beneficial features) because outlier or negative quirks get “averaged out.”
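In its simplest form, “merging” is just a weighted average of the checkpoints' weights. A minimal sketch, assuming same-architecture checkpoints stored as .safetensors (the file names are hypothetical):

```python
import torch
from safetensors.torch import load_file, save_file

def merge_checkpoints(paths: list[str], weights: list[float], out_path: str) -> None:
    """Weighted average of checkpoints that share the same architecture/keys."""
    assert abs(sum(weights) - 1.0) < 1e-6, "merge weights should sum to 1"
    merged = {}
    for path, w in zip(paths, weights):
        for key, tensor in load_file(path).items():
            merged[key] = merged.get(key, 0) + tensor.float() * w
    save_file({k: v.half() for k, v in merged.items()}, out_path)

# e.g., a 50/50 merge of two community "winners" (hypothetical file names):
merge_checkpoints(
    ["winner_a.safetensors", "winner_b.safetensors"], [0.5, 0.5], "merged.safetensors"
)
```

Each round takes the surviving “winners” as inputs, so quirks unique to one parent get averaged down while shared improvements persist.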
6. Natural Selection Through Community Feedback
6.1 Human Preference as a Filter
When you share new merges (like those from Pink Concrete) on platforms such as Civitai:
Good Merges: Generate superior images, fix known problems, and get upvoted or widely adopted.
Weak Merges: Remain overshadowed because they reintroduce artifacts or break something else.
Over time, the merges that stand out form the “gene pool” for subsequent merges. This community-driven approach becomes a real-world demonstration of natural selection:
Winning Genes (i.e., stable hands, improved color, fewer iconic sinkholes) keep passing into new merges.
Losing Genes vanish as no one uses those merges further.
6.2 Pigeonhole Principle in Convergence
Since each “winner” is more than 50% effective, repeated merging with other winners shares and reinforces their best traits. By the pigeonhole principle, it’s highly unlikely that all merges simultaneously fail on the exact same improvement (with five merges each 60% likely to carry a given fix, the odds that all five miss it are 0.4^5 ≈ 1%), so beneficial features get preserved repeatedly. Unwanted quirks—unique to only one sub-model—tend to get diluted.
7. Practical Applications: Beyond Iconic Overfits
7.1 Text Generation and Prompt Engineering
Stacking Prompts: If you want a comprehensive summary, re-ask the question in different ways. Each rephrased prompt is a new biased coin.
Collaborating Models: Combine specialized text summarizers, Q&A modules, and sentiment analyzers. If each is >50% accurate, the ensemble is far more robust.
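As a toy illustration of that ensemble idea (the models here are hypothetical callables, each an independent, >50%-accurate classifier answering the same question):

```python
from collections import Counter
from typing import Callable, Iterable

def majority_vote(question: str, models: Iterable[Callable[[str], str]]) -> str:
    """Query several models and return the most common answer.

    For a binary question, if each model is right more than half the time
    and their errors are independent, the vote beats any single model
    (the biased-coin effect again).
    """
    votes = [model(question) for model in models]
    return Counter(votes).most_common(1)[0][0]

# e.g., three ~70%-accurate sentiment models voting together are right
# about 78% of the time (0.7**3 + 3 * 0.7**2 * 0.3 = 0.784):
# majority_vote("Is this review positive?", [model_a, model_b, model_c])
```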
7.2 Image Creation in the Wild
Targeted Astigmatism: If your model fixates on “cat” as always the same stock image, create a Negative LoRA from those repeated cat outputs and a Positive LoRA showing varied cat photos, cartoons, or paintings.
Multiple Thematic Merges: Some sub-models might excel at lighting, others at detail or texture. Merging these can produce a final checkpoint that handles lighting and detail with minimal compromise.
7.3 Teaching, Communication, and Human Learning
The same principle applies to explaining concepts:
Provide multiple analogies or examples (each >50% likely to click with the learner).
If one style of explanation fails, another might succeed.
Over enough explanations, the odds that a student remains confused from every vantage point are minimal.
8. From “Mona Lisa” to “Moaning Woman”: Real Results in Action
8.1 Case Study: Flux
Flux is a model with strong aesthetics but inherited overfits. By applying Astigmatism:
Negative LoRA: Trained on how Flux re-created “Mona Lisa,” “The Kiss,” etc.
Positive LoRA: Provided new, broader interpretations for those terms (different angles, times, mediums).
Merged: Balanced the aesthetic of Flux with the new semantic expansions.
Outcome: Enhanced resolution, better anatomy, and fewer random saturation artifacts—especially in prompts that previously triggered the same iconic painting. Users also discovered improvements in niche prompts (like special styles or fetish art), suggesting that clearing overfit pockets helped across the board.
8.2 Pink Concrete’s Iterative Refinement
Going further, Pink Concrete merges multiple fine-tunes with our Astigmatism-treated fine-tune of the base model, each with a proven higher win rate than the base model (in turn applying the 50% Rule to our decision-making, optimizing performance for the lowest effort). Community feedback picks the merges that best handle overfit issues while maintaining overall quality. Over many iterations, “sinkholes” vanish as the best signals accumulate.
9. Key Takeaways and Recipe for Success
Identify Overfits (Sinkholes)
Look for repeated outputs: If “Mona Lisa” is always the same painting, you’ve found a sinkhole.
Astigmatism (Positive & Negative LoRA)
Negative LoRA: Uses the model’s repetitive outputs to teach it not to replicate them.
Positive LoRA: Supplies varied or literal examples that expand the concept’s meaning.
The 50% Rule
Each >50% success method (prompt, model, or partial checkpoint) is like a biased coin.
Stack multiple coins or merges to achieve a near-certain majority of “heads” (desired features).
Model Merging & “Natural Selection”
Gather sub-models that individually outperform the baseline.
Merge them iteratively; user preference weeds out regressions, reinforcing the best traits.
Over successive merges, beneficial improvements converge while isolated failures fade away.
Practical Action Steps
Experiment with multiple prompts, merges, and rephrasings—avoid single-point failures.
Analyze outputs to find “winners” (improved faces, fewer artifacts).
Refine by layering LoRAs, repeated merges, and user feedback.
Teach & Communicate with multiple examples so your “biased coins” cumulatively ensure comprehension.
10. Conclusion
Overfitting in AI can be visualized as sinkholes that trap your model into reproducing the same iconic imagery or distorted anatomy. Astigmatism (via positive + negative LoRA) and Pink Concrete (iterative merges plus human-driven “natural selection”) form a two-pronged strategy to tackle these sinkholes head-on.
By leveraging the 50% Rule—the idea that multiple slightly-biased strategies stack to a robust outcome—and letting the community filter and merge winners, you reduce the model’s ability to get stuck. You also expand its creativity and accuracy in both overtrained prompts (e.g., “Mona Lisa”) and unrelated tasks (improved hands, faces, backgrounds).
The guiding question remains: How can you stack more >50% components to ensure a win? Whether it’s layering prompts, ensembling sub-models, or combining multiple explanations in teaching, each extra coin tilt above 50% drastically boosts your final success rate. The upshot is a freer, more adaptive AI that handles everything from iconic references to brand-new concepts without falling into overfit pitfalls.
-----
Links to the conversations used to generate the summarization above:
1 (original transcript produced via NotebookLM lol): https://chatgpt.com/share/676bd08c-a060-8001-a6cb-85715e5a4635
2 (context refresh): https://chatgpt.com/share/676bd0bc-9f90-8001-b1bc-b80fec34caec
3 (final process): https://chatgpt.com/share/676bd0c7-36bc-8001-bbf1-56b09b545a2b