From Regex to Real-Time: Our 2025 Moderation Journey

Content moderation at scale is one of the hardest problems in AI. It's not just about blocking bad content; it's about nuance, context, and the ability to adapt faster than bad actors can evolve.

Our community creates millions of images. Keeping things safe while respecting creative freedom isn't a problem you solve once; it's a constant balancing act. This year, we transformed our approach: from basic pattern matching to an adaptive, multi-layered system built on real-time policy tools, community feedback, and continuous learning.

Here's how we got here.

The Regex Era

At the start of 2025, our moderation pipeline was clever but brittle.

We had complex combinatorial regex covering 18,000 concepts and permutations. OpenAI's prompt moderation service scanned inputs. We built what we called "semi-permeable membranes": LoRAs for supported foundation models that could prevent certain combinations of content from being generated together.
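At its core, that first layer worked something like the sketch below: a bank of compiled patterns scanned every prompt, and any match blocked generation. The patterns here are hypothetical placeholders, not our actual rules; the real system combined roughly 18,000 concepts and permutations.

```python
import re

# Illustrative rule-based prompt filter. Patterns are hypothetical
# stand-ins; the production list covered ~18,000 concepts and permutations.
BLOCKED_PATTERNS = [
    re.compile(r"\bforbidden[\s_-]?term\b", re.IGNORECASE),
    # Combinatorial rules had to spell out each risky pairing explicitly:
    re.compile(r"\bconcept[\s_-]?a\b.*\bconcept[\s_-]?b\b", re.IGNORECASE),
]

def is_blocked(prompt: str) -> bool:
    """Return True if any compiled pattern matches the prompt."""
    return any(p.search(prompt) for p in BLOCKED_PATTERNS)
```

The brittleness is visible in the structure: every new edge case means another pattern, and every risky *combination* means another hand-written rule.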

It worked. Mostly. But the problem with rule-based systems is that every new edge case means engineering work. A new threat surfaces, someone files a ticket, a developer writes a patch, it goes through review, gets deployed. Days pass. Meanwhile, the problem persists.

We were playing whack-a-mole, and the moles were getting faster.

We needed something that could adapt as fast as threats evolved.

The Policy Definition Revolution

In March 2025, we added VLM-based scanning for generated images. But it wasn't just about adding another AI layer; it was about fundamentally changing how we define and iterate on policies.

We partnered with Clavata for policy-driven content classification, and it changed how we think about moderation.

There's a crucial difference between "writing prompts" and "defining policies." The traditional approach is telling an AI what to look for: prompt engineering, essentially. The policy approach is different: codify your rules in a structured way, test against real data, and iterate in minutes instead of days.

Clavata's interface became our command center for policy work. We could test a policy change against hundreds of images before deploying it. When we saw something wrong, we could verify the fix worked across our entire test set before it went live.

This mattered most when compliance requirements evolved. In April 2025, we needed to add several new content categories to meet the needs of payment processors. With our old system, that would have been weeks of engineering work. With real-time policy tuning, we went from requirement to production in hours.

For the first time, our policy team could iterate without waiting for engineering sprints. That speed turned out to be the difference between reactive moderation and proactive moderation.

Knights of New Order

AI is fast, but humans understand nuance. In May 2025, we launched Knights of New Order, a community moderation game that lets our users help ensure content is appropriately rated.

The numbers are remarkable. We're seeing millions of ratings from community members each month. That's not replacing AI; it's augmenting it with human judgment at scale.

Here's why this matters: AI catches patterns, but humans catch context. An image that's technically within policy might still feel wrong. A borderline case might need human eyes. And when our AI makes mistakes, community ratings help us understand where our policies need refinement.

The feedback loop is the real innovation. Community ratings don't just fix individual misclassifications; they generate ground truth data that helps us improve the entire system. We're not just moderating content; we're continuously learning from our community about where the lines should be drawn.
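One way to picture that loop: compare the AI's rating with the community's majority rating and surface disagreements as candidates for policy refinement. This is a hedged sketch; the rating labels and data shapes are illustrative, not our production schema.

```python
from collections import Counter

def majority_rating(ratings):
    """Most common community rating for one image (ties broken arbitrarily)."""
    return Counter(ratings).most_common(1)[0][0]

def find_disagreements(items):
    """Yield images where the community majority differs from the AI.

    items: [(image_id, ai_rating, [community_ratings]), ...]
    """
    for image_id, ai_rating, ratings in items:
        community = majority_rating(ratings)
        if community != ai_rating:
            yield image_id, ai_rating, community
```

Each disagreement is a labeled example: either the AI misclassified, or the policy itself needs refinement. Either way, the system learns.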

Plus, gamification made participation meaningful. People want to help keep their community safe. We just had to make it engaging.

Defense in Depth

Today, our moderation pipeline is genuinely multi-layered. No single solution is perfect, so we don't rely on any single solution.

Here's what's running:

  • Text scanning for prompts and written content: model descriptions, articles, comments, usernames

  • Image classification for generated and uploaded content

  • Celebrity and person-of-interest (POI) detection to prevent unauthorized likeness use

  • Partnership with Thorn for CSAM detection in generated and uploaded content

  • Community review layer through Knights of New Order

  • Continuous refinement based on all of the above

Different providers are optimized for different things. Some excel at speed, others at specific content types, others at edge cases. By layering them together, we get resilience. If one system misses something, another catches it.
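Structurally, the pipeline runs every layer and collects every flag rather than short-circuiting on the first pass, which is what gives it resilience. A minimal sketch, with hypothetical stand-in checks for the real services:

```python
# Defense-in-depth sketch: run every layer and collect all flags, so a miss
# in one layer can be caught by another. The check functions below are
# hypothetical stand-ins for the real services (text scanning, image
# classification, likeness detection, and so on).

def text_scan(content):
    return "blocked_term" if "blocked-term" in content.get("prompt", "") else None

def image_scan(content):
    return "nsfw" if content.get("image_label") == "nsfw" else None

LAYERS = [("text_scan", text_scan), ("image_scan", image_scan)]

def run_pipeline(content, layers=LAYERS):
    """Return (layer_name, verdict) for every layer that flags the content."""
    flags = []
    for name, check in layers:
        verdict = check(content)
        if verdict is not None:
            flags.append((name, verdict))
    return flags
```

Because no layer can veto another's verdict, a false negative in one check doesn't silence the rest of the stack.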

The goal isn't perfection; it's having the agility to adapt quickly when problems surface.

Lessons from a Year of Building

Three things became clear this year:

Speed of iteration matters more than initial accuracy. Every system has blind spots. The question isn't whether your moderation will make mistakes; it will. The question is how fast you can fix them. Tools that let policy teams iterate independently, without waiting for engineering cycles, are game-changers.

Community involvement is essential. AI catches patterns; humans catch context. Building trust means involving the community in the process, not just telling them the rules. Our community has a stake in keeping their feeds safe. Letting them participate in that work builds ownership.

Transparency builds credibility. We publish our policies. We explain our changes. We own our mistakes. When we talk to compliance partners, they appreciate seeing the controls, not just hearing about them. Transparency isn't just good ethics; it's good business.

Looking Ahead

Earlier this year, we weren't sure we'd ever get standard payment processing on Civitai. The combination of user-generated AI content and open model sharing made even high-risk processors nervous.

Last month, we launched Civitai Green with Stripe. Standard payment processing. Credit cards. The works.

That happened because our moderation pipeline became a competitive advantage, not just a cost center. Compliance partners could see exactly how much control we have over our content pipeline, and how quickly we can adapt when new requirements emerge.

We're continuing to invest in this infrastructure. The threats will keep evolving. So will we.

We're grateful to partners like Clavata who helped us build the foundation for real-time policy management. And we're grateful to our community, both for your patience as we figured this out, and for your participation in making the platform safer for everyone.

If you want to help, join Knights of New Order. Your ratings make a real difference.

Here's to another year of building.