Beyond Stable Diffusion: Building Better Safeguards
Civitai has long employed protection systems to prevent inappropriate or illegal content from being created in our image Generator. These include robust measures against CSAM (which are always applied) and optional safeguards for filtering sexual or suggestive content. These protections have worked well with model ecosystems like Stable Diffusion 1.5, SDXL, and Pony.
The generative AI landscape is evolving, however, and while our current protections have served us well, we believe there’s room for an even better solution. Our existing systems, though effective, have certain drawbacks. In some cases, they can affect image quality or conflict with LoRAs. They also lack portability, making them difficult to implement across other generation services and our upcoming video generation tools. Additionally, they aren’t easily adaptable to more recent model ecosystems like Flux and AuraFlow.
By developing a more robust, LLM-driven approach, we aim to improve moderation accuracy, enhance flexibility across different platforms, and future-proof our safety systems. And we know that our community has the expertise to help us make this a reality.
We know that discussions around AI safety can sometimes spark debate. Some might see this as "censorship", but let’s be clear: this is not about restricting legitimate content. It’s about ensuring ethical, legal, and creator-friendly safeguards are in place - something that has always been our goal.
The Challenge: Build a Smarter Prompt Modification System
To address this, we're launching a community collaboration with $1,000 USD compensation for an LLM-based Prompt Modification system. While we have been developing solutions internally, we recognize that there are members of the community with knowledge exceeding our own, and we want to tap into that! By participating, you’ll be helping to build a future-proof solution that balances creative freedom with responsible AI use.
Note that this system will not introduce any new restrictions beyond what we already have in place in the image generator. It will not remove all NSFW content from the generator. It will not be integrated into the user-upload pipeline. Instead, it’s simply a more robust and adaptable way of applying our existing image generator protections, ensuring they remain effective as AI models evolve.
What We're Looking For: Requirements
We need two distinct LLM-driven filtering processes that will run against a local LLM on our own hardware. When a user enters a prompt into the generator, it will first be analyzed and processed by the LLM to ensure it aligns with our safety guidelines.
The system will operate under two modes:
Prevent Sexual - The goal is to identify and remove all sexual content from prompts. If no sexual content is detected, do not modify the prompt. This is currently applied (in another form) when the "mature" toggle is OFF - an optional feature to prevent sexy content when it isn't wanted.
Prevent Sexual + Minor - The goal is to identify prompts intended to generate sexual or suggestive content and remove any references to minors in those cases. If no sexual or suggestive content is detected, do not modify the prompt. This is currently applied (in another form) to all generations.
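The two modes above boil down to a small piece of decision logic. The sketch below is purely illustrative - the boolean flags and the `stripped_*` variants stand in for the LLM's classification and rewriting, and are not part of our spec:

```python
from enum import Enum

class Mode(Enum):
    PREVENT_SEXUAL = 1        # applied when the "mature" toggle is OFF
    PREVENT_SEXUAL_MINOR = 2  # applied to all generations

def apply_mode(prompt: str, mode: Mode, is_sexual: bool, is_minor: bool,
               stripped_sexual: str, stripped_minor: str) -> str:
    """Decide whether and how a prompt is modified.

    is_sexual / is_minor and the stripped_* arguments are placeholders
    for the LLM's judgment and rewritten output.
    """
    if mode is Mode.PREVENT_SEXUAL:
        # Remove sexual content; leave clean prompts untouched.
        return stripped_sexual if is_sexual else prompt
    # PREVENT_SEXUAL_MINOR: only modify when the prompt is sexual
    # or suggestive AND references a minor.
    if is_sexual and is_minor:
        return stripped_minor
    return prompt
```

Note that in both modes an unproblematic prompt passes through completely unmodified.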
We need JSON outputs in the following format (or similar, if you have a better idea!):
Prevent Sexual
```
{
  "sexualContent": false, // Does the prompt contain sexual content?
  "newPrompt": ""         // The revised prompt, with words referencing sexual content removed if sexualContent = true
}
```
Prevent Sexual + Minor
```
{
  "sexualContent": false, // Does the prompt contain sexual content?
  "minorContent": false,  // Does the prompt contain references to a minor?
  "minor": [],            // An array of any words that would indicate a minor
  "newPrompt": ""         // The revised prompt, with words referencing minors removed if both minorContent and sexualContent = true
}
```
We’re flexible on the choice of LLM, but it must run locally, ideally with as little VRAM usage as possible. We’ve tested several candidate models, with varying levels of success, including:
Llama 3.1 8B Instruct
Phi 3 mini 4k 3B Instruct
Mistral 7B Instruct v0.2
DeepSeek R1
Please refer to the attached files (on the right, under the table of contents) for examples of System Prompts and settings we used during our own testing. These are provided for reference - you’re not required to use them, but they may offer insight into what we’ve already tried.
We expect that, along with the System Prompt, you will use Few-Shot Learning by providing input-output example pairs, before the actual prompt is processed. This helps guide the model’s response and prepares it for potential edge cases it may encounter when handling real-world prompts.
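For chat-style models, few-shot examples are typically packaged as alternating user/assistant turns ahead of the real prompt. A minimal sketch of what that might look like - the system prompt text and the example pairs here are illustrative stand-ins, not the ones we tested with:

```python
# Illustrative only: a short system prompt plus benign input/output
# example pairs for the "Prevent Sexual" mode.
SYSTEM_PROMPT = "Detect sexual content in image-generation prompts and return JSON."

FEW_SHOT_EXAMPLES = [
    ("a sunset over snowy mountains, golden hour",
     '{"sexualContent": false, "newPrompt": "a sunset over snowy mountains, golden hour"}'),
    ("a woman in lingerie, seductive pose, bedroom",
     '{"sexualContent": true, "newPrompt": "a woman, bedroom"}'),
]

def build_messages(user_prompt: str) -> list[dict]:
    """Assemble the chat transcript: system prompt, then the example
    input/output pairs as user/assistant turns, then the real prompt."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for example_in, example_out in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": example_in})
        messages.append({"role": "assistant", "content": example_out})
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

Seeing the schema demonstrated, rather than only described, tends to make small local models much more consistent about emitting valid JSON.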
Due to the sensitive nature of some prompts - particularly those related to minor content detection - we cannot provide direct examples. However, we have shared general guidance on the types of patterns and behaviors the model should recognize.
If you have a radically different solution to this problem that meets the basic requirements, let us know!
How to Submit your Work
To submit your work, please complete the Submission Form and attach your files.
Make sure you have:
✅ Your Civitai Username and Contact Email
✅ Any relevant notes about your submission
✅ Your submission files attached
Rules
This is not a contest - it's a Civitai-community collaborative initiative to develop a system that helps improve AI safety while maintaining creative freedom. Civitai is setting aside $1,000 USD to compensate the contributor, contributors, or team who develop the best solution that meets our needs.
Compensation & Payment
The total compensation is $1,000 USD
Payment will be rendered via PayPal.
Compensation distribution: The full amount will be awarded to a single submission, but if multiple strong submissions are fielded, we may make additional awards at our discretion.
Submission Deadline
The submission period will close Friday, February 21st 2025 at 11:59 UTC.
Late submissions will not be considered.
Requirements
Submissions must include a working System Prompt that modifies input prompts according to the goals outlined above. You don't have to send us an LLM model - just let us know which one you used, and where we can get it from!
The system should minimize VRAM usage as much as possible.
The system must incorporate a System Prompt and Few-Shot Learning examples to improve response accuracy.
The system must be original work.
Submission Process
Submit your work via the form listed above.
Include a short write-up explaining how your system works, which LLM it's designed to work with, and why it fulfills the goals set out for this system. You may also provide examples of the prompts you've tested.
Civitai reserves the right to request modifications or clarification before selecting winners.
Judging Criteria
Submissions will be evaluated on:
Effectiveness - How well the system modifies prompts while maintaining accuracy.
Efficiency - How lightweight it is (VRAM usage, max token length, etc.).
Flexibility - How adaptable it is to different styles of prompt.
Safety & Reliability - How well it prevents inappropriate content while avoiding false positives.
Legal & Other
If no submission meets the required standard, Civitai reserves the right not to award the compensation.
By submitting, participants grant Civitai a perpetual license to use, modify, and integrate the chosen solution(s) into our platform.