What the heck is wrong with the site?
It's no secret that we've been experiencing major issues on the site over the past week. In development, it's rare for something that works fine to break without any changes being made. As we've grown, though, this has happened a few times: we cross a threshold that exceeds what the systems we built can support. The root cause is the sheer volume of activity. We're receiving hundreds of thousands of image uploads daily, and metrics are updating constantly, every second. For reference, we now get more images in a month than we did in the entirety of 2023. Right now, this load is more than adding servers alone can handle. As a result, our systems are lagging, which leads to images not loading, upload failures, generation failures, and potential failures in various other on-site actions.
Our planned fix is to stop querying the primary database to display images and to serve them from an optimized image search index instead. This change will reduce the load on the rest of the site. However, it is not a small task. We are moving as quickly as possible and implementing temporary solutions to keep the site functional while we work on this more in-depth fix.
These are the other main pain points people have made us aware of, along with our progress on resolving each.
Current Issues and Status Updates
Some newly uploaded resources not showing in model feed or on profiles
Status: Fixed
We have a job running every 10 minutes that identifies and corrects resources that aren't appearing where they should be. This doesn't solve the core problem, but we're working on a permanent fix. We've made some additional changes that we believe will address the underlying problem.
Resolved
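For readers curious what a corrective job like the one described above might look like, here is a minimal sketch. It is not our actual implementation: the Resource type, the feed set, and the function names are all hypothetical, and a real job would query the database and the feed index rather than in-memory collections.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in for an uploaded resource."""
    id: int
    published: bool


def find_missing(resources, feed_ids):
    """Return ids of published resources that are absent from the feed."""
    return [r.id for r in resources if r.published and r.id not in feed_ids]


def reconcile(resources, feed_ids):
    """Add every missing published resource to the feed and report what was repaired."""
    missing = find_missing(resources, feed_ids)
    feed_ids.update(missing)
    return missing


# Example data (assumed): resource 2 is published but never reached the feed.
resources = [Resource(1, True), Resource(2, True), Resource(3, False)]
feed = {1}

repaired = reconcile(resources, feed)
second_pass = reconcile(resources, feed)  # running again finds nothing left to fix
```

A scheduler would call reconcile on a fixed interval, such as every 10 minutes. Because a second pass repairs nothing, the job is safe to run repeatedly.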
Some newly uploaded resources not being usable in the generator
Status: Fixed
We're still diagnosing why this happens.
Resolved
Generated images not appearing correctly on iOS
Status: Fixed
This issue was resolved today.
Price for generation changing
Status: Not a bug
Prices are adjusted as we optimize new workflows. For example, upscale pricing was reduced while img2img workflow pricing was increased. This keeps our prices aligned with our actual costs in the generator. Prices may continue to change over the next few weeks as we calibrate them and adjust workflows.
Image metrics not updating
Status: Fixed
Image metrics were disabled while we addressed replication lag, which made it seem like things weren't saving. We've re-enabled them for now after making some changes to reduce the lag, but we might need to disable them again during peak hours. We're working on a longer-term fix, but it probably won't be ready until next week, since we need to make sure it can handle peak traffic as well.
Models not appearing in search results
Status: Fixed
The search engine was updated today. Models should now be added to search reliably, but it may take 20 minutes before they appear. We've temporarily disabled search updates while we fix a related issue.
We've reworked our model search index and reduced the time it takes to process updates by roughly 10x. It's been re-enabled and configured to update with missing models every 5-6 minutes. We noticed that the updates were getting stuck behind really slow image search updates. So...
Image search not updating
Status: Intentional Break
We've turned off updates to the image search engine for now so that we can keep the model search updating fast. We'll be revising the image search engine over the next week. You'll still be able to search for anything made prior to July 26.
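To illustrate why slow image updates were holding up model updates, here is a small simulation of a single first-in, first-out worker. The job names and durations are made up for the example and are not measurements from our systems.

```python
def completion_times(jobs):
    """Process jobs in FIFO order on one worker; return the finish time of each job."""
    clock = 0
    finished = {}
    for name, seconds in jobs:
        clock += seconds  # each job waits for everything queued ahead of it
        finished[name] = clock
    return finished


# Hypothetical durations in seconds.
shared_queue = [("image-batch", 300), ("model-a", 2), ("model-b", 2)]
model_only_queue = [("model-a", 2), ("model-b", 2)]

shared_times = completion_times(shared_queue)
model_only_times = completion_times(model_only_queue)
```

In the shared queue, model-a finishes at 302 seconds because it waits behind the image batch. With image updates out of the queue, it finishes at 2 seconds.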
Additional issues:
Are you experiencing problems not attributable to the replication lag or any of the above? Please let us know so we can address them quickly. CivBot also tracks known issues, and you can ask CivBot for an update by clicking on "known issues".