
AAV: A better model for scoring anime aesthetics


Welcome to AAV, the open-source aesthetic scoring and data cleaning toolset developed by Laxhar Dream Lab for XL training. It can be used for preprocessing large training datasets, aesthetic scoring, and automatic annotation of quality words. If you find it useful, please add a ❤ to the project.

The model has been open-sourced on Hugging Face:

The dataset consists of 200,000 manually selected anime images, and the scoring dimensions cover both picture quality and composition.

The model scores images using a dual-model supervisory architecture (DMSA).

The model structure can be seen in the following figure:

[Figure: model structure]

Model Basic Information

Parameter count: 1.1B + 110M (picture + composition, ViT + GRN)

Scoring range: -1 to 1

Quality cue words are divided into 5 levels: masterpiece, high quality, normal quality, low quality, worst quality
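To make the relationship between the scoring range and the five quality levels concrete, here is a minimal Python sketch. The function name `score_to_quality_tag` and the equal-width thresholds are assumptions made for illustration only; the actual cut-offs used by AAV are not stated here and may differ.

```python
# Hypothetical mapping from a score in [-1, 1] to one of the five quality
# words. Equal-width bins are an assumption; AAV's real thresholds are not
# published in this card.
QUALITY_TAGS = [
    "worst quality",
    "low quality",
    "normal quality",
    "high quality",
    "masterpiece",
]


def score_to_quality_tag(score: float) -> str:
    """Map a score in [-1, 1] to a quality word using five equal-width bins."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must be in [-1, 1], got {score}")
    # Shift the score to [0, 2] and scale so each bin has width 1;
    # the top edge (score == 1.0) is clamped into the last bin.
    index = min(int((score + 1.0) * 2.5), len(QUALITY_TAGS) - 1)
    return QUALITY_TAGS[index]
```

With these assumed bins, a score of 0.0 maps to "normal quality" and a score of 1.0 maps to "masterpiece". If the real thresholds are known, only the bin computation needs to change.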

Model Advantages

Faster recognition speed

In a standard environment, the AAV model can evaluate 10,000 images in 30 minutes with high accuracy. At that rate, scoring one million anime pictures takes only about 50 hours.
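The 50-hour figure follows directly from the stated throughput. The short sketch below reproduces the arithmetic; the helper name `hours_to_score` is hypothetical, and the estimate assumes throughput stays constant as the dataset grows.

```python
def hours_to_score(num_images: int,
                   images_per_batch: int = 10_000,
                   minutes_per_batch: float = 30.0) -> float:
    """Estimate scoring time in hours, assuming constant throughput."""
    images_per_hour = images_per_batch / (minutes_per_batch / 60.0)
    return num_images / images_per_hour


print(hours_to_score(1_000_000))  # 50.0
```

Real throughput depends on hardware, image resolution, and batch size, so treat the result as a rough planning number.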

More flexible compositional judgment

A compositional confidence function is used to balance the score.

Composition confidence is a measure of whether the composition of an image conforms to human aesthetic preferences and rules. Composition refers to how the elements in an image are arranged and combined to achieve a certain visual effect and express intent.

Composition confidence can be measured by the following aspects:

Compositional Patterns: Compositional patterns refer to some commonly used compositional rules and methods, such as the rule of thirds, symmetry, and diagonal. These compositional patterns can help an image create a balanced, stable, dynamic or interesting visual effect, thus improving the aesthetic rating of the image. The aesthetic evaluation model can assess the reasonableness and merits of the composition by recognizing the compositional patterns in the image.

Compositional partitioning: Compositional partitioning refers to dividing an image into several non-overlapping regions, each representing a relative position. Different compositional partitions have different effects on the aesthetic rating of an image, e.g., the center region is usually the most important, while the edge regions are usually the least important. Aesthetic evaluation models can assess the coherence and appropriateness of an image's composition by analyzing the distribution of its elements across different compositional partitions.

Visual saliency: Visual saliency refers to the extent to which certain elements in an image are able to attract human attention, such as color, shape, texture, and contrast. Visual saliency can help an image to highlight the most important elements, thus improving the aesthetic rating of the image. The aesthetic evaluation model can assess the prominence and clarity of an image's composition by calculating the visual saliency in the image.
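As a rough illustration of how compositional partitioning and visual saliency could be combined, the sketch below splits a saliency map into a 3x3 grid and weights the center cell most heavily. It is not the AAV implementation: the weights, the function names `partition_mass` and `composition_confidence`, and the input format (a non-empty 2D list of non-negative saliency values) are all assumptions made for this example.

```python
from typing import List, Sequence

# Hypothetical weights for a 3x3 partition: the center cell counts most and
# the corners least. These values are illustrative, not taken from AAV.
PARTITION_WEIGHTS = [
    [0.25, 0.5, 0.25],
    [0.5, 1.0, 0.5],
    [0.25, 0.5, 0.25],
]


def partition_mass(saliency: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the share of total saliency that falls in each 3x3 cell."""
    height, width = len(saliency), len(saliency[0])
    cells = [[0.0] * 3 for _ in range(3)]
    total = 0.0
    for y, row in enumerate(saliency):
        for x, value in enumerate(row):
            # Integer division assigns each pixel to one of three bands.
            cells[3 * y // height][3 * x // width] += value
            total += value
    if total == 0:
        return cells  # no salient content at all
    return [[cell / total for cell in row] for row in cells]


def composition_confidence(saliency: Sequence[Sequence[float]]) -> float:
    """Weighted sum of per-cell saliency shares, in the range [0, 1]."""
    mass = partition_mass(saliency)
    return sum(
        PARTITION_WEIGHTS[i][j] * mass[i][j]
        for i in range(3)
        for j in range(3)
    )
```

Under these assumed weights, an image whose saliency sits entirely in the center cell scores 1.0, while one whose saliency sits entirely in a corner scores 0.25. A learned model would replace the fixed weights with values inferred from human ratings.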

How to use

You can try the model yourself with Project anime-thetic in HFspace! Examples based on different rating scales are provided there, or you can upload your own images.


We welcome any comments and ideas! With your feedback, we will continue to optimize Relink.