Six-Dimension Art Evaluation Rubric
Source paper: Learning-based Artificial Intelligence Artwork: Methodology Taxonomy and Quality Evaluation, ACM Computing Surveys (2024).
Origin
The rubric was built from art vocabulary and traditional principles of painting analysis, then validated through a user study to confirm that the weightings felt reasonable across artwork types. The goal was a consistent, repeatable way to evaluate AI-generated artworks across styles.
The Six Dimensions
Beauty (50%) — The dominant criterion. Encompasses overall compositional harmony: balance, proportion, the arrangement of visual elements, and the pleasing relationship between subjects. An image can score well on every other dimension and still feel wrong if the composition is off. This is where Gestalt principles are most directly applied — does the whole hang together?
Color (10%) — Palette coherence and emotional resonance. Not just whether colors are technically accurate, but whether the color relationships feel intentional and expressive — harmony, contrast, temperature, and the mood they collectively create.
Texture (10%) — Surface quality and material plausibility. In AI-generated imagery this is particularly diagnostic: does skin feel like skin, fabric like fabric, stone like stone? Also covers handling quality — brushstroke character in painterly work, grain in photography-style renders.
Content Detail (10%) — The richness and specificity of what’s depicted. Vague, generic content scores lower; precise, particular content scores higher. Captures whether the image has something specific to say visually, or whether it’s a generic assembly of shapes.
Line (10%) — The clarity, expressiveness, and intentionality of linework and edges. In illustrative or painterly work this is about the quality of mark-making. In photorealistic work it’s about edge definition and whether contours feel decisive. Weak or confused linework is a common failure mode in AI generation.
Style (10%) — Consistency and distinctiveness of the artistic voice across the image. Does the work feel coherent — as if it came from a single artistic sensibility — or like a pastiche of different references pasted together? This dimension penalizes the generic, blended-everything quality that AI tools often default to.
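As a minimal sketch of how the six weights might combine into a single number, the snippet below computes a weighted sum. The weights are the ones listed above; the 0 to 10 scoring scale, the function name overall_score, and the example scores are illustrative assumptions, not details taken from the paper.

```python
# Weights as listed in the rubric above; they sum to 1.0.
WEIGHTS = {
    "beauty": 0.50,
    "color": 0.10,
    "texture": 0.10,
    "content_detail": 0.10,
    "line": 0.10,
    "style": 0.10,
}


def overall_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores (assumed 0-10 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


# Hypothetical example: strong surface qualities, weak composition.
weak_composition = {
    "beauty": 4,
    "color": 9,
    "texture": 9,
    "content_detail": 9,
    "line": 9,
    "style": 9,
}
print(round(overall_score(weak_composition), 2))
```

Under these assumptions the example comes out at about 6.5 out of 10: near-perfect marks on five dimensions cannot offset a weak beauty score, because beauty alone carries half the total.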
Why Beauty Gets Half the Weight
The researchers’ user study found that compositional harmony was the single strongest predictor of how people judged an artwork’s overall quality. The other five dimensions tend to amplify or detract from that compositional foundation rather than stand in for it. You can’t rescue a badly composed image with great color alone.
Practical Use
The rubric works as a structured critique checklist for evaluating AI outputs: instead of asking “is this good?”, you ask six specific questions, one per dimension. It also works as a prompting guide: if you know which dimension is weak in a generation, you can aim your next prompt at it (e.g., “stronger sense of light source and shadow” targets beauty; “consistent loose brushwork throughout” targets style).
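The prompting-guide idea can be sketched as a small lookup: find the lowest-scoring dimension and append a phrase aimed at it. This is only one possible approach. The beauty and style phrases are the examples given above; the other four phrases, the function names, and the sample scores are hypothetical placeholders.

```python
# Example prompt additions per dimension. The "beauty" and "style" phrases
# come from the text above; the other four are hypothetical placeholders.
PROMPT_HINTS = {
    "beauty": "stronger sense of light source and shadow",
    "color": "limited, harmonious color palette",
    "texture": "realistic surface texture on every material",
    "content_detail": "specific, richly detailed subject matter",
    "line": "crisp, decisive contours",
    "style": "consistent loose brushwork throughout",
}


def weakest_dimension(scores: dict[str, float]) -> str:
    """Return the dimension with the lowest score (first one wins on ties)."""
    return min(scores, key=scores.get)


def refine_prompt(prompt: str, scores: dict[str, float]) -> str:
    """Append the hint for the weakest dimension to the prompt."""
    return f"{prompt}, {PROMPT_HINTS[weakest_dimension(scores)]}"


# Hypothetical critique of one generation, scored 0-10 per dimension.
critique = {
    "beauty": 7,
    "color": 8,
    "texture": 6,
    "content_detail": 7,
    "line": 6,
    "style": 3,
}
print(refine_prompt("a harbor at dusk", critique))
```

Here style is the weakest dimension, so the next prompt becomes "a harbor at dusk, consistent loose brushwork throughout". Picking the raw lowest score is a deliberate simplification; a variant could weight the shortfall by each dimension's share of the rubric so that beauty problems are addressed first.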