Back to Papers

Rethinking creativity evaluation: a critical analysis of existing creativity evaluations

Li-Chun Lu et al.

Abstract

We systematically examine, analyze, and compare representative creativity measures–creativity index, perplexity, syntactic templates, and LLM-as-a-Judge–across diverse creative domains, including creative writing, unconventional problem-solving, and research ideation. Our analyses reveal that these metrics exhibit limited consistency, capturing different dimensions of creativity. We highlight key limitations, including the creativity index's focus on lexical diversity, perplexity's sensitivity to model confidence, and syntactic templates' inability to capture conceptual creativity. Additionally, LLM-as-a-Judge shows instability and bias. Our findings underscore the need for more robust, generalizable evaluation frameworks that better align with human judgments of creativity.

Relevance Assessment

Research Gap

Notes

Notes are automatically saved as you type

Tags

Search Queries

Paper ID: 5dc7e2ce-f7db-435b-89ca-ff1959242b03Added: 10/26/2025