Curiosity-driven LLM-as-a-judge for personalized creative judgment

Vanya Bannihatti Kumar

Computer Science – Computation and Language; Computer Science – Machine Learning

Abstract

Modern large language models (LLMs) excel at objective tasks such as evaluating mathematical reasoning and factual accuracy, yet they falter when faced with the nuanced, subjective nature of assessing creativity. In this work, we propose a novel curiosity-driven LLM-as-a-judge for evaluating creative writing which is personalized to each individual's creative judgments. We use the Torrance Test of Creative Thinking (TTCW) benchmark introduced in Chakrabarty et al. (2024), which has stories annotated […]

Tags

evaluation › LLM-as-a-judge
creativity frameworks › psychological/cognitive
evaluation › human eval
evaluation › automatic metrics
evaluation › document-level
model used › Large (>32B)
model used › Small (<3B)
related to creativity › related to creativity as a human ability
related to creativity › related to creativity as a textual genre
textual genre › literature
scope › technical research

Paper ID: f6f31b70-079b-4f46-8e14-ebeac7a35e49
Added: 10/26/2025