Curiosity-driven LLM-as-a-judge for personalized creative judgment
Vanya Bannihatti Kumar
Computer science - computation and language, computer science - machine learning
Abstract
Modern large language models (LLMs) excel at objective tasks such as evaluating mathematical reasoning and factual accuracy, yet they falter when faced with the nuanced, subjective nature of assessing creativity. In this work, we propose a novel curiosity-driven LLM-as-a-judge for evaluating creative writing that is personalized to each individual's creative judgments. We use the Torrance Test of Creative Thinking (TTCW) benchmark introduced in Chakrabarty et al. (2024), which has stories annotat…
Tags
evaluation › LLM-as-a-judge
creativity frameworks › psychological/cognitive
evaluation › human eval
evaluation › automatic metrics
evaluation › document-level
model used › Large (>32B)
model used › Small (<3B)
related to creativity › related to creativity as a human ability
related to creativity › related to creativity as a textual genre
textual genre › literature
scope › technical research
Paper ID: f6f31b70-079b-4f46-8e14-ebeac7a35e49
Added: 10/26/2025