Curiosity-driven LLM-as-a-judge for personalized creative judgment
Vanya Bannihatti Kumar
2025
Computer science - computation and language, computer science - machine learning
Abstract
Modern large language models (LLMs) excel at objective tasks such as evaluating mathematical reasoning and factual accuracy, yet they falter when faced with the nuanced, subjective nature of assessing creativity. In this work, we propose a novel curiosity-driven LLM-as-a-judge for evaluating creative writing that is personalized to each individual's creative judgments. We use the Torrance Test of Creative Writing (TTCW) benchmark introduced in Chakrabarty et al. (2024), which has stories annotat
Tags
Human-In-The-Loop › Autonomous Generation
Creativity Evaluation Methods › LLM-Based Evaluation
Creativity Frameworks › Logical Creativity
Creativity Evaluation Methods › Human Evaluation
Creativity Evaluation Methods › Automatic Metrics
Level of Analysis › Document-Level
Model Scale › Small (<3B)
Relationship to Creativity › Implicit
Textual Domain › Literary Texts
Research Focus › Architectural Research
Model Scale › Medium (8-24)
Creativity Frameworks › Linguistic Creativity
Research Focus › Benchmark
Paper ID: f6f31b70-079b-4f46-8e14-ebeac7a35e49
Added: 10/26/2025