Building Japanese Creativity Benchmarks and Applying them to Enhance LLM Creativity

So Fukuda

ACL 2025, WS

Abstract

To evaluate the creativity of large language models (LLMs) in Japanese, we construct three benchmarks: Japanese Creativity Questions (JCQ), Divergent Association Task (DAT), and Story Alteration Task (SAT). JCQ comprehensively evaluates creativity using LLMs. Meanwhile, DAT and SAT measure specific aspects of creative ability using embeddings. We also analyze correlations between JCQ and DAT, JCQ and SAT, and DAT and SAT. While JCQ provides comprehensive evaluation, it is relatively time- and resource-intensive. In contrast, DAT and SAT offer lower comprehensiveness but enable quick, low-cost assessment. Additionally, we investigate whether training with DAT contributes to enhancing LLM creativity.
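
The abstract says DAT scores creativity from embeddings but does not give the formula. As a rough illustration only: the Divergent Association Task is commonly scored as the mean pairwise cosine distance between the embeddings of the words a participant (or model) produces, scaled by 100. The sketch below assumes that formulation; the paper's Japanese variant may differ in embedding model, word filtering, or number of words. The function names and the toy three-dimensional vectors are hypothetical and stand in for real word embeddings.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_score(vectors):
    """Mean pairwise cosine distance across all vectors, scaled by 100.

    Assumed scoring rule (not confirmed by the abstract): higher values
    mean the words are semantically further apart from one another.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two word vectors")
    return 100.0 * sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy embeddings, invented for illustration; a real run would look up
# vectors from a Japanese word-embedding model.
toy_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "volcano": [0.0, 1.0, 0.0],
    "justice": [0.0, 0.0, 1.0],
}

related = [toy_embeddings[w] for w in ("cat", "kitten")]
unrelated = [toy_embeddings[w] for w in ("cat", "volcano", "justice")]

print(f"related words:   {dat_score(related):.1f}")
print(f"unrelated words: {dat_score(unrelated):.1f}")
```

Because the score needs only embedding lookups and a handful of dot products, it is consistent with the abstract's description of DAT as a quick, low-cost assessment compared with LLM-judged JCQ.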

Relevance Assessment

Research Gap

These benchmarks have not been applied to machine translation (MT), and there is no automatic creativity evaluation framework for MT.

Tags

Creativity Evaluation Methods › Creativity-Specific Evaluation
Relationship to Creativity › Explicit
Model Scale › Medium (8-24)
Model Scale › Small (<3B)
Creativity Evaluation Methods › Automatic Metrics
Creativity Evaluation Methods › Human Evaluation
Human-In-The-Loop › Autonomous Generation
Creativity Frameworks › Logical Creativity
Research Focus › Benchmark

Paper ID: 3621915e-6ee3-4a5c-9fc5-48d23fa78a8a
Added: 9/21/2025