Creativity or brute force? Using brainteasers as a window into the problem-solving abilities of large language models

Simeng Han

Computer Science - Artificial Intelligence; Computer Science - Computation and Language

Abstract

Accuracy remains a standard metric for evaluating AI systems, but it offers limited insight into how models arrive at their solutions. In this work, we introduce a benchmark based on brainteasers written in long narrative form to probe more deeply into the types of reasoning strategies that models use. Brainteasers are well-suited for this goal because they can be solved with multiple approaches, such as a few-step solution that uses a creative insight or a longer solution that uses more brute force…


Tags

creativity frameworks › psychological/cognitive
evaluates a creative feature › logic (puzzles, etc.)
evaluates a creative feature › riddles
model used › ChatGPT
model used › Large (>32B)
model used › Medium (8-24B)
model used › Small (<3B)
related to creativity › mentions creativity as a human ability
scope › prompt engineering
scope › technical research
evaluation › LLM-as-a-judge
evaluation › human eval
evaluation › document-level
evaluation › creativity evaluation

Paper ID: 164b4354-472e-4b14-9ef6-a2c770a16bb2
Added: 10/26/2025