VisuCraft: Enhancing large vision-language models for complex visual-guided creative content generation via structured information extraction
Rongxin Jiang
Computer science - computer vision and pattern recognition, computer science - computation and language
Abstract
This paper introduces VisuCraft, a novel framework designed to significantly enhance the capabilities of Large Vision-Language Models (LVLMs) in complex visual-guided creative content generation. Existing LVLMs often exhibit limitations in maintaining high visual fidelity, genuine creativity, and precise adherence to nuanced user instructions when generating long-form texts. VisuCraft addresses these challenges by integrating a multimodal structured information extractor (E) and a dynamic prompt
Paper ID: c290563b-6762-44d4-aa3c-c399328a233f
Added: 10/26/2025