
VisuCraft: Enhancing large vision-language models for complex visual-guided creative content generation via structured information extraction

Rongxin Jiang

Computer Science - Computer Vision and Pattern Recognition; Computer Science - Computation and Language

Abstract

This paper introduces VisuCraft, a novel framework designed to significantly enhance the capabilities of Large Vision-Language Models (LVLMs) in complex visual-guided creative content generation. Existing LVLMs often struggle to maintain high visual fidelity, genuine creativity, and precise adherence to nuanced user instructions when generating long-form text. VisuCraft addresses these challenges by integrating a multimodal structured information extractor (E) and a dynamic prompt
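The abstract is truncated, so the internals of the extractor (E) and the prompt component are not described here. As a rough illustration of the general idea of conditioning a text generator on structured visual information, the sketch below serializes a hypothetical structured scene description together with the user instruction into a single prompt. It is not the paper's implementation: `StructuredVisualInfo`, `build_prompt`, and the fields `objects`, `attributes`, `relations`, and `scene` are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredVisualInfo:
    """Hypothetical structured output of an extractor E (illustrative fields only)."""
    objects: list[str] = field(default_factory=list)
    attributes: dict[str, list[str]] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)  # (subject, relation, object)
    scene: str = ""


def build_prompt(info: StructuredVisualInfo, instruction: str) -> str:
    """Hypothetical prompt builder: turn the structured fields into text lines,
    then append the user's instruction so the generator sees both."""
    lines = [f"Scene: {info.scene}"] if info.scene else []
    for obj in info.objects:
        attrs = ", ".join(info.attributes.get(obj, []))
        lines.append(f"Object: {obj}" + (f" ({attrs})" if attrs else ""))
    for subj, rel, obj in info.relations:
        lines.append(f"Relation: {subj} {rel} {obj}")
    lines.append(f"Instruction: {instruction}")
    return "\n".join(lines)


if __name__ == "__main__":
    info = StructuredVisualInfo(
        objects=["lighthouse", "boat"],
        attributes={"lighthouse": ["red", "tall"]},
        relations=[("boat", "near", "lighthouse")],
        scene="stormy harbor at dusk",
    )
    print(build_prompt(info, "Write a short story."))
```

Because the prompt is rebuilt from whatever the extractor returns for each image, the visual details the generator is asked to respect change with the input rather than being fixed in a template; whether VisuCraft works this way is not stated in the visible text.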


Paper ID: c290563b-6762-44d4-aa3c-c399328a233f
Added: 10/26/2025