LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Cited by: 0
Authors
Feng, Weixi [1]
Zhu, Wanrong [1]
Fu, Tsu-jui [1]
Jampani, Varun [2]
Akula, Arjun [2]
He, Xuehai [3]
Basu, Sugato [2]
Wang, Xin Eric [3]
Wang, William Yang [1]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[2] Google, Mountain View, CA USA
[3] Univ Calif Santa Cruz, Santa Cruz, CA USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users when compared to simple text inputs. To address this issue, we study how Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions, and thus collaborate with visual generative models. We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language to enhance the visual planning skills of LLMs. LayoutGPT can generate plausible layouts in multiple domains, ranging from 2D images to 3D indoor scenes. LayoutGPT also shows superior performance in converting challenging language concepts like numerical and spatial relations to layout arrangements for faithful text-to-image generation. When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves performance comparable to that of human users in designing visual layouts for numerical and spatial correctness. Lastly, LayoutGPT achieves performance comparable to supervised methods in 3D indoor scene synthesis, demonstrating its effectiveness and potential in multiple visual domains.
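
The abstract's idea of writing in-context layout demonstrations in a style sheet language can be illustrated with a minimal sketch. This is not the authors' implementation: the `Box` schema, the `box_to_css` and `build_prompt` helpers, and the "Prompt:/Layout:" template are all hypothetical names chosen for illustration. The sketch only shows how bounding boxes might be serialized as CSS-like rules and concatenated into a few-shot prompt that an LLM would be asked to complete.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One object in a 2D layout; pixel coordinates (hypothetical schema)."""
    label: str
    left: int
    top: int
    width: int
    height: int


def box_to_css(box: Box) -> str:
    """Serialize one box as a single CSS-like rule."""
    return (
        f"{box.label} {{left: {box.left}px; top: {box.top}px; "
        f"width: {box.width}px; height: {box.height}px}}"
    )


def build_prompt(demos, query_caption: str) -> str:
    """Join (caption, boxes) demonstrations, then leave the query layout open.

    The template below is an assumption, not the format used in the paper.
    """
    parts = []
    for caption, boxes in demos:
        parts.append(f"Prompt: {caption}")
        parts.append("Layout:")
        parts.extend(box_to_css(b) for b in boxes)
        parts.append("")  # blank line between demonstrations
    parts.append(f"Prompt: {query_caption}")
    parts.append("Layout:")  # the LLM would continue from here
    return "\n".join(parts)


if __name__ == "__main__":
    demo = ("two cats on a sofa", [Box("cat", 40, 120, 90, 80),
                                   Box("cat", 150, 125, 85, 75)])
    print(build_prompt([demo], "three dogs in a park"))
```

In this sketch the numerical relation in a caption ("two cats") is reflected by the number of rules in the demonstration, which is the kind of text-to-layout correspondence the abstract describes.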
Pages: 26