Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation

Cited by: 10
Authors
Kalakonda, Sai Shashank [1]
Maheshwari, Shubh [1]
Sarvadevabhatla, Ravi Kiran [1]
Affiliations
[1] IIIT Hyderabad, CVIT, Hyderabad, India
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Keywords
text-conditioned action generation models; large-scale language models; prompt function; stochastic and deterministic;
DOI
10.1109/ICME55011.2023.00014
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. We introduce a generic approach compatible with stochastic (e.g. VAE-based) and deterministic (e.g. MotionCLIP) text-to-motion models. In addition, the approach enables multiple text descriptions to be utilized. Our experiments show (i) noticeable qualitative and quantitative improvement in the quality of synthesized motions, (ii) benefits of utilizing multiple LLM-generated descriptions, (iii) suitability of the prompt function, and (iv) zero-shot generation capabilities of the proposed approach. Code and pretrained models are available at https://actiongpt.github.io.
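The pipeline the abstract outlines (wrap a short action phrase in a prompt, ask an LLM for several detailed descriptions, then feed their combined text embedding to a text-to-motion model) could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the prompt wording, the names `prompt_function`, `aggregate_embeddings`, and `describe_and_embed`, the default `k`, and the choice of element-wise averaging to combine multiple descriptions are all assumptions made here. The `llm` and `text_encoder` arguments are placeholder callables standing in for whatever language model and text encoder a given system uses.

```python
from typing import Callable, List, Sequence


def prompt_function(action_phrase: str) -> str:
    """Wrap a terse action phrase in an instruction for an LLM.

    The template wording is hypothetical, not taken from the paper.
    """
    return (
        "Describe in detail the body movements of a person "
        f"performing the action: {action_phrase.strip()}"
    )


def aggregate_embeddings(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Combine several description embeddings by element-wise averaging.

    Averaging is an assumed aggregation strategy for this sketch.
    """
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(embeddings)
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]


def describe_and_embed(
    action_phrase: str,
    llm: Callable[[str], str],
    text_encoder: Callable[[str], Sequence[float]],
    k: int = 4,
) -> List[float]:
    """Query the LLM k times and return one aggregated text embedding."""
    if k < 1:
        raise ValueError("k must be at least 1")
    prompt = prompt_function(action_phrase)
    # Each call is expected to return one detailed description of the action.
    descriptions = [llm(prompt) for _ in range(k)]
    return aggregate_embeddings([text_encoder(d) for d in descriptions])
```

The returned vector would take the place of the embedding of the original action phrase as the conditioning input of a text-to-motion model, which is what lets the same wrapper sit in front of both stochastic and deterministic generators.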
Pages: 31-36
Number of pages: 6
Related papers
50 in total
  • [12] A large-scale fMRI dataset for human action recognition
    Ming Zhou
    Zhengxin Gong
    Yuxuan Dai
    Yushan Wen
    Youyi Liu
    Zonglei Zhen
    Scientific Data, 10
  • [13] Traces of large-scale dynamo action in the kinematic stage
    Subramanian, Kandaswamy
    Brandenburg, Axel
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2014, 445 (03) : 2930 - 2940
  • [14] Fast Action Localization in Large-Scale Video Archives
    Stoian, Andrei
    Ferecatu, Marin
    Benois-Pineau, Jenny
    Crucianu, Michel
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2016, 26 (10) : 1917 - 1930
  • [15] POLICIES AND LOGIC OF LARGE-SCALE INDUSTRIAL ENTERPRISE ACTION
    KARPIK, L
    SOCIOLOGIE DU TRAVAIL, 1972, 14 (01) : 82 - 105
  • [16] Instructions for authors for large language models: Missing in action!
    De Cassai, Alessandro
    Dost, Burhan
    Mormando, Giulia
    Boscolo, Annalisa
    Navalesi, Paolo
    JOURNAL OF CLINICAL ANESTHESIA, 2025, 102
  • [17] MultiSurf-GPT: Facilitating Context-Aware Reasoning with Large-Scale Language Models for Multimodal Surface Sensing
    Hu, Yongquan
    Sun, Black
    An, Pengcheng
    Li, Zhuying
    Hu, Wen
    Quigley, Aaron J.
    PUBLICATION OF THE 26TH ACM INTERNATIONAL CONFERENCE ON MOBILE HUMAN-COMPUTER INTERACTION, MOBILEHCI 2024 ADJUNCT PROCEEDINGS, 2024
  • [18] Large-Scale Transfer Learning for Natural Language Generation
    Golovanov, Sergey
    Kurbanov, Rauf
    Nikolenko, Sergey
    Truskovskyi, Kyryl
    Tselousov, Alexander
    Wolf, Thomas
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 6053 - 6058
  • [19] Leveraging Large Language Models for the Generation of Novel Metaheuristic Optimization Algorithms
    Pluhacek, Michal
    Kazikova, Anezka
    Kadavy, Tomas
    Viktorin, Adam
    Senkerik, Roman
    PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2023 COMPANION, 2023, : 1812 - 1820
  • [20] Towards Modern Card Games with Large-Scale Action Spaces Through Action Representation
    Yao, Zhiyuan
    Shi, Tianyu
    Li, Site
    Xie, Yiting
    Qin, Yuanyuan
    Xie, Xiongjie
    Lu, Huan
    Zhang, Yan
    2022 IEEE CONFERENCE ON GAMES, COG, 2022, : 576 - 579