Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation

Cited by: 10
Authors
Kalakonda, Sai Shashank [1]
Maheshwari, Shubh [1]
Sarvadevabhatla, Ravi Kiran [1]
Affiliations
[1] IIIT Hyderabad, CVIT, Hyderabad, India
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Keywords
text-conditioned action generation models; large-scale language models; prompt function; stochastic and deterministic;
DOI
10.1109/ICME55011.2023.00014
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. We introduce a generic approach compatible with stochastic (e.g. VAE-based) and deterministic (e.g. MotionCLIP) text-to-motion models. In addition, the approach enables multiple text descriptions to be utilized. Our experiments show (i) noticeable qualitative and quantitative improvement in the quality of synthesized motions, (ii) benefits of utilizing multiple LLM-generated descriptions, (iii) suitability of the prompt function, and (iv) zero-shot generation capabilities of the proposed approach. Code and pretrained models are available at https://actiongpt.github.io.
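The abstract describes a prompt function that turns a short action phrase into an LLM query, and the use of several generated descriptions in place of the original phrase. A minimal Python sketch of that flow follows. Everything in it is an assumption made for illustration: the prompt wording, the names `build_prompt`, `generate_descriptions`, and `aggregate_embeddings`, the `llm` and `encode` callables, and the choice of element-wise averaging to combine descriptions. None of these are taken from the paper or its released code.

```python
from typing import Callable, List, Sequence


def build_prompt(action_phrase: str) -> str:
    """Hypothetical prompt function: wrap a terse action phrase in a
    request for a detailed description. The wording is an assumption."""
    return (
        "Describe in detail the body movements of a person "
        f"performing the action: {action_phrase}"
    )


def generate_descriptions(
    action_phrase: str,
    llm: Callable[[str], str],
    k: int = 4,
) -> List[str]:
    """Query a caller-supplied LLM k times to collect k descriptions."""
    prompt = build_prompt(action_phrase)
    return [llm(prompt) for _ in range(k)]


def aggregate_embeddings(
    descriptions: Sequence[str],
    encode: Callable[[str], Sequence[float]],
) -> List[float]:
    """Combine several description embeddings into one vector.
    Averaging is an assumed aggregation, chosen only for simplicity."""
    if not descriptions:
        raise ValueError("at least one description is required")
    vectors = [list(encode(d)) for d in descriptions]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

In a setup like this, the aggregated vector would stand in for the embedding of the original action phrase as the conditioning input to a text-to-motion model, with `llm` and `encode` supplied by whichever language model and text encoder are in use.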
Pages: 31-36
Page count: 6