Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation

Cited by: 10
Authors
Kalakonda, Sai Shashank [1]
Maheshwari, Shubh [1]
Sarvadevabhatla, Ravi Kiran [1]
Affiliations
[1] IIIT Hyderabad, CVIT, Hyderabad, India
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Keywords
text-conditioned action generation models; large-scale language models; prompt function; stochastic and deterministic;
DOI
10.1109/ICME55011.2023.00014
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. We introduce a generic approach compatible with stochastic (e.g. VAE-based) and deterministic (e.g. MotionCLIP) text-to-motion models. In addition, the approach enables multiple text descriptions to be utilized. Our experiments show (i) noticeable qualitative and quantitative improvement in the quality of synthesized motions, (ii) benefits of utilizing multiple LLM-generated descriptions, (iii) suitability of the prompt function, and (iv) zero-shot generation capabilities of the proposed approach. Code and pretrained models are available at https://actiongpt.github.io.
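The pipeline the abstract describes (a prompt function wraps the action phrase, an LLM returns several detailed descriptions, and those descriptions condition the motion model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the prompt wording, the function names, the toy LLM and encoder, and the choice to combine descriptions by averaging their embeddings are not taken from this record or confirmed as the authors' implementation.

```python
from typing import Callable, List, Sequence


def prompt_function(action_phrase: str) -> str:
    """Wrap a short action phrase in a prompt (hypothetical wording)."""
    return (
        "Describe in detail the body movements of a person "
        f"performing the action: {action_phrase}"
    )


def generate_descriptions(
    action_phrase: str, llm: Callable[[str], str], k: int = 3
) -> List[str]:
    """Query the LLM k times with the same prompt to collect k descriptions."""
    prompt = prompt_function(action_phrase)
    return [llm(prompt) for _ in range(k)]


def aggregate_embeddings(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Combine several text embeddings by element-wise mean (an assumption)."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def text_condition(
    action_phrase: str,
    llm: Callable[[str], str],
    encoder: Callable[[str], Sequence[float]],
    k: int = 3,
) -> List[float]:
    """Build one conditioning vector from k LLM-generated descriptions."""
    descriptions = generate_descriptions(action_phrase, llm, k)
    return aggregate_embeddings([encoder(d) for d in descriptions])


# Toy stand-ins so the sketch runs without any external model.
def toy_llm(prompt: str) -> str:
    return "Detailed description for: " + prompt


def toy_encoder(text: str) -> List[float]:
    return [float(len(text)), float(text.count(" "))]


if __name__ == "__main__":
    print(text_condition("jump", toy_llm, toy_encoder, k=2))
```

In a real setting the resulting vector would replace the embedding of the original action phrase as input to a stochastic or deterministic text-to-motion model; how the descriptions are actually combined is left to the paper.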
Pages: 31 - 36
Number of pages: 6
Related papers
50 in total
  • [1] GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
    Yoo, Kang Min
    Park, Dongju
    Kang, Jaewook
    Lee, Sang-Woo
    Park, Woomyeong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 2225 - 2239
  • [2] A Large-scale Robustness Analysis of Video Action Recognition Models
    Schiappa, Madeline Chantry
    Biyani, Naman
    Kamtam, Prudvi
    Vyas, Shruti
    Palangi, Hamid
    Vineet, Vibhav
    Rawat, Yogesh
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 14698 - 14708
  • [3] On the preconditions for large-scale collective action
    Jagers, Sverker C.
    Harring, Niklas
    Lofgren, Asa
    Sjostedt, Martin
    Alpizar, Francisco
    Brulde, Bengt
    Langlet, David
    Nilsson, Andreas
    Almroth, Bethanie Carney
    Dupont, Sam
    Steffen, Will
    AMBIO, 2020, 49 (07) : 1282 - 1296
  • [4] On the preconditions for large-scale collective action
    Sverker C. Jagers
    Niklas Harring
    Åsa Löfgren
    Martin Sjöstedt
    Francisco Alpizar
    Bengt Brülde
    David Langlet
    Andreas Nilsson
    Bethanie Carney Almroth
    Sam Dupont
    Will Steffen
    Ambio, 2020, 49 : 1282 - 1296
  • [5] Fusion Based Deep CNN for Improved Large-Scale Image Action Recognition
    Lavinia, Yukhe
    Vo, Holly H.
    Verma, Abhishek
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2016, : 609 - 614
  • [6] Large-Scale Human Action Recognition with Spark
    Wang, Hanli
    Zheng, Xiaobin
    Xiao, Bo
    2015 IEEE 17TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2015,
  • [7] Leveraging Large Language Models for Automatic Smart Contract Generation
    Napoli, Emanuele Antonio
    Barbara, Fadi
    Gatteschi, Valentina
    Schifanella, Claudio
    2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024, 2024, : 701 - 710
  • [8] Building a Role Specified Open-Domain Dialogue System Leveraging Large-Scale Language Models
    Bae, Sanghwan
    Kwak, Donghyun
    Kim, Sungdong
    Ham, Donghoon
    Kang, Soyoung
    Lee, Sang-Woo
    Park, Woomyoung
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 2128 - 2150
  • [9] Cooperation, collective action, and the archeology of large-scale societies
    Carballo, David M.
    Feinman, Gary M.
    EVOLUTIONARY ANTHROPOLOGY, 2016, 25 (06) : 288 - 296
  • [10] A large-scale fMRI dataset for human action recognition
    Zhou, Ming
    Gong, Zhengxin
    Dai, Yuxuan
    Wen, Yushan
    Liu, Youyi
    Zhen, Zonglei
    SCIENTIFIC DATA, 2023, 10 (01)