Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation

Cited by: 10

Authors:

Kalakonda, Sai Shashank [1]

Maheshwari, Shubh [1]

Sarvadevabhatla, Ravi Kiran [1]

Affiliations:

[1] IIIT Hyderabad, CVIT, Hyderabad, India

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023

Keywords:

text-conditioned action generation models; large-scale language models; prompt function; stochastic and deterministic;

DOI:

10.1109/ICME55011.2023.00014

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. We introduce a generic approach compatible with stochastic (e.g. VAE-based) and deterministic (e.g. MotionCLIP) text-to-motion models. In addition, the approach enables multiple text descriptions to be utilized. Our experiments show (i) noticeable qualitative and quantitative improvement in the quality of synthesized motions, (ii) benefits of utilizing multiple LLM-generated descriptions, (iii) suitability of the prompt function, and (iv) zero-shot generation capabilities of the proposed approach. Code and pretrained models are available at https://actiongpt.github.io.
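The abstract's idea of a prompt function that feeds several LLM-generated descriptions into a text-to-motion model can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the names `prompt_function`, `describe_action`, `aggregate_embeddings`, and `text_condition`, the `llm` and `encoder` callables, and the choice of element-wise averaging are all assumptions made for this example.

```python
from typing import Callable, List, Sequence


def prompt_function(action_phrase: str) -> str:
    """Wrap a terse action phrase in a request for a detailed description.

    The template wording is hypothetical, chosen only for illustration.
    """
    phrase = action_phrase.strip().rstrip(".")
    return (
        "Describe in detail the body movements of a person "
        f"performing the action: {phrase}."
    )


def describe_action(
    action_phrase: str, llm: Callable[[str], str], k: int = 4
) -> List[str]:
    """Query the (assumed) LLM callable k times with the same prompt."""
    prompt = prompt_function(action_phrase)
    return [llm(prompt) for _ in range(k)]


def aggregate_embeddings(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Combine k description embeddings by element-wise mean.

    Averaging is one assumed way to use multiple descriptions.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    k = len(embeddings)
    return [sum(e[i] for e in embeddings) / k for i in range(dim)]


def text_condition(
    action_phrase: str,
    llm: Callable[[str], str],
    encoder: Callable[[str], Sequence[float]],
    k: int = 4,
) -> List[float]:
    """Produce one conditioning vector for a downstream text-to-motion model."""
    descriptions = describe_action(action_phrase, llm, k)
    return aggregate_embeddings([encoder(d) for d in descriptions])
```

Because `llm` and `encoder` are passed in as plain callables, the same sketch could in principle sit in front of either a stochastic or a deterministic text-to-motion model, which is the plug-and-play property the abstract describes.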

Pages: 31 - 36

Page count: 6
