Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation

Cited by: 10

Authors:

Kalakonda, Sai Shashank [1]

Maheshwari, Shubh [1]

Sarvadevabhatla, Ravi Kiran [1]

Affiliations:

[1] IIIT Hyderabad, CVIT, Hyderabad, India

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023

Keywords:

text-conditioned action generation models; large-scale language models; prompt function; stochastic and deterministic;

DOI:

10.1109/ICME55011.2023.00014

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. We introduce a generic approach compatible with stochastic (e.g. VAE-based) and deterministic (e.g. MotionCLIP) text-to-motion models. In addition, the approach enables multiple text descriptions to be utilized. Our experiments show (i) noticeable qualitative and quantitative improvement in the quality of synthesized motions, (ii) benefits of utilizing multiple LLM-generated descriptions, (iii) suitability of the prompt function, and (iv) zero-shot generation capabilities of the proposed approach. Code and pretrained models are available at https://actiongpt.github.io.
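The abstract's idea of a prompt function that yields several LLM descriptions per action phrase can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the prompt wording, the `prompt_function`, `aggregate_descriptions`, and `toy_encode` helpers, and the choice of averaging embeddings are hypothetical and are not taken from the paper or its released code. The LLM call is left out entirely; the descriptions are passed in as plain strings.

```python
from typing import Callable, List, Sequence


def prompt_function(action_phrase: str) -> str:
    """Wrap a short action phrase in an instruction asking for detail.

    The template wording is hypothetical; this record does not give the
    prompt actually used by the authors.
    """
    return (
        "Describe in detail the body movements of a person "
        f"performing the action: {action_phrase.strip()}"
    )


def aggregate_descriptions(
    descriptions: Sequence[str],
    encode: Callable[[str], List[float]],
) -> List[float]:
    """Combine several descriptions into one conditioning vector.

    Element-wise averaging is an assumed, simple aggregation choice.
    """
    if not descriptions:
        raise ValueError("at least one description is required")
    vectors = [encode(d) for d in descriptions]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def toy_encode(text: str) -> List[float]:
    """Stand-in encoder: [character count, word count].

    A real system would use a learned text encoder instead.
    """
    return [float(len(text)), float(len(text.split()))]
```

In use, the string returned by `prompt_function("wave")` would be sent to an LLM several times, and the resulting descriptions would be passed to `aggregate_descriptions` together with the text encoder of the chosen text-to-motion model.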

Pages: 31-36

Page count: 6
