Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation

Cited by: 10

Authors:

Kalakonda, Sai Shashank [1]

Maheshwari, Shubh [1]

Sarvadevabhatla, Ravi Kiran [1]

Affiliations:

[1] IIIT Hyderabad, CVIT, Hyderabad, India

Source:

2023 IEEE International Conference on Multimedia and Expo (ICME) | 2023

Keywords:

text-conditioned action generation models; large-scale language models; prompt function; stochastic and deterministic

DOI:

10.1109/ICME55011.2023.00014

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. We introduce a generic approach compatible with stochastic (e.g. VAE-based) and deterministic (e.g. MotionCLIP) text-to-motion models. In addition, the approach enables multiple text descriptions to be utilized. Our experiments show (i) noticeable qualitative and quantitative improvement in the quality of synthesized motions, (ii) benefits of utilizing multiple LLM-generated descriptions, (iii) suitability of the prompt function, and (iv) zero-shot generation capabilities of the proposed approach. Code and pretrained models are available at https://actiongpt.github.io.
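The pipeline the abstract describes (a prompt function wraps a terse action phrase, an LLM returns one or more richer descriptions, and those descriptions condition a text-to-motion model) can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names `build_prompt`, `generate_descriptions`, and `aggregate_embeddings`, the stand-in `fake_llm` and `fake_encode` callables, and the choice to average embeddings across descriptions are all hypothetical. The abstract does not say how multiple descriptions are combined.

```python
from typing import Callable, List, Sequence


def build_prompt(action_phrase: str) -> str:
    """Hypothetical prompt function: wrap a terse action phrase in a request for detail."""
    return (
        "Describe in detail the body movements of a person "
        f"performing the action: {action_phrase.strip()}"
    )


def generate_descriptions(
    action_phrase: str, llm: Callable[[str], str], k: int = 4
) -> List[str]:
    """Query the LLM k times with the same prompt to collect k descriptions."""
    prompt = build_prompt(action_phrase)
    return [llm(prompt) for _ in range(k)]


def aggregate_embeddings(
    descriptions: Sequence[str], encode: Callable[[str], List[float]]
) -> List[float]:
    """Assumed aggregation: element-wise mean of the per-description embeddings."""
    vectors = [encode(d) for d in descriptions]
    if not vectors:
        raise ValueError("at least one description is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Stand-ins so the sketch runs without any external service or model.
def fake_llm(prompt: str) -> str:
    return prompt.upper()


def fake_encode(text: str) -> List[float]:
    # Toy "embedding": character count and space count.
    return [float(len(text)), float(text.count(" "))]


if __name__ == "__main__":
    descriptions = generate_descriptions("jump", fake_llm, k=3)
    text_embedding = aggregate_embeddings(descriptions, fake_encode)
    print(len(descriptions), text_embedding)
```

In a real setup, `fake_llm` would be replaced by a call to an actual language model and `fake_encode` by the text encoder of the chosen text-to-motion model; the aggregated vector would then take the place of the embedding of the original action phrase.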

Pages: 31-36

Page count: 6
