Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation

Cited by: 10
Authors
Kalakonda, Sai Shashank [1]
Maheshwari, Shubh [1]
Sarvadevabhatla, Ravi Kiran [1]
Affiliations
[1] IIIT Hyderabad, CVIT, Hyderabad, India
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Keywords
text-conditioned action generation models; large-scale language models; prompt function; stochastic and deterministic;
DOI
10.1109/ICME55011.2023.00014
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. We introduce a generic approach compatible with stochastic (e.g. VAE-based) and deterministic (e.g. MotionCLIP) text-to-motion models. In addition, the approach enables multiple text descriptions to be utilized. Our experiments show (i) noticeable qualitative and quantitative improvement in the quality of synthesized motions, (ii) benefits of utilizing multiple LLM-generated descriptions, (iii) suitability of the prompt function, and (iv) zero-shot generation capabilities of the proposed approach. Code and pretrained models are available at https://actiongpt.github.io.
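The idea described in the abstract (expand a terse action phrase through a prompt function, collect several LLM-generated descriptions, and feed their combined text representation to a text-to-motion model) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the prompt wording, the `toy_llm` and `toy_encode` stand-ins, and mean pooling as the way to combine multiple descriptions are not taken from the record above and may differ from the authors' released code.

```python
from typing import Callable, List, Sequence


def prompt_function(action_phrase: str) -> str:
    """Wrap a terse action phrase in an instruction asking for movement detail.

    The template wording is hypothetical, not the one used by the authors.
    """
    phrase = action_phrase.strip()
    return f"Describe in detail the body movements of a person performing the action: {phrase}"


def generate_descriptions(
    action_phrase: str, llm: Callable[[str], str], k: int = 4
) -> List[str]:
    """Query the LLM k times with the same prompt to obtain k descriptions."""
    prompt = prompt_function(action_phrase)
    return [llm(prompt) for _ in range(k)]


def aggregate_embeddings(
    descriptions: Sequence[str], encode: Callable[[str], List[float]]
) -> List[float]:
    """Combine several description embeddings by element-wise averaging.

    Mean pooling is an assumed aggregation choice for this sketch.
    """
    vectors = [encode(d) for d in descriptions]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def toy_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; it simply echoes the prompt."""
    return f"[detailed description for] {prompt}"


def toy_encode(text: str) -> List[float]:
    """Stand-in for a real text encoder; returns two trivial features."""
    return [float(len(text)), float(text.count(" "))]


if __name__ == "__main__":
    descriptions = generate_descriptions("jump", toy_llm, k=3)
    text_embedding = aggregate_embeddings(descriptions, toy_encode)
    print(len(descriptions), text_embedding)
```

In a real pipeline, `toy_llm` would be replaced by a call to an actual language model and `toy_encode` by the text encoder of the chosen text-to-motion model; the aggregated vector would then take the place of the embedding of the original action phrase.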
Pages: 31-36
Page count: 6