CPoser: An Optimization-after-Parsing Approach for Text-to-Pose Generation Using Large Language Models

Cited: 0
Authors
Li, Yumeng [1 ]
Chen, Bohong [1 ]
Ren, Zhong [1 ]
Ding, Yao-xiang [1 ]
Liu, Libin [2 ]
Shao, Tianjia [1 ]
Zhou, Kun [1 ]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou, Peoples R China
[2] Peking Univ, State Key Lab Gen AI, Beijing, Peoples R China
Source
ACM TRANSACTIONS ON GRAPHICS | 2024, Vol. 43, No. 6
Keywords
Human posture; text-to-pose generation; zero-shot learning; pose priors; large language models
DOI
10.1145/3687932
Chinese Library Classification
TP31 [Computer Software]
Discipline Codes
081202; 0835
Abstract
Text-to-pose generation is challenging due to the complexity of natural language and human posture semantics. Utilizing large language models (LLMs) for text-to-pose generation is appealing due to their strong capabilities in text understanding and reasoning. However, as LLMs are designed for general-purpose language processing and not specifically trained for pose generation, it remains nontrivial to generate precise articulation targets for the full body using LLMs directly. To this end, we propose CPoser, a novel approach to harness the power of LLMs for text-to-pose generation, featuring a prompt parsing stage and a pose optimization stage. The parsing stage utilizes LLMs to turn text prompts into pose intermediate representations (Pose-IRs) through a set of predefined structured queries. These Pose-IRs explicitly describe specific pose conditions, such as squatting depth and knee bending angle, naturally forming an objective function that a target pose should satisfy. The optimization stage solves for expressive poses and hand gestures based on the Pose-IR objective function via robust optimization in a quantized pose prior space. The results are further refined to enhance naturalness and incorporate facial expressions. Experiments show that our approach effectively understands diverse text prompts for pose generation, surpassing existing text-to-pose methods.
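As a reading aid only, the sketch below illustrates the two-stage idea described in the abstract in plain Python: a stand-in parser maps a text prompt to structured pose conditions (Pose-IRs), and a generic optimizer then drives a pose vector toward those conditions. All names here (PoseIR, parse_prompt, pose_objective, generate_pose, the joint indices and vector dimension) are hypothetical placeholders, not the paper's implementation; the actual system parses prompts with predefined structured queries to an LLM and optimizes in a quantized pose prior space, which this minimal sketch does not reproduce.

    # Minimal sketch of a parse-then-optimize pipeline (assumed names throughout).
    from dataclasses import dataclass
    import numpy as np
    from scipy.optimize import minimize

    @dataclass
    class PoseIR:
        joint: int           # index into the pose parameter vector (assumed layout)
        target: float        # desired value, e.g. a bending angle in radians
        weight: float = 1.0  # relative importance of this condition

    def parse_prompt(prompt: str) -> list:
        """Stand-in for the LLM parsing stage: map a prompt to Pose-IR conditions."""
        if "squat" in prompt.lower():
            return [PoseIR(joint=3, target=1.6), PoseIR(joint=4, target=1.6)]
        return []

    def pose_objective(pose, conditions):
        """Weighted squared violation of the Pose-IR conditions."""
        return sum(c.weight * (pose[c.joint] - c.target) ** 2 for c in conditions)

    def generate_pose(prompt: str, dim: int = 63):
        """Two-stage pipeline: parse the prompt, then optimize from a neutral pose."""
        conditions = parse_prompt(prompt)
        x0 = np.zeros(dim)  # neutral rest pose as the starting point
        result = minimize(pose_objective, x0, args=(conditions,), method="L-BFGS-B")
        return result.x

    pose = generate_pose("a person squatting down")
    print(round(pose[3], 3), round(pose[4], 3))  # joints driven toward the parsed targets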
Pages: 13