Generating Diverse and Natural 3D Human Motions from Text

Cited by: 149
Authors
Guo, Chuan [1]
Zou, Shihao [1]
Zuo, Xinxin [1]
Wang, Sen [1]
Ji, Wei [1]
Li, Xingyu [1]
Cheng, Li [1]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
DOI
10.1109/CVPR52688.2022.00509
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automated generation of 3D human motions from text is a challenging problem. The generated motions are expected to be sufficiently diverse to explore the text-grounded motion space and, more importantly, to accurately depict the content of the prescribed text descriptions. Here we tackle this problem with a two-stage approach: text2length sampling and text2motion generation. Text2length involves sampling from the learned distribution function of motion lengths conditioned on the input text. This is followed by our text2motion module, which uses a temporal variational autoencoder to synthesize a diverse set of human motions of the sampled lengths. Instead of directly engaging with pose sequences, we propose motion snippet code as our internal motion representation, which captures local semantic motion contexts and is empirically shown to facilitate the generation of plausible motions faithful to the input text. Moreover, a large-scale dataset of scripted 3D human motions, HumanML3D, is constructed, consisting of 14,616 motion clips and 44,970 text descriptions.
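The two-stage flow described in the abstract (sample a length first, then generate a motion of that length) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `sample_length`, `generate_motion`, and `text_to_motions`, the `logits` input, the `frames_per_bin` discretization, and the random-pose placeholder for the text2motion stage are all assumptions made for illustration.

```python
import math
import random


def sample_length(logits, frames_per_bin=4, rng=random):
    """Sample a motion length (in frames) from a categorical distribution.

    `logits` stands in for the output of a text-conditioned length
    estimator; bin i is taken to mean (i + 1) * frames_per_bin frames.
    Both conventions are assumptions of this sketch.
    """
    peak = max(logits)
    # Unnormalized softmax weights, shifted by the max for numerical stability.
    weights = [math.exp(x - peak) for x in logits]
    bin_index = rng.choices(range(len(logits)), weights=weights, k=1)[0]
    return (bin_index + 1) * frames_per_bin


def generate_motion(length, pose_dim=3, rng=random):
    """Placeholder for the text2motion stage (hypothetical).

    Returns `length` random pose vectors; a real model would decode
    poses conditioned on the text instead.
    """
    return [[rng.gauss(0.0, 1.0) for _ in range(pose_dim)] for _ in range(length)]


def text_to_motions(logits, num_samples=3, seed=0):
    """Draw several (length, motion) samples for one text prompt."""
    rng = random.Random(seed)
    motions = []
    for _ in range(num_samples):
        length = sample_length(logits, rng=rng)      # stage 1: text2length
        motions.append(generate_motion(length, rng=rng))  # stage 2: text2motion
    return motions
```

Because the length is resampled for every draw, repeated calls for the same text can yield motions of different durations, which is one source of the diversity the abstract refers to.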
Pages: 5142 - 5151
Page count: 10
Related papers
50 in total
  • [21] Generating 3D Human Texture from a Single Image with Sampling and Refinement
    Cha, Sihun
    Seo, Kwanggyoon
    Ashtari, Amirsaman
    Noh, Junyong
    PROCEEDINGS OF SIGGRAPH 2022 POSTERS, SIGGRAPH 2022, 2022
  • [22] Natural scenes reveal diverse representations of 2D and 3D body pose in the human brain
    Zhu, Hongru
    Ge, Yijun
    Bratch, Alexander
    Yuille, Alan
    Kay, Kendrick
    Kersten, Daniel
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2024, 121 (24)
  • [23] Generating Music from Natural Language Text
    Rangarajan, Rohit
    2015 TENTH INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT (ICDIM), 2015, : 23 - 26
  • [24] Content based querying and searching for 3D human motions
    Pawar, Manoj M.
    Pradhan, Gaurav N.
    Zhang, Kang
    Prabhakaran, Balakrishnan
    ADVANCES IN MULTIMEDIA MODELING, PROCEEDINGS, 2008, 4903 : 446 - 455
  • [25] MoMask: Generative Masked Modeling of 3D Human Motions
    Guo, Chuan
    Mu, Yuxuan
    Javed, Muhammad Gohar
    Wang, Sen
    Cheng, Li
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 1900 - 1910
  • [26] Synthesizing Physically Plausible Human Motions in 3D Scenes
    Pan, Liang
    Wang, Jingbo
    Huang, Buzhen
    Zhang, Junyu
    Wang, Haofan
    Tang, Xu
    Wang, Yangang
    2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024, : 1498 - 1507
  • [27] Differential motions for recovering 3D structure and motions from an unstructured environment
    Vicente, J
    Guinea, D
    Preciado, V
    THREE-DIMENSIONAL IMAGE CAPTURE AND APPLICATIONS III, 2000, 3958 : 126 - 132
  • [28] Generating 3D Model of Furniture from 3D Point Cloud of Room
    Osakama, Shunta
    Manabe, Yoshitsugu
    Yata, Noriko
    INTERNATIONAL WORKSHOP ON ADVANCED IMAGING TECHNOLOGY (IWAIT) 2020, 2020, 11515
  • [29] Text-guided 3D Human Generation from 2D Collections
    Fu, Tsu-Jui
    Xiong, Wenhan
    Nie, Yixin
    Liu, Jingyu
    Oguz, Barlas
    Wang, William Yang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 4508 - 4520
  • [30] Generating various composite human faces from real 3D facial images
    Igor Chalás
    Petra Urbanová
    Vojtěch Juřík
    Zuzana Ferková
    Marie Jandová
    Jiří Sochor
    Barbora Kozlíková
    The Visual Computer, 2017, 33 : 443 - 458