MMM: Generative Masked Motion Model

Cited by: 1
Authors
Pinyoanuntapong, Ekkasit [1 ]
Wang, Pu [1 ]
Lee, Minwoo [1 ]
Chen, Chen [2 ]
Affiliations
[1] Univ North Carolina Charlotte, Charlotte, NC 28223 USA
[2] Univ Cent Florida, Orlando, FL 32816 USA
Keywords
DOI
10.1109/CVPR52733.2024.00153
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recent advances in text-to-motion generation using diffusion and autoregressive models have shown promising results. However, these models often suffer from a trade-off between real-time performance, high fidelity, and motion editability. To address this gap, we introduce MMM, a novel yet simple motion generation paradigm based on Masked Motion Model. MMM consists of two key components: (1) a motion tokenizer that transforms 3D human motion into a sequence of discrete tokens in latent space, and (2) a conditional masked motion transformer that learns to predict randomly masked motion tokens, conditioned on the precomputed text tokens. By attending to motion and text tokens in all directions, MMM explicitly captures inherent dependency among motion tokens and semantic mapping between motion and text tokens. During inference, this allows parallel and iterative decoding of multiple motion tokens that are highly consistent with fine-grained text descriptions, therefore simultaneously achieving high-fidelity and high-speed motion generation. In addition, MMM has innate motion editability. By simply placing mask tokens in the place that needs editing, MMM automatically fills the gaps while guaranteeing smooth transitions between editing and non-editing parts. Extensive experiments on the HumanML3D and KIT-ML datasets demonstrate that MMM surpasses current leading methods in generating high-quality motion (evidenced by superior FID scores of 0.08 and 0.429), while offering advanced editing features such as body-part modification, motion in-betweening, and the synthesis of long motion sequences. In addition, MMM is two orders of magnitude faster on a single mid-range GPU than editable motion diffusion models. Our project page is available at https://exitudio.github.io/MMM-page/.
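As a concrete illustration of component (1), the sketch below shows the vector-quantization lookup at the core of a motion tokenizer. This is a minimal sketch under our own assumptions: the codebook size, latent dimension, and names are illustrative rather than the authors' implementation, and in a real VQ-VAE-style tokenizer the codebook is learned jointly with an encoder and decoder over pose sequences.

import torch

CODEBOOK_SIZE, CODE_DIM = 1024, 512              # assumed hyperparameters
codebook = torch.randn(CODEBOOK_SIZE, CODE_DIM)  # learned in a real tokenizer

def quantize(latents):
    # Map encoder latents (T, CODE_DIM) to discrete motion-token ids (T,)
    # by nearest-neighbor lookup in the codebook.
    dists = torch.cdist(latents, codebook)       # (T, CODEBOOK_SIZE) L2 distances
    return dists.argmin(dim=-1)

latents = torch.randn(49, CODE_DIM)              # stand-in for encoder output
token_ids = quantize(latents)                    # one discrete token per latent frame

The decoder half of the tokenizer inverts this lookup, mapping token ids back to codebook vectors and then to 3D poses.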
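Component (2) and the editing feature reduce to one decoding loop: predict every masked position in parallel, commit the most confident predictions, and re-mask the rest for another pass. The sketch below shows such confidence-scheduled parallel decoding in the spirit of masked generative transformers; the cosine schedule, MASK_ID, sequence length, and the model interface are all our assumptions for illustration, not MMM's actual API.

import math
import torch

SEQ_LEN = 49    # assumed latent motion length after tokenization
MASK_ID = 1024  # assumed codebook size 1024; id 1024 reserved for [MASK]

@torch.no_grad()
def parallel_decode(model, text_emb, init_tokens=None, steps=10, temperature=1.0):
    # Iteratively unmask motion tokens: commit the most confident
    # predictions each step and re-mask the rest (cosine schedule).
    tokens = (init_tokens.clone() if init_tokens is not None
              else torch.full((1, SEQ_LEN), MASK_ID, dtype=torch.long))
    total_masked = int((tokens == MASK_ID).sum())
    for step in range(steps):
        logits = model(tokens, text_emb)             # (1, SEQ_LEN, vocab)
        logits[..., MASK_ID] = float("-inf")         # never predict [MASK] itself
        probs = torch.softmax(logits / temperature, dim=-1)
        conf, pred = probs.max(dim=-1)               # per-position confidence
        conf = torch.where(tokens == MASK_ID, conf, torch.ones_like(conf))
        tokens = torch.where(tokens == MASK_ID, pred, tokens)  # commit all
        keep = int(math.cos(math.pi / 2 * (step + 1) / steps) * total_masked)
        if keep == 0:
            break
        lowest = conf.topk(keep, dim=-1, largest=False).indices
        tokens[0, lowest[0]] = MASK_ID               # retry low-confidence slots
    return tokens  # discrete ids; the tokenizer's decoder maps them to 3D poses

# Stand-in model (random logits), just to exercise the loop end to end.
dummy = lambda tok, txt: torch.randn(tok.shape[0], tok.shape[1], MASK_ID + 1)
generated = parallel_decode(dummy, text_emb=None)         # generation from scratch
partial = torch.randint(0, MASK_ID, (1, SEQ_LEN))         # tokens of an existing clip
partial[0, 16:33] = MASK_ID                               # span to re-synthesize
inpainted = parallel_decode(dummy, text_emb=None, init_tokens=partial)

The last three lines illustrate why editing comes essentially for free in this paradigm: starting from an existing token sequence and masking only the span to be regenerated turns the same loop into motion in-betweening, with the unmasked context enforcing smooth transitions.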
Pages: 1546-1555
Page count: 10
Related Papers
50 in total
  • [31] Predictions from masked motion with and without obstacles
    Goldstein, Ariel
    Rivlin, Ido
    Goldstein, Alon
    Pertzov, Yoni
    Hassin, Ran R.
PLOS ONE, 2020, 15 (11)
  • [32] MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
    Li, Tianhong
    Chang, Huiwen
    Mishra, Shlok Kumar
    Zhang, Han
    Katabi, Dina
    Krishnan, Dilip
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 2142-2152
  • [33] Masked Generative Adversarial Networks are Data-Efficient Generation Learners
    Huang, Jiaxing
    Cui, Kaiwen
    Guan, Dayan
    Xiao, Aoran
    Zhan, Fangneng
    Lu, Shijian
    Liao, Shengcai
    Xing, Eric
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [34] Muse: Text-To-Image Generation via Masked Generative Transformers
    Chang, Huiwen
    Zhang, Han
    Barber, Jarred
    Maschinot, A. J.
    Lezama, Jose
    Jiang, Lu
    Yang, Ming-Hsuan
    Murphy, Kevin
    Freeman, William T.
    Rubinstein, Michael
    Li, Yuanzhen
    Krishnan, Dilip
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023
  • [35] MoMask: Generative Masked Modeling of 3D Human Motions
    Guo, Chuan
    Mu, Yuxuan
    Javed, Muhammad Gohar
    Wang, Sen
    Cheng, Li
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024: 1900-1910
  • [36] Masked Auto-Encoders Meet Generative Adversarial Networks and Beyond
    Fei, Zhengcong
    Fan, Mingyuan
    Zhu, Li
    Huang, Junshi
    Wei, Xiaoming
    Wei, Xiaolin
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 24449-24459
  • [37] Masked Face Recognition with Generative Data Augmentation and Domain Constrained Ranking
    Geng, Mengyue
    Peng, Peixi
    Huang, Yangru
    Tian, Yonghong
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 2246-2254
  • [38] Generative Motion: Queer Ecology and Avatar
    Anglin, Sallie
JOURNAL OF POPULAR CULTURE, 2015, 48 (02): 341-354
  • [39] Deep Generative Filter for Motion Deblurring
    Ramakrishnan, Sainandan
    Pachori, Shubham
    Gangopadhyay, Aalok
    Raman, Shanmuganathan
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017: 2993-3000
  • [40] Unsupervised Condition Diagnosis of Linear Motion Guide Using Generative Model Based on Images
    Hong, Dongwoo
    Bang, Seunghyun
    Kim, Byeongil
IEEE ACCESS, 2021, 9 (09): 80491-80499