Hierarchical Semantics Alignment for 3D Human Motion Retrieval

Cited by: 0
Authors
Yang, Yang [1]
Shi, Haoyu [1]
Zhang, Huaiwen [1]
Affiliations
[1] Inner Mongolia Univ, Hohhot, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
3D Human Motion; Text-to-Motion Retrieval; Multi-modal; Semantics Alignment;
DOI
10.1145/3626772.3657804
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-to-3D Human Motion Retrieval (TMR) is a challenging information retrieval task that aims to retrieve relevant motion sequences given a natural language description. The conventional approach to TMR represents each data instance as a point embedding for alignment. However, in real-world scenarios, multiple motions often co-occur and are superimposed on a single avatar. Simply aggregating a text or motion sequence into a single global embedding may be inadequate for capturing the intricate semantics of superimposed motions. In addition, most motion variations are local and subtle, which makes it even harder to align motion sequences precisely with their corresponding text. To address these challenges, we propose a novel Hierarchical Semantics Alignment (HSA) framework for text-to-3D human motion retrieval. Beyond global alignment, we propose Probabilistic-based Distribution Alignment (PDA) and Descriptors-based Fine-grained Alignment (DFA) to achieve precise semantic matching. Specifically, PDA encodes text and motion sequences as multidimensional probabilistic distributions, effectively capturing the semantics of superimposed motions. By optimizing the probabilistic distribution alignment problem, PDA achieves a precise match between superimposed motions and their corresponding text. DFA first applies fine-grained feature gating, which selectively retains significant and representative local representations while filtering out interference from meaningless features. It then adaptively assigns local representations from text and motion to a set of cross-modal local aggregated descriptors, enabling local comparison and interaction between fine-grained text and motion features. Extensive experiments on two widely used benchmark datasets, HumanML3D and KIT-ML, demonstrate the effectiveness of the proposed method. It significantly outperforms existing state-of-the-art retrieval methods, achieving Rsum improvements of 24.74% on HumanML3D and 23.08% on KIT-ML.
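
For readers unfamiliar with the Rsum metric mentioned in the abstract: in cross-modal retrieval it is commonly computed as the sum of Recall@K values over both retrieval directions (here, text-to-motion and motion-to-text). The following is a minimal Python sketch of that common definition, not the authors' evaluation code. The function names `recall_at_k` and `rsum`, the default set of K values, and the toy similarity matrix are assumptions made for illustration; the abstract does not state which K values the paper uses.

```python
from typing import List, Sequence


def recall_at_k(sim: List[List[float]], k: int) -> float:
    """Percentage of queries whose ground-truth item is ranked in the top k.

    sim[i][j] is the similarity between query i and candidate j; the
    ground-truth candidate for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank: how many candidates score strictly higher
        # than the ground-truth candidate.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def rsum(sim: List[List[float]], ks: Sequence[int] = (1, 5, 10)) -> float:
    """Sum of Recall@K over both directions (assumed definition of Rsum)."""
    # Transposing the matrix swaps the roles of queries and candidates.
    sim_t = [list(col) for col in zip(*sim)]
    return sum(recall_at_k(sim, k) + recall_at_k(sim_t, k) for k in ks)


# Toy example: rows are 3 text queries, columns are 3 motions (hypothetical scores).
toy_sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
print(recall_at_k(toy_sim, 1))
print(rsum(toy_sim, ks=(1, 2)))
```

In this toy matrix the second text query ranks a wrong motion first, so text-to-motion Recall@1 is below 100 while every other term is perfect. Ties are resolved optimistically here; a real evaluation would need an explicit tie-breaking rule.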
Pages: 1083-1092
Page count: 10