Hierarchical Semantics Alignment for 3D Human Motion Retrieval

Authors
Yang, Yang [1]
Shi, Haoyu [1]
Zhang, Huaiwen [1]
Affiliations
[1] Inner Mongolia Univ, Hohhot, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
3D Human Motion; Text-to-Motion Retrieval; Multi-modal; Semantics Alignment;
DOI
10.1145/3626772.3657804
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-to-3D human Motion Retrieval (TMR) is a challenging task in information retrieval that aims to retrieve relevant motion sequences given a natural language description. The conventional approach to TMR represents data instances as point embeddings for alignment. However, in real-world scenarios, multiple motions often co-occur and are superimposed on a single avatar. Simply aggregating text and motion sequences into a single global embedding may be inadequate for capturing the intricate semantics of superimposed motions. In addition, most motion variations occur locally and subtly, which makes it considerably harder to align motion sequences precisely with their corresponding text. To address these challenges, we propose a novel Hierarchical Semantics Alignment (HSA) framework for text-to-3D human motion retrieval. Beyond global alignment, we propose a Probabilistic-based Distribution Alignment (PDA) and a Descriptors-based Fine-grained Alignment (DFA) to achieve precise semantic matching. Specifically, the PDA encodes the text and motion sequences into multidimensional probabilistic distributions, effectively capturing the semantics of superimposed motions. By optimizing the probabilistic distribution alignment problem, PDA achieves a precise match between superimposed motions and their corresponding text. The DFA first applies fine-grained feature gating, which selectively retains the significant and representative local representations while excluding interference from meaningless features. It then adaptively assigns local representations from text and motion to a set of cross-modal local aggregated descriptors, enabling local comparison and interaction between fine-grained text and motion features. Extensive experiments on two widely used benchmark datasets, HumanML3D and KIT-ML, demonstrate the effectiveness of the proposed method.
It significantly outperforms existing state-of-the-art retrieval methods, achieving Rsum improvements of 24.74% on HumanML3D and 23.08% on KIT-ML.
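The abstract reports results in terms of Rsum without defining it. In retrieval work, Rsum usually denotes the sum of Recall@K values over both retrieval directions; the sketch below illustrates that common convention only. The function names `recall_at_k` and `rsum`, the choice of K values (1, 5, 10), and the assumption that query i matches gallery item i are all illustrative and are not taken from the paper, whose exact evaluation protocol is not stated in this record.

```python
import numpy as np


def recall_at_k(sim: np.ndarray, k: int) -> float:
    """Fraction of queries whose matching item ranks in the top k.

    Assumes sim[i, j] scores query i against gallery item j and that
    the ground-truth match for query i is gallery item i (hypothetical
    one-to-one pairing).
    """
    gt_scores = np.diag(sim)[:, None]
    # Rank of the ground truth = number of items scoring strictly higher.
    ranks = (sim > gt_scores).sum(axis=1)
    return float((ranks < k).mean())


def rsum(sim: np.ndarray, ks=(1, 5, 10)) -> float:
    """Sum of Recall@K (in percent) over text-to-motion and motion-to-text.

    The set of K values is an assumption; papers differ on which they sum.
    """
    text_to_motion = sum(recall_at_k(sim, k) for k in ks)
    motion_to_text = sum(recall_at_k(sim.T, k) for k in ks)
    return 100.0 * (text_to_motion + motion_to_text)


if __name__ == "__main__":
    # Toy similarity matrix: rows are text queries, columns are motions.
    toy = np.array([[0.9, 0.1],
                    [0.8, 0.2]])
    print(rsum(toy, ks=(1,)))
```

With this convention a perfect retriever scores 100 per Recall@K term, so the maximum Rsum is 200 times the number of K values used.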
Pages: 1083-1092
Page count: 10