Hierarchical Semantics Alignment for 3D Human Motion Retrieval

Authors
Yang, Yang [1]
Shi, Haoyu [1]
Zhang, Huaiwen [1]
Affiliations
[1] Inner Mongolia Univ, Hohhot, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
3D Human Motion; Text-to-Motion Retrieval; Multi-modal; Semantics Alignment;
DOI
10.1145/3626772.3657804
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Text to 3D human Motion Retrieval (TMR) is a challenging task in information retrieval that aims to retrieve relevant motion sequences from a natural language description. The conventional approach to TMR represents data instances as point embeddings for alignment. However, in real-world scenarios, multiple motions often co-occur and are superimposed on a single avatar. Simply aggregating text and motion sequences into a single global embedding may be inadequate for capturing the intricate semantics of superimposed motions. In addition, most motion variations occur locally and subtly, which makes it considerably harder to align motion sequences precisely with their corresponding text. To address these challenges, we propose a novel Hierarchical Semantics Alignment (HSA) framework for text-to-3D human motion retrieval. Beyond global alignment, we propose Probabilistic-based Distribution Alignment (PDA) and Descriptors-based Fine-grained Alignment (DFA) to achieve precise semantic matching. Specifically, the PDA encodes the text and motion sequences into multidimensional probabilistic distributions, effectively capturing the semantics of superimposed motions. By optimizing the problem of probabilistic distribution alignment, PDA achieves a precise match between superimposed motions and their corresponding text. The DFA first applies fine-grained feature gating, selectively retaining the significant and representative local representations while excluding interference from meaningless features. It then adaptively assigns local representations from text and motion to a set of cross-modal local aggregated descriptors, enabling local comparison and interaction between fine-grained text and motion features. Extensive experiments on two widely used benchmark datasets, HumanML3D and KIT-ML, demonstrate the effectiveness of the proposed method. It significantly outperforms existing state-of-the-art retrieval methods, achieving Rsum improvements of 24.74% on HumanML3D and 23.08% on KIT-ML.
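The abstract reports results in terms of Rsum without defining it. In cross-modal retrieval, Rsum is commonly taken to be the sum of Recall@K values over both retrieval directions (text-to-motion and motion-to-text). The sketch below illustrates that common convention only; the K values, the function names `recall_at_k` and `rsum`, and the toy similarity matrix are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def recall_at_k(sim: List[List[float]], k: int) -> float:
    """Percentage of queries (rows) whose true match ranks in the top k.

    Assumes sim[i][j] is the similarity between query i and candidate j,
    and that the ground-truth candidate for query i has index i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of candidates scoring strictly higher than the true match.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def rsum(sim: List[List[float]], ks: Sequence[int] = (1, 5, 10)) -> float:
    """Sum of Recall@K over both directions (assumed convention, assumed K values)."""
    # Transposing the matrix swaps the roles of query and candidate.
    transposed = [list(col) for col in zip(*sim)]
    return sum(recall_at_k(sim, k) + recall_at_k(transposed, k) for k in ks)


# Toy text-by-motion similarity matrix (hypothetical values).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]

print(recall_at_k(sim, 1))
print(rsum(sim, ks=(1, 2)))
```

Because Rsum adds several recall percentages together, its maximum depends on how many K values and directions are included, so scores are only comparable between methods that use the same protocol.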
Pages: 1083-1092
Number of pages: 10
Related papers
50 in total
  • [1] 3D HUMAN MOTION RETRIEVAL BASED ON HUMAN HIERARCHICAL INDEX STRUCTURE
    Zhang, Q.
    Guo, X.
    BIOLOGY OF SPORT, 2013, 30 (02) : 145 - 151
  • [2] Hierarchical deep semantic alignment for cross-domain 3D model retrieval
    Song, Dan
    Ling, Yuting
    Li, Tianbao
    Wang, Teng
    Li, Xuanya
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95
  • [3] Shape and Semantics for 3D Anatomical Structure Retrieval
    Moroni, Davide
    Salvetti, Mario
    Salvetti, Ovidio
    IMTA 2009: IMAGE MINING THEORY AND APPLICATIONS, PROCEEDINGS, 2009, : 73 - +
  • [4] 3D motion retrieval with motion index tree
    Liu, F
    Zhuang, YT
    Wu, F
    Pan, YH
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2003, 92 (2-3) : 265 - 284
  • [5] RETRIEVAL-BASED NATURAL 3D HUMAN MOTION GENERATION
    Li, Yuqi
    Luo, Yizhi
    Wu, Song
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [6] 3D Human Motion Retrieval Based on ISOMAP Dimension Reduction
    Guo, Xiaocui
    Zhang, Qiang
    Liu, Rui
    Zhou, Dongsheng
    Dong, Jing
    ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT III, 2011, 7004 : 159 - 169
  • [7] 3D human motion analysis framework for shape similarity and retrieval
    Slama, Rim
    Wannous, Hazem
    Daoudi, Mohamed
    IMAGE AND VISION COMPUTING, 2014, 32 (02) : 131 - 154
  • [8] Hierarchical Instance Feature Alignment for 2D Image-Based 3D Shape Retrieval
    Zhou, Heyu
    Nie, Weizhi
    Li, Wenhui
    Song, Dan
    Liu, An-An
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 839 - 845
  • [9] 3D model retrieval by sample based alignment
    Chen, Zong-Yao
    Lin, Wei-Chao
    Tsai, Chih-Fong
    Ke, Shih-Wen
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2016, 40 : 721 - 731
  • [10] TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis
    Petrovich, Mathis
    Black, Michael J.
    Varol, Gül
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 9454 - 9463