Hierarchical Semantics Alignment for 3D Human Motion Retrieval

Cited by: 0

Authors:

Yang, Yang [1]

Shi, Haoyu [1]

Zhang, Huaiwen [1]

Affiliations:

[1] Inner Mongolia Univ, Hohhot, Peoples R China

Source:

PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024

Funding:

National Natural Science Foundation of China; Beijing Natural Science Foundation;

Keywords:

3D Human Motion; Text-to-Motion Retrieval; Multi-modal; Semantics Alignment;

DOI:

10.1145/3626772.3657804

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text-to-3D human Motion Retrieval (TMR) is a challenging task in information retrieval that aims to retrieve relevant motion sequences given a natural language description. The conventional approach to TMR represents data instances as point embeddings for alignment. However, in real-world scenarios, multiple motions often co-occur and superimpose on a single avatar. Simply aggregating text and motion sequences into a single global embedding may be inadequate for capturing the intricate semantics of superimposing motions. In addition, most motion variations occur locally and subtly, which makes it considerably harder to align motion sequences precisely with their corresponding text. To address these challenges, we propose a novel Hierarchical Semantics Alignment (HSA) framework for text-to-3D human motion retrieval. Beyond global alignment, we propose Probabilistic-based Distribution Alignment (PDA) and Descriptors-based Fine-grained Alignment (DFA) to achieve precise semantic matching. Specifically, PDA encodes the text and motion sequences into multidimensional probabilistic distributions, effectively capturing the semantics of superimposing motions. By optimizing the probabilistic distribution alignment problem, PDA achieves a precise match between superimposing motions and their corresponding text. DFA first applies fine-grained feature gating, which selectively passes the significant and representative local representations while excluding interference from meaningless features. It then adaptively assigns local representations from text and motion to a set of cross-modal local aggregated descriptors, enabling local comparison and interaction between fine-grained text and motion features. Extensive experiments on two widely used benchmark datasets, HumanML3D and KIT-ML, demonstrate the effectiveness of the proposed method. It significantly outperforms existing state-of-the-art retrieval methods, achieving Rsum improvements of 24.74% on HumanML3D and 23.08% on KIT-ML.
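
The abstract reports results as Rsum without defining it. In cross-modal retrieval, Rsum is commonly the sum of Recall@K values over both retrieval directions (text-to-motion and motion-to-text). The sketch below illustrates that common definition only; the function names `recall_at_k` and `rsum`, the default cut-offs `ks=(1, 5, 10)`, and the convention that text `i` is paired with motion `i` are assumptions for illustration, not details taken from the paper.

```python
def recall_at_k(sim, k):
    """Percentage of queries whose paired item (same index) ranks in the top k.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption: the ground-truth match for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank candidate indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


def rsum(sim, ks=(1, 5, 10)):
    """Sum of Recall@K over both retrieval directions (hypothetical cut-offs)."""
    # Text-to-motion: rows are text queries, columns are motion candidates.
    t2m = sum(recall_at_k(sim, k) for k in ks)
    # Motion-to-text: transpose so that rows become motion queries.
    sim_t = [list(col) for col in zip(*sim)]
    m2t = sum(recall_at_k(sim_t, k) for k in ks)
    return t2m + m2t


if __name__ == "__main__":
    # Toy 3x3 similarity matrix: text 1 is ranked last against its own motion.
    toy = [
        [0.9, 0.1, 0.2],
        [0.8, 0.3, 0.5],
        [0.1, 0.2, 0.7],
    ]
    print(rsum(toy, ks=(1, 2, 3)))
```

Because Rsum adds several percentages together, it can exceed 100, so a relative improvement in Rsum is not directly comparable to an improvement in a single Recall@K value.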


Pages: 1083-1092

Page count: 10
