Omni-TransPose: Fusion of OmniPose and Transformer Architecture for Improving Action Detection

Cited by: 0

Authors:

Phu, Khac-Anh [1,2]

Hoang, Van-Dung [3]

Le, Van-Tuong-Lan [4]

Tran, Quang-Khai [3]

Affiliations:

[1] Hue Univ, Univ Sci, Fac Informat Technol, Hue City 530000, Vietnam

[2] Cao Thang Tech Coll, Fac Informat Technol, Ho Chi Minh City 720000, Vietnam

[3] HCMC Univ Technol & Educ, Fac Informat Technol, Ho Chi Minh City 720000, Vietnam

[4] Hue Univ, Dept Acad & Students Affairs, Hue City 530000, Vietnam

Source:

RECENT CHALLENGES IN INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024 | 2024 / Vol. 2145

Keywords:

Computer vision; Deep learning; Skeleton data;

DOI:

10.1007/978-981-97-5934-7_6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The field of computer vision research has been experiencing rapid and remarkable development in recent years, aiming to analyze image and video data through increasingly sophisticated machine learning models. In this research domain, capturing and extracting relevant features plays a crucial role in approaching the detailed content and semantics of image and video data. Among these, skeleton data, with the ability to represent the position and movements of human body parts, along with its simplicity and independence from external factors, has proven highly effective in solving human action recognition problems. Consequently, many researchers have shown interest and proposed various skeleton data extraction models following different approaches. In this study, we introduce the Omni-TransPose model for skeleton data extraction, constructed by combining the OmniPose model with the Transformer architecture. We conducted experiments on the MPII dataset, using the Percentage of Correct Key Points (PCK) metric to evaluate the effectiveness of the new model. The experimental results were compared with the original OmniPose model, demonstrating a significant improvement in skeleton extraction and recognition, thereby enhancing the capability of human action recognition. This work promises to provide an efficient and powerful method for human action recognition, with broad potential applications in practical scenarios.
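The abstract names the Percentage of Correct Key Points (PCK) as the evaluation metric but does not spell out how it is computed. The sketch below is one common formulation, not the authors' evaluation code: a predicted keypoint counts as correct when its Euclidean distance to the ground truth is at most `alpha` times a per-sample reference length. The function name `pck`, the `alpha=0.5` default, and the choice of reference length (for MPII this is often a head-size measure) are assumptions made for illustration.

```python
import math


def pck(pred, gt, ref_lengths, alpha=0.5):
    """Fraction of keypoints predicted within alpha * ref_length of ground truth.

    pred, gt: list of samples, each a list of (x, y) keypoints in pixels.
    ref_lengths: one normalising length per sample (assumed here; the
        record does not say which length the paper uses).
    alpha: tolerance as a fraction of the reference length (assumed 0.5).
    """
    correct = 0
    total = 0
    for sample_pred, sample_gt, ref in zip(pred, gt, ref_lengths):
        for (px, py), (gx, gy) in zip(sample_pred, sample_gt):
            # Euclidean distance between predicted and true keypoint.
            dist = math.hypot(px - gx, py - gy)
            if dist <= alpha * ref:
                correct += 1
            total += 1
    # Guard against an empty evaluation set.
    return correct / total if total else 0.0


# Toy data: one person, four keypoints, reference length 10 px,
# so the tolerance is 5 px. Distances are 1, 6, about 3.6, and 0.
example_gt = [[(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]]
example_pred = [[(10.0, 11.0), (20.0, 26.0), (32.0, 33.0), (40.0, 40.0)]]
example_ref = [10.0]

score = pck(example_pred, example_gt, example_ref, alpha=0.5)
print(score)  # 0.75: three of the four keypoints fall within 5 px
```

A real evaluation would typically also skip keypoints that are unannotated or invisible in the ground truth; that masking is omitted here to keep the sketch short.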


Pages: 59-70

Page count: 12
