Omni-TransPose: Fusion of OmniPose and Transformer Architecture for Improving Action Detection

Times cited: 0
Authors
Phu, Khac-Anh [1, 2]
Hoang, Van-Dung [3]
Le, Van-Tuong-Lan [4]
Tran, Quang-Khai [3]
Affiliations
[1] Hue Univ, Univ Sci, Fac Informat Technol, Hue City 530000, Vietnam
[2] Cao Thang Tech Coll, Fac Informat Technol, Ho Chi Minh City 720000, Vietnam
[3] HCMC Univ Technol & Educ, Fac Informat Technol, Ho Chi Minh City 720000, Vietnam
[4] Hue Univ, Dept Acad & Students Affairs, Hue City 530000, Vietnam
Keywords
Computer vision; Deep learning; Skeleton data
DOI
10.1007/978-981-97-5934-7_6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Computer vision research has developed rapidly in recent years, with increasingly sophisticated machine learning models being used to analyze image and video data. In this domain, capturing and extracting relevant features is crucial for understanding the detailed content and semantics of images and videos. Among such features, skeleton data has proven highly effective for human action recognition: it represents the positions and movements of human body parts, and it is simple and independent of external factors. Consequently, many researchers have proposed skeleton data extraction models based on a variety of approaches. In this study, we introduce Omni-TransPose, a skeleton data extraction model that combines the OmniPose model with the Transformer architecture. We conducted experiments on the MPII dataset, using the Percentage of Correct Keypoints (PCK) metric to evaluate the new model. Compared with the original OmniPose model, the experimental results show a significant improvement in skeleton extraction and recognition, thereby enhancing human action recognition capability. This work offers an efficient and powerful method for human action recognition, with broad potential for practical applications.
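The abstract evaluates the model with the PCK metric. The following is a minimal sketch of how PCK is commonly computed, not the authors' evaluation code. A predicted keypoint counts as correct when its distance to the ground truth is at most alpha times a per-sample reference length. It assumes keypoints are (x, y) pixel coordinates. On MPII the usual variant is PCKh, which uses the head segment length as the reference and alpha = 0.5 (PCKh@0.5). The abstract does not state which threshold was used, so that choice is an assumption. The function name `pck` and its parameters are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def pck(
    pred: Sequence[Sequence[Point]],
    gt: Sequence[Sequence[Point]],
    ref_len: Sequence[float],
    alpha: float = 0.5,
    visible: Optional[Sequence[Sequence[bool]]] = None,
) -> float:
    """Return PCK as a percentage (hypothetical helper, illustrative only).

    pred, gt: per-sample lists of (x, y) keypoints in the same joint order.
    ref_len:  per-sample normalizing length (e.g. head segment size for PCKh).
    alpha:    threshold fraction of ref_len (0.5 gives PCK@0.5 / PCKh@0.5).
    visible:  optional per-sample, per-joint mask; hidden joints are skipped.
    """
    correct = 0
    total = 0
    for i, (p_joints, g_joints) in enumerate(zip(pred, gt)):
        threshold = alpha * ref_len[i]  # distance tolerance for this sample
        for k, (p, g) in enumerate(zip(p_joints, g_joints)):
            if visible is not None and not visible[i][k]:
                continue  # unannotated joints do not count toward the total
            total += 1
            if math.dist(p, g) <= threshold:
                correct += 1
    if total == 0:
        raise ValueError("no visible keypoints to evaluate")
    return 100.0 * correct / total


if __name__ == "__main__":
    gt = [[(0.0, 0.0), (10.0, 10.0), (20.0, 20.0)]]
    pred = [[(3.0, 4.0), (10.0, 16.0), (20.0, 20.0)]]
    # Errors are 5, 6 and 0 px; with ref_len 10 and alpha 0.5 the tolerance is 5 px.
    print(pck(pred, gt, ref_len=[10.0]))
```

In this usage example two of the three joints fall within tolerance, so the score is about 66.7%. Per-joint PCK, which MPII results often report, can be obtained by running the same comparison over one joint index at a time.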
Pages: 59–70
Number of pages: 12