CWPR: An optimized transformer-based model for construction worker pose estimation on construction robots

Times cited: 0

Authors:

Zhou, Jiakai [1]

Zhou, Wanlin [1]

Wang, Yang [2,3]

Affiliations:

[1] Nanjing Univ Aeronaut & Astronaut, Coll Mech & Elect Engn, Nanjing 210000, Peoples R China

[2] Anhui Univ Technol, Sch Mech Engn, Maanshan 243000, Peoples R China

[3] Anhui Prov Key Lab Special Heavy Load Robot, Maanshan 243000, Peoples R China

Source:

ADVANCED ENGINEERING INFORMATICS | 2024 / Vol. 62

Keywords:

Construction worker pose; Construction robots; Transformer; Multi-human pose estimation; Surveillance videos; Recognition

DOI:

10.1016/j.aei.2024.102894

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Estimating construction workers' poses is critical for recognizing unsafe behaviors, conducting ergonomic analyses, and assessing productivity. Using construction robots to capture RGB images for pose estimation has recently made flexible monitoring perspectives and timely interventions possible. However, existing multi-human pose estimation (MHPE) methods struggle to balance accuracy and speed, which makes them unsuitable for real-time applications on construction robots. This paper introduces the Construction Worker Pose Recognizer (CWPR), an optimized Transformer-based MHPE model tailored for construction robots. Specifically, CWPR uses a lightweight encoder equipped with a multi-scale feature fusion module to increase inference speed. An Intersection over Union (IoU)-aware query selection strategy then supplies high-quality initial queries to the hybrid decoder, significantly improving performance. In addition, a decoder denoising module feeds noisy ground truth into the decoder, which mitigates sample imbalance and further improves accuracy. The Construction Worker Pose and Action (CWPA) dataset was also collected from 154 videos captured in real construction scenarios. It is annotated for two tasks: a pose benchmark for MHPE and an action benchmark for action recognition. Experiments demonstrate that CWPR achieves top-level accuracy and the fastest inference speed, attaining 68.1 Average Precision (AP) with a processing time of 26 ms on the COCO test set and 76.2 AP with 21 ms on the CWPA pose benchmark. Moreover, when combined with the action recognition method ST-GCN and run on construction robot hardware, CWPR achieves 78.7 AP with a processing time of 19 ms on the CWPA action benchmark, which supports its suitability for practical deployment.
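The IoU-aware query selection step described in the abstract can be illustrated with a minimal Python sketch. It assumes an RT-DETR-style formulation. In that formulation, each encoder token predicts a box and a confidence score, and the score head is trained toward the token's IoU with the ground truth. The top-K tokens by score then become the decoder's initial queries. The function names `box_iou`, `iou_aware_targets` and `select_queries` are hypothetical. Taking the maximum IoU over all ground-truth boxes simplifies the usual matched-target scheme. This record does not describe CWPR's actual implementation, for example whether it scores keypoints with OKS rather than box IoU.

```python
from typing import List, Sequence, Tuple

# Assumption: RT-DETR-style IoU-aware selection; this is NOT CWPR's published code.
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_aware_targets(pred_boxes: Sequence[Box], gt_boxes: Sequence[Box]) -> List[float]:
    """Soft score target per encoder token: its best IoU with any ground-truth box.

    Simplification (assumption): real detectors usually use the IoU with the
    Hungarian-matched ground truth and set unmatched tokens to 0.
    Training toward this target means a high score implies a well-localized box.
    """
    return [max((box_iou(p, g) for g in gt_boxes), default=0.0) for p in pred_boxes]


def select_queries(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring encoder tokens, used as initial decoder queries."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0)]
    preds = [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 15.0, 15.0), (20.0, 20.0, 30.0, 30.0)]
    targets = iou_aware_targets(preds, gt)
    print(targets, select_queries(targets, 2))
```

In this example the targets are 1.0, 1/7 and 0.0, so `select_queries(targets, 2)` keeps tokens 0 and 1. The point of the IoU-aware target is that ranking by score then also ranks by localization quality. A plain classification target would not guarantee this.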

Pages: 12
