CWPR: An optimized transformer-based model for construction worker pose estimation on construction robots

Cited by: 0
Authors
Zhou, Jiakai [1]
Zhou, Wanlin [1]
Wang, Yang [2, 3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Mech & Elect Engn, Nanjing 210000, Peoples R China
[2] Anhui Univ Technol, Sch Mech Engn, Maanshan 243000, Peoples R China
[3] Anhui Prov Key Lab Special Heavy Load Robot, Maanshan 243000, Peoples R China
Keywords
Construction worker pose; Construction robots; Transformer; Multi-human pose estimation; Surveillance videos; Recognition
DOI
10.1016/j.aei.2024.102894
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Estimating construction workers' poses is critically important for recognizing unsafe behaviors, conducting ergonomic analyses, and assessing productivity. Using construction robots to capture RGB images for pose estimation has recently made flexible monitoring perspectives and timely interventions possible. However, existing multi-human pose estimation (MHPE) methods struggle to balance accuracy and speed, making them unsuitable for real-time applications on construction robots. This paper introduces the Construction Worker Pose Recognizer (CWPR), an optimized Transformer-based MHPE model tailored for construction robots. Specifically, CWPR uses a lightweight encoder equipped with a multi-scale feature fusion module to increase inference speed. An Intersection over Union (IoU)-aware query selection strategy then provides high-quality initial queries for the hybrid decoder, significantly improving performance. In addition, a decoder denoising module feeds noisy ground truth into the decoder, mitigating sample imbalance and further improving accuracy. The paper also presents the Construction Worker Pose and Action (CWPA) dataset, collected from 154 videos captured in real construction scenarios. The dataset is annotated for two tasks: a pose benchmark for MHPE and an action benchmark for action recognition. Experiments demonstrate that CWPR achieves top-level accuracy and the fastest inference speed, attaining 68.1 Average Precision (AP) with a processing time of 26 ms on the COCO test set and 76.2 AP with 21 ms on the CWPA pose benchmark. Moreover, when integrated with the action recognition method ST-GCN on construction robot hardware, CWPR achieves 78.7 AP and a processing time of 19 ms on the CWPA action benchmark, validating its effectiveness for practical deployment.
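The abstract names an IoU-aware query selection strategy but does not give its details. The following is only a minimal sketch of the general idea, under stated assumptions: candidate boxes are scored, the training target for each score is taken to be the candidate's best IoU with any ground-truth box, and the top-k scoring candidates become the initial decoder queries. The function names (`box_iou`, `iou_aware_targets`, `select_top_k_queries`) and the (x1, y1, x2, y2) box format are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_aware_targets(candidates: List[Box], ground_truth: List[Box]) -> List[float]:
    """Hypothetical score target: each candidate's best IoU with any ground-truth box."""
    return [max((box_iou(c, g) for g in ground_truth), default=0.0) for c in candidates]


def select_top_k_queries(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring candidates, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    candidates = [(0.0, 0.0, 2.0, 2.0), (5.0, 5.0, 6.0, 6.0), (1.0, 1.0, 3.0, 3.0)]
    ground_truth = [(1.0, 1.0, 3.0, 3.0)]
    targets = iou_aware_targets(candidates, ground_truth)
    # Scores that track IoU rank well-localized candidates first.
    print(select_top_k_queries(targets, k=2))
```

In this sketch the scores are the IoU targets themselves; in a trained model the scores would be predictions that were supervised toward such targets, so that selecting by score favors candidates that are also well localized.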
Pages: 12