Fine-grained activity classification in assembly based on multi-visual modalities

Cited by: 7
Authors
Chen, Haodong [1]
Zendehdel, Niloofar [1]
Leu, Ming C. [1]
Yin, Zhaozheng [2, 3]
Affiliations
[1] Missouri Univ Sci & Technol, Dept Mech & Aerosp Engn, Rolla, MO 65409 USA
[2] SUNY Stony Brook, Dept Biomed Informat, Stony Brook, NY USA
[3] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY USA
Funding
National Science Foundation (USA)
Keywords
Fine-grained activity; Activity classification; Assembly; Multi-visual modality; RECOGNITION; LSTM;
DOI
10.1007/s10845-023-02152-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Assembly activity recognition and prediction help to improve productivity, quality control, and safety measures in smart factories. This study aims to sense, recognize, and predict a worker's continuous fine-grained assembly activities in a manufacturing platform. We propose a two-stage network for workers' fine-grained activity classification by leveraging scene-level and temporal-level activity features. The first stage is a feature awareness block that extracts scene-level features from multi-visual modalities, including red-green-blue (RGB) and hand skeleton frames. We use the transfer learning method in the first stage and compare three different pre-trained feature extraction models. Then, we transmit the feature information from the first stage to the second stage to learn the temporal-level features of activities. The second stage consists of the Recurrent Neural Network (RNN) layers and a final classifier. We compare the performance of two different RNNs in the second stage, including the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). The partial video observation method is used in the prediction of fine-grained activities. In the experiments using the trimmed activity videos, our model achieves an accuracy of > 99% on our dataset and > 98% on the public dataset UCF 101, outperforming the state-of-the-art models. The prediction model achieves an accuracy of > 97% in predicting activity labels using 50% of the onset activity video information. In the experiments using an untrimmed video with continuous assembly activities, we combine our recognition and prediction models and achieve an accuracy of > 91% in real time, surpassing the state-of-the-art models for the recognition of continuous assembly activities.
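The partial video observation idea and the use of two visual modalities can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `observe_partial` and `fuse_modalities` are hypothetical, the per-frame feature vectors are toy values standing in for the output of a pre-trained extractor, and fusing the modalities by concatenation is an assumption, since the abstract does not say how RGB and hand skeleton features are combined.

```python
import math


def observe_partial(frames, ratio):
    """Return the leading `ratio` fraction of a clip (at least one frame if non-empty)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    keep = max(1, math.ceil(len(frames) * ratio))
    return frames[:keep]


def fuse_modalities(rgb_features, skeleton_features):
    """Concatenate per-frame RGB and hand-skeleton feature vectors.

    Concatenation is an assumption made for this sketch, not a detail
    taken from the abstract.
    """
    if len(rgb_features) != len(skeleton_features):
        raise ValueError("both modalities need one feature vector per frame")
    return [list(r) + list(s) for r, s in zip(rgb_features, skeleton_features)]


# Toy clip: 4 frames, 2-D RGB features and 1-D skeleton features per frame.
rgb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
skeleton = [[1.0], [2.0], [3.0], [4.0]]

fused = fuse_modalities(rgb, skeleton)
# Keep only the onset half of the clip, as would be fed to a temporal model
# (an LSTM or GRU in the paper) when predicting the activity label early.
onset = observe_partial(fused, 0.5)
```

In a full pipeline, `onset` would be passed to the recurrent second stage; that stage is omitted here to keep the sketch dependency-free.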
Pages: 2215-2233
Number of pages: 19