A Transformer-based Late-Fusion Mechanism for Fine-Grained Object Recognition in Videos

Cited by: 3
Authors
Koch, Jannik [1]
Wolf, Stefan [1, 2]
Beyerer, Juergen [1, 2, 3]
Affiliations
[1] Fraunhofer IOSB, Karlsruhe, Germany
[2] Karlsruhe Inst Technol, Vis & Future Lab, Karlsruhe, Germany
[3] Fraunhofer Ctr Machine Learning, Munich, Germany
DOI
10.1109/WACVW58289.2023.00015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained image classification is limited because it considers only a single view, whereas in many cases, such as surveillance, an entire video is available that provides multiple perspectives. However, the potential of videos has mostly been explored in the context of action recognition, and fine-grained object recognition is rarely considered as an application of video classification. As a result, recent video classification architectures are ill-suited to fine-grained object recognition. We propose a novel Transformer-based late-fusion mechanism for fine-grained video classification. Our approach outperforms both early-fusion mechanisms, such as the Video Swin Transformer, and a simple consensus-based late-fusion baseline with a modern Swin Transformer backbone. It is also efficient: accuracy increases substantially while computational complexity increases only slightly. Code is available at: https://github.com/wolfstefan/tlf.
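To make the contrast between the two late-fusion variants in the abstract concrete, here is a minimal NumPy sketch. It assumes each frame has already been encoded independently by an image backbone (for example a Swin Transformer) into a D-dimensional feature vector, which is what makes the fusion "late". The consensus baseline averages per-frame class scores. The attention variant prepends a class token to the frame features, applies one self-attention layer with a residual connection, and classifies the class-token output. The function names, the single-layer, single-head simplification (no LayerNorm, MLP, or positional encoding), and all shapes are illustrative assumptions, not the authors' implementation, which lives in the repository linked above.

```python
import numpy as np


def consensus_late_fusion(frame_logits: np.ndarray) -> np.ndarray:
    """Consensus baseline: average per-frame class scores over time.

    frame_logits: (T, C) scores from a per-frame classifier -> (C,)
    """
    return frame_logits.mean(axis=0)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_late_fusion(
    frame_feats: np.ndarray,  # (T, D) per-frame backbone features
    cls_token: np.ndarray,    # (D,) class token (hypothetical, would be learned)
    w_q: np.ndarray,          # (D, D) query projection
    w_k: np.ndarray,          # (D, D) key projection
    w_v: np.ndarray,          # (D, D) value projection
    w_out: np.ndarray,        # (D, C) classification head
) -> np.ndarray:
    """Simplified Transformer-style fusion: one single-head self-attention
    layer with a residual connection over [CLS; frames] (no LayerNorm, MLP,
    or positional encoding). Returns (C,) logits read from the CLS position."""
    tokens = np.vstack([cls_token[None, :], frame_feats])  # (T + 1, D)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])                 # (T + 1, T + 1)
    out = tokens + softmax(scores, axis=-1) @ v             # residual connection
    return out[0] @ w_out                                   # CLS token -> logits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, C = 8, 16, 5  # frames, feature size, classes (arbitrary choices)
    feats = rng.normal(size=(T, D))  # random stand-in for backbone outputs
    w_head = rng.normal(size=(D, C))
    baseline = consensus_late_fusion(feats @ w_head)
    w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
    fused = attention_late_fusion(feats, rng.normal(size=D), w_q, w_k, w_v, w_head)
    print(baseline.shape, fused.shape)  # (5,) (5,)
```

Because neither function uses positional information, both outputs are invariant to frame order. The difference is that the consensus baseline gives every frame a fixed weight of 1/T, whereas the attention variant weights each frame's contribution to the class token by its content. The weights here are random, so the resulting logits mean nothing beyond their shape.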
Pages: 100-109
Number of pages: 10