A Transformer-based Late-Fusion Mechanism for Fine-Grained Object Recognition in Videos

Cited by: 3
Authors
Koch, Jannik [1]
Wolf, Stefan [1, 2]
Beyerer, Juergen [1, 2, 3]
Affiliations
[1] Fraunhofer IOSB, Karlsruhe, Germany
[2] Karlsruhe Inst Technol, Vis & Future Lab, Karlsruhe, Germany
[3] Fraunhofer Ctr Machine Learning, Munich, Germany
DOI
10.1109/WACVW58289.2023.00015
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained image classification is limited by only considering a single view while in many cases, like surveillance, a whole video exists which provides multiple perspectives. However, the potential of videos is mostly considered in the context of action recognition while fine-grained object recognition is rarely considered as an application for video classification. This leads to recent video classification architectures being inappropriate for the task of fine-grained object recognition. We propose a novel, Transformer-based late-fusion mechanism for fine-grained video classification. Our approach achieves superior results to both early-fusion mechanisms, like the Video Swin Transformer, and a simple consensus-based late-fusion baseline with a modern Swin Transformer backbone. Additionally, we achieve improved efficiency, as our results show a high increase in accuracy with only a slight increase in computational complexity. Code is available at: https://github.com/wolfstefan/tlf.
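
The abstract contrasts a consensus-based late-fusion baseline with an attention-based (Transformer-style) fusion of per-frame information. The sketch below is a minimal illustration of that contrast, not the authors' implementation (see their repository for that). It assumes each frame has already been passed through an image backbone. The function names, the single-head dot-product pooling, and the `query` vector (a stand-in for a learned parameter) are all assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def consensus_fusion(frame_logits):
    """Consensus baseline (assumed form): average per-frame class probabilities."""
    probs = [softmax(logits) for logits in frame_logits]
    num_frames = len(probs)
    num_classes = len(probs[0])
    return [sum(p[c] for p in probs) / num_frames for c in range(num_classes)]


def attention_fusion(frame_features, query):
    """Attention pooling over frame features (illustrative, single head).

    `query` is a hypothetical stand-in for a learned vector; frames whose
    features align with it receive a larger weight in the fused feature.
    """
    dim = len(query)
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in frame_features
    ]
    weights = softmax(scores)
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, frame_features))
        for i in range(dim)
    ]
    return fused, weights


# Two frames, two classes: the frames disagree, so the consensus is a tie.
video_probs = consensus_fusion([[2.0, 0.0], [0.0, 2.0]])

# Two frames with 2-D features: the first frame aligns with the query.
fused_feature, frame_weights = attention_fusion([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In the consensus case every frame contributes equally, whereas the attention weights let informative frames dominate the fused representation. A full Transformer would add learned projections, multiple heads, and stacked layers on top of this pooling step.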
Pages: 100-109
Page count: 10
Related papers
50 in total
  • [21] FineFormer: Fine-Grained Adaptive Object Transformer for Image Captioning
    Wang, Bo
    Zhang, Zhao
    Fan, Jicong
    Zhao, Mingbo
    Zhan, Choujun
    Xu, Mingliang
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2022, : 508 - 517
  • [22] Efficient object detection and segmentation for fine-grained recognition
    Angelova, Anelia
    Zhu, Shenghuo
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 811 - 818
  • [23] ITERATIVE OBJECT AND PART TRANSFER FOR FINE-GRAINED RECOGNITION
    Shen, Zhiqiang
    Jiang, Yu-Gang
    Wang, Dequan
    Xue, Xiangyang
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 1470 - 1475
  • [24] Fine-grained object recognition in underwater visual data
    Spampinato, C.
    Palazzo, S.
    Joalland, P. H.
    Paris, S.
    Glotin, H.
    Blanc, K.
    Lingrand, D.
    Precioso, F.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (03) : 1701 - 1720
  • [26] A fine-grained protection mechanism in object-based operating systems
    Shigeta, S
    Tanimori, T
    Shimizu, K
    Ashihara, H
    PROCEEDINGS OF THE FIFTH INTERNATIONAL WORKSHOP ON OBJECT-ORIENTATION IN OPERATING SYSTEMS, 1996, : 156 - 160
  • [27] Multi-level information fusion Transformer with background filter for fine-grained image recognition
    Yu, Ying
    Wang, Jinghui
    Pedrycz, Witold
    Miao, Duoqian
    Qian, Jin
    APPLIED INTELLIGENCE, 2024, 54 (17-18) : 8108 - 8119
  • [28] Vclusters: A flexible, fine-grained object clustering mechanism
    McAuliffe, ML
    Carey, MJ
    Solomon, MH
    ACM SIGPLAN NOTICES, 1998, 33 (10) : 230 - 243
  • [29] Group-Attention Transformer for Fine-Grained Image Recognition
    Yan, Bo
    Wang, Siwei
    Zhu, En
    Liu, Xinwang
    Chen, Wei
    Communications in Computer and Information Science, 2022, 1587 CCIS : 40 - 54
  • [30] An Integrated Transformer with Collaborative Tokens Mining for Fine-Grained Recognition
    Yang, Weiwei
    Yin, Jian
    ELECTRONICS, 2023, 12 (12)