Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0
Authors
Raven, Tim [1 ]
Matei, Arthur [1 ]
Fink, Gernot A. [1 ]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
Keywords
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;
DOI
10.1007/978-3-031-70536-6_23
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state of the art on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
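The abstract's core pipeline is standard VLAD aggregation applied to local ViT descriptors. As an illustration only (not the authors' implementation), a minimal NumPy sketch of VLAD encoding, assuming a k-means codebook `centroids` learned on training descriptors and per-patch features `features` from the ViT:

```python
import numpy as np

def vlad_encode(features, centroids):
    """Aggregate local descriptors into a single VLAD vector.

    features:  (N, D) array of local descriptors (e.g. ViT patch tokens)
    centroids: (K, D) codebook, e.g. from k-means on training descriptors
    returns:   (K*D,) power- and L2-normalized VLAD encoding
    """
    # Hard-assign each descriptor to its nearest centroid
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)

    K, D = centroids.shape
    vlad = np.zeros((K, D))
    for k in range(K):
        members = features[assign == k]
        if len(members):
            # Accumulate residuals between descriptors and their centroid
            vlad[k] = (members - centroids[k]).sum(axis=0)

    vlad = vlad.reshape(-1)
    # Power normalization followed by L2 normalization (common practice)
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

The resulting fixed-length vector (one per document page) can then be compared with cosine similarity for retrieval; details such as codebook size and normalization variants follow the paper, not this sketch.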
Pages: 380 - 396 (17 pages)
Related Papers
50 items in total
  • [21] Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
    Chen, Richard J.
    Chen, Chengkuan
    Li, Yicong
    Chen, Tiffany Y.
    Trister, Andrew D.
    Krishnan, Rahul G.
    Mahmood, Faisal
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16123 - 16134
  • [22] SELF-SUPERVISED VISION TRANSFORMERS FOR JOINT SAR-OPTICAL REPRESENTATION LEARNING
    Wang, Yi
    Albrecht, Conrad M.
    Zhu, Xiao Xiang
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 139 - 142
  • [23] Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation
    Kang, Dahyun
    Koniusz, Piotr
    Cho, Minsu
    Murray, Naila
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19627 - 19638
  • [24] Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning
    Salvador, Amaia
    Gundogdu, Erhan
    Bazzani, Loris
    Donoser, Michael
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 15470 - 15479
  • [25] Self-supervised Vision Transformers for 3D pose estimation of novel objects
    Thalhammer, Stefan
    Weibel, Jean-Baptiste
    Vincze, Markus
    Garcia-Rodriguez, Jose
    IMAGE AND VISION COMPUTING, 2023, 139
  • [26] PROPERTY NEURONS IN SELF-SUPERVISED SPEECH TRANSFORMERS
    Lin, Tzu-Quan
    Lin, Guan-Ting
    Lee, Hung-Yi
    Tang, Hao
    arXiv,
  • [27] Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency
    Prabhu, Viraj
    Yenamandra, Sriram
    Singh, Aaditya
    Hoffman, Judy
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [28] Self-supervised learning of Vision Transformers for digital soil mapping using visual data
    Tresson, Paul
    Dumont, Maxime
    Jaeger, Marc
    Borne, Frederic
    Boivin, Stephane
    Marie-Louise, Loic
    Francois, Jeremie
    Boukcim, Hassan
    Goeau, Herve
    GEODERMA, 2024, 450
  • [29] Guiding Attention for Self-Supervised Learning with Transformers
    Deshpande, Ameet
    Narasimhan, Karthik
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 4676 - 4686
  • [30] Self-Supervised Vision for Climate Downscaling
    Singh, Karandeep
    Jeong, Chaeyoon
    Shidqi, Naufal
    Park, Sungwon
    Nellikkatti, Arjun
    Zeller, Elke
    Cha, Meeyoung
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 7456 - 7464