Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0
Authors
Raven, Tim [1]
Matei, Arthur [1]
Fink, Gernot A. [1]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
Keywords
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD
DOI
10.1007/978-3-031-70536-6_23
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state of the art on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
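As a rough illustration of the pipeline the abstract describes, the sketch below extracts patch-level ViT features, keeps only foreground (ink) patches, and aggregates them into a single VLAD descriptor per document. It is a minimal sketch, not the authors' implementation: the DINO-pretrained ViT-Small from timm stands in for the paper's self-supervised backbone, and the intensity-threshold foreground heuristic, threshold value, and codebook size are all illustrative assumptions.

```python
# Sketch: local ViT features + foreground selection + VLAD aggregation.
# Backbone, threshold, and codebook are assumptions, not the paper's config.
import numpy as np
import torch
import timm

# DINO-pretrained ViT-Small as a stand-in for the paper's self-supervised ViT.
vit = timm.create_model("vit_small_patch16_224.dino", pretrained=True).eval()

@torch.no_grad()
def patch_features(img: torch.Tensor) -> np.ndarray:
    """Local (patch-token) embeddings, shape (N, D); the [CLS] token is dropped."""
    tokens = vit.forward_features(img.unsqueeze(0))  # (1, 1 + N, D)
    return tokens[0, 1:].cpu().numpy()

def foreground_mask(img: torch.Tensor, patch: int = 16,
                    thresh: float = 0.5) -> np.ndarray:
    """Keep patches whose mean intensity suggests ink (dark on a light page).
    A simple heuristic stand-in for the paper's foreground selection."""
    gray = img.mean(0)  # (H, W), values assumed in [0, 1]
    cells = gray.unfold(0, patch, patch).unfold(1, patch, patch)
    return (cells.mean((-1, -2)).flatten() < thresh).numpy()

def vlad(feats: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """VLAD encoding: sum of residuals to the nearest of K cluster centers,
    followed by power- and L2-normalisation."""
    assign = np.linalg.norm(feats[:, None] - centers[None], axis=2).argmin(1)
    enc = np.zeros_like(centers)
    for k in range(len(centers)):
        if (assign == k).any():
            enc[k] = (feats[assign == k] - centers[k]).sum(0)
    enc = enc.ravel()
    enc = np.sign(enc) * np.sqrt(np.abs(enc))      # power normalisation
    return enc / (np.linalg.norm(enc) + 1e-12)     # L2 normalisation

# Usage sketch: `centers` (K x D) would come from k-means over training
# descriptors; retrieval ranks documents by similarity of their VLAD vectors.
# img = ...  # (3, 224, 224) tensor in [0, 1]
# feats = patch_features(img)[foreground_mask(img)]
# descriptor = vlad(feats, centers)
```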
Pages: 380 - 396
Page count: 17
Related papers
50 records in total
  • [11] SAGHOG: Self-supervised Autoencoder for Generating HOG Features for Writer Retrieval
    Peer, Marco
    Kleber, Florian
    Sablatnig, Robert
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 121 - 138
  • [12] Self-supervised Vision Transformers for Land-cover Segmentation and Classification
    Scheibenreif, Linus
    Hanna, Joelle
    Mommert, Michael
    Borth, Damian
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 1421 - 1430
  • [13] Self-Supervised Vision Transformers for Scalable Anomaly Detection over Images
    Samele, Stefano
    Matteucci, Matteo
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024
  • [14] Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers
    Hu, Hao
    Baldassarre, Federico
    Azizpour, Hossein
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III, 2023, 13715 : 409 - 426
  • [15] Exploring Efficiency of Vision Transformers for Self-Supervised Monocular Depth Estimation
    Karpov, Aleksei
    Makarov, Ilya
    2022 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY (ISMAR 2022), 2022, : 711 - 719
  • [16] Multi-level Contrastive Learning for Self-Supervised Vision Transformers
    Mo, Shentong
    Sun, Zhun
    Li, Chao
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2777 - 2786
  • [17] Patch-level Representation Learning for Self-supervised Vision Transformers
    Yun, Sukmin
    Lee, Hankook
    Kim, Jaehyung
    Shin, Jinwoo
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8344 - 8353
  • [18] Self-Supervised Transformers for fMRI representation
    Malkiel, Itzik
    Rosenman, Gony
    Wolf, Lior
    Hendler, Talma
    INTERNATIONAL CONFERENCE ON MEDICAL IMAGING WITH DEEP LEARNING, VOL 172, 2022, 172 : 895 - 913
  • [19] On Separate Normalization in Self-supervised Transformers
    Chen, Xiaohui
    Wang, Yinkai
    Du, Yuanqi
    Hassoun, Soha
    Liu, Li-Ping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [20] Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers
    Pincic, Domagoj
    Susanj, Diego
    Lenac, Kristijan
    SENSORS, 2022, 22 (19)