Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0
Authors
Raven, Tim [1]
Matei, Arthur [1]
Fink, Gernot A. [1]
Affiliation
[1] TU Dortmund Univ, Dortmund, Germany
Keywords
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;
DOI
10.1007/978-3-031-70536-6_23
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While methods based on Vision Transformers (ViTs) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully to writer retrieval, a field dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion, without any need for labels. We show that, for writer retrieval, extracting local foreground features is superior to using the ViT's class token. We evaluate our method on two historical document collections and set a new state of the art on both the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be applied directly to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
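The aggregation and evaluation steps named in the abstract (local descriptors pooled into one VLAD vector per document, retrieval scored by mAP) can be illustrated with a minimal sketch. It assumes the local ViT descriptors have already been extracted and that a codebook of centroids has already been learned (for example with k-means). The hard assignment, signed square-root power normalization, leave-one-out retrieval protocol, and the helper names `vlad_encode`, `average_precision`, and `mean_average_precision` are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad_encode(descriptors: Sequence[Sequence[float]],
                centroids: Sequence[Sequence[float]]) -> Vector:
    """Aggregate local descriptors into one VLAD vector (hard assignment, assumed)."""
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Assign the descriptor to its nearest centroid.
        j = min(range(k), key=lambda c: _sq_dist(x, centroids[c]))
        # Accumulate the residual between descriptor and centroid.
        for i in range(d):
            residuals[j][i] += x[i] - centroids[j][i]
    v = [val for row in residuals for val in row]  # concatenate the k residual sums
    # Power normalization: signed square root of each component.
    v = [math.copysign(math.sqrt(abs(val)), val) for val in v]
    # Global L2 normalization so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(val * val for val in v))
    return [val / norm for val in v] if norm > 0 else v


def average_precision(query_label: str, ranked_labels: Sequence[str]) -> float:
    """AP of one ranked list: mean precision at the rank of each relevant item."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(encodings: Sequence[Vector], labels: Sequence[str]) -> float:
    """Leave-one-out retrieval: each document queries all the others, ranked by dot product."""
    aps = []
    for q, (eq, lq) in enumerate(zip(encodings, labels)):
        others = [i for i in range(len(encodings)) if i != q]
        others.sort(key=lambda i: -sum(a * b for a, b in zip(eq, encodings[i])))
        aps.append(average_precision(lq, [labels[i] for i in others]))
    return sum(aps) / len(aps)
```

With two toy centroids, `vlad_encode([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]], [[0.0, 0.0], [10.0, 10.0]])` sums the residuals (1, 1) and (1, 0) per cluster and yields a unit-length vector. Real systems typically use many more clusters and higher-dimensional descriptors, and often add further steps such as PCA whitening, which this sketch omits.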
Pages: 380-396
Page count: 17
Related papers
50 in total
  • [31] Understanding Self-Attention of Self-Supervised Audio Transformers
    Yang, Shu-wen
    Liu, Andy T.
    Lee, Hung-yi
    INTERSPEECH 2020, 2020, : 3785 - 3789
  • [32] Self-supervised Medical Out-of-Distribution Using U-Net Vision Transformers
    Park, Seongjin
    Balint, Adam
    Hwang, Hyejin
    BIOMEDICAL IMAGE REGISTRATION, DOMAIN GENERALISATION AND OUT-OF-DISTRIBUTION ANALYSIS, 2022, 13166 : 104 - 110
  • [33] Self-supervised Vision Transformers for image-to-image labeling: a BiaPy solution to the LightMyCells Challenge
    Franco-Barranco, Daniel
    Gonzalez-Marfil, Aitor
    Arganda-Carreras, Ignacio
    IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI 2024, 2024
  • [34] Civil Rephrases Of Toxic Texts With Self-Supervised Transformers
    Laugier, Leo
    Pavlopoulos, John
    Sorensen, Jeffrey
    Dixon, Lucas
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1442 - 1461
  • [35] Self-supervised Video Hashing via Bidirectional Transformers
    Li, Shuyan
    Li, Xiu
    Lu, Jiwen
    Zhou, Jie
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13544 - 13553
  • [36] Self-supervised Visual Transformers for Breast Cancer Diagnosis
    Saidnassim, Nurbek
    Abdikenov, Beibit
    Kelesbekov, Rauan
    Akhtar, Muhammad Tahir
    Jamwal, Prashant
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 423 - 427
  • [37] FactoFormer: Factorized Hyperspectral Transformers With Self-Supervised Pretraining
    Mohamed, Shaheer
    Haghighat, Maryam
    Fernando, Tharindu
    Sridharan, Sridha
    Fookes, Clinton
    Moghadam, Peyman
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 14
  • [38] Multi-Frame Self-Supervised Depth with Transformers
    Guizilini, Vitor
    Ambrus, Rares
    Chen, Dian
    Zakharov, Sergey
    Gaidon, Adrien
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 160 - 170
  • [39] Self-supervised transformers for turbulent flow time series
    Drikakis, Dimitris
    Kokkinakis, Ioannis William
    Fung, Daryl
    Spottswood, S. Michael
    PHYSICS OF FLUIDS, 2024, 36 (06)
  • [40] SELF-SUPERVISED REMOTE SENSING IMAGE RETRIEVAL
    Walter, Kane
    Gibson, Matthew J.
    Sowmya, Arcot
    IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 1683 - 1686