Self-supervised Vision Transformers for Writer Retrieval

被引:0
|
作者
Raven, Tim [1 ]
Matei, Arthur [1 ]
Fink, Gernot A. [1 ]
机构
[1] TU Dortmund Univ, Dortmund, Germany
关键词
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;
D O I
10.1007/978-3-031-70536-6_23
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state-at-of-art performance on the Historical-WI dataset (83.1% mAP), and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
引用
收藏
页码:380 / 396
页数:17
相关论文
共 50 条
  • [1] Self-supervised Vision Transformers with Data Augmentation Strategies Using Morphological Operations for Writer Retrieval
    Peer, Marco
    Kleber, Florian
    Sablatnig, Robert
    FRONTIERS IN HANDWRITING RECOGNITION, ICFHR 2022, 2022, 13639 : 122 - 136
  • [2] Emerging Properties in Self-Supervised Vision Transformers
    Caron, Mathilde
    Touvron, Hugo
    Misra, Ishan
    Jegou, Herve
    Mairal, Julien
    Bojanowski, Piotr
    Joulin, Armand
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9630 - 9640
  • [3] Self-supervised vision transformers for semantic segmentation
    Gu, Xianfan
    Hu, Yingdong
    Wen, Chuan
    Gao, Yang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [4] Self-Supervised Vision Transformers for Malware Detection
    Seneviratne, Sachith
    Shariffdeen, Ridwan
    Rasnayaka, Sanka
    Kasthuriarachchi, Nuran
    IEEE ACCESS, 2022, 10 : 103121 - 103135
  • [5] An Empirical Study of Training Self-Supervised Vision Transformers
    Chen, Xinlei
    Xie, Saining
    He, Kaiming
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9620 - 9629
  • [6] Exploring Self-Supervised Vision Transformers for Gait Recognition in the Wild
    Cosma, Adrian
    Catruna, Andy
    Radoi, Emilian
    SENSORS, 2023, 23 (05)
  • [7] Jointly Optimal Incremental Learning with Self-Supervised Vision Transformers
    Witzgall, Hanna
    2024 IEEE AEROSPACE CONFERENCE, 2024,
  • [8] Self-supervised Models are Good Teaching Assistants for Vision Transformers
    Wu, Haiyan
    Gao, Yuting
    Zhang, Yinqi
    Lin, Shaohui
    Xie, Yuan
    Sun, Xing
    Li, Ke
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [9] Self-Supervised Augmented Vision Transformers for Remote Physiological Measurement
    Pang, Liyu
    Li, Xiaoou
    Wang, Zhen
    Lei, Xueyi
    Pei, Yulong
    2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 623 - 627
  • [10] Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers
    Saavedra-Ruiz, Miguel
    Morin, Sacha
    Paull, Liam
    2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022), 2022, : 197 - 204