Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0
Authors
Raven, Tim [1]
Matei, Arthur [1]
Fink, Gernot A. [1]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
Keywords
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD
DOI
10.1007/978-3-031-70536-6_23
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state of the art on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
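As a rough illustration of the pipeline the abstract describes, the sketch below extracts patch-level ViT features, keeps only foreground (ink) patches, and aggregates them into a single VLAD descriptor per document. It is a minimal sketch, not the authors' implementation: the DINO-pretrained ViT-Small from timm stands in for the paper's self-supervised backbone, and the intensity-threshold foreground heuristic, threshold value, and codebook size are all illustrative assumptions.

```python
# Sketch: local ViT features + foreground selection + VLAD aggregation.
# Backbone, threshold, and codebook are assumptions, not the paper's config.
import numpy as np
import torch
import timm

# DINO-pretrained ViT-Small as a stand-in for the paper's self-supervised ViT.
vit = timm.create_model("vit_small_patch16_224.dino", pretrained=True).eval()

@torch.no_grad()
def patch_features(img: torch.Tensor) -> np.ndarray:
    """Local (patch-token) embeddings, shape (N, D); the [CLS] token is dropped."""
    tokens = vit.forward_features(img.unsqueeze(0))  # (1, 1 + N, D)
    return tokens[0, 1:].cpu().numpy()

def foreground_mask(img: torch.Tensor, patch: int = 16,
                    thresh: float = 0.5) -> np.ndarray:
    """Keep patches whose mean intensity suggests ink (dark on a light page).
    A simple heuristic stand-in for the paper's foreground selection."""
    gray = img.mean(0)  # (H, W), values assumed in [0, 1]
    cells = gray.unfold(0, patch, patch).unfold(1, patch, patch)
    return (cells.mean((-1, -2)).flatten() < thresh).numpy()

def vlad(feats: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """VLAD encoding: sum of residuals to the nearest of K cluster centers,
    followed by power- and L2-normalisation."""
    assign = np.linalg.norm(feats[:, None] - centers[None], axis=2).argmin(1)
    enc = np.zeros_like(centers)
    for k in range(len(centers)):
        if (assign == k).any():
            enc[k] = (feats[assign == k] - centers[k]).sum(0)
    enc = enc.ravel()
    enc = np.sign(enc) * np.sqrt(np.abs(enc))      # power normalisation
    return enc / (np.linalg.norm(enc) + 1e-12)     # L2 normalisation

# Usage sketch: `centers` (K x D) would come from k-means over training
# descriptors; retrieval ranks documents by similarity of their VLAD vectors.
# img = ...  # (3, 224, 224) tensor in [0, 1]
# feats = patch_features(img)[foreground_mask(img)]
# descriptor = vlad(feats, centers)
```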
Pages: 380 - 396
Page count: 17
Related papers
50 records in total
  • [11] SAGHOG: Self-supervised Autoencoder for Generating HOG Features for Writer Retrieval
    Peer, Marco
    Kleber, Florian
    Sablatnig, Robert
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 121 - 138
  • [12] Self-supervised Vision Transformers for Land-cover Segmentation and Classification
    Scheibenreif, Linus
    Hanna, Joelle
    Mommert, Michael
    Borth, Damian
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 1421 - 1430
  • [13] Self-Supervised Vision Transformers for Scalable Anomaly Detection over Images
    Samele, Stefano
    Matteucci, Matteo
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024
  • [14] Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers
    Hu, Hao
    Baldassarre, Federico
    Azizpour, Hossein
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III, 2023, 13715 : 409 - 426
  • [15] Exploring Efficiency of Vision Transformers for Self-Supervised Monocular Depth Estimation
    Karpov, Aleksei
    Makarov, Ilya
    2022 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY (ISMAR 2022), 2022, : 711 - 719
  • [16] Multi-level Contrastive Learning for Self-Supervised Vision Transformers
    Mo, Shentong
    Sun, Zhun
    Li, Chao
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2777 - 2786
  • [17] Patch-level Representation Learning for Self-supervised Vision Transformers
    Yun, Sukmin
    Lee, Hankook
    Kim, Jaehyung
    Shin, Jinwoo
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8344 - 8353
  • [18] Self-Supervised Transformers for fMRI representation
    Malkiel, Itzik
    Rosenman, Gony
    Wolf, Lior
    Hendler, Talma
    INTERNATIONAL CONFERENCE ON MEDICAL IMAGING WITH DEEP LEARNING, VOL 172, 2022, 172 : 895 - 913
  • [19] On Separate Normalization in Self-supervised Transformers
    Chen, Xiaohui
    Wang, Yinkai
    Du, Yuanqi
    Hassoun, Soha
    Liu, Li-Ping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [20] Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers
    Pincic, Domagoj
    Susanj, Diego
    Lenac, Kristijan
    SENSORS, 2022, 22 (19)