Self-supervised Vision Transformers for Writer Retrieval

被引：0

作者：

Raven, Tim ^{[1
]}

Matei, Arthur ^{[1
]}

Fink, Gernot A. ^{[1
]}

机构：

[1] TU Dortmund Univ, Dortmund, Germany

来源：

DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024年 / 14805卷

关键词：

Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;

D O I：

10.1007/978-3-031-70536-6_23

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state-at-of-art performance on the Historical-WI dataset (83.1% mAP), and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.

引用

页码：380 / 396

页数：17

共 50 条

[1] Self-supervised Vision Transformers with Data Augmentation Strategies Using Morphological Operations for Writer Retrieval
Peer, Marco
Kleber, Florian
Sablatnig, Robert
FRONTIERS IN HANDWRITING RECOGNITION, ICFHR 2022, 2022, 13639 : 122 - 136
[2] Emerging Properties in Self-Supervised Vision Transformers
Caron, Mathilde
Touvron, Hugo
Misra, Ishan
Jegou, Herve
Mairal, Julien
Bojanowski, Piotr
Joulin, Armand
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9630 - 9640
[3] Self-supervised vision transformers for semantic segmentation
Gu, Xianfan
Hu, Yingdong
Wen, Chuan
Gao, Yang
COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
[4] Self-Supervised Vision Transformers for Malware Detection
Seneviratne, Sachith
Shariffdeen, Ridwan
Rasnayaka, Sanka
Kasthuriarachchi, Nuran
IEEE ACCESS, 2022, 10 : 103121 - 103135
[5] An Empirical Study of Training Self-Supervised Vision Transformers
Chen, Xinlei
Xie, Saining
He, Kaiming
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9620 - 9629
[6] Exploring Self-Supervised Vision Transformers for Gait Recognition in the Wild
Cosma, Adrian
Catruna, Andy
Radoi, Emilian
SENSORS, 2023, 23 (05)
[7] Jointly Optimal Incremental Learning with Self-Supervised Vision Transformers
Witzgall, Hanna
2024 IEEE AEROSPACE CONFERENCE, 2024,
[8] Self-supervised Models are Good Teaching Assistants for Vision Transformers
Wu, Haiyan
Gao, Yuting
Zhang, Yinqi
Lin, Shaohui
Xie, Yuan
Sun, Xing
Li, Ke
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
[9] Self-Supervised Augmented Vision Transformers for Remote Physiological Measurement
Pang, Liyu
Li, Xiaoou
Wang, Zhen
Lei, Xueyi
Pei, Yulong
2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 623 - 627
[10] Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers
Saavedra-Ruiz, Miguel
Morin, Sacha
Paull, Liam
2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022), 2022, : 197 - 204

← 1 2 3 4 5 →