Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0

Authors:

Raven, Tim [1]

Matei, Arthur [1]

Fink, Gernot A. [1]

Affiliations:

[1] TU Dortmund Univ, Dortmund, Germany

Source:

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2024, PT II | 2024 / Vol. 14805

Keywords:

Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD

DOI:

10.1007/978-3-031-70536-6_23

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state of the art on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
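The aggregation step named in the abstract can be illustrated with a minimal sketch of generic VLAD encoding. It is not the authors' implementation: the abstract does not specify the codebook size, how the codebook is trained, or any normalization or post-processing, so the function name `vlad_encode`, the hard nearest-centroid assignment, and the final L2 normalization are all assumptions chosen for illustration. The local descriptors stand in for per-patch ViT features.

```python
import numpy as np


def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors (n, d) into one vector of length k * d.

    Hypothetical helper: generic VLAD with hard assignment and L2
    normalization, assumed here for illustration only.
    """
    descriptors = np.asarray(descriptors, dtype=np.float64)
    centroids = np.asarray(centroids, dtype=np.float64)

    # Squared Euclidean distance from every descriptor to every centroid: (n, k).
    diffs = descriptors[:, None, :] - centroids[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    nearest = np.argmin(dists, axis=1)

    k, d = centroids.shape
    vlad = np.zeros((k, d), dtype=np.float64)
    for j in range(k):
        members = descriptors[nearest == j]
        if len(members) > 0:
            # Sum of residuals between the centroid and its assigned descriptors.
            vlad[j] = np.sum(members - centroids[j], axis=0)

    # Flatten to a single page-level vector and L2-normalize (skip if all zero).
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


if __name__ == "__main__":
    # Toy codebook with two centroids and three 2-D descriptors.
    codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
    feats = np.array([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]])
    print(vlad_encode(feats, codebook))
```

In a retrieval setting, one such vector would be computed per document and the documents ranked by a distance between vectors; the distance measure is likewise not stated in the abstract.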


Pages: 380-396

Page count: 17
