XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation

Times cited: 25

Authors:

Liu, Wei [1]

Liu, Fangyue [1]

Ding, Fei [1]

He, Qian [1]

Yi, Zili [1]

Affiliation:

[1] ByteDance Ltd, Beijing, Peoples R China

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022

Keywords:

DOI:

10.1109/CVPR52688.2022.00775

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Generating a new font library is very labor-intensive and time-consuming for glyph-rich scripts. Few-shot font generation is therefore desirable, as it requires only a few reference glyphs and no fine-tuning at test time. Existing methods follow the style-content disentanglement paradigm and expect novel fonts to be produced by combining the style codes of the reference glyphs with the content representations of the source. However, these few-shot font generation methods either fail to capture content-independent style representations or employ localized, component-wise style representations, which are insufficient for modeling many Chinese font styles that involve hyper-component features such as inter-component spacing and "connected-stroke". To resolve these drawbacks and make the style representations more reliable, we propose a self-supervised cross-modality pre-training strategy and a cross-modality transformer-based encoder that is conditioned jointly on the glyph image and the corresponding stroke labels. The cross-modality encoder is pre-trained in a self-supervised manner so that it effectively captures cross- and intra-modality correlations, which facilitates content-style disentanglement and the modeling of style representations at all scales (stroke-level, component-level and character-level). The pre-trained encoder is then applied to the downstream font generation task without fine-tuning. Experimental comparisons with state-of-the-art methods demonstrate that our method successfully transfers styles at all scales. In addition, it requires only one reference glyph and achieves the lowest rate of bad cases in the few-shot font generation task (28% lower than the second-best method).
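The abstract describes the cross-modality encoder only at a high level. The following is a minimal sketch of one plausible reading of it, built on assumptions that do not come from this record: glyph images are split into flattened 8×8 patches; stroke labels are a sequence of integer stroke-type ids in 1..K, with id 0 reserved as a hypothetical [MASK] token; and self-supervised pre-training is taken to be a BERT-style masked-reconstruction objective (MSE on masked patches, cross-entropy on masked stroke ids). A single NumPy self-attention layer over the concatenated sequence stands in for the full transformer. Because every token attends to every other token, this layer mixes information both within and across the two modalities. All names (`patchify`, `joint_encode`, `masked_loss`, `init_params`, `MASK_ID`) are hypothetical, and the weights are random, so the sketch shows data flow and shapes only, not XMP-Font's actual architecture or losses.

```python
# Hypothetical sketch of masked cross-modal (image + stroke) encoding.
# Not the XMP-Font reference implementation; all names are illustrative.
import numpy as np

MASK_ID = 0  # assumption: stroke classes are 1..K, id 0 is reserved for [MASK]


def patchify(glyph, patch):
    """Split an (H, W) glyph image into flattened, non-overlapping patches."""
    h, w = glyph.shape
    if h % patch or w % patch:
        raise ValueError("glyph size must be divisible by the patch size")
    return (glyph.reshape(h // patch, patch, w // patch, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, patch * patch))


def random_mask(n, ratio, rng):
    """Boolean mask with round(ratio * n) positions set (at least one)."""
    m = np.zeros(n, dtype=bool)
    m[rng.choice(n, size=max(1, round(ratio * n)), replace=False)] = True
    return m


def init_params(patch=8, n_strokes=32, d=64, seed=0):
    """Random weights for the toy encoder (shapes only, untrained)."""
    rng = np.random.default_rng(seed)

    def g(*shape):
        return rng.normal(0.0, 0.02, shape)

    return {"w_img": g(patch * patch, d), "emb": g(n_strokes + 1, d),
            "mod": g(2, d), "wq": g(d, d), "wk": g(d, d), "wv": g(d, d),
            "w_out_img": g(d, patch * patch), "w_out_str": g(d, n_strokes + 1)}


def joint_encode(glyph, stroke_ids, params, patch=8, ratio=0.25, seed=0):
    """Mask both modalities, embed them, and run one joint self-attention layer."""
    rng = np.random.default_rng(seed)
    patches = patchify(glyph, patch)
    strokes = np.asarray(stroke_ids, dtype=np.int64)
    img_mask = random_mask(len(patches), ratio, rng)
    str_mask = random_mask(len(strokes), ratio, rng)
    # Masked patches are zeroed; masked stroke ids are replaced by MASK_ID.
    x_img = np.where(img_mask[:, None], 0.0, patches) @ params["w_img"]
    x_str = params["emb"][np.where(str_mask, MASK_ID, strokes)]
    # Modality embeddings tell the encoder which tokens are pixels vs. strokes.
    x = np.concatenate([x_img + params["mod"][0], x_str + params["mod"][1]])
    # Single-head self-attention over the joint sequence: each token attends
    # to tokens of its own modality (intra-) and of the other one (cross-).
    q, k, v = x @ params["wq"], x @ params["wk"], x @ params["wv"]
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    h = x + attn @ v  # residual connection
    n_img = len(patches)
    return {"pred_patches": h[:n_img] @ params["w_out_img"],
            "stroke_logits": h[n_img:] @ params["w_out_str"],
            "patches": patches, "strokes": strokes,
            "img_mask": img_mask, "str_mask": str_mask}


def masked_loss(out):
    """MSE on masked patches plus cross-entropy on masked stroke labels."""
    m, s = out["img_mask"], out["str_mask"]
    mse = np.mean((out["pred_patches"][m] - out["patches"][m]) ** 2)
    logits = out["stroke_logits"][s]
    logits = logits - logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -np.mean(log_p[np.arange(s.sum()), out["strokes"][s]])
    return float(mse + ce)


if __name__ == "__main__":
    glyph = np.random.default_rng(1).random((64, 64))  # stand-in for a 64x64 glyph
    strokes = [3, 7, 7, 12, 5]  # hypothetical stroke-type ids for one character
    out = joint_encode(glyph, strokes, init_params())
    print(out["pred_patches"].shape, out["stroke_logits"].shape)  # (64, 64) (5, 33)
    print(round(masked_loss(out), 3))
```

A practical encoder would stack several such layers with multiple heads, feed-forward blocks and layer normalization, and would be trained with automatic differentiation (for example in PyTorch or JAX). NumPy is used here only to keep the example dependency-light.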

Pages: 7895–7904

Number of pages: 10