Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder

Cited by: 8

Authors:

Lin, Xinmiao [1]

Li, Yikang [2]

Hsiao, Jenhao [2]

Ho, Chiuman [2]

Kong, Yu [3]

Affiliations:

[1] Rochester Inst Technol, Rochester, NY USA

[2] OPPO US Res, Shenzhen, Peoples R China

[3] Michigan State Univ, E Lansing, MI USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

Funding:

National Science Foundation (USA);

DOI:

10.1109/CVPR52729.2023.00173

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The popular VQ-VAE models reconstruct images by learning a discrete codebook, but their reconstruction quality degrades rapidly as the compression rate rises. One major reason is that a higher compression rate induces a greater loss of visual signals in the higher-frequency spectrum, which reflects the details in pixel space. In this paper, a Frequency Complement Module (FCM) architecture is proposed to capture the missing frequency information and enhance reconstruction quality. The FCM can be easily incorporated into the VQ-VAE structure, and we refer to the new model as Frequency Augmented VAE (FA-VAE). In addition, a Dynamic Spectrum Loss (DSL) is introduced to guide the FCMs to balance dynamically between various frequencies for optimal reconstruction. FA-VAE is further extended to the text-to-image synthesis task, and a Cross-attention Autoregressive Transformer (CAT) is proposed to obtain more precise semantic attributes from texts. Extensive reconstruction experiments with different compression rates are conducted on several benchmark datasets, and the results demonstrate that the proposed FA-VAE restores details more faithfully than SOTA methods. CAT also shows improved generation quality with better image-text semantic alignment.
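
The abstract's central observation, that heavier compression loses more of the high-frequency signal, can be illustrated with a toy 1-D example. The sketch below is not the paper's FCM or DSL. It assumes average pooling followed by value repetition as a crude stand-in for a lossy encoder/decoder, and every name in it (`dft`, `compress`, `relative_band_error`) is hypothetical. It compares the relative spectral error in a low band and a high band after a round trip through that stand-in.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real 1-D signal."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def compress(signal, factor):
    """Assumed stand-in for lossy compression (not a VQ-VAE):
    average-pool by `factor`, then repeat each pooled value `factor` times.
    The signal length must be divisible by `factor`."""
    pooled = [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal), factor)
    ]
    return [value for value in pooled for _ in range(factor)]


def relative_band_error(original, reconstructed, low, high):
    """Squared spectral error over bins low..high-1, divided by the
    original spectral energy in the same bins."""
    a = dft(original)
    b = dft(reconstructed)
    error = sum(abs(a[k] - b[k]) ** 2 for k in range(low, high))
    energy = sum(abs(a[k]) ** 2 for k in range(low, high))
    return error / energy


n = 64
# Toy signal: a strong low-frequency component (bin 2) plus a weaker
# high-frequency "detail" component (bin 20).
signal = [
    math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.sin(2 * math.pi * 20 * t / n)
    for t in range(n)
]

reconstructed = compress(signal, 4)
low_error = relative_band_error(signal, reconstructed, 0, 16)
high_error = relative_band_error(signal, reconstructed, 16, 33)
print(f"relative error, low band:  {low_error:.3f}")
print(f"relative error, high band: {high_error:.3f}")
```

In this toy setting the high band is reconstructed far less accurately than the low band, which is the kind of gap the abstract attributes to higher compression rates. How FA-VAE actually measures and compensates for that gap is described in the paper itself, not here.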

Pages: 1736-1745

Page count: 10
