Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder

Cited by: 8

Authors:

Lin, Xinmiao [1]

Li, Yikang [2]

Hsiao, Jenhao [2]

Ho, Chiuman [2]

Kong, Yu [3]

Affiliations:

[1] Rochester Inst Technol, Rochester, NY USA

[2] OPPO US Res, Shenzhen, Peoples R China

[3] Michigan State Univ, E Lansing, MI USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

Funding:

National Science Foundation (USA);

DOI:

10.1109/CVPR52729.2023.00173

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The popular VQ-VAE models reconstruct images by learning a discrete codebook, but their reconstruction quality degrades rapidly as the compression rate rises. One major reason is that a higher compression rate induces more loss of visual signals in the higher-frequency spectrum, which reflects the details in pixel space. In this paper, a Frequency Complement Module (FCM) architecture is proposed to capture the missing frequency information and enhance reconstruction quality. The FCM can be easily incorporated into the VQ-VAE structure, and we refer to the new model as Frequency Augmented VAE (FA-VAE). In addition, a Dynamic Spectrum Loss (DSL) is introduced to guide the FCMs to balance dynamically between various frequencies for optimal reconstruction. FA-VAE is further extended to the text-to-image synthesis task, and a Cross-attention Autoregressive Transformer (CAT) is proposed to obtain more precise semantic attributes in texts. Extensive reconstruction experiments with different compression rates are conducted on several benchmark datasets, and the results demonstrate that the proposed FA-VAE restores details more faithfully than SOTA methods. CAT also shows improved generation quality with better image-text semantic alignment.
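The abstract's claim that stronger compression loses more high-frequency content can be illustrated with a small NumPy sketch. This is not the FA-VAE method, the FCM, or the DSL: it assumes plain average pooling followed by nearest-neighbour upsampling as a stand-in for a lossy encoder and decoder, uses random noise in place of a real image, and the function names and the 0.25 cycles/pixel cutoff are hypothetical choices made for illustration.

```python
import numpy as np


def high_freq_energy_fraction(img, cutoff=0.25):
    """Fraction of spectral energy at radial frequency above `cutoff` (cycles/pixel)."""
    power = np.abs(np.fft.fft2(img)) ** 2
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, in [-0.5, 0.5)
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return power[radius > cutoff].sum() / power.sum()


def compress_and_restore(img, factor):
    """Average-pool by `factor`, then upsample back by pixel repetition.

    Assumed stand-in for a lossy encoder/decoder pair; not a VQ-VAE.
    """
    h, w = img.shape
    pooled = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)


rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))  # synthetic image with a flat spectrum

# Share of energy left above the cutoff for each downsampling factor.
fractions = {
    factor: high_freq_energy_fraction(compress_and_restore(img, factor))
    for factor in (1, 2, 4, 8)
}

for factor, fraction in fractions.items():
    print(f"factor {factor}: high-frequency energy fraction = {fraction:.3f}")
```

In this toy setting the share of energy above the cutoff shrinks as the downsampling factor grows, which is the kind of loss the abstract attributes to higher compression rates.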

Pages: 1736-1745

Page count: 10
