HIPA: Hierarchical Patch Transformer for Single Image Super Resolution

Cited by: 10
Authors
Cai, Qing [1 ]
Qian, Yiming [2 ]
Li, Jinxing [3 ]
Lyu, Jun [4 ]
Yang, Yee-Hong [5 ]
Wu, Feng [6 ]
Zhang, David [7 ,8 ,9 ]
Affiliations
[1] Ocean Univ China, Fac Informat Sci & Engn, Qingdao 266100, Shandong, Peoples R China
[2] Univ Manitoba, Dept Comp Sci, Winnipeg, MB R3T 2N2, Canada
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
[4] Hong Kong Polytech Univ, Sch Nursing, Hong Kong, Peoples R China
[5] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E9, Canada
[6] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230026, Anhui, Peoples R China
[7] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[8] Shenzhen Inst Artificial Intelligence & Robot Soc, Shenzhen 518129, Guangdong, Peoples R China
[9] CUHK SZ Linkl Joint Lab Comp Vis & Artificial Inte, Shenzhen 518172, Guangdong, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada; US National Science Foundation;
Keywords
Transformers; Feature extraction; Convolution; Image restoration; Superresolution; Visualization; Computer architecture; single image super-resolution; hierarchical patch transformer; attention-based position embedding; SUPERRESOLUTION; NETWORK;
DOI
10.1109/TIP.2023.3279977
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Transformer-based architectures have begun to emerge in single image super resolution (SISR) and have achieved promising performance. However, most existing vision Transformer-based SISR methods still have two shortcomings: (1) they divide images into the same number of patches of a fixed size, which may not be optimal for restoring patches with different levels of texture richness; and (2) their position encodings treat all input tokens equally and hence neglect the dependencies among them. This paper presents HIPA, a novel Transformer architecture that progressively recovers the high-resolution image using a hierarchical patch partition. Specifically, we build a cascaded model that processes an input image in multiple stages, starting with tokens of small patch sizes and gradually merging them up to the full resolution. Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions, e.g., using a smaller patch for areas with fine details and a larger patch for textureless regions. Meanwhile, we propose a new attention-based position encoding scheme for the Transformer that lets the network decide which tokens deserve more attention by assigning different weights to them, which, to the best of our knowledge, is the first such attempt. Furthermore, we also propose a multi-receptive-field attention module to enlarge the convolutional receptive field via different branches. Experimental results on several public datasets demonstrate the superior performance of the proposed HIPA over previous methods, both quantitatively and qualitatively. We will share our code and models when the paper is accepted.
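The attention-based position encoding described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the scoring vector `w` and the simple additive form are assumptions standing in for the paper's learned attention parameters; the sketch only shows the core idea of scaling each token's positional embedding by a learned per-token weight instead of adding positions uniformly.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_position_encoding(tokens, pos_table, w):
    """Add positional information weighted per token.

    tokens:    (n, d) token features
    pos_table: (n, d) fixed positional embedding table
    w:         (d,)   hypothetical learned scoring vector
    """
    scores = tokens @ w           # (n,) relevance score per token
    weights = softmax(scores)     # normalized attention over tokens
    # Unlike a plain additive encoding, each token's positional
    # embedding is scaled by its attention weight before being added.
    return tokens + weights[:, None] * pos_table

rng = np.random.default_rng(0)
n, d = 16, 8                      # 16 tokens, 8-dim features
tokens = rng.standard_normal((n, d))
pos = rng.standard_normal((n, d))
w = rng.standard_normal(d)
out = attention_position_encoding(tokens, pos, w)
print(out.shape)                  # (16, 8)
```

Tokens the scoring vector rates as more relevant receive a larger share of positional signal, which is one plausible reading of "assigning different weights to different tokens" in the abstract.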
Pages: 3226-3237
Page count: 12
Related Papers (50 in total)
  • [21] Huang Wei; Xiao Liang; Wei Zhihui; Fei Xuan; Wang Kai. Single Image Super-Resolution by Clustered Sparse Representation and Adaptive Patch Aggregation. CHINA COMMUNICATIONS, 2013, 10(05): 50-61
  • [22] Zhao, Yang; Wang, Ronggang; Jia, Wei; Yang, Jianchao; Wang, Wenmin; Gao, Wen. Local patch encoding-based method for single image super-resolution. INFORMATION SCIENCES, 2018, 433: 292-305
  • [23] Luo, Zhonghua; Wang, Li; Wang, Fengzhou; Ruan, Yinglan. Dual-aware transformer network for single-image super-resolution. JOURNAL OF ELECTRONIC IMAGING, 2023, 32(02)
  • [24] Jiang, Jianmin; Kasem, Hossam M.; Hung, Kwok-Wai. A Very Deep Spatial Transformer Towards Robust Single Image Super-Resolution. IEEE ACCESS, 2019, 7: 45618-45631
  • [25] Li, Guanxing; Cui, Zhaotong; Li, Meng; Han, Yu; Li, Tianping. Multi-attention fusion transformer for single-image super-resolution. SCIENTIFIC REPORTS, 2024, 14(01)
  • [26] Li, Yinghua; Zhang, Ying; Zeng, Hao; He, Jinglu; Guo, Jie. Spatial relaxation transformer for image super-resolution. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2024, 36(07)
  • [27] Chen, Zheng; Zhang, Yulun; Gu, Jinjin; Kong, Linghe; Yang, Xiaokang; Yu, Fisher. Dual Aggregation Transformer for Image Super-Resolution. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 12278-12287
  • [28] Ding, Qingtang; Liang, Zhengyu; Wang, Longguang; Wang, Yingqian; Yang, Jungang. Not All Patches Are Equal: Hierarchical Dataset Condensation for Single Image Super-Resolution. IEEE SIGNAL PROCESSING LETTERS, 2023, 30: 1752-1756
  • [29] Bai, Furui; Lu, Wen; Zha, Lin; Sun, Xiaopeng; Guan, Ruoxuan. Non-Local Hierarchical Residual Network for Single Image Super-Resolution. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019: 2821-2825
  • [30] Liu, Shuaicheng; Brown, Michael S.; Kim, Seon Jo; Tai, Yu-Wing. Colorization for Single Image Super Resolution. COMPUTER VISION - ECCV 2010, PT VI, 2010, 6316: 323+