HIPA: Hierarchical Patch Transformer for Single Image Super Resolution

Cited by: 10
Authors
Cai, Qing [1 ]
Qian, Yiming [2 ]
Li, Jinxing [3 ]
Lyu, Jun [4 ]
Yang, Yee-Hong [5 ]
Wu, Feng [6 ]
Zhang, David [7 ,8 ,9 ]
Affiliations
[1] Ocean Univ China, Fac Informat Sci & Engn, Qingdao 266100, Shandong, Peoples R China
[2] Univ Manitoba, Dept Comp Sci, Winnipeg, MB R3T 2N2, Canada
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
[4] Hong Kong Polytech Univ, Sch Nursing, Hong Kong, Peoples R China
[5] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E9, Canada
[6] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230026, Anhui, Peoples R China
[7] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[8] Shenzhen Inst Artificial Intelligence & Robot Soc, Shenzhen 518129, Guangdong, Peoples R China
[9] CUHK SZ Linkl Joint Lab Comp Vis & Artificial Inte, Shenzhen 518172, Guangdong, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada; US National Science Foundation;
Keywords
Transformers; Feature extraction; Convolution; Image restoration; Superresolution; Visualization; Computer architecture; single image super-resolution; hierarchical patch transformer; attention-based position embedding; SUPERRESOLUTION; NETWORK;
DOI
10.1109/TIP.2023.3279977
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformer-based architectures have started to emerge in single image super-resolution (SISR) and have achieved promising performance. However, most existing vision Transformer-based SISR methods still have two shortcomings: (1) they divide images into the same number of patches with a fixed size, which may not be optimal for restoring patches with different levels of texture richness; and (2) their position encodings treat all input tokens equally and hence neglect the dependencies among them. This paper presents HIPA, a novel Transformer architecture that progressively recovers the high-resolution image using a hierarchical patch partition. Specifically, we build a cascaded model that processes an input image in multiple stages, starting with tokens of a small patch size and gradually merging them to reach the full resolution. Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions, e.g., using smaller patches for areas with fine details and larger patches for textureless regions. Meanwhile, a new attention-based position encoding scheme for the Transformer is proposed that lets the network focus on the tokens that matter most by assigning different weights to different tokens, which, to the best of our knowledge, is the first such scheme. Furthermore, we also propose a multi-receptive field attention module to enlarge the convolutional receptive field across different branches. Experimental results on several public datasets demonstrate the superior performance of the proposed HIPA over previous methods, both quantitatively and qualitatively. We will share our code and models when the paper is accepted.
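The attention-based position encoding described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the learned query vector `w_q`, the softmax weighting, and the additive combination are illustrative assumptions, intended only to show how per-token weights could modulate positional embeddings instead of treating all tokens equally.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_weighted_pos_encoding(tokens, pos_emb, w_q):
    """Scale each token's positional embedding by a learned attention weight.

    tokens : (n, d) token features
    pos_emb: (n, d) fixed positional embeddings
    w_q    : (d,)   hypothetical learned query vector (an assumption here)
    """
    scores = tokens @ w_q          # (n,) one relevance score per token
    weights = softmax(scores)      # (n,) weights differ across tokens
    # Salient tokens receive a stronger positional signal than uniform
    # position encodings would give them.
    return tokens + weights[:, None] * pos_emb
```

With identical tokens the weights collapse to a uniform 1/n, recovering a uniformly scaled positional embedding; distinct tokens break that symmetry, which is the behavior the abstract contrasts with standard position encodings.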
Pages: 3226-3237
Page count: 12