Deep High-Resolution Representation Learning for Visual Recognition

Cited by: 1707
Authors
Wang, Jingdong [1]
Sun, Ke [2]
Cheng, Tianheng [3]
Jiang, Borui [4]
Deng, Chaorui [6]
Zhao, Yang [8]
Liu, Dong [2]
Mu, Yadong [5]
Tan, Mingkui [7]
Wang, Xinggang [3]
Liu, Wenyu [3]
Xiao, Bin [9]
Affiliations
[1] Microsoft Res, Visual Comp Grp, Beijing 100080, Peoples R China
[2] Univ Sci & Technol China, Hefei 230027, Anhui, Peoples R China
[3] Huazhong Univ Sci & Technol, Wuhan 430074, Hubei, Peoples R China
[4] Peking Univ, Beijing 100871, Peoples R China
[5] Peking Univ, Inst Comp Sci & Technol, Machine Intelligence Lab, Beijing 100871, Peoples R China
[6] South China Univ Technol, Guangzhou 510641, Guangdong, Peoples R China
[7] South China Univ Technol, Sch Software Engn, Guangzhou 510641, Guangdong, Peoples R China
[8] Griffith Univ, Nathan, Qld 4111, Australia
[9] Microsoft, Redmond, WA 98052 USA
Keywords
Spatial resolution; Semantics; Object detection; Pose estimation; Convolutional codes; Indexes; Image segmentation; HRNet; high-resolution representations; low-resolution representations; human pose estimation; semantic segmentation; object detection; OBJECT; NETWORK; SINGLE;
DOI
10.1109/TPAMI.2020.2983686
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. It has two key characteristics: (i) it connects the high-to-low resolution convolution streams in parallel, and (ii) it repeatedly exchanges information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems. All the code is available at https://github.com/HRNet.
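The two characteristics named in the abstract (parallel streams at different resolutions, with repeated exchange between them) can be illustrated with a toy sketch. This is not the authors' implementation: the function names are hypothetical, the maps are single-channel nested lists, 2x2 average pooling stands in for strided convolutions, nearest-neighbor repetition stands in for learned upsampling, and fusion is a plain element-wise sum with no learned weights. Input height and width are assumed to be even.

```python
"""Toy sketch of parallel multi-resolution streams with repeated exchange.

Assumptions (not taken from the abstract): single-channel maps stored as
nested lists, 2x2 average pooling instead of strided convolutions,
nearest-neighbor upsampling, and element-wise summation for fusion.
"""


def downsample(x):
    """Halve the resolution with 2x2 average pooling (even sizes assumed)."""
    h, w = len(x), len(x[0])
    return [
        [
            (x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def upsample(x):
    """Double the resolution by repeating each value in a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a, b):
    """Element-wise sum of two maps with identical shapes."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def exchange(high, low):
    """One exchange step: each stream receives the other's resampled map."""
    new_high = add(high, upsample(low))
    new_low = add(low, downsample(high))
    return new_high, new_low


def run(image, num_exchanges=3):
    """Keep a full-resolution stream alive while a half-resolution stream
    runs beside it, exchanging information num_exchanges times."""
    high = image
    low = downsample(image)
    for _ in range(num_exchanges):
        high, low = exchange(high, low)
    return high, low


if __name__ == "__main__":
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    high, low = run(image)
    # The high-resolution stream keeps the input size throughout.
    print(len(high), len(high[0]))
    # The low-resolution stream stays at half the input size.
    print(len(low), len(low[0]))
```

The point of the sketch is only the data flow: the high-resolution map is never discarded and later recovered, as it would be in a serial encoder, but is carried forward and refreshed by the lower-resolution stream at every exchange.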
Pages: 3349-3364
Page count: 16