Enhanced Context Learning with Transformer for Human Parsing

Cited by: 1
Authors
Song, Jingya [1 ,2 ,3 ]
Shi, Qingxuan [1 ,2 ,3 ]
Li, Yihang [1 ,2 ,3 ]
Yang, Fang [1 ,2 ,3 ]
Affiliations
[1] Hebei Univ, Sch Cyber Secur & Comp, Baoding 071002, Peoples R China
[2] Hebei Univ, Hebei Machine Vis Engn Res Ctr, Baoding 071002, Peoples R China
[3] Hebei Univ, Inst Intelligent Image & Document Informat Proc, Baoding 071002, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, No. 15
Keywords
human parsing; semantic segmentation; deep learning; SEGMENTATION;
DOI
10.3390/app12157821
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Human parsing is a fine-grained human semantic segmentation task in computer vision. Because of occlusion, diverse poses, and the similar appearance of different body parts and clothing, human parsing requires particular attention to context information. Based on this observation, we enhance the learning of global and local information to obtain more accurate human parsing results. We introduce a Global Transformer Module (GTM) that uses a self-attention mechanism to capture long-range dependencies and extract context information effectively. Moreover, we design a Detailed Feature Enhancement (DFE) architecture to exploit spatial semantics for small targets: the low-level visual features from intermediate CNN layers are enhanced with channel and spatial attention. In addition, we adopt an edge detection module to refine the prediction. Extensive experiments on three datasets (LIP, ATR, and Fashion Clothing) show the effectiveness of our method, which achieves 54.55% mIoU on the LIP dataset, an average F-1 score of 80.26% on the ATR dataset, and an average F-1 score of 55.19% on the Fashion Clothing dataset.
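For readers unfamiliar with the mIoU figure quoted in the abstract, the following is a minimal sketch of mean intersection-over-union on flattened label lists. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions; the record does not state the paper's exact evaluation protocol, and benchmarks commonly accumulate a confusion matrix over the whole dataset instead.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    Assumption: pred and target are equal-length flat sequences of integer
    class labels (one per pixel). This is an illustrative sketch, not the
    evaluation code used in the paper.
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three hypothetical classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.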
Pages: 16