Learning spatial-temporal deformable networks for unconstrained face alignment and tracking in videos

Cited by: 11
Authors
Zhu, Hongyu [1]
Liu, Hao [1,2]
Zhu, Congcong [1,3]
Deng, Zongyong [1]
Sun, Xuehong [1,2]
Affiliations
[1] Ningxia Univ, Sch Informat Engn, Yinchuan 750021, Ningxia, Peoples R China
[2] Collaborat Innovat Ctr Ningxia Big Data & Artific, Yinchuan 750021, Ningxia, Peoples R China
[3] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
Funding
US National Science Foundation;
Keywords
Face alignment; Face tracking; Spatial transformer; Relational reasoning; Video analysis; Biometrics; IMAGE;
DOI
10.1016/j.patcog.2020.107354
Chinese Library Classification (CLC) code
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a spatial-temporal deformable network approach that addresses both face alignment in static images and face tracking in videos under unconstrained environments. Unlike conventional feature extraction, which cannot explicitly exploit augmented spatial geometry across diverse facial shapes, our approach introduces a deformable hourglass network (DHGN) that learns a deformable mask to reduce the variance of facial deformations and to extract attentional facial regions for robust feature representation. However, the DHGN extracts only spatial appearance features from static facial images and cannot exploit the temporal consistency across consecutive frames in videos. For efficient temporal modeling, we therefore extend the DHGN to a temporal DHGN (T-DHGN), designed particularly for video-based face alignment. To this end, T-DHGN incorporates a temporal relational reasoning module, so that the temporal ordering of frames is encoded in the relational feature. T-DHGN then reasons about temporal offsets to select a subset of discriminative frames over time, allowing memorized temporal consistency information to flow across frames for stable landmark tracking in videos. Compared with state-of-the-art methods, our approach achieves superior performance on several widely evaluated benchmark datasets. Code will be made publicly available upon publication. (C) 2020 Elsevier Ltd. All rights reserved.
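The abstract names two components that benefit from a concrete picture: a deformable mask that gates hourglass features toward attentional facial regions, and a temporal relational reasoning module that encodes the ordering relationship between frames and selects a subset of discriminative frames. Below is a minimal PyTorch sketch of those two ideas as described in the abstract; the module names (DeformableMask, TemporalRelationReasoning), the layer choices, and the top-k frame-selection rule are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumed PyTorch) of the two ideas in the abstract:
# (1) a learned soft mask that gates hourglass features toward attentional
#     facial regions, and (2) pairwise temporal relational reasoning that
#     scores ordered frame transitions and keeps the top-k frames.
# All names and design details are illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableMask(nn.Module):
    """Predict a soft spatial mask from hourglass features and apply it."""
    def __init__(self, channels):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels // 2, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, 1),
            nn.Sigmoid(),  # per-pixel attention weight in [0, 1]
        )

    def forward(self, feat):                  # feat: (B, C, H, W)
        mask = self.mask_head(feat)           # (B, 1, H, W)
        return feat * mask, mask               # attended features and the mask


class TemporalRelationReasoning(nn.Module):
    """Score ordered frame pairs and keep the k most discriminative frames."""
    def __init__(self, feat_dim, k=3):
        super().__init__()
        self.k = k
        self.pair_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, 1),
        )

    def forward(self, frame_feats):            # frame_feats: (B, T, D), ordered in time
        B, T, D = frame_feats.shape
        prev = frame_feats[:, :-1]              # (B, T-1, D)
        curr = frame_feats[:, 1:]               # (B, T-1, D)
        # Encode temporal order by relating each frame to its predecessor.
        pair_scores = self.pair_mlp(torch.cat([prev, curr], dim=-1)).squeeze(-1)
        # Treat a frame's score as the relevance of the transition ending at it.
        frame_scores = F.pad(pair_scores, (1, 0))            # (B, T)
        topk = torch.topk(frame_scores, k=min(self.k, T), dim=1).indices
        selected = torch.gather(
            frame_feats, 1, topk.unsqueeze(-1).expand(-1, -1, D)
        )                                        # (B, k, D) discriminative frames
        return selected, frame_scores
```

In this sketch the per-frame score is simply the relevance of the transition ending at that frame, which is one straightforward way to turn pairwise temporal-order relations into a frame-selection signal; the paper's actual offset reasoning may differ.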
Pages: 12
Related Papers (50 records in total; items [21]-[30] shown)
  • [21] Unconstrained Face Alignment via Cascaded Compositional Learning
    Zhu, Shizhan
    Li, Cheng
    Loy, Chen Change
    Tang, Xiaoou
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 3409 - 3417
  • [22] Object Tracking in Satellite Videos: A Spatial-Temporal Regularized Correlation Filter Tracking Method With Interacting Multiple Model
    Li, Yangfan
    Bian, Chunjiang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [23] Temporal Deformable Residual Networks for Action Segmentation in Videos
    Lei, Peng
    Todorovic, Sinisa
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6742 - 6751
  • [24] SPRTracker: Learning Spatial-Temporal Pixel Aggregations for Multiple Object Tracking
    Liu, Jialin
    Kong, Jun
    Jiang, Min
    Liu, Tianshan
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2732 - 2736
  • [25] Joint Learning Spatial-Temporal Attention Correlation Filters for Aerial Tracking
    Zhao, Bo
    Ma, Sugang
    Zhao, Zhixian
    Zhang, Lei
    Hou, Zhiqiang
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 686 - 690
  • [26] Learning adaptive spatial-temporal regularized correlation filters for visual tracking
    Zhao, Jianwei
    Li, Yangxiao
    Zhou, Zhenghua
    IET IMAGE PROCESSING, 2021, 15 (08) : 1773 - 1785
  • [27] Exploiting Spatial-Temporal Locality of Tracking via Structured Dictionary Learning
    Sui, Yao
    Wang, Guanghui
    Zhang, Li
    Yang, Ming-Hsuan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (03) : 1282 - 1296
  • [28] Joint Spatial-temporal Alignment of Networked Cameras
    Lee, Chia-Yeh
    Chen, Tsuhan
    Shih, Ming-Yu
    Yu, Shiaw-Shian
    2009 THIRD ACM/IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED SMART CAMERAS, 2009, : 395 - +
  • [29] Spatial-Temporal Transformer for Crime Recognition in Surveillance Videos
    Boekhoudt, Kayleigh
    Talavera, Estefania
    2022 18TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS 2022), 2022,
  • [30] Spatial-Temporal Relation Reasoning for Action Prediction in Videos
    Wu, Xinxiao
    Wang, Ruiqi
    Hou, Jingyi
    Lin, Hanxi
    Luo, Jiebo
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 (05) : 1484 - 1505