Face Retrieval in Large-Scale News Video Datasets

Cited by: 10
Authors
Thanh Duc Ngo [1]
Hung Thanh Vu [2]
Duy-Dinh Le [3]
Shin'ichi Satoh [3]
Affiliations
[1] Grad Univ Adv Studies SOKENDAI, Dept Informat, Hayama, Kanagawa 2400115, Japan
[2] Univ Sci, Ho Chi Minh City, Vietnam
[3] Natl Inst Informat, Tokyo 1018430, Japan
Source
Keywords
face-track extraction; face-track matching; large-scale; news video; recognition
DOI
10.1587/transinf.E96.D.1811
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Face retrieval in news video is a challenging task because the visual appearance of a human face varies enormously. Several approaches have been proposed to deal with this problem, but their extremely high computational cost limits their scalability to large-scale video datasets, which may contain millions of faces of hundreds of characters. In this paper, we introduce approaches for face retrieval that scale to such datasets while remaining competitive with state-of-the-art approaches. To exploit the variability of face appearances in video, we represent the appearance of a character in a video shot by a set of face images called a face-track. Our first proposal is an approach for extracting face-tracks. We use a point tracker to explore the connections between detected faces belonging to the same character and then group those faces into one face-track. We present techniques that make the approach robust against common problems caused by camera flashes, partial occlusions, and scattered appearances of characters in news videos. Our second proposal is an efficient approach to matching face-tracks for retrieval. Instead of using all the faces in the face-tracks to compute their similarity, our approach obtains a representative face for each face-track. The representative face is computed from faces that are sampled from the original face-track. As a result, we significantly reduce the computational cost of face-track matching while still taking into account the variability of faces in face-tracks, which is needed for high matching accuracy. Experiments are conducted on two face-track datasets extracted from real-world news videos, at scales that have not previously been considered in the literature. One dataset contains 1,497 face-tracks of 41 characters extracted from 370 hours of TRECVID videos. The other dataset provides 5,567 face-tracks of 111 characters observed in a television news program (NHK News 7) over 11 years. We make both datasets publicly accessible to the research community. The experimental results show that our proposed approaches achieve a favorable balance between accuracy and efficiency.
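The representative-face idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how faces are sampled or how the representative face is computed from them, so everything below is an assumption made for illustration: faces are taken to be fixed-length feature vectors, sampling is evenly spaced along the track, the representative is the element-wise mean, and tracks are compared by Euclidean distance. The function names (`sample_faces`, `representative_face`, `rank_tracks`) and the parameter `k` are hypothetical and are not taken from the paper.

```python
import math

def sample_faces(track, k):
    """Pick up to k faces at evenly spaced positions in the track (assumed strategy)."""
    n = len(track)
    if n <= k:
        return list(track)
    return [track[i * n // k] for i in range(k)]

def representative_face(track, k=4):
    """Element-wise mean of the sampled feature vectors (assumed aggregation)."""
    sampled = sample_faces(track, k)
    dim = len(sampled[0])
    return [sum(face[i] for face in sampled) / len(sampled) for i in range(dim)]

def rank_tracks(query_track, database, k=4):
    """Return database track names ordered from nearest to farthest from the query.

    Each track is reduced to one vector first, so a comparison costs one
    distance computation instead of one per pair of faces.
    """
    query_rep = representative_face(query_track, k)
    reps = {name: representative_face(t, k) for name, t in database.items()}
    return sorted(reps, key=lambda name: math.dist(query_rep, reps[name]))
```

With this layout, matching two face-tracks of m and n faces needs a single distance computation after the representatives are built, rather than the m x n comparisons that an all-pairs scheme would need. That is the kind of saving the abstract refers to, although the actual sampling and aggregation used by the authors may differ.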
Pages: 1811-1825
Page count: 15