Face database generation based on text-video correlation

Cited by: 2
Authors
Zeng, Dan [1]
Bao, Yixin [1]
Liu, Ke [1]
Zhao, Fan [1]
Tian, Qi [2]
Affiliations
[1] Shanghai Univ, Key Lab Specialty Fiber Opt & Opt Access Netw, Shanghai, Peoples R China
[2] Univ Texas San Antonio, San Antonio, TX USA
Funding
National Natural Science Foundation of China
Keywords
Face database generation; DNN; Text-video correlation; ALGORITHM; IDENTIFICATION
DOI
10.1016/j.neucom.2016.05.009
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The size of the database is key to the success of face recognition systems. However, building such a database is both time-consuming and labor-intensive. In this paper, we address the problem by proposing a database generation framework based on text-video correlation. Specifically, the visual content of a video can be represented as a character sequence by face detection, tracking and recognition, while text information extracted from subtitles and scripts provides a complementary identity sequence. By correlating these two sequences, the recognized faces can be refined without manual intervention. Experiments demonstrate that the human effort in face database construction can be reduced by 90%. (C) 2016 Elsevier B.V. All rights reserved.
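The correlation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the two sequences are correlated, so this example assumes a plain longest-matching-block alignment from Python's standard `difflib`. The function name `refine_labels` and the sample identities are hypothetical.

```python
from difflib import SequenceMatcher


def refine_labels(visual_seq, text_seq):
    """Mark which recognized identities are supported by the text sequence.

    visual_seq: identities predicted by face recognition, in order of appearance.
    text_seq:   identities read from subtitles/scripts, in order of appearance.
    Returns one boolean per entry of visual_seq: True if the entry falls inside
    a block that aligns with the text sequence, False if it needs review.
    NOTE: the alignment strategy is an assumption, not taken from the paper.
    """
    matcher = SequenceMatcher(a=visual_seq, b=text_seq, autojunk=False)
    confirmed = [False] * len(visual_seq)
    for block in matcher.get_matching_blocks():
        # Each block covers visual_seq[block.a : block.a + block.size].
        for i in range(block.a, block.a + block.size):
            confirmed[i] = True
    return confirmed


# Hypothetical sequences for illustration only.
visual = ["Ross", "Rachel", "Joey", "Ross"]
text = ["Ross", "Rachel", "Monica", "Ross"]
flags = refine_labels(visual, text)
for name, ok in zip(visual, flags):
    print(name, "confirmed" if ok else "needs review")
```

In this sketch, only the entries left unconfirmed would be passed to a human annotator, which is one way the manual effort mentioned in the abstract could shrink.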
Pages: 240-249
Number of pages: 10
Related papers
50 in total
  • [1] Improving distinctiveness in video captioning with text-video similarity
    Velda, Vania
    Immanuel, Steve Andreas
    Hendria, Willy Fitra
    Jeong, Cheol
    [J]. IMAGE AND VISION COMPUTING, 2023, 136
  • [2] Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval
    Wu, Xiaoyu
    Wang, Tiantian
    Wang, Shengjin
    [J]. ELECTRONICS, 2020, 9 (12) : 1 - 17
  • [3] Text-guided distillation learning to diversify video embeddings for text-video retrieval
    Lee, Sangmin
    Kim, Hyung-Il
    Ro, Yong Man
    [J]. PATTERN RECOGNITION, 2024, 156
  • [4] TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
    Croitoru, Ioana
    Bogolin, Simion-Vlad
    Leordeanu, Marius
    Jin, Hailin
    Zisserman, Andrew
    Albanie, Samuel
    Liu, Yang
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11563 - 11573
  • [5] A cross-modal conditional mechanism based on attention for text-video retrieval
    Du, Wanru
    Jing, Xiaochuan
    Zhu, Quan
    Wang, Xiaoyin
    Liu, Xuan
    [J]. MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (11) : 20073 - 20092
  • [6] KnowER: Knowledge enhancement for efficient text-video retrieval
    Kou, Hongwei
    Yang, Yingyun
    Hua, Yan
    [J]. INTELLIGENT AND CONVERGED NETWORKS, 2023, 4 (02) : 93 - 105
  • [7] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
    Jin, Peng
    Li, Hao
    Cheng, Zesen
    Li, Kehan
    Ji, Xiangyang
    Liu, Chang
    Yuan, Li
    Chen, Jie
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2470 - 2481
  • [8] UATVR: Uncertainty-Adaptive Text-Video Retrieval
    Fang, Bo
    Wu, Wenhao
    Liu, Chang
    Zhou, Yu
    Song, Yuxin
    Wang, Weiping
    Shu, Xiangbo
    Ji, Xiangyang
    Wang, Jingdong
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 13677 - 13687
  • [9] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
    Zhao, Shuai
    Zhu, Linchao
    Wang, Xiaohan
    Yang, Yi
    [J]. PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 970 - 981
  • [10] Text-Video Completion Using Structure Repair and Texture Propagation
    Tsai, Tsung-Han
    Fang, Chih-Lun
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2011, 13 (01) : 29 - 39