Fine-Grained Video Retrieval With Scene Sketches

Cited by: 2
Authors
Zuo, Ran [1, 2]
Deng, Xiaoming [1, 2]
Chen, Keqi [1, 2]
Zhang, Zhengming [1, 2]
Lai, Yu-Kun [3]
Liu, Fang [4]
Ma, Cuixia [1, 2]
Wang, Hao [5]
Liu, Yong-Jin [4]
Wang, Hongan [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing Key Lab Human Comp Interact, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Dept Comp Sci & Technol, Beijing 101408, Peoples R China
[3] Cardiff Univ, Dept Comp Sci & Informat, Cardiff CF24 4AG, Wales
[4] Tsinghua Univ, BNRist, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[5] Alibaba, Beijing 100102, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Semantics; Visualization; Convolutional neural networks; Layout; Image coding; Encoding; Fine-grained sketch-based video retrieval; sketch-video dataset; scene sketch; graph convolutional networks;
DOI
10.1109/TIP.2023.3278474
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Benefiting from the intuitiveness and naturalness of sketch interaction, sketch-based video retrieval (SBVR) has received considerable attention in video retrieval research. However, most existing SBVR methods cannot accurately retrieve videos based on fine-grained scene content. To address this problem, we investigate a new task: retrieving a target video with a fine-grained storyboard sketch that depicts the video's scene layout and the visual characteristics (e.g., appearance, size, and pose) of its major foreground instances. We call this task "fine-grained scene-level SBVR". Its central challenge is performing scene-level cross-modal alignment between sketch and video. Our solution consists of two parts. First, we construct a scene-level sketch-video dataset called SketchVideo, which provides sketch-video pairs; each pair contains a clip-level storyboard sketch and several keyframe sketches corresponding to video frames. Second, we propose a novel deep learning architecture called Sketch Query Graph Convolutional Network (SQ-GCN). SQ-GCN first adaptively samples video frames to improve video encoding efficiency, and then constructs appearance and category graphs to jointly model the visual and semantic alignment between sketch and video. Experiments show that our fine-grained scene-level SBVR framework with the SQ-GCN architecture outperforms state-of-the-art fine-grained retrieval methods. The SketchVideo dataset and SQ-GCN code are available on the project webpage https://iscas-mmsketch.github.io/FG-SL-SBVR/.
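The abstract describes SQ-GCN only at a high level, so the following is not the authors' implementation. It is a minimal, dependency-free sketch of the generic ingredients named above, under stated assumptions. `sample_keyframes` is a hypothetical stand-in for adaptive frame sampling: it keeps a frame when its feature vector drifts past a cosine-distance threshold from the last kept frame. `gcn_layer` is a standard Kipf and Welling graph convolution over instance nodes. `mean_pool` and `rank_videos` turn node features into a graph embedding and rank candidate videos by cosine similarity to the sketch query. All function names, node features, adjacency matrices, and weights here are hypothetical illustrations.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sample_keyframes(frame_feats: Matrix, threshold: float) -> List[int]:
    """Hypothetical heuristic (not the paper's sampler): keep frame 0, then any
    frame whose cosine distance to the last kept frame exceeds `threshold`."""
    kept = [0]
    for i in range(1, len(frame_feats)):
        if 1.0 - cosine(frame_feats[kept[-1]], frame_feats[i]) > threshold:
            kept.append(i)
    return kept


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One generic GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Symmetric degree normalisation of the adjacency matrix.
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def mean_pool(feats: Matrix) -> List[float]:
    """Average node features into a single graph-level embedding."""
    n = len(feats)
    return [sum(row[j] for row in feats) / n for j in range(len(feats[0]))]


def rank_videos(query: List[float], videos: List[List[float]]) -> List[int]:
    """Return video indices sorted by descending cosine similarity to the query."""
    return sorted(range(len(videos)), key=lambda i: -cosine(query, videos[i]))


if __name__ == "__main__":
    # Toy sketch graph: two connected instance nodes with made-up features.
    sketch_adj = [[0.0, 1.0], [1.0, 0.0]]
    sketch_feats = [[1.0, 0.0], [0.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    query = mean_pool(gcn_layer(sketch_adj, sketch_feats, identity))
    print(query)  # [0.5, 0.5]
    print(rank_videos(query, [[1.0, 0.0], [0.4, 0.6]]))  # [1, 0]
```

In a learned model the weight matrices would be trained, and the paper's separate appearance and category graphs would supply different adjacency structures; this toy uses an identity weight and a single hand-written graph only to show the data flow from sampled frames and instance nodes to a ranked list.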
Pages: 3136-3149
Page count: 14