Query-Dependent Video Representation for Moment Retrieval and Highlight Detection

Cited by: 32
Authors
Moon, WonJun [1]
Hyun, Sangeek [1]
Park, SangUk [2]
Park, Dongchan [2]
Heo, Jae-Pil [1]
Affiliations
[1] Sungkyunkwan Univ, Seoul, South Korea
[2] Pyler, Seoul, South Korea
DOI
10.1109/CVPR52729.2023.02205
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video moment retrieval and highlight detection (MR/HD) have recently attracted attention as the demand for video understanding has grown sharply. The key objective of MR/HD is to localize the moment and to estimate the clip-wise accordance level, i.e., the saliency score, with respect to a given text query. Although recent transformer-based models have brought some advances, we found that these methods do not fully exploit the information in a given query. For example, the relevance between the text query and the video contents is sometimes neglected when predicting the moment and its saliency. To tackle this issue, we introduce Query-Dependent DETR (QD-DETR), a detection transformer tailored for MR/HD. Because we observe that the given query plays an insignificant role in transformer architectures, our encoding module starts with cross-attention layers that explicitly inject the context of the text query into the video representation. Then, to enhance the model's capability of exploiting the query information, we manipulate the video-query pairs to produce irrelevant pairs. Such negative (irrelevant) video-query pairs are trained to yield low saliency scores, which, in turn, encourages the model to estimate the precise accordance between query-video pairs. Lastly, we present an input-adaptive saliency predictor that adaptively defines the criterion of saliency scores for the given video-query pairs. Our extensive studies verify the importance of building the query-dependent representation for MR/HD. Specifically, QD-DETR outperforms state-of-the-art methods on the QVHighlights, TVSum, and Charades-STA datasets. Code is available at github.com/wjun0830/QD-DETR.
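The negative-pair idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `make_negative_pairs` and `negative_saliency_loss` are hypothetical, the cyclic shift is only one plausible way to "manipulate the video-query pairs", and the loss (a binary cross-entropy with target 0 on each clip's saliency logit) is an assumed stand-in for whatever objective the paper uses.

```python
import math


def make_negative_pairs(videos, queries):
    """Pair each video with the query of the next sample in the batch.

    Assumption: a cyclic shift by one yields irrelevant pairs, which only
    holds when neighbouring samples describe different content.
    """
    if len(videos) != len(queries):
        raise ValueError("videos and queries must have the same length")
    if len(videos) < 2:
        raise ValueError("need at least two samples to form negative pairs")
    shifted = queries[1:] + queries[:1]
    return list(zip(videos, shifted))


def negative_saliency_loss(saliency_logits):
    """Mean of -log(1 - sigmoid(s)) over clip logits of a negative pair.

    The term equals softplus(s), computed here in a numerically stable
    form; it is small when the predicted saliency is low.
    """
    total = 0.0
    for s in saliency_logits:
        total += max(s, 0.0) + math.log1p(math.exp(-abs(s)))
    return total / len(saliency_logits)


# Hypothetical batch of three video identifiers and their text queries.
pairs = make_negative_pairs(["v0", "v1", "v2"], ["q0", "q1", "q2"])
low = negative_saliency_loss([-5.0, -5.0])   # negatives scored as non-salient
high = negative_saliency_loss([5.0, 5.0])    # negatives wrongly scored salient
```

Under these assumptions, `low` is smaller than `high`, so minimizing the loss pushes the saliency of irrelevant pairs down, which is the behaviour the abstract describes.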
Pages: 23023-23033
Page count: 11