Query-Dependent Video Representation for Moment Retrieval and Highlight Detection

Cited by: 32
Authors
Moon, WonJun [1]
Hyun, Sangeek [1]
Park, SangUk [2]
Park, Dongchan [2]
Heo, Jae-Pil [1]
Affiliations
[1] Sungkyunkwan Univ, Seoul, South Korea
[2] Pyler, Seoul, South Korea
DOI
10.1109/CVPR52729.2023.02205
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video moment retrieval and highlight detection (MR/HD) have recently drawn attention as the demand for video understanding has increased sharply. The key objective of MR/HD is to localize the moment and to estimate the clip-wise accordance level, i.e., the saliency score, with respect to a given text query. Although recent transformer-based models have brought some advances, we found that these methods do not fully exploit the information in a given query. For example, the relevance between the text query and the video content is sometimes neglected when predicting the moment and its saliency. To tackle this issue, we introduce Query-Dependent DETR (QD-DETR), a detection transformer tailored for MR/HD. Because we observe that the given query plays an insignificant role in transformer architectures, our encoding module starts with cross-attention layers that explicitly inject the context of the text query into the video representation. Then, to enhance the model's capability of exploiting the query information, we manipulate the video-query pairs to produce irrelevant pairs. Such negative (irrelevant) video-query pairs are trained to yield low saliency scores, which in turn encourages the model to estimate the precise accordance between query-video pairs. Lastly, we present an input-adaptive saliency predictor that adaptively defines the criterion of saliency scores for the given video-query pairs. Our extensive studies verify the importance of building the query-dependent representation for MR/HD. Specifically, QD-DETR outperforms state-of-the-art methods on the QVHighlights, TVSum, and Charades-STA datasets. Code is available at github.com/wjun0830/QD-DETR.
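The two ideas the abstract describes, injecting query context into clip features through cross-attention and training on mismatched video-query pairs, can be illustrated with a small sketch. This is not the authors' implementation (see the repository above for that). The function names `cross_attend`, `make_negative_pairs`, and `negative_saliency_loss` are hypothetical, the attention has no learned projections or residual connection, and both the cyclic-shift pairing and the `-log(1 - s)` loss form are assumptions chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(clips, tokens):
    """Each video clip (attention query) attends over text tokens (keys/values).

    Assumption: no learned projections; clips and tokens share one dimension.
    """
    dim = len(tokens[0])
    out = []
    for clip in clips:
        # Scaled dot-product score between this clip and every text token.
        scores = [
            sum(c * t for c, t in zip(clip, tok)) / math.sqrt(dim)
            for tok in tokens
        ]
        weights = softmax(scores)
        # Query-dependent clip feature: attention-weighted sum of token vectors.
        out.append(
            [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        )
    return out


def make_negative_pairs(videos, queries):
    """Pair each video with the next sample's query (cyclic shift).

    Assumption: a batch of at least two samples with distinct queries.
    """
    shifted = queries[1:] + queries[:1]
    return list(zip(videos, shifted))


def negative_saliency_loss(scores, eps=1e-8):
    """Mean of -log(1 - s): small when saliency scores of negative pairs are low."""
    return sum(-math.log(1.0 - s + eps) for s in scores) / len(scores)


clips = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = cross_attend(clips, tokens)
pairs = make_negative_pairs(["v1", "v2", "v3"], ["q1", "q2", "q3"])
```

In this sketch the attended clip features are convex combinations of the token vectors, so a clip that aligns with a token is pulled toward it, and the loss decreases as the scores predicted for mismatched pairs approach zero.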
Pages: 23023-23033
Page count: 11